How long is the DP-100 exam?

The DP-100 exam is approximately 120 minutes long with 40-60 questions covering machine learning engineering on Azure.

What skills does the DP-100 exam test?

The DP-100 exam tests skills in designing ML solutions, building and training models, deploying and operationalizing ML workloads, and implementing responsible AI practices on Azure.

DP-100 Azure Data Scientist Associate Exam Guide

By Macdara Ó Murchú · Founder, AzurePrep · Last reviewed 6 April 2026

The DP-100 certification validates your ability to design, build, and deploy machine learning solutions on Microsoft Azure. The exam runs approximately 120 minutes and tests advanced technical skills in data science and machine learning engineering. If you're preparing for this certification, understanding the exam structure, domains, and required services is critical to passing.

What DP-100 Tests

The DP-100 exam evaluates your competency in implementing end-to-end machine learning workflows on Azure. It goes beyond theoretical knowledge and requires hands-on experience with the Azure Machine Learning (AML) workspace, model training, pipeline orchestration, and production deployment.

The exam focuses on practical scenarios where you must:

Design machine learning solutions that meet specific business requirements
Prepare and explore datasets using Azure tools and Python
Train and validate models with appropriate algorithms
Prepare models for production deployment
Implement retraining pipelines and monitoring strategies

This DP-100 study guide emphasizes that the certification is not about basic cloud knowledge. Instead, it tests deep understanding of machine learning workflows, Azure-specific ML tools, and enterprise considerations like scalability, security, and model governance.

4 · Exam domains (data science skill areas)

$165 · Exam cost in USD (associate level)

60% · ML engineering focus (vs. pure data science tasks)

Who Should Take DP-100

The DP-100 certification is designed for professionals actively working with machine learning on Azure:

Data Scientists building predictive models and implementing data science solutions
Machine Learning Engineers creating production-grade ML systems
AI Practitioners designing end-to-end AI solutions
Cloud Architects specializing in data and ML workloads
Analytics Engineers transitioning into ML engineering roles

Prerequisites include practical experience with Python, familiarity with machine learning concepts (supervised/unsupervised learning, model evaluation), and basic Azure knowledge. If you're new to Azure, completing AZ-900 (Azure Fundamentals) beforehand helps contextualize cloud concepts.

Exam Format and Scoring

The DP-100 exam is administered through Pearson VUE testing centers or online proctoring. Here are the key details:

Duration: Approximately 120 minutes
Question Count: Approximately 40-60 questions
Question Types: Multiple choice, multiple select, case studies, and drag-and-drop scenarios
Passing Score: 700 on a scaled score range of 1-1000 (not a percentage)
Certification Level: Associate-level (requires no prior Azure certifications)
Cost: Typically $165 USD (varies by region)

The exam includes case study questions where you read a business scenario and answer multiple questions based on that context. These require careful reading and understanding of requirements, constraints, and technical trade-offs.

Data / AI Path

DP-900 Data Fundamentals (Fundamentals)

DP-100 Data Scientist Associate (Associate)

DP-300 Database Administrator Associate (Associate)

AI Engineering Path

AI-900 AI Fundamentals (Fundamentals)

AI-102 AI Engineer Associate (Associate)

DP-100 Data Scientist Associate (Associate)

DP-100 Exam Domains and Weighting

Understanding the exam domains is essential for focused preparation. Here's the breakdown:

Domain	Weight	Key Focus Areas
Design and prepare a machine learning solution	20-25%	Solution design, data collection, Azure ML workspace setup
Explore data and train models	35-40%	EDA, feature engineering, model selection, hyperparameter tuning
Prepare a model for deployment	20-25%	Model evaluation, registration, packaging, containerization
Deploy and retrain a model	10-15%	Deployment targets, endpoints, monitoring, retraining pipelines

The "Explore data and train models" domain carries the heaviest weight, so allocate significant study time to data preprocessing, feature engineering, and model training techniques.

Domain 1: Design and Prepare a Machine Learning Solution (20-25%)

This domain tests your ability to approach machine learning problems systematically.

Solution Design

You must understand how to translate business requirements into ML solutions. Key concepts include:

Problem Framing: Distinguishing between classification, regression, clustering, and time-series forecasting problems
Data Requirements: Identifying what data you need, data quality standards, and collection strategies
Feasibility Assessment: Evaluating whether a problem is solvable with available data and resources
Success Metrics: Defining appropriate evaluation metrics aligned with business goals
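Problem framing often starts with a look at the label column. As a small illustration, assuming scikit-learn is installed, `sklearn.utils.multiclass.type_of_target` reports whether a target looks binary, multiclass, or continuous, which maps roughly onto classification versus regression. The `suggest_task` helper is hypothetical, not part of any Azure or scikit-learn API.

```python
from sklearn.utils.multiclass import type_of_target

def suggest_task(y):
    """Hypothetical helper: map a label array to a rough ML task type."""
    kind = type_of_target(y)          # e.g. 'binary', 'multiclass', 'continuous'
    if kind in ("binary", "multiclass"):
        return "classification"
    if kind == "continuous":
        return "regression"
    return f"unclear ({kind})"

print(suggest_task([0, 1, 1, 0]))                 # classification
print(suggest_task([12.5, 3.1, 7.25]))            # regression
print(suggest_task(["low", "high", "medium"]))    # classification
```

The label alone cannot separate regression from time-series forecasting; that depends on whether rows are ordered in time and whether you predict future values.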

Azure ML Workspace Setup

The Azure ML workspace is your central hub for machine learning projects:

Creating and configuring workspaces
Understanding workspace components (datastores, compute resources, experiments)
Configuring authentication and role-based access control (RBAC)
Linking Azure services like Azure Storage and Key Vault

Questions in this domain often ask you to choose the right workspace configuration for specific scenarios or troubleshoot connection issues.

Data Collection and Preparation Strategy

Before building models, you must establish data pipelines:

Connecting to data sources (Azure Blob Storage, Data Lake, SQL Database, Spark)
Defining data schemas and validation rules
Planning for incremental data ingestion
Handling sensitive data with encryption and PII protection
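Schemas and validation rules can be expressed directly in code. The sketch below, assuming pandas, checks an incoming batch against a hypothetical expected schema and two illustrative value rules (an age range and unique keys); the column names and rules are made up for the example.

```python
import pandas as pd

# Hypothetical expected schema: column name -> NumPy dtype kind ('i' int, 'f' float)
EXPECTED_SCHEMA = {"customer_id": "i", "age": "i", "monthly_spend": "f", "churned": "i"}

def validate_batch(df: pd.DataFrame) -> list:
    """Return human-readable problems; an empty list means the batch passed."""
    problems = []
    missing = set(EXPECTED_SCHEMA) - set(df.columns)
    if missing:
        problems.append(f"missing columns: {sorted(missing)}")
    for col, kind in EXPECTED_SCHEMA.items():
        if col in df.columns and df[col].dtype.kind != kind:
            problems.append(f"{col}: expected dtype kind '{kind}', got '{df[col].dtype.kind}'")
    # Value rules assumed for illustration
    if "age" in df.columns and not df["age"].between(18, 120).all():
        problems.append("age: values outside 18-120")
    if "customer_id" in df.columns and df["customer_id"].duplicated().any():
        problems.append("customer_id: duplicate keys")
    return problems

batch = pd.DataFrame({
    "customer_id": [1, 2, 2],
    "age": [34, 17, 52],
    "monthly_spend": [42.0, 13.5, 88.25],
    "churned": [0, 1, 0],
})
print(validate_batch(batch))
```

In an Azure ML pipeline, a check like this would typically run as an early step so a bad batch fails fast instead of silently degrading a retrained model.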

Domain 2: Explore Data and Train Models (35-40%)

This is the largest exam domain and tests your practical ML skills extensively.

Exploratory Data Analysis (EDA)

EDA is foundational to effective modeling. You should know how to:

Load and inspect data using pandas DataFrames
Identify missing values and appropriate imputation strategies
Detect and handle outliers using statistical methods
Analyze feature distributions and correlations
Visualize data patterns (histograms, scatter plots, correlation matrices)

The DP-100 study guide emphasizes that EDA findings should drive your feature engineering decisions. Skipping thorough EDA often leads to poor model performance.
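A first EDA pass in pandas might look like the sketch below. The dataset is synthetic and tiny, and the 1.5 × IQR rule is just one common statistical method for flagging outliers.

```python
import numpy as np
import pandas as pd

# Small synthetic dataset standing in for real data (values are made up)
df = pd.DataFrame({
    "age": [25, 32, 47, 51, np.nan, 38, 29, 95],
    "income": [32_000, 45_000, 61_000, 58_000, 52_000, np.nan, 39_000, 41_000],
    "visits": [3, 5, 8, 7, 6, 5, 4, 40],
})

print(df.describe())        # count, mean, std, quartiles per column
print(df.isna().sum())      # missing values per column

def iqr_outliers(s: pd.Series) -> pd.Series:
    """Boolean mask of values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]; NaN is never flagged."""
    q1, q3 = s.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (s < q1 - 1.5 * iqr) | (s > q3 + 1.5 * iqr)

for col in df.columns:
    print(col, "outlier rows:", df.index[iqr_outliers(df[col])].tolist())

print(df.corr())            # pairwise Pearson correlations, NaNs excluded pair by pair
```

Here row 7 is flagged for both `age` and `visits`, which is the kind of finding that should feed into decisions about capping, transforming, or investigating a record before feature engineering.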

Feature Engineering and Preprocessing

Feature engineering significantly impacts model performance:

Scaling and Normalization: Using StandardScaler or MinMaxScaler for algorithms sensitive to feature magnitude
Encoding Categorical Variables: One-hot encoding, ordinal (label) encoding, or target encoding depending on cardinality and algorithm choice
Creating Derived Features: Polynomial features, interaction terms, and domain-specific transformations
Handling Imbalanced Data: Resampling techniques (oversampling, undersampling), class weights, and appropriate evaluation metrics
Missing Data Strategies: Mean/median imputation, forward-fill for time series, or dropping columns with excessive missingness
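Several of these techniques can be combined in a single scikit-learn pipeline so that the same preprocessing runs at training and scoring time. This is a minimal sketch on made-up data: median imputation and `StandardScaler` for numeric columns, `OneHotEncoder` for a categorical column, and `class_weight="balanced"` as one way to handle class imbalance.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy training data (made up); 'plan' is categorical, the rest are numeric
X = pd.DataFrame({
    "age": [25, 32, np.nan, 51, 46, 38, 29, 60],
    "monthly_spend": [20.0, 35.5, 50.0, np.nan, 80.0, 42.0, 18.0, 95.0],
    "plan": ["basic", "pro", "pro", "basic", "enterprise", "pro", "basic", "enterprise"],
})
y = np.array([0, 0, 1, 0, 1, 0, 0, 1])   # imbalanced: 3 positives, 5 negatives

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),   # fill missing numbers with the column median
    ("scale", StandardScaler()),                    # zero mean, unit variance
])
preprocess = ColumnTransformer([
    ("num", numeric, ["age", "monthly_spend"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["plan"]),  # unseen categories -> all zeros
])
model = Pipeline([
    ("prep", preprocess),
    # 'balanced' weights each class inversely to its frequency in y
    ("clf", LogisticRegression(class_weight="balanced", max_iter=1000)),
])
model.fit(X, y)

# A new row with a missing value and a category never seen during training
new = pd.DataFrame({"age": [40], "monthly_spend": [np.nan], "plan": ["premium"]})
print(model.predict_proba(new))
```

Fitting the imputer and scaler inside the pipeline means their statistics are learned from training data only, which avoids leaking information from validation folds.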

Model Selection and Training

You must understand when to use different algorithms:

Regression: Linear regression, Ridge/Lasso for regularization, gradient boosting (XGBoost, LightGBM)
Classification: Logistic regression, decision trees, random forests, SVM, neural networks
Clustering: K-means, hierarchical clustering, DBSCAN
Time Series: ARIMA, Prophet, exponential smoothing
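A practical way to choose between candidate algorithms is to compare them under the same cross-validation split. The sketch below uses synthetic data from `make_classification` and compares a scaled logistic regression against a random forest on mean ROC AUC; the candidates and metric are illustrative choices.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for a real dataset
X, y = make_classification(n_samples=500, n_features=10, n_informative=5, random_state=0)

candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
}
# Same stratified folds for every candidate, so the comparison is fair
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

scores = {}
for name, estimator in candidates.items():
    fold_scores = cross_val_score(estimator, X, y, cv=cv, scoring="roc_auc")
    scores[name] = fold_scores.mean()
    print(f"{name}: mean AUC {fold_scores.mean():.3f} (+/- {fold_scores.std():.3f})")
```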

Hyperparameter Tuning

Fine-tuning model parameters is critical:

Grid Search: Exhaustive search over specified parameter ranges
Random Search: Sampling random parameter combinations
Bayesian Optimization: Using probabilistic models to guide search efficiently
Cross-Validation: K-fold cross-validation to ensure robust parameter selection
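Grid and random search can be tried locally with scikit-learn before moving to Azure ML. This sketch tunes the regularization strength `C` of a logistic regression on scikit-learn's bundled breast cancer dataset, using 5-fold cross-validation inside each search. scikit-learn has no built-in Bayesian search, so that strategy is not shown here.

```python
from scipy.stats import loguniform
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=5000))

# Grid search: every listed value is tried (4 candidates x 5 folds = 20 fits)
grid = GridSearchCV(pipe, {"logisticregression__C": [0.01, 0.1, 1, 10]}, cv=5, scoring="f1")
grid.fit(X, y)
print("grid best:", grid.best_params_, round(grid.best_score_, 3))

# Random search: 10 values of C sampled from a log-uniform distribution
rand = RandomizedSearchCV(
    pipe,
    {"logisticregression__C": loguniform(1e-3, 1e2)},
    n_iter=10, cv=5, scoring="f1", random_state=0,
)
rand.fit(X, y)
print("random best:", rand.best_params_, round(rand.best_score_, 3))
```

Random search scales better than grid search as the number of hyperparameters grows, because its cost is fixed by `n_iter` rather than by the size of the grid.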

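Azure ML automates hyperparameter tuning through sweep jobs (called HyperDrive in the legacy SDK v1), which support grid, random, and Bayesian sampling along with early termination policies such as Bandit, median stopping, and truncation selection.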
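Early termination policies are a common exam topic. The sketch below illustrates the documented idea behind the Bandit policy's `slack_factor` and `slack_amount` parameters for a metric that is maximized: a run is stopped when its metric falls below `best / (1 + slack_factor)` or `best - slack_amount`. It is a simplified illustration, not Azure ML's implementation; it ignores `evaluation_interval` and `delay_evaluation`, and the helper functions and run names are hypothetical.

```python
from typing import Dict, List, Optional

def bandit_threshold(best: float, slack_factor: Optional[float] = None,
                     slack_amount: Optional[float] = None) -> float:
    """Lowest metric a run may report and keep running (primary metric is maximized)."""
    if slack_factor is not None:
        return best / (1 + slack_factor)     # relative slack
    if slack_amount is not None:
        return best - slack_amount           # absolute slack
    raise ValueError("set slack_factor or slack_amount")

def runs_to_stop(metrics: Dict[str, float], **slack) -> List[str]:
    """Runs whose metric at this evaluation interval falls below the bandit threshold."""
    threshold = bandit_threshold(max(metrics.values()), **slack)
    return [run for run, value in metrics.items() if value < threshold]

# Primary-metric values reported by four hypothetical runs at the same interval
metrics = {"run_a": 0.80, "run_b": 0.70, "run_c": 0.62, "run_d": 0.78}
print(round(bandit_threshold(0.80, slack_factor=0.2), 4))  # 0.6667
print(runs_to_stop(metrics, slack_factor=0.2))             # ['run_c']
print(runs_to_stop(metrics, slack_amount=0.05))            # ['run_b', 'run_c']
```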

Automated Machine Learning (AutoML)

AutoML handles algorithm selection and hyperparameter tuning automatically:

Training multiple algorithms in parallel
Performing feature engineering automatically
Comparing model performance on validation sets
Handling classification, regression, time-series forecasting, and NLP tasks

Understanding when AutoML is appropriate versus when you need manual control is important. AutoML excels for baseline models and standard problems but may require customization for complex scenarios.

Azure ML Training Components

You must know how to:

Submit training scripts as command jobs (SDK v2), or with ScriptRunConfig in the legacy SDK v1
Recognize the Estimator API as deprecated SDK v1 material, superseded by ScriptRunConfig and command jobs
Monitor training runs in the Azure ML workspace
Log metrics, parameters, and artifacts to track experiments
Use MLflow for experiment tracking and model management

Domain 3: Prepare a Model for Deployment (20-25%)

This domain covers model evaluation, registration, and packaging for production use.

Model Evaluation and Validation

Before deploying, rigorously validate your model:

Classification Metrics: Accuracy, precision, recall, F1-score, AUC-ROC, confusion matrices
Regression Metrics: Mean squared error (MSE), root mean squared error (RMSE), mean absolute error (MAE), R-squared
Validation Strategy: Train/test splits, cross-validation, stratified sampling for imbalanced data
Business Metrics: Translating technical metrics to business impact and ROI
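The metrics above are all available in `sklearn.metrics`. This sketch computes them on small made-up arrays; note that ROC AUC takes predicted probabilities or scores, while precision, recall, and F1 take hard labels.

```python
import numpy as np
from sklearn.metrics import (
    confusion_matrix, f1_score, mean_absolute_error, mean_squared_error,
    precision_score, r2_score, recall_score, roc_auc_score,
)

# Classification: made-up true labels, hard predictions, and predicted probabilities
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
y_prob = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]

print(confusion_matrix(y_true, y_pred))   # rows = actual class, columns = predicted class
print(precision_score(y_true, y_pred))    # TP / (TP + FP)
print(recall_score(y_true, y_pred))       # TP / (TP + FN)
print(f1_score(y_true, y_pred))           # harmonic mean of precision and recall
print(roc_auc_score(y_true, y_prob))      # ranking quality of the probabilities

# Regression: made-up targets and predictions
t = np.array([3.0, 5.0, 2.5, 7.0])
p = np.array([2.5, 5.0, 3.0, 8.0])
mse = mean_squared_error(t, p)
print("MSE", mse, "RMSE", np.sqrt(mse), "MAE", mean_absolute_error(t, p), "R2", r2_score(t, p))
```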

Model Registration and Versioning

Azure ML provides model registry capabilities:

Registering trained models in the workspace
Managing multiple model versions
Adding metadata, tags, and properties for searchability
Tracking model lineage (which training run, parameters, data)

The model registry enables version control and rollback if deployed models perform poorly in production.

Model Packaging and Containerization

Preparing models for deployment requires:

Creating entry scripts (score.py) with an init() function that loads the model and a run() function that scores each request
Specifying dependencies in conda environments or pip requirements files
Building container images with model artifacts and runtime dependencies
Testing containerized models locally before deployment
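Below is a minimal score.py sketch following the init()/run() convention. It assumes a scikit-learn model serialized with joblib as `model.pkl` and a request body shaped like `{"data": [[...], ...]}`; both the file name and the JSON shape are assumptions. Azure ML sets the `AZUREML_MODEL_DIR` environment variable to the folder holding the registered model's files, but the exact subfolder layout depends on how the model was registered, so check the path for your own deployment.

```python
# score.py: minimal entry-script sketch (model file name and request shape are assumptions)
import json
import os

import joblib
import numpy as np

model = None

def init():
    """Called once when the container starts: load the model into memory."""
    global model
    model_dir = os.getenv("AZUREML_MODEL_DIR", ".")
    model = joblib.load(os.path.join(model_dir, "model.pkl"))

def run(raw_data):
    """Called per request: parse JSON, predict, return a JSON-serializable result."""
    try:
        data = np.array(json.loads(raw_data)["data"])
        return {"predictions": model.predict(data).tolist()}
    except Exception as exc:  # report the error to the caller instead of crashing the worker
        return {"error": str(exc)}
```

To smoke-test locally before deploying, point `AZUREML_MODEL_DIR` at a folder containing `model.pkl`, call `init()`, then call `run()` with a JSON string.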

MLflow Integration

Azure ML supports MLflow natively, and a workspace can act as an MLflow tracking server:

Logging models in MLflow format
Registering MLflow models in Azure ML
Deploying MLflow models to Azure ML endpoints without writing a scoring script
Tracking experiments with MLflow APIs

Understanding MLflow enhances reproducibility and interoperability across different platforms.

Domain 4: Deploy and Retrain a Model (10-15%)

This smaller domain covers production deployment and ongoing model maintenance.

Deployment Targets

You must know when to use different deployment options:

Azure Container Instances (ACI): Development and testing, low-traffic endpoints, serverless containers
Azure Kubernetes Service (AKS): High-traffic production, autoscaling, GPU inference, enterprise requirements
Azure App Service: Web apps and APIs requiring custom frameworks
Batch Endpoints: Offline scoring for large datasets
Managed Online Endpoints: Azure's newer managed inference service with built-in autoscaling

Inference Optimization

Deploying models efficiently requires:

Model optimization techniques (quantization, pruning) to reduce latency
Batch inference for high-volume scoring scenarios
Caching predictions when appropriate
Load testing before production deployment

Monitoring and Logging

Production models require continuous monitoring:

Application Insights integration for performance tracking
Custom logging in scoring scripts
Tracking prediction latency, error rates, and throughput
Alerting when metrics exceed thresholds
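Custom logging in a scoring script can be as simple as wrapping the scoring function to record latency and failures. This is a sketch using only the Python standard library; the `log_latency` decorator and placeholder `score` function are hypothetical. When Application Insights is enabled for a deployment, output from the scoring script is generally collected there, but verify the exact log routing for your endpoint type.

```python
import functools
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
logger = logging.getLogger("scoring")

def log_latency(func):
    """Hypothetical decorator: log each call's latency in ms and whether it failed."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            result = func(*args, **kwargs)
            logger.info("scoring ok latency_ms=%.1f", (time.perf_counter() - start) * 1000)
            return result
        except Exception:
            # logger.exception also records the traceback
            logger.exception("scoring failed latency_ms=%.1f", (time.perf_counter() - start) * 1000)
            raise
    return wrapper

@log_latency
def score(raw_data):
    # Placeholder logic for the sketch; a real run() would call the model here
    return {"echo": raw_data}

score("example request")
```

Consistent key=value log lines like these are easy to query later when building latency or error-rate alerts.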

Retraining Pipelines

Models can degrade over time as production data drifts away from the data they were trained on. You must implement retraining:

Scheduling periodic retraining jobs using Azure ML pipelines
Detecting data drift to trigger retraining automatically
A/B testing new model versions before full rollout
Implementing canary deployments for gradual transitions
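Azure ML has its own monitoring features for detecting drift; the sketch below instead swaps in the population stability index (PSI), a common drift statistic, to show the underlying idea of comparing a feature's production distribution with its training baseline. It uses NumPy only, with synthetic data. The usual PSI thresholds (below 0.1 little shift, above 0.25 significant shift) are an industry rule of thumb, not an Azure setting.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI between a baseline sample and a new sample of one numeric feature."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Inner cut points from baseline quantiles, so each bin holds ~equal baseline mass
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))[1:-1]
    exp_frac = np.bincount(np.searchsorted(cuts, expected, side="right"), minlength=bins) / len(expected)
    act_frac = np.bincount(np.searchsorted(cuts, actual, side="right"), minlength=bins) / len(actual)
    # Small floor avoids log(0) when a bin is empty
    exp_frac = np.clip(exp_frac, 1e-6, None)
    act_frac = np.clip(act_frac, 1e-6, None)
    return float(np.sum((act_frac - exp_frac) * np.log(act_frac / exp_frac)))

rng = np.random.default_rng(0)
baseline = rng.normal(50, 10, 5_000)   # feature values seen at training time
same = rng.normal(50, 10, 5_000)       # new data from the same distribution
shifted = rng.normal(58, 12, 5_000)    # new data whose distribution has moved

psi_same = population_stability_index(baseline, same)
psi_shifted = population_stability_index(baseline, shifted)
print(f"same: {psi_same:.3f}  shifted: {psi_shifted:.3f}")
```

A retraining pipeline could compute a statistic like this on a schedule and trigger retraining when it crosses the chosen threshold.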

Key Azure Services for DP-100

Azure Machine Learning Workspace

The central resource for all ML operations:

Compute resources (training clusters, inference clusters)
Datastore connections for data access
Experiment tracking and run history
Model and environment registry
Pipeline orchestration

Azure ML Pipelines

Orchestrating multi-step workflows:

Creating reproducible ML workflows
Parameterizing pipelines for different scenarios
Publishing pipelines as REST endpoints for automated triggering
Conditional execution and parallel steps

Compute Resources

Different compute options for different workloads:

Compute Clusters: Scalable training on VMs with auto-scaling
Compute Instances: Single-user development environments with Jupyter
Attached Compute: Using existing Kubernetes clusters or Spark pools
Serverless Compute: Running training jobs without managing infrastructure

Designer (Low-Code ML)

A visual tool for building ML pipelines:

Drag-and-drop interface for creating workflows
Pre-built modules for common ML tasks
Useful for learning and prototyping
Limited compared to code-first approaches for complex scenarios

Python Libraries and Tools

scikit-learn

Essential for classical ML:

Preprocessing: StandardScaler, OneHotEncoder, PolynomialFeatures
Model selection: train_test_split, cross_val_score, GridSearchCV
Algorithms: LogisticRegression, RandomForestClassifier, SVC
Metrics: classification_report, confusion_matrix, roc_auc_score

pandas

Data manipulation and analysis:

Loading CSV, parquet, and other data formats
DataFrame operations: filtering, grouping, aggregation
Handling missing values: fillna, dropna
Feature creation and transformation
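The pandas operations above fit together in a few lines. This sketch reads an inline CSV (standing in for a file path; `pd.read_parquet` handles parquet files if pyarrow or fastparquet is installed), imputes one column, creates a derived feature, drops incomplete rows, and aggregates by group. The data is made up.

```python
import io

import pandas as pd

# Inline CSV standing in for a file on disk or in a datastore
csv = io.StringIO(
    "store,region,units,price\n"
    "A,north,10,2.5\n"
    "B,north,,3.0\n"
    "C,south,7,2.0\n"
    "D,south,12,\n"
)
df = pd.read_csv(csv)

df["units"] = df["units"].fillna(df["units"].median())  # impute missing units with the median
df["revenue"] = df["units"] * df["price"]                # derived feature (NaN where price is missing)
df = df.dropna(subset=["price"])                         # drop rows with no price

summary = df.groupby("region").agg(total_units=("units", "sum"), mean_price=("price", "mean"))
print(summary)
```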

PyTorch and TensorFlow (Basics)

Deep learning frameworks:

Understanding neural network architectures
Training basic models with standard frameworks
Transfer learning with pre-trained models
Not as heavily tested as classical ML, but important for advanced scenarios

NumPy

Numerical computing:

Array operations and linear algebra
Random number generation
Efficient numerical computations underlying other libraries
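A short NumPy example covering seeded random generation and vectorized linear algebra: it generates synthetic data from a known linear relationship and recovers the coefficients with least squares.

```python
import numpy as np

rng = np.random.default_rng(seed=42)   # reproducible random number generator

# Synthetic data: y = 1 + 3*x1 - 2*x2 plus a little Gaussian noise
X = rng.normal(size=(200, 2))
y = 1 + 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Add a column of ones for the intercept, then solve the least-squares problem
X_b = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(X_b, y, rcond=None)
print(coef.round(2))   # close to [1, 3, -2]
```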

Study Plan for DP-100

A structured approach improves preparation efficiency.

8-12 Week Study Schedule

Weeks 1-2: Foundations
- Build Azure fundamentals knowledge (AZ-900 level)
- Review ML concepts: supervised/unsupervised learning, validation strategies
- Set up Azure account and explore Azure ML workspace UI

Weeks 3-4: Azure ML Core Concepts
- Create Azure ML workspace and understand components
- Complete Microsoft Learn modules on Azure ML
- Practice using compute instances and submitting training jobs

Weeks 5-6: Data Exploration and Preprocessing
- Work with real datasets using pandas and NumPy
- Practice EDA techniques and visualization
- Implement feature engineering pipelines
- Use Azure ML datastores and data assets (datasets in SDK v1)

Weeks 7-8: Model Training
- Build and train models with scikit-learn
- Implement hyperparameter tuning with sweep jobs (HyperDrive in SDK v1)
- Experiment with AutoML for different problem types
- Practice logging metrics and artifacts

Weeks 9-10: Model Evaluation and Deployment
- Develop comprehensive evaluation strategies
- Register models in Azure ML registry
- Create entry scripts and conda environments
- Deploy to ACI and AKS
- Test deployed endpoints

Weeks 11-12: Advanced Topics and Practice
- Design and implement ML pipelines
- Create retraining workflows
- Study case-study-style questions from official practice assessments (not exam dumps, which violate Microsoft's exam policies)
- Take full-length practice tests

Hands-On Experience is Non-Negotiable

Theory alone won't pass DP-100. You must:

Build an end-to-end ML project in Azure ML workspace
Train multiple models and compare performance
Deploy a model to a managed endpoint
Implement a retraining pipeline
Work with real data and handle practical challenges

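Running your own Azure ML workspace costs little: you pay mainly for the compute and associated resources (such as storage) you use, so free account credits go a long way if you stop compute instances when idle. Practice on live Azure resources, not just simulators.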

Study Resources and Practice Tests

Microsoft Official Resources

Microsoft Learn modules on Azure ML (free)
Azure ML documentation with code samples
Microsoft Azure certifications page with exam updates
Official study guides from Microsoft

Practice Tests

Practice tests are most valuable for revealing knowledge gaps:

Take practice tests throughout your study period, not just before the exam
azureprep.com offers free Azure practice questions across 40+ certifications, including comprehensive DP-100 practice tests
Use practice tests to identify weak domains and adjust your study focus
Aim for 85%+ on practice tests before scheduling the real exam

Community Resources

Azure ML blog posts from Microsoft engineers
Kaggle competitions for practical ML experience
GitHub repositories with Azure ML examples
Reddit communities like r/learnprogramming and r/Azure

Common Exam Pitfalls to Avoid

Not Prioritizing Hands-On Work
Many candidates study theory but struggle with practical questions. Spend 50% of your preparation time in the Azure ML workspace actually building solutions.

Ignoring Feature Engineering
The largest exam domain heavily emphasizes data preparation. Weak feature engineering knowledge will cost you points.

Misunderstanding Deployment Options
Know the differences between ACI, AKS, and managed endpoints. Questions often ask which is appropriate for specific scenarios.

Overlooking Retraining Strategies
Model maintenance in production is critical. Understand data drift detection and automated retraining approaches.

Rushing Through Case Studies
Case study questions require careful reading. Identify constraints and requirements before selecting answers.

Scheduling Your Exam

Book your exam strategically:

Schedule 2-3 weeks after achieving 85%+ on practice tests
Allow buffer time for review if you're not reaching target scores
Consider exam center location and availability
Reschedule if you're not ready rather than failing

Retakes are allowed under Microsoft's retake policy, but each retake typically means paying the exam fee again, so it pays to be ready on the first attempt.

Final Preparation Week

In your final week before the exam:

Review weak domains identified by practice tests
Do light review of Azure services (don't introduce new concepts)
Get adequate sleep starting 3 days before the exam
Avoid cramming, which tends to increase errors
Familiarize yourself with the testing center or proctoring software

Ultimately, the DP-100 exam tests your ability to design, build, and deploy real machine learning solutions on Azure. Success requires combining theoretical knowledge with extensive hands-on experience. Use azureprep.com practice tests throughout your preparation to identify gaps, focus your studying, and build confidence before exam day.