I recently took the AWS Certified Machine Learning – Specialty exam and want to share how I prepared with anyone planning to certify. In my opinion, this is the second most difficult AWS exam, the most challenging being the AWS Certified Solutions Architect – Professional exam.

AWS Machine Learning – Specialty Exam Details:

There are 65 questions on the AWS Machine Learning – Specialty exam, and you are expected to complete it within 3 hours.

The exam is unique in that it is the only AWS exam with questions that are not specific to AWS. There are three main kinds of questions on the exam: general ML questions, questions on SageMaker, and questions on other AWS services.

You need hands-on ML experience as well as knowledge of Amazon SageMaker and AWS ML services to pass the exam. Having data analytics experience is a plus.

According to the exam guide for AWS Machine Learning Specialty, the candidate should have experience developing, architecting, or running ML/deep learning workloads on the AWS Cloud, along with:

  • The ability to express the intuition behind basic ML algorithms
  • Experience performing basic hyperparameter optimization
  • Experience with ML and deep learning frameworks
  • The ability to follow model-training best practices
  • The ability to follow deployment and operational best practices

The exam is made up of 4 domains: Data Engineering, Exploratory Data Analysis, Modeling, and Machine Learning Implementation and Operations.

Data Engineering – 20%

The Data Engineering domain deals with data lakes and with ingesting and transforming data. Services that are tested in this domain include the Kinesis family of services, S3, Database Migration Service, IoT, EMR (Spark), Glue, Athena, Step Functions and AWS Batch.

Consider the below topics for this domain:

  • Storage options for data
  • Use cases for Kinesis Video Streams, Kinesis Data Streams, Kinesis Data Analytics and Kinesis Data Firehose, and how they integrate with each other and with other AWS services
  • Batch and stream processing
  • AWS services for processing and transforming data
  • Services for orchestrating data processing
  • Partitioning and data formats

Exploratory Data Analysis – 24%

This domain focuses on cleaning, preparing and visualizing data. Services in this domain include Glue, EMR, QuickSight, SageMaker Ground Truth and Mechanical Turk.

Consider the below topics for this domain:

  • Techniques for dealing with missing values and imputing them
  • Techniques for numerical, text and image feature engineering
  • Dataset formats and modes supported by different algorithms
  • Knowledge of tools for data preparation
  • Probability distributions and their applications
  • Knowledge of scaling, normalizing, binning and transforming features
  • One-hot encoding and other encoding types
  • Types of visualizations for analyzing data to make informed decisions
  • How to perform feature selection
  • Options available for labeling data
  • Dealing with outliers and unbalanced data
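
To make a few of the topics above concrete, here is a minimal sketch of mean imputation, min-max scaling and one-hot encoding in plain Python. The function names and sample values are my own illustrations, not anything taken from the exam guide, and in practice you would more likely reach for a library or a managed data preparation tool.

```python
from statistics import mean


def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature carries no spread; map everything to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(labels):
    """Encode each label as a 0/1 vector with one column per category."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


# Hypothetical sample data, for illustration only.
ages = impute_mean([20, None, 40])        # missing value filled with the mean
scaled = min_max_scale(ages)              # values rescaled to [0, 1]
categories, encoded = one_hot(["red", "green", "red"])
```

Mean imputation is only one option; the exam also expects you to know when the median, the mode or a model-based approach is a better fit for the missing values.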

Modeling – 36%

This domain accounts for the most questions on the exam and includes general ML concepts. It deals with identifying ML solutions for business problems, training models, hyperparameter optimization and evaluating machine learning models.

Consider the below topics for this domain:

  • Identify use cases that are appropriate for ML and those that are not
  • Know the difference between Machine Learning and Deep Learning
  • Knowledge of the various types of ML and DL
  • Machine Learning Frameworks and algorithms
  • Automatic Model Tuning in SageMaker
  • Built-in SageMaker Algorithms and their use cases
  • Dataset formats supported by algorithms
  • L1 and L2 Regularization
  • Confusion Matrix, Recall, Precision, F1 score, ROC and AUC
  • Knowledge of SageMaker architecture and integrations
  • Hyperparameters and objective metrics for SageMaker algorithms
  • Select appropriate model for a given use case
  • How to use SageMaker to build and train ML models
  • How to use your own model for training and inference
  • Instance types for training and inference
  • Docker folder structure required for SageMaker
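
Since evaluation metrics come up often, here is a small sketch of how precision, recall and F1 score fall out of a confusion matrix for a binary classifier. The labels and predictions are made-up values for illustration, and the helper names are my own.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision_recall_f1(tp, fp, fn):
    """Derive precision, recall and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical labels and predictions, for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

Precision asks how many predicted positives were correct, while recall asks how many actual positives were found. Which one matters more depends on whether false positives or false negatives are costlier for the use case.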

Machine Learning Implementation & Operations – 20%

The final domain tests the candidate on deploying models and identifying AWS AI services for business use cases. It also covers monitoring and security of ML solutions.

Consider the below topics for this domain:

  • AWS AI services such as Rekognition, Textract, Translate, Polly and Lex
  • How to secure your notebook instances
  • Deploying machine learning models and solutions
  • Implement monitoring for your ML solutions
  • Types of SageMaker endpoints
  • Performing inference at the edge (Neo and Greengrass)
  • Knowledge of inference endpoints and production variants
  • SageMaker instance types and managed spot training
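
For production variants, it helps to remember that a SageMaker endpoint splits traffic in proportion to each variant's weight relative to the sum of all weights. The sketch below only does that arithmetic; the variant names and weights are hypothetical, and it does not call any AWS API.

```python
def traffic_shares(variant_weights):
    """Return each variant's share of traffic as weight / sum of weights."""
    total = sum(variant_weights.values())
    if total <= 0:
        raise ValueError("at least one variant needs a positive weight")
    return {name: weight / total for name, weight in variant_weights.items()}


# Hypothetical variants: send most traffic to the current model and a
# small slice to a new candidate model.
shares = traffic_shares({"model-a": 9.0, "model-b": 1.0})
```

With these weights, roughly nine out of ten requests would go to the first variant, which is the usual pattern for testing a new model on a small share of live traffic.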


I have tried to list as many topics as possible, but this list is not exhaustive. I suggest you assess your skills and spend more time on the areas you identify as weaknesses.

My target score for this exam was 950 but I scored 881. Most importantly, I passed.

Resources Used

Preparation Courses:

  1. Linux Academy
  2. Frank Kane & Stephane Maarek – Udemy
  3. ACloudGuru

Practice Test:

  1. Whizlabs

AWS Training & Certification Digital courses

  1. The Elements of Data Science
  2. Exam Readiness: AWS Certified Machine Learning – Specialty 
  3. Developing Machine Learning Applications 
  4. Process Model: CRISP-DM on the AWS Stack 
  5. Speaking Of: Machine Translation and Natural Language Processing (NLP) 
  6. Build a Text Classification Model with AWS Glue and Amazon SageMaker
  7. Deep Dive on Amazon Rekognition: Building Computer Visions Based Smart Applications
  8. Machine Learning Terminology and Process

AWS Whitepapers

  1. Deep Learning on AWS
  2. Power Machine Learning at scale – Mapping Parallelized Modeling-to-HPC Infrastructure on AWS

Other Resources

  1. Evaluating Machine Learning Models by Alice Zheng
  2. Towards Data Science – Quick Start to Multi GPU Deep Learning on AWS SageMaker Using TF Distribute
  3. Towards Data Science – Various Ways To Evaluate a Machine Learning Model’s Performance
  4. Towards Data Science – Brewing up custom ML models on AWS SageMaker


Author: Emmanuel Koomson is a Product Owner, Solution Architect, 13X AWS Certified and a lifelong learner. You can connect with him on LinkedIn.

More from the author:

  1. How to Prepare for AWS Certified Database Specialty
  2. How to Prepare for AWS Certified Data Analytics Specialty