INTRODUCTION
Machine Learning is a vast field. Entering it from purely academic foundations is daunting, but recommended if you are serious about deep specialization.
You do not have to approach it linearly, though, spending years on university-level education before you can start dabbling in it and building real-world solutions that use AI/ML.
It is not just a marketing pitch when AWS says they aim to put the power of AI and ML in the hands of everyone. The range of offerings is broad and caters to builders with any level of prior ML experience, from cutting-edge ML researchers to business users who just want results.
(Source: AWS)
The AWS Machine Learning Specialty Certification gives you an opportunity to review your overall understanding of this area from a conceptual as well as a practical perspective, obviously with an emphasis on implementations that use AWS services. That said, the exam is not limited to testing your AWS knowledge. The majority of the exam actually focuses on general concepts of Machine Learning, Artificial Intelligence, and Deep Learning.
With this in mind, I will try to cover study resources for Machine Learning as a field in general, as well as specific resources for the AWS certification exam. Depending on which areas you are already comfortable with, you can skim over those parts below.
To get familiar with the exam structure, one would do well to start with a review of the official exam outline and sample questions published by AWS here.
- Homepage: https://aws.amazon.com/certification/certified-machine-learning-specialty/
- Exam Guide [PDF]: https://d1.awsstatic.com/training-and-certification/docs-ml/AWS-Certified-Machine-Learning-Specialty_Exam-Guide.pdf
- Sample Questions [PDF]: https://d1.awsstatic.com/training-and-certification/docs-ml/AWS-Certified-Machine-Learning-Specialty_Sample-Questions.pdf
STUDY RESOURCES
For the conceptual and mathematical basis of, and intuition behind, Machine Learning, you cannot go wrong with this course (though it relies on Octave, a slightly dated tool):
- Machine Learning on Coursera by Andrew Ng
- https://www.coursera.org/learn/machine-learning
This book by Aurélien Géron is widely considered to be one of the best introductions to the topic with hands-on learning exercises using current tools:
- Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow: Concepts, Tools, and Techniques to Build Intelligent Systems (2nd Edition) by Aurélien Géron
- https://www.amazon.com/Hands-Machine-Learning-Scikit-Learn-TensorFlow/dp/1492032646/
This is recommended for a more thorough understanding of linear algebra; it is among the clearest and most intuitive explanations I have ever heard.
- 3Blue1Brown - Essence of Linear Algebra
- https://www.youtube.com/playlist?list=PLZHQObOWTQDPD3MizzM2xVFitgF8hE_ab
For explanations of various Machine Learning algorithms and statistics concepts, I recommend the YouTube channel StatQuest with Josh Starmer. He has a quirky style, especially in the intros, but once you get used to it, the explanations and illustrations are very useful.
- StatQuest with Josh Starmer
- https://www.youtube.com/c/joshstarmer/playlists
To thoroughly understand AWS SageMaker, you will definitely need to do hands-on exercises, preferably by running a few of the example notebooks provided by AWS that illustrate how SageMaker works. Even if you do not have the time to run every example algorithm, reading through the provided notebooks will give you a good sense of the workflow, API calls, hyperparameters, etc.
You would also benefit from reading good chunks of the SageMaker developer guide.
- https://docs.aws.amazon.com/sagemaker/latest/dg/whatis.html
- https://docs.aws.amazon.com/sagemaker/latest/dg/sagemaker-dg.pdf [PDF]
If you have a Kindle e-book reader handy, you can try getting the Kindle copy of the developer guide. Reading it like a book, without the distraction of multiple browser tabs or the urge to immediately jump off and try something out, can sometimes help you complete a chapter.
Some other study resources –
- SageMaker Deep Dive by Emily Webber
- SageMaker Hands-on by Julien Simon
See more links in the last section below.
EXAM PREP
Here is a high-level breakdown of the topics you will need to review from an exam-readiness perspective. I have indicated in parentheses some services that do not feature prominently in the exam (yet) but are recommended for a broader understanding of current capabilities.
Domain 1: Data Engineering
- Types of Data, Data Movement, Data Preparation & Transformation
- Commonly used tools and technologies like Apache Spark, PySpark, EMR, Hadoop, Hive
- AWS services
- Data Storage: S3, (Lake Formation)
- Data Movement: Kinesis, (DMS)
- Metadata: AWS Glue Crawlers, AWS Glue Data Catalog
- Data Transformation: AWS Glue ETL
- Data Processing: Amazon EMR
- Others: AWS Step Functions, AWS Batch, AWS Data Pipeline
Domain 2: Exploratory Data Analysis
- Data Visualization techniques, Data distributions
- Handling missing data through imputation, Scaling and Normalization, Handling outliers and imbalanced data, Encoding
- Commonly used tools and technologies like Python, Pandas, NumPy, Matplotlib, Seaborn, Jupyter Notebooks
- Feature Engineering, Dimensionality Reduction, PCA, t-SNE
- AWS services
- Athena
- QuickSight
- Labeling using SageMaker Ground Truth
Domain 3: Modeling
- Concepts of Machine Learning and Deep Learning
- Train-Validation-Test
- Types of Machine Learning - supervised, unsupervised, reinforcement
- Various commonly used Machine Learning algorithms, from simple Linear Regression and K-NN to Gradient Boosted Trees, CNNs, and RNNs
- Evaluating Model Accuracy; Tuning Models
- Typical model evaluation techniques and metrics - RMSE, ROC/AUC, Confusion Matrix, Recall / Sensitivity, Specificity, Precision, Accuracy, F1 Score
- SageMaker built-in algorithms
- AI Services offered by AWS
- Translate, Transcribe, Lex, Polly, Rekognition
- Comprehend, Forecast, Textract, Personalize
- A2I
Domain 4: Machine Learning Implementation and Operations
- Using Docker
- Using custom training and inference code with Amazon SageMaker
- Use of GPU instances, distributed training
- Spot Instances
- Elastic Inference
- Security, IAM, VPC endpoints
- Encryption at rest and in transit, KMS
- autoscaling, production variants, endpoints
- inference at the edge
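For the Domain 2 data-preparation topics (imputation, scaling, log transforms, encoding), a small scikit-learn pipeline shows how the pieces fit together. This is a minimal sketch on a hypothetical toy DataFrame; the column names and values are made up for illustration, and median imputation plus standard scaling is just one reasonable combination among several.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: a numeric column with a missing value,
# a skewed numeric column with an outlier, and a categorical column.
df = pd.DataFrame({
    "age": [25.0, np.nan, 47.0, 35.0],
    "income": [40_000.0, 52_000.0, 1_200_000.0, 61_000.0],
    "region": ["east", "west", "east", "north"],
})

# Log transform to compress the skewed income column and tame the outlier.
df["income"] = np.log1p(df["income"])

numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),  # median is robust to outliers
    ("scale", StandardScaler()),                   # zero mean, unit variance
])

preprocess = ColumnTransformer(
    [
        ("num", numeric, ["age", "income"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["region"]),
    ],
    sparse_threshold=0.0,  # always return a dense NumPy array
)

X = preprocess.fit_transform(df)
print(X.shape)  # (4, 5): 2 scaled numeric columns + 3 one-hot columns
```

Wrapping the steps in a `ColumnTransformer` means the imputation medians and scaling statistics are learned only when you call `fit` (ideally on the training split) and are then reused unchanged on validation and test data, which avoids leaking information from those sets.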
The following Udemy courses were very useful for reviewing both the ML concepts and the AWS services that make up the AI/ML stack.
AWS Certified Machine Learning Specialty 2020 - Hands On!
- Stephane Maarek and Frank Kane
- https://www.udemy.com/course/aws-machine-learning/
Machine Learning, Data Science and Deep Learning with Python
(There is overlap between these courses, but the latter covers a few additional Data Engineering and ML/DL concepts that were useful to review.)
Look for the frequent deals and discounts on Udemy if you do not have a subscription through your employer. Never pay full price for courses on Udemy.
I have also read good things about the ACloudGuru course by Scott and Brock, though I did not have the time to check it out. See whether it suits your learning plan.
Free Exam Readiness course on AWS Training. This has a good number of sample scenarios and questions as well as exam tips.
- Exam Readiness: AWS Certified Machine Learning - Specialty
- https://www.aws.training/Details/eLearning?id=42183
(Note that the same Exam Readiness material was presented as a 4-hour session at re:Invent 2020. However, I found that this particular presentation of the material was not great; the walkthroughs of sample questions were especially ineffective.)
I also had the opportunity to attend training sessions conducted by a team of AWS experts, covering these topics in depth with hands-on labs and spread over a three-week period. This definitely helped cement many of the concepts relevant to the exam.
PRACTICE EXAMS
Beyond the official sample questions, it is strongly recommended that you take at least one practice exam to validate your readiness for the real exam.
The official practice exam is available through the AWS CertMetrics site for a fee ($40). However, if you have taken any earlier AWS exam, you will have a ‘free practice exam’ voucher in the benefits section of your AWS certification profile that you can claim and use when booking the practice exam. This exam has only 20 questions, so it will not give you the experience of a full 3-hour exam. You also will not receive a detailed breakdown of which questions you got wrong and what the correct answers are, so be prepared to note the questions and topics that need further review while you are taking the test.
Udemy Practice Test - Frank Kane [one warmup test and one full test] https://www.udemy.com/course/aws-machine-learning-practice-exam/
Udemy Practice Test - Abhishek Singh [one warmup test and two full tests] https://www.udemy.com/course/aws-certified-machine-learning-specialty-full-practice-exams/
I scored 78%, 67%, and 75% on the three full-length practice tests.
As you can see, I didn't do great on any of the practice exams, but they helped me identify some gaps and understand the level of detail I needed on some of the topics.
My weak areas fluctuated between Data Engineering, Exploratory Data Analysis, and Modeling, so I had to review concepts and details more or less across all domains.
Also practice sitting in one place without any food / snack or
THE EXAM
The questions on the exam seemed to be evenly split between (1) those that require you to identify one specific ML concept or AWS service feature and correctly pick an option based on the use case described, and (2) those that require you to combine multiple ML concepts and/or AWS services to come up with a solution to the scenario.
Even in the former category, the questions go well beyond the recall-level questions you would typically see in an Associate-level exam.
Here is a sampling of the breadth of topics one can expect to see on the exam:
General Machine Learning Concepts
- (spanning Data Wrangling, common methods of Exploratory Data Analysis, Deep Learning, and Model Evaluation)
- interpreting a Confusion Matrix
- stratified K-Fold Cross Validation (for imbalanced data)
- Bayes Classifier and Naive Bayes
- Statistical Independence and Conditional Independence
- overfitting and underfitting, and the many ways to deal with them
- Plot of Residuals
- ARIMA, DeepAR+
- Multi Layer Perceptrons (MLP)
- Types of layers in Deep Learning
- t-SNE and scatterplot
- Transfer Learning
- Early stopping
- Collaborative Filtering for Recommendation Systems
- appropriate data preprocessing for Word2Vec
- Support Vector Machines (SVM) with Linear and Non-Linear basis functions
- appropriate methods for imputation of missing data
- when it is appropriate to drop some features (correlation and linear dependence)
- scaling, binning, log transform, handling outliers
- MAPE
- oversampling the minority class and other ways of handling imbalanced data
- ROC curves
- Vanishing Gradient Problem and ways to address it: choosing an appropriate Activation Function, LSTM (for RNNs), ResNet
Building ML solutions on AWS
- XGBoost Hyperparameters
- Amazon AI Services
- Forecast and types of input
- Transcribe & Translate languages supported
- Rekognition for images and video
- Comprehend
- A2I (and how it differs from Ground Truth)
- custom inference code on Docker, packaging them for use with SageMaker
- use of Horovod for distributing TensorFlow
- how to specify GPU for training
- usage of spot instances for GPU in training
- Elastic Inference use cases
- use case for AWS IoT Greengrass (inference at the edge)
- AWS services that can be used for moving and transforming data (Kinesis, Glue ETL)
- input data formats for SageMaker built-in algorithms
- handling SageMaker training when the training data is too large (local training with a smaller data set to validate the approach, Pipe mode for full training on SageMaker)
- FindMatches ML Transform in AWS Glue
- Random Cut Forest for anomaly detection in Kinesis Data Analytics
- understanding the limits of Kinesis and determining how many shards to provision based on estimated volume and size of data to be ingested
- Use cases for Seq2Seq, LDA, NTM, K-Means
- Handling GZIP compression in S3 and Kinesis Data Analytics + Lambda
- Using Kinesis Data Firehose to convert the data format to Parquet/ORC when writing to S3
- AWS Batch and Step Functions
- Usage of Lambda and CloudWatch for event-driven ETL workflows
- Security
- VPC endpoints
- KMS and server side encryption in S3
- how to use IAM roles and policies, and SageMaker execution role to restrict access
- Restricting internet access to notebooks
- encrypting inter-node communications
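For the stratified K-Fold item, a quick scikit-learn comparison shows why stratification matters with imbalanced data. This is a sketch on hypothetical labels (90 negatives, 10 positives); plain `KFold` without shuffling is used only to make the failure mode obvious, and `positives_per_fold` is a hypothetical helper.

```python
import numpy as np
from sklearn.model_selection import KFold, StratifiedKFold

# Hypothetical imbalanced labels: 90 negatives followed by 10 positives.
y = np.array([0] * 90 + [1] * 10)
X = np.zeros((len(y), 1))  # feature values do not affect how folds are split

def positives_per_fold(splitter):
    """Count the positive labels that land in each test fold."""
    return [int(y[test_idx].sum()) for _, test_idx in splitter.split(X, y)]

# Stratified: every test fold keeps the 10% positive rate (2 of 20 rows).
print(positives_per_fold(StratifiedKFold(n_splits=5, shuffle=True, random_state=0)))

# Plain, unshuffled K-Fold: contiguous folds, so all positives end up in the last one.
print(positives_per_fold(KFold(n_splits=5)))
```

With plain K-Fold here, four of the five test folds contain no positives at all, so metrics such as recall are undefined or meaningless on them; stratification keeps the class ratio roughly constant across folds.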
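The Kinesis shard-sizing item is plain arithmetic once you remember the per-shard limits. The sketch below assumes the provisioned-mode limits for Kinesis Data Streams (1 MB/s or 1,000 records/s of writes per shard, and 2 MB/s of reads per shard shared by standard, non-enhanced-fan-out consumers); check the current service quotas before relying on them. The function name `shards_needed` is hypothetical, and treating 1 MB as 1,024 KB is a simplifying assumption.

```python
import math

# Assumed per-shard limits for Kinesis Data Streams (provisioned mode).
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0  # shared across standard consumers

def shards_needed(records_per_sec: float, avg_record_kb: float, consumers: int = 1) -> int:
    """Estimate the minimum shard count for a given ingest profile."""
    write_mb = records_per_sec * avg_record_kb / 1024  # ingest bandwidth in MB/s
    read_mb = write_mb * consumers                     # each consumer reads everything
    return max(
        1,
        math.ceil(write_mb / WRITE_MB_PER_SEC),                # write bandwidth limit
        math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC),    # write record-count limit
        math.ceil(read_mb / READ_MB_PER_SEC),                  # shared read limit
    )

print(shards_needed(5000, 2, consumers=2))  # 10: bandwidth-bound
print(shards_needed(3000, 0.1))             # 3: record-count-bound, not bandwidth-bound
```

The second call is the kind of trap the exam likes: small records barely use any bandwidth, so the 1,000 records/s limit per shard is what determines the shard count.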
I generously flagged questions for review in my first pass, which took about 2 hours, or about 2 minutes per question on average. I flagged not only the questions I was unsure of, but also a few verbose ones that I wanted to reread later to make sure I had not misread a key phrase or modifier. On each review of the flagged questions, I removed the flag from questions whose responses I corrected or felt confident about. The number of flagged questions went from 35 after the first pass to just over 20 after the second and about 9 after the third and final pass.
At this point I was confident that I had enough questions in the bag and proceeded to finish the test.
TEST EXPERIENCE
This was my second “online proctored” test-taking experience with AWS certification exams.
I had read horror stories on forums like Reddit, ranging from proctors not turning up, technical support not responding, and exams stalling in the middle, to exams being revoked for silly reasons.
My first experience with remote testing was a few months ago, and I did have some initial hiccups with the testing software, which oddly complained about network connectivity when I had a rock-solid connection, both wired and wireless. The rest of that experience was smooth, however.
This time, too, I was expecting some hiccups, but the experience was smooth from start to finish. If Pearson VUE listened to earlier feedback and improved their systems, well done!
The Pearson OnVUE testing software is reasonably well designed though it could be a little better.
The webpage says you can use the keyboard shortcuts CTRL + and CTRL - to increase and decrease the text size during the exam. I found that this did NOT work.
So please be aware that if you have a very high resolution screen, the text might render a bit small. Thankfully you will notice this when you do a system test so it should not take you by surprise.
I noticed that “pinch-zoom” on your laptop’s trackpad will allow you to zoom in, but it zooms the whole screen (not just the text size) as if it were a photo. So when you zoom in, you no longer see the header and navigation buttons which is weird. I would avoid this unless absolutely necessary.
The test policy says touchscreens are explicitly prohibited, so even if your laptop has a touchscreen, I would avoid using it as such, just to be sure.
You are not allowed to use any scratch paper or make notes during the online proctored exam. This is not a serious issue, though simple notes would help when working out the math for questions about the confusion matrix and metrics such as precision, recall, and the F1 score.
You can, however, use the “whiteboard” feature within the OnVUE testing software. You can practice with it beforehand here: https://home.pearsonvue.com/Standalone-pages/Whiteboard.aspx
I used this to become familiar with the tool. During the exam I had enough time to jot down a quick sketch of the confusion matrix and the formulas for the metrics derived from it, which made the questions around them much easier to think through.
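The same confusion-matrix cheat sheet can be written as a few lines of Python, which is also a handy way to practice the formulas before the exam. This is a minimal sketch; the function names `safe_div` and `confusion_metrics` and the example counts are hypothetical and not taken from the exam.

```python
def safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0

def confusion_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Derive the usual binary-classification metrics from confusion-matrix counts."""
    precision = safe_div(tp, tp + fp)          # of predicted positives, how many are real
    recall = safe_div(tp, tp + fn)             # a.k.a. sensitivity / true positive rate
    specificity = safe_div(tn, tn + fp)        # true negative rate
    fpr = safe_div(fp, fp + tn)                # false positive rate = 1 - specificity (ROC x-axis)
    accuracy = safe_div(tp + tn, tp + fp + fn + tn)
    f1 = safe_div(2 * precision * recall, precision + recall)  # harmonic mean of P and R
    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "fpr": fpr,
        "accuracy": accuracy,
        "f1": f1,
    }

m = confusion_metrics(tp=80, fp=20, fn=40, tn=860)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

With these counts you get precision 0.800, recall 0.667, F1 0.727, and accuracy 0.940. The high accuracy next to a mediocre recall is the typical picture for imbalanced data, and it is why the exam tends to favor precision, recall, and F1 in such scenarios.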
RESULTS
The PASS/FAIL result is shown immediately after you finish the test. The Acclaim badge landed overnight, and I was able to download the scorecard from the CertMetrics site thereafter.
I scored much better on the real exam (955/1000) than on any of the practice tests. All the rounds of reviewing and reading up on topics definitely helped.
The scorecard also has a breakdown of how many actual questions from each domain were on your test and how you did in each of those domains.
Overall, it was a very rewarding journey, and I am sure you will enjoy it too.
SOME USEFUL LINKS
AWS Online Proctored Exams on Pearson OnVUE https://home.pearsonvue.com/Clients/Amazon-Web-Services/Online-Proctored.aspx
Julien Simon Model Deployment Scenarios in SageMaker (Production Variants & Multi-Model Endpoints) https://www.youtube.com/watch?v=dT8jmdF-ZWw (Also: https://www.youtube.com/watch?v=Vnkx_hOWLio)
Training with SageMaker on local machine https://www.youtube.com/watch?v=K3ngZKF31mc
Andrew Ng - Learning Curves https://www.youtube.com/watch?v=ISBGFY-gBug
Bagging (Bootstrap aggregating) https://www.youtube.com/watch?v=2Mg8QD0F1dQ
Hyperparameter Tuning with SageMaker https://www.youtube.com/watch?v=ynYnZywayC4
What is a Tensor https://www.youtube.com/watch?v=f5liqUk0ZTw
3Blue1Brown - Intuition behind “Determinant” of a Matrix https://www.youtube.com/watch?v=Ip3X9LOh2dk
StatQuest -
- t-SNE - https://www.youtube.com/watch?v=NEaUSP4YerM
- SVM - https://www.youtube.com/watch?v=efR1C6CvhmE
- Ridge (L2) and Lasso (L1) Regularization Visualized - https://www.youtube.com/watch?v=Xm2C_gTAl8c
- Ridge (L2) Regularization - https://www.youtube.com/watch?v=Q81RR3yKn30
- Lasso (L1) Regularization - https://www.youtube.com/watch?v=NGf0voTMlcs
- PCA - https://www.youtube.com/watch?v=FgakZw6K1QQ
- Naive Bayes - https://www.youtube.com/watch?v=O2L2Uv9pdDA
- Decision Trees - https://www.youtube.com/watch?v=7VeUPuFGJHk
- Random Forests - https://www.youtube.com/watch?v=J4Wdy0Wc_xQ
- K-Means Clustering - https://www.youtube.com/watch?v=4b5d3muPQmA
- K-Nearest Neighbors - https://www.youtube.com/watch?v=HVXime0nQeI
3Blue1Brown - Binomial Distribution https://www.youtube.com/watch?v=8idr1WZ1A7Q
StatQuest - ROC and AUC https://www.youtube.com/watch?v=4jRBRDbJemM&t=394s
Amazon SageMaker Built-in Algorithms (Playlist) https://www.youtube.com/playlist?list=PLtgJR0xD2TPf9PAEm7LP82_-Oznho5Ms_
Amazon SageMaker Technical Deep Dive Series (Playlist) https://www.youtube.com/playlist?list=PLhr1KZpdzukcOr_6j_zmSrvYnLUtgqsZz