Quick Notes - DOMAIN 6: AWS Certified AI Practitioner
- Aman Bansal
- Nov 10
Updated: Nov 12
If you are prepping for the AWS Certified AI Practitioner exam (https://aws.amazon.com/certification/certified-ai-practitioner/), these notes should be enough to cover the fundamentals.
Domain 6: Optimizing Foundation Models
Explore two techniques to improve the performance of a foundation model (FM):
Retrieval Augmented Generation (RAG) - Supplies the model with specific knowledge at inference time: relevant documents are retrieved from a knowledge dataset and added to the prompt, so the model can ground its answers without any change to its weights (see the sketch below).
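As a rough illustration (the documents, query, and helper below are invented), this sketch retrieves the most relevant document with a toy keyword-overlap score and prepends it to the prompt. A production RAG pipeline would use vector embeddings and a managed retriever such as Amazon Bedrock Knowledge Bases, but the principle is the same.
```python
# Minimal RAG sketch (illustrative only): retrieve the most relevant note
# by keyword overlap and prepend it to the prompt. A real system would use
# vector embeddings and an LLM call (e.g., via Amazon Bedrock) instead.

def score(query: str, doc: str) -> int:
    """Count how many query words appear in the document (toy relevance score)."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

knowledge_base = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm, Monday through Friday.",
]

query = "How long do customers have to return an item?"
best_doc = max(knowledge_base, key=lambda d: score(query, d))

# The retrieved context is added to the prompt; the model itself is unchanged.
augmented_prompt = f"Context: {best_doc}\n\nQuestion: {query}\nAnswer:"
print(augmented_prompt)
```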

Fine-Tuning - Although foundation models are highly versatile, they often require fine-tuning to tailor their broad capabilities to specific applications or to enhance their performance in particular domains. Fine-tuning is critical because it helps to do the following:
Increase specificity
Improve accuracy
Reduce biases
Boost efficiency
The different fine-tuning approaches:
Instruction tuning: This approach involves retraining the model on a new dataset that consists of prompts followed by the desired outputs. This is structured in a way that the model learns to follow specific instructions better. This method is particularly useful for improving the model's ability to understand and execute user commands accurately, making it highly effective for interactive applications like virtual assistants and chatbots.
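A common way to structure instruction-tuning data is a JSONL file of prompt/completion pairs. The sketch below writes a couple of made-up examples in that shape; exact field names vary by tool (services such as Amazon Bedrock model customization use similar prompt and completion fields), so treat this as a generic illustration.
```python
# Illustrative instruction-tuning examples in a prompt/completion JSONL
# format (field names assumed; check your fine-tuning tool's expected schema).
import json

examples = [
    {"prompt": "Summarize: The meeting covered Q3 revenue and hiring plans.",
     "completion": "Q3 revenue and hiring plans were discussed."},
    {"prompt": "Translate to French: Good morning.",
     "completion": "Bonjour."},
]

with open("instruction_data.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")
```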
Reinforcement learning from human feedback (RLHF): This approach is a fine-tuning technique where the model is initially trained using supervised learning to predict human-like responses. Then, it is further refined through a reinforcement learning process, where a reward model built from human feedback guides the model toward generating more preferable outputs. This method is effective in aligning the model’s outputs with human values and preferences, thereby increasing its practical utility in sensitive applications.
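The reward model at the heart of RLHF is typically trained on human preference pairs. The sketch below shows one common form of that objective, a Bradley-Terry style preference loss, using invented reward values; a real reward model is a neural network scoring whole responses.
```python
# Sketch of the preference loss used to train a reward model in RLHF.
# The reward numbers are made up for illustration.
import math

def preference_loss(reward_chosen: float, reward_rejected: float) -> float:
    """-log(sigmoid(r_chosen - r_rejected)): small when the chosen response scores higher."""
    return -math.log(1.0 / (1.0 + math.exp(-(reward_chosen - reward_rejected))))

# Human labelers preferred response A over response B for the same prompt.
print(preference_loss(reward_chosen=2.1, reward_rejected=0.4))  # small loss
print(preference_loss(reward_chosen=0.4, reward_rejected=2.1))  # large loss
```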

Adapting models for specific domains: This approach involves fine-tuning the model on a corpus of text or data that is specific to a particular industry or sector. An example of this would be legal documents for a legal AI or medical records for a healthcare AI. This specificity enables the model to perform with a higher degree of relevance and accuracy in domain-specific tasks, providing more useful and context-aware responses.
Transfer learning: This approach is a method where a model developed for one task is reused as the starting point for a model on a second task. For foundational models, this often means taking a model that has been trained on a vast, general dataset, then fine-tuning it on a smaller, specific dataset. This method is highly efficient in using learned features and knowledge from the general training phase and applying them to a narrower scope with less additional training required.
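The sketch below shows the core move of transfer learning using PyTorch (assumed here purely for illustration): freeze the pretrained backbone and train only a small task-specific head, so the general-purpose features are reused with little additional training.
```python
# Minimal transfer-learning sketch: reuse a pretrained "base" and train
# only a new task-specific head.
import torch
import torch.nn as nn

base = nn.Sequential(nn.Linear(128, 64), nn.ReLU())  # stand-in for a pretrained backbone
head = nn.Linear(64, 3)                               # new task: 3-class classifier

for p in base.parameters():          # freeze the general-purpose features
    p.requires_grad = False

optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)  # only the head is updated
x, y = torch.randn(8, 128), torch.randint(0, 3, (8,))
loss = nn.CrossEntropyLoss()(head(base(x)), y)
loss.backward()
optimizer.step()
```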
Continuous pretraining: This approach involves extending the training phase of a pre-trained model by continuously feeding it new and emerging data. This approach is used to keep the model updated with the latest information, vocabulary, trends, or research findings, ensuring its outputs remain relevant and accurate over time.
Key steps in fine-tuning data preparation:
Data curation: Although this continues the data work done for initial training, fine-tuning requires a more rigorous selection process to ensure every piece of data is highly relevant and actually contributes to the model's learning in the specific context (a small validation sketch follows this list).
Labeling: In fine-tuning, the accuracy and relevance of labels are paramount. They guide the model's adjustments to specialize in the target domain.
Governance and compliance: Considering fine-tuning often uses more specialized data, ensuring data governance and compliance with industry-specific regulations is critical.
Representativeness and bias checking: It is essential to ensure that the fine-tuning dataset does not introduce or perpetuate biases that could skew the model's performance in undesirable ways.
Feedback integration: For methods like RLHF, incorporating user or expert feedback directly into the training process is crucial. This is more nuanced and interactive than the initial training phase.
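As a minimal, invented example of the curation and labeling checks above, the sketch below cleans a prompt/completion JSONL file (the same assumed format as in the instruction-tuning sketch) by dropping unlabeled and duplicate records before fine-tuning.
```python
# Toy data-preparation pass: keep only labeled, non-duplicate examples.
import json

def clean(path: str) -> list[dict]:
    seen, kept = set(), []
    with open(path) as f:
        for line in f:
            rec = json.loads(line)
            prompt = rec.get("prompt", "").strip()
            completion = rec.get("completion", "").strip()
            if not prompt or not completion:   # labeling check: skip unlabeled examples
                continue
            if prompt in seen:                 # curation check: skip duplicate prompts
                continue
            seen.add(prompt)
            kept.append(rec)
    return kept

print(len(clean("instruction_data.jsonl")), "examples kept")
```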
Model Evaluation:
When evaluating the performance of language models, especially those that generate or transform text, specific metrics can be used. These metrics are designed to assess the quality of the output against a human-written reference. Three commonly used metrics for this purpose are Recall-Oriented Understudy for Gisting Evaluation (ROUGE), Bilingual Evaluation Understudy (BLEU), and BERTScore.
ROUGE is a set of metrics used to evaluate automatic summarization of texts, in addition to machine translation quality in NLP.
ROUGE is widely used because it is simple, interpretable, and correlates reasonably well with human judgment, especially on the recall aspect of summaries: it assesses how much of the important information in the source text is captured by the generated summary.
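As a hand-rolled illustration of that recall idea (real evaluations normally use a library such as rouge-score), the sketch below computes ROUGE-1 recall: the fraction of the reference's unigrams that appear in the candidate summary.
```python
# ROUGE-1 recall sketch: share of reference unigrams found in the candidate.
from collections import Counter

def rouge1_recall(reference: str, candidate: str) -> float:
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(cnt, cand[tok]) for tok, cnt in ref.items())
    return overlap / max(sum(ref.values()), 1)

reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge1_recall(reference, candidate))  # 5 of 6 reference unigrams matched ≈ 0.83
```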
BLEU is a metric used to evaluate the quality of text that has been machine-translated from one natural language to another. Quality is calculated by comparing the machine-generated text to one or more high-quality human translations.
Unlike ROUGE, which focuses on recall, BLEU is fundamentally a precision metric.
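The sketch below illustrates BLEU's core ingredient, clipped (modified) unigram precision, on two invented sentences; full BLEU also combines higher-order n-grams with a brevity penalty, and in practice a library such as NLTK's sentence_bleu is used.
```python
# Modified unigram precision sketch: share of candidate words backed by the
# reference, with counts clipped so repeated words cannot inflate the score.
from collections import Counter

def unigram_precision(reference: str, candidate: str) -> float:
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    clipped = sum(min(cnt, ref[tok]) for tok, cnt in cand.items())
    return clipped / max(sum(cand.values()), 1)

reference = "the cat is on the mat"
candidate = "the the the cat mat"
print(unigram_precision(reference, candidate))  # clipping stops "the" from inflating the score
```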
BERTScore uses the pretrained contextual embeddings from models like BERT to evaluate the quality of text-generation tasks. BERTScore computes the cosine similarity between the contextual embeddings of words in the candidate and the reference texts. This is unlike traditional metrics that rely on exact matches of N-grams or words.
This makes BERTScore an appropriate evaluation metric when you need to capture the semantic similarity between the model's output and human-generated reference texts, rather than exact word overlap.
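To make the matching idea concrete, the sketch below uses tiny made-up "embedding" vectors (a real run would take contextual embeddings from BERT, for example via the bert-score package): each candidate token is matched to its most similar reference token by cosine similarity, and the similarities are averaged.
```python
# Greedy-matching sketch of the BERTScore idea with invented 3-dimensional
# token "embeddings"; real BERTScore uses contextual embeddings from BERT.
import numpy as np

def cosine(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Placeholder embeddings: one vector per token (values invented for illustration).
reference_emb = [np.array([1.0, 0.2, 0.0]), np.array([0.1, 0.9, 0.3])]
candidate_emb = [np.array([0.9, 0.3, 0.1]), np.array([0.0, 1.0, 0.2])]

precision_like = np.mean([max(cosine(c, r) for r in reference_emb) for c in candidate_emb])
print(round(precision_like, 3))  # close to 1.0 when candidate tokens match reference meaning
```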
Reference: AWS Skill Builder (https://skillbuilder.aws/learn)