AI Study Guide
Describe Artificial Intelligence workloads and considerations
(15–20%)
Identify Features of Common AI Workloads
AI workloads refer to the different types of tasks or problems AI can help solve. The most common AI workloads
include:
• Code generation
• Multimodal capabilities
• Fine-tuning/prompt engineering
• Code completion tools
• Conversational AI systems
• Design tools for creative professionals
Each of these workload types requires different infrastructure considerations, specialized model architectures, and
varied approaches to data processing and optimization.
-Can generate text, images, audio, video, code, and more.
-Image Generation: Creating realistic or artistic images from text descriptions.
Identify Computer Vision Workloads
Computer Vision enables computers to understand and interpret visual information. Some common workloads
include:
Input Data
  Description: Images (static, single frames), videos (sequences of frames), real-time video streams.
  Examples: Still photographs, surveillance footage, medical scans, satellite imagery, output from cameras on robots or drones.
  Underlying Techniques/Models: Convolutional Neural Networks (CNNs), Recurrent Neural Networks (RNNs) for video, Transformers.
  Key Challenges: Handling variations in lighting, pose, occlusion, viewpoint, scale, and image quality.
Core Tasks
  Description: Object detection (locating and classifying objects), image classification (assigning a label to an entire image), image segmentation (pixel-level classification), facial recognition, pose estimation.
  Examples: Autonomous driving (detecting pedestrians, traffic signs), medical diagnosis (identifying tumors), security surveillance (identifying intruders), robotic manipulation (object grasping), augmented reality.
  Underlying Techniques/Models: CNN architectures (e.g., ResNet, YOLO, Faster R-CNN, Mask R-CNN), Vision Transformers (ViT).
  Key Challenges: Real-time processing requirements, need for large labeled datasets, robustness to adversarial attacks.
Output
  Description: Bounding boxes around detected objects, class labels for images or objects, pixel-wise segmentation masks, facial identities, key points representing body or object pose.
  Examples: Coordinates and labels of detected cars, a label indicating "cat," a highlighted region of a tumor in a scan, the name of a recognized person, joint angles of a human body.
  Underlying Techniques/Models: Probability scores, coordinate values, pixel assignments.
  Key Challenges: Ensuring the accuracy and reliability of the output in critical applications.
Key Considerations
  Description: Data augmentation (creating variations of existing data), transfer learning (leveraging pre-trained models), computational resources (GPU acceleration), real-time processing needs.
  Examples: Using techniques like rotation, scaling, and cropping to increase the size and diversity of training data; fine-tuning models trained on large image datasets.
  Underlying Techniques/Models: Optimization for speed and efficiency, model interpretability in some applications.
  Key Challenges: Ethical considerations related to privacy and bias in facial recognition.
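The data augmentation idea mentioned above (creating variations of existing data through rotation and cropping) can be illustrated with a minimal sketch. It assumes a tiny grayscale "image" stored as a list of rows; the function names are hypothetical, and a real pipeline would normally use an imaging library rather than plain lists.

```python
# Minimal data-augmentation sketch on a tiny grayscale "image"
# represented as a list of rows (pure Python, no imaging library).
# All function names here are illustrative, not from any library.

def flip_horizontal(image):
    """Mirror each row left-to-right."""
    return [row[::-1] for row in image]

def rotate_90_clockwise(image):
    """Rotate the image 90 degrees clockwise."""
    return [list(col) for col in zip(*image[::-1])]

def crop(image, top, left, height, width):
    """Cut out a height x width window starting at (top, left)."""
    return [row[left:left + width] for row in image[top:top + height]]

def augment(image):
    """Return the original plus a few simple variations."""
    return [
        image,
        flip_horizontal(image),
        rotate_90_clockwise(image),
        crop(image, 0, 0, 2, 2),
    ]

image = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
variants = augment(image)
print(len(variants), "training examples from 1 original image")
```

Each variant keeps the same label as the original image, which is how augmentation increases the size and diversity of the training set without new labeling work.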
Identify Natural Language Processing (NLP) Workloads
NLP enables machines to understand, interpret, and generate human language.
• Advanced information retrieval
Identify Document Processing Workloads
Document processing uses AI to extract data from structured or unstructured documents.
Data Extraction: Pulling relevant data from structured documents. Example: Extracting invoice totals.
Table Extraction: Extracting and structuring tabular data from documents. Example: Pulling rows of data from contracts.
Validation & Compliance: Ensuring documents meet specified criteria. Example: Checking regulatory compliance in filings.
Input Data
  Description: Documents in various formats: PDFs, scanned documents (images), Word documents, spreadsheets. Often involves a combination of text and visual elements (tables, forms).
  Examples: Invoices, contracts, legal documents, financial reports, medical records, application forms.
  Underlying Techniques/Models: Optical Character Recognition (OCR), Computer Vision for layout analysis, Natural Language Processing for text understanding.
  Key Challenges: Handling variations in document layout, image quality in scanned documents, and the presence of tables and forms.
Core Tasks
  Description: Information extraction (identifying and extracting specific data fields), document classification (categorizing documents), data entry automation, table extraction, form understanding, compliance checking.
  Examples: Extracting amounts and dates from invoices, classifying documents as "invoice" or "receipt," automatically filling fields in a database from a form, extracting data from tables in a report, identifying required signatures in a legal document.
  Underlying Techniques/Models: CNNs for visual feature extraction, RNNs and Transformers for sequence processing, specialized models for table and form understanding.
  Key Challenges: Integrating OCR with NLP effectively, maintaining accuracy despite document variations, handling complex document structures.
Output
  Description: Structured data (key-value pairs, tables), document categories, automated data entries, extracted tables, filled forms, flags indicating compliance or non-compliance.
  Examples: A JSON object containing "Invoice Number": "INV-123", "Total Amount": "$100.00"; a label "Legal Contract"; data inserted into database fields; a structured representation of a table; a completed digital form.
  Underlying Techniques/Models: Structured data formats (JSON, CSV), database entries, annotations.
  Key Challenges: Ensuring data accuracy and completeness, dealing with noise and errors introduced by OCR.
Key Considerations
  Description: OCR accuracy, layout analysis to understand document structure, handling both textual and visual information, rule-based systems combined with machine learning models.
  Examples: Using Tesseract or Google Cloud Vision API for OCR, employing computer vision models to identify tables and sections, using regular expressions and NLP models for information extraction.
  Underlying Techniques/Models: Robustness to different document formats and quality, maintaining data privacy and security.
  Key Challenges: Automating complex workflows involving multiple document types.
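As a rough illustration of the rule-based part of information extraction, the sketch below applies regular expressions to text that is assumed to have already come out of an OCR step and emits a JSON object like the one in the Output example. The sample text, the `Date` field, and the patterns are hypothetical; real documents usually need far more robust rules or a trained model.

```python
import json
import re

# Hypothetical OCR output for a single invoice (illustrative text only).
ocr_text = """
ACME Corp
Invoice Number: INV-123
Date: 2024-01-15
Total Amount: $100.00
"""

# Hypothetical field patterns; these are assumptions for the sketch.
FIELD_PATTERNS = {
    "Invoice Number": r"Invoice Number:\s*(\S+)",
    "Date": r"Date:\s*(\d{4}-\d{2}-\d{2})",
    "Total Amount": r"Total Amount:\s*(\$[\d,]+\.\d{2})",
}

def extract_fields(text):
    """Return a dict of field name -> extracted value (None if not found)."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = re.search(pattern, text)
        fields[name] = match.group(1) if match else None
    return fields

fields = extract_fields(ocr_text)
print(json.dumps(fields, indent=2))
```

Fields that come back as None are a natural place to flag a document for human review, which ties in with the accuracy and completeness challenges listed above.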
Identify Features of Generative AI Workloads
Generative AI refers to AI that can create new content (text, code, images, audio, video).
Input Data
  Description: Large datasets of the type of content to be generated (text, images, audio, etc.). Can also take prompts or conditions to guide the generation process.
  Examples: A large corpus of text for language models, a dataset of images for image generation, audio recordings for music generation, source code for code generation. Text prompts like "a cat wearing a hat."
  Underlying Techniques/Models: Generative Adversarial Networks (GANs), Variational Autoencoders (VAEs), Transformer models (for text, image, and audio generation), Diffusion Models.
  Key Challenges: Ensuring the quality, coherence, and relevance of the generated content.
Core Tasks
  Description: Text generation, image generation, audio synthesis, video generation, code generation, synthetic data generation, 3D model generation.
  Examples: Writing articles, creating realistic images from descriptions, generating music in a specific style, creating short video clips, generating Python functions, creating artificial data for training other AI models, generating 3D models of objects.
  Underlying Techniques/Models: Generator and Discriminator networks (in GANs), encoder and decoder networks (in VAEs), attention mechanisms (in Transformers).
  Key Challenges: Controlling the generation process to produce desired outputs, avoiding mode collapse (in GANs), ensuring diversity in generated content.
Output
  Description: New, original content that resembles the training data: text passages, images, audio samples, video clips, code snippets, synthetic datasets, 3D models.
  Examples: A generated news article, a photorealistic image of a landscape, a newly composed musical piece, a short animation, a Python function for sorting a list, a table of artificially generated customer data, a 3D model of a chair.
  Underlying Techniques/Models: Generated sequences, pixel arrays, audio waveforms, code strings, synthetic data points, 3D mesh data.
  Key Challenges: Evaluating the "quality" and originality of generated content, addressing ethical concerns related to misuse (e.g., deepfakes).
Key Considerations
  Description: Model architecture selection (GAN, VAE, Transformer, Diffusion), training data quality and quantity, loss functions that encourage realistic and diverse generation, techniques for controlling the generation process (e.g., conditioning).
  Examples: Using large language models with specific prompting strategies, training GANs with careful hyperparameter tuning, using diffusion models for high-quality image generation.
  Underlying Techniques/Models: Computational resources required for training and inference, the risk of generating biased or harmful content.
  Key Challenges: Interpretability of generative models and understanding how they create new content.
Identify guiding principles for responsible AI
Principle Description / Considerations
Fairness: AI systems should treat all people fairly. Consider avoiding biases in data and algorithms that can lead to discrimination based on gender, race, age, etc.
Reliability & Safety: AI should perform reliably and safely in all expected conditions. Include testing, validation, and fail-safes to handle unexpected behavior or inputs.
Privacy & Security: AI systems must ensure data privacy and be secure against unauthorized access or misuse. Implement data protection policies, encryption, and compliance with regulations.
Inclusiveness: AI systems should be designed to empower everyone and be usable by people with diverse backgrounds and abilities (e.g., accessibility features).
Transparency: Users should understand how and why an AI system makes decisions. Use explainable AI methods and provide documentation or model interpretability.
Accountability: Organizations and developers must be accountable for the AI systems they build. Assign clear responsibility and ensure mechanisms for auditing and redress.
• Testing with diverse user groups
• Avoiding exclusionary design patterns
Transparency: AI systems should be understandable and explainable in appropriate context
• Explainable AI methods
• Clear documentation of model capabilities and limitations
• Understandable user interfaces
• Disclosure of AI involvement in interactions
• Accessible explanations of decision processes
Accountability: Organizations should be accountable for their AI systems and their impacts
• Clear governance structures
• Human oversight of critical decisions
• Audit trails for significant decisions
• Mechanisms for redress when systems cause harm
• Regular ethical impact assessments
Describe fundamental principles of machine learning on Azure
(15-20%)
Identify common machine learning techniques
Describe how training and validation datasets are used in machine learning
Describe data and compute services for data science and machine learning
Identify common machine learning techniques
Category Description Examples
Supervised Learning
  Description: Learns from labeled data (input-output pairs). Objective: map input to known outputs. Two types: regression and classification.
  Examples: Regression: house price prediction. Classification: spam detection, image recognition.
Unsupervised Learning
  Description: Learns from unlabeled data. Objective: find hidden patterns/structures. Two types: clustering and dimensionality reduction.
  Examples: Clustering: customer segmentation. Dimensionality reduction: PCA for visualization.
Reinforcement Learning
  Description: Learns by interacting with an environment. Objective: maximize reward through actions. Uses agents, states, actions, rewards.
  Examples: Game AI (e.g., AlphaGo), robotics control, autonomous driving.
Regression ML Scenarios
  Description: Predict continuous numerical values.
  Examples: Predicting temperature, forecasting stock prices.
Classification ML Scenarios
  Description: Predict discrete categories/labels.
  Examples: Fraud detection, sentiment analysis.
Clustering ML Scenarios
  Description: Group similar data points without labels.
  Examples: Customer grouping, market segmentation.
Deep Learning Features
  Description: Deep neural networks (CNNs, RNNs, etc.). Handles unstructured data (images, text, audio). Requires large datasets and high computational power.
  Examples: Image classification, speech recognition, natural language processing.
Transformer Architecture Features
  Description: Self-attention mechanism. Processes entire input sequences in parallel. Used for NLP tasks.
  Examples: BERT, GPT, translation systems.
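To make the clustering scenario concrete (grouping similar data points without labels), here is a minimal one-dimensional k-means sketch. The customer spend values and the starting centroids are made up for illustration, and production work would normally rely on a library implementation rather than this hand-rolled loop.

```python
def k_means_1d(values, centroids, iterations=10):
    """Cluster 1-D values around the given starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

# Hypothetical monthly spend per customer (unlabeled data).
spend = [12, 15, 14, 80, 85, 90]
centroids, clusters = k_means_1d(spend, centroids=[0, 100])
print(clusters)
```

No labels are supplied at any point: the two customer groups emerge only from how close the values are to each other, which is the defining trait of unsupervised learning.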
• Azure Automated ML for classification
Clustering: Groups similar data points together without prior labels
Scenarios:
• Customer segmentation
• Anomaly detection
• Document categorization
• Network traffic analysis
• Image segmentation
Azure services:
• Azure Machine Learning clustering algorithms
• Azure Databricks K-means clustering
• Azure Synapse Analytics clustering
• Azure Cognitive Search semantic clustering
Deep Learning Techniques
transactions. building fraud detection classification models.
Clustering
• Grouping customers based on their purchasing behavior.
  Azure: Azure Synapse Analytics and Azure Machine Learning can be used to perform customer segmentation using clustering algorithms.
• Segmenting images into different regions or objects.
  Azure: While not strictly clustering in the traditional sense, Azure Computer Vision can identify and segment objects in images. Azure Machine Learning supports clustering on image features.
• Identifying patterns in network traffic for anomaly detection.
  Azure: Azure Network Watcher and Azure Sentinel can leverage machine learning, including clustering, for security analysis.
• Grouping similar documents together for topic discovery.
  Azure: Azure Cognitive Search can perform semantic search and clustering of documents. Azure Machine Learning can also be used for custom topic modeling.
Deep Learning Features
• Automatic feature extraction from raw data (e.g., images, text, audio).
  Azure: Azure Machine Learning supports deep learning frameworks such as TensorFlow and PyTorch, as well as the ONNX model format, enabling automatic feature learning.
• Ability to learn complex, hierarchical representations of data.
  Azure: Azure GPUs and optimized compute instances in Azure Machine Learning accelerate the training of deep neural networks.
• Scalability to handle large datasets and complex models.
  Azure: Azure Machine Learning provides distributed training capabilities across multiple GPUs and nodes.
• End-to-end learning, directly mapping inputs to outputs.
  Azure: Azure Machine Learning simplifies the deployment of trained deep learning models for inference.
Transformer Architecture Features
• Self-attention mechanism, allowing the model to weigh the importance of different parts of the input sequence.
  Azure: Azure Machine Learning supports Transformer-based models for natural language processing tasks.
• Parallel processing of the input sequence.
  Azure: Azure's infrastructure enables efficient training and inference of Transformer models.
• Positional encoding to understand the order of elements in the sequence.
  Azure: Azure Cognitive Services for Language often utilizes Transformer architectures under the hood.
• Encoder-decoder structure for sequence-to-sequence tasks.
  Azure: Azure Machine Learning facilitates the fine-tuning and deployment of Transformer models for tasks like translation and text generation.
• Contextual understanding of input data.
  Azure: Azure OpenAI Service provides access to powerful Transformer-based language models.
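The self-attention idea (each position weighing the importance of every other position in the sequence) can be sketched in a few lines. This is a simplified illustration, not how any Azure service implements it: it assumes queries, keys, and values are the raw input vectors, whereas real Transformers first apply learned projection matrices. The token embeddings are made up.

```python
import math

def softmax(scores):
    """Turn raw scores into weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Scaled dot-product self-attention with queries = keys = values.

    Simplification: learned query/key/value projections are omitted.
    """
    dim = len(vectors[0])
    outputs, all_weights = [], []
    for query in vectors:
        # Score the query against every position in the sequence.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Output is the weighted average of all value vectors.
        output = [
            sum(w * value[d] for w, value in zip(weights, vectors))
            for d in range(dim)
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights

# Three hypothetical 2-dimensional token embeddings.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(tokens)
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of weights shows how much one token attends to every token in the sequence. Because every row is computed independently of the others, the whole sequence can be processed in parallel.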
Describe core machine learning concepts
Identify features and labels in a dataset for machine learning
In machine learning, a dataset is usually structured like a table with rows and columns. Each row is a data point (or
sample), and each column is a variable (or attribute).
Features are the input variables — the data we use to make predictions.
The label (also called the target) is the output we want to predict.
Example:
Age   Salary   Owns House   Will Buy Product
25    40,000   No           No
45    85,000   Yes          Yes
35    60,000   No           Yes
Features: Age, Salary, Owns House
Label: Will Buy Product (this is what we want the model to predict)
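In code, the same split between features and label might look like the sketch below. The variable names `X` and `y` follow a common convention, and the Yes/No to 1/0 encoding is an assumption made for illustration.

```python
# Rows from the example table above.
rows = [
    {"Age": 25, "Salary": 40000, "Owns House": "No", "Will Buy Product": "No"},
    {"Age": 45, "Salary": 85000, "Owns House": "Yes", "Will Buy Product": "Yes"},
    {"Age": 35, "Salary": 60000, "Owns House": "No", "Will Buy Product": "Yes"},
]

FEATURE_COLUMNS = ["Age", "Salary", "Owns House"]
LABEL_COLUMN = "Will Buy Product"

def encode(value):
    """Encode Yes/No as 1/0; leave numbers unchanged."""
    if value == "Yes":
        return 1
    if value == "No":
        return 0
    return value

# X holds the features (inputs); y holds the label (what we want to predict).
X = [[encode(row[col]) for col in FEATURE_COLUMNS] for row in rows]
y = [encode(row[LABEL_COLUMN]) for row in rows]

print(X)
print(y)
```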
Describe how training and validation datasets are used in machine learning
When training a machine learning model, the dataset is usually split into two (or sometimes three) parts:
✅ Training Dataset
This is the part of the data used to "teach" the model.
The model learns patterns and relationships between features and labels.
✅ Validation Dataset
Used to evaluate the model’s performance during training.
Helps to tune the model and avoid overfitting (when a model performs well on training data but poorly on
new data).
It's not shown to the model during training, only used to check how well it's generalizing.
(Optional) Test Dataset
Used after training and validation to give a final unbiased evaluation of the model.
Training Dataset: This is used to train the machine learning model, teaching it how to identify patterns and
relationships between features and labels. It’s essentially where the model learns.
Validation Dataset: This is used during the training phase to check how well the model is performing. It
helps in fine-tuning parameters and avoiding overfitting (when the model performs well on training data
but poorly on new data).
In practice, we also use a test dataset, which is separate from the training and validation datasets, to evaluate the
model's performance on unseen data.
Visualization Idea
Here’s a simple conceptual diagram to illustrate how data is split and used in machine learning:
Dataset Split:
1. Training Dataset → The model learns patterns.
2. Validation Dataset → The model is checked and refined.
3. Test Dataset → The model is evaluated on unseen data.
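The three-way split can be sketched as follows. The 70/15/15 proportions and the fixed random seed are assumptions chosen for illustration; the guide does not prescribe specific ratios.

```python
import random

def split_dataset(rows, train_frac=0.7, val_frac=0.15, seed=42):
    """Shuffle rows and split them into training, validation, and test sets."""
    shuffled = list(rows)
    # Shuffle first so each subset is a random sample of the data.
    random.Random(seed).shuffle(shuffled)
    train_end = round(len(shuffled) * train_frac)
    val_end = train_end + round(len(shuffled) * val_frac)
    return shuffled[:train_end], shuffled[train_end:val_end], shuffled[val_end:]

data = list(range(100))  # stand-in for 100 data points
train, validation, test = split_dataset(data)
print(len(train), len(validation), len(test))
```

Every data point lands in exactly one subset, so the validation and test sets contain only examples the model never saw during training.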
Here's a concise table that captures the key information about features, labels, and datasets in machine learning:
Describe Azure Machine Learning capabilities
Feature Description
Algorithm Selection Automatically tests and selects the best algorithm for the task.
Feature Engineering Creates and transforms input features to boost model performance.
Hyperparameter Tuning Optimizes algorithm settings (like learning rate, depth, etc.).
Model Evaluation Compares models using metrics (accuracy, precision, recall, etc.).
Task Types Supports classification, regression, and time-series forecasting.
Ease of Use No need for deep data science knowledge.
Efficiency Faster and more efficient than manual model training.
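The evaluation metrics named above (accuracy, precision, recall) can be computed by hand for a binary classifier, as in this minimal sketch. The labels and predictions are invented, and this is not the automated ML implementation, only an illustration of what the metrics measure.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        # Share of all predictions that were right.
        "accuracy": correct / len(pairs),
        # Of everything predicted positive, how much really was positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of everything really positive, how much the model found.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }

# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```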
Capability Description
Algorithm selection Automatically tests multiple algorithms to find the best for your
data
Feature engineering Creates and selects the most relevant features from raw data
Hyperparameter tuning Optimizes model parameters to improve performance
Cross-validation Ensures models generalize well to unseen data
Model evaluation Compares metrics across different models
Model explanation Provides interpretability features to understand model decisions
Time series forecasting Specialized capabilities for time-dependent data
NLP tasks Support for text classification and other natural language
processing tasks
Capability/Service Description
Automated Data Preprocessing & Feature Engineering: Automatically handles missing values, encodes categorical features, scales numerical features, performs text featurization, and selects/engineers relevant features.
Algorithm Selection: Automatically tries a range of suitable machine learning algorithms for the given task.
Hyperparameter Tuning: Automatically searches for optimal hyperparameter settings for selected algorithms.
Model Training & Evaluation: Trains multiple models in parallel and automatically evaluates their performance using appropriate metrics.
Best Model Selection: Identifies and recommends the best-performing model based on the chosen metric.
Explainability: Provides insights into feature importance and model interpretability.
Integration with Azure ML: Seamlessly integrates with Azure Machine Learning workspace and other Azure services.
Support for Various ML Tasks: Supports classification, regression, time series forecasting, computer vision, and natural language processing tasks.
Scalability & Efficiency: Leverages Azure's compute infrastructure for parallel processing, reducing training time.
MLOps Integration: Facilitates the operationalization of AutoML models through registration, deployment, and monitoring options.
Data and Compute Services for Data Science and Machine Learning
Azure provides robust services for managing data and compute resources:
1. Data Services:
o Azure Blob Storage: Stores large-scale, unstructured data for training models.
o Azure Data Lake: Handles big data analytics and provides a scalable, secure platform.
o Azure SQL Database: Manages structured data for ML training and predictions.
2. Compute Services:
o Azure Machine Learning Compute Clusters: Scalable clusters for distributed training and inference
tasks.
o Azure Kubernetes Service (AKS): Manages containerized workloads for training and deploying ML
models.
o Azure Virtual Machines: Offers flexible computing environments tailored to your workload.
Service Type Azure Service Description
Data Storage
  Azure Blob Storage: Stores large datasets like CSVs, images, etc.
  Azure Data Lake: Optimized for analytics on big data.
  Azure SQL Database: Stores structured, queryable data.
Compute
  Azure ML Compute Instance: Pre-configured VM for development and testing.
  Azure ML Compute Cluster: Auto-scalable compute power for training.
  Azure Kubernetes Service (AKS): Used for scalable deployment and hosting models.
  Inference Clusters: Specialized for running deployed models.
Azure Virtual Machines (VMs): Customizable infrastructure as a service, including GPU-enabled options.
Azure Databricks Compute: Managed Spark clusters with CPU and GPU options for data processing and ML.
Azure Container Instances (ACI): Fast and simple way to run Docker containers without managing infrastructure.
Azure Kubernetes Service (AKS): Fully managed Kubernetes service for deploying and scaling containerized applications.
Azure Functions: Serverless compute service for on-demand code execution.
Azure Data Science Virtual Machines (DSVM): Pre-configured VMs with popular data science and ML tools.
Model Management and Deployment in Azure Machine Learning
Azure Machine Learning provides comprehensive tools for managing and deploying models:
Model Management:
o Version Control: Tracks different versions of models for comparison and auditing.
o Registry: Stores and organizes models in a centralized repository.
o Monitoring: Observes model performance and detects drifts over time.
Deployment:
o Endpoint Creation: Deploys models as REST APIs for integration with applications.
o Scalability: Ensures models can handle varying loads through scaling mechanisms.
o Deployment Options: Offers real-time (online) and batch (offline) inference options.
o Azure Kubernetes Service: Supports deployment on containers for efficient scaling and
orchestration.
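The monitoring bullet above mentions detecting drift over time. One very simple way to think about data drift is to compare the values a deployed model sees against the values it was trained on. The sketch below uses a hypothetical heuristic (shift of the mean, measured in training standard deviations, with an arbitrary threshold of 2.0); it is not the method Azure Machine Learning uses, only an illustration of the concept.

```python
import statistics

def mean_shift_score(training_values, recent_values):
    """Shift of the recent mean, measured in training standard deviations."""
    train_mean = statistics.mean(training_values)
    train_std = statistics.stdev(training_values)
    return abs(statistics.mean(recent_values) - train_mean) / train_std

def has_drifted(training_values, recent_values, threshold=2.0):
    """Flag drift when the mean moves more than `threshold` deviations.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return mean_shift_score(training_values, recent_values) > threshold

# Hypothetical "Age" feature values at training time and in production.
training_ages = [25, 30, 35, 40, 45]
recent_similar = [28, 33, 36, 41]
recent_shifted = [60, 65, 70, 75]

print(has_drifted(training_ages, recent_similar))
print(has_drifted(training_ages, recent_shifted))
```

When a check like this fires, the usual follow-up is to investigate the input data and consider retraining the model on more recent examples.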
Capability Description
Model Registry Central store to manage and version models.
Versioning Tracks different versions of a model.
Metadata Tracking Records training data, parameters, and metrics.
Real-time Deployment REST API deployment using AKS or managed endpoints.
Batch Deployment Run predictions on large data sets on a schedule.
MLOps Integration Enables CI/CD pipelines for training and deployment.
Monitoring Tracks performance and detects data drift.
Model Versioning Automatic tracking of different versions of a registered model.
Metadata Tracking Ability to add tags and properties to models for organization and information.
Lineage Tracking Tracks the origin of models, including experiments and data.
Model Profiling (Preview) Provides insights into the input data expected by a model.
Model Deployment in Azure ML
Managed Online Endpoints: Fully managed real-time inference on compute that Azure Machine Learning provisions and scales.
Kubernetes Online Endpoints (AKS): Real-time inference for high-scale, production deployments on Azure Kubernetes Service.
Azure Container Instances (ACI): Real-time inference for testing and low-scale deployments (a legacy deployment target in SDK/CLI v1).
Managed Batch Endpoints: Enables batch inference on large volumes of data.
Deployment to Azure Compute (VMs, Functions, App Service, IoT Edge): Flexible options to deploy models to various Azure compute resources.
Packaging and Containerization: Simplifies the creation of Docker containers for consistent deployments.
Environment Management: Allows defining and managing software dependencies for model execution.
Integration with MLOps Pipelines: Enables automated deployment through Azure Machine Learning Pipelines and Azure DevOps.
Model Monitoring: Tools to monitor model performance, detect data drift, and identify issues in deployed models.
Traffic Management & Blue/Green Deployments: Allows controlled rollout of new model versions for online endpoints.
Feature Description
Content Creation: Generates text, images, audio, video, or code.
Pretrained & Fine-tunable: Trained on large datasets; can be customized for specific tasks.
Prompt-driven: Uses natural language input (prompts) to generate content.
Multimodal Capabilities: Can work across different data types like text-to-image.
Self-learning/Autoregressive: Predicts next element in a sequence (e.g., next word or pixel).
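The autoregressive behaviour described above (predicting the next element in a sequence) can be shown with a toy bigram model that counts which word follows which and then generates text one word at a time. This is a deliberately tiny sketch with a made-up corpus: real generative models use neural networks and usually sample from a probability distribution instead of always picking the most frequent follower.

```python
from collections import Counter, defaultdict

def train_bigram_model(text):
    """Count, for each word, which words follow it and how often."""
    words = text.split()
    counts = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts

def generate(model, start, length):
    """Generate up to `length` words, each predicted from the previous one."""
    words = [start]
    for _ in range(length - 1):
        followers = model.get(words[-1])
        if not followers:
            break  # no known continuation for this word
        # Greedy choice: take the most frequent next word.
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)

corpus = "the cat sat on the mat and the cat slept"
model = train_bigram_model(corpus)
print(generate(model, "on", 3))
```

Each generated word becomes the input for predicting the next one, which is the core loop behind autoregressive text generation.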
Feature Description
Content Generation: Generates diverse outputs such as text, images, music, or videos based on input prompts.
Context Understanding: Analyzes input context to produce coherent and relevant responses.
Fine-Tuning Capability: Customizes models for specific use cases or domains through additional training.
Creativity & Variability: Produces unique and varied outputs, often blending imagination with real-world data.
Multi-Modal Integration: Combines data types (e.g., text and images) for richer output capabilities.
Few-Shot & Zero-Shot Learning: Responds effectively with limited examples or none at all during input prompts.
Feature/Scenario/Consideration Description
Learning Data Distributions: Models learn the underlying probability distribution of the training data.
Generating Novel Data: Creates new, synthetic data instances that resemble the training data but were not explicitly present.
Sampling from Latent Spaces: Many models learn a lower-dimensional representation and generate data by sampling from it.
Diverse Output Capabilities: Can generate various data types, including images, text, audio, video, 3D models, and synthetic data.
Conditioning on Input: Can generate outputs based on specific prompts or conditions.
Scalability and Complexity: Modern models can be very large with billions of parameters, enabling high realism but requiring significant compute.
Continuous Improvement: Performance can improve with more and better training data.
Versatile Architectures: Utilizes various neural network architectures like GANs, VAEs, Transformers, Diffusion Models, and Autoregressive Models.
Common Scenarios for Generative AI
These models are widely used across industries for a variety of purposes:
Examples:
1. Text Generation – Chatbots, content writing, summarization (e.g., Copilot, ChatGPT).
2. Image Generation – Design, marketing, art creation (e.g., DALL·E, Midjourney).
3. Code Generation – Assisting developers with code suggestions or generation (e.g., GitHub Copilot).
4. Translation and Language Tasks – Translating documents, answering questions, generating language-
specific content.
5. Personalization – Recommender systems, custom marketing messages.
6. Data Augmentation – Creating synthetic data for model training or testing.
Scenario Application
Content Creation: Writing articles, designing graphics, composing music, or generating animations.
Customer Support: Answering queries, creating automated chat responses, or resolving issues via bots.
Product Design: Assisting in prototyping and brainstorming creative solutions for new products.
Language Translation: Translating text across multiple languages while maintaining contextual accuracy.
Education & Training: Developing interactive learning materials, summarizing topics, or providing explanations.
Healthcare: Supporting diagnostics, generating medical reports, or simplifying patient communication.
Content Creation (Text, Image, Audio, Video): Writing articles, generating artwork, composing music, creating animated clips.
Data Augmentation and Synthesis: Creating artificial datasets for training other AI models or for privacy-preserving data sharing.
Drug Discovery and Material Science: Generating novel molecular structures and designing new materials.
Product Design and Prototyping: Generating new product designs and creating 3D models.
Fashion and Entertainment: Generating new fashion designs and creating virtual environments/characters.
Personalization: Generating personalized content recommendations and customized user experiences.
Education and Research: Generating explanations and examples, assisting in scientific discovery.
Code Generation: Assisting developers by generating code snippets or entire functions.
Responsible AI Considerations for Generative AI
Consideration Description
Fairness: Ensuring unbiased outputs across diverse users and contexts.
Privacy: Safeguarding user data and maintaining confidentiality during input and generation.
Reliability & Safety: Guaranteeing outputs are accurate, safe, and free from harmful content.
Transparency: Providing clear explanations of model behavior and limitations.
Accountability: Defining responsibility for misuse or harmful outcomes from generated outputs.
Inclusiveness: Designing generative AI to benefit users with varying needs, preferences, and accessibility.
Bias and Fairness: Generative models can inherit and amplify biases, leading to unfair outputs. Requires diverse data and bias mitigation techniques.
Transparency and Explainability: Understanding the generation process is challenging; efforts are needed to improve interpretability.
Accountability: Determining responsibility for generated content is complex; clear guidelines are needed.
Privacy and Security: Risk of leaking sensitive data from training sets and potential for malicious use (e.g., deepfakes). Requires privacy-preserving techniques and security measures.
Safety and Robustness: Ensuring models do not generate harmful, unsafe, or unreliable content. Requires safeguards and robustness against attacks.
Human Oversight and Control: Maintaining human involvement in development and deployment to align with values and prevent unintended consequences.
Societal Impact: Considering broader societal implications, including impact on jobs and information spread.
Intellectual Property: Complex issues surrounding the ownership and rights of AI-generated content.
Misinformation and Deepfakes: Potential for generating realistic fake content for malicious purposes; requires detection and mitigation strategies.
Identify generative AI services and capabilities in Microsoft Azure
Analogy
• The AI factory where you build and run your AI products.
• The source of the powerful AI "engines" you can use.
• The marketplace or catalog where you shop for AI "engines."
Describe features of computer vision workloads on Azure (15–20%)