Class XII AI Worksheet Booklet Part 2 (2023-2024)
BANGALORE EAST
WORKSHEETS
2023 – 2024
ARTIFICIAL INTELLIGENCE
NAME :
CLASS: XII SECTION :
CONTENTS
S. NO.    TOPIC    PAGE NO.    TEACHER'S SIGN
1. UNIT 1: CAPSTONE PROJECT 1
A capstone project is a culminating assignment on which students usually work during their
final year in school or at the end of the academic program. It requires different intellectual activities.
This project helps young people learn how to find and analyze information and how to work with it
efficiently.
An AI project follows these six steps:
1) Problem definition i.e. Understanding the problem
2) Data gathering
3) Feature definition
4) AI model construction
5) Evaluation & refinements
6) Deployment
2. Design Thinking:
Empathize: Here, you should gain an empathetic understanding of the problem you're trying to solve,
typically through user research. Empathy is crucial to a human-centered design process such as
design thinking because it allows you to set aside your own assumptions about the world and gain
real insight into users and their needs.
Define: It's time to accumulate the information gathered during the Empathize stage. You then analyze
your observations and synthesize them to define the core problems you and your team have
identified. These definitions are called problem statements. You can create personas to help keep
your efforts human-centered before proceeding to ideation.
Ideate: Now, you're ready to generate ideas. The solid background of knowledge from the first two
phases means you can start to "think outside the box", look for alternative ways to view the
problem and identify innovative solutions to the problem statement you've created.
Brainstorming is particularly useful here.
Prototype: This is an experimental phase. The aim is to identify the best possible solution for each problem
found. Your team should produce some inexpensive, scaled-down versions of the product (or
specific features found within the product) to investigate the ideas you've generated. This could
involve simply paper prototyping.
Test: Evaluators rigorously test the prototypes. Although this is the final phase, design thinking is
iterative: teams often use the results to redefine one or more further problems. So, you can return
to previous stages to make further iterations, alterations and refinements – to find or rule out
alternative solutions.
Overall, you should understand that these stages are different modes which contribute to the entire
design project, rather than sequential steps. Your goal throughout is to gain the deepest
understanding of the users and what their ideal solution/product would be.
3. Analytic approach:
Those who work in the domain of AI and Machine Learning solve problems and answer questions
through data every day. They build models to predict outcomes or discover underlying patterns, all to
gain insights leading to actions that will improve future outcomes.
• If the question is to determine the probability of an action, then a predictive model might be
used.
• Statistical analysis applies to problems that require counts.
• If the question requires a yes/no answer, then a classification approach to predicting a
response would be suitable.
4. Data requirement:
In this phase the data requirements are revised and decisions are made as to whether the
collection requires more or less data. Once the data ingredients are collected, the data scientist will
have a good understanding of what they will be working with.
Techniques such as descriptive statistics and visualization can be applied to the data set to
assess the content, quality, and initial insights about the data. Gaps in the data will be identified and
plans to either fill them or make substitutions will have to be made.
5. Modeling Approach:
Data Modelling focuses on developing models that are either descriptive or predictive.
• An example of a descriptive model might examine things like: if a person did this, then
they're likely to prefer that.
• A predictive model tries to yield yes/no, or stop/go type outcomes. These models are based on
the analytic approach that was taken, either statistically driven or machine learning driven.
The data scientist will use a training set for predictive modelling. A training set is a set of
historical data in which the outcomes are already known. The training set acts like a gauge to
determine if the model needs to be calibrated. In this stage, the data scientist will play
around with different algorithms to ensure that the variables in play are actually required.
Constant refinement, adjustments and tweaking are necessary within each step to ensure the
outcome is one that is solid. The framework is geared to do 3 things:
• First, understand the question at hand.
• Second, select an analytic approach or method to solve the problem.
• Third, obtain, understand, prepare, and model the data.
The train-test split is a technique for evaluating the performance of a machine learning algorithm.
It can be used for classification or regression problems and can be used for any supervised
learning algorithm.
The procedure involves taking a dataset and dividing it into two subsets. The first subset is used to
fit the model and is referred to as the training dataset. The second subset is not used to train the
model; instead, the input element of the dataset is provided to the model, then predictions are
made and compared to the expected values. This second dataset is referred to as the test dataset.
• Train Dataset: Used to fit the machine learning model.
• Test Dataset: Used to evaluate the fit machine learning model.
The objective is to estimate the performance of the machine learning model on new data: data not
used to train the model.
How to Configure the Train-Test Split:
The procedure has one main configuration parameter, which is the size of the train and test sets.
This is most commonly expressed as a fraction between 0 and 1 for either the train or the test
dataset. For example, a training set with a size of 0.67 (67 percent) means that the remaining
0.33 (33 percent) is assigned to the test set. There is no optimal split percentage.
You must choose a split percentage that meets your project’s objectives with considerations that
include:
• Computational cost in training the model.
• Computational cost in evaluating the model.
• Training set representativeness.
• Test set representativeness.
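As an illustration of this procedure, the following Python sketch uses scikit-learn's train_test_split with a 67/33 split. The synthetic dataset and the logistic regression model are assumptions chosen only for demonstration.

# A minimal sketch of the train-test split procedure (synthetic data for illustration).
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

# Create a small synthetic classification dataset.
X, y = make_classification(n_samples=1000, n_features=10, random_state=1)

# Split: 67 percent for training, 33 percent for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=1)

# Fit the model on the training dataset only.
model = LogisticRegression()
model.fit(X_train, y_train)

# Evaluate on the held-out test dataset.
predictions = model.predict(X_test)
print("Test accuracy:", accuracy_score(y_test, predictions))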
Machine learning is an iterative process. You will face choices about predictive variables to use, what
types of models to use, what arguments to supply those models, etc. We make these choices in a data-
driven way by measuring model quality of various alternatives.
Performance metrics like classification accuracy and root mean squared error can give you a clear
objective idea of how good a set of predictions is, and in turn how good the model is that generated
them.
This is important as it allows you to tell the difference and select among:
• Different transforms of the data used to train the same machine learning model.
• Different machine learning models trained on the same data.
• Different configurations for a machine learning model trained on the same data.
All the algorithms in machine learning rely on minimizing or maximizing a function, which we call
the "objective function". The group of functions that are minimized are called "loss functions". A loss
function is a measure of how good a prediction model is at predicting the expected outcome. The
most commonly used method of finding the minimum point of a function is "gradient descent".
Think of the loss function as an undulating mountain and gradient descent as sliding down the
mountain to reach its lowest point.
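The idea of gradient descent can be illustrated with a minimal sketch. The quadratic loss, starting point and learning rate below are assumptions chosen only for demonstration.

# A minimal, illustrative sketch of gradient descent on a simple quadratic loss.

def loss(w):
    # A simple convex "mountain": minimum at w = 3.
    return (w - 3) ** 2

def gradient(w):
    # Derivative of (w - 3)^2 with respect to w.
    return 2 * (w - 3)

w = 0.0             # starting point
learning_rate = 0.1
for step in range(50):
    w = w - learning_rate * gradient(w)   # slide downhill along the negative gradient

print("w after gradient descent:", round(w, 4))   # close to 3, the minimum of the loss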
Loss functions can be broadly categorized into 2 types: Classification and Regression Loss.
Classification:
Log Loss:
Log Loss is the most important classification metric based on probabilities. It’s hard to interpret raw
log-loss values, but log-loss is still a good metric for comparing models. For any given problem, a
lower log loss value means better predictions.
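The following sketch compares two hypothetical models using scikit-learn's log_loss; all probability values are made up only for illustration.

# Illustrative sketch: comparing two models by log loss (values invented for demonstration).
from sklearn.metrics import log_loss

y_true = [0, 1, 1, 0, 1]

# Predicted probabilities of class 1 from two hypothetical models.
probs_model_a = [0.1, 0.8, 0.9, 0.2, 0.7]
probs_model_b = [0.4, 0.6, 0.5, 0.5, 0.6]

print("Model A log loss:", log_loss(y_true, probs_model_a))  # lower means better predictions
print("Model B log loss:", log_loss(y_true, probs_model_b))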
Focal Loss:
A Focal Loss function addresses class imbalance during training in tasks like object detection.
Focal loss applies a modulating term to the cross-entropy loss in order to focus learning on hard
misclassified examples. It is a dynamically scaled cross entropy loss, where the scaling factor
decays to zero as confidence in the correct class increases. Intuitively, this scaling factor can
automatically down-weight the contribution of easy examples during training and rapidly focus the
model on hard examples.
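The following NumPy sketch shows the idea of the binary focal loss described above. The gamma and alpha values (2.0 and 0.25) and the sample predictions are assumptions used only for illustration.

# A rough NumPy sketch of the binary focal loss idea.
import numpy as np

def focal_loss(y_true, p_pred, gamma=2.0, alpha=0.25):
    y_true = np.asarray(y_true, dtype=float)
    p_pred = np.clip(np.asarray(p_pred, dtype=float), 1e-7, 1 - 1e-7)
    # p_t is the predicted probability of the true class.
    p_t = np.where(y_true == 1, p_pred, 1 - p_pred)
    alpha_t = np.where(y_true == 1, alpha, 1 - alpha)
    # (1 - p_t)^gamma down-weights easy, confidently classified examples.
    return np.mean(-alpha_t * (1 - p_t) ** gamma * np.log(p_t))

print(focal_loss([1, 0, 1], [0.9, 0.1, 0.3]))  # the hard example (0.3) dominates the loss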
Exponential Loss:
The exponential loss is convex and grows exponentially for negative values, which makes it more
sensitive to outliers. The exponential loss is used in the AdaBoost algorithm (a statistical
classification meta-algorithm). The principal attraction of exponential loss in the context of additive
modeling is computational simplicity.
Hinge Loss:
The hinge loss is a specific type of cost function that incorporates a margin or distance from the
classification boundary into the cost calculation. Even if new observations are classified correctly,
they can incur a penalty if the margin from the decision boundary is not large enough.
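The sketch below illustrates the hinge loss for labels coded as -1 and +1. The example scores are made up to show how a correct but low-margin prediction still incurs a penalty.

# Illustrative sketch of the hinge loss for labels in {-1, +1}.
import numpy as np

def hinge_loss(y_true, scores):
    # Correct predictions with a margin >= 1 incur zero loss;
    # correct but low-margin predictions are still penalized.
    return np.mean(np.maximum(0, 1 - np.asarray(y_true) * np.asarray(scores)))

print(hinge_loss([1, -1, 1], [2.0, -0.5, 0.3]))
# score 2.0  -> margin large enough, no penalty
# score -0.5 -> correct side but margin < 1, small penalty
# score 0.3  -> correct side but margin < 1, larger penalty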
KL Divergence Loss:
KL divergence, in simple terms, is a measure of how different two probability distributions (say 'p' and 'q')
are from each other. So this is exactly what we care about while calculating the loss function.
Here 'q' is the probability distribution that the neural network model will predict, whereas 'p' is the
true distribution (in the case of a multiclass classification problem, 'p' is the one-hot encoded vector and 'q'
is the softmax output from the dense layer).
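A small NumPy sketch of KL divergence is given below. The one-hot vector 'p' and the softmax-like output 'q' are invented values used only to illustrate the calculation.

# Sketch: KL divergence between a true distribution p and a predicted distribution q.
import numpy as np

def kl_divergence(p, q):
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                       # terms with p = 0 contribute nothing
    return np.sum(p[mask] * np.log(p[mask] / q[mask]))

p = [1, 0, 0]            # one-hot true distribution (3-class example)
q = [0.7, 0.2, 0.1]      # softmax output from the model
print(kl_divergence(p, q))   # equals -log(0.7) here, since p is one-hot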
Regression:
Quantile Loss:
A quantile is the value below which a fraction of observations in a group falls. For example, a prediction
for quantile 0.9 should over-predict 90% of the time.
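The following sketch shows the quantile (pinball) loss for quantile 0.9; the sample values are made up for illustration.

# Illustrative sketch of the quantile (pinball) loss.
import numpy as np

def quantile_loss(y_true, y_pred, q=0.9):
    error = np.asarray(y_true) - np.asarray(y_pred)
    # Under-prediction (error > 0) is penalized by q,
    # over-prediction (error < 0) by (1 - q), so a 0.9 model prefers to over-predict.
    return np.mean(np.maximum(q * error, (q - 1) * error))

print(quantile_loss([10, 12, 15], [11, 13, 14], q=0.9))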
8. RMSE (Root Mean Squared Error)
In machine learning, when we want to look at the accuracy of our model, we take the root mean
square of the error between the test values and the predicted values. Mathematically:
For a single value:
Let a = (predicted value - actual value)^2
Let b = mean of a = a (for a single value)
Then RMSE = square root of b
For a wider set of values, RMSE is defined as follows:
RMSE = sqrt( (1/n) * Σ (predicted_i - actual_i)^2 ), where n is the number of values.
Mean Square Error (MSE) is the most commonly used regression loss function. MSE is the sum of
squared distances between our target variable and predicted values.
MSE is sensitive towards outliers, and given several examples with the same input feature values, the optimal
prediction will be their mean target value. This should be compared with Mean Absolute Error, where the
optimal prediction is the median. MSE is thus good to use if you believe that your target data, conditioned on
the input, is normally distributed around a mean value, and when it is important to penalize outliers heavily.
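The following NumPy sketch computes MSE and RMSE exactly as defined above; the actual and predicted values are made up for illustration.

# Illustrative sketch: computing MSE and RMSE with NumPy.
import numpy as np

actual = np.array([3.0, 5.0, 2.5, 7.0])
predicted = np.array([2.5, 5.0, 4.0, 8.0])

mse = np.mean((predicted - actual) ** 2)   # mean of squared differences
rmse = np.sqrt(mse)                        # root mean squared error

print("MSE :", mse)
print("RMSE:", rmse)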
DELHI PUBLIC SCHOOL BANGALORE - EAST
ARTIFICIAL INTELLIGENCE
CAPSTONE PROJECT - WORKSHEET
NAME: CLASS:XII SEC: DATE:
I. Only a
II. Only b
III. Both a and c
IV. Both b and d
6. The primary way to collect data is __________.
a. Experiment
b. Survey
c. Interview
d. Observation
7. Which one does NOT belong to regression losses?
a. Log Loss
b. Mean Absolute Error
c. Log cosh Loss
d. Quantile Loss
8. A regression function predicts a __________ and classification predicts a label.
a. Output
b. Quantity
c. Loss
d. Logic
9. Which of the following are common split percentages between Train & Test data?
a. Train : 50% , Test : 50%
b. Train : 5% , Test : 95%
c. Train : 67% , Test : 33%
d. Train : 80% , Test : 20%
I. a and b
II. c and d
III. a, c and d
IV. a, b and c
10. Which stage in Design Thinking is missing from [Prototype, Ideate, Test, Define]?
a. Evaluation
b. Empathise
c. Evolution
d. Enrichment
B. Fill in the blanks:
1. The __________ dataset is used to evaluate the model and adjust it as necessary.
2. __________ means handling missing or invalid values, removing duplicates, and applying
correct formats after the data has been collected.
3. __________ cannot be a negative value.
4. The data scientist will use __________ for predictive modeling.
C. State whether the following statements are true or false.
1. The problem solving methodology is iterative in nature.
2. There are 2 types of loss functions namely regression losses and classification losses.
3. The cross-validation technique divides the provided dataset into 2 subsets, namely the training dataset
and the testing dataset.
4. Historical data in which the desired outcome is already known is called __________.
5. The methodology for model building and deployment is an __________ process.
D. Answer the following:
1. Define a capstone project.
2. List down the various steps of an AI project.
3. What do you mean by design thinking? List down and explain the stages of design thinking.
4. Write the steps involved in problem decomposition.
5. Write the train-test split procedure in Python.
6. List down some of the sources from which data can be gathered for data analysis.
7. What is cross-validation?
8. What is the use of a loss function? What are the 2 different categories of loss functions?
9. Define the following:
a. MSE (Mean Squared Error)
b. RMSE (Root Mean Squared Error)
10. What are hyperparameters? Explain with an example.
*******************
DELHI PUBLIC SCHOOL BANGALORE - EAST
ARTIFICIAL INTELLIGENCE
UNIT 2 - MODEL LIFE CYCLE
NAME: CLASS:XII SEC: DATE:
AI Project Cycle Class has the following three phases:
Phase I:
1. Problem Scoping
2. Data Acquisition
3. Data Exploration
Phase II:
1. Data Modelling
2. Evaluation
Phase III:
1. Deployment
2. Feedback
Problem Scoping:
Before beginning to build a solution, it is critical to first understand the problem description and business limitations.
Whenever we are starting any work, certain problems are always associated with the work or
process. These problems can be small or big; sometimes we ignore them, sometimes we
need urgent solutions. Problem scoping is the process by which we figure out the
problem that we need to solve.
2. The 4Ws
1. Who? – Refers to who is facing the problem and who the stakeholders of the problem are
2. What? – Refers to what the problem is and how you know about the problem
3. Where? – Refers to the context, situation or location of the problem
4. Why? – Refers to why we need to solve the problem and what the benefits to the
stakeholders will be after solving the problem
After understanding and writing the problems, set your goals, and make them your AI project target.
Write your goals for your selected theme.
Suppose you have selected the theme of agriculture; then write how AI will help farmers to solve their
problems.
Your final problem statement will look like the following table:
Who – Stakeholders
What – Decide the mature age for the crop and determine its time
Where – Context / Situation
Why – Take the crop on time and supply against market demand on time
Data Acquisition:
1. Data: Data refers to the raw facts, figures, or statistics collected for reference or analysis.
2. Acquisition: Acquisition refers to acquiring data for the project.
Classification of Data:
Now observe the following classification of data; we will discuss each of them in detail:
Basic Data:
Basically, data is classified into two categories:
1. Numeric Data: Mainly used for computation. Numeric data can be classified into the following:
o Discrete Data: Discrete data only contains integer numeric data. It doesn’t have any decimal or
fractional value. The countable data can be considered as discrete data. For example 132
customers, 126 Students etc.
o Continuous Data: It represents data that can take any value within a range. Uncountable (measurable)
data can be represented in this category. For example 10.5 kgs, 100.50 kms etc.
2. Text Data: Mainly used to represent names, collections of words, phrases, textual information
etc.
Data can exist in many forms: text, sound, numbers, videos and images.
Structural Classification:
Data that is going to be fed into the system to train the model (or is already fed into the system) is
classified here according to whether it has a specific set of constraints, rules or a unique pattern.
1. Structured Data: As discussed, structured data has a specific pattern or set of rules. This data has a
simple structure and is stored in specific forms such as tabular form. Examples: a cricket scoreboard,
your school timetable, an exam datasheet etc.
2. Unstructured Data: Data which doesn't have any specific pattern or constraints and can be stored in
any form is known as unstructured data. Most of the data that exists in the world is unstructured data.
Examples: YouTube videos, Facebook photos, dashboard data of any reporting tool etc.
3. Semi-Structured Data: It is the combination of both structured and unstructured data. Some data can
have a structure like a database whereas some data can have markers and tags to identify the structure of
data.
Other Classification:
This classification is sub divided into the following branches:
1. Time-Stamped Data: This structure helps the system to predict the next best action. It follows a
specific time order to define the sequence. This time can be the time at which the data was captured,
processed or collected.
2. Machine Data: The result or output of a specific program, system or technology is considered machine
data. It consists of data related to a user's interaction with the system, such as the user's logged-in session
data, specific search records, and user engagement such as comments, likes and shares.
3. Spatiotemporal Data: The data which contains information related to geographical location and time is
considered as spatiotemporal data. It records the location through GPS and time-stamped data where the
event is captured or data is collected.
4. Open Data: It is freely available data for everyone. Anyone can reuse this kind of data.
5. Real-time Data: The data which is available with the event is considered as real-time data.
6. Big Data: You may hear this word most often. Data which cannot be stored or handled by
traditional data collection software like DBMS or RDBMS can be considered Big Data. Big
Data is itself a very deep topic.
• Now, as you interact with the authorities, you get to know that some people are allowed to enter the area
where the diamond is kept.
• Some of them being – the maintenance people; officials; VIPs, etc.
• Now, your challenge is to make sure that no unauthorised person enters the premises.
• For this, you: (choose one)
o Get photographs of all the authorised people.
o Get photographs of all the unauthorised people.
o Get photographs of the premises in which the diamond has been kept.
o Get photographs of all the visitors
Data Features:
Data features refer to the type of data you want to collect. Three terms are associated with this:
1. Training Data: The data collected through the system is known as training data. In other words,
the input given by the user to the system can be considered as training data.
2. Testing Data: The resulting data set or processed data is known as testing data. In other words, the
output data is known as testing data.
3. Validation Set: Data the model has not been trained on, used to tune hyperparameters.
Structured: Has a pattern, is usually stored in tabular form and is accessed by applications like MS Excel or a DBMS. Examples – employee data of a company, result dataset of a board.
Semi-structured: No well-defined structure, but data is categorized using some meta tags. Examples – HTML pages, CSV files.
Unstructured: Without any structure, not defined in any framework. Examples – audio/video files, social media posts.
Data Exploration:
Data Exploration refers to the techniques and tools used to visualize data through complex statistical
methods.
• Quickly get a sense of the trends, relationships and patterns contained within the data.
• Define strategy for which model to use at a later stage.
• Communicate the same to others effectively.
• To visualise data, we can use various types of visual representations.
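As a simple illustration of data exploration, the sketch below draws a bar chart with matplotlib; the months and rainfall figures are invented sample data.

# A minimal sketch of visualizing data with matplotlib (sample values are made up).
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May"]
rainfall_mm = [12, 30, 45, 80, 60]

plt.bar(months, rainfall_mm)            # a bar chart to spot the trend quickly
plt.title("Monthly rainfall (sample data)")
plt.xlabel("Month")
plt.ylabel("Rainfall (mm)")
plt.show()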
Modelling:
AI Modelling refers to developing algorithms, also called models, which can be trained to get intelligent
outputs. That is, writing code to make a machine artificially intelligent.
Types of AI models:
A Rule-Based model refers to setting up rules and training the model accordingly. It follows an algorithm
or code to train, test and validate data.
A Learning-Based model refers to identifying the data by its attributes and behaviour and training the model
accordingly. There is no code or algorithm to train, test and validate the data; it learns from past
behaviour and attributes received from the data.
Types of learning:
There are three types of learning:
1. Supervised
2. Unsupervised
3. Reinforcement
Testing / Evaluation:
Once a model has been created and trained, it must be properly tested to calculate the model’s efficiency
& performance. As a result, the model is evaluated using Testing data & the model’s efficiency is
assessed.
The set of measurements will differ depending on the problem you are working on. For regression
problems, MSE or MAE are commonly used. For a balanced dataset, accuracy may be a useful choice for
evaluating a classification model. For imbalanced datasets, the F1 Score is useful.
A separate validation dataset is used for evaluation during training. It monitors how well our model
generalises, avoiding bias & overfitting.
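The sketch below shows how such evaluation metrics can be computed with scikit-learn; the true and predicted labels are invented only for illustration.

# Illustrative sketch: evaluating a classifier with accuracy and F1 score.
from sklearn.metrics import accuracy_score, f1_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

print("Accuracy:", accuracy_score(y_true, y_pred))  # suitable for balanced datasets
print("F1 Score:", f1_score(y_true, y_pred))        # more informative for imbalanced datasets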
While the fundamental testing concepts are fully applicable in AI development projects, there are
additional considerations too. These are as follows:
• The volume of test data can be large, which presents complexities.
• Human biases in selecting test data can adversely impact the testing phase, therefore, data
validation is important.
• Your testing team should test the AI and ML algorithms keeping model validation, successful
learnability, and algorithm effectiveness in mind.
• Regulatory compliance testing and security testing are important since the system might deal
with sensitive data, moreover, the large volume of data makes performance testing crucial.
• You are implementing an AI solution that will need to use data from your other systems,
therefore, systems integration testing assumes importance.
• Test data should include all relevant subsets of training data, i.e., the data you will use for
training the AI system.
• Your team must create test suites that help you validate your ML models.
Finally, we come to the model deployment stage. This means that we must implement it in an environment with a
web interface or some kind of application where the new data can flow and our ML models can show the analysis
in the new interface.
An artificial intelligence solution that predicts energy consumption for energy providers would take related data,
analyze it, and send its prediction to a web portal or app for companies to view and act on. Such tools simplify the
decision-making process for end-users.
However, just because you’ve launched your AI solution live doesn’t mean the project is done. As in the previous
steps, an equally important part is monitoring, reviewing, and making sure that your solution continues to deliver
the desired results.
Most likely some adjustments and alterations will be required. This will depend on your customer and staff
feedback or on trial and error. New data may be entered into the model to ensure that the results are accurate and
up-to-date.
DELHI PUBLIC SCHOOL BANGALORE - EAST
ARTIFICIAL INTELLIGENCE
UNIT 2 - MODEL LIFE CYCLE - WORKSHEET
NAME: CLASS:XII SEC: DATE:
b. Matplotlib
c. PyCharm
d. Scikit-learn
7. Which of the following statements is/are incorrect?
a. The volume of test data can be large, which presents complexities.
b. Testing team should test the AI & ML algorithms keeping model validation, successful
learnability and algorithm effectiveness in mind.
c. Test data should include all irrelevant subsets of training data, that is the data you will
use for training the AI system.
i. All are incorrect
ii. b
iii. c
iv. a, b & c
B. Answer the following:
1. Explain the following:
a. Data Exploration
b. Modeling
2. Explain the terms overfitting, underfitting and perfect fit in terms of model testing.
3. What should be considered during the testing phase?
4. Explain briefly the 4Ws of problem scoping.
C. Fill in the blanks:
1. During the __________ phase, you need to evaluate the various AI development platforms.
2. The __________ refers to the type of data you want to collect.
3. For a balanced dataset, __________ is a useful choice for evaluating a classification
model.
4. Two sources of authentic data are __________ and __________.
5. The __________ block in the 4W Problem Canvas refers to the stakeholders.
6. The __________ is used to assist us in identifying the important factors associated with
the problem.
D. Match & choose:
a. Open languages - i. ML techniques, GAN framework
b. Open frameworks - ii. Python, Scala
c. Approaches - iii. Watson, Azure
d. Tools to help - iv. Scikit-learn, TensorFlow
DELHI PUBLIC SCHOOL BANGALORE - EAST
ARTIFICIAL INTELLIGENCE
UNIT III: STORYTELLING THROUGH DATA
NAME: CLASS:XII SEC: DATE:
A well-told story is an inspirational narrative that is crafted to engage the audience across boundaries and
cultures, as it has an impact that isn't possible with data alone. Data can be persuasive, but stories are
much more. They change the way that we interact with data, transforming it from a dry collection of
“facts” to something that can be entertaining, engaging, thought provoking, and inspiring change.
Each data point holds some information which may be unclear and contextually deficient on its own. The
visualizations of such data are, therefore, subject to interpretation (and misinterpretation). However, stories
are more likely to drive action than statistics and numbers are. Therefore, when data is told in the form of a
narrative, it reduces ambiguity, connects data with context, and describes a specific interpretation –
communicating the important messages in the most effective ways. The steps involved in telling an effective
data story are given below:
• Understanding the audience
• Choosing the right data and visualisations
• Drawing attention to key information
• Developing a narrative
• Engaging your audience
Finally, when narrative and visuals are merged together, they can engage or even entertain an
audience. When you combine the right visuals and narrative with the right data, you have a data
story that can influence and drive change.
By the numbers: How to tell a great story with your data?
Presenting the data as a series of disjointed charts and graphs could result in the audience struggling to
understand it – or worse, coming to the wrong conclusions entirely. Thus, the importance of a narrative
comes from the fact that it explains what is going on within the data set.
Some easy steps that can assist in finding compelling stories in the data sets are as follows:
• It is an effective tool to transmit human experience. Narrative is the way we simplify and make
sense of a complex world. It supplies context, insight, interpretation—all the things that make
data meaningful, more relevant and interesting.
• No matter how impressive an analysis, or how high-quality the data, it is not going to compel
change unless the people involved understand what is explained through a story.
• Stories that incorporate data and analytics are more convincing than those based entirely on
anecdotes or personal experience.
• It helps to standardize communications and spread results.
• It makes information memorable and easier to retain in the long run.
DELHI PUBLIC SCHOOL BANGALORE - EAST
ARTIFICIAL INTELLIGENCE
UNIT III: STORYTELLING THROUGH DATA – WORKSHEET
10. Consider the following, analyze the data and convert it into tabular form.
********