Class 9 Unit 1 AI Project Cycle Notes

The document outlines the AI Project Cycle, detailing its components: Problem Scoping, Data Acquisition, Data Exploration, Modelling, and Evaluation. It explains each step's significance, such as understanding the problem, collecting reliable data, and creating models, while also discussing machine learning approaches like supervised, unsupervised, and reinforcement learning. It also emphasizes evaluating AI models by comparing predictions with reality to improve their effectiveness.

Unit 1 AI PROJECT CYCLE

The Project Cycle is a step-by-step process to solve problems using proven scientific methods and drawing inferences about them.
Components of Project Cycle
Components of the project cycle are the steps that contribute to completing the Project.
Components of the AI Project Cycle:
Problem Scoping – Understanding the problem
Data Acquisition – Collecting accurate and reliable data
Data Exploration – Arranging the data uniformly
Modelling – Creating models from the data
Evaluation – Evaluating the project

Problem Scoping
Problem Scoping refers to understanding a problem, finding out the various factors which affect it, and defining the goal or aim of the project.
4Ws Of Problem Scoping

The 4Ws of Problem Scoping are Who, What, Where, and Why. These four questions help in identifying and understanding the problem in a better and more efficient manner.
1. Who – The “Who” part helps us comprehend and categorize all those who are affected by the problem, directly or indirectly; they are called the Stakeholders.

2. What – The “What” part helps us understand and identify the nature of the problem; under this block, you also gather evidence to prove that the problem you have selected exists.

3. Where – The “Where” part looks at where the problem arises: the situation and the location.

4. Why – The “Why” part asks why the given problem is worth solving.


Problem Statement Template (PST)

The Problem Statement Template helps us summarize all the key points into one single template, so that whenever there is a need to look back at the basis of the problem in the future, we can refer to the Problem Statement Template and understand its key elements.

Data Acquisition
Data Acquisition is the process of collecting accurate and reliable data to work with. Data can be in the form of text, video, images, audio, and so on, and it can be collected from various sources like the internet, journals, newspapers, and so on.
Data Sources

Surveys

1. A survey is one of the methods used to gather data from users for the second stage of the AI project cycle, Data Acquisition.
2. A survey is a method of gathering specific information from a sample of people. For example, a census survey is conducted periodically to analyze the population.
3. Surveys are conducted in particular areas to acquire data from particular people.
Web Scraping

1. Web scraping means collecting data from the web using software tools.
2. It is used for tasks such as monitoring prices, news, and so on.
3. For example, web scraping can be done programmatically using Beautiful Soup, a Python package for parsing web pages, as in the sketch below.
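Here is a minimal web-scraping sketch in Python, assuming the requests and beautifulsoup4 packages are installed; the URL is only a placeholder, not one used in these notes.

import requests                      # downloads web pages
from bs4 import BeautifulSoup        # parses the downloaded HTML

page = requests.get("https://example.com")       # placeholder URL
soup = BeautifulSoup(page.text, "html.parser")   # build a parse tree

print(soup.title.string)             # print the page title
for link in soup.find_all("a"):      # list every hyperlink on the page
    print(link.get("href"))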
Sensors

1. Sensors are very important but very simple to understand.
2. Sensors are part of IoT, the Internet of Things.
3. An example of IoT is a smart watch or a smart fire alarm which automatically detects fire and starts the alarm. How does this happen? A sensor, such as a fire sensor, sends data to the IoT device (the smart alarm), and if the sensor detects heat or fire, the alarm starts.
Cameras

1. A camera captures visual information, and that information, called an image, is used as a source of data.
2. Cameras are used to capture raw visual data.
Observations

1. When we observe something carefully, we get information.
2. For example, scientists may observe insects for years and then use that data, so observation is a data source.
3. Observation is a time-consuming data source.
API

1. API stands for Application Programming Interface.


2. Let us take an example to understand an API: when you visit a restaurant, check the menu, and want to order some food, do you go to the kitchen and ask the cook to prepare it? No. You give your order to the waiter, and the waiter passes that order to the kitchen.
3. So here the waiter is a messenger who takes your request, tells the kitchen what you want, and then responds to you with the food.
4. Likewise, an API is a messenger which takes requests from you, tells the system what you want, and then gives you a response.
5. The response it returns can be in JSON format or other formats.
6. JSON is simply a format for storing structured, object-like data. Below is an example of JSON.
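The record below is only illustrative; the field names are invented.

{
  "name": "Riya",
  "class": 9,
  "subjects": ["AI", "Maths", "Science"]
}

In Python, the built-in json module turns such text into a dictionary:

import json

record = json.loads('{"name": "Riya", "class": 9}')   # parse JSON text
print(record["name"])                                 # -> Riya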
Data Exploration
Data Exploration is the process of arranging the gathered data uniformly for a better understanding. Data can be arranged in the form of a table, a plotted chart, or a database.

To simplify: the data we collected in Data Acquisition has to be arranged in Data Exploration. For example, if we have the data of 50 students in a class, such as their mobile numbers, dates of birth, and class, then in the process of data exploration we can make a chart for that data in which all the names are in one place, all the mobile numbers in another, and so on.
Data Exploration or Visualization Tools

1. Google Charts

Google Chart tools are powerful, simple to use, and free, with a rich gallery of interactive charts and data tools.
2. Tableau

Tableau is often regarded as the grandmaster of data visualization software, and for good reason.

Tableau has a very large customer base of 57,000+ accounts across many industries, thanks to its simplicity of use and its ability to produce interactive visualizations far beyond those provided by general BI solutions.
3. FusionCharts

This is a very widely-used, JavaScript-based charting and visualization package that has
established itself as one of the leaders in the paid-for market.
It can produce 90 different chart types and integrates with a large number of platforms and
frameworks giving a great deal of flexibility.
4. Highcharts

A simple options structure allows for deep customization, and styling can be done via
JavaScript or CSS. Highcharts is also extendable and pluggable for experts seeking
advanced animations and functionality.
Modelling
Modelling is the process in which different models are created from the visualized data and checked for their advantages and disadvantages.
There are two ways/approaches to making a machine learning model: the Learning-Based Approach and the Rule-Based Approach.
Learning Based Approach

The Learning-Based Approach is based on machine learning, where the machine learns from the data fed to it.
Machine Learning (ML)

Machine learning is a subset of artificial intelligence (AI) which gives machines the ability to learn automatically and improve from experience without being explicitly programmed.
Types of Machine Learning

Machine learning can be divided into three types: Supervised Learning, Unsupervised Learning, and Reinforcement Learning.
Supervised Learning
Supervised learning is where a computer algorithm is trained on input data that has been labeled for a particular output. For example, a shape with three sides is labeled as a triangle. Classification and Regression are the two main types of supervised learning.

Regression
Regression is a supervised learning technique used to predict continuous numerical values based on input features. It aims to establish a functional relationship between independent variables and a dependent variable, such as predicting house prices from features like size, number of bedrooms, and location.
The goal is to minimize the difference between predicted and actual values using algorithms like Linear Regression, Decision Trees, or Neural Networks, ensuring the model captures the underlying patterns in the data.
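As an illustration, here is a minimal regression sketch using scikit-learn (assumed to be installed); the house sizes and prices are made-up numbers, not data from these notes.

from sklearn.linear_model import LinearRegression

sizes = [[500], [750], [1000], [1250], [1500]]   # input feature: house size (sq. ft.)
prices = [50, 75, 100, 125, 150]                 # output value: price (in lakhs)

model = LinearRegression()
model.fit(sizes, prices)             # learn the size-to-price relationship

print(model.predict([[1100]]))       # predicted price of a 1100 sq. ft. house (~110)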

Classification
Classification is a type of supervised learning that categorizes input data into predefined labels. A model is trained on labeled examples so that it learns the patterns between input features and output classes and can separate the labeled data to predict the output. In classification, the target variable is a categorical value; for example, classifying emails as spam or not spam.
The model’s goal is to generalize this learning to make accurate predictions on new, unseen data. Algorithms like Decision Trees, Support Vector Machines, and Neural Networks are commonly used for classification tasks.
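A minimal classification sketch, again with scikit-learn and made-up data, echoing the triangle example above: the model learns to name a shape from its number of sides.

from sklearn.tree import DecisionTreeClassifier

sides = [[3], [4], [4], [3], [5]]                # input feature: number of sides
labels = ["triangle", "square", "square",
          "triangle", "pentagon"]                # labeled outputs

clf = DecisionTreeClassifier()
clf.fit(sides, labels)               # learn from the labeled examples

print(clf.predict([[3]]))            # -> ['triangle']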
Advantages of Supervised Learning
The power of supervised learning lies in its ability to accurately predict patterns and make data-driven decisions across a variety of applications. Here are some advantages listed below:
 Labeled training data enables models to accurately learn the patterns and relationships between inputs and outputs.
 Supervised learning models can accurately predict and classify new data.
 Supervised learning has a wide range of applications, including classification, regression, and even more complex problems like image recognition and natural language processing.
 Well-established evaluation metrics, including accuracy, precision, recall, and F1 score, facilitate the assessment of supervised learning model performance.
Disadvantages of Supervised Learning
Although supervised learning methods have benefits, their limitations require careful consideration during problem formulation, data collection, model selection, and evaluation. Here are some disadvantages listed below:
 Overfitting: Models can overfit the training data, which leads to poor performance on new, unseen data because noise has been captured.
 Bias in models: Biases in the training data can lead to unfair predictions.
 Dependence on labeled data: Supervised learning heavily depends on labeled training data, which can be costly and time-consuming to collect and may require domain expertise.
Unsupervised Learning
In terms of machine learning, unsupervised learning is where a system learns from datasets on its own; the training data is not labeled. Learning on its own is termed unsupervised learning.

Basically, in unsupervised learning the data is untagged or unnamed, and the machine builds its learning from the structure present in its input data.

Example: Suppose a boy watches someone performing tricks with a ball and then learns those tricks by himself. This is what we call unsupervised learning.
Unsupervised Learning Algorithms
There are mainly three types of algorithms used on unsupervised datasets:
 Clustering
 Association Rule Learning
 Dimensionality Reduction
Clustering
Clustering in unsupervised machine learning is the process of grouping unlabeled data into clusters based on their similarities. Broadly, this technique is applied to group data based on the patterns, such as similarities or differences, that our machine model finds. A sketch is given below.
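A minimal clustering sketch with scikit-learn's k-means, on made-up 2-D points; the algorithm receives no labels and finds the two groups on its own.

from sklearn.cluster import KMeans

points = [[1, 2], [1, 4], [2, 3],    # one natural group of points
          [9, 8], [10, 9], [9, 10]]  # another natural group

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans.fit(points)                   # group the unlabeled points

print(kmeans.labels_)                # cluster number assigned to each point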
Association
Association, also known as association rule mining, is a common technique used to discover associations in unsupervised machine learning. It is a rule-based ML technique that finds very useful relations between the parameters of a large dataset.
This technique is mostly used for market basket analysis, which helps to better understand the relationship between different products. For example, shopping stores use algorithms based on this technique to find the relationship between the sales of one product and the sales of another, based on customer behavior: if a customer buys milk, he may also buy bread, eggs, or butter. Once trained well, such models can be used to increase sales by planning suitable offers.
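The idea behind a rule such as "if milk, then bread" can be sketched with plain counting; real systems use algorithms such as Apriori, and the baskets below are invented for illustration.

# Confidence of the rule "milk -> bread": of the baskets containing
# milk, what fraction also contain bread?
baskets = [
    {"milk", "bread", "eggs"},
    {"milk", "bread"},
    {"milk", "butter"},
    {"bread", "jam"},
]

milk_baskets = [b for b in baskets if "milk" in b]
with_bread = sum(1 for b in milk_baskets if "bread" in b)

print(with_bread / len(milk_baskets))   # 2 of 3 milk baskets also have bread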
Dimensionality Reduction
Dimensionality reduction is the process of reducing the number of features in a dataset
while preserving as much information as possible. This technique is useful for improving
the performance of machine learning algorithms and for data visualization.
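A minimal dimensionality-reduction sketch using scikit-learn's PCA on made-up data: three features are compressed down to two while keeping most of the variation.

from sklearn.decomposition import PCA

data = [[2.5, 2.4, 0.5],
        [0.5, 0.7, 0.1],
        [2.2, 2.9, 0.4],
        [1.9, 2.2, 0.3]]             # 4 samples, 3 features each

pca = PCA(n_components=2)
reduced = pca.fit_transform(data)    # now 4 samples with 2 features each

print(reduced.shape)                 # (4, 2)
print(pca.explained_variance_ratio_) # share of information each component keeps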
Advantages of Unsupervised learning
 No labeled data required: Unlike supervised learning, unsupervised learning does not
require labeled data, which can be expensive and time-consuming to collect.
 Can uncover hidden patterns: Unsupervised learning algorithms can identify patterns
and relationships in data that may not be obvious to humans.
 Can be used for a variety of tasks: Unsupervised learning can be used for a variety of
tasks, such as clustering, dimensionality reduction, and anomaly detection.
 Can be used to explore new data: Unsupervised learning can be used to explore new
data and gain insights that may not be possible with other methods.
Disadvantages of Unsupervised learning
 Difficult to evaluate: It can be difficult to evaluate the performance of unsupervised
learning algorithms, as there are no predefined labels or categories against which to
compare results.
 Can be difficult to interpret: It can be difficult to understand the decision-making
process of unsupervised learning models.
 Can be sensitive to the quality of the data: Unsupervised learning algorithms can be
sensitive to the quality of the input data. Noisy or incomplete data can lead to
misleading or inaccurate results.
 Can be computationally expensive: Some unsupervised learning algorithms, particularly those dealing with high-dimensional data or large datasets, can be computationally expensive.
Reinforcement Learning
Learning through feedback, or the trial-and-error method, is called Reinforcement Learning.

In this type of learning, the system works on a reward-or-penalty policy: the agent performs an action (positive or negative) in the environment, the environment changes state in response, and the agent receives a reward or a penalty. The system also builds a policy: which action should be taken under which condition.

Example: A very good example is a vending machine. Suppose you put a coin (action) into a juice vending machine (environment). The system detects the amount of the coin (state), and you get the drink corresponding to that amount (reward); if the coin is damaged or there is some other problem, you get nothing (penalty). Here the machine is building a policy for which drink should be provided under what condition and how to handle an error in the environment.
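A toy sketch of trial-and-error learning (not a full reinforcement-learning algorithm): the agent tries two invented actions, collects rewards and penalties from a hypothetical environment, and its policy becomes "pick the best-scoring action".

import random

scores = {"press_A": 0.0, "press_B": 0.0}   # the agent's estimate of each action

def environment(action):
    # Hypothetical environment: press_B is rewarded 80% of the time,
    # press_A only 20% of the time.
    chance = 0.8 if action == "press_B" else 0.2
    return 1 if random.random() < chance else -1   # reward (+1) or penalty (-1)

for _ in range(1000):                # trial and error
    action = random.choice(list(scores))
    scores[action] += environment(action)

print(max(scores, key=scores.get))   # learned policy: almost always "press_B"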
Rule Based Approach

 The Rule-Based Approach refers to AI modelling where the relationships or patterns in the data are defined by the developer.
 That means the machine works on the rules and information given by the developer and performs the task accordingly.

For example: Suppose you have a dataset containing 100 images each of apples and bananas. You build a Computer Vision system that uses these labeled images as its fixed reference. If you test the machine with an image of an apple, it gives the output by comparing the image against those in its dataset; it can only recognize what its fixed rules and reference data cover. This is known as the Rule-Based Approach (a sketch follows below).
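A minimal rule-based sketch in Python: the developer writes the rules directly and the machine only applies them. The fruit rules below are invented for illustration.

def classify_fruit(colour, is_long):
    # Rules fixed by the developer, not learned from data
    if colour == "red" and not is_long:
        return "apple"
    if colour == "yellow" and is_long:
        return "banana"
    return "unknown"                 # anything outside the rules is not recognized

print(classify_fruit("red", False))    # apple
print(classify_fruit("yellow", True))  # banana
print(classify_fruit("green", False))  # unknown: the fixed rules don't cover it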
Datasets
A dataset is a collection of related sets of information that is composed of separate elements but can be manipulated by a computer as a unit.

In the Rule-Based Approach we deal with two divisions of the dataset:

1. Training Data – the subset required to train the model
2. Testing Data – the subset required to test the trained model
Training vs Testing Data

Base    Training Set                                 Testing Set
Use     Used for training the model                  Used for testing the model after it is trained
Size    Larger: about 70% to 80% of the data         Smaller: about 20% to 30% of the data
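In code, such a split is commonly done with scikit-learn's train_test_split; the tiny X and y below are made-up placeholders.

from sklearn.model_selection import train_test_split

X = [[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]]   # inputs
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]                        # labels

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)   # hold back 30% for testing

print(len(X_train), len(X_test))            # 7 training rows, 3 testing rows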
EVALUATION

What is Evaluation?
Evaluation is a process that critically examines a program. It involves collecting and
analyzing information about a program’s activities, characteristics, and outcomes. Its
purpose is to make judgments about a program, to improve its effectiveness, and/or to
inform programming decisions.
So, evaluation basically means checking the performance of your AI model. It is done using two things: “Prediction” and “Reality”. Evaluation is done as follows:

First, find some testing data for which the true outcome is already known.
Then feed that testing data to the AI model while keeping the correct outcome with yourself; this correct outcome is termed the “Reality”.
Then take the outcome predicted by the AI model, called the “Prediction”, and compare it with the “Reality”.

You can do this to improve the efficiency and performance of your AI model, and to check and correct its mistakes.

EXAMPLE
Imagine that you have come up with an AI-based prediction model which has been deployed to identify a football (soccer ball). The objective of the model is to predict whether the given/shown figure is a football.
To understand the efficiency of this model, we need to check if the predictions it makes are correct. Thus, there exist two conditions that we need to consider: Prediction and Reality.
The prediction is the output given by the machine, and the reality is the real scenario of the figure shown when the prediction was made.

Now let us look at various combinations that we can have with these two conditions.

Case 1: True Positive

Is this a football?
Prediction = YES
Reality = YES

Here, we can see in the picture that it is a football. The model predicts Yes, which means it is a football. The Prediction matches the Reality. Hence, this condition is termed True Positive.

Case 2: True Negative

Is this a football?
Prediction = NO
Reality = NO

Here the image is not of a football, hence the reality is No. The machine has predicted it correctly as a No. Therefore, this condition is termed True Negative.

Case 3: False Positive

Is this a football?
Prediction = YES
Reality = NO

Here the reality is that there is no football, but the machine has incorrectly predicted that there is one. This case is termed False Positive.

Case 4: False Negative

Is this a football?
Prediction = NO
Reality = YES

Here, the football appears in a different look, because of which the Reality is Yes but the machine has incorrectly predicted No, meaning it predicts that there is no football. Therefore, this case becomes False Negative.

Confusion Matrix
The comparison between the results of Prediction and Reality is called the Confusion Matrix. It allows us to understand the prediction results; it is not an evaluation metric itself but a record that helps in evaluation.

Confusion Matrix table:

                   Reality: YES            Reality: NO
Prediction: YES    True Positive (TP)      False Positive (FP)
Prediction: NO     False Negative (FN)     True Negative (TN)

Parameters to Evaluate a Model


Now let us go through all the possible combinations of Prediction and Reality and see how we can use these conditions to evaluate the model.

Accuracy
Definition: the percentage of correct predictions out of all the observations. A prediction is said to be correct if it matches the reality.

Here, we have two conditions in which the Prediction matches the Reality:

True Positive
Prediction = YES
Reality = YES
When the model's prediction is Yes and the reality is also Yes, the condition is termed True Positive.

True Negative
Prediction = NO
Reality = NO
When the model's prediction is No and the reality is also No, the condition is termed True Negative.
Accuracy Formula:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Here, the total observations cover all the possible cases of prediction: True Positive (TP), True Negative (TN), False Positive (FP), and False Negative (FN).

Example
Let us go back to the football example.

Assume that the model always predicts that there is no football, but in reality there is a 2% chance of a football being there. Then, for 98 cases out of 100 the model will be right, and for the 2 cases in which there was a football, the model will still predict no football.
Here,
True Positives = 0

True Negatives = 98

Total cases = 100

Therefore, the accuracy becomes:
(98 + 0) / 100 = 98%

Precision
Definition: the percentage of true positive cases out of all the cases where the prediction is YES. That is, it takes into account the True Positives and the False Positives:

True Positive
Prediction = YES
Reality = YES
When the model's prediction is Yes and the reality is also Yes, the condition is termed True Positive.

False Positive
Prediction = YES
Reality = NO
When the model's prediction is Yes but the reality is No, the condition is termed False Positive.

Precision Formula:

Precision = TP / (TP + FP)

Going back to the football example, assume now that the model always predicts that there is a football, irrespective of the reality. In this case, all the positive conditions are taken into account, that is:

True Positive (Prediction = Yes and Reality = Yes)

False Positive (Prediction = Yes and Reality = No)

Since the model claims a football every time, the players would have to check the ball every time to see whether it really is a football (that is, whether the reality is True or False).

Example
Predicition = 10 cases of TP

Reality = 20 cases of YES

100% ACCURATE
Let us consider that a model has 100% precision. This means that whenever the machine
says there’s a Football, there is actually a Football(True Positive).

In the same model, there can be a rare exceptional case where there was actual Football but
the system could not detect it. This is the case of a False Negative condition.
But the precision value would not be affected by it because it does not take FN (False
Negative) into account.

Recall
Definition: the fraction of positive cases that are correctly identified.

It mainly considers the cases where, in reality, there was a football, whether the machine detected it correctly or not. That is, it considers:

True Positives (there was a football in reality and the model predicted it correctly)
False Negatives (there was a football and the model didn’t predict it)

True Positive
Prediction = YES
Reality = YES
When the model's prediction is Yes and the reality is also Yes, the condition is termed True Positive.

False Negative
Prediction = NO
Reality = YES
When the model's prediction is No but the reality is Yes, the condition is termed False Negative.

Recall Formula:

Recall = TP / (TP + FN)

F1 Score
Definition: the measure of the balance between Precision and Recall.
When we don't know which of the two metrics is more important, we use the F1 Score, which balances them.
F1 Score Formula:

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)

If we want to know whether our model's performance is good, we need both measures: Recall and Precision. For some cases you might have a high Precision but low Recall, or a low Precision but high Recall. Since both measures are important, we need a parameter that takes both into account: the F1 Score.

An ideal situation is when we have a value of 1 (that is, 100%) for both Precision and Recall. In that case the F1 Score is also an ideal 1 (100%), known as the perfect value for the F1 Score. As the values of both Precision and Recall range from 0 to 1, the F1 Score also ranges from 0 to 1.

Example for confusion matrix:

Calculate Accuracy, Precision, Recall and F1 Score for the following confusion matrix on Heart Attack Risk. Also suggest which metric would not be a good evaluation parameter here, and why.

The Confusion Matrix:

                   Reality: YES    Reality: NO
Prediction: YES    TP = 50         FP = 20
Prediction: NO     FN = 50         TN = 20

Calculation:

Accuracy:

Accuracy is defined as the percentage of correct predictions out of all the observations, where the observations are the True Positives (TP), True Negatives (TN), False Positives (FP) and False Negatives (FN).

Accuracy = (TP + TN) / (TP + TN + FP + FN)
= (50 + 20) / (50 + 20 + 20 + 50)
= 70 / 140
= 0.5

Precision:

Precision is defined as the percentage of true positive cases out of all the cases where the prediction is YES.

Precision = TP / (TP + FP) = 50 / (50 + 20) = 50 / 70 = 0.714

Recall: It is defined as the fraction of positive cases that are correctly identified.

Recall = TP / (TP + FN) = 50 / (50 + 50) = 50 / 100 = 0.5

F1 Score:

The F1 Score is defined as the measure of balance between Precision and Recall.

F1 Score = 2 × (Precision × Recall) / (Precision + Recall)
= 2 × (0.714 × 0.5) / (0.714 + 0.5)
= 0.714 / 1.214
= 0.588

Therefore,

Accuracy = 0.5, Precision = 0.714, Recall = 0.5, F1 Score = 0.588

Within this test there is a trade-off between the metrics, and Precision alone would not be a good evaluation parameter here. A False Positive (which lowers Precision) is a person predicted as high risk who does not have a heart attack; a False Negative (which lowers Recall) is a person predicted as low risk who does have a heart attack. Since False Negatives miss actual heart patients, they are more dangerous than False Positives, so Recall is the metric that matters most, and its low value here (0.5) shows the model needs improvement.
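As a quick check, a few lines of Python reproduce the numbers of this worked example from the confusion-matrix counts used above.

TP, TN, FP, FN = 50, 20, 20, 50      # counts from the confusion matrix above

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy, 3), round(precision, 3),
      round(recall, 3), round(f1, 3))   # 0.5 0.714 0.5 0.588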
