Unit – II: Introduction to AI Project Cycle
Unit 2 of the CBSE Class 9 Artificial Intelligence subject covers the AI
project cycle. This unit has the following sub-units:
1. Problem Scoping
2. Data Acquisition
3. Data Exploration
4. Modelling
Introduction to Problem Scoping in AI Class 9
The AI project cycle framework consists of the following stages:
1. Problem Scoping
2. Data Acquisition
3. Data Exploration
4. Modelling
5. Evaluation and Development
Problem Scoping is the first stage of the AI project cycle. In this stage,
the problem to be solved is identified. It is followed by designing,
building, and finally testing the project.
The whole AI project cycle depends on this stage: if the problem is scoped
incorrectly, or not scoped at all, the project is likely to fail.
What is Problem Scoping?
Whenever we start any work, certain problems are always associated with
it. In fact, we are surrounded by problems! These problems can be small or
big. Sometimes we ignore them, and sometimes we need an urgent solution,
otherwise our work will suffer.
Problem scoping refers to identifying a problem and forming a vision to
solve it.
The 4Ws of Problem Scoping
The 4Ws are very helpful in problem scoping. They are:
1. Who? – Who is facing the problem, and who are the stakeholders of the
problem?
2. What? – What is the problem, and how do you know about it?
3. Where? – What is the context, situation, or location of the problem?
4. Why? – Why do we need to solve the problem, and what benefits will the
stakeholders get once it is solved?
The final outcome of problem scoping is the problem statement template.
The problem statement template
When the above 4Ws are completely filled in, you need to prepare a summary
of them. This summary is known as the problem statement template. It
captures all the key points in one place, so if the same problem arises in
the future, this statement helps to resolve it easily.
Activity – Brainstorm around the theme and set a goal for the AI project
In this activity you need to select a theme for problem scoping.
Select the Theme
The CBSE Study Material gives the following themes for problem scoping:
Reference CBSE Study Material
Students can select any of these or they can choose their own as well.
Now answer the following question:
1. Why have you selected the theme?
For example:
1. Suppose the environment is your theme. Think about problems such as
polluted air, water, and land.
2. Suppose you have selected the agriculture theme. Problems include the
use of various pesticides to increase production, and difficulties with
sowing and harvesting.
3. Traffic is also one of the themes given in the handbook. Here you can
think about traffic issues, how to reduce accidents, or any other related
problem.
Similarly, you can take any theme and think about its various problems.
Place the problems into a problem statement template
The handbook gives a drawing of a sun. Fill its rays with the problems
found in your theme.
Now list down the problems and topics for your theme.
Set up the goal
After understanding and writing down the problems, set your goals and make
them your AI project target. Write your goals for your selected theme.
Suppose you have selected the agriculture theme. Then write how AI will
help farmers to solve their problems:
1. Determine a good time for seeding.
2. Determine a good time for harvesting.
3. Determine when and how much fertilizer should be applied to the selected
crop.
There can be more goals than these!
Now think and apply the 4Ws strategy for each problem or goal.
Your final problem statement will look like the following table:
Who (Stakeholders):
Farmers, fertilizer producers, labourers, tractor companies
What (The problem, issue, need):
Determine a good time for seeding or crop harvesting
Where (Context/Situation):
Decide the maturity age of the crop and determine its time
Why (Ideal solution, benefits):
Harvest the crop on time and supply it against market demand on time
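The problem statement template can also be kept as a small record in a program. The sketch below is only an illustration in Python: the class name `ProblemStatement` and its field names are assumptions that follow the 4Ws (Who, What, Where, Why), and the values are taken from the agriculture example.

```python
from dataclasses import dataclass


@dataclass
class ProblemStatement:
    """Hypothetical record holding one answer for each of the 4Ws."""
    who: str    # stakeholders
    what: str   # the problem, issue, or need
    where: str  # context or situation
    why: str    # ideal solution and its benefits

    def summary(self) -> str:
        """Join the four answers into a short plain-text summary."""
        return (
            f"Who: {self.who}\n"
            f"What: {self.what}\n"
            f"Where: {self.where}\n"
            f"Why: {self.why}"
        )


statement = ProblemStatement(
    who="Farmers, fertilizer producers, labourers, tractor companies",
    what="Determine a good time for seeding or crop harvesting",
    where="Decide the maturity age of the crop and determine its time",
    why="Harvest the crop on time and supply it against market demand on time",
)
print(statement.summary())
```

Keeping the four answers together like this makes it easy to write one statement per goal and compare them.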
Introduction to Data Acquisition AI
Data Acquisition consists of two words:
1. Data: Data refers to raw facts, figures, or statistics collected for
reference or analysis.
2. Acquisition: Acquisition refers to acquiring data for the project.
The stage of acquiring data from the relevant sources is known as
data acquisition.
Next, you need to understand how data is classified.
Classification of Data
Observe the following diagram for the data classification. We will
discuss each category in detail:
Basic Data
Basically, data is classified into two categories:
1. Numeric Data: Mainly used for computation. Numeric data can be
classified into the following:
Discrete Data: Discrete data contains only whole numbers. It doesn’t
have any decimal or fractional value. Countable data can be considered
discrete data. For example, 132 customers, 126 students, etc.
Continuous Data: Continuous data can take any value within a range,
including decimal or fractional values. Measured data falls in this
category. For example, 10.5 kg, 100.50 km, etc.
2. Text Data: Mainly used to represent names, collections of words,
phrases, textual information, etc.
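As a rough illustration of these categories, the Python sketch below labels a few sample values. It assumes that a whole number (`int`) stands for discrete data, a decimal number (`float`) for continuous data, and a string for text data. This is only an approximation, and the helper name `classify_value` and the sample name are made up for the example.

```python
def classify_value(value):
    """Return a rough category label for one value."""
    if isinstance(value, bool):
        return "other"        # True/False is not treated as numeric here
    if isinstance(value, int):
        return "discrete"     # whole numbers, e.g. a count of customers
    if isinstance(value, float):
        return "continuous"   # decimal values, e.g. a weight
    if isinstance(value, str):
        return "text"
    return "other"


# 132 customers, 126 students, 10.5 kg, 100.50 km, and a hypothetical name
samples = [132, 126, 10.5, 100.50, "Asha"]
labels = [classify_value(v) for v in samples]
print(labels)
```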
Structural Classification
Data that is fed into the system to train the model, or is already stored
in the system, can be classified by whether it follows a specific set of
constraints, rules, or a pattern.
The structural classification is divided into 3 categories:
1. Structured Data: Structured data follows a specific pattern or set of
rules. It has a simple structure and is stored in specific forms such as
tables. Examples: a cricket scoreboard, your school timetable, an exam
datasheet, etc.
2. Unstructured Data: Data which doesn’t have any specific pattern or
constraints and can be stored in any form is known as unstructured
data. Most of the data that exists in the world is unstructured.
Examples: YouTube videos, Facebook photos, dashboard data of any
reporting tool, etc.
3. Semi-Structured Data: It is a combination of both structured and
unstructured data. Some data can have a structure like a database,
whereas some data can have markers and tags to identify the structure
of the data.
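The three structural categories can be compared side by side in a short Python sketch. All the sample values below are made up for illustration. JSON is used as one common example of data with tags that mark its structure, which is an assumption and not something named in the handbook.

```python
import json

# Structured: every row has the same columns, like a scoreboard table.
scoreboard = [
    {"player": "A", "runs": 45},
    {"player": "B", "runs": 12},
]

# Semi-structured: keys (tags) mark the structure, but the fields of
# each record are not fixed in advance.
report = json.loads(
    '{"title": "Match report", "tags": ["cricket"], "notes": "Rain delay"}'
)

# Unstructured: free text with no fixed fields at all.
comment = "Great match today, the rain stopped play for an hour."

total_runs = sum(row["runs"] for row in scoreboard)
print(total_runs)             # easy to compute because the data is tabular
print(sorted(report.keys()))  # structure can be read from the tags
print(len(comment.split()))   # only simple measures, such as a word count
```

The structured table is the easiest to compute with, which is why unstructured data usually needs extra processing before a model can use it.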
Other Classification
This classification is subdivided into the following branches:
1. Time-Stamped Data: This data follows a specific time order that defines
the sequence of events, which helps the system to predict the next best
action. The timestamp can be the time the data was captured, processed,
or collected.
2. Machine Data: The result or output of a specific program, system, or
technology is considered machine data. It includes data related to a
user’s interaction with the system, such as the user’s logged-in session
data, specific search records, and user engagement such as comments,
likes, and shares.
3. Spatiotemporal Data: Data which contains information related to both
geographical location and time is considered spatiotemporal data. It
records the location (for example, through GPS) and the time at which
the event is captured or the data is collected.
4. Open Data: It is data that is freely available to everyone. Anyone can
reuse this kind of data.
5. Real-time Data: Data which is available as soon as the event occurs is
considered real-time data.
6. Big Data: You may hear this term quite often. Data which is too large
or complex to be stored and processed by traditional data management
software, such as DBMS or RDBMS software, can be considered Big Data.
Big Data is itself a very deep topic.
Data Features
Data features refer to the type of data you want to collect. Two terms
are associated with this:
1. Training Data: The collected data that is fed into the system so that
the model can learn from it is known as training data. In other words,
it is the input used to train the model.
2. Testing Data: The data used to check the trained model is known as
testing data. The output the model gives for this data shows how well
it has learned.
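One common way to obtain both sets is to split the collected data into two parts, keeping the larger part for training and holding back the rest for testing. The Python sketch below assumes an 80/20 split and uses a hypothetical helper named `split_records`. The record values are placeholders.

```python
import random


def split_records(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and return (training, testing) lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]


records = list(range(10))  # placeholder for 10 collected records
training_data, testing_data = split_records(records)
print(len(training_data), len(testing_data))
```

Every record ends up in exactly one of the two lists, so the model is never tested on data it was trained on.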
Data Exploration AI
So the first question that comes to mind is: what is Data Exploration?
Data Exploration refers to the techniques and tools used to visualize data
and find patterns in it with statistical methods.
Advantages of Data Visualization
Gives a better understanding of data
Provides insights into data
Allows user interaction
Provides real-time analysis
Helps to make decisions
Reduces the complexity of data
Reveals the relationships and patterns contained within data
Helps to define a strategy for your data model
Provides an effective way of communication among users
So far you have learned about problem scoping and data acquisition. You
have set the goal for your AI project and found ways to acquire data.
Once you have acquired the data, the main problem is that it is very
complex, because it consists mostly of numbers. To make use of these
numbers, the user needs a way to see the patterns in the data.
For example, suppose you are going to read a book. You go to the library
to select one. The first thing you do is turn the pages of a few books to
get an overview, and then you select a book of your choice. Similarly,
when you are working with data or going to analyze it, you need to use
data visualization to get an overview.
Data Visualization Tools
There are many data visualization tools available. Here is a list of 20
data visualization tools. Many more tools are available, and their number
is increasing day by day.
1. Microsoft Excel
2. Tableau
3. QlikView
4. FusionCharts
5. Datawrapper
6. MS Power BI
7. Google Data Studio
8. Sisense
9. Highcharts
10. Xplenty
11. HubSpot
12. Whatagraph
13. Adaptive Discovery
14. Teammate Analytics
15. Jupyter
16. Dundas BI
17. Infogram
18. Google Charts
19. Visme
20. Domo
Do a little research and learn how to visualize your data with the above
tools.
How to select a proper graph for data visualization
Now you are familiar with various chart types. The next step is to select
an appropriate chart for data visualization. The choice of chart depends
on the data and on the goal you want to achieve through your model. The
basic purposes of charts can guide the selection:
1. Comparison of values – Show periodical changes, e.g. Bar Chart
2. Comparison of trends – Show changes over a period of time, e.g. Line
Chart
3. Distribution of data – Show how the data is spread across ranges or
categories, e.g. Histogram
4. Highlight a portion of a whole – Show each value as a share of the
total, e.g. Pie Chart
5. Show the relationship between data – Multiple charts can be used
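The purposes listed above can be written as a simple lookup, and a comparison of values can even be drawn as a text bar chart without any charting tool. The Python sketch below is only an illustration: the key names, the helper names `suggest_chart` and `text_bar_chart`, and the marks are all assumptions.

```python
# Lookup built from the purposes listed above; the key names are illustrative.
CHART_FOR_PURPOSE = {
    "compare values": "bar chart",
    "compare trends": "line chart",
    "distribution": "histogram",
    "part of a whole": "pie chart",
}


def suggest_chart(purpose):
    """Return the chart suggested for a purpose, or None if it is not listed."""
    return CHART_FOR_PURPOSE.get(purpose)


def text_bar_chart(values):
    """Draw one row of '#' marks per label so values can be compared by eye."""
    width = max(len(label) for label in values)
    rows = [f"{label.ljust(width)} | {'#' * count}" for label, count in values.items()]
    return "\n".join(rows)


marks = {"Maths": 8, "Science": 6, "English": 9}  # hypothetical marks out of 10
print(suggest_chart("compare values"))
print(text_bar_chart(marks))
```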
Activities
Activity 1 – MS Excel
Open MS Excel
Prepare data of results
Prepare 5 different types of charts and make a comprehensive report
with these points
Name of the chart
Description of the chart
How to draw it
Suitable for which type of data
Activity 2 – Sketchy Graphs
Materials required – Chart Paper, Sketch-pens, Ruler, Basic Stationery
In this activity, you have to make a graph on chart paper. You can select any
chart to plot the data and draw them. Ensure that you are able to relate this
graph to the goal of your project and describe the trends or patterns you
have witnessed in your chart.
AI project Cycle modelling
In the previous section on data exploration, we saw how data can be
represented graphically using various tools. A graphical representation
makes data easy for humans to understand, so that they can take a decision
or make a prediction. A machine, however, requires a mathematical
representation of data in order to access and analyze it. Hence every
model needs a mathematical approach to analyze data.
AI modelling approaches
Broadly, researchers take two approaches to AI modelling. They are:
1. Rule-Based Approach
2. Learning-Based Approach
A decision tree, discussed later in this unit, is a common model that
resembles the rule-based approach.
Let us begin with the rule-based approach.
Rule Based
A rule-based approach is generally based on the data and rules fed to the
machine, where the machine reacts accordingly to deliver the desired
output.
In other words, a rule-based model follows the relationships or patterns
in data defined by the developer. The machine follows the instructions or
rules written by the developer and performs the tasks accordingly. The
rules are written in code.
Consider the following scenario and try to understand the rule-based
approach:
Suppose you have data of 100 employees and 100 businessmen. You need to
follow these steps to train your machine:
1. Input your data and label each record as employee or businessman.
2. Now, if a record is related to an employee, the machine will compare
it with the rules you defined for employees and label it as employee.
This way it will identify the data of employees.
3. Similarly, it will follow the rules for businessmen as well.
To train the machine, you need to feed it some characteristics of each
group: for example, an employee earns money and provides a service,
whereas a businessman invests money and provides a service.
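A minimal Python sketch of this rule-based idea follows. The two yes/no features, `earns_salary` and `invests_money`, are assumptions chosen to mirror the characteristics mentioned above. The key point is that the rules are written by the developer and not discovered by the machine.

```python
def classify_person(earns_salary, invests_money):
    """Apply hand-written rules to label one record."""
    if invests_money:
        return "businessman"  # rule 1: a businessman invests money
    if earns_salary:
        return "employee"     # rule 2: an employee earns money
    return "unknown"          # no rule matches this record


print(classify_person(earns_salary=True, invests_money=False))
print(classify_person(earns_salary=False, invests_money=True))
```

If a new kind of record appears, the developer has to add or change a rule by hand, which is the main limitation of this approach.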
The CBSE curriculum handbook gives the following example for the
rule-based approach:
Suppose you have a dataset comprising of 100 images of apples and 100
images of bananas. To train your machine, you feed this data into the
machine and label each image as either apple or banana. Now if you test
the machine with the image of an apple, it will compare the image with the
trained data and according to the labels of trained images, it will identify
the test image as an apple. This is known as Rule-based approach. The rules
given to the machine in this example are the labels given to the machine
for each image in the training dataset. Observe the following image:
Learning Based
In the learning-based approach, the machine is fed with data and the
desired output, and it designs its own algorithm (or set of rules) to
match the data to the desired output.
Here the relationship or pattern in the data is not defined by the
developer. Random data is fed into the machine, and it is left to the
machine to figure out the patterns or required trends.
In general, this approach is useful when the data is not labelled and is
too random for a human to make sense of.
Thus, the machine looks at the data, tries to extract similar features
from it, and clusters similar data together. In the end, as output, the
machine tells us about the trends observed in the training data.
This approach is used when the data is unpredictable or the users have no
idea about its patterns. Let us take a look at the example given in your
curriculum handbook.
For example, suppose you have a dataset of 1000 images of random stray
dogs of your area. Now you do not have any clue as to what trend is being
followed in this dataset as you don’t know their breed, or colour or any
other feature. Thus, you would put this into a learning approach based AI
machine and the machine would come up with various patterns it has
observed in the features of these 1000 images. It might cluster the data on
the basis of colour, size, fur style, etc. It might also come up with some very
unusual clustering algorithm which you might not have even thought of!
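To give a feel for how a machine might group unlabelled data, here is a small Python sketch that clusters numbers around the nearest centre and then moves each centre to the average of its group. This is a simplified k-means-style procedure chosen for illustration, not the method used in the handbook. The dog heights and starting centres are made-up values.

```python
def cluster_1d(values, centres, rounds=10):
    """Group numbers around the nearest centre, then re-centre each group."""
    centres = list(centres)
    groups = [[] for _ in centres]
    for _ in range(rounds):
        groups = [[] for _ in centres]
        for value in values:
            # Find the index of the centre closest to this value.
            nearest = min(range(len(centres)), key=lambda i: abs(value - centres[i]))
            groups[nearest].append(value)
        # Move each centre to the mean of its group (keep it if the group is empty).
        centres = [sum(g) / len(g) if g else c for g, c in zip(groups, centres)]
    return groups


dog_heights_cm = [30, 32, 35, 60, 62, 65]  # hypothetical, unlabelled data
small, large = cluster_1d(dog_heights_cm, centres=[0, 100])
print(small, large)
```

Nobody told the program which dogs are small or large. The two groups come only from the numbers themselves.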
Decision Tree
The decision tree is one of the most common and basic models in data
science. It follows a tree-like structure of decisions with all possible
results. It is similar to the rule-based approach.
The decision tree is made up of various nodes. It follows a top-to-bottom
approach. The topmost node of the decision tree is known as the root. The
tree continues down to the terminal nodes, or leaf nodes. All these nodes
are connected with each other by arrow lines. Let us talk about the common
terms associated with a decision tree.
Common Terms
1. Root Node: The topmost node of the decision tree, where the tree
starts.
2. Splitting: Splitting is a process by which a node is divided into two or
more sub-nodes.
3. Decision or interior node: It is a node where splitting takes place.
In other words, it is a sub-node that is divided into further sub-nodes.
4. Leaf node or terminal node: A node at the bottom of the tree that is
not split any further.
5. Branch or Subtree: A subsection of the decision tree is known as a
branch or subtree.
6. Parent node and child node: A node that is divided into sub-nodes is
known as the parent node, and the sub-nodes derived from it are known as
its child nodes.
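These terms map naturally onto a small data structure. The Python sketch below is an illustration only: the `Node` class, the `count_leaves` helper, and the tree itself are hypothetical. A node with children is a parent, a node without children is a leaf, and any node together with everything below it forms a branch or subtree.

```python
class Node:
    """One node of a tree; a node with no children is a leaf."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

    def is_leaf(self):
        return not self.children


def count_leaves(node):
    """Count the terminal nodes in the subtree that starts at this node."""
    if node.is_leaf():
        return 1
    return sum(count_leaves(child) for child in node.children)


# Hypothetical tree: the root splits into one interior node and one leaf.
root = Node("root", [
    Node("interior", [Node("leaf A"), Node("leaf B")]),
    Node("leaf C"),
])
print(count_leaves(root))              # leaves in the whole tree
print(count_leaves(root.children[0]))  # leaves in one branch (subtree)
```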
Different Parts of Decision Tree
The decision tree is made up of various nodes. These nodes are the parts
of a decision tree. They are as follows:
1. Decision Nodes: Represent a decision, typically shown as a square
2. Chance Nodes: Represent probability or uncertainty, shown as a circle
3. End Nodes: Represent the result or final outcome, shown as a triangle
How to make a decision tree in 4 easy steps
As you know, the decision tree is an example of a rule-based approach. The
structure of a decision tree starts with the root node and ends with the
leaves, connected by branches that carry different conditions. Keep the
following steps in mind when making a decision tree:
1. Observe your data carefully.
2. Decide what (data) will be your root.
3. Decide what (data) will be your leaves.
4. Analyze the data properly and identify any unnecessary data.
Observe the following picture. It is a decision tree given in the CBSE
study material:
The above picture shows a decision tree about a daily activity. The first
condition is whether you are hungry or not. If yes, the next condition is
whether you have $25: if you do, you can go to a restaurant, and if not,
you can buy a burger. If you are not hungry, you can go to sleep. So the
top question, Am I hungry?, is the root, and Yes and No are the branches.
The final decisions (go to sleep, go to a restaurant, and buy a burger)
are the leaves. Do I have $25? is the interior node.
Based on this decision tree, two questions are given in the curriculum
handbook. The answers are:
1. How many branches does the tree shown above have? – 2
2. How many leaves does the tree shown above have? – 3
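The tree described above can be sketched in Python as nested conditions. The function name `decide` is made up for this example, and reading "have $25" as "money is at least 25" is an assumption.

```python
def decide(hungry, money):
    """Walk the tree from the root question down to one of the leaves."""
    if not hungry:       # root node: Am I hungry?
        return "go to sleep"
    if money >= 25:      # interior node: Do I have $25?
        return "go to a restaurant"
    return "buy a burger"


# Try every path through the tree and collect the distinct leaves reached.
leaves = {decide(h, m) for h in (True, False) for m in (0, 25)}
print(sorted(leaves))
```

Collecting the results of every path gives three distinct outcomes, which matches the number of leaves in the answer above.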
Points to remember
While making a decision tree, remember the following points:
1. Take a good look at your dataset
2. Try to figure out the pattern of your output leaf
3. Select any one output and find out the common links for similar outputs
4. Note the parameters that are redundant in the dataset
5. Choose a simple dataset for your decision tree
Now let us see one more example given in the CBSE study material:
The following is a dataset comprising 4 parameters which lead to the
prediction of whether an elephant would be spotted or not. The parameters
which affect the prediction are Outlook, Temperature, Humidity and Wind.
Draw a decision tree for this dataset.
The common decisions are as follows:
If Outlook = Sunny and Humidity = High, then Elephant Spotted = No
If Outlook = Sunny and Humidity = Normal, then Elephant Spotted = Yes
If Outlook = Overcast, then Elephant Spotted = Yes
If Outlook = Rain and Wind = Strong, then Elephant Spotted = No
If Outlook = Rain and Wind = Weak, then Elephant Spotted = Yes
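These rules can be turned directly into a small Python function. This is a sketch under two assumptions: Humidity is only ever High or Normal, and Wind is only ever Strong or Weak. The function name `elephant_spotted` is made up for the example. Temperature does not appear in any rule, so it is left out as a redundant parameter.

```python
def elephant_spotted(outlook, humidity, wind):
    """Return 'Yes' or 'No' by following the rules listed above."""
    outlook = outlook.lower()
    if outlook == "sunny":
        # For a sunny outlook, only humidity matters.
        return "No" if humidity.lower() == "high" else "Yes"
    if outlook == "overcast":
        return "Yes"
    if outlook == "rain":
        # For a rainy outlook, only wind matters.
        return "No" if wind.lower() == "strong" else "Yes"
    raise ValueError(f"unknown outlook: {outlook}")


print(elephant_spotted("Sunny", "High", "Weak"))
print(elephant_spotted("Rain", "Normal", "Weak"))
```

Here Outlook is the root of the tree, Humidity and Wind are interior nodes, and each Yes or No is a leaf.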