SCET Ownership Course: DATA SCIENCE
UNIT - III
AI Vs. ML
Artificial Intelligence
What is AI ?
John McCarthy
Artificial Intelligence is concerned with the design of
intelligence in an artificial device.
The term was coined by McCarthy in 1956.
There are two ideas in the definition.
1. Intelligence
2. Artificial device
The Turing Test
(Can Machine think? A. M. Turing, 1950)
• Requires:
– Natural language Processing
– Knowledge representation
– Automated reasoning
– Machine learning
– (vision, robotics) for full test
Agents
An agent is anything that can be viewed as perceiving its environment through sensors
and acting upon that environment through actuators/effectors.
Ex: human being, calculator, etc.
An agent has a goal: the objective which the agent has to satisfy.
Actions can potentially change the environment.
An agent perceives the current percept or a sequence of percepts.
Agent: sensors, actuators, environment
Agents
• An agent is anything that can be viewed as perceiving its environment through
sensors and acting upon that environment through actuators/ effectors
• Human agent: eyes, ears, and other organs used as sensors;
• hands, legs, mouth, and other body parts used as actuators/effectors
• Robotic agent:
– Sensors: cameras (picture analysis), infrared range finders, solar sensors
– Actuators: various motors, speakers, wheels
• Software agent (softbot):
– Functions act as sensors
– Functions act as actuators
– Ex. [Link], [Link]
Example: Vacuum Cleaner Agent
• Percepts: location and contents, e.g., [A, Dirty]
• Actions: Left, Right, Suck, NoOp
AI Vs. ML
INTRODUCTION
How a computer works?
Cntd..
To solve a problem on a computer, we need an algorithm.
An algorithm is a sequence of instructions that should be carried out to transform the input to output.
Ex. Sorting. Input: a set of numbers; output: an ordered list.
For some tasks, however, we do not have an algorithm; for these we use machine learning.
Ex. telling spam emails from legitimate emails.
Input: an email document (a file of characters); output: a yes/no answer indicating whether the message is spam or not.
We want the computer (machine) to extract the algorithm for this task automatically.
Cntd..
(source: [Link])
• Machine learning is a “Field of study that gives computers the ability to learn without
being explicitly programmed.” - Arthur Samuel (1959)
• In other words, it is concerned with the question of how to construct computer programs
that automatically improve with experience.
Cntd..
• A computer program is said to learn from experience ‘E’ with respect to some
class of tasks ‘T’ and performance measure ‘P’ if its performance at tasks in ‘T’,
as measured by ‘P’, improves with experience ‘E’. – Tom M. Mitchell
• Machine learning is an application of artificial intelligence (AI) that provides
systems the ability to automatically learn and improve from experience
without being explicitly programmed.
• Machine learning focuses on the development of computer programs that can
access data and use it to learn for themselves.
Cntd..
Example 1
Classify Email as spam or not spam
• Task (T): Classify email as spam or not spam
• Experience (E): watching the user mark/label emails as spam or not spam
• Performance (P): the number or fraction of emails correctly classified as spam
or not spam
Cntd..
Example 2
Recognizing hand written digits/ characters
• Task(T): Recognizing hand written digit
• Experience (E): watching the user mark/label hand-written digits into 10
classes (0-9) and identifying the underlying pattern
• Performance (P): the number or fraction of hand-written digits correctly classified
Why is Machine Learning Important?
• Human expertise does not exist
Navigating on Mars
industrial/manufacturing control
mass spectrometer analysis, drug design, astronomical discovery
• Black-box human expertise OR Some tasks cannot be defined well, except by
examples
face/handwriting/speech recognition/ recognizing people
driving a car, flying a plane
• Relationships and correlations can be hidden within large amounts of data
(e.g., stock market analysis)
• Environments change over time.
(e.g., routing on a computer network)
Cntd..
• The amount of knowledge available about certain tasks might be too large for explicit encoding by
humans
(e.g., medical diagnosis).
• New knowledge about tasks is constantly being discovered by humans. It may be difficult to
continuously re-design systems “by hand”.
• Rapidly changing phenomena
credit scoring, financial modeling
diagnosis, fraud detection
• Need for customization/personalization
personalized news reader
movie/book recommendation
How does machine learning help us in daily life?
Social networking :
• Use of the appropriate emotions, suggestions about friend tags on Facebook,
filters on Instagram, content recommendations and suggested followers on
social media platforms, etc., are examples of how machine learning helps us
in social networking.
How does machine learning help us in daily life?
Personal finance and
banking solutions
• Whether it’s fraud prevention, credit decisions, or checking deposits on our
smartphones, machine learning does it all.
How does machine learning help us in daily life?
Commute estimation
• Identification of the route to our selected destination, estimation of the time
required to reach that destination using different transportation modes,
calculating traffic time, and so on are all made possible by machine learning.
Applications of Machine Learning
• Face detection
• Speech recognition
• Stock prediction
• Hand-written digit recognition
• Spam Email Detection
• Computational Biology
• Machine Translation
• Recommender Systems
• Self-parking Cars
• Guiding robots
• Airplane Navigation Systems
• Space Exploration
• Medicine
• Supermarket Chain
• Data Mining
Examples…
Example 1: Hand-written digit recognition
Learn a classifier f(x) such that f : x → {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Input training data: e.g., 500 samples
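A minimal sketch of such a digit classifier using scikit-learn; the bundled digits dataset and the choice of logistic regression are illustrative assumptions, not part of these notes.

# Sketch: learn f(x) mapping 8x8 digit images to the classes 0-9.
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

digits = load_digits()                      # 1797 labelled 8x8 images
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000)     # the classifier f(x)
clf.fit(X_train, y_train)                   # learn from labelled samples
print("Test accuracy:", clf.score(X_test, y_test))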
Example 2: Face detection
Input: an image; the classes are people to be recognized (non-face, frontal face, profile face), and the learning
program should learn to associate the face images to identities.
This problem is more difficult because there are more classes, the input image is larger, a face is three-dimensional, and
differences in pose and lighting cause significant changes in the image. There may also be occlusion (blockage) of
certain inputs; e.g., glasses may hide the eyes and eyebrows, and a beard may hide the chin.
Example 3: Spam detection
[Screenshot: a promotional “Watch Fest” email caught by the spam filter]
• This is a classification problem
• Task is to classify email into spam/non-spam
• Requires a learning system, as the “enemy” keeps innovating
• Gmail’s explanation for the example above: “Why is this message in Spam? It's similar to messages that were detected by our spam filters.”
Example 4: Stock price prediction
• Task is to predict stock price at future date
• This is a regression task, as the output is continuous
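A minimal regression sketch on synthetic data; the numbers below are invented for illustration, and a real stock predictor would use historical prices and far richer features.

# Sketch: predict a continuous value (a price) from a numeric input.
import numpy as np
from sklearn.linear_model import LinearRegression

days = np.arange(30).reshape(-1, 1)                        # input: day index
prices = 100 + 0.5 * days.ravel() + np.random.randn(30)    # noisy upward trend

model = LinearRegression().fit(days, prices)   # fit a line to the history
print("Predicted price on day 35:", model.predict([[35]])[0])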
Example : Weather prediction
Example : Medical Diagnosis
❖ Inputs are the relevant information about the patient and the classes are the illnesses.
❖ The inputs contain the patient’s age, gender, past medical history, and current symptoms.
❖ Some tests may not have been applied to the patient, and thus these inputs would be missing.
❖ Tests take time, may be costly, and may inconvenience the patient, so we do not want to apply
them unless we believe that they will give us valuable information.
❖ In the case of a medical diagnosis, a wrong decision may lead to a wrong or no treatment, and
in cases of doubt it is preferable that the classifier reject and defer the decision to a human expert.
Example : Agriculture
A Crop Yield Prediction App in Senegal Using Satellite Imagery (Video Link)
[Link]
Types of Learning
• Supervised learning
• Unsupervised learning
• Semi-supervised learning
• Reinforcement learning
Types of Learning
Supervised Learning :
Aim is to learn a mapping from the input to an output whose correct values are provided
by a supervisor.
1. Classification: data is labelled, meaning each example is assigned a class,
for example spam/non-spam or fraud/non-fraud.
e.g., for a financial institution, the input to the classifier is savings and income, and the
output is one of the classes, high risk or low risk, based on a classification rule such as the
following (a hand-coded sketch of this rule appears after this list):
❑ if income > δ1 and savings > δ2 then low risk else high risk
2. Regression: data is labelled with a real value rather than a class label.
e.g., the price of a stock over time.
e.g., predict the price of a used car.
Input: brand, year, engine capacity, mileage and other information
Output: price of the car
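A minimal hand-coded sketch of the risk rule above; the threshold values for δ1 and δ2 are chosen arbitrarily here, whereas a learning algorithm would estimate them from labelled customer data.

# Sketch of the rule: if income > delta1 and savings > delta2 -> low risk.
DELTA1 = 30000   # income threshold (assumed for illustration)
DELTA2 = 10000   # savings threshold (assumed for illustration)

def classify_risk(income, savings):
    """Return 'low risk' or 'high risk' for a loan applicant."""
    if income > DELTA1 and savings > DELTA2:
        return "low risk"
    return "high risk"

print(classify_risk(income=45000, savings=20000))  # -> low risk
print(classify_risk(income=25000, savings=5000))   # -> high risk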
Unsupervised Learning
• There is no supervisor providing correct output values; the aim is to find regularities or
structure in the input data.
Examples of Unsupervised learning
• Clustering
• Association
Example of Semi-supervised learning
• A small amount of labelled data is combined with a large amount of unlabelled data during training.
Reinforcement Learning
• Learning from mistakes
• Place a reinforcement learning algorithm into any environment and
it will make a lot of mistakes in the beginning.
• We provide some sort of signal to the algorithm that associates
good behaviors with a positive signal and bad behaviors with a
negative one.
• In this way we reinforce our algorithm to prefer good behaviors over
bad ones.
• Over time, our learning algorithm learns to make fewer mistakes
than it used to.
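A minimal sketch of this reward-signal idea using a two-armed bandit; the reward probabilities, learning rate, and exploration rate are invented for illustration and are not from these notes.

# Sketch: the agent tries two actions, receives reward signals, and slowly
# comes to prefer the action that earns the higher reward.
import random

values = [0.0, 0.0]        # estimated value of each action
ALPHA = 0.1                # learning rate (assumed)
EPSILON = 0.1              # exploration probability (assumed)
TRUE_REWARD = [0.2, 0.8]   # hidden reward probabilities (illustrative)

for step in range(1000):
    # explore occasionally, otherwise exploit the best-known action
    if random.random() < EPSILON:
        action = random.randrange(2)
    else:
        action = max(range(2), key=lambda a: values[a])
    reward = 1.0 if random.random() < TRUE_REWARD[action] else 0.0
    # positive/negative signal nudges the value estimate (reinforcement)
    values[action] += ALPHA * (reward - values[action])

print("Learned action values:", values)   # action 1 should score higher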
Where is reinforcement learning in the real world?
• Video Games
• Industrial Simulation:
• Resource Management
Key Elements of Machine Learning
• There are tens of thousands of machine learning algorithms and hundreds
of new algorithms are developed every year.
• Every machine learning algorithm has three components:
1. Representation: how to represent knowledge.
Examples include decision trees, sets of rules, instances, graphical models,
neural networks, support vector machines, model ensembles and others.
2. Evaluation: the way to evaluate candidate programs (hypotheses).
Examples include accuracy, precision and recall, squared error, likelihood,
posterior probability, cost, margin, entropy, K-L divergence, and others.
3. Optimization: the way candidate programs are generated, known as the
search process.
Examples include combinatorial optimization, convex optimization, and
constrained optimization.
• All machine learning algorithms are combinations of these three
components.
• A framework for understanding all algorithms.
Aspects of developing a learning system:
training data, concept representation, function approximation
• For training and testing purposes of our model we need to split the dataset
into three distinct datasets: a training set, a validation set and a testing set.
• Training set:
• A set of data used to train the model
• It is used to fit the model
• The model sees and learns from this data
• Later on, the trained model can be deployed and used to predict accurately
on new data that it has not seen before
• Labelled data is used
Validation set
• The validation set is a set of data separate from the training data
• It is used to validate our model during training
• It gives information which is used for tuning the model’s hyperparameters
• It ensures that our model is not overfitting to the data in the training set
• Labelled data is used
Test Set
• A set of data used to test the model
• The test set is separate from both the training set and the validation set
• Once the model is trained and validated using the training and validation
sets, the model is used to predict the output for the data in the test set
• Unlabelled data is used
Data Split
Train | Validation | Test
• Rules for performing the data split operation:
• In order to avoid correlation (ordering effects), the original dataset must be
randomly shuffled before applying the split
• Every split must represent the original distribution
• The split percentages are most commonly 60% for training, 20% for
validation and 20% for testing
• With scikit-learn this can be done using the train_test_split() function
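A minimal sketch of the 60/20/20 split using two calls to train_test_split; X and y below are placeholders for the real features and labels, and shuffling is on by default.

# 60% train, 20% validation, 20% test with scikit-learn.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(100).reshape(50, 2)   # dummy feature matrix (illustrative)
y = np.arange(50)                   # dummy labels (illustrative)

# First split off the 20% test set (shuffle=True is the default).
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.20, random_state=42)

# Then take 25% of the remaining 80% as validation (0.25 * 0.80 = 0.20).
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # -> 30 10 10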
Data Preparation
Data Preparation Pipeline
Why is Data Preparation important?
Sometimes, data sets have missing or incomplete information, which leads to less
accurate or incorrect predictions.
Further, sometimes data sets are clean but not adequately shaped, such as aggregated or
pivoted data, and some lack business context.
Hence, after collecting data from various data sources, data preparation is needed to
transform the raw data.
Significant advantages of data preparation in machine learning as follows:
• It helps to provide reliable prediction outcomes in various analytics operations.
• It helps identify data issues or errors and significantly reduces the chances of errors.
• It increases decision-making capability.
• It reduces overall project cost (data management and analytic cost).
• It helps to remove duplicate content to make it worthwhile for different applications.
• It increases model performance.
Steps in Data Preparation Process
1. Understand the problem:
• understand the actual problem and try to solve it.
2. Data collection:
• collect data from various potential sources. These data sources may be either within
the enterprise or from third-party vendors.
• Data collection is beneficial to reduce and mitigate bias in the ML model;
• hence, before collecting data, always analyze it and also ensure that the data set was
collected from diverse people, geographical areas, and perspectives.
3. Data profiling and Data Exploration:
• explore data such as trends, outliers, exceptions, incorrect, inconsistent, missing, or
skewed information, etc.
• Data exploration also helps to determine problems such as collinearity, and situations
in which standardization of data sets and other data transformations are necessary.
Steps in Data Preparation Process
4. Data Cleaning and Validation:
• Data cleaning and validation techniques help determine and solve inconsistencies, outliers,
anomalies, incomplete data, etc.
• Clean data helps to find valuable patterns and information in data and ignores irrelevant data
in the datasets.
5. Data Formatting:
• After cleaning and validating data, the next step is to ensure that the data is
correctly formatted.
Steps in Data Preparation Process
6. Feature engineering and selection:
• Feature engineering is defined as the study of selecting, manipulating, and transforming raw
data into valuable features
There are various feature engineering techniques used in machine learning as follows:
Imputation:
• Feature imputation is the technique to fill incomplete fields in the datasets.
• It is essential because most machine learning models don't work when there are missing data
in the dataset.
• The missing-values problem can be reduced by using techniques such as single
value imputation, multiple value imputation, K-Nearest Neighbors, or deleting the row
(a short sketch follows below).
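A minimal single-value imputation sketch with scikit-learn's SimpleImputer; the tiny age/income table is made up for illustration.

# Fill missing values (NaN) with the column mean: single value imputation.
import numpy as np
from sklearn.impute import SimpleImputer

X = np.array([[25.0, 50000.0],
              [np.nan, 62000.0],     # missing age
              [40.0, np.nan]])       # missing income

imputer = SimpleImputer(strategy="mean")   # "median" etc. also possible
X_filled = imputer.fit_transform(X)
print(X_filled)   # NaNs replaced by the mean of their column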
Encoding:
• Feature encoding is defined as the method to convert string values into numeric form.
• This is important, as most ML models require values in numeric format.
• Feature encoding includes label encoding and One Hot Encoding (also known as
get_dummies); a short sketch follows below.
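A minimal encoding sketch showing label encoding and One Hot Encoding via pandas get_dummies; the colour column is an invented example.

# Convert string categories to numbers.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})

# Label encoding: one integer per category (blue=0, green=1, red=2).
df["colour_label"] = LabelEncoder().fit_transform(df["colour"])

# One Hot Encoding: one binary column per category (get_dummies).
one_hot = pd.get_dummies(df["colour"], prefix="colour")
print(pd.concat([df, one_hot], axis=1))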
Data Pre-Processing
1. Data cleaning
2. Data integration
3. Data transformation
4. Data reduction
5. Data discretization
Cntd..
• Data preparation is also known as data "pre-processing," "data wrangling,"
"data cleaning," and "feature engineering."
• It is a later stage of the machine learning lifecycle, which comes after data
collection.
The data preparation process can be complicated by issues such as:
1. Missing or incomplete records: missing data sometimes appears as empty
cells, as placeholder values (e.g., NULL or N/A), or as a particular character,
such as a question mark.
Cntd..
2. Outliers or anomalies: unexpected values
• ML algorithms are sensitive to the range and distribution of values when data
comes from unknown sources.
• These values can spoil the entire machine learning training process and the
performance of the model.
• Hence, it is essential to detect these outliers or anomalies, for example through
visualization techniques.
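A minimal outlier-check sketch using the common 1.5 * IQR (interquartile range) rule, one simple alternative to visual inspection; the sample values are invented.

# Flag values far outside the interquartile range as potential outliers.
import numpy as np

values = np.array([10, 12, 11, 13, 12, 11, 95])   # 95 is an injected anomaly
q1, q3 = np.percentile(values, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

outliers = values[(values < lower) | (values > upper)]
print("Outliers:", outliers)   # -> [95]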
Cntd..
3. Unstructured data format:
• Data comes from various sources and needs to be extracted into a different
format.
• Hence, before deploying an ML project, always consult with domain experts or
import data from known sources.
4. Limited or sparse features / attributes:
• Whenever data comes from a single source, it contains limited features,
• so it is necessary to import data from various sources for feature enrichment,
or to build additional features in the datasets.
5. Understanding feature engineering:
• Feature engineering helps develop additional features in the ML models,
increasing model performance and accuracy in predictions.