SCET Course: DATA SCIENCE

UNIT - III
AI Vs. ML
Artificial Intelligence

What is AI?
Artificial Intelligence is concerned with the design of intelligence in an artificial device. - John McCarthy

The term was coined by McCarthy in 1956.

There are two ideas in this definition:

1. Intelligence
2. Artificial device
The Turing Test
("Can machines think?" - A. M. Turing, 1950)

• Requires:
– Natural language processing
– Knowledge representation
– Automated reasoning
– Machine learning
– (vision, robotics) for the full test
Agents

An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators/effectors.

Ex: a human being, a calculator, etc.

An agent has a goal - the objective the agent has to satisfy.

Actions can potentially change the environment.

An agent perceives the current percept or a sequence of percepts.

Agent: sensors, actuators, environment
Agents

• An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators/effectors.

• Human agent: eyes, ears, and other organs used as sensors; hands, legs, mouth, and other body parts used as actuators/effectors.

• Robotic agent:
– Sensors: cameras (picture analysis), infrared range finders, solar sensors
– Actuators: various motors, speakers, wheels

• Software agent (softbot):
– Functions as sensors
– Functions as actuators
– Ex. [Link], [Link]
Example: Vacuum Cleaner Agent

• Percepts: location and contents, e.g., [A, Dirty]
• Actions: Left, Right, Suck, NoOp
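A minimal sketch of this agent's percept-action loop in Python (illustrative only; the two-location world and the function names are assumptions, not from the slides):

```python
# Minimal sketch of a simple reflex vacuum-cleaner agent (illustrative, not from the slides).
# World: two locations A and B; percept = (location, status).

def reflex_vacuum_agent(percept):
    """Map the current percept to an action."""
    location, status = percept
    if status == "Dirty":
        return "Suck"
    return "Right" if location == "A" else "Left"

# Example run over a tiny environment
world = {"A": "Dirty", "B": "Dirty"}
location = "A"
for _ in range(4):
    action = reflex_vacuum_agent((location, world[location]))
    print((location, world[location]), "->", action)
    if action == "Suck":
        world[location] = "Clean"
    elif action == "Right":
        location = "B"
    elif action == "Left":
        location = "A"
```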

8/7/2021 Unit -1 Introduction


8
AI Vs. ML

INTRODUCTION

How does a computer work?

To solve a problem on a computer, we need an algorithm.

An algorithm is a sequence of instructions that is carried out to transform the input into the output.
Ex. Sorting - input: a set of numbers; output: an ordered list.

For some tasks, however, we do not have an algorithm; this is where machine learning comes in.
Ex. telling spam emails from legitimate emails.
Input: an email document (a file of characters); output: a yes/no answer indicating whether the message is spam or not.
We want the computer (machine) to extract the algorithm for this task automatically from examples, as in the sketch below.
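As a rough illustration of "extracting the algorithm from data", the sketch below trains a tiny spam classifier with scikit-learn; the four example emails and their labels are invented for illustration, and scikit-learn is assumed to be installed:

```python
# Minimal sketch: let the machine "learn the algorithm" for spam filtering from examples.
# The tiny dataset below is invented purely for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = ["win a free prize now", "meeting agenda for monday",
          "free offer click now", "project report attached"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = legitimate

vectorizer = CountVectorizer()          # turn each email into word counts
X = vectorizer.fit_transform(emails)
model = MultinomialNB().fit(X, labels)  # learn word-to-spam associations

print(model.predict(vectorizer.transform(["free prize offer"])))  # likely [1] (spam)
```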
Cntd..

(source: [Link])

• Machine learning is the "field of study that gives computers the ability to learn without being explicitly programmed." - Arthur Samuel (1959)
• In other words, it is concerned with the question of how to construct computer programs that automatically improve with experience.
Cntd..
• A computer program is said to learn from experience 'E' with respect to some class of tasks 'T' and performance measure 'P' if its performance at tasks in 'T', as measured by 'P', improves with experience 'E'. - Tom M. Mitchell

• Machine learning is an application of artificial intelligence (AI) that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.

• Machine learning focuses on the development of computer programs that can access data and use it to learn for themselves.

Cntd..
Example 1: Classify email as spam or not spam
• Task (T): classify email as spam or not spam
• Experience (E): watching the user mark/label emails as spam or not spam
• Performance (P): the number or fraction of emails correctly classified as spam or not spam

Cntd..
Example 2: Recognizing hand-written digits/characters
• Task (T): recognizing hand-written digits
• Experience (E): watching the user mark/label the hand-written digits into 10 classes (0-9) and identifying the underlying patterns
• Performance (P): the number or fraction of hand-written digits correctly classified

Why is Machine Learning Important?
• Human expertise does not exist
navigating on Mars
industrial/manufacturing control
mass spectrometer analysis, drug design, astronomic discovery
• Human expertise is a "black box", or some tasks cannot be defined well except by examples
face/handwriting/speech recognition, recognizing people
driving a car, flying a plane

• Relationships and correlations can be hidden within large amounts of data
(e.g., stock market analysis)
• Environments change over time
(e.g., routing on a computer network)
Cntd..
• The amount of knowledge available about certain tasks might be too large for explicit encoding by humans
(e.g., medical diagnosis)
• New knowledge about tasks is constantly being discovered by humans. It may be difficult to continuously re-design systems "by hand".
• Rapidly changing phenomena
credit scoring, financial modeling
diagnosis, fraud detection
• Need for customization/personalization
personalized news readers
movie/book recommendation
How does machine learning help us in daily life?
Social networking:
• Use of the appropriate emotions, suggestions about friend tags on Facebook, filters on Instagram, content recommendations, and suggested followers on social media platforms, etc., are examples of how machine learning helps us in social networking.
How does machine learning help us in daily life?
Personal finance and banking solutions
• Whether it's fraud prevention, credit decisions, or checking deposits on our smartphones, machine learning does it all.
How does machine learning help us in daily life?
Commute estimation
• Identification of the route to our selected destination, estimation of the time required to reach that destination using different transportation modes, calculating traffic time, and so on are all made possible by machine learning.
Applications of Machine Learning
• Face detection
• Speech recognition
• Stock prediction
• Hand-written digit recognition
• Spam Email Detection
• Computational Biology
• Machine Translation
• Recommender Systems
• Self-parking Cars
• Guiding robots
• Airplane Navigation Systems
• Space Exploration
• Medicine
• Supermarket Chain
• Data Mining

Examples…

Example 1: Hand-written digit recognition
Learn a classifier f(x) such that f : x → {0, 1, 2, 3, 4, 5, 6, 7, 8, 9}
Input training data: e.g., 500 samples
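A minimal sketch of such a digit classifier, assuming scikit-learn and its built-in digits dataset (not the 500-sample set mentioned above):

```python
# Minimal sketch of learning f : x -> {0, ..., 9} on scikit-learn's built-in digits dataset.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

digits = load_digits()                      # 8x8 grayscale digit images, flattened to 64 features
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=5000)     # the classifier f(x)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```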


Example 2: Face detection

Input: an image; the classes are the people to be recognized [non-face, frontal-face, profile-face], and the learning program should learn to associate the face images with identities.

This problem is more difficult because there are more classes, the input image is larger, a face is 3-dimensional, and differences in pose and lighting cause significant changes in the image. There may also be occlusion (blockage) of certain inputs; e.g., glasses may hide the eyes and eyebrows, and a beard may hide the chin.
Example 3: Spam detection

[Screenshot: a promotional email "Watch Fest <info@[Link]>", May 22, flagged by the mail client with the note "Why is this message in Spam? It's similar to messages that were detected by our spam filters."]

• This is a classification problem
• The task is to classify email into spam/non-spam
• It requires a learning system, as the "enemy" keeps innovating
Example 4: Stock price prediction

• The task is to predict the stock price at a future date
• This is a regression task, as the output is continuous
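A minimal regression sketch on synthetic "prices" (the data and trend are invented; real stock prediction would use far richer features):

```python
# Minimal regression sketch: predict a continuous value from past observations.
# The "prices" here are synthetic, for illustration only.
import numpy as np
from sklearn.linear_model import LinearRegression

days = np.arange(30).reshape(-1, 1)                              # feature: day index
prices = 100 + 0.5 * days.ravel() + np.random.normal(0, 1, 30)   # fake upward trend + noise

model = LinearRegression().fit(days, prices)
print("predicted price on day 35:", model.predict([[35]])[0])
```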
Example: Weather prediction
Example: Medical Diagnosis
❖ Inputs are the relevant information about the patient, and the classes are the illnesses.
❖ The inputs contain the patient's age, gender, past medical history, and current symptoms.
❖ Some tests may not have been applied to the patient, and thus these inputs would be missing.
❖ Tests take time, may be costly, and may inconvenience the patient, so we do not want to apply them unless we believe they will give us valuable information.
❖ In the case of a medical diagnosis, a wrong decision may lead to a wrong treatment or no treatment, and in cases of doubt it is preferable that the classifier rejects and defers the decision to a human expert.
Example: Agriculture

A Crop Yield Prediction App in Senegal Using Satellite Imagery (Video Link)
[Link]
Types of Learning
• Supervised learning
• Unsupervised learning
• Semi-supervised learning
• Reinforcement learning
Supervised Learning:
The aim is to learn a mapping from the input to an output whose correct values are provided by a supervisor.
1. Classification: data is labelled, meaning it is assigned a class, for example spam/non-spam or fraud/non-fraud.
e.g., for a financial institution, the input to the classifier is savings and income, and the output is one of the classes, such as high risk or low risk, based on a classification rule like the one below (see the sketch after this list):
❑ if income > δ1 and savings > δ2 then low risk else high risk
2. Regression: data is labelled with a real value rather than a class label.
e.g., the price of a stock over time.
e.g., predicting the price of a used car.
Input: brand, year, engine capacity, mileage, and other information.
Output: price of the car.
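A minimal sketch of the classification rule above, with made-up thresholds δ1 and δ2, alongside a decision tree that learns a comparable rule from a tiny invented dataset (scikit-learn assumed):

```python
# Minimal sketch of the risk rule above, with illustrative thresholds delta1 and delta2,
# followed by a decision tree that learns a similar rule from labelled examples.
from sklearn.tree import DecisionTreeClassifier

DELTA1, DELTA2 = 50_000, 20_000   # made-up thresholds for income and savings

def rule_based(income, savings):
    return "low risk" if income > DELTA1 and savings > DELTA2 else "high risk"

# Tiny labelled dataset: [income, savings] -> risk class (invented for illustration)
X = [[80_000, 30_000], [30_000, 5_000], [60_000, 25_000], [40_000, 10_000]]
y = ["low risk", "high risk", "low risk", "high risk"]

tree = DecisionTreeClassifier().fit(X, y)
print(rule_based(70_000, 26_000), tree.predict([[70_000, 26_000]])[0])
```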
Unsupervised Learning
• The training data is unlabelled; the aim is to find structure or patterns in the input.
Examples of unsupervised learning (see the sketch below):
• Clustering
• Association
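A minimal clustering sketch with k-means on two synthetic blobs (the data is generated for illustration; scikit-learn assumed):

```python
# Minimal clustering sketch (unsupervised): group unlabelled points with k-means.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)),       # one blob around (0, 0)
               rng.normal(5, 0.5, (50, 2))])      # another blob around (5, 5)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("cluster labels of first five points:", kmeans.labels_[:5])
print("cluster centres:\n", kmeans.cluster_centers_)
```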
Example of Semi-supervised learning
• Semi-supervised learning combines a small amount of labelled data with a large amount of unlabelled data during training.
Reinforcement Learning
• Learning from mistakes
• Place a reinforcement learning algorithm into any environment and it will make a lot of mistakes in the beginning.
• As we provide a signal that associates good behaviours with a positive reward and bad behaviours with a negative one, we reinforce the algorithm to prefer good behaviours over bad ones.
• Over time, the learning algorithm learns to make fewer mistakes than it used to. A minimal sketch follows.
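A minimal reinforcement-learning sketch: an epsilon-greedy agent on a three-armed bandit with invented reward probabilities; the agent gradually prefers the best action:

```python
# Minimal sketch of "learning from mistakes": an epsilon-greedy agent on a 3-armed bandit.
# Reward probabilities are invented; the agent gradually prefers the best arm.
import random

true_reward_prob = [0.2, 0.5, 0.8]          # hidden from the agent
q_values = [0.0, 0.0, 0.0]                  # the agent's value estimates
counts = [0, 0, 0]
epsilon = 0.1                               # exploration rate

for step in range(1000):
    if random.random() < epsilon:
        arm = random.randrange(3)           # explore: try a random action
    else:
        arm = q_values.index(max(q_values)) # exploit: pick the best-looking action
    reward = 1 if random.random() < true_reward_prob[arm] else 0  # positive/negative signal
    counts[arm] += 1
    q_values[arm] += (reward - q_values[arm]) / counts[arm]       # incremental average

print("learned values:", [round(q, 2) for q in q_values])          # highest should be arm 2
```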
Where is reinforcement learning in the real world?
• Video games
• Industrial simulation
• Resource management
Key Elements of Machine Learning

• There are tens of thousands of machine learning algorithms, and hundreds of new algorithms are developed every year.
• Every machine learning algorithm has three components:
1. Representation: how to represent knowledge.
Examples include decision trees, sets of rules, instances, graphical models, neural networks, support vector machines, model ensembles, and others.
2. Evaluation: the way to evaluate candidate programs (hypotheses).
Examples include accuracy, precision and recall, squared error, likelihood, posterior probability, cost, margin, entropy, K-L divergence, and others.

3. Optimization: the way candidate programs are generated, known as the search process.
For example, combinatorial optimization, convex optimization, constrained optimization. A minimal gradient-descent sketch is shown below.
• All machine learning algorithms are combinations of these three components.
• This gives a framework for understanding all algorithms.
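As an illustration of the optimization component, a minimal gradient-descent sketch that fits a one-parameter model y = w·x to toy data (values invented for illustration):

```python
# Minimal sketch of the "optimization" component: gradient descent on squared error
# for a one-parameter model y = w * x (toy data, invented for illustration).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 8.1]        # roughly y = 2x

w, lr = 0.0, 0.01                # initial weight and learning rate
for epoch in range(200):
    grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
    w -= lr * grad               # move against the gradient of the squared error
print("learned w:", round(w, 3)) # should be close to 2
```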

Aspects of developing a learning system:
training data, concept representation, function approximation
• For training and testing our model, we need to split the dataset into three distinct sets: the training set, the validation set, and the test set.
• Training set:
• A set of data used to train the model
• It is used to fit the model
• The model sees and learns from this data
• Later on, the trained model can be deployed and used to predict accurately on new data that it has not seen before
• Labelled data is used
Validation set

• The validation set is a set of data separate from the training data
• It is used to validate our model during training
• It gives information that is used for tuning the model's hyperparameters
• It helps ensure that our model is not overfitting to the data in the training set
• Labelled data is used

Test Set
• A set of data used to test the model
• The test set is separate from both the training set and the validation set
• Once the model is trained and validated using the training and validation sets, the model is used to predict the output for the data in the test set
• Unlabelled data is used (the model does not see the labels when predicting)

Data Split

Train | Validation | Test

• Rules for performing the data split operation:
• To avoid ordering effects, the original dataset must be randomly shuffled before applying the split
• Each split must represent the original data distribution
• The split percentages are commonly 60% for training, 20% for validation, and 20% for testing
• With scikit-learn this can be done using the train_test_split() function, as sketched below
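A minimal sketch of the 60/20/20 split using train_test_split twice (X and y below are placeholders for your own features and labels):

```python
# Minimal sketch of the 60/20/20 split described above, using train_test_split twice.
# X and y are placeholders for your features and labels.
from sklearn.model_selection import train_test_split

X = list(range(100))                # placeholder features
y = [i % 2 for i in range(100)]     # placeholder labels

# First split off 20% as the test set (shuffling is on by default).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=42)
# Then split the remaining 80% into 60% train / 20% validation (0.25 * 0.80 = 0.20).
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.25, random_state=42)

print(len(X_train), len(X_val), len(X_test))   # 60 20 20
```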
Data Preparation

Data Preparation Pipeline (figure)
Why is Data Preparation important?

Sometimes, datasets have missing or incomplete information, which leads to less accurate or incorrect predictions.
Further, sometimes datasets are clean but not adequately shaped, such as aggregated or pivoted data, and some lack business context.
Hence, after collecting data from various data sources, data preparation is needed to transform the raw data.
Significant advantages of data preparation in machine learning are as follows:
• It helps to provide reliable prediction outcomes in various analytics operations.
• It helps identify data issues or errors and significantly reduces the chances of errors.
• It increases decision-making capability.
• It reduces overall project cost (data management and analytics cost).
• It helps to remove duplicate content to make it worthwhile for different applications.
• It increases model performance.
Steps in Data Preparation Process

1. Understand the problem:
• Understand the actual problem to be solved.
2. Data collection:
• Collect data from various potential sources. These data sources may be either within the enterprise or from third-party vendors.
• Data collection is beneficial to reduce and mitigate bias in the ML model;
• hence, before collecting data, always analyze it and ensure that the dataset was collected from diverse people, geographical areas, and perspectives.
3. Profiling and Data Exploration:
• Explore the data for trends, outliers, exceptions, incorrect, inconsistent, missing, or skewed information, etc., as in the profiling sketch below.
• Data exploration helps to detect problems such as collinearity and situations where standardization of datasets and other data transformations are necessary.
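A minimal profiling sketch with pandas (the small DataFrame is invented for illustration):

```python
# Minimal profiling sketch with pandas: summary statistics, missing values, and correlations.
# The small DataFrame is invented for illustration.
import pandas as pd

df = pd.DataFrame({"age": [25, 32, None, 41, 29],
                   "income": [30_000, 52_000, 48_000, None, 39_000],
                   "savings": [5_000, 20_000, 18_000, 25_000, 9_000]})

print(df.describe())        # trends and spread of each numeric column
print(df.isnull().sum())    # missing values per column
print(df.corr())            # high correlations hint at collinearity
```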
Steps in Data Preparation Process

4. Data Cleaning and Validation:
• Data cleaning and validation techniques help determine and resolve inconsistencies, outliers, anomalies, incomplete data, etc.
• Clean data helps to find valuable patterns and information in the data and to ignore irrelevant data in the datasets.
5. Data Formatting:
• After cleaning and validating the data, the next step is to ensure that the data is correctly and consistently formatted.
Steps in Data Preparation Process

6. Feature engineering and selection:
• Feature engineering is defined as the study of selecting, manipulating, and transforming raw data into valuable features.
There are various feature engineering techniques used in machine learning, as follows:
Imputation:
• Feature imputation is the technique of filling in incomplete fields in the datasets.
• It is essential because most machine learning models don't work when there is missing data in the dataset.
• The missing-values problem can be reduced by using techniques such as single-value imputation, multiple imputation, K-nearest neighbours, deleting the row, etc. (see the sketch below).
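A minimal imputation sketch using scikit-learn's SimpleImputer and KNNImputer on an invented array with missing values:

```python
# Minimal imputation sketch: fill missing values with the column mean (SimpleImputer)
# or with values from the nearest neighbours (KNNImputer). Data is invented.
import numpy as np
from sklearn.impute import SimpleImputer, KNNImputer

X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan], [4.0, 5.0]])

print(SimpleImputer(strategy="mean").fit_transform(X))   # single-value (mean) imputation
print(KNNImputer(n_neighbors=2).fit_transform(X))        # K-nearest-neighbour imputation
```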
Encoding:
• Feature encoding is defined as the method of converting string values into numeric form.
• This is important as most ML models require all values to be in numeric format.
• Feature encoding includes label encoding and One-Hot Encoding (also known as get_dummies); see the sketch below.
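A minimal encoding sketch showing label encoding and one-hot encoding (get_dummies) on an invented colour column:

```python
# Minimal encoding sketch: label encoding and one-hot encoding (pandas get_dummies).
# The colour column is invented for illustration.
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.DataFrame({"colour": ["red", "green", "blue", "green"]})

df["colour_label"] = LabelEncoder().fit_transform(df["colour"])  # label encoding: 0/1/2
one_hot = pd.get_dummies(df["colour"], prefix="colour")          # one-hot encoding
print(df.join(one_hot))
```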
Data Pre-Processing

Data preprocessing includes:
1. Data cleaning
2. Data integration
3. Data transformation
4. Data reduction
5. Data discretization
Cntd..
• Data preparation is also known as "data pre-processing", "data wrangling", "data cleaning", and "feature engineering".
• It is the stage of the machine learning lifecycle that comes after data collection.
The data preparation process can be complicated by issues such as:
1. Missing or incomplete records: missing data sometimes appears as empty cells, special values (e.g., NULL or N/A), or a particular character, such as a question mark (see the sketch below).
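A minimal sketch of spotting such missing-value markers with pandas, using a small invented CSV; the na_values argument maps "?" and "N/A" to NaN:

```python
# Minimal sketch: missing records often appear as empty cells, "NULL", "N/A", or "?".
# pandas can be told to treat such markers as missing when reading a file.
import io
import pandas as pd

csv = io.StringIO("age,income\n25,30000\n?,52000\n41,N/A\n,45000")
df = pd.read_csv(csv, na_values=["?", "N/A", "NULL"])   # map the markers to NaN

print(df)
print("missing per column:\n", df.isnull().sum())
```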

Cntd..
2. Outliers or anomalies: unexpected values
• ML algorithms are sensitive to the range and distribution of values when data comes from unknown sources.
• These values can spoil the entire machine learning training process and the performance of the model.
• Hence, it is essential to detect these outliers or anomalies, for example through visualization techniques such as box plots (see the sketch below).
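A minimal outlier-detection sketch using the 1.5×IQR rule that box plots visualize (the series is invented; the last value is an obvious outlier):

```python
# Minimal outlier sketch: flag values outside the 1.5*IQR "whiskers" used by box plots.
# The series is invented; the last value is an obvious outlier.
import pandas as pd

values = pd.Series([10, 12, 11, 13, 12, 11, 14, 95])

q1, q3 = values.quantile(0.25), values.quantile(0.75)
iqr = q3 - q1
outliers = values[(values < q1 - 1.5 * iqr) | (values > q3 + 1.5 * iqr)]
print(outliers)   # should flag 95
```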
Cntd..
3. Unstructured data format:
• Data comes from various sources and needs to be extracted into a different format.
• Hence, before deploying an ML project, always consult with domain experts or import data from known sources.
4. Limited or sparse features/attributes:
• Whenever data comes from a single source, it contains limited features,
• so it is necessary to import data from various sources for feature enrichment, or to build multiple features in the datasets.
5. Understanding feature engineering:
• Feature engineering helps develop additional content in the ML models, increasing model performance and accuracy of predictions.
