Introduction to AI and
Machine Learning
Supervised and unsupervised learning
• In supervised learning, the machine is trained on
a set of labeled data, which means that the input
data is paired with the desired output.
• The machine then learns to predict the output for
new input data. Supervised learning is often
used for tasks such as classification, regression,
and object detection.
• In unsupervised learning, the machine is
trained on a set of unlabeled data, which
means that the input data is not paired with
the desired output.
• The machine then learns to find patterns
and relationships in the data.
• Unsupervised learning is often used for
tasks such as clustering, dimensionality
reduction, and anomaly detection.
• Supervised learning is when we teach or train
the machine using data that is well-labelled,
which means each example is already tagged with
the correct answer.
• After that, the machine is given a new set of
examples (data), and the trained supervised
learning algorithm uses what it learned from the
labeled training examples to predict the correct
outcome for them.
• For example, a labeled dataset of images of
elephants, camels and cows would have each image
tagged with either “Elephant”, “Camel” or “Cow”.
Key Points:
• Supervised learning involves training a machine
from labeled data.
• Labeled data consists of examples with the
correct answer or classification.
• The machine learns the relationship between
inputs and outputs.
• The trained machine can then make predictions
on new, unlabeled data.
Example:
• Let’s say you have a fruit from a basket that you
want to identify. The machine would first analyze
the image to extract features such as its shape, color,
and texture. Then, it would compare these
features to the features of the fruits it has
already learned about. If the new image’s
features are most similar to those of an apple,
the machine would predict that the fruit is an
apple.
• For instance, suppose you are given a basket filled
with different kinds of fruits. Now the first step is to
train the machine with all the different fruits one by
one like this:
• If the object is rounded with a depression at the
top and is red in color, then it will be labeled as
Apple.
• If the object is a long curving cylinder with a
green-yellow color, then it will be labeled as
Banana.
• Now suppose that, after training, the machine is
given a new fruit from the basket, say a banana, and
asked to identify it.
• Since the machine has already learned from the
previous data, this time it has to apply that
knowledge. It will first classify the fruit by its
shape and color, confirm the fruit name as
BANANA and put it in the Banana category.
Thus the machine learns from the training
data (basket containing fruits) and then applies
the knowledge to test data (new fruit).
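The fruit example can be sketched in a few lines of Python. This is only an illustration, assuming a nearest-neighbour comparison as the way to "compare features"; the two features (roundness and yellowness on a 0 to 1 scale), their values, and the names TRAINING_DATA and predict are invented for the sketch.

```python
import math

# Hypothetical labeled training data: (roundness, yellowness) -> fruit name.
# Both features are made-up scores between 0 and 1.
TRAINING_DATA = [
    ((0.9, 0.1), "Apple"),
    ((0.8, 0.2), "Apple"),
    ((0.2, 0.9), "Banana"),
    ((0.1, 0.8), "Banana"),
]


def predict(features, training_data=TRAINING_DATA):
    """Return the label of the training example closest to `features`."""
    nearest = min(training_data, key=lambda example: math.dist(example[0], features))
    return nearest[1]


new_fruit = (0.15, 0.85)  # not very round, quite yellow
print(predict(new_fruit))  # Banana
```

The training data plays the role of the basket, and new_fruit is the test data the machine has not seen before.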
Types of Supervised Learning
Supervised learning is classified into two categories of
algorithms:
• Regression: A regression problem is when the output
variable is a real value, such as “dollars” or “weight”.
• Classification: A classification problem is when the
output variable is a category, such as “red” or “blue”,
“disease” or “no disease”.
Regression
• Regression is a type of supervised learning that is used
to predict continuous values, such as house prices or
stock prices.
• Regression algorithms learn a function that maps from
the input features to the output value.
Some common regression algorithms include:
• Linear Regression
• Polynomial Regression
• Support Vector Machine Regression
• Decision Tree Regression
• Random Forest Regression
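As an illustration of learning a function from input features to an output value, here is a minimal sketch of linear regression with one feature, fitted by ordinary least squares. The function name fit_line and the house-size data are assumptions made up for the example.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical data: house size -> price, chosen so that price = 3 * size.
sizes = [50, 70, 90, 110]
prices = [150, 210, 270, 330]

slope, intercept = fit_line(sizes, prices)
predicted_price = slope * 100 + intercept  # prediction for an unseen size
print(slope, intercept, predicted_price)
```

The learned slope and intercept are the "function"; predicting for a new house is just evaluating it.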
Classification
Classification is a type of supervised learning that is used
to predict categorical values, such as whether a customer
will churn or not, whether an email is spam or not, or
whether a medical image shows a tumor or not.
Classification algorithms learn a function that maps from
the input features to a class label, often by way of a
probability distribution over the output classes.
Some common classification algorithms include:
• Logistic Regression
• Support Vector Machines
• Decision Trees
• Random Forests
• Naive Bayes
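To show what mapping input features to class probabilities can look like, here is a minimal sketch in the style of logistic regression. The weights and bias are hypothetical hand-picked numbers, not trained values, and the two features (count of suspicious words, whether the sender is known) are assumptions for the spam example.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(features, weights, bias):
    """Return a probability for each of the two classes."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    p_spam = sigmoid(score)
    return {"spam": p_spam, "not spam": 1.0 - p_spam}


def predict_label(features, weights, bias):
    """Pick the class with the highest probability."""
    proba = predict_proba(features, weights, bias)
    return max(proba, key=proba.get)


# Hypothetical model parameters (in practice these are learned from labeled data).
WEIGHTS = [1.5, -2.0]
BIAS = -0.5

print(predict_label([3, 0], WEIGHTS, BIAS))  # spam
print(predict_label([0, 1], WEIGHTS, BIAS))  # not spam
```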
Evaluating Supervised Learning Models
There are a number of different metrics that can
be used to evaluate supervised learning models,
but some of the most common ones include:
For Regression
• Mean Squared Error (MSE): MSE measures the
average squared difference between the
predicted values and the actual values. Lower
MSE values indicate better model performance.
• Root Mean Squared Error (RMSE): RMSE is the square
root of MSE, which expresses the typical size of the
prediction errors in the same units as the target. As with
MSE, lower RMSE values indicate better model performance.
• Mean Absolute Error (MAE): MAE measures the
average absolute difference between the predicted
values and the actual values. It is less sensitive to
outliers compared to MSE or RMSE.
• R-squared (Coefficient of Determination): R-squared
measures the proportion of the variance in the target
variable that is explained by the model. Higher
R-squared values indicate better model fit.
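The four regression metrics above can be computed directly from their definitions. The following is a minimal sketch; the function names and the small y_true / y_pred lists are assumptions for illustration.

```python
import math


def mse(y_true, y_pred):
    """Mean of the squared differences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Square root of MSE, in the same units as the target."""
    return math.sqrt(mse(y_true, y_pred))


def mae(y_true, y_pred):
    """Mean of the absolute differences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """1 minus (residual sum of squares / total sum of squares)."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


y_true = [3.0, 5.0, 7.0, 9.0]  # actual values (made up)
y_pred = [2.0, 5.0, 8.0, 9.0]  # model predictions (made up)

print(round(mse(y_true, y_pred), 3))        # 0.5
print(round(rmse(y_true, y_pred), 3))       # 0.707
print(round(mae(y_true, y_pred), 3))        # 0.5
print(round(r_squared(y_true, y_pred), 3))  # 0.9
```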
For Classification
• Accuracy: Accuracy is the percentage of predictions that
the model makes correctly. It is calculated by dividing the
number of correct predictions by the total number of
predictions.
• Precision: Precision is the percentage of positive
predictions that the model makes that are actually
correct. It is calculated by dividing the number of true
positives by the total number of positive predictions.
• Recall: Recall is the percentage of all positive examples
that the model correctly identifies. It is calculated by
dividing the number of true positives by the total number
of positive examples.
• F1 score: The F1 score combines precision and
recall into a single number. It is calculated by taking
the harmonic mean of precision and recall.
• Confusion matrix: A confusion matrix is a table
that counts, for each actual class, how many
examples were predicted as each class. It can be
used to visualize the performance of the model
and identify the classes the model confuses.
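The classification metrics above all follow from the counts in the confusion matrix. Here is a minimal sketch for a two-class problem; the function names and the example labels (1 = positive, 0 = negative) are assumptions for illustration.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Count how often each (actual, predicted) pair occurs."""
    return Counter(zip(y_true, y_pred))


def classification_metrics(y_true, y_pred, positive=1):
    cm = confusion_matrix(y_true, y_pred)
    tp = cm[(positive, positive)]  # true positives
    fp = sum(n for (actual, pred), n in cm.items()
             if pred == positive and actual != positive)  # false positives
    fn = sum(n for (actual, pred), n in cm.items()
             if actual == positive and pred != positive)  # false negatives
    correct = sum(n for (actual, pred), n in cm.items() if actual == pred)

    accuracy = correct / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean of precision and recall.
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]  # actual labels (made up)
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]  # predicted labels (made up)

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

In this example there are 3 true positives, 1 false positive and 1 false negative, so precision, recall and F1 are all 0.75 while accuracy is 0.8.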
Applications of Supervised learning
• Spam filtering: Supervised learning algorithms can be
trained to identify and classify spam emails based on their
content, helping users avoid unwanted messages.
• Image classification: Supervised learning can
automatically classify images into different categories,
such as animals, objects, or scenes, facilitating tasks like
image search, content moderation, and image-based
product recommendations.
• Medical diagnosis: Supervised learning can assist in
medical diagnosis by analyzing patient data, such as
medical images, test results, and patient history, to identify
patterns that suggest specific diseases or conditions.
• Fraud detection: Supervised learning models
can analyze financial transactions and identify
patterns that indicate fraudulent activity, helping
financial institutions prevent fraud and protect
their customers.
• Natural language processing (NLP): Supervised
learning plays a crucial role in NLP tasks,
including sentiment analysis, machine
translation, and text summarization, enabling
machines to understand and process human
language effectively.
Advantages of Supervised learning
• Supervised learning produces outputs for new data based
on previous experience (the labeled examples).
• Helps to optimize performance criteria with the help of
experience.
• Supervised machine learning helps to solve various types
of real-world computation problems.
• It performs classification and regression tasks.
• It allows estimating or mapping the result to a new
sample.
• We have complete control over choosing the number of
classes we want in the training data.
Disadvantages of Supervised learning
• Classifying big data can be challenging.
• Training for supervised learning can need a lot of
computation time.
• Supervised learning cannot handle all complex
tasks in Machine Learning.
• It requires a labelled data set.
• It requires a training process.
What is Unsupervised learning?
Unsupervised learning is the training of a machine using
information that is neither classified nor labeled and
allowing the algorithm to act on that information without
guidance. Here the task of the machine is to group unsorted
information according to similarities, patterns, and
differences without any prior training on labeled data.
For example, you can use unsupervised learning to examine
animal data that has been gathered and distinguish between
several groups according to the traits and behavior of the
animals. These groupings might correspond to various
animal species, allowing you to categorize the creatures
without depending on labels that already exist.
Key Points:
• Unsupervised learning allows the model to discover
patterns and relationships in unlabeled data.
• Clustering algorithms group similar data points
together based on their inherent characteristics.
• Feature extraction captures essential information from
the data, enabling the model to make meaningful
distinctions.
• Label association assigns categories to the clusters
based on the extracted patterns and characteristics.
Example
• For instance, suppose the machine is given a set of
images of dogs and cats which it has never seen.
• The machine has no idea about the features of
dogs and cats, so it can’t label the images as ‘dogs’
and ‘cats’. But it can group them according to their
similarities, patterns, and differences, i.e., it can
split the images into two parts.
The first may contain all pictures having dogs in them
and the second part may contain all pictures
having cats in them. Here the machine didn’t learn
anything beforehand, which means no training data or
examples.
Types of Unsupervised Learning
Unsupervised learning is classified into two categories
of algorithms:
• Clustering: A clustering problem is where you want to
discover the inherent groupings in the data, such as
grouping customers by purchasing behavior.
• Association: An association rule learning problem is
where you want to discover rules that describe large
portions of your data, such as people that buy X also
tend to buy Y.
Clustering
• Clustering is a type of unsupervised learning that is used to
group similar data points together. Centroid-based algorithms
such as K-means work by iteratively assigning each data point
to its nearest cluster center and then moving each center to the
mean of the points assigned to it.
Clustering Types:-
1. Exclusive (partitioning)
2. Agglomerative
3. Overlapping
4. Probabilistic
Clustering Algorithms:-
1. Hierarchical clustering
2. K-means clustering
3. Gaussian Mixture Models (GMMs)
4. Density-Based Spatial Clustering of Applications with
Noise (DBSCAN)
Related techniques (dimensionality reduction, not clustering):-
1. Principal Component Analysis
2. Singular Value Decomposition
3. Independent Component Analysis
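As an illustration of K-means clustering from the list above, here is a minimal sketch of its usual two-step loop: assign each point to the nearest center, then move each center to the mean of its points. The function name kmeans, the fixed starting centers, and the six 2-D points are assumptions for the example; real implementations usually choose starting centers randomly and stop when the centers no longer change.

```python
import math


def kmeans(points, centers, iterations=10):
    """Run a fixed number of assign/update rounds and return (centers, clusters)."""
    centers = list(centers)
    clusters = []
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest center.
        clusters = [[] for _ in centers]
        for point in points:
            nearest = min(range(len(centers)),
                          key=lambda i: math.dist(point, centers[i]))
            clusters[nearest].append(point)
        # Update step: move each center to the mean of its assigned points.
        # A center with no points is left where it is.
        centers = [
            tuple(sum(coords) / len(cluster) for coords in zip(*cluster))
            if cluster else center
            for cluster, center in zip(clusters, centers)
        ]
    return centers, clusters


# Made-up unlabeled data with two visible groups.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centers, clusters = kmeans(points, centers=[(0, 0), (10, 10)])
print(centers)
print(clusters)
```

No labels are used anywhere: the two groups emerge only from the distances between points.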
Association rule learning
Association rule learning is a type of unsupervised
learning that is used to identify patterns in
data. Association rule learning algorithms work by
finding relationships between different items in a
dataset.
• Some common association rule learning algorithms
include:
• Apriori Algorithm
• Eclat Algorithm
• FP-Growth Algorithm
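A rule such as "people that buy X also tend to buy Y" is usually scored with two numbers, support and confidence, which algorithms like Apriori build on. The sketch below only computes these two measures for one rule; it is not an implementation of Apriori, Eclat or FP-Growth, and the shopping baskets are made up.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Of the transactions containing `antecedent`, the fraction that also
    contain `consequent`. Assumes the antecedent occurs at least once."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)


# Hypothetical shopping baskets.
transactions = [
    {"bread", "butter"},
    {"bread", "butter", "milk"},
    {"bread", "milk"},
    {"milk"},
]

print(support(transactions, {"bread", "butter"}))                  # 0.5
print(round(confidence(transactions, {"bread"}, {"butter"}), 2))   # 0.67
```

Here the rule "bread -> butter" holds in 2 of the 3 baskets that contain bread.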
Applications of Unsupervised learning
• Anomaly detection: Unsupervised learning can identify
unusual patterns or deviations from normal behavior in
data, enabling the detection of fraud, intrusion, or system
failures.
• Scientific discovery: Unsupervised learning can uncover
hidden relationships and patterns in scientific data, leading
to new hypotheses and insights in various scientific fields.
• Recommendation systems: Unsupervised learning can
identify patterns and similarities in user behavior and
preferences to recommend products, movies, or music that
align with their interests.
• Customer segmentation: Unsupervised learning
can identify groups of customers with similar
characteristics, allowing businesses to target
marketing campaigns and improve customer
service more effectively.
• Image analysis: Unsupervised learning can group
images based on their content, facilitating tasks
such as image classification, object detection,
and image retrieval.
Advantages of Unsupervised learning
• It does not require training data to be labeled.
• Dimensionality reduction can be easily accomplished using
unsupervised learning.
• Capable of finding previously unknown patterns in data.
• Unsupervised learning can help you gain insights from
unlabeled data that you might not have been able to get
otherwise.
• Unsupervised learning is good at finding patterns and
relationships in data without being told what to look for. This
can help you learn new things about your data.
Disadvantages of Unsupervised learning
• Difficult to measure accuracy or effectiveness due to
lack of predefined answers during training.
• The results are often less accurate than those of
supervised learning.
• The user needs to spend time interpreting and labeling
the groups that the algorithm produces.
• Unsupervised learning can be sensitive to data quality,
including missing values, outliers, and noisy data.