Overview of Machine
Learning
Keerthana N V
AP/IT
VCET
Introduction to Machine Learning
Why ML?
Automation and Efficiency
Data-Driven Decision Making
Personalization
Improving Cybersecurity
Healthcare Innovations
Real-Time Analytics and Forecasting
Fraud Detection
Adaptive Systems
How does it work?
Paradigms
Applications
Supervised Learning
Supervised learning is a type of machine learning
where an algorithm learns from labeled data.
In this method, the system is trained on a dataset
containing input-output pairs, where each input
(feature) corresponds to a specific known output
(label).
The goal is to learn a mapping function so that it can
predict the output for new, unseen inputs.
How Supervised Learning Works
1. Training Phase:
• The algorithm is fed with input data (features) and corresponding labels (targets).
• It adjusts its internal parameters to minimize the error between its predictions and the actual labels.
2. Prediction Phase:
• Once trained, the model can predict outcomes for new, unseen data based on what it has learned.
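The two phases above can be sketched in a few lines. This is an illustrative toy, assuming a one-feature linear model fit by closed-form least squares; the data and function names are invented, not from the slides.

```python
# Training phase: learn w, b for y = w*x + b from labeled (feature, label) pairs.
def train(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # closed-form least-squares slope and intercept
    w = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    b = mean_y - w * mean_x
    return w, b

# Prediction phase: apply the learned mapping to a new, unseen input.
def predict(model, x):
    w, b = model
    return w * x + b

model = train([1, 2, 3, 4], [2, 4, 6, 8])  # labeled training pairs
print(predict(model, 5))                    # → 10.0
```

Minimizing error during training is exactly what the closed-form formula does here; iterative methods such as gradient descent reach the same goal for models without a closed form.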
Types of Supervised Learning
Classification
• Used when the output variable is categorical (discrete classes).
• Example: Spam detection in emails (spam or not spam).
• Algorithms: Logistic Regression, Decision Trees, Random Forest,
Support Vector Machines (SVM), Neural Networks.
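As a sketch of the spam-detection example, here is a one-feature logistic regression trained by stochastic gradient descent. The feature (a count of spammy words) and the tiny dataset are invented for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit(xs, ys, lr=0.5, epochs=2000):
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            p = sigmoid(w * x + b)
            # gradient of the log-loss for one example
            w -= lr * (p - y) * x
            b -= lr * (p - y)
    return w, b

# feature: count of words like "free"/"winner"; label: 1 = spam, 0 = not spam
xs, ys = [0, 1, 3, 4], [0, 0, 1, 1]
w, b = fit(xs, ys)
print("spam" if sigmoid(w * 4 + b) > 0.5 else "not spam")  # → spam
```

The output is a probability, so the 0.5 threshold turns the regression into a discrete-class decision, which is what makes this classification rather than regression.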
Regression
• Used when the output variable is continuous.
• Example: Predicting house prices based on size and location.
• Algorithms: Linear Regression, Polynomial Regression, Ridge
Regression, Lasso Regression.
Types of Classification
Types of Regression
Examples
•Cybersecurity
•Healthcare
•Finance
•E-commerce
•Voice Recognition
Advantages
• Accurate and reliable for well-defined problems with labeled data.
• Easy to understand when using simple models like linear regression.
• Good performance for tasks such as classification and regression.
Challenges and Limitations
• Requires large amounts of labeled data, which can be expensive and time-consuming to collect.
• Prone to overfitting if the model is too complex for the amount of data available.
• Limited adaptability: it struggles with unexpected scenarios since it relies heavily on training data.
Unsupervised Learning
Unsupervised learning is a type of machine learning
where the algorithm works with unlabeled data.
Unlike supervised learning, there are no predefined
output labels or target values.
The algorithm tries to identify patterns, structures,
or relationships in the input data on its own.
How does it work?
• The algorithm looks for underlying patterns
or hidden structures by clustering similar
data points or identifying associations.
• It aims to group data or reduce
dimensionality without human guidance.
Types
Clustering
•Grouping data points into clusters based on
similarity.
•Example: Segmenting customers based on
purchasing behavior.
•Algorithms: K-Means, DBSCAN,
Agglomerative Clustering.
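A minimal K-Means sketch on one-dimensional points, assuming invented data with two obvious groups; real use would rely on a library implementation such as scikit-learn's KMeans.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    random.seed(seed)
    centroids = random.sample(points, k)
    for _ in range(iters):
        # assignment step: each point joins its nearest centroid's cluster
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: abs(p - centroids[c]))
            clusters[i].append(p)
        # update step: move each centroid to its cluster's mean
        for i, c in enumerate(clusters):
            if c:
                centroids[i] = sum(c) / len(c)
    return sorted(centroids)

# two separated groups, e.g. low and high spenders
print(kmeans([9, 10, 11, 99, 100, 101], k=2))  # → [10.0, 100.0]
```

The alternating assign/update loop is the "no human guidance" part: the groups emerge from the data alone, with k the only input choice.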
K-Medoids (PAM) Clustering
Similar to K-means, K-medoids clusters data around k "medoids" instead of centroids. Medoids are representative points that are actual members of the dataset, which makes this approach less sensitive to outliers.
The Partitioning Around Medoids (PAM) algorithm minimizes the sum of dissimilarities between points and their assigned medoid.
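A simplified k-medoids sketch on 1-D data (the alternating assign/update variant, not the full PAM swap search); the data and starting medoids are invented. Note how the extreme outlier 500 barely moves the medoids, whereas a mean-based centroid would be dragged toward it.

```python
def kmedoids(points, medoids, iters=10):
    for _ in range(iters):
        # assignment step: each point joins its nearest medoid
        clusters = {m: [] for m in medoids}
        for p in points:
            nearest = min(medoids, key=lambda md: abs(p - md))
            clusters[nearest].append(p)
        # update step: new medoid is the actual cluster member that
        # minimizes total dissimilarity within its cluster
        medoids = [min(c, key=lambda x: sum(abs(x - q) for q in c))
                   for c in clusters.values()]
    return sorted(medoids)

print(kmedoids([1, 2, 3, 100, 101, 500], medoids=[1, 100]))  # → [2, 101]
```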
Association
•Identifying relationships or associations
between variables.
•Example: Market basket analysis (if a
customer buys bread, they are likely to buy
butter).
•Algorithms: Apriori, Eclat, FP-Growth.
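The market-basket example can be sketched as counting how often item pairs co-occur and their support, which is the first step of Apriori-style mining; the baskets are invented.

```python
from itertools import combinations
from collections import Counter

baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"butter", "milk"},
]

# count co-occurring item pairs across all baskets
pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# support = fraction of baskets containing the pair
support = {p: c / len(baskets) for p, c in pair_counts.items()}
print(support[("bread", "butter")])  # → 0.5
```

Full Apriori would prune infrequent pairs before growing larger itemsets and then derive rules like "bread → butter" from confidence thresholds.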
Reinforcement Learning
In RL, an agent learns to make decisions by
interacting with an environment, aiming to maximize
some notion of cumulative reward.
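A hedged Q-learning sketch on an invented environment: a corridor of states 0–3 where the agent earns reward 1 for reaching state 3, with actions −1 (left) and +1 (right). All parameters are illustrative.

```python
import random

def q_learning(episodes=500, alpha=0.5, gamma=0.9, eps=0.2, seed=0):
    random.seed(seed)
    # Q-table: expected cumulative reward for each (state, action)
    q = {(s, a): 0.0 for s in range(4) for a in (-1, 1)}
    for _ in range(episodes):
        s = 0
        while s != 3:
            # epsilon-greedy: mostly exploit, sometimes explore
            if random.random() < eps:
                a = random.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda act: q[(s, act)])
            s2 = min(max(s + a, 0), 3)          # walls at both ends
            r = 1.0 if s2 == 3 else 0.0
            # temporal-difference update toward reward plus discounted future value
            q[(s, a)] += alpha * (r + gamma * max(q[(s2, b)] for b in (-1, 1))
                                  - q[(s, a)])
            s = s2
    return q

q = q_learning()
# the learned policy prefers moving right in every non-terminal state
print(all(q[(s, 1)] > q[(s, -1)] for s in range(3)))  # → True
```

No labels are provided; the agent discovers the right policy purely from the reward signal accumulated through interaction.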
Semi-supervised Learning
This paradigm combines a small amount of labeled
data with a large amount of unlabeled data. It’s
especially useful when labeling data is expensive or
time-consuming.
Example:
Self-training, where a model is first trained on labeled
data, and then predictions on the unlabeled data are
used iteratively to improve the model.
Motivation
Labeling Cost: Annotating data can be expensive and labor-intensive. Semi-supervised learning is valuable in fields like medical imaging, where expert-labeled data is sparse but vast unlabeled datasets are available.
Applications:
Natural Language Processing (NLP)
Computer Vision
Bioinformatics
Approaches
Self-Training
Co-Training
Self-Training:
Self-training is an iterative approach where a model is initially trained on
labeled data, then used to label some of the unlabeled data with high-
confidence predictions. These pseudo-labeled examples are added to
the labeled dataset, and the model is retrained.
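One self-training round can be sketched as follows. The base learner here is a nearest-centroid classifier on 1-D features, and the confidence measure and threshold are invented stand-ins for whatever score the real model provides.

```python
def centroid_classifier(labeled):
    """Fit: average the feature values seen for each class."""
    groups = {}
    for x, y in labeled:
        groups.setdefault(y, []).append(x)
    return {y: sum(xs) / len(xs) for y, xs in groups.items()}

def predict_with_conf(cents, x):
    """Predict the nearest class plus a crude distance-ratio confidence."""
    dists = {y: abs(x - c) for y, c in cents.items()}
    y = min(dists, key=dists.get)
    conf = 1 - dists[y] / (dists[y] + max(dists.values()))
    return y, conf

labeled = [(0.0, "a"), (1.0, "a"), (9.0, "b"), (10.0, "b")]
unlabeled = [0.4, 9.6, 5.1]

cents = centroid_classifier(labeled)
for x in unlabeled:                    # pseudo-label only confident points
    y, conf = predict_with_conf(cents, x)
    if conf > 0.8:
        labeled.append((x, y))
cents = centroid_classifier(labeled)   # retrain on the enlarged set
print(sorted(x for x, y in labeled if y == "a"))  # → [0.0, 0.4, 1.0]
```

The ambiguous point 5.1 falls below the confidence threshold and is left unlabeled, which is what keeps pseudo-labeling from amplifying the model's own mistakes.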
Co-Training
Co-training uses two different models trained on two different views of
the data, typically two distinct feature subsets.
Each model is trained on its labeled data, predicts labels for the
unlabeled data, and then teaches the other model.
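The teaching loop above can be sketched with two nearest-centroid models, one per view. Each sample is a pair of invented 1-D features standing in for two distinct feature subsets; only the first two samples carry labels.

```python
def centroids(pairs):
    """Average feature value per class for one view."""
    groups = {}
    for x, y in pairs:
        groups.setdefault(y, []).append(x)
    return {y: sum(v) / len(v) for y, v in groups.items()}

def predict(cents, x):
    return min(cents, key=lambda y: abs(x - cents[y]))

# each sample is (view1_feature, view2_feature); labels known for two samples
labeled = [((0.0, 100.0), "a"), ((10.0, 200.0), "b")]
unlabeled = [(1.0, 101.0), (9.0, 199.0)]

view1 = [(v1, y) for (v1, _), y in labeled]
view2 = [(v2, y) for (_, v2), y in labeled]
for v1, v2 in unlabeled:
    view2.append((v2, predict(centroids(view1), v1)))  # view 1 teaches view 2
    view1.append((v1, predict(centroids(view2), v2)))  # view 2 teaches view 1

print(predict(centroids(view1), 2.0))  # → a
```

Because each model's mistakes are unlikely to coincide across independent views, the cross-teaching tends to be more robust than plain self-training.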