ML Algorithms Explained
ML Algorithms: Code & Concepts
scikit-learn
mlpfu.pages.dev
Linear Regression: Theory
Concept: A foundational algorithm that models the linear relationship between features and a continuous target. It fits a line (or hyperplane) that minimizes the sum of squared errors, i.e. the sum of the squared vertical distances from each point to the line; a short numeric sketch of this objective follows the bullets below.
Pros: Simple to understand, highly interpretable coefficients, fast to
train.
Cons: Assumes the relationship is linear, can be sensitive to outliers.
When to Use: Excellent as a starting point or baseline for any regression
problem. Use it when you need a simple, explainable model.
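As a quick illustration of that objective, the sketch below computes the sum of squared errors for one candidate line on a handful of made-up points. The data values and the chosen slope w and intercept b are purely illustrative and are not part of the slide's own code.

import numpy as np

# Toy data points (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

# One candidate line: y_hat = w * x + b
w, b = 2.0, 0.0
y_hat = w * x + b

# Sum of squared errors: the quantity linear regression minimizes over w and b
sse = np.sum((y - y_hat) ** 2)
print(sse)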
Linear Regression: Visualization
This plot shows the raw data points and the best-fitting line found by the model. The goal is to minimize the sum of squared vertical distances from all points to this line.
Linear Regression: Code
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Scaling features is good practice; it keeps coefficients comparable and
# matters for the regularized models shown later.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create and train the model
model = LinearRegression(
    fit_intercept=True  # Calculates the y-intercept. Set to False if data is pre-centered.
)
model.fit(X_train_scaled, y_train)
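A natural next step, assuming y_test exists alongside X_test from the same train/test split, is to score the fitted model on held-out data. This is a usage sketch rather than part of the original slide:

from sklearn.metrics import mean_squared_error, r2_score

# Evaluate on the held-out test set (assumes y_test is available)
y_pred = model.predict(X_test_scaled)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))
print("Coefficients:", model.coef_, "Intercept:", model.intercept_)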
Polynomial Regression: Theory
Concept: A powerful variation of linear regression that can model non-linear, curved relationships. It works by creating new polynomial features (e.g., x², x³) from the original features and then fitting a linear model to this expanded feature set; a tiny sketch of the expansion follows the bullets below.
Pros: Can capture complex, non-linear patterns.
Cons: Prone to overfitting if the degree is too high. Choosing the right
degree can be tricky.
When to Use: When you visually inspect your data and see a clear curve or
non-linear trend.
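To make the "expanded feature set" concrete, here is a small sketch of what PolynomialFeatures produces for a single sample with two features at degree=2. The input values are made up, and get_feature_names_out assumes a reasonably recent scikit-learn:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])  # one sample: x1 = 2, x2 = 3
poly = PolynomialFeatures(degree=2, include_bias=False)
print(poly.fit_transform(X))         # [[2. 3. 4. 6. 9.]] -> x1, x2, x1^2, x1*x2, x2^2
print(poly.get_feature_names_out())  # ['x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']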
Polynomial Regression: Visualization
This plot shows how a degree-2 polynomial model can fit a curved
relationship in the data much better than a straight line could.
Polynomial Regression: Code
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# A pipeline is the best way to chain the scaling, feature creation, and modeling steps.
model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(
        degree=2,           # The degree of the polynomial. Higher = more complex curve.
        include_bias=False  # Avoids a redundant bias term that LinearRegression handles.
    ),
    LinearRegression()
)
model.fit(X_train, y_train)
Regularization: Theory
Concept: A technique to combat overfitting by adding a penalty to the loss function based on the size of the model's coefficients. This discourages the model from becoming too complex; a short sketch of the two penalty terms follows the bullets below.
Ridge (L2): Shrinks all coefficients towards zero, but never to exactly
zero. Good for general-purpose shrinkage.
Lasso (L1): Can shrink coefficients all the way to zero, effectively acting
as a form of automatic feature selection.
When to Use: Whenever you have a model with many features or a
complex model (like polynomial regression) that might be overfitting.
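The sketch below spells out what each penalty adds to the loss, up to scikit-learn's exact scaling conventions. It is a hand-rolled illustration with a made-up coefficient vector, not the library's internal code:

import numpy as np

coef = np.array([0.5, -3.0, 0.0, 1.2])  # illustrative coefficient vector
alpha = 1.0                              # regularization strength

l2_penalty = alpha * np.sum(coef ** 2)     # Ridge adds alpha * sum(w_j^2) to the squared error
l1_penalty = alpha * np.sum(np.abs(coef))  # Lasso adds alpha * sum(|w_j|) instead
print(l2_penalty, l1_penalty)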
Regularization: Visualization
These plots show how coefficients change as the regularization strength
( alpha ) increases. Notice how Lasso (right) forces coefficients to become
exactly zero, while Ridge (left) only shrinks them.
Regularization: Code
from sklearn.linear_model import Ridge, Lasso

# Ridge (L2) Regression - good for reducing model complexity
ridge_model = Ridge(
    alpha=1.0  # Regularization strength. Higher alpha = simpler model.
)
ridge_model.fit(X_train_scaled, y_train)

# Lasso (L1) Regression - good for feature selection
lasso_model = Lasso(
    alpha=0.1  # Regularization strength. Higher alpha = more features set to zero.
)
lasso_model.fit(X_train_scaled, y_train)
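A quick way to see the practical difference once both models are fit is to inspect their coefficients. This follow-up assumes the ridge_model and lasso_model objects from above:

import numpy as np

# Lasso drives some coefficients to exactly zero; Ridge only shrinks them.
print("Ridge coefficients:", ridge_model.coef_)
print("Lasso coefficients:", lasso_model.coef_)
print("Features dropped by Lasso:", int(np.sum(lasso_model.coef_ == 0)))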
Logistic Regression: Theory
Concept: The go-to algorithm for binary classification. It calculates the probability of an instance belonging to a class by passing the output of a linear equation through the sigmoid function, which squashes the result to a value between 0 and 1 (a short sketch of the sigmoid follows the bullets below).
Pros: Fast, highly interpretable, provides probabilities.
Cons: Assumes a linear decision boundary between classes.
When to Use: A first-choice algorithm for any binary classification task,
especially when interpretability is important.
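Here is a minimal sketch of the sigmoid step described above; z stands for the output of the linear equation, and the example inputs are illustrative:

import numpy as np

def sigmoid(z):
    # Squashes any real number into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-3.0, 0.0, 3.0])))  # approximately [0.047, 0.5, 0.953]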
Logistic Regression: Visualization
The line represents the decision boundary learned by the model. Points on
one side are classified as class 0, and points on the other side are classified
as class 1.
Logistic Regression: Code
from sklearn.linear_model import LogisticRegression

# Create and train the model
model = LogisticRegression(
    penalty='l2',       # Specifies the regularization type ('l1', 'l2').
    C=1.0,              # Inverse of regularization strength. Smaller C = stronger penalty.
    solver='liblinear', # Optimization algorithm. Good choice for small datasets.
    multi_class='ovr'   # Strategy for multi-class problems: One-vs-Rest.
)
model.fit(X_train_scaled, y_train)
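Because the model outputs probabilities, a useful follow-up (assuming y_test is available from the same split) is to look at both the predicted labels and the class-1 probabilities:

from sklearn.metrics import accuracy_score

y_pred = model.predict(X_test_scaled)               # hard class labels (0 or 1)
y_proba = model.predict_proba(X_test_scaled)[:, 1]  # probability of class 1
print("Accuracy:", accuracy_score(y_test, y_pred))
print("First five class-1 probabilities:", y_proba[:5])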
Naive Bayes: Theory
Concept: A fast, probabilistic classifier based on Bayes' Theorem. Its core is the "naive" assumption that all features are independent of one another given the class. While this is rarely true, the algorithm is surprisingly effective in practice.
Pros: Extremely fast, performs very well with high-dimensional data
(many features).
Cons: The independence assumption is a strong one and often not true.
When to Use: A classic choice for text classification (e.g., spam filtering)
where the number of features (words) is very large.
Naive Bayes: Code
from sklearn.naive_bayes import GaussianNB
# This version (GaussianNB) is used when the features are continuous
# and assumed to follow a normal (Gaussian) distribution.
# Other versions include MultinomialNB (for word counts) and BernoulliNB (for binary features).
model = GaussianNB()
model.fit(X_train_scaled, y_train)
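Since the comment above mentions MultinomialNB for word counts, here is a small self-contained sketch of the text-classification use case; the toy messages and labels are invented purely for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "meeting moved to 3pm", "free cash offer", "see you at lunch"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (toy labels)

spam_clf = make_pipeline(CountVectorizer(), MultinomialNB())
spam_clf.fit(texts, labels)
print(spam_clf.predict(["free prize waiting"]))  # most likely [1]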
K-Nearest Neighbors (KNN): Theory
Concept: A simple, "lazy" algorithm that makes predictions by looking at the 'K' closest data points in the training set. It classifies a new point based on a majority vote of its neighbors. It doesn't "learn" a model; it just memorizes the entire training dataset (a toy version of the distance-and-vote step follows the bullets below).
Pros: Very simple to understand, no training phase required.
Cons: Can be very slow at prediction time on large datasets, sensitive to
irrelevant features and the scale of the data.
When to Use: For simple problems or as a baseline. When the decision
boundary is highly irregular and you don't need lightning-fast predictions.
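The distance-and-vote step can be sketched by hand in a few lines; the 1-D toy data below is purely illustrative:

import numpy as np

X_train_toy = np.array([1.0, 2.0, 3.0, 8.0, 9.0])  # stored training points
y_train_toy = np.array([0, 0, 0, 1, 1])            # their class labels
x_new, k = 2.5, 3

dists = np.abs(X_train_toy - x_new)   # distance from the new point to every stored point
nearest = np.argsort(dists)[:k]       # indices of the K closest points
votes = y_train_toy[nearest]
print(np.bincount(votes).argmax())    # majority vote -> 0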
K-Nearest Neighbors (KNN): Visualization
These plots show how the decision boundary changes with K. A small K (left)
creates a complex, jagged boundary that can be prone to noise. A larger K
(right) creates a smoother, more generalized boundary.
K-Nearest Neighbors (KNN): Code
from sklearn.neighbors import KNeighborsClassifier

# Create and train the model
model = KNeighborsClassifier(
    n_neighbors=5,      # The number of neighbors to use (K). This is the key hyperparameter.
    weights='uniform',  # 'uniform' gives all neighbors equal weight; 'distance' weights closer neighbors more.
    metric='minkowski', # The distance metric. 'minkowski' with p=2 is the standard Euclidean distance.
    p=2
)
model.fit(X_train_scaled, y_train)
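A common follow-up is to compare a few values of K with cross-validation; this sketch reuses the scaled training data from the earlier slides:

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

for k in [1, 5, 15]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train_scaled, y_train, cv=5)
    print(f"K={k}: mean CV accuracy = {scores.mean():.3f}")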
Support Vector Machines (SVM): Theory
Concept: A powerful and versatile classifier that works by finding the
optimal hyperplane that best separates the classes. "Optimal" means the
one with the largest possible margin—the distance between the hyperplane
and the nearest points from each class (the "support vectors").
Pros: Very effective in high-dimensional spaces, memory efficient as it
only uses a subset of points (support vectors).
Cons: Can be slow to train on very large datasets, less interpretable
than other models.
When to Use: For complex classification problems where you need high
accuracy, even if the data is not linearly separable (thanks to the kernel
trick).
Support Vector Machines (SVM): Visualization
This plot shows the decision boundary (solid line), the margins (dashed
lines), and the circled support vectors that define the margin.
Support Vector Machines (SVM): Code
from sklearn.svm import SVC

# Create and train the model
model = SVC(
    kernel='rbf',  # Kernel type. 'rbf' is a powerful default for non-linear problems; 'linear' for linear data.
    C=1.0,         # Regularization parameter. Trades off a wide margin against classifying all points correctly.
    gamma='scale'  # Kernel coefficient for 'rbf'. 'scale' is a robust default setting.
)
model.fit(X_train_scaled, y_train)
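After fitting, the support vectors described on the theory slide are exposed directly on the model; the accuracy check assumes y_test is available:

# The points that define the margin
print("Support vectors per class:", model.n_support_)
print("Total support vectors:", len(model.support_vectors_))

# Held-out accuracy
print("Test accuracy:", model.score(X_test_scaled, y_test))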
Decision Tree: Theory
Concept: A highly interpretable model that creates a flowchart of if-then-
else rules based on the data's features. It recursively splits the data into
subsets that are as "pure" (homogeneous) as possible.
Pros: Very easy to understand and visualize, requires no feature scaling.
Cons: Individual trees are prone to overfitting and can be unstable
(small changes in data can lead to a completely different tree).
When to Use: When model interpretability is a top priority. Also serves as
the building block for more powerful ensemble models like Random Forests.
Decision Tree: Visualization
This image shows the flowchart-like structure of a trained decision tree. You
can follow the path from the root node down to a leaf to get a prediction.
Decision Tree: Code
from sklearn.tree import DecisionTreeClassifier

# Create and train the model. Note: Trees do not require feature scaling.
model = DecisionTreeClassifier(
    criterion='gini',   # The function to measure the quality of a split ('gini' or 'entropy').
    max_depth=3,        # The maximum depth of the tree. Setting this is the primary way to prevent overfitting.
    min_samples_leaf=1  # The minimum number of samples required to be at a leaf node.
)
model.fit(X_train, y_train)
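The learned if-then-else rules can be printed as text, which pairs well with the visualization slide; pass feature_names if you have column names, otherwise generic names are used:

from sklearn.tree import export_text

# Print the tree as nested if/else rules
print(export_text(model))

# Or render it graphically:
# from sklearn.tree import plot_tree
# plot_tree(model, filled=True)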
Hyperparameter Tuning: Theory
Concept: The process of finding the optimal settings for a model's
parameters that are not learned from the data (e.g., K in KNN, C in SVM).
This is done by systematically searching through a "grid" of possible
parameter values and evaluating each combination using cross-validation.
Why: Default parameters are rarely optimal. Tuning is crucial for
maximizing model performance.
How: GridSearchCV automates this search, making it a standard and
essential step in the modeling pipeline.
Hyperparameter Tuning: Visualization
This heatmap shows the cross-validated accuracy for different combinations
of an SVM's C and gamma parameters. This allows you to visually identify
the region of best performance.
Hyperparameter Tuning: Code
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# 1. Define the grid of parameters you want to search
param_grid = {
    'C': [0.1, 1, 10],        # Test these regularization values
    'gamma': [1, 0.1, 0.01],  # Test these kernel coefficient values
    'kernel': ['rbf']
}

# 2. Create the GridSearchCV object
grid_search = GridSearchCV(
    estimator=SVC(),        # The model you want to tune
    param_grid=param_grid,  # The parameter grid to search
    cv=5,                   # Number of folds for cross-validation
    scoring='accuracy',     # The metric to optimize
    verbose=1               # Set to 1 or higher to see progress updates
)

# 3. Fit it to the data. This will start the search.
grid_search.fit(X_train_scaled, y_train)

print(f"Best Parameters Found: {grid_search.best_params_}")