ML Algorithms Explained
ML Algorithms: Code & Concepts
scikit-learn
mlpfu.pages.dev
Linear Regression: Theory
Concept: A foundational algorithm that models the linear relationship between features and a continuous target. It fits a line (or hyperplane) that minimizes the sum of squared errors, i.e. the sum of the squared vertical distances from each point to the line; a short numeric sketch of this objective follows the bullets below.
Pros: Simple to understand, highly interpretable coefficients, fast to
train.
Cons: Assumes the relationship is linear, can be sensitive to outliers.
When to Use: Excellent as a starting point or baseline for any regression
problem. Use it when you need a simple, explainable model.
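As a quick illustration of that objective, the sketch below computes the sum of squared errors for one candidate line on a handful of made-up points. The data values and the chosen slope w and intercept b are purely illustrative and are not part of the slide's own code.

import numpy as np

# Toy data points (illustrative values only)
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 8.1])

# One candidate line: y_hat = w * x + b
w, b = 2.0, 0.0
y_hat = w * x + b

# Sum of squared errors: the quantity linear regression minimizes over w and b
sse = np.sum((y - y_hat) ** 2)
print(sse)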
Linear Regression: Visualization
This plot shows the raw data points and the best-fitting line found by the model. The goal is to minimize the sum of squared vertical distances from all points to this line.
Linear Regression: Code
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# Scaling features is good practice; it keeps coefficients comparable and
# matters for the regularized models shown later.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Create and train the model
model = LinearRegression(
    fit_intercept=True  # Calculates the y-intercept. Set to False if data is pre-centered.
)
model.fit(X_train_scaled, y_train)
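A natural next step, assuming y_test exists alongside X_test from the same train/test split, is to score the fitted model on held-out data. This is a usage sketch rather than part of the original slide:

from sklearn.metrics import mean_squared_error, r2_score

# Evaluate on the held-out test set (assumes y_test is available)
y_pred = model.predict(X_test_scaled)
print("MSE:", mean_squared_error(y_test, y_pred))
print("R^2:", r2_score(y_test, y_pred))
print("Coefficients:", model.coef_, "Intercept:", model.intercept_)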
Polynomial Regression: Theory
Concept: A powerful variation of linear regression that can model non-linear, curved relationships. It works by creating new polynomial features (e.g., x², x³) from the original features and then fitting a linear model to this expanded feature set; a tiny sketch of the expansion follows the bullets below.
Pros: Can capture complex, non-linear patterns.
Cons: Prone to overfitting if the degree is too high. Choosing the right
degree can be tricky.
When to Use: When you visually inspect your data and see a clear curve or
non-linear trend.
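To make the "expanded feature set" concrete, here is a small sketch of what PolynomialFeatures produces for a single sample with two features at degree=2. The input values are made up, and get_feature_names_out assumes a reasonably recent scikit-learn:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

X = np.array([[2.0, 3.0]])  # one sample: x1 = 2, x2 = 3
poly = PolynomialFeatures(degree=2, include_bias=False)
print(poly.fit_transform(X))         # [[2. 3. 4. 6. 9.]] -> x1, x2, x1^2, x1*x2, x2^2
print(poly.get_feature_names_out())  # ['x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']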
Polynomial Regression: Visualization
This plot shows how a degree-2 polynomial model can fit a curved
relationship in the data much better than a straight line could.
Polynomial Regression: Code
from sklearn.preprocessing import StandardScaler, PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# A pipeline is the best way to chain the scaling, feature creation, and modeling steps.
model = make_pipeline(
    StandardScaler(),
    PolynomialFeatures(
        degree=2,           # The degree of the polynomial. Higher = more complex curve.
        include_bias=False  # Avoids a redundant bias term that LinearRegression handles.
    ),
    LinearRegression()
)
model.fit(X_train, y_train)
Regularization: Theory
Concept: A technique to combat overfitting by adding a penalty to the loss function based on the size of the model's coefficients. This discourages the model from becoming too complex; a short sketch of the two penalty terms follows the bullets below.
Ridge (L2): Shrinks all coefficients towards zero, but never to exactly
zero. Good for general-purpose shrinkage.
Lasso (L1): Can shrink coefficients all the way to zero, effectively acting
as a form of automatic feature selection.
When to Use: Whenever you have a model with many features or a
complex model (like polynomial regression) that might be overfitting.
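The sketch below spells out what each penalty adds to the loss, up to scikit-learn's exact scaling conventions. It is a hand-rolled illustration with a made-up coefficient vector, not the library's internal code:

import numpy as np

coef = np.array([0.5, -3.0, 0.0, 1.2])  # illustrative coefficient vector
alpha = 1.0                              # regularization strength

l2_penalty = alpha * np.sum(coef ** 2)     # Ridge adds alpha * sum(w_j^2) to the squared error
l1_penalty = alpha * np.sum(np.abs(coef))  # Lasso adds alpha * sum(|w_j|) instead
print(l2_penalty, l1_penalty)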
Regularization: Visualization
These plots show how coefficients change as the regularization strength
( alpha ) increases. Notice how Lasso (right) forces coefficients to become
exactly zero, while Ridge (left) only shrinks them.
Regularization: Code
from sklearn.linear_model import Ridge, Lasso

# Ridge (L2) Regression - good for reducing model complexity
ridge_model = Ridge(
    alpha=1.0  # Regularization strength. Higher alpha = simpler model.
)
ridge_model.fit(X_train_scaled, y_train)

# Lasso (L1) Regression - good for feature selection
lasso_model = Lasso(
    alpha=0.1  # Regularization strength. Higher alpha = more features set to zero.
)
lasso_model.fit(X_train_scaled, y_train)
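A quick way to see the practical difference once both models are fit is to inspect their coefficients. This follow-up assumes the ridge_model and lasso_model objects from above:

import numpy as np

# Lasso drives some coefficients to exactly zero; Ridge only shrinks them.
print("Ridge coefficients:", ridge_model.coef_)
print("Lasso coefficients:", lasso_model.coef_)
print("Features dropped by Lasso:", int(np.sum(lasso_model.coef_ == 0)))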
Logistic Regression: Theory
Concept: The go-to algorithm for binary classification. It calculates the probability of an instance belonging to a class by passing the output of a linear equation through the sigmoid function, which squashes the result to a value between 0 and 1 (a short sketch of the sigmoid follows the bullets below).
Pros: Fast, highly interpretable, provides probabilities.
Cons: Assumes a linear decision boundary between classes.
When to Use: A first-choice algorithm for any binary classification task,
especially when interpretability is important.
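Here is a minimal sketch of the sigmoid step described above; z stands for the output of the linear equation, and the example inputs are illustrative:

import numpy as np

def sigmoid(z):
    # Squashes any real number into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(np.array([-3.0, 0.0, 3.0])))  # approximately [0.047, 0.5, 0.953]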
Logistic Regression: Visualization
The line represents the decision boundary learned by the model. Points on
one side are classified as class 0, and points on the other side are classified
as class 1.
Logistic Regression: Code
from sklearn.linear_model import LogisticRegression

# Create and train the model
model = LogisticRegression(
    penalty='l2',       # Specifies the regularization type ('l1', 'l2').
    C=1.0,              # Inverse of regularization strength. Smaller C = stronger penalty.
    solver='liblinear', # Optimization algorithm. Good choice for small datasets.
    multi_class='ovr'   # Strategy for multi-class problems: One-vs-Rest.
)
model.fit(X_train_scaled, y_train)
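Because the model outputs probabilities, a useful follow-up (assuming y_test is available from the same split) is to look at both the predicted labels and the class-1 probabilities:

from sklearn.metrics import accuracy_score

y_pred = model.predict(X_test_scaled)               # hard class labels (0 or 1)
y_proba = model.predict_proba(X_test_scaled)[:, 1]  # probability of class 1
print("Accuracy:", accuracy_score(y_test, y_pred))
print("First five class-1 probabilities:", y_proba[:5])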
Naive Bayes: Theory
Concept: A fast, probabilistic classifier based on Bayes' Theorem. Its core is the "naive" assumption that all features are independent of one another given the class. While this is rarely true, the algorithm is surprisingly effective in practice.
Pros: Extremely fast, performs very well with high-dimensional data
(many features).
Cons: The independence assumption is a strong one and often not true.
When to Use: A classic choice for text classification (e.g., spam filtering)
where the number of features (words) is very large.
Naive Bayes: Code
from sklearn.naive_bayes import GaussianNB
# This version (GaussianNB) is used when the features are continuous
# and assumed to follow a normal (Gaussian) distribution.
# Other versions include MultinomialNB (for word counts) and BernoulliNB (for binary features).
model = GaussianNB()
model.fit(X_train_scaled, y_train)
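Since the comment above mentions MultinomialNB for word counts, here is a small self-contained sketch of the text-classification use case; the toy messages and labels are invented purely for illustration:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

texts = ["win a free prize now", "meeting moved to 3pm", "free cash offer", "see you at lunch"]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam (toy labels)

spam_clf = make_pipeline(CountVectorizer(), MultinomialNB())
spam_clf.fit(texts, labels)
print(spam_clf.predict(["free prize waiting"]))  # most likely [1]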
K-Nearest Neighbors (KNN): Theory
Concept: A simple, "lazy" algorithm that makes predictions by looking at the 'K' closest data points in the training set. It classifies a new point based on a majority vote of its neighbors. It doesn't "learn" a model; it just memorizes the entire training dataset (a toy version of the distance-and-vote step follows the bullets below).
Pros: Very simple to understand, no training phase required.
Cons: Can be very slow at prediction time on large datasets, sensitive to
irrelevant features and the scale of the data.
When to Use: For simple problems or as a baseline. When the decision
boundary is highly irregular and you don't need lightning-fast predictions.
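The distance-and-vote step can be sketched by hand in a few lines; the 1-D toy data below is purely illustrative:

import numpy as np

X_train_toy = np.array([1.0, 2.0, 3.0, 8.0, 9.0])  # stored training points
y_train_toy = np.array([0, 0, 0, 1, 1])            # their class labels
x_new, k = 2.5, 3

dists = np.abs(X_train_toy - x_new)   # distance from the new point to every stored point
nearest = np.argsort(dists)[:k]       # indices of the K closest points
votes = y_train_toy[nearest]
print(np.bincount(votes).argmax())    # majority vote -> 0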
K-Nearest Neighbors (KNN): Visualization
These plots show how the decision boundary changes with K. A small K (left)
creates a complex, jagged boundary that can be prone to noise. A larger K
(right) creates a smoother, more generalized boundary.
K-Nearest Neighbors (KNN): Code
from sklearn.neighbors import KNeighborsClassifier

# Create and train the model
model = KNeighborsClassifier(
    n_neighbors=5,      # The number of neighbors to use (K). This is the key hyperparameter.
    weights='uniform',  # 'uniform' gives all neighbors equal weight; 'distance' weights closer neighbors more.
    metric='minkowski', # The distance metric. 'minkowski' with p=2 is the standard Euclidean distance.
    p=2
)
model.fit(X_train_scaled, y_train)
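A common follow-up is to compare a few values of K with cross-validation; this sketch reuses the scaled training data from the earlier slides:

from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

for k in [1, 5, 15]:
    knn = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(knn, X_train_scaled, y_train, cv=5)
    print(f"K={k}: mean CV accuracy = {scores.mean():.3f}")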
Support Vector Machines (SVM): Theory
Concept: A powerful and versatile classifier that works by finding the
optimal hyperplane that best separates the classes. "Optimal" means the
one with the largest possible margin—the distance between the hyperplane
and the nearest points from each class (the "support vectors").
Pros: Very effective in high-dimensional spaces, memory efficient as it
only uses a subset of points (support vectors).
Cons: Can be slow to train on very large datasets, less interpretable
than other models.
When to Use: For complex classification problems where you need high
accuracy, even if the data is not linearly separable (thanks to the kernel
trick).
Support Vector Machines (SVM): Visualization
This plot shows the decision boundary (solid line), the margins (dashed
lines), and the circled support vectors that define the margin.
Support Vector Machines (SVM): Code
from sklearn.svm import SVC

# Create and train the model
model = SVC(
    kernel='rbf',  # Kernel type. 'rbf' is a powerful default for non-linear problems; 'linear' for linear data.
    C=1.0,         # Regularization parameter. Trades off a wide margin against classifying all points correctly.
    gamma='scale'  # Kernel coefficient for 'rbf'. 'scale' is a robust default setting.
)
model.fit(X_train_scaled, y_train)
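After fitting, the support vectors described on the theory slide are exposed directly on the model; the accuracy check assumes y_test is available:

# The points that define the margin
print("Support vectors per class:", model.n_support_)
print("Total support vectors:", len(model.support_vectors_))

# Held-out accuracy
print("Test accuracy:", model.score(X_test_scaled, y_test))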
Decision Tree: Theory
Concept: A highly interpretable model that creates a flowchart of if-then-
else rules based on the data's features. It recursively splits the data into
subsets that are as "pure" (homogeneous) as possible.
Pros: Very easy to understand and visualize, requires no feature scaling.
Cons: Individual trees are prone to overfitting and can be unstable
(small changes in data can lead to a completely different tree).
When to Use: When model interpretability is a top priority. Also serves as
the building block for more powerful ensemble models like Random Forests.
Decision Tree: Visualization
This image shows the flowchart-like structure of a trained decision tree. You
can follow the path from the root node down to a leaf to get a prediction.
Decision Tree: Code
from sklearn.tree import DecisionTreeClassifier

# Create and train the model. Note: Trees do not require feature scaling.
model = DecisionTreeClassifier(
    criterion='gini',   # The function to measure the quality of a split ('gini' or 'entropy').
    max_depth=3,        # The maximum depth of the tree. Setting this is the primary way to prevent overfitting.
    min_samples_leaf=1  # The minimum number of samples required to be at a leaf node.
)
model.fit(X_train, y_train)
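The learned if-then-else rules can be printed as text, which pairs well with the visualization slide; pass feature_names if you have column names, otherwise generic names are used:

from sklearn.tree import export_text

# Print the tree as nested if/else rules
print(export_text(model))

# Or render it graphically:
# from sklearn.tree import plot_tree
# plot_tree(model, filled=True)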
Hyperparameter Tuning: Theory
Concept: The process of finding the optimal settings for a model's
parameters that are not learned from the data (e.g., K in KNN, C in SVM).
This is done by systematically searching through a "grid" of possible
parameter values and evaluating each combination using cross-validation.
Why: Default parameters are rarely optimal. Tuning is crucial for
maximizing model performance.
How: GridSearchCV automates this search, making it a standard and
essential step in the modeling pipeline.
Hyperparameter Tuning: Visualization
This heatmap shows the cross-validated accuracy for different combinations
of an SVM's C and gamma parameters. This allows you to visually identify
the region of best performance.
Hyperparameter Tuning: Code
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# 1. Define the grid of parameters you want to search
param_grid = {
    'C': [0.1, 1, 10],        # Test these regularization values
    'gamma': [1, 0.1, 0.01],  # Test these kernel coefficient values
    'kernel': ['rbf']
}

# 2. Create the GridSearchCV object
grid_search = GridSearchCV(
    estimator=SVC(),        # The model you want to tune
    param_grid=param_grid,  # The parameter grid to search
    cv=5,                   # Number of folds for cross-validation
    scoring='accuracy',     # The metric to optimize
    verbose=1               # Set to 1 or higher to see progress updates
)

# 3. Fit it to the data. This will start the search.
grid_search.fit(X_train_scaled, y_train)

print(f"Best Parameters Found: {grid_search.best_params_}")