Module 3
Machine Learning (ML) differs from Traditional Programming in several fundamental ways. Here’s a
comparison:
1. Approach to Problem-Solving
Traditional Programming:
o Developers write explicit rules (logic) to process input data and produce output.
Machine Learning:
o Instead of writing rules, the system learns patterns from data to make predictions.
2. Handling Complexity
Traditional Programming:
o Struggles with complex, ambiguous tasks (e.g., image recognition, natural language
processing).
Machine Learning:
o Excels at problems where rules are hard to define (e.g., spam detection,
recommendation systems).
3. Adaptability
Traditional Programming:
o Requires manual updates if rules change (e.g., updating tax laws means rewriting
code).
Machine Learning:
o Adapts automatically when retrained on new data (e.g., a spam filter keeps up with new spam patterns without code rewrites).
4. Data Dependency
Traditional Programming:
o Relies on logic written by developers.
Machine Learning:
o Relies on large amounts of quality data; the model is only as good as the data it learns from.
5. Debugging and Interpretability
Traditional Programming:
o Easier to debug (the logic is explicit and traceable).
Machine Learning:
o Harder to debug (errors may come from data, model choice, or hyperparameters).
o Some models (e.g., deep learning) act as "black boxes" (hard to interpret).
6. Use Cases
Summary
Example: Predicting a house's price from features such as:
o Number of bedrooms
o Location
Loss Function:
Measures how well the model performs (the difference between predictions and true values).
Given:
A dataset D = {(x1, y1), (x2, y2), ..., (xn, yn)}
Goal:
Find a function h ∈ H that minimizes the expected loss over new, unseen data:
h* = argmin_{h ∈ H} E_{(x,y)}[L(h(x), y)]
Interpretation:
The model should generalize (perform well on unseen data, not just training data).
Avoid overfitting (memorizing training data) and underfitting (failing to learn patterns).
Problem Statement:
Predict the price of a house from its size.
Formalization:
1. Input (X): House sizes [1000, 1500, 2000, ...].
2. Output (Y): The corresponding house prices.
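To make this concrete, here is a minimal sketch that fits a linear hypothesis h(x) = wx + b to (size, price) pairs by least squares. The sizes and prices are made-up illustrative numbers, not data from this module.
Python code:
import numpy as np

# Hypothetical training data: house sizes (sq ft) and prices (in $1000s)
X = np.array([1000, 1500, 2000, 2500, 3000])
Y = np.array([200, 270, 340, 410, 480])

# Fit h(x) = w*x + b by minimizing squared loss (closed-form least squares)
w, b = np.polyfit(X, Y, deg=1)

print(f"h(x) = {w:.3f}*x + {b:.1f}")
print("Predicted price for 1800 sq ft:", w * 1800 + b)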
1. What is a Model?
A model is a function that maps input features (X) to output predictions (Ŷ).
Examples of Models:
Model               How it predicts                            Example use
Linear Regression   Ŷ = wX + b                                 Predicting house prices
Decision Tree       Splits data based on feature thresholds    Customer churn prediction
2. What are Parameters?
Parameters are the internal settings of a model that are learned from data during training.
The goal of training is to find the best parameters that minimize prediction errors.
Examples of Parameters:
Model            Parameters                        Role
Neural Network   Weights & biases in each neuron   Adjust how signals propagate
How Training Works:
1. Initialize Parameters
o Start with random or default values.
2. Make Predictions
o Compute Ŷ = Model(X).
3. Calculate Loss
o Compare predictions (Ŷ) with true values (Y) using a loss function (e.g., Mean Squared Error).
4. Update Parameters
5. Repeat
o Iterate steps 2-4 until the loss stops improving (see the sketch below).
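A minimal sketch of this loop for a linear model trained with gradient descent on Mean Squared Error; the data, learning rate, and epoch count are illustrative assumptions:
Python code:
import numpy as np

# Illustrative data: roughly Y = 2X + 1 with noise
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([3.1, 4.9, 7.2, 9.0, 11.1])

w, b = 0.0, 0.0   # 1. Initialize parameters
lr = 0.05         # learning rate (a hyperparameter, set before training)

for epoch in range(1000):                 # 5. Repeat
    Y_hat = w * X + b                     # 2. Make predictions
    loss = np.mean((Y - Y_hat) ** 2)      # 3. Calculate loss (MSE)
    dw = -2 * np.mean((Y - Y_hat) * X)    # 4. Update parameters via the
    db = -2 * np.mean(Y - Y_hat)          #    gradient of the MSE
    w -= lr * dw
    b -= lr * db

print(f"Learned parameters: w = {w:.2f}, b = {b:.2f}")  # close to 2 and 1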
Parameters                                         Hyperparameters
Learned from data (e.g., weights in a neural       Set before training (e.g., learning rate, number
network, coefficients in linear regression).       of trees in a random forest).
To build a reliable machine learning model, data is typically split into three distinct sets:
1. Training Data
2. Validation Data
3. Test Data
1. Training Data
Purpose
Used to fit the model's parameters; this is the data the model actually learns from.
Characteristics
The larger the training set, the better the model can learn (but needs to be balanced with
validation/test data).
Example
In image classification, the model sees labeled images (e.g., "cat" or "dog") and adjusts its
weights to minimize prediction errors.
2. Validation Data
Purpose
Used to tune hyperparameters (e.g., learning rate, number of layers in a neural network).
Characteristics
Not used to update the model's parameters directly; it gives feedback for tuning during development.
Example
Trying different learning rates (0.01 vs. 0.001) and picking the one that performs best on the validation set.
3. Test Data
Purpose
Used for the final, unbiased evaluation of the model.
Characteristics
Used only once, after all training and hyperparameter tuning are complete.
Example
After training a spam classifier, you evaluate its accuracy on a held-out test set of emails.
Problem                                                      Solution
Overfitting (model works well on training data but fails    Validation set checks performance
on new data)                                                 during training.
Example: If 20% of data is "spam," each set (train/val/test) keeps ~20% spam.
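A minimal sketch of a stratified 60/20/20 split with scikit-learn; the toy data and proportions are illustrative choices:
Python code:
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data: 100 samples, ~20% positive ("spam") class
X = np.random.rand(100, 5)
y = np.array([1] * 20 + [0] * 80)

# Carve out the test set first (20%), preserving class proportions
X_temp, X_test, y_temp, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# Split the remainder into train (60%) and validation (20% overall)
X_train, X_val, y_train, y_val = train_test_split(
    X_temp, y_temp, test_size=0.25, stratify=y_temp, random_state=42)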
Key Functions:
1. Control Decisions
o Parameters control how the model makes decisions (e.g., weights in a neural
network, coefficients in linear regression).
2. Optimize Predictions
3. Capture Patterns
Types of Parameters
Key Tradeoffs
1. Bias-Variance Tradeoff
o More parameters → Lower bias (fits training well) but higher variance (overfitting).
o Linear models (few params) are interpretable; deep learning (many params) is
powerful but opaque.
1. Classification Metrics
(A) Confusion Matrix
A table showing:
              Predicted: Yes   Predicted: No
Actual: Yes   TP               FN
Actual: No    FP               TN
(B) Accuracy
Formula:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
(C) Precision
Formula: Precision = TP / (TP + FP)
Use Case: Important when FP are costly (e.g., falsely flagging legit emails as spam).
(D) Recall
Formula: Recall = TP / (TP + FN)
Use Case: Important when FN are costly (e.g., missing a cancer diagnosis).
(E) F1-Score
Formula: F1 = 2 × (Precision × Recall) / (Precision + Recall)
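The four formulas in one small sketch; TP, FP, and FN match the email-classifier example at the end of this module, while TN = 85 is an assumed value for illustration:
Python code:
TP, FP, FN, TN = 8, 5, 2, 85  # TN is an assumption, not from the example

accuracy = (TP + TN) / (TP + TN + FP + FN)
precision = TP / (TP + FP)
recall = TP / (TP + FN)
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy={accuracy:.3f}, Precision={precision:.3f}, "
      f"Recall={recall:.3f}, F1={f1:.3f}")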
2. Regression Metrics
Used when the output is a continuous value (e.g., house price, temperature).
Formula: MAE = (1/n) Σ |yi − ŷi|
Formula: MSE = (1/n) Σ (yi − ŷi)²
3. Clustering Metrics
Silhouette Score: Measures how similar a sample is to its own cluster vs. other clusters.
4. Key Takeaways
1. Accuracy
Best for: Balanced datasets where all classes are equally important
2. Precision
When to use: When false positives are costly (e.g., spam detection)
Example: High precision means when your model says "spam", it's very likely correct
3. Recall (Sensitivity)
When to use: When false negatives are dangerous (e.g., cancer detection)
Example: High recall means your model finds most actual positive cases
4. Confusion Matrix
                   Predicted
                   Positive   Negative
Actual: Positive   TP         FN
Actual: Negative   FP         TN
5. Bias-Variance Tradeoff
o High bias = model is too simple (e.g., linear model for complex data)
o High variance = model is too complex (it fits noise in the training data)
6. Overfitting vs Underfitting
                       Overfitting             Underfitting
Training Performance   Excellent               Poor
Solutions              - Regularization        - More complex model
                       - More training data    - More features
                       - Feature selection     - Longer training
                       - Early stopping
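As a sketch of one overfitting remedy from the table, regularization: ridge regression penalizes large weights, which tames a model that is too flexible. The degree-15 polynomial, noise level, and alpha are illustrative assumptions:
Python code:
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 1, 20)).reshape(-1, 1)
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, 20)

# A degree-15 polynomial easily overfits 20 noisy points
overfit = make_pipeline(PolynomialFeatures(15), LinearRegression()).fit(X, y)
# The same model with an L2 penalty on the weights
ridge = make_pipeline(PolynomialFeatures(15), Ridge(alpha=0.01)).fit(X, y)

X_new = np.array([[0.05], [0.5], [0.95]])
print("Unregularized:", overfit.predict(X_new))
print("Ridge:", ridge.predict(X_new))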
Practical Implications
Model selection: Use validation set to find sweet spot in bias-variance tradeoff
Debugging: high training error suggests underfitting; a large gap between training and validation error suggests overfitting.
Real-world Example
Overfitting = Model flags transactions as fraud based on random quirks in training data
Underfitting = Model misses obvious fraud because its rules are too simple
1. Supervised Learning
Task             Output                Examples
Classification   Discrete categories   Spam detection, image recognition
Algorithms:
Linear/Logistic Regression
Pros:
✔ Predictions are interpretable
✔ Well-established techniques
Cons:
❌ Requires labeled data (often expensive)
❌ May not generalize beyond training distribution
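A minimal supervised-learning sketch with scikit-learn; the synthetic dataset stands in for real labeled data:
Python code:
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic labeled data (features X, labels y)
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("Test accuracy:", accuracy_score(y_test, clf.predict(X_test)))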
2. Unsupervised Learning
Algorithms:
K-Means, DBSCAN
Autoencoders
Pros:
✔ Works with unlabeled data
✔ Reveals hidden insights
Cons:
❌ Harder to evaluate objectively
❌ Results may be ambiguous
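A minimal unsupervised sketch: K-Means finds clusters in unlabeled data, and the silhouette score (see the metrics section above) gauges cluster quality. The two-blob data is illustrative:
Python code:
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Unlabeled toy data: two well-separated blobs
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(3, 0.5, (50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print("Cluster sizes:", np.bincount(km.labels_))
print("Silhouette score:", silhouette_score(X, km.labels_))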
3. Semi-Supervised Learning
Applications:
Medical imaging
Speech recognition
Approaches:
Self-training
Co-training
Pros:
✔ Reduces labeling costs
✔ More robust than pure supervised
Cons:
❌ Complex implementation
❌ Quality depends on initial labeled data
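A minimal self-training sketch with scikit-learn's SelfTrainingClassifier, which follows the library's convention of marking unlabeled samples with -1; keeping only ~10% of the labels is an illustrative choice:
Python code:
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=300, random_state=0)

# Pretend only ~10% of labels are known; the rest are marked -1
rng = np.random.default_rng(0)
y_partial = y.copy()
y_partial[rng.random(len(y)) > 0.1] = -1

# Self-training: iteratively pseudo-labels confident unlabeled samples
model = SelfTrainingClassifier(LogisticRegression(max_iter=1000))
model.fit(X, y_partial)
print("Accuracy on all data:", model.score(X, y))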
4. Reinforcement Learning
Applications:
Robotics control
Autonomous vehicles
Algorithms:
Q-Learning
Pros:
✔ Can handle complex, dynamic environments
✔ Doesn't require pre-labeled data
Cons:
❌ Computationally expensive
❌ Hard to design proper reward functions
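A minimal tabular Q-learning sketch on a made-up 5-state corridor (move left or right, reward only at the goal); the hyperparameters are illustrative:
Python code:
import numpy as np

n_states, n_actions = 5, 2              # actions: 0 = left, 1 = right
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.1   # learning rate, discount, exploration
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != n_states - 1:            # the last state is the goal
        # Epsilon-greedy action selection
        if rng.random() < epsilon:
            a = int(rng.integers(n_actions))
        else:
            a = int(np.argmax(Q[s]))
        s_next = min(s + 1, n_states - 1) if a == 1 else max(s - 1, 0)
        r = 1.0 if s_next == n_states - 1 else 0.0
        # Q-learning update rule
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

print("Greedy policy (0=left, 1=right):", np.argmax(Q[:-1], axis=1))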
5. Self-Supervised Learning
Definition: The model creates its own labels from the structure of the data (e.g., predicting masked words in a sentence).
Pros:
✔ Eliminates manual labeling
✔ Powerful for pre-training
Cons:
❌ Requires massive data
❌ Task-specific design needed
6. Transfer Learning
Definition: Applies knowledge from one task to another
Approach:
Fine-tune a pre-trained model on the new task, or use it as a fixed feature extractor.
Applications:
All domains.
Pros:
✔ Saves computation time
✔ Works well with limited target data
Cons:
❌ Potential negative transfer if domains mismatch
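A common fine-tuning sketch in PyTorch/torchvision (assuming a recent torchvision with the weights API; the 10-class head is an illustrative choice):
Python code:
import torch.nn as nn
from torchvision import models

# Load a network pre-trained on ImageNet
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pre-trained feature extractor
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer with a fresh head for the target task
model.fc = nn.Linear(model.fc.in_features, 10)
# During training, only model.fc's parameters are updated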
7. Memory-Based Learning
Definition: Systems that store and retrieve specific training instances to make predictions.
Key Types:
1. Instance-Based Learning: predictions are made by comparing new inputs directly to stored training examples (e.g., k-NN).
2. Case-Based Reasoning: new problems are solved by retrieving and adapting solutions from similar past cases.
Characteristics:
Pros:
✔ Adapts easily to new data
✔ Handles complex relationships
Cons:
❌ Computationally expensive at runtime
❌ Sensitive to irrelevant features
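A minimal instance-based sketch: k-NN stores the training set and predicts by majority vote among the k nearest stored examples (the Iris dataset and k = 5 are illustrative choices):
Python code:
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "Training" just stores the instances; all work happens at prediction time
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print("Test accuracy:", knn.score(X_test, y_test))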
8. Hebbian Learning
Core Principle: "Neurons that fire together, wire together" (Donald Hebb, 1949)
Mechanism:
If two connected neurons activate together:
o Connection strengthens
If activation is uncorrelated:
o Connection weakens
Mathematical Form:
Δw_ij = η x_i x_j
(where η is the learning rate, and x_i and x_j are the neuron activations)
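A minimal NumPy sketch of this rule; the activations are made-up values. Note that the plain rule lets weights grow without bound, which is why variants such as BCM add normalization or thresholds:
Python code:
import numpy as np

eta = 0.1                        # learning rate
x = np.array([1.0, 0.0, 1.0])    # pre-synaptic activations (illustrative)
y = np.array([1.0, 1.0])         # post-synaptic activations (illustrative)

W = np.zeros((len(y), len(x)))   # weight w_ij from input j to output i
for _ in range(10):
    W += eta * np.outer(y, x)    # Hebbian update: co-active pairs strengthen

print(W)  # weights grow only where both neurons are active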
Applications:
Neuromorphic computing
Modern Variants:
A. Competitive Learning
Used in:
o Self-organizing maps (SOMs)
o Vector quantization
B. Error-Corrective Learning
Includes:
o Perceptron Learning Rule
o Delta Rule
2. Actor-Critic Methods:
D. Meta-Learning
"Learning to learn"
Includes:
E. Neuromodulated Learning
Enables:
o Context-dependent learning
Summary Comparison
Learning Type | Supervision | Mechanism | Strengths | Weaknesses | Best Use Cases | Algorithms
Unsupervised | No labels | Discovers hidden patterns | Works with unlabeled data | Hard to evaluate | Clustering, Dimensionality Reduction | K-Means, PCA, GANs
Semi-Supervised | Partial labels | Uses both labeled/unlabeled data | Reduces labeling costs | Complex implementation | Medical imaging, Speech recognition | Label Propagation, Self-Training
Reinforcement | Reward signals | Maximizes cumulative reward | Handles dynamic environments | Needs careful reward design | Game AI, Robotics | Q-Learning, PPO
Self-Supervised | Auto-generated | Creates labels from data structure | Eliminates manual labeling | Requires massive data | NLP, Computer Vision | BERT, Contrastive Learning
Transfer Learning | Varies | Leverages pre-trained models | Saves computation, works with little data | Domain mismatch risk | All domains | Fine-tuning, Feature Extraction
Memory-Based | Varies | Stores/retrieves instances | Adapts to new data easily | Computationally expensive | Recommendation systems | k-NN, Case-Based Reasoning
Hebbian | None | Strengthens co-active neuron connections | Biologically plausible | Limited to simple tasks | Neuromorphic computing | BCM Theory
Meta-Learning | Multi-task | Learns learning strategies | Fast adaptation to new tasks | Complex, data-hungry | Few-shot learning | MAML, Reptile
                       Predicted: Negative (0)   Predicted: Positive (1)
Actual: Negative (0)   True Negative (TN)        False Positive (FP)
Actual: Positive (1)   False Negative (FN)       True Positive (TP)
False Positives (FP): Negative samples wrongly predicted as positive (Type I error).
False Negatives (FN): Positive samples wrongly predicted as negative (Type II error).
Example:
Model Predictions (email spam classifier; counts used in the calculations below):
             Predicted: 0   Predicted: 1
Actual: 0    TN             FP = 5
Actual: 1    FN = 2         TP = 8
Python code:
from sklearn.metrics import confusion_matrix

# y_true and y_pred are arrays of true and predicted class labels
cm = confusion_matrix(y_true, y_pred)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × (Precision × Recall) / (Precision + Recall)
Example Calculations (from the email classifier):
Precision = 8 / (8 + 5) = 61.5%
Recall = 8 / (8 + 2) = 80%
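The F1-Score then follows from these two values: F1 = 2 × (0.615 × 0.80) / (0.615 + 0.80) ≈ 69.6%.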