
Summary Machine Learning

The document provides an overview of machine learning, detailing its definition, types (supervised, unsupervised, reinforcement), and key concepts such as performance measures and common algorithms. It also covers regression techniques, decision trees, artificial neural networks, deep learning, and reinforcement learning, highlighting their applications and challenges. Additionally, it discusses the importance of data quality, ethics, and bias in machine learning systems.

Uploaded by Sachit Varshney
Copyright © All Rights Reserved

Unit 1: Introduction to Machine Learning

• Definition: ML enables computers to learn from data and improve at tasks over time without being explicitly
programmed [1]. For example, Netflix uses ML to recommend movies by learning user preferences
from viewing history.
• Types of Learning:
• Supervised Learning: Trains on labeled data (input–output pairs). The model learns to map inputs to
known outputs [1]. Example: classifying fruit images after training on images labeled “apple” or “banana.”
• Unsupervised Learning: Trains on unlabeled data to discover patterns or groupings [2]. Example:
grouping coins by size or color without any given labels.
• Reinforcement Learning: An agent learns by interacting with an environment and receiving rewards or
penalties, without explicit labels [3, 4].
• Key Differences (Supervised vs. Unsupervised; Classification vs. Clustering):
• Supervised learning uses pre-assigned labels; unsupervised learning has none. In classification (a supervised
task), the classes and their labels are known beforehand, and each data point is assigned one of those
labels. In clustering (an unsupervised task), groupings emerge from the data: class labels are unknown, and
the number of clusters is not given by the problem (although some algorithms ask the user to choose it) [5, 6].
• Example: In classification, we may label emails as “spam” or “not spam” (labels known). In clustering,
we might group emails by similarity without knowing the categories in advance.
• Well-Defined Learning Problem: Characterized by (T, P, E): a Task T, a Performance measure P, and
Experience E [7]. E.g., for a checkers-playing program: T = playing checkers, P = % of games won,
E = self-play games [7].
• Performance Measures: Important qualities of ML models include generality (works on varied
data), efficiency (learns quickly), robustness (handles noise), efficacy (overall effectiveness), and ease
of implementation [8, 9]. For instance, a robust voice recognizer still works in noisy rooms [10].
• ML Approaches: Include algorithms such as Artificial Neural Networks (ANNs), Decision Trees, Bayesian
methods, Support Vector Machines (SVMs), and Genetic Algorithms, among others [11]. Each approach suits
different kinds of problems (e.g., SVMs for classification, Genetic Algorithms for optimization).
• Data Science vs Machine Learning: Data Science involves data collection, cleaning, and analysis,
often using tools like Tableau or Spark to extract insights. Machine Learning is a subset of AI focused
on algorithms that learn from data [12]. Data Science might prepare data for business analysis, whereas ML
builds predictive models (e.g., recommendation engines or image recognition [13, 14]).
• Issues in ML:
• Data Quality: Poor or biased data leads to unreliable models. For example, an unrepresentative facial
dataset can bias a face-recognition ML model.
• Overfitting: Models (like very deep trees) may learn training noise, performing poorly on new data
(mitigated by pruning or regularization).
• Transparency: Complex models (e.g., deep nets) can be “black boxes,” making it hard to explain
decisions (a concern in applications like credit scoring).
• Ethics and Bias: ML systems can inherit biases from data. Diverse teams and careful validation are
needed to avoid unfair models.

Unit 2: Regression & Bayesian Learning (and SVM)
• Linear Regression: Models the relationship between a dependent variable Y and one or more
independent variables X. For simple linear regression:

$$Y = a + bX + u,$$

where a is the intercept, b the slope (change in Y per unit X), and u the error term [15]. E.g., predicting
house price (Y) from house size (X). Multiple regression extends this to
$Y = a + b_1X_1 + b_2X_2 + \dots + b_nX_n + u$.
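A minimal sketch of fitting simple linear regression by ordinary least squares, assuming plain Python lists as input. The function name and the house-size/price numbers are hypothetical. The slope is the covariance of X and Y divided by the variance of X, and the intercept makes the fitted line pass through the two means.

```python
def fit_simple_linear_regression(xs, ys):
    """Least-squares estimates of a (intercept) and b (slope) in Y = a + bX + u."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope: covariance of X and Y divided by the variance of X
    cov_xy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    var_x = sum((x - x_mean) ** 2 for x in xs)
    b = cov_xy / var_x
    # Intercept: the fitted line passes through (x_mean, y_mean)
    a = y_mean - b * x_mean
    return a, b


# Hypothetical data: house size (m^2) and price (thousands)
sizes = [50, 80, 100, 120]
prices = [150, 240, 310, 350]
a, b = fit_simple_linear_regression(sizes, prices)
predicted_price = a + b * 90  # estimate for a 90 m^2 house
```

Multiple regression follows the same least-squares idea but is usually solved with matrix algebra rather than this one-feature formula.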
• Logistic Regression: A classification method for binary outcomes. It uses the logistic (sigmoid)
function to model the probability that an input belongs to class 1:

$$p = \frac{1}{1 + e^{-z}}, \qquad z = b_0 + b_1X_1 + \dots + b_nX_n.$$

The target is categorical (0 or 1). For example, predicting whether an email
is spam (1) or not (0) based on its features. As a supervised model, logistic regression estimates the
probability of class membership [16].
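The sketch below fits logistic regression by batch gradient descent on the log-loss, assuming a tiny hypothetical one-feature dataset. The learning rate and epoch count are arbitrary choices, not values from these notes; real libraries use more sophisticated optimizers.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + e^(-z))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=5000):
    """Return (b0, [b1..bn]) minimizing the average log-loss by gradient descent."""
    n_features = len(X[0])
    b0, b = 0.0, [0.0] * n_features
    for _ in range(epochs):
        grad0, grad = 0.0, [0.0] * n_features
        for xi, ti in zip(X, y):
            p = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi)))
            err = p - ti  # derivative of the log-loss with respect to z
            grad0 += err
            for j in range(n_features):
                grad[j] += err * xi[j]
        b0 -= lr * grad0 / len(X)
        b = [bj - lr * gj / len(X) for bj, gj in zip(b, grad)]
    return b0, b


def predict_proba(b0, b, x):
    # p = sigmoid(z) with z = b0 + b1*x1 + ... + bn*xn
    return sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, x)))


# Hypothetical data: one feature (e.g., count of suspicious words), label 1 = spam
X = [[1], [2], [3], [6], [7], [8]]
y = [0, 0, 0, 1, 1, 1]
b0, b = fit_logistic(X, y)
```

A prediction of class 1 is usually made when the probability exceeds 0.5, i.e., when z > 0.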
• Linear vs Logistic Comparison: Linear regression predicts continuous values; logistic regression predicts the
probability of a binary class. Linear regression fits a straight line, whereas logistic regression passes a linear
combination of the inputs through a sigmoid. (See comparison: linear for house prices vs. logistic for
“disease vs no disease” [15, 17].)
• Bayes’ Theorem: A rule for updating probabilities given new evidence. In probability notation:

$$P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)},$$

where $P(A \mid B)$ is the posterior probability of hypothesis A given data B, $P(A)$ the prior,
$P(B \mid A)$ the likelihood, and $P(B)$ the evidence [18]. In ML, it is used to compute
posterior probabilities of classes given features.
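A worked example of the formula, assuming hypothetical spam-filter numbers: 20% of emails are spam, and a particular word appears in 50% of spam emails and 5% of non-spam emails. The evidence P(B) comes from the law of total probability.

```python
def bayes_posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A | B) via Bayes' theorem for a binary hypothesis A."""
    # Evidence P(B) by the law of total probability
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b


# Hypothetical numbers: P(spam) = 0.2, P(word | spam) = 0.5, P(word | not spam) = 0.05
p_spam_given_word = bayes_posterior(0.2, 0.5, 0.05)  # 0.10 / 0.14, about 0.714
```

Seeing the word raises the probability of spam from the 0.2 prior to roughly 0.71.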
• Naïve Bayes Classifier: A simple probabilistic classifier applying Bayes’ theorem with a strong
assumption that the features are independent of one another given the class [19]. Despite this “naive”
assumption, it often performs well (e.g., in text classification and spam filtering [19]). It computes
$P(C \mid x_1, \dots, x_n) \propto P(C) \prod_{i} P(x_i \mid C)$.
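A minimal multinomial Naïve Bayes text classifier, assuming whitespace tokenization and add-one (Laplace) smoothing, a common choice that these notes do not mention. It sums log-probabilities instead of multiplying raw probabilities to avoid numerical underflow. The training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    class_counts = Counter(labels)      # used for the prior P(C)
    word_counts = defaultdict(Counter)  # used for the likelihoods P(x_i | C)
    vocab = set()
    for doc, label in zip(docs, labels):
        words = doc.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab


def predict_naive_bayes(model, doc):
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log P(C)
        total_words = sum(word_counts[label].values())
        for w in doc.lower().split():
            # Add-one smoothing so an unseen word does not zero out the whole product
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


docs = ["win money now", "cheap money offer", "meeting at noon", "project meeting notes"]
labels = ["spam", "spam", "ham", "ham"]
model = train_naive_bayes(docs, labels)
```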
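• Bayes Optimal Classifier: The theoretically optimal classifier. Rather than relying only on the single most
probable (MAP) hypothesis, it combines the predictions of all hypotheses, weighted by their posterior
probabilities given the data D: $v = \arg\max_{v_j} \sum_{h_i \in H} P(v_j \mid h_i)\,P(h_i \mid D)$.
On average, no other method using the same hypothesis space and prior knowledge has lower expected error.
(Often intractable in practice, because it sums over every hypothesis.)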
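A toy illustration of how the Bayes optimal classifier can disagree with the MAP hypothesis, assuming three hypotheses with posteriors 0.4, 0.3, and 0.3 that each predict a label deterministically (so P(v | h) is 1 for the predicted label and 0 otherwise). The numbers are illustrative, in the style of textbook examples.

```python
def map_prediction(posteriors, predictions):
    # Use only the single most probable hypothesis
    best_h = max(posteriors, key=posteriors.get)
    return predictions[best_h]


def bayes_optimal_prediction(posteriors, predictions):
    # Sum P(h | D) over all hypotheses voting for each label (P(v | h) is 0 or 1 here)
    votes = {}
    for h, p_h in posteriors.items():
        label = predictions[h]
        votes[label] = votes.get(label, 0.0) + p_h
    return max(votes, key=votes.get)


posteriors = {"h1": 0.4, "h2": 0.3, "h3": 0.3}   # hypothetical P(h | D)
predictions = {"h1": "+", "h2": "-", "h3": "-"}  # each hypothesis's label for a new x
```

MAP picks h1 and predicts "+", but the hypotheses predicting "-" together hold 0.6 of the posterior mass, so the Bayes optimal prediction is "-".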
• Bayesian Belief Networks (Bayesian Networks): Probabilistic graphical models representing
variables and their conditional dependencies via a directed acyclic graph (DAG) [20]. Each node has a
Conditional Probability Table (CPT) quantifying the effect of its parent nodes. Used for reasoning under
uncertainty (e.g., diagnosing disease given symptoms) [20].
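A sketch of exact inference by enumeration in a three-node network, assuming the hypothetical DAG Rain → WetGrass ← Sprinkler with made-up CPT values. The joint probability factorizes into each node's CPT entry given its parents, and a query is answered by summing the joint over the unobserved variables.

```python
from itertools import product

P_RAIN = {True: 0.2, False: 0.8}        # hypothetical prior for Rain
P_SPRINKLER = {True: 0.1, False: 0.9}   # hypothetical prior for Sprinkler
# Hypothetical CPT for WetGrass given its parents: P(WetGrass=True | rain, sprinkler)
P_WET = {(True, True): 0.99, (True, False): 0.8, (False, True): 0.9, (False, False): 0.0}


def joint(rain, sprinkler, wet):
    # Product of each node's probability given its parents in the DAG
    p_wet_true = P_WET[(rain, sprinkler)]
    p_wet = p_wet_true if wet else 1 - p_wet_true
    return P_RAIN[rain] * P_SPRINKLER[sprinkler] * p_wet


def p_rain_given_wet():
    # Sum out the hidden variable Sprinkler, then normalize by P(WetGrass=True)
    numerator = sum(joint(True, s, True) for s in (True, False))
    evidence = sum(joint(r, s, True) for r, s in product((True, False), repeat=2))
    return numerator / evidence
```

Enumeration is exponential in the number of variables, so larger networks rely on more efficient inference algorithms.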
• Support Vector Machine (SVM): A supervised learning algorithm for classification (and regression).
SVM finds the separating hyperplane that maximizes the margin between classes [21].
• Hyperplane: Decision boundary $w \cdot x + b = 0$ in feature space [22].
• Support Vectors: Data points closest to the hyperplane; they “support” (determine) the position of the
boundary [22].
• Margin: Distance between the hyperplane and the support vectors; SVM maximizes this margin for better
generalization [22].
• Kernels: Functions that compute inner products as if the inputs had been mapped into a higher-dimensional
space (the “kernel trick”), so that non-linearly separable data can become separable. Common kernels
include linear, polynomial, and Gaussian (RBF) [23, 24].
(E.g., the Gaussian/RBF kernel allows forming nonlinear decision boundaries.)

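• Properties: SVMs are effective in high-dimensional spaces and when the classes are separated by a clear
margin. Soft margins make them tolerant of some outliers and overlapping points, but training can be slow on
very large datasets. The regularization parameter C controls the trade-off between margin width and
misclassification penalty: a large C penalizes violations heavily (narrower margin), while a small C allows
more violations in exchange for a wider margin [25].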
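A minimal sketch of a soft-margin linear SVM trained by batch subgradient descent on the primal objective (1/2)‖w‖² + C Σ max(0, 1 − yᵢ(w·xᵢ + b)), together with a Gaussian (RBF) kernel function. This training method is chosen only for illustration; many library SVMs (e.g., LIBSVM) solve the dual problem instead. Labels are assumed to be −1/+1, and the learning rate, epochs, and data are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    # Gaussian kernel: exp(-gamma * ||x - z||^2); equals 1 when x == z
    return math.exp(-gamma * sum((xi - zi) ** 2 for xi, zi in zip(x, z)))


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=1000):
    """Minimize 0.5*||w||^2 + C * sum(max(0, 1 - y_i (w.x_i + b))) by subgradient descent."""
    d = len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw = list(w)  # gradient of the 0.5*||w||^2 term
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # inside the margin or misclassified: hinge loss is active
                for j in range(d):
                    gw[j] -= C * yi * xi[j]
                gb -= C * yi
        w = [wj - lr * g for wj, g in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict_svm(w, b, x):
    # Which side of the hyperplane w.x + b = 0 the point falls on
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Hypothetical, linearly separable 2-D data
X = [[1, 1], [2, 1], [1, 2], [4, 4], [5, 4], [4, 5]]
y = [-1, -1, -1, 1, 1, 1]
w, b = train_linear_svm(X, y)
```

Only points with margin below 1 contribute to the hinge-loss gradient, which mirrors the fact that the support vectors alone determine the final boundary.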

Unit 3: Decision Tree Learning & Instance-Based Learning
• Decision Trees: A flowchart-like tree structure for decision making and prediction. Internal (decision)
nodes test input features, branches represent outcomes of the test, and leaf (terminal) nodes assign
class labels or output values [26]. For example, a credit-decision tree might test “Income > 50K?” at a
node, branching to “Yes/No” and eventually classifying as “Approve/Reject.”
• Key Terms:
◦ Root: The top node with the full dataset.
◦ Splitting: Dividing a node into sub-nodes based on a feature.
◦ Decision Node: A node where a feature is tested.
◦ Leaf Node: Final node giving a classification.
◦ Pruning: Removing branches with little effect to prevent overfitting (making the tree simpler).
• Advantages: Easy to interpret and visualize; requires minimal data preprocessing; handles numeric
and categorical data [27].
• Limitations: Can overfit noisy data (requires pruning); decision boundaries are axis-aligned (may be
less accurate than some methods); split criteria such as information gain are biased toward features with
many distinct values [28].
• Entropy & Information Gain: When building a tree, we measure the impurity of a node using
entropy (uncertainty). A good split should reduce entropy. Information gain is defined as
the reduction in entropy achieved by the split. The ID3 algorithm (Iterative Dichotomiser 3) builds a
tree by choosing, at each step, the feature with the highest information gain [29]. (Entropy and
information gain guide the choice of split at each node; this greedy choice is locally best and does not
guarantee the smallest or most accurate overall tree.)
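The two quantities can be written out as follows, with a worked example on hypothetical counts: a node S holds 14 examples (9 positive, 5 negative), and an attribute A splits it into a group of 8 (6+, 2−) and a group of 6 (3+, 3−). Here $p_c$ is the fraction of examples in S with class c, and $S_v$ is the subset where A takes value v; figures are rounded to three decimals.

```latex
H(S) = -\sum_{c} p_c \log_2 p_c,
\qquad
IG(S, A) = H(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\, H(S_v)

H(S) = -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \approx 0.940

H(S_{v_1}) \approx 0.811 \;\; (6+,\, 2-), \qquad H(S_{v_2}) = 1.000 \;\; (3+,\, 3-)

IG(S, A) \approx 0.940 - \tfrac{8}{14}(0.811) - \tfrac{6}{14}(1.000) \approx 0.048
```

A gain this small means A barely reduces uncertainty, so ID3 would prefer an attribute with a larger gain if one exists.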
• ID3 Algorithm: A classic decision tree learning algorithm (Quinlan, 1986). ID3 uses a top-down,
greedy approach:
• If all examples have the same label, make a leaf with that label. Otherwise, pick the attribute with
the highest information gain (from entropy) [29].
• Split the dataset by that attribute’s values, creating child nodes.
• Recursively apply the same procedure to each child node.
ID3 is simple but can overfit; its successor C4.5 and the related CART algorithm improve on it (handling
continuous features, pruning) [30].
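A minimal sketch of ID3 for categorical attributes, assuming each example is a dict of attribute values and the tree is stored as nested dicts. The helper names and the small credit dataset (in the spirit of the credit-decision example earlier in this unit) are hypothetical, and there is no pruning or handling of continuous features.

```python
import math
from collections import Counter


def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, attr):
    total = len(labels)
    remainder = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder


def id3(rows, labels, attributes):
    """Return a leaf label or a nested dict {attribute: {value: subtree}}."""
    if len(set(labels)) == 1:          # all examples share one label: leaf
        return labels[0]
    if not attributes:                 # nothing left to split on: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in set(row[best] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        tree[best][value] = id3([rows[i] for i in idx], [labels[i] for i in idx], remaining)
    return tree


def classify(tree, row):
    # Follow branches down to a leaf (raises KeyError for attribute values unseen in training)
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree


rows = [
    {"income": "high", "debt": "low"},
    {"income": "high", "debt": "high"},
    {"income": "low", "debt": "low"},
    {"income": "low", "debt": "high"},
    {"income": "high", "debt": "low"},
]
labels = ["approve", "reject", "reject", "reject", "approve"]
tree = id3(rows, labels, ["income", "debt"])
```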
• Instance-Based Learning (Memory-Based): These methods do not build an explicit model; instead they
compare new instances with stored examples.
• k-Nearest Neighbors (k-NN): Classifies a new point by looking at the k closest training examples
(e.g., by Euclidean distance) and taking a majority vote of their labels (for classification) or the
average of their values (for regression) [31, 32]. Example: To classify a person’s weight category from
their height and weight, find the k people with the most similar measurements and pick their most common
category.
◦ Steps: Choose k; compute distances from the query point to all training points; select the k
nearest; assign the class by majority vote (or the mean value for regression) [32].
◦ Advantages: Simple, no training time (lazy learning), adapts to new data easily [33].
◦ Disadvantages: High storage and computation (needs all data for each query), sensitive to
feature scaling, and slow for large datasets [34, 35].
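The steps above can be sketched as follows, assuming Euclidean distance and hypothetical height (cm) and weight (kg) data. The features are left unscaled here only because they have similar ranges; in practice they would usually be standardized first, since k-NN is sensitive to feature scaling.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train_X, train_y, query, k=3):
    # Lazy learning: nothing is fitted; all work happens at query time.
    # Compute the distance from the query to every stored example, nearest first
    dists = sorted((euclidean(x, query), label) for x, label in zip(train_X, train_y))
    # Keep the labels of the k nearest neighbors
    nearest = [label for _, label in dists[:k]]
    # Majority vote
    return Counter(nearest).most_common(1)[0][0]


train_X = [[160, 55], [165, 60], [170, 65], [160, 85], [165, 90], [170, 95]]
train_y = ["normal", "normal", "normal", "overweight", "overweight", "overweight"]
```

For regression, the majority vote would be replaced by the mean of the k neighbors' target values.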

• Locally Weighted Regression: A non-parametric regression method that, for each query point, fits a linear
model to the training data while giving higher weight to examples near the query [36, 37]. Repeating this
at many query points produces a smooth curve built from local fits (the LOWESS/LOESS smoothers are
well-known variants).
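A minimal sketch for one input feature, assuming Gaussian weights with a bandwidth parameter tau (the weighting scheme, the names, and the data are assumptions for illustration). At each query it solves a weighted least-squares line and evaluates that line at the query.

```python
import math


def lwr_predict(xs, ys, x_query, tau=1.0):
    """Fit a weighted straight line around x_query and evaluate it there."""
    # Gaussian weights: training points near the query count more (tau = bandwidth)
    w = [math.exp(-((x - x_query) ** 2) / (2 * tau ** 2)) for x in xs]
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, xs)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    # Weighted least-squares slope and intercept
    num = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, xs, ys))
    den = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, xs))
    slope = num / den
    intercept = y_bar - slope * x_bar
    return intercept + slope * x_query


# Hypothetical curved data: y = x^2 sampled on [0, 4]
xs = [i * 0.1 for i in range(41)]
ys = [x * x for x in xs]
estimate = lwr_predict(xs, ys, 2.0, tau=0.2)  # close to the true value 4
```

A small tau follows the curve closely but is noisier; a large tau approaches an ordinary global straight-line fit.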
• Radial Basis Function (RBF) Networks: A type of neural network whose hidden units are RBFs (e.g.,
Gaussian functions). They model complex nonlinear relationships by combining localized responses [38].
RBF networks can be used for interpolation and classification when linear models are insufficient.
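A minimal RBF network sketch, assuming Gaussian hidden units placed at hand-chosen centers (here, the training points themselves) and a linear output layer trained by gradient descent on squared error. The center choice, gamma, learning rate, and epoch count are assumptions; centers are often chosen by clustering instead. XOR is used because no linear model can separate it.

```python
import math


def gaussian(x, center, gamma):
    # Localized response: 1 at the center, decaying with squared distance
    return math.exp(-gamma * sum((xi - ci) ** 2 for xi, ci in zip(x, center)))


def train_rbf_network(X, y, centers, gamma=1.0, lr=0.1, epochs=2000):
    # Hidden-layer outputs are fixed once the centers are chosen
    phi = [[gaussian(x, c, gamma) for c in centers] for x in X]
    w = [0.0] * len(centers)
    for _ in range(epochs):
        # Linear output layer: gradient descent on 0.5 * sum of squared errors
        errors = [sum(p * wj for p, wj in zip(row, w)) - t for row, t in zip(phi, y)]
        grad = [sum(err * row[j] for err, row in zip(errors, phi)) for j in range(len(centers))]
        w = [wj - lr * g for wj, g in zip(w, grad)]
    return w


def rbf_predict(x, centers, w, gamma=1.0):
    return sum(wj * gaussian(x, c, gamma) for wj, c in zip(w, centers))


# XOR: not linearly separable; each training point doubles as a center
X = [[0, 0], [0, 1], [1, 0], [1, 1]]
y = [0, 1, 1, 0]
w = train_rbf_network(X, y, centers=X)
```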
• Case-Based Learning: Like k-NN, case-based systems store past cases and adapt them to solve new
problems (not detailed here).

Unit 4: Artificial Neural Networks & Deep Learning


• Artificial Neuron (Perceptron): A computational unit inspired by biological neurons. It sums
weighted inputs and passes the sum through an activation function. Components [39, 40]:
• Inputs $x_i$ with weights $w_i$; compute $z = \sum_i w_i x_i + b$.
• An activation function $f(z)$ (e.g., step, sigmoid, ReLU) produces the output.
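A minimal sketch of a single artificial neuron's forward computation, assuming the activation is selected by name (the function name and the string options are hypothetical choices for this example).

```python
import math


def neuron(x, w, b, activation="sigmoid"):
    # Weighted sum of the inputs plus the bias
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    if activation == "step":
        return 1.0 if z > 0 else 0.0
    if activation == "sigmoid":
        return 1.0 / (1.0 + math.exp(-z))
    if activation == "relu":
        return max(0.0, z)
    raise ValueError(f"unknown activation: {activation}")


# z = 1.0*1.0 + 1.0*2.0 - 1.0 = 2.0
out = neuron([1.0, 2.0], [1.0, 1.0], -1.0, activation="relu")  # 2.0
```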
• ANN Definition: An Artificial Neural Network (ANN) is a network of interconnected artificial neurons
organized in layers [41]. ANNs learn patterns from data: weights are adjusted to minimize the error on
training examples.
• Network Architecture:
• Input Layer: Receives raw input features (one node per feature).
• Hidden Layers: Intermediate layers with neurons that detect features at various levels of abstraction.
There can be none, one, or many hidden layers. More hidden layers (a deeper network) can model
more complex functions [42].
• Output Layer: Produces the final output (one node per target class or regression value) [43].
• Example: A feedforward network with a single hidden layer of sigmoid neurons is a multilayer perceptron
(MLP).
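• Perceptron (Single-Layer): The simplest ANN, with no hidden layer. It is a linear classifier: the output
is 1 if $w \cdot x + b > 0$, else 0. The perceptron learning rule updates the weights for misclassified
samples:

$$w_j \leftarrow w_j + \eta\,(t_i - y_i)\,x_{ij},$$

where $t_i$ is the target and $y_i$ the output for example i, $x_{ij}$ is its j-th input, and $\eta$ is the
learning rate [44]. (For a correctly classified sample, $t_i - y_i = 0$, so the weights are unchanged.)
The perceptron is guaranteed to converge only if the data are linearly separable [45]. Otherwise, we need
multilayer networks.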
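A minimal sketch of the perceptron learning rule, assuming the bias is updated like a weight on a constant input of 1 and training stops after an epoch with no mistakes (or after a hypothetical cap on epochs). Logical AND is used because it is linearly separable.

```python
def perceptron_predict(w, b, x):
    # Output 1 if w . x + b > 0, else 0
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b > 0 else 0


def train_perceptron(X, t, eta=0.1, max_epochs=100):
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(max_epochs):
        errors = 0
        for xi, ti in zip(X, t):
            yi = perceptron_predict(w, b, xi)
            if yi != ti:
                # w_j <- w_j + eta * (t_i - y_i) * x_ij
                w = [wj + eta * (ti - yi) * xj for wj, xj in zip(w, xi)]
                b += eta * (ti - yi)
                errors += 1
        if errors == 0:  # every training example is classified correctly
            break
    return w, b


X = [[0, 0], [0, 1], [1, 0], [1, 1]]
t = [0, 0, 0, 1]  # logical AND
w, b = train_perceptron(X, t)
```

On XOR (targets 0, 1, 1, 0) the same loop never reaches an error-free epoch, which is the non-separable case the text refers to.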
• Multilayer Perceptron (MLP) & Backpropagation: MLPs have one or more hidden layers with
nonlinear activations (e.g., sigmoid or ReLU), so they can model non-linear decision boundaries.
Training uses backpropagation: compute the output error, propagate it backward through the
network, and update each weight by gradient descent to minimize the loss [44, 46].
• Gradient Descent (Delta Rule): For a weight w, the update is $\Delta w = \eta\,\delta\,x$, where $\delta$
is the error term that backpropagation computes for the unit the weight feeds into, and x is the input
carried by that weight.
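A minimal sketch of backpropagation for a network with one sigmoid hidden layer and one sigmoid output, trained by per-example gradient descent on squared error for XOR. The layer size, learning rate, epoch count, and initialization range are assumptions. With some random seeds a network this small can stall in a poor local minimum, so the example only claims that training reduces the error.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(params, x):
    W1, b1, W2, b2 = params
    # Hidden layer: h_k = sigmoid(sum_i W1[k][i] * x_i + b1[k])
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bk) for row, bk in zip(W1, b1)]
    # Output unit: sigmoid of a weighted sum of hidden activations
    out = sigmoid(sum(w * hk for w, hk in zip(W2, h)) + b2)
    return h, out


def mse(params, X, y):
    return sum((forward(params, x)[1] - t) ** 2 for x, t in zip(X, y)) / len(X)


def train_mlp(X, y, n_hidden=4, lr=0.5, epochs=5000, seed=0):
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in X[0]] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    W2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h, out = forward((W1, b1, W2, b2), x)
            # Output error term: (target - output) times the sigmoid derivative
            delta_out = (t - out) * out * (1 - out)
            # Hidden error terms: delta_out sent backward through the not-yet-updated W2
            delta_h = [delta_out * W2[k] * h[k] * (1 - h[k]) for k in range(n_hidden)]
            # Updates of the form delta_w = eta * delta * (input carried by that weight)
            for k in range(n_hidden):
                W2[k] += lr * delta_out * h[k]
                for i in range(len(x)):
                    W1[k][i] += lr * delta_h[k] * x[i]
                b1[k] += lr * delta_h[k]
            b2 += lr * delta_out
    return W1, b1, W2, b2


X_xor = [[0, 0], [0, 1], [1, 0], [1, 1]]
y_xor = [0, 1, 1, 0]
untrained = train_mlp(X_xor, y_xor, epochs=0)  # same seed, so same starting weights
trained = train_mlp(X_xor, y_xor)
```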
• Activation Functions: Introduce non-linearity. Common choices: sigmoid (logistic), tanh, ReLU
(Rectified Linear Unit). Without them, the network would collapse to a linear model.
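A small numerical check of the collapse claim, assuming hypothetical weights and omitting biases for brevity (with biases the composition is still a single affine map). Two stacked linear layers equal one layer whose weight matrix is the product W2·W1; inserting a ReLU between them breaks that equivalence.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def matmul(A, B):
    # (A B)[i][j] = sum_k A[i][k] * B[k][j]
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


W1 = [[1.0, 2.0], [3.0, -1.0], [0.5, 0.0]]  # 2 inputs -> 3 hidden units
W2 = [[1.0, -1.0, 2.0]]                    # 3 hidden units -> 1 output
x = [-1.0, 1.0]

two_linear_layers = matvec(W2, matvec(W1, x))     # layer by layer, no activation
one_linear_layer = matvec(matmul(W2, W1), x)      # single equivalent layer
with_relu = matvec(W2, [max(0.0, h) for h in matvec(W1, x)])  # nonlinearity in between
```

Here both linear versions give [4.0], while the ReLU version gives [1.0] because it zeroes the negative hidden values.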
• Generalization: The goal is not just to fit training data but to perform well on new data. Techniques
like weight regularization, dropout, and early stopping help prevent overfitting. Dropout randomly
“drops” some neurons during training to improve robustness.
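A minimal sketch of dropout as described in these notes: units are zeroed at random during training, and at test time all units are kept but scaled by the keep probability (1 − p). The function names are hypothetical. Many frameworks instead use "inverted dropout", which scales by 1/(1 − p) during training so that test time needs no change.

```python
import random


def dropout_train(activations, p_drop, rng):
    # Each unit is independently zeroed with probability p_drop
    return [0.0 if rng.random() < p_drop else a for a in activations]


def dropout_test(activations, p_drop):
    # All units kept; scaling by the keep probability matches the expected training-time value
    return [a * (1 - p_drop) for a in activations]


rng = random.Random(0)
acts = [1.0, 2.0, 3.0, 4.0]
train_out = dropout_train(acts, 0.5, rng)
test_out = dropout_test(acts, 0.5)
```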

• Self-Organizing Map (SOM): An unsupervised network that projects high-dimensional data onto a
lower-dimensional (often 2D) grid while preserving topological properties [47]. Neurons compete for each
input, and the winner and its grid neighbors adjust to represent clusters of similar inputs. Useful for
visualization and clustering.
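A minimal SOM sketch, assuming a 1-D grid of nodes (a 2-D grid works the same way), a Gaussian neighborhood, and linearly decaying learning rate and radius. All schedules, sizes, and the two-cluster data are hypothetical choices for illustration.

```python
import math
import random


def bmu_index(nodes, x):
    # Competition: the best-matching unit is the node whose weights are closest to x
    return min(range(len(nodes)), key=lambda k: sum((w - a) ** 2 for w, a in zip(nodes[k], x)))


def train_som(data, n_nodes=5, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    rng = random.Random(seed)
    # A 1-D grid of nodes, each holding a weight vector in input space
    nodes = [[rng.random() for _ in data[0]] for _ in range(n_nodes)]
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)              # learning rate decays toward 0
        sigma = sigma0 * (1 - frac) + 0.5  # neighborhood radius shrinks
        for x in data:
            bmu = bmu_index(nodes, x)
            for k in range(n_nodes):
                # Cooperation: grid neighbors of the BMU also move toward x, by less
                h = math.exp(-((k - bmu) ** 2) / (2 * sigma ** 2))
                nodes[k] = [w + lr * h * (a - w) for w, a in zip(nodes[k], x)]
    return nodes


# Hypothetical 2-D data with two clusters, near (0, 0) and near (1, 1)
data = [[0.0, 0.1], [0.1, 0.0], [0.05, 0.05], [0.9, 1.0], [1.0, 0.9], [0.95, 0.95]]
nodes = train_som(data)
```

After training, inputs from the two clusters should map to different nodes, and nodes that are adjacent on the grid end up representing nearby regions of the input space.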
• Deep Learning – Convolutional Neural Networks (CNNs): CNNs are deep networks specialized for
grid-like data (e.g., images) [48]. They automatically learn spatial hierarchies of features through
layers:
• Convolutional Layers: Use learnable filters (kernels) that slide over the input (e.g., an image) to extract
feature maps [49]. Each convolution computes dot products between the filter and local patches of the
input, highlighting features such as edges. Stride and padding control the output size [50]: for an n×n
input, a k×k filter, padding p, and stride s, each side of the output has $\lfloor (n + 2p - k)/s \rfloor + 1$
values.
• Pooling Layers: Reduce the spatial dimensions of feature maps while retaining important information,
most commonly by max-pooling (taking the maximum in each patch) or average-pooling [51]. Pooling adds
some translation invariance and reduces computation.
• Fully Connected (Dense) Layers: After convolutions and pooling, the final features are flattened into a
vector and passed through one or more dense layers to perform classification or regression [52].
• Dropout Layers: During training, randomly deactivate a fraction of neurons to prevent overfitting;
during testing, all neurons are used with scaled outputs [53].
• Example Applications: CNNs excel at image tasks (e.g., classifying retinal images for disease, object
detection) and have been used in systems such as smart speakers and self-driving cars.
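A minimal sketch of one convolution and one max-pooling step on a tiny grayscale image stored as nested lists. The vertical-edge filter is hand-picked for illustration; in a CNN its weights would be learned. Like most deep-learning libraries, the code computes cross-correlation (the kernel is not flipped).

```python
def conv2d(image, kernel, stride=1, padding=0):
    """Slide `kernel` over `image`, taking a dot product with each local patch."""
    if padding:
        width = len(image[0]) + 2 * padding
        image = ([[0] * width for _ in range(padding)]
                 + [[0] * padding + list(row) + [0] * padding for row in image]
                 + [[0] * width for _ in range(padding)])
    kh, kw = len(kernel), len(kernel[0])
    # Output size per side: (n + 2p - k) // s + 1 (image is already padded here)
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    return [[sum(kernel[a][b] * image[i * stride + a][j * stride + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool(feature_map, size=2):
    """Non-overlapping max-pooling: keep the largest value in each size x size patch."""
    return [[max(feature_map[i + a][j + b] for a in range(size) for b in range(size))
             for j in range(0, len(feature_map[0]) - size + 1, size)]
            for i in range(0, len(feature_map) - size + 1, size)]


# 4x4 image: dark left half (0), bright right half (1)
image = [[0, 0, 1, 1] for _ in range(4)]
kernel = [[-1, 1], [-1, 1]]    # responds where intensity increases left to right
fmap = conv2d(image, kernel)   # [[0, 2, 0], [0, 2, 0], [0, 2, 0]]
pooled = max_pool(fmap)        # [[2]]
```

The feature map is nonzero only along the edge, and pooling keeps that strong response while shrinking the map.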
• Other Deep Models: Recurrent Neural Networks (RNNs) for sequences, and various autoencoders/
transformers for different data types (beyond scope here).

Unit 5: Reinforcement Learning & Genetic Algorithms
• Reinforcement Learning (RL): A learning paradigm in which an agent learns to make sequences of decisions
by interacting with an environment [3]. At each step, the agent observes the state, takes an action, and
receives a reward (positive or negative). The goal is to learn a policy that maximizes cumulative reward.
• Key Components: Agent, Environment, State, Action, Reward [54].
• Learning Process: The agent explores possible actions; correct actions yield positive rewards, and
incorrect ones yield penalties [3]. Unlike supervised learning, RL has no labeled examples; learning
happens through trial and error [55].
• Examples: Training a robot to walk, or a program to play games (e.g., chess, Go) by rewarding wins. In
a maze game, avoiding pitfalls and reaching the goal increases the cumulative reward.
• Challenges: Defining appropriate reward signals (designing reward functions) can be complex. RL
may not suit problems with static datasets (e.g., object recognition is better solved by supervised
classifiers) [56].
• Comparison (RL vs Supervised vs Unsupervised): RL receives ongoing feedback (rewards) from the
environment; supervised learning uses a fixed set of labeled data; unsupervised learning receives no
feedback [57]. For example, RL is used in robotics and game playing, whereas classification and
regression are supervised tasks.
• Markov Decision Process (MDP): A formal model for RL problems, defined by states, actions,
transition probabilities, and rewards [58]. The Markov property means the next state depends only on
the current state and action (memoryless) [59]. A policy π(s) specifies which action to take in each
state. The agent seeks an optimal policy, one that maximizes the expected cumulative (typically
discounted) reward.
• Q-Learning: A popular model-free RL algorithm. It learns an action-value function Q(s, a)
representing the expected return from starting in state s, taking action a, and following the optimal
policy thereafter [60]. The core update rule (for discounted rewards) is:

$$Q(s, a) \leftarrow Q(s, a) + \alpha\left(r + \gamma \max_{a'} Q(s', a') - Q(s, a)\right),$$

where r is the reward received, s′ the next state, γ the discount factor, and α the learning rate. Over
many episodes, Q converges to the optimal action values, provided every state–action pair keeps being
visited and α is decayed appropriately. (E.g., in a grid world, Q-values tell the agent which moves lead
to the highest reward) [60].
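A minimal tabular Q-learning sketch on a hypothetical five-state corridor: the agent starts at state 0, can step left or right, and receives reward 1 on reaching the goal at the right end. The environment, the ε-greedy exploration scheme, and all parameter values are assumptions for illustration; α is kept constant, which is sufficient for this deterministic toy problem.

```python
import random


def q_learning(n_states=5, episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning on a corridor of states 0..n-1 with the goal at the right end."""
    rng = random.Random(seed)
    actions = [-1, +1]  # step left or step right
    goal = n_states - 1
    Q = {(s, a): 0.0 for s in range(n_states) for a in actions}
    for _ in range(episodes):
        s = 0
        while s != goal:
            # Epsilon-greedy: explore with probability epsilon, else pick a best action (random tie-break)
            if rng.random() < epsilon:
                a = rng.choice(actions)
            else:
                best = max(Q[(s, b)] for b in actions)
                a = rng.choice([b for b in actions if Q[(s, b)] == best])
            s_next = min(max(s + a, 0), goal)  # walls at both ends
            r = 1.0 if s_next == goal else 0.0
            # The goal is terminal, so it contributes no future value
            future = 0.0 if s_next == goal else max(Q[(s_next, b)] for b in actions)
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            Q[(s, a)] += alpha * (r + gamma * future - Q[(s, a)])
            s = s_next
    return Q


Q = q_learning()
greedy_policy = [max([-1, +1], key=lambda a: Q[(s, a)]) for s in range(4)]
```

The learned greedy policy moves right in every non-goal state, and the Q-values shrink by a factor of γ per extra step from the goal.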
• Deep Q-Learning: Uses a neural network to approximate the Q-function for large or continuous
state spaces. The network takes the state (or observations) as input and outputs Q-values for actions.
This allows RL on complex tasks (e.g., Deep Q-Networks that learned to play Atari games) [61].
• Other RL Algorithms: Besides Q-learning, there are SARSA (on-policy learning), policy gradient
methods, Actor-Critic models, etc. (Not detailed here.)
• Genetic Algorithms (GA): Optimization and search methods inspired by natural evolution [62]. A
population of candidate solutions (“chromosomes”) evolves over generations to optimize a fitness
function.
• Process:
1. Initialization: Start with a random population of n candidate solutions (chromosomes) [63].
2. Fitness Evaluation: Compute a fitness score for each solution (how well it solves the
problem) [63].
3. Selection: Probabilistically select pairs of parent chromosomes, favoring higher fitness [64].
4. Crossover: Exchange parts of the parents’ chromosomes to create offspring (simulating
reproduction) [64].
5. Mutation: Randomly alter some genes in the offspring to maintain diversity [64].
6. Replacement: Form a new generation (often replacing the old population). Repeat until
convergence or a maximum number of generations.
• Example: Optimizing a car’s shape to minimize air resistance. Each “chromosome” encodes design
parameters (shape, size), fitness is measured by simulated drag, and over generations the designs
evolve toward more aerodynamic forms [65].
• Applications: Used in engineering design, scheduling, feature selection, and other problems where
conventional optimization is hard. GAs handle complex search spaces but can be computationally
expensive.
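A minimal GA sketch following the six steps above, assuming a hypothetical "OneMax" fitness (count the 1-bits in a bit-string chromosome) as a stand-in for a real objective such as simulated drag. Binary tournament selection, one-point crossover, per-bit mutation, and keeping the best chromosome unchanged each generation (elitism) are common choices assumed here, not details from the notes.

```python
import random


def fitness(chromosome):
    return sum(chromosome)  # hypothetical OneMax objective: number of 1-bits


def genetic_algorithm(n_bits=20, pop_size=30, generations=100, p_mut=0.05, seed=0):
    rng = random.Random(seed)
    # 1. Initialization: random bit strings
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    initial_best = max(fitness(c) for c in pop)
    for _ in range(generations):
        # 2. Fitness evaluation (sorted best first)
        pop.sort(key=fitness, reverse=True)
        new_pop = [pop[0][:]]  # elitism: the best chromosome survives unchanged
        while len(new_pop) < pop_size:
            # 3. Selection: binary tournaments favor fitter parents
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # 4. Crossover: one-point
            cut = rng.randint(1, n_bits - 1)
            child = p1[:cut] + p2[cut:]
            # 5. Mutation: flip each bit with a small probability
            child = [1 - g if rng.random() < p_mut else g for g in child]
            new_pop.append(child)
        # 6. Replacement: the offspring form the next generation
        pop = new_pop
    return initial_best, max(pop, key=fitness)


initial_best, best = genetic_algorithm()
```

Because of elitism, the best fitness can never decrease from one generation to the next.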

Sources: Concepts and definitions are drawn from the Machine Learning in 7 Hours notes [1, 3, 15, 64],
supplemented with standard ML references [18, 21].

[1–17, 20, 26–65] Machine_Learning_in_7_Hours_lyst1728638868665.pdf. file://file-NxRPtsPQiS4c1bXRKGZ3Ma

[18, 19] Bayes Theorem in Machine learning - GeeksforGeeks. https://www.geeksforgeeks.org/machine-learning/bayes-theorem-in-machine-learning/

[21–25] Support Vector Machine (SVM) Algorithm - GeeksforGeeks. https://www.geeksforgeeks.org/machine-learning/support-vector-machine-algorithm/
