Unit 5: Scalable Machine Learning (Online and Distributed Learning)
🔹 1. Scalable Machine Learning: Online and Distributed Learning
✅ Definition:
Scalable machine learning refers to techniques that allow models to be trained efficiently
on large-scale datasets, often by distributing computation or processing data
sequentially.
✅ Types:
❖ Online Learning:
Learns incrementally from one data point at a time.
Useful when data arrives continuously (streaming data).
Algorithms:
Online Gradient Descent (OGD)
Follow-the-Regularized-Leader (FTRL)
Passive-Aggressive Algorithms
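A minimal sketch of online learning in Python (assuming scikit-learn and NumPy are installed): a linear classifier is updated incrementally with partial_fit on small batches that stand in for a data stream. The data here is synthetic and the settings are illustrative, not a production recipe.

```python
# Online-learning sketch: update a linear model one mini-batch at a time,
# mimicking data that arrives as a stream (synthetic data, illustrative only).
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
model = SGDClassifier(loss="log_loss")   # online logistic regression via gradient steps
classes = np.array([0, 1])               # classes must be declared up front for partial_fit

for step in range(100):                                      # each iteration = one new batch
    X_batch = rng.normal(size=(10, 5))                       # 10 new examples, 5 features
    y_batch = (X_batch[:, 0] + X_batch[:, 1] > 0).astype(int)  # synthetic labels
    model.partial_fit(X_batch, y_batch, classes=classes)     # incremental update

X_test = rng.normal(size=(200, 5))
y_test = (X_test[:, 0] + X_test[:, 1] > 0).astype(int)
print("accuracy on a held-out batch:", model.score(X_test, y_test))
```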
❖ Distributed Learning:
Splits data across multiple machines/nodes for parallel training.
Often used with frameworks like Apache Spark, TensorFlow, or PyTorch Distributed.
Algorithms:
Distributed Stochastic Gradient Descent (DSGD)
AllReduce and Parameter Server Architectures
Federated Learning: trains models across decentralized devices (e.g., mobile phones)
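A toy sketch of synchronous data-parallel SGD in plain NumPy: the "workers" here are just data shards, and averaging their local gradients plays the role that AllReduce or a parameter server plays on a real cluster.

```python
# Data-parallel SGD sketch: each worker computes a gradient on its own shard,
# the gradients are averaged (the AllReduce step), and all workers apply the
# same update. Toy least-squares problem, no real cluster involved.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=1000)

n_workers = 4
shards = np.array_split(np.arange(len(X)), n_workers)   # partition the data across workers
w = np.zeros(3)
lr = 0.1

for epoch in range(50):
    local_grads = []
    for idx in shards:                                   # each worker works on its shard
        err = X[idx] @ w - y[idx]
        local_grads.append(X[idx].T @ err / len(idx))    # local least-squares gradient
    w -= lr * np.mean(local_grads, axis=0)               # "AllReduce": average, then update

print("estimated weights:", np.round(w, 2))              # close to [2.0, -1.0, 0.5]
```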
✅ Applications:
Real-time recommendation systems
Fraud detection in streaming transactions
Large-scale image classification
Edge computing and IoT
✅ Example:
Training a model on user clicks as they happen using online learning.
Using Apache Spark to train logistic regression on terabytes of clickstream data.
🔹 2. Semi-Supervised Learning
✅ Definition:
A hybrid approach where only a small portion of the data is labeled, and the rest is
unlabeled. The model leverages both to improve performance.
✅ Algorithms:
Self-training: The model pseudo-labels unlabeled data iteratively and re-trains (see the sketch after this list).
Co-training: Uses multiple views of the data; two classifiers teach each other.
Graph-based methods: Build graphs connecting similar instances and propagate labels.
Consistency Regularization: Enforces consistent predictions under input perturbations.
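A minimal self-training sketch (assuming scikit-learn and binary 0/1 labels): a classifier trained on a small labeled set pseudo-labels the unlabeled pool, keeps only high-confidence predictions, and retrains. The dataset and the 0.95 confidence threshold are illustrative.

```python
# Self-training sketch: grow the labeled set with confident pseudo-labels.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=10, random_state=0)
labeled = np.zeros(len(X), dtype=bool)
labeled[:25] = True                      # only 25 examples start out labeled
pseudo_y = y.copy()                      # holds the true labels for the seed set, pseudo-labels later

for _ in range(5):                       # a few self-training rounds
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], pseudo_y[labeled])
    proba = clf.predict_proba(X[~labeled])
    confident = proba.max(axis=1) > 0.95             # keep only confident predictions
    idx = np.where(~labeled)[0][confident]
    pseudo_y[idx] = proba[confident].argmax(axis=1)  # assign pseudo-labels (classes are 0/1)
    labeled[idx] = True

print("labeled pool grew to", labeled.sum(), "examples")
```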
✅ Applications:
Medical imaging (few expert annotations)
Web document classification (few manually labeled pages)
Customer feedback analysis with limited human tagging
✅ Example:
Classifying news articles using only a few manually labeled ones and many
unlabeled ones.
🔹 3. Active Learning
✅ Definition:
An iterative process where the model actively selects which data points it wants to be
labeled next to maximize learning efficiency.
✅ Procedure:
1. Train initial model on a small labeled set.
2. Use model to predict on unlabeled data.
3. Select most informative samples (e.g., uncertain predictions).
4. Query an oracle (human annotator) to label those.
5. Retrain model with new labels.
6. Repeat until performance goals met or budget exhausted.
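A pool-based sketch of this loop with uncertainty sampling (assuming scikit-learn); the "oracle" is simulated by simply revealing the hidden true labels of the queried points, and the batch size of 5 per round is illustrative.

```python
# Active-learning sketch: repeatedly query the 5 examples the model is least sure about.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, n_features=15, random_state=0)
labeled = np.zeros(len(X), dtype=bool)
seed = np.random.default_rng(0).choice(len(X), 10, replace=False)
labeled[seed] = True                                   # step 1: tiny initial labeled set

for round_ in range(10):
    clf = LogisticRegression(max_iter=1000).fit(X[labeled], y[labeled])   # steps 1 & 5
    proba = clf.predict_proba(X[~labeled])                                # step 2
    uncertainty = 1 - proba.max(axis=1)                                   # step 3: least confident
    pool_idx = np.where(~labeled)[0]
    query = pool_idx[np.argsort(uncertainty)[-5:]]                        # 5 most uncertain samples
    labeled[query] = True                                                 # step 4: oracle labels them

print("labeled", labeled.sum(), "of", len(X), "examples")
```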
✅ Query Strategies:
Uncertainty Sampling
Query-by-Committee
Expected Model Change
Information Density
✅ Applications:
Reducing annotation cost in NLP tasks
Labeling rare events (e.g., fraud detection)
Efficient labeling in robotics or autonomous systems
✅ Example:
A medical diagnosis system queries doctors only for ambiguous cases.
🔹 4. Reinforcement Learning (RL)
✅ Definition:
A type of machine learning where an agent learns to make decisions by interacting with
an environment to maximize cumulative reward.
✅ Key Components:
Agent: Learner and decision-maker
Environment: What the agent interacts with
State: Current situation of the agent
Action: What the agent does
Reward: Feedback signal after an action
Policy: Strategy mapping states to actions
✅ Algorithms:
Q-Learning (model-free, tabular; see the sketch after this list)
Deep Q-Networks (DQN) – combines RL with deep learning
Policy Gradient Methods
Actor-Critic Methods
Monte Carlo Tree Search (MCTS)
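A tabular Q-learning sketch on a made-up 5-state chain environment (pure NumPy, no RL library): the agent starts in state 0 and receives a reward of +1 only when it reaches state 4. The hyperparameters are illustrative.

```python
# Tabular Q-learning sketch on a toy 5-state chain.
import numpy as np

n_states, n_actions = 5, 2            # actions: 0 = move left, 1 = move right
Q = np.zeros((n_states, n_actions))
alpha, gamma, eps = 0.1, 0.9, 0.1     # learning rate, discount factor, exploration rate
rng = np.random.default_rng(0)

for episode in range(500):
    s = 0
    while s != 4:                                     # the episode ends at the goal state
        # epsilon-greedy action selection
        a = rng.integers(n_actions) if rng.random() < eps else int(Q[s].argmax())
        s_next = max(s - 1, 0) if a == 0 else min(s + 1, 4)
        r = 1.0 if s_next == 4 else 0.0
        # Q-learning update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
        Q[s, a] += alpha * (r + gamma * Q[s_next].max() - Q[s, a])
        s = s_next

print("greedy policy (0=left, 1=right):", Q.argmax(axis=1))  # states 0-3 should prefer "right"
```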
✅ Applications:
Game AI (AlphaGo, Dota bots)
Robotics control
Autonomous vehicles
Resource management
Personalized recommendations
✅ Example:
Training a robot to walk using trial and error with rewards for stable movement.
🔹 5. Inference in Graphical Models
✅ Definition:
Graphical models (like Bayesian Networks and Markov Random Fields) represent
probabilistic relationships between variables via graphs. Inference involves computing
posterior probabilities given observed variables.
✅ Types:
Exact Inference:
Variable Elimination
Belief Propagation (Sum-Product Algorithm)
Approximate Inference:
Markov Chain Monte Carlo (MCMC) sampling
Variational Inference
Loopy Belief Propagation
✅ Applications:
Medical diagnosis systems
Image segmentation
Natural language understanding
Anomaly detection in networks
✅ Example:
Given symptoms, infer the most likely disease using a Bayesian Network.
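A minimal sketch of exact inference by enumeration for a two-node Disease → Symptom Bayesian network; the probabilities are made-up illustrative numbers, not clinical data.

```python
# Exact inference in a tiny Bayesian network: compute P(disease | symptom)
# by summing out the hidden variable and applying Bayes' rule.
p_disease = 0.01                     # prior P(D = 1)
p_symptom_given_d = 0.90             # likelihood P(S = 1 | D = 1)
p_symptom_given_not_d = 0.05         # likelihood P(S = 1 | D = 0)

# Marginal P(S = 1), summing over the hidden variable D
p_symptom = p_symptom_given_d * p_disease + p_symptom_given_not_d * (1 - p_disease)

# Posterior P(D = 1 | S = 1) via Bayes' rule
p_disease_given_symptom = p_symptom_given_d * p_disease / p_symptom
print(f"P(disease | symptom) = {p_disease_given_symptom:.3f}")   # ~0.154
```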
🔹 6. Introduction to Bayesian Learning and Inference
✅ Definition:
Bayesian learning uses Bayes' theorem to update the probability of hypotheses as more
evidence becomes available. It treats parameters as random variables with prior
distributions.
✅ Key Concepts:
Prior: Initial belief about parameters
Likelihood: Probability of the data given parameters
Posterior: Updated belief after seeing the data
Predictive Distribution: Used for making predictions about new data
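A small sketch of Bayesian updating with a conjugate Beta-Bernoulli model (illustrative numbers): the prior's pseudo-counts are combined with observed counts to give the posterior, making the prior/likelihood/posterior roles concrete.

```python
# Bayesian updating sketch: Beta prior over a coin's bias + Bernoulli observations
# gives a Beta posterior (conjugate update). Numbers are illustrative only.
alpha_prior, beta_prior = 2, 2        # prior: bias is probably near 0.5

heads, tails = 8, 2                   # observed data (the likelihood: Bernoulli trials)

alpha_post = alpha_prior + heads      # posterior = prior pseudo-counts + observed counts
beta_post = beta_prior + tails

posterior_mean = alpha_post / (alpha_post + beta_post)
print(f"posterior mean of the bias: {posterior_mean:.2f}")  # shifts from 0.5 toward the data
```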
✅ Algorithms:
Bayesian Linear Regression
Gaussian Processes
Bayesian Neural Networks
Variational Inference
Hamiltonian Monte Carlo (HMC)
✅ Applications:
Uncertainty quantification in predictions
Clinical trials
Financial forecasting
Sensor fusion
✅ Example:
Estimating the probability of rain tomorrow based on historical weather patterns
and today's forecast.
✅ Summary Table
| Topic | Type | Core Idea | Algorithm(s) | Application |
|---|---|---|---|---|
| Scalable ML | Optimization | Handle large-scale data with online/distributed methods | SGD, FTRL, Parameter Server | Real-time recommendation systems |
| Semi-supervised Learning | Hybrid Learning | Use a small labeled + large unlabeled dataset | Self-training, Consistency Reg. | Document classification |
| Active Learning | Data-efficient Learning | Selectively query the most informative samples | Uncertainty Sampling, Expected Model Change | Medical diagnostics |
| Reinforcement Learning | Decision Making | Learn an optimal policy through reward maximization | Q-learning, DQN, Policy Gradients | Game AI, Robotics |
| Inference in Graphical Models | Probabilistic Modeling | Reason about uncertainty in complex systems | Belief Propagation, MCMC | Disease diagnosis |
| Bayesian Learning | Uncertainty Modeling | Update beliefs using Bayes' Theorem | Gaussian Processes, Variational Inference | Risk modeling, sensor fusion |