
Unit 5: Scalable Machine Learning (Online and Distributed Learning)

🔹 1. Scalable Machine Learning: Online and Distributed Learning


✅ Definition:
Scalable machine learning refers to techniques that allow models to be trained efficiently
on large-scale datasets, often by distributing computation or processing data
sequentially.
✅ Types:
❖ Online Learning:
• Learns incrementally from one data point at a time.
• Useful when data arrives continuously (streaming data).
Algorithms:
• Online Gradient Descent (OGD)
• Follow-the-Regularized-Leader (FTRL)
• Passive-Aggressive Algorithms
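As a minimal sketch of online learning (assuming scikit-learn is available and simulating the stream with synthetic examples), SGDClassifier can be updated one example at a time via partial_fit:

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)

# Logistic regression trained by online gradient descent, one example at a time
model = SGDClassifier(loss="log_loss")
classes = np.array([0, 1])  # partial_fit needs all classes declared up front

for step in range(1000):
    # Simulate one streaming example: 5 features, label depends on their sum
    x = rng.normal(size=(1, 5))
    y = np.array([int(x.sum() > 0)])
    model.partial_fit(x, y, classes=classes)

print(model.predict(rng.normal(size=(3, 5))))
```

Because each update touches only the current example, memory use stays constant however long the stream runs.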
❖ Distributed Learning:
• Splits data across multiple machines/nodes for parallel training.
• Often used with frameworks like Apache Spark, TensorFlow, or PyTorch Distributed.
Algorithms:
• Distributed Stochastic Gradient Descent (DSGD)
• AllReduce and Parameter Server architectures
• Federated Learning: trains models across decentralized devices (e.g., mobile phones)
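A toy NumPy simulation of AllReduce-style data-parallel SGD on linear regression (the four "workers" here are just array shards on one machine, not real nodes; a production setup would use PyTorch Distributed or Spark):

```python
import numpy as np

rng = np.random.default_rng(1)
n_workers, n_features = 4, 3

# Synthetic linear-regression data, split evenly across "workers" (data parallelism)
true_w = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(400, n_features))
y = X @ true_w + 0.1 * rng.normal(size=400)
shards = list(zip(np.array_split(X, n_workers), np.array_split(y, n_workers)))

w = np.zeros(n_features)
lr = 0.1
for step in range(100):
    # Each worker computes the mean-squared-error gradient on its own shard
    grads = [2 * Xi.T @ (Xi @ w - yi) / len(yi) for Xi, yi in shards]
    # AllReduce step: average the gradients, then every worker applies the same update
    w -= lr * np.mean(grads, axis=0)

print(w)  # should be close to true_w
```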
✅ Applications:
• Real-time recommendation systems
• Fraud detection in streaming transactions
• Large-scale image classification
• Edge computing and IoT
✅ Example:
• Training a model on user clicks as they happen using online learning.
• Using Apache Spark to train logistic regression on terabytes of clickstream data.

🔹 2. Semi-Supervised Learning
✅ Definition:
A hybrid approach where only a small portion of the data is labeled, and the rest is
unlabeled. The model leverages both to improve performance.
✅ Algorithms:
• Self-training: the model labels unlabeled data iteratively and retrains.
• Co-training: uses multiple views of the data; two classifiers teach each other.
• Graph-based methods: build graphs connecting similar instances and propagate labels.
• Consistency Regularization: enforces consistent predictions under input perturbations.
✅ Applications:
• Medical imaging (few expert annotations)
• Web document classification (few manually labeled pages)
• Customer feedback analysis with limited human tagging
✅ Example:
• Classifying news articles using only a few manually labeled articles alongside many unlabeled ones.
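A minimal self-training sketch using scikit-learn's SelfTrainingClassifier on synthetic data (the 10% labeling rate and 0.9 confidence threshold are illustrative choices):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.semi_supervised import SelfTrainingClassifier

X, y = make_classification(n_samples=500, random_state=0)

# Keep only ~10% of labels; scikit-learn marks unlabeled points with -1
rng = np.random.default_rng(0)
y_partial = np.where(rng.random(len(y)) < 0.1, y, -1)

# Self-training: fit on labeled data, pseudo-label confident unlabeled points, refit
clf = SelfTrainingClassifier(LogisticRegression(max_iter=1000), threshold=0.9)
clf.fit(X, y_partial)
print(accuracy_score(y, clf.predict(X)))  # evaluated against the true labels
```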

🔹 3. Active Learning
✅ Definition:
An iterative process in which the model actively selects the data points to be labeled
next in order to maximize learning efficiency.
✅ Procedure:
1. Train initial model on a small labeled set.
2. Use model to predict on unlabeled data.
3. Select most informative samples (e.g., uncertain predictions).
4. Query an oracle (human annotator) to label those.
5. Retrain model with new labels.
6. Repeat until performance goals are met or the labeling budget is exhausted.
✅ Query Strategies:
• Uncertainty Sampling
• Query-by-Committee
• Expected Model Change
• Information Density
✅ Applications:
• Reducing annotation cost in NLP tasks
• Labeling rare events (e.g., fraud detection)
• Efficient labeling in robotics or autonomous systems
✅ Example:
• A medical diagnosis system queries doctors only for ambiguous cases.
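A minimal uncertainty-sampling loop on synthetic data (the seed-set size and number of query rounds are illustrative; the "oracle" is simulated by simply revealing the true labels):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=600, random_state=0)

# Seed set: five labeled examples per class; everything else goes into the pool
labeled = np.concatenate([np.where(y == 0)[0][:5], np.where(y == 1)[0][:5]]).tolist()
pool = [i for i in range(len(y)) if i not in labeled]

model = LogisticRegression(max_iter=1000)
for round_ in range(20):
    model.fit(X[labeled], y[labeled])
    # Uncertainty sampling: query the pool point the model is least confident about
    proba = model.predict_proba(X[pool])
    uncertainty = 1 - proba.max(axis=1)  # low top-class probability = high uncertainty
    pick = pool.pop(int(np.argmax(uncertainty)))
    labeled.append(pick)  # the "oracle" reveals the true label y[pick]

model.fit(X[labeled], y[labeled])
print(model.score(X, y))
```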

🔹 4. Reinforcement Learning (RL)


✅ Definition:
A type of machine learning where an agent learns to make decisions by interacting with
an environment to maximize cumulative reward.
✅ Key Components:
• Agent: Learner and decision-maker
• Environment: What the agent interacts with
• State: Current situation
• Action: What the agent does
• Reward: Feedback signal after an action
• Policy: Strategy mapping states to actions
✅ Algorithms:
• Q-Learning (model-free, tabular)
• Deep Q-Networks (DQN) – combines RL with deep learning
• Policy Gradient Methods
• Actor-Critic Methods
• Monte Carlo Tree Search (MCTS)
✅ Applications:
• Game AI (AlphaGo, Dota 2 bots)
• Robotics control
• Autonomous vehicles
• Resource management
• Personalized recommendations
✅ Example:
• Training a robot to walk using trial and error, with rewards for stable movement.
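A minimal tabular Q-learning sketch on a made-up 5-state corridor, where the agent must learn to walk right to reach a reward:

```python
import numpy as np

# Made-up deterministic "corridor": states 0..4, actions 0 = left, 1 = right.
# Reaching state 4 yields reward 1 and ends the episode.
n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.5, 0.9, 0.1
rng = np.random.default_rng(0)

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(n_states - 1, s + 1)
    return s2, float(s2 == n_states - 1), s2 == n_states - 1  # next state, reward, done

for episode in range(200):
    s, done = 0, False
    while not done:
        # Epsilon-greedy policy: mostly exploit, occasionally explore
        a = rng.integers(n_actions) if rng.random() < epsilon else int(np.argmax(Q[s]))
        s2, r, done = step(s, a)
        # Q-learning update: move Q(s, a) toward r + gamma * max_a' Q(s', a')
        target = r + gamma * (0.0 if done else np.max(Q[s2]))
        Q[s, a] += alpha * (target - Q[s, a])
        s = s2

print(np.argmax(Q, axis=1))  # greedy policy: states 0-3 should prefer 1 ("go right")
```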

🔹 5. Inference in Graphical Models


✅ Definition:
Graphical models (like Bayesian Networks and Markov Random Fields) represent
probabilistic relationships between variables via graphs. Inference involves computing
posterior probabilities given observed variables.
✅ Types:
• Exact Inference:
  o Variable Elimination
  o Belief Propagation (Sum-Product Algorithm)
• Approximate Inference:
  o Markov Chain Monte Carlo (MCMC) sampling
  o Variational Inference
  o Loopy Belief Propagation
✅ Applications:
• Medical diagnosis systems
• Image segmentation
• Natural language understanding
• Anomaly detection in networks
✅ Example:
• Given symptoms, infer the most likely disease using a Bayesian Network.
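A minimal enumeration-inference sketch for the two-node network Disease → Symptom (all probabilities are made-up illustrative numbers):

```python
# Two-node Bayesian network Disease -> Symptom; all numbers are illustrative.
p_disease = 0.01                      # prior P(D = 1)
p_symptom_given = {1: 0.9, 0: 0.05}   # likelihood P(S = 1 | D)

# Posterior P(D = 1 | S = 1): sum out D in the joint to get the evidence P(S = 1)
evidence = sum(p_symptom_given[d] * (p_disease if d else 1 - p_disease) for d in (0, 1))
posterior = p_symptom_given[1] * p_disease / evidence
print(posterior)  # ~0.154: even a reliable symptom leaves the disease fairly unlikely
```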

🔹 6. Introduction to Bayesian Learning and Inference


✅ Definition:
Bayesian learning uses Bayes' theorem to update the probability of hypotheses as more
evidence becomes available. It treats parameters as random variables with prior
distributions.
✅ Key Concepts:
• Prior: Initial belief about parameters
• Likelihood: Probability of data given parameters
• Posterior: Updated belief after seeing data
• Predictive Distribution: Used for making predictions
✅ Algorithms:
• Bayesian Linear Regression
• Gaussian Processes
• Bayesian Neural Networks
• Variational Inference
• Hamiltonian Monte Carlo (HMC)
✅ Applications:
• Uncertainty quantification in predictions
• Clinical trials
• Financial forecasting
• Sensor fusion
✅ Example:
• Estimating the probability of rain tomorrow based on historical weather patterns and today's forecast.
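A minimal conjugate Beta-Bernoulli sketch of the rain example (the prior and the observed rainy/dry days are made-up numbers):

```python
import numpy as np

# Conjugate Beta-Bernoulli update for the probability of rain.
# Prior Beta(2, 2) encodes a weak belief centered at 0.5 (illustrative numbers).
alpha, beta = 2.0, 2.0

data = np.array([1, 0, 1, 1, 0, 1, 1, 1])  # made-up history: 1 = rainy day, 0 = dry day
alpha += data.sum()              # posterior: add observed rainy days...
beta += len(data) - data.sum()   # ...and observed dry days

# For a Bernoulli outcome, the posterior predictive P(rain tomorrow) is the posterior mean
print(alpha / (alpha + beta))    # 8 / 12 ≈ 0.67
```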
✅ Summary Table
TOPIC | TYPE | CORE IDEA | ALGORITHM(S) | APPLICATION
Scalable ML | Optimization | Handle large-scale data with online/distributed methods | SGD, FTRL, Parameter Server | Real-time recommendation systems
Semi-supervised Learning | Hybrid Learning | Use small labeled + large unlabeled dataset | Self-training, Consistency Reg. | Document classification
Active Learning | Data-efficient Learning | Selectively query most informative samples | Uncertainty Sampling, Expected Model Change | Medical diagnostics
Reinforcement Learning | Decision Making | Learn optimal policy through reward maximization | Q-learning, DQN, Policy Gradients | Game AI, Robotics
Inference in Graphical Models | Probabilistic Modeling | Reason about uncertainty in complex systems | Belief Propagation, MCMC | Disease diagnosis
Bayesian Learning | Uncertainty Modeling | Update beliefs using Bayes' Theorem | Gaussian Process, Variational Inference | Risk modeling, sensor fusion
