What is Gen AI?
Generative artificial intelligence (Generative AI, GenAI) is a subset of artificial intelligence that uses
generative models to produce text, images, videos, or other forms of data in response to prompts.
It can produce a variety of novel content, such as images, video, music, speech, text, software code and
product designs.
These models learn the underlying patterns and structures of their training data and use them to produce new
data.
Essence of Generative Models
Generative models are the creative minds of the AI world. At their core, these models are designed to
capture and reproduce the underlying data distribution.
In simpler terms, they learn the patterns and structures within a dataset and use that knowledge to generate
new data that is similar to what they've learned.
Probability Distributions and Generative Models
To understand generative models, we need to dive into the world of probability distributions, which lie at
their heart.
The accuracy and quality of Generative AI outputs depend heavily on the data they are trained on.
Generative models aim to learn the probability distribution of the training data to generate new, similar data
points.
Probability distributions provide a mathematical framework for describing the likelihood of different
outcomes in an uncertain world.
They are practical instruments that empower us to model and manage uncertainty throughout the machine
learning lifecycle.
Probability Distributions: A Cornerstone of AI
Probability distributions are a cornerstone of data science and artificial intelligence.
They play a fundamental role in modeling uncertainty in information and data, allowing us to simulate
real-world scenarios, calibrate model outputs, and even guide algorithm selection.
Embracing probability distributions is key to developing reliable machine learning models.
Probability Distributions: A Cornerstone of AI – 2
The theoretical principles underlying probability distributions serve as a bridge between classical statistics
and modern machine learning techniques.
For AI and machine learning practitioners, understanding probability distributions is crucial for building
models that can reason effectively under uncertainty and make accurate predictions.
By understanding probability distributions, AI/ML practitioners can build more robust, interpretable, and
efficient systems for a wide range of applications, from computer vision and natural language processing to
robotics and predictive maintenance.
Let us dive into the world of probability distributions.
Random Experiment
An activity or process that leads to one of several possible outcomes.
All possible outcomes are known; however, the exact outcome cannot be precisely predicted in advance.
The experiment can be repeated a number of times under the same conditions.
Examples
Tossing a Coin: When you toss a coin, it can land on either heads or tails. The outcome is uncertain until the
coin lands.
Rolling a Die: Rolling a six-sided die can result in any number from 1 to 6, and the exact number cannot be
predicted beforehand.
There are certain terms associated with random experiments that are given as follows:
Outcome: A possible result of the random experiment.
Sample space: The set of all possible outcomes of a random experiment. For example, the sample space for rolling
a die is {1, 2, 3, 4, 5, 6}.
Event: A subset of the sample space. For example, getting an even number when rolling a die is an event with
outcomes {2, 4, 6}.
Trial: When a random experiment is repeated many times, each repetition is known as a trial.
Random Variables
A mathematical function that assigns a numerical value to an outcome in the sample space of a random
experiment.
It bridges the gap between theoretical probability and real-world data.
Examples:
Suppose a die is rolled (random experiment).
Here, the sample space S = {1, 2, 3, 4, 5, 6}.
Let X = random variable denoting the outcome of the roll.
Suppose two (unbiased) coins are tossed (random experiment).
Here, the sample space S = {HH, HT, TH, TT}. Let X = random variable denoting the number of heads.
Types of Random Variables
Random variables are of two types:
Discrete Random Variable
Continuous Random Variable
Discrete Random Variables: Take on a countable number of values.
Example: Number of heads in 3 coin tosses (X = 0, 1, 2, 3).
Continuous Random Variables: Take on an uncountable number of values within an interval.
Example: The height of a person in cm (X ∈ [150, 200]).
Random Variables and Probability Distributions
Probability Distribution is a mathematical function that represents the probability of different possible values
of a random variable within a given range.
For example, imagine flipping a fair coin: a probability distribution tells us the chances of getting
heads or tails. The following probability table describes it:

Outcome     Probability
Heads       0.5
Tails       0.5
A probability distribution is a theoretical counterpart of a frequency distribution (FD): an FD describes the
number of occurrences of a variable in a dataset, whereas a probability distribution assigns probabilities to
those occurrences.
Now, corresponding to the two types of random variables, we have two types of probability distributions.
Discrete Probability Distributions:
Are used to model discrete random variables that take on a countable set of distinct values, such as the
outcomes of a coin flip, the number of defects in a manufacturing process, or the words in a document.
A discrete probability distribution is described by a mathematical function called the probability mass
function (PMF).
It gives the probability of every possible value of a variable.
A probability mass function can be represented as an equation or as a graph.
PMF, P(X = x), assigns a probability to each possible value x of the random variable X, such that the
probabilities sum to 1.
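As a minimal illustration, the PMF of a fair six-sided die can be written down directly and checked against these two requirements:

```python
# PMF of a fair six-sided die: each outcome has probability 1/6.
pmf = {x: 1 / 6 for x in range(1, 7)}

assert all(p >= 0 for p in pmf.values())        # probabilities are non-negative
assert abs(sum(pmf.values()) - 1.0) < 1e-12     # probabilities sum to 1
print(pmf[3])                                   # P(X = 3) = 1/6
```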
Continuous Probability Distributions:
Are a mathematical framework for modeling and interpreting continuous variables.
Examples of continuous variables are: the speed of sound waves, height, weight, blood pressure, cholesterol
levels, etc.
These are mathematical functions that describe the probability of a continuous variable, such as time,
weight, or height, taking on values within a given range.
Continuous probability distributions are represented by a Probability Density Function (PDF).
Probability Density Function:
It describes the likelihood of a continuous random variable taking on specific values within a given range.
It essentially outlines how values are distributed across the range, providing a visual depiction of the
distribution's shape, such as whether it's symmetric, skewed, or peaked.
Let X be a continuous random variable with probability density function f(x). For f(x) to be a valid
probability density function, it must satisfy the following conditions:
f(x) ≥ 0, ∀ x ∈ R
f(x) is piecewise continuous
∫ f(x) dx = 1, where the integral is taken over all of R
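As a quick sanity check of these conditions, the sketch below (using SciPy, with the standard normal density as an example) verifies non-negativity on a grid and integrates the PDF over the real line:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

xs = np.linspace(-10, 10, 10_001)
assert (norm.pdf(xs) >= 0).all()             # f(x) >= 0 on a wide grid

area, _ = quad(norm.pdf, -np.inf, np.inf)    # integral of f over all of R
print(round(area, 6))                        # ~1.0
```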
Next we'll explore some of the essential probability distributions that every data scientist and AI/ML
practitioner should know. We'll dive into their properties, use cases, and how they are leveraged in
real-world AI and ML applications.
Binomial Distribution
The binomial distribution is a discrete probability distribution that models the number of successes in a fixed
number of independent trials, where each trial has only two possible outcomes: success or failure.
A few examples of situations that can be modelled using the binomial distribution:
Suppose you flip a fair coin 10 times. Each flip is an independent trial, and there are only two possible
outcomes: heads or tails.
In a clinical trial, patients are often given a treatment or a placebo. The outcome for each patient might be
success (the treatment works) or failure (the treatment doesn't work).
A factory produces a large number of items, and each item may be defective or non-defective. Inspectors
randomly select a sample of items and check them for defects.
Real-World Examples of Binomial Distribution
• Coin Flips: Probability of getting exactly 5 heads in 10 coin flips, given a fair coin (p=0.5).
• Drug Effectiveness: A new drug has a 70% success rate. If administered to 15 patients, what's the
probability it cures exactly 10?
• Manufacturing Defects: A production line has a 2% defect rate. In a batch of 100 items, what's the
probability of finding exactly 3 defective items?
• Customer Conversion: An online ad has a 5% click-through rate. Out of 500 impressions, what's the
chance of getting exactly 25 clicks?
• Election Outcomes: In a city, 60% of voters favor a candidate. If a random sample of 20 voters is
surveyed, what's the likelihood exactly 12 support the candidate?
Binomial Distribution in ML
The most common use case in machine learning or AI is the binary classification problem. In machine
learning, binary classification is a supervised learning task that categorizes new observations into one of
two classes. That is, we train and validate an algorithm that predicts whether or not a particular observation
belongs to a given class (a 0-or-1 scenario).
For example, suppose you have a binary classifier that predicts whether an email is spam (1) or not spam (0).
You test this classifier on 100 emails (n = 100), and it correctly identifies 80 of them as spam (k = 80). If the
probability of correctly classifying an email as spam is p, the number of correctly classified spam emails
follows a binomial distribution:
P(X = k) = C(n, k) · p^k · (1 − p)^(n − k)
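Using SciPy, this probability can be evaluated directly; the per-email accuracy p = 0.8 below is an illustrative assumption:

```python
from scipy.stats import binom

n, k, p = 100, 80, 0.8        # trials, observed successes, assumed accuracy
print(binom.pmf(k, n, p))     # P(exactly 80 correct classifications)
print(binom.cdf(k, n, p))     # P(at most 80 correct classifications)
```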
A/B Testing
A/B testing is a method of comparing two versions of something to determine which performs better.
Also known as bucket testing or split testing, it is a method in which we compare two variations
of a website, app, or email to see which one gets better results.
When the outcome of each user interaction can be categorized as either a success (e.g., a click, a conversion)
or a failure (e.g., no click, no conversion), the binomial distribution is a suitable model.
The binomial distribution helps analyze the results of A/B tests.
Consider an example where we want to determine if a new feature in a recommendation system improves
user click-through rates.
By modeling the click events as a binomial distribution, we can perform a hypothesis test to evaluate if the
observed improvement is statistically significant or just due to random chance.
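A minimal sketch of such a test, assuming a 5% baseline click-through rate and hypothetical counts of 60 clicks in 1,000 impressions for the new feature:

```python
from scipy.stats import binomtest

# Did the new feature beat an assumed 5% baseline click-through rate?
result = binomtest(k=60, n=1000, p=0.05, alternative="greater")
print(result.pvalue)   # small p-value: improvement unlikely to be chance
```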
Poisson Distribution
Consider an event that recurs over time, and suppose we want to model the number of times this event takes
place in a given interval. This can be done using the Poisson distribution.
It gives the probability of an event happening a certain number of times (k) within a given interval of time or
space.
“Events” could be anything from disease cases to customer purchases to meteor strikes. The interval can be
any specific amount of time or space, such as 10 days or 5 square inches.
You can use a Poisson distribution if:
Individual events happen at random and independently. That is, the probability of one event doesn’t affect the
probability of another event.
You know the mean number of events occurring within a given interval of time or space. This number is
called λ (lambda), and it is assumed to be constant.
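With λ known, SciPy can evaluate these probabilities directly; the rate of 5 events per interval below is an arbitrary example:

```python
from scipy.stats import poisson

lam = 5                       # assumed mean number of events per interval
print(poisson.pmf(3, lam))    # P(exactly 3 events)
print(poisson.cdf(7, lam))    # P(at most 7 events)
```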
Real-World Examples of Poisson Distribution:
• Call Center: The number of calls received by a call center per hour. If a call center receives an average
of 100 calls per hour, the Poisson distribution can model the probability of receiving 90, 110, or any other
number of calls in a given hour.
• Traffic Flow: The number of cars passing a specific point on a highway per minute. For example, if
an average of 30 cars pass a certain point every minute, the Poisson distribution can determine the
likelihood of 20 cars passing in one minute or 40 cars in another. Useful for traffic management.
• Defects in Manufacturing: The number of defects per unit area in a manufactured product (e.g., the
number of scratches on a car panel). If a production process yields an average of 2 defects per square
meter, the distribution predicts the chance of finding 0, 1, 3, or more defects on a randomly chosen
square meter of the product.
• Website Traffic: The number of users visiting a website in a given interval. A website averaging 300
visits per minute can use the Poisson distribution to assess the likelihood of 250 or 350 visits in any
particular minute.
• Bacterial Colonies: The number of bacterial colonies in a petri dish of a given area. If on average there are
5 colonies per square cm, the Poisson distribution helps predict the probability of finding 2, 7, or any other
number of colonies on a specific square cm.
In AI/ML, Poisson distributions are used in various applications such as:
Anomaly Detection: Anomaly detection involves identifying unusual patterns or observations in data that
differ significantly from the majority of the data. These anomalies could represent anything from fraud in
financial transactions to faults in industrial machinery.
In anomaly detection, the Poisson distribution is often used to model counts of events under normal
conditions. If the observed count of events deviates significantly from the expected counts predicted by the
Poisson model, it may indicate an anomaly.
For example, if a system typically observes 5 events per minute on average and suddenly observes 20 events
in one minute, this might be flagged as anomalous under a Poisson-based detection system.
A sudden spike in network security alerts might indicate a cyberattack. A significant drop in customer
website visits could suggest a technical issue.
Natural Language Processing
The connection between Natural Language Processing (NLP) and the Poisson process might not be
immediately obvious, but the Poisson process can be leveraged in certain aspects of NLP. Some contexts
where the Poisson process intersects with NLP:
Poisson processes are used to model the arrival of textual events, such as user queries, messages, or
interactions in a chatbot system.
In anomaly detection applied to NLP, the Poisson process is used for fraud detection in customer support chats.
When clustering documents, word occurrence patterns could be analyzed using distributions derived from a
Poisson process.
In analyzing sentiment associated with events, especially temporal data (tweets about live events, stock
market updates), the Poisson process can model the rate at which sentiments shift in response to external
events.
Poisson processes can help in time-sensitive topic modeling to study how the rate of occurrence of certain
topics changes across time.
Modeling Customer Behavior: The Poisson distribution can be used to model customer interactions, such
as: Predicting the number of customer service calls received per day.
Analyzing the frequency of customer purchases within a specific timeframe.
Uniform Distribution
A uniform distribution is a type of symmetric probability distribution in which all the outcomes have an equal
likelihood of occurrence.
There are two types of uniform distributions: discrete and continuous.
Examples of Uniform Distribution:
Rolling a Fair Die: When rolling a fair six-sided die, each outcome (1, 2, 3, 4, 5, or 6) has an equal
probability of 1/6. This is a discrete uniform distribution.
Random Number Generators: Many computer algorithms generate random numbers that are uniformly
distributed between 0 and 1 (or within a specified range). These are used in simulations, cryptography, and
statistical modeling.
Waiting Time (Idealized): Consider a bus that arrives exactly every 30 minutes. If you arrive at the bus stop
at a random time, your waiting time (up to 30 minutes) can be modeled as a continuous uniform distribution
between 0 and 30 minutes.
Applications of Uniform Distribution in ML:
Random initialization refers to the practice of assigning initial values (e.g., weights in a neural network)
randomly, typically before the training process begins. Proper initialization can affect how well and how
quickly the network converges during training.
When initializing weights or sampling random numbers, using a uniform distribution ensures that the values
are spread evenly across the range, avoiding bias.
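A common example is Glorot/Xavier-style uniform initialization, sketched below with NumPy; the layer sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in, fan_out = 256, 128                     # arbitrary layer sizes

# Glorot/Xavier-style uniform bound keeps early activations well scaled
limit = np.sqrt(6.0 / (fan_in + fan_out))
W = rng.uniform(-limit, limit, size=(fan_in, fan_out))
print(W.min(), W.max())                        # values spread evenly in range
```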
Simulation: Used extensively in Monte Carlo simulations to model random events and estimate
probabilities.
Testing: In software testing, uniform distributions help generate test data to ensure all possible inputs are
tested equally.
Cryptography: Used in some encryption algorithms to generate random keys or initialization vectors.
Normal Distribution
Normal distribution, also known as the Gaussian distribution, is a continuous probability distribution that is
symmetric about the mean, depicting that data near the mean are more frequent in occurrence than data far
from the mean.
It applies to many natural phenomena: all kinds of variables in the natural and social sciences are normally
or approximately normally distributed.
Examples of Normal Distribution:
• Height: The heights of adult men or women in a population tend to be normally distributed.
• Blood Pressure: Systolic blood pressure readings for a large population often follow a normal
distribution.
• Exam Scores: Standardized test scores, such as the SAT or GRE, are designed to follow a
normal distribution.
• Errors in Measurement: Repeated measurements of a physical quantity (e.g., the length of
an object) will cluster around the true value in a normal distribution pattern, assuming random
errors.
• IQ Scores: IQ scores are normally distributed with a mean of 100 and a standard deviation of
15.
Normal Distribution is an important concept in statistics and the backbone of Machine Learning.
Applications of Normal Distribution in ML:
Linear Regression
Linear regression is a statistical method used to model the relationship between a dependent variable (output)
and one or more independent variables (inputs). The normal distribution often plays a critical role in linear
regression. Here’s how:
In ordinary least squares (OLS) regression, it's typically assumed that the error term follows a normal
distribution. This assumption ensures that the parameter estimates are unbiased and efficient.
Inference and Predictions: Many statistical tests (t-tests for coefficients, F-tests for model significance) and
confidence interval calculations rely on the assumption of normality.
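A minimal sketch, on synthetic data, of fitting a line and checking whether the residuals look normal (here via the Shapiro-Wilk test):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, 200)
y = 2.0 * x + 1.0 + rng.normal(0, 1.5, 200)   # linear signal + Gaussian noise

fit = stats.linregress(x, y)
residuals = y - (fit.slope * x + fit.intercept)

# Shapiro-Wilk: a large p-value is consistent with normally distributed errors
stat, p_norm = stats.shapiro(residuals)
print(f"slope={fit.slope:.2f}, normality p-value={p_norm:.3f}")
```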
Gaussian Naive Bayes (GNB)
The Naive Bayes algorithm is a probabilistic classifier based on Bayes' Theorem and assumes that features
are conditionally independent given the class label. Despite its simplicity, it is widely used because it
performs well in many real-world applications like text classification, spam filtering, and sentiment analysis.
GNB is a variation of the Naive Bayes algorithm that assumes the data for each feature follows a normal
distribution (Gaussian distribution). This assumption makes Gaussian Naive Bayes particularly well-suited
for continuous data.
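A minimal scikit-learn sketch on the Iris dataset, whose continuous features suit the Gaussian assumption:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = load_iris(return_X_y=True)             # continuous features
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

model = GaussianNB().fit(X_tr, y_tr)          # per-class Gaussian per feature
print(model.score(X_te, y_te))                # held-out accuracy
```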
Gaussian Mixture Model (GMM)
A Gaussian Mixture Model (GMM) is a probabilistic model that represents a dataset as a mixture of multiple
Gaussian distributions. It's widely used in clustering, density estimation, and unsupervised learning tasks.
Variational Encoders
Variational Encoders, often referred to as Variational Autoencoders (VAEs), are a type of generative model
in machine learning. They are designed to learn a probabilistic mapping between input data and a latent
space, enabling both reconstruction of the input and generation of new, similar data.
Variational Encoders and the normal distribution are intrinsically linked, as VAEs leverage the normal
distribution for defining their latent space and ensuring smooth and interpretable generative modeling.
Latent space refers to a compressed, abstract representation of data in machine learning and deep learning
models. This space is crucial for tasks like dimensionality reduction, generative modeling, and unsupervised
learning.
Beta Distribution
• A continuous probability distribution that models random variables with values falling inside a finite
interval.
• The standard beta distribution uses the interval [0,1].
• Defined as a family of continuous probability distributions set on the interval [0, 1] having two
positive shape parameters, expressed by α and β.
• Useful when dealing with scenarios where the outcomes are bounded within a specific range, such as
success rates, probabilities, and proportions in Bayesian inference, decision theory, and reliability
analysis.
• Various fields apply the beta distribution because of its capacity to model proportions, probabilities,
and bounded data.
Beta Distribution: Real-World Examples
• Click-Through Rate (CTR) Prediction: A/B testing different ad creatives. The beta distribution models the
probability that a user will click on an ad.
• Project Completion Probability: Estimate the likelihood of a project finishing on time.
• Conversion Rate Optimization: Model the probability of visitors converting into customers on a
website. Use the beta distribution to represent this conversion rate and predict future conversion rates based
on new A/B testing changes.
• Equipment Failure Rates: A factory wants to model the probability of machine failure. The beta
distribution estimates the likelihood of a machine lasting a certain number of days.
• Sports Analytics: Model a baseball player's batting average. The beta distribution estimates the probability
of the player's true batting average being within a certain range, accounting for sample size.
Applications of Beta Distribution in ML
Modeling Probabilities: Since the beta distribution is defined on the interval [0, 1], it is perfect for modeling
probabilities, such as the probability of success in binary classification tasks or the confidence of predictions.
Bayesian Inference: In Bayesian analysis, the beta distribution serves as a conjugate prior for the Bernoulli
and binomial distributions. This means that if the prior distribution of a parameter is beta and the likelihood
is Bernoulli/binomial, the posterior distribution will also be beta. This makes computations more
straightforward.
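A minimal sketch of this conjugate update, assuming a uniform Beta(1, 1) prior and 8 successes in 10 Bernoulli trials:

```python
from scipy.stats import beta

a, b = 1, 1                                     # Beta(1, 1): uniform prior
successes, failures = 8, 2                      # assumed observed outcomes

posterior = beta(a + successes, b + failures)   # posterior is again a beta
print(posterior.mean())                         # posterior mean success rate
print(posterior.interval(0.95))                 # 95% credible interval
```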
Hyperparameter Estimation:
Hyperparameters are the parameters of a model that are not learned during the training process; they are set
manually before training begins. In machine learning, hyperparameters include, for example, the number of
hidden layers in a neural network. Hyperparameter estimation refers to the process of determining optimal
values for these hyperparameters, often using techniques such as grid search, random search, or Bayesian
optimization.
In Bayesian frameworks, the beta distribution is used to model the prior beliefs about probabilities.
In some algorithms, beta distribution can be used for hyperparameter estimation, especially when dealing with
probabilistic models.
Reinforcement learning is a framework in which an agent learns to make decisions by interacting with an
environment, receiving rewards (or penalties), and aiming to maximize cumulative rewards over time. It
often involves:
• Exploration: Trying out new actions to gather information.
• Exploitation: Choosing actions that are known to yield high rewards.
The beta distribution can play an essential role in reinforcement learning (RL), especially in scenarios
involving probability modeling or exploration-exploitation strategies. In RL tasks involving uncertainty, the
beta distribution serves as a conjugate prior for the Bernoulli or binomial distributions.
In RL environments where reward probabilities are unknown and must be learned, the beta distribution
models this uncertainty. The agent refines the beta parameters as it gathers more data about rewards from its
actions.
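This is the idea behind Thompson sampling for Bernoulli bandits; below is a minimal sketch with assumed (hidden) reward probabilities:

```python
import numpy as np

rng = np.random.default_rng(0)
true_p = np.array([0.30, 0.55, 0.60])    # hidden reward probabilities (assumed)
a = np.ones(3)                           # Beta(1, 1) prior per arm
b = np.ones(3)

for _ in range(2000):
    samples = rng.beta(a, b)             # one posterior draw per arm
    arm = int(np.argmax(samples))        # balances exploration/exploitation
    reward = rng.random() < true_p[arm]  # Bernoulli reward from environment
    a[arm] += reward                     # conjugate Beta-Bernoulli update
    b[arm] += 1 - reward

print(a / (a + b))                       # posterior means approach true_p
```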