BBS College of Engineering and Technology
Department of Business Administration
M.B.A 1st Semester
Business Statistics and Analytics (BMB 104)
Name of Faculty: Mr. Shubham Kushwaha
Unit- 4
Probability: Theory of Probability, Addition Law, Multiplication Law & Bayes' Theorem,
Probability Distribution: Concept and application of Binomial, Poisson and Normal.
Introduction to bivariate and multivariate data analysis (Cluster and Factor analysis)
Probability
The theory of probability has its origin in games of chance related to gambling, such as
drawing cards from a pack or throwing dice. The term probability is familiar to most of
us; it is a part of everyday life. Probability theory is frequently used as an aid in making
decisions in the face of uncertainty. In common parlance, the term 'probability' refers to
the chance of an event happening or not happening.
Probability is a branch of mathematics that deals with the likelihood or chance of an
event occurring. It quantifies uncertainty and provides a measure for how likely an event
is to happen, expressed as a number between 0 and 1. A probability of 0 means the
event cannot happen, while a probability of 1 means the event is certain to happen.
Probabilities for any event lie in the range [0, 1].
Events:
An event is any outcome or a set of outcomes of an experiment. In probability theory, we refer
to the set of all possible outcomes of a random experiment as the sample space.
Events can be classified as:
Simple Event: An event that consists of a single outcome. For example, rolling a 4 on a die.
Compound Event: An event that consists of more than one outcome. For example, rolling an
even number on a die (which includes 2, 4, and 6).
Complementary Event: The event that represents all outcomes that are not in the original event.
For example, if A is the event of getting a 4 on a die, the complement of A is the event of not
getting a 4.
Mutually Exclusive Events: Two events that cannot happen at the same time. For example,
when flipping a coin, the events "getting heads" and "getting tails" are mutually exclusive.
Independent Events: Two events that do not affect each other. For example, flipping a coin and
rolling a die are independent events.
Theorems of Probability
1. Addition Law: For any two events A and B, P(A or B) = P(A) + P(B) − P(A and B); if A and B are mutually exclusive, this reduces to P(A or B) = P(A) + P(B).
2. Multiplication Law: For two independent events A and B, P(A and B) = P(A) × P(B).
3. Conditional Probability
Conditional probability is the probability of an event occurring given that another event has
already occurred. It is denoted as P(A|B), which represents the probability of event A occurring
given that event B has occurred.
The multiplication law stated above is not applicable to dependent events. Two events A and B
are said to be dependent when the occurrence of one affects the probability of the other. The
probability attached to such an event is called the conditional probability and is denoted by
P(A|B), that is, the probability of A given that B has occurred.
If two events A and B are dependent, then the conditional probability of B given A is:
P(B|A) = P(AB) / P(A)
where P(AB) is the probability that both A and B occur.
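To make the laws above concrete, here is a minimal Python sketch (not part of the original notes) that verifies the addition law and the conditional probability formula by enumerating the sample space of two fair dice; the events A and B are illustrative choices.

```python
from itertools import product

# Sample space for rolling two fair dice: 36 equally likely outcomes.
sample_space = set(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return len(event) / len(sample_space)

# Illustrative events (assumed for this sketch):
# A: the two dice total 8; B: the first die shows an even number.
A = {o for o in sample_space if sum(o) == 8}
B = {o for o in sample_space if o[0] % 2 == 0}

# Addition law: P(A or B) = P(A) + P(B) - P(A and B).
print(prob(A | B))                      # 20/36 ≈ 0.5556
print(prob(A) + prob(B) - prob(A & B))  # 5/36 + 18/36 - 3/36, the same value

# Conditional probability: P(A|B) = P(AB) / P(B).
print(prob(A & B) / prob(B))            # 3/18 ≈ 0.1667
```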
Bayes theorem
Bayes' Theorem is a fundamental concept in probability theory and statistics that provides a
way to update the probability of a hypothesis based on new evidence or information. It
describes the relationship between conditional probabilities and allows for the revision of
predictions or beliefs in light of new data. The theorem is named after the Reverend Thomas
Bayes, who introduced it in the 18th century.
Bayes Theorem is particularly useful in situations where the probability of an event depends on
prior knowledge or experience. It is widely applied in various fields, including machine learning,
decision-making, medical testing, and artificial intelligence.
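In symbols, the theorem states:
P(A|B) = [P(B|A) × P(A)] / P(B)
where P(A) is the prior probability of the hypothesis, P(B|A) is the likelihood of the evidence given the hypothesis, and P(A|B) is the revised (posterior) probability. As a minimal sketch of the medical-testing application mentioned above, the Python snippet below works through a hypothetical screening test; the prevalence, sensitivity and false-positive figures are assumed purely for illustration.

```python
# Hypothetical screening-test example of Bayes' theorem (all numbers assumed).
p_disease = 0.01            # prior: 1% of the population has the disease
p_pos_given_disease = 0.95  # sensitivity: P(positive | disease)
p_pos_given_healthy = 0.05  # false-positive rate: P(positive | healthy)

# Law of total probability: overall chance of a positive result.
p_positive = (p_pos_given_disease * p_disease
              + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: revise the prior in light of the positive test.
p_disease_given_pos = p_pos_given_disease * p_disease / p_positive
print(f"P(disease | positive) = {p_disease_given_pos:.4f}")  # ≈ 0.1610
```

Even with a fairly accurate test, the posterior probability is only about 16% because the disease is rare: this revision of belief in light of evidence is exactly what the theorem formalizes.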
Probability distribution
In Statistics, a probability distribution gives the probability of each outcome of a random
experiment or event. A probability distribution is a statistical function that describes all the
possible values and likelihoods that a random variable can take within a given range. This range
can be finite, as in the case of discrete random variables, or infinite, as with continuous random
variables. Probability distributions are foundational in statistics and probability theory because
they provide a complete picture of how a random variable behaves. They help in understanding
the uncertainty associated with outcomes and are widely used in fields like finance, economics,
engineering, and the natural sciences.
It provides the probabilities of the different possible occurrences.
Probability is a measure of the uncertainty of various phenomena.
For example, if you throw a die, the distribution defines the probability of each possible
outcome. Such a distribution can be defined for any random experiment whose outcome is
uncertain or cannot be predicted.
Binomial Distribution
A distribution where only two outcomes are possible, such as success or failure, gain or loss,
win or lose, and where the probability of success and failure is the same for all trials, is called a
Binomial Distribution.
The binomial distribution is a discrete probability distribution that models the number of
successes in a fixed number of independent trials, each with the same probability of success.
Characteristics:
There are n independent trials.
Each trial has two outcomes: success or failure.
The probability of success in each trial is constant and denoted by p, while the
probability of failure is 1- p.
The random variable X represents the number of successes in n trials.
The properties of a Binomial Distribution are:
1. Each trial is independent.
2. There are only two possible outcomes in a trial: either a success or a failure.
3. A total of n identical trials are conducted.
4. The probability of success and failure is the same for all trials (trials are identical).
5. The mathematical representation of the binomial distribution (its probability mass function) is
given by
P(X = x) = nCx × p^x × q^(n−x),   x = 0, 1, 2, …, n
where q = 1 − p and nCx = n! / [x!(n − x)!] is the number of ways of choosing x successes out of n trials.
Examples:
Flipping a coin 10 times and counting the number of heads.
Testing lightbulbs and recording the number of defective ones.
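As a minimal sketch (not from the original notes), the Python snippet below evaluates this probability mass function for the coin-flipping example above, i.e. n = 10 trials with p = 0.5.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): probability of exactly x successes in n trials."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Example from the text: flip a fair coin 10 times and count heads.
for heads in range(11):
    print(f"P(X = {heads:2d}) = {binomial_pmf(heads, 10, 0.5):.4f}")
# e.g. P(X = 5) = C(10,5) * 0.5^10 = 252/1024 ≈ 0.2461
```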
Poisson distribution
The Poisson distribution is a discrete probability distribution that models the number of events
occurring in a fixed interval of time, space, or any other continuum, where the events occur
independently and at a constant average rate.
This distribution was derived by the noted mathematician Siméon Denis Poisson in 1837. An
early application was describing the number of deaths from horse kicks in the Prussian army.
He derived this distribution as a limiting case of binomial distribution, when the number of trials
n tends to become very large and the probability of success in a trial p tends to become very
small such that their product np remains a constant.
This distribution is used as a model to describe the probability distribution of a random variable
defined over a unit of time, length or space.
For example: the number of telephone calls received per hour at a telephone exchange, the
number of accidents in a city per week, the number of defects per metre of cloth, the number of
insurance claims per year, the number of breakdowns of machines at a factory per day, the
number of arrivals of customers at a shop per hour, the number of typing errors per page, the
emission of radioactive (alpha) particles, etc.
Conditions for using Poisson Distribution:
An event can occur any number of times during a time period.
Events occur independently.
The rate of occurrence is constant, that is, the rate does not change based on time.
The probability of an event occurring is proportional to the length of the time period.
Some examples are:
1. The number of emergency calls recorded at a hospital in a day.
2. The number of thefts reported in an area on a day.
3. The number of customers arriving at a salon in an hour.
4. The number of suicides reported in a particular city.
5. The number of printing errors at each page of the book.
Here, X is called a Poisson Random Variable and the probability distribution of X is called
Poisson distribution.
The probability distribution of X following a Poisson distribution is given by:
P(X = x) = (e^(−λ) × λ^x) / x!,   x = 0, 1, 2, …
where λ is the mean number of occurrences per interval (λ = np in the binomial limiting case) and e ≈ 2.71828.
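The sketch below (an illustration, with λ = 4 calls per hour assumed rather than taken from the text) evaluates this formula, and also checks the limiting-case claim above by comparing it with a binomial distribution whose n is large and p small, with np = 4.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return exp(-lam) * lam**x / factorial(x)

def binomial_pmf(x, n, p):
    """P(X = x) for a Binomial(n, p) random variable."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Assumed example: a telephone exchange averaging 4 calls per hour.
lam = 4
for calls in range(6):
    # Binomial with n = 2000, p = 0.002 (np = 4) is close to Poisson(4),
    # illustrating the limiting case described above.
    print(f"P(X = {calls})  Poisson: {poisson_pmf(calls, lam):.4f}  "
          f"Binomial: {binomial_pmf(calls, 2000, 0.002):.4f}")
```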
Normal Distribution
The normal distribution is one of the most important and widely used continuous probability
distributions. In its initial stages, the normal distribution was developed by Abraham de Moivre
(1667-1754). His work was later taken up by Pierre-Simon Laplace (1749-1827). But the discovery
of the equation for the normal density function is attributed to Carl Friedrich Gauss (1777-1855),
who did much work with the formula. In science books, this distribution is often called the
Gaussian distribution.
The normal distribution, also known as the Gaussian distribution, is a continuous probability
distribution that models many natural phenomena where values cluster around a central mean.
In general, the normal distribution provides a good model for a random variable,
when:
There is a strong tendency for the variable to take a central value;
Positive and negative deviations from this central value are equally likely;
The frequency of deviations falls off rapidly as the deviations become larger.
As an underlying mechanism that produces the normal distribution, we can think of
an infinite number of independent random (binomial) events that bring about the
values of a particular variable.
For example, there are probably a nearly infinite number of factors that determine a person's
height (thousands of genes, nutrition, diseases, etc.). Thus, height can be expected to be
normally distributed in the population.
In order that the distribution of a random variable X is normal, the factors affecting
its observations must satisfy the following conditions:
(i) A large number of chance factors: The factors affecting the observations of a random
variable should be numerous and equally probable, so that the occurrence or non-occurrence of
any one of them is not predictable.
(ii) Condition of homogeneity: The factors must be similar, although their incidence may vary
from observation to observation.
(iii) Condition of independence: The factors affecting the observations must be independent of
each other.
(iv) Condition of symmetry: Various factors operate in such a way that the deviations of
observations above and below the mean are balanced with regard to their magnitude as well as
their number.
Two Parameters of Normal Distribution:
1. Mean μ (center of the curve): The mean μ locates the center of the distribution and defines
the location of the peak. Most values cluster around the mean. On a graph, changing the mean
shifts the entire curve left or right along the X-axis.
2. Standard deviation σ (spread about the center): The standard deviation determines the
spread of a normal distribution. It is a measure of variability and defines the width of the curve:
it determines how far away from the mean the values tend to fall, representing the typical
distance between the observations and the average. We define the standard normal random
variable Z as the normal random variable with mean 0 and standard deviation 1. Z is called the
Standard Normal Variate or Variable.
Calculation of Z score
Z = (X − μ) / σ
where X is the value being standardized, μ is the mean and σ is the standard deviation of the distribution.
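A small Python sketch of the calculation (the exam-score figures are assumed for illustration); the standard library's NormalDist then converts the z score into an area under the normal curve.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """How many standard deviations x lies from the mean mu."""
    return (x - mu) / sigma

# Assumed example: exam scores with mean 70 and standard deviation 10.
z = z_score(85, mu=70, sigma=10)
print(f"z = {z:.2f}")  # 1.50

# Area to the left of z under the standard normal curve: P(X <= 85).
print(f"P(X <= 85) = {NormalDist().cdf(z):.4f}")  # ≈ 0.9332
```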
Properties of Normal Curve:
The normal curve is asymptotic to the X-axis: the two tails extend up to infinity at
both ends.
The height of the curve declines symmetrically.
Normal curve is a smooth curve.
The normal curve is bilateral: 50% of the area lies on each side of the mean.
The normal curve is a mathematical model in behavioral sciences.
It is a continuous probability distribution.
As the distance from the mean increases, the curve comes closer to the horizontal
(X) axis.
Not only distributions of discrete random variables: the probability distributions of t,
chi-square and F also tend to the normal distribution under certain specific conditions.
In order to draw inferences about an unknown universe, we take recourse to sampling,
and inferences regarding the universe are made possible only on the basis of the
normality assumption.
Key Differences between Binomial, Poisson and Normal Distributions
Binomial: discrete; counts successes in a fixed number n of independent trials; parameters n and p.
Poisson: discrete; counts events in a fixed interval of time or space; single parameter λ, arising as the limiting case of the binomial when n is very large and p very small with np constant.
Normal: continuous; models variables that cluster symmetrically around a central value; parameters μ and σ.
Introduction to Bivariate and Multivariate Data Analysis
Bivariate Data Analysis
Bivariate analysis involves examining the relationship between two variables. It aims to
identify associations, correlations, or causation. The two variables can be:
Quantitative vs. Quantitative: e.g., analyzing the correlation between height and
weight.
Quantitative vs. Categorical: e.g., comparing average income across different
education levels.
Categorical vs. Categorical: e.g., examining the relationship between gender and
voting preferences.
Methods in Bivariate Analysis
1. Scatter Plots: Visual representation of the relationship between two quantitative
variables.
2. Correlation Analysis: Measures the strength and direction of a linear relationship (e.g.,
Pearson's r; see the sketch after this list).
3. Regression Analysis: Models the relationship where one variable predicts the other.
4. Chi-Square Test: Tests the relationship between two categorical variables.
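As a minimal sketch of method 2, the Python snippet below computes Pearson's r directly from its definition; the height and weight figures are assumed values, not data from these notes.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Assumed height (cm) and weight (kg) pairs for seven people.
height = [150, 155, 160, 165, 170, 175, 180]
weight = [52, 56, 60, 63, 68, 72, 77]
print(f"Pearson's r = {pearson_r(height, weight):.4f}")  # close to +1
```

A value of r near +1 indicates a strong positive linear relationship, near −1 a strong negative one, and near 0 little or no linear relationship.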
Multivariate Data Analysis
Multivariate analysis involves studying relationships among three or more variables
simultaneously. This method identifies patterns, structures, or relationships within
complex datasets.
Applications of Multivariate Analysis:
Market Research: Analyzing customer preferences across multiple attributes.
Medical Studies: Identifying factors influencing disease outcomes.
Environmental Studies: Examining how multiple factors affect climate.
Common Multivariate Methods
1. Multiple Regression: Analyzing the effect of multiple independent variables on a
single dependent variable.
2. MANOVA (Multivariate Analysis of Variance): Examines the influence of categorical
independent variables on multiple dependent variables.
3. Principal Component Analysis (PCA): Reduces the dimensionality of data by
identifying key components.
4. Cluster Analysis: Groups similar observations into clusters based on shared
characteristics.
5. Factor Analysis: Identifies underlying latent variables that explain observed
correlations.
Cluster Analysis
Cluster analysis is an unsupervised learning technique used to group data into clusters
(or segments) based on their similarity.
Steps in Cluster Analysis:
1. Define Variables: Choose variables relevant to the clustering objective.
2. Select a Distance Measure:
Euclidean Distance or Manhattan Distance (for continuous data)
Matching-based measures (for categorical data)
3. Choose a Clustering Method:
Hierarchical Clustering: Builds a tree of clusters.
K-Means Clustering: Divides data into k clusters based on centroids (see the sketch
after the example below).
DBSCAN: Density-based clustering for complex shapes.
4. Determine the Number of Clusters: Use methods like the Elbow Method or Silhouette
Coefficient.
5. Interpret Clusters: Understand the characteristics of each cluster.
Example: In marketing, cluster analysis can group customers based on purchasing
behavior, helping target specific customer segments.
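Continuing the marketing example, here is a hedged k-means sketch using scikit-learn; the customer spend and visit figures, and the choice k = 3, are assumptions for illustration (in practice k would come from the Elbow Method or Silhouette Coefficient in step 4).

```python
import numpy as np
from sklearn.cluster import KMeans

# Assumed customer data: [annual spend, visits per month].
X = np.array([
    [200, 2], [220, 3], [250, 2],      # low-spend, infrequent
    [500, 6], [520, 5], [480, 7],      # mid-range
    [900, 10], [950, 12], [1000, 11],  # high-spend, frequent
])

# Step 3: k-means with k = 3 (assumed; see step 4 for choosing k).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X)

# Step 5: interpret each cluster through its centroid.
print("Cluster labels:", labels)
print("Centroids:\n", kmeans.cluster_centers_)
```

Because k-means relies on the Euclidean distance from step 2, variables measured on very different scales are usually standardized before clustering.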
Factor Analysis
Factor analysis is a statistical technique used to reduce data dimensionality by
identifying a smaller number of latent factors that explain the observed correlations
among variables.
Types of Factor Analysis:
1. Exploratory Factor Analysis (EFA):
Used to uncover the underlying structure without prior assumptions.
2. Confirmatory Factor Analysis (CFA):
Tests hypotheses or confirms predefined factor structures.
Steps in Factor Analysis:
1. Prepare Data: Ensure variables are continuous and have correlations.
2. Extract Factors: Methods include Principal Component Analysis (PCA) or Maximum
Likelihood.
3. Rotation: Simplifies interpretation by adjusting factor loadings (e.g., Varimax or
Promax rotation).
4. Interpret Factors: Analyze factor loadings to understand which variables contribute to
each factor.
Example: In psychology, factor analysis identifies underlying traits (e.g., intelligence,
extraversion) from multiple questionnaire items.
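A hedged scikit-learn sketch of the exploratory workflow above; the questionnaire data are synthetic, generated so that two assumed latent factors drive six observed items.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(0)

# Step 1: synthetic data - two latent factors drive six observed items
# (items 1-3 tap one assumed trait, items 4-6 another).
f1 = rng.normal(size=(100, 1))
f2 = rng.normal(size=(100, 1))
noise = rng.normal(scale=0.3, size=(100, 6))
items = np.hstack([f1, f1, f1, f2, f2, f2]) + noise

# Steps 2-3: extract two factors with a Varimax rotation.
fa = FactorAnalysis(n_components=2, rotation="varimax")
fa.fit(items)

# Step 4: factor loadings - each row is a factor, each column an item;
# large loadings show which items belong to which latent factor.
print(np.round(fa.components_, 2))
```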