Module 1

Business Analytics (BA) refers to the systematic exploration of data and statistical analysis to drive business planning and decision-making. It aids in discovering patterns, predicting outcomes, and supporting data-driven decisions across various business functions.

Key Components:

1. Data management (collection, cleaning, storage)
2. Statistical and quantitative analysis
3. Predictive modelling
4. Data visualization and reporting

Why Business Analytics?

1. Informed Decision-Making: Enables organizations to make evidence-based decisions rather than relying on intuition.
2. Improved Efficiency: Identifies bottlenecks, optimizes operations,
and reduces waste.
3. Competitive Advantage: Offers insights into market trends,
customer behavior, and operational performance.
4. Risk Management: Helps in forecasting risks and developing
mitigation strategies.
5. Customer Insights: Provides detailed understanding of customer preferences and lifetime value.

Types Of Analytics

1. Descriptive Analytics
   Purpose: Answers "What happened?"
   Tools/Examples: Dashboards, summary statistics, reports
   Use Case: Sales trends over time

2. Diagnostic Analytics
   Purpose: Answers "Why did it happen?"
   Tools/Examples: Drill-down, correlation analysis
   Use Case: Investigating a sudden drop in customer retention

3. Predictive Analytics
   Purpose: Answers "What is likely to happen?"
   Tools/Examples: Regression models, machine learning
   Use Case: Sales forecasting, credit scoring

4. Prescriptive Analytics
   Purpose: Answers "What should be done?"
   Tools/Examples: Optimization algorithms, decision analysis
   Use Case: Resource allocation, pricing strategy
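The first three types map naturally onto a few lines of code; a minimal sketch using pandas and NumPy on a hypothetical monthly sales table (all column names and figures below are invented for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical monthly sales data
df = pd.DataFrame({
    "month": range(1, 13),
    "sales": [100, 110, 108, 120, 125, 123, 130, 138, 135, 145, 150, 155],
    "ad_spend": [10, 12, 11, 13, 14, 13, 15, 16, 15, 17, 18, 19],
})

# Descriptive: what happened? (summary statistics)
print(df["sales"].describe())

# Diagnostic: why did it happen? (correlation with ad spend)
print(df["sales"].corr(df["ad_spend"]))

# Predictive: what is likely to happen? (linear trend forecast for month 13)
slope, intercept = np.polyfit(df["month"], df["sales"], deg=1)
print(slope * 13 + intercept)
```

Prescriptive analytics would add an optimization step (e.g. linear programming over the forecast), which typically needs a dedicated solver library and is omitted here.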

Big Data Analytics

Definition: The process of analyzing large, complex data sets (structured and unstructured) that traditional data processing tools cannot handle efficiently.

Characteristics (The 5 Vs):

1. Volume: Massive amounts of data
2. Velocity: High-speed data generation and processing
3. Variety: Different types and sources (text, video, sensor data)
4. Veracity: Uncertainty and inconsistency in data quality
5. Value: Turning data into actionable insights

Tools/Technologies: Hadoop, Spark, NoSQL databases, cloud computing
Applications: Customer sentiment analysis, fraud detection, supply chain optimization, personalized marketing

Data Types

1. Structured Data: Data organized in a tabular format (rows and columns), making it easy to store and analyze. Examples: Excel sheets with sales records, SQL databases with employee details.
2. Unstructured Data: Data that does not follow a predefined format or structure. Examples: Emails, social media posts, audio/video files, customer reviews or open-ended survey responses.
3. Based on Time Dimension
   a. Cross-sectional Data: Data collected at a single point in time from multiple subjects. Example: A survey on job satisfaction of 500 employees conducted in July 2025.
   b. Time Series Data: Data collected from the same subject across different time periods. Example: Monthly sales of a product from Jan 2020 to Dec 2024.
   c. Panel Data: Data on multiple subjects observed across multiple time periods, combining the cross-sectional and time series dimensions.
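The three time-dimension shapes can be sketched in pandas (the employees, firms, and figures below are hypothetical):

```python
import pandas as pd

# Cross-sectional: many subjects, one point in time
cross = pd.DataFrame({"employee": ["A", "B", "C"],
                      "satisfaction": [7, 5, 8]})

# Time series: one subject, many time periods
ts = pd.Series([120, 135, 128],
               index=pd.period_range("2024-01", periods=3, freq="M"),
               name="sales")

# Panel: many subjects observed over many time periods
panel = pd.DataFrame({
    "firm":  ["A", "A", "B", "B"],
    "year":  [2023, 2024, 2023, 2024],
    "sales": [100, 110, 90, 95],
}).set_index(["firm", "year"])
print(panel)
```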

Types of Variable Measurement Scales

1. Nominal: Categorical data where the categories have no inherent order or ranking.
a. Gender: Male, Female, Other.
b. Hair Color: Brown, Black, Blonde, Red, etc.
c. Eye Color: Blue, Brown, Green, etc.
d. Political Affiliation: Democrat, Republican, Independent, etc.
e. Car Brands: Toyota, Ford, BMW, etc.
f. Type of Food: Italian, Mexican, Chinese, etc.


2. Ordinal: Categorical data where the categories have a meaningful order or ranking, but the intervals between the ranks may not be equal.
   a. Education Level: High School, Bachelor's, Master's, Doctoral.
   b. Likert Scale Responses: Strongly Agree, Agree, Neutral, Disagree, Strongly Disagree.
   c. Customer Satisfaction: Very Unsatisfied, Unsatisfied, Neutral, Satisfied, Very Satisfied.
   d. Rank in a Competition: 1st Place, 2nd Place, 3rd Place, etc.
   e. Movie Ratings: G, PG, PG-13, R.
3. Interval: Numerical data where the differences between values are
meaningful and consistent, but there is no true zero point.
a. Temperature in Celsius or Fahrenheit: The difference between
10° C and 20° C is the same as the difference between 20° C
and 30° C, but 0° C does not represent the absence of
temperature.
b. Dates on a Calendar: The difference between January 1st and January 2nd is one day, but there is no "zero" year.
c. SAT Scores: The difference between 1200 and 1300 is the
same as between 1300 and 1400, but a score of 0 doesn’t
mean a complete lack of knowledge.
d. Credit Scores: Credit scores have equal intervals, but a score
of 0 doesn’t represent the absence of credit
4. Ratio: Numerical data where the differences between values are
meaningful and consistent, and there is a true zero point that
represents the absence of the measured quantity
a. Height: A person who is 6 feet tall is twice as tall as a person
who is 3 feet tall.
b. Weight: A 200-pound person weighs twice as much as a 100-
pound person.
c. Age: A 50-year-old is twice as old as a 25-year-old.
d. Temperature: 0 Kelvin represents absolute zero, the absence
of all thermal energy.
e. Distance: 0 miles means no distance is traveled.
f. Number of Items: 0 items means there are no items present.
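In pandas, the nominal/ordinal distinction maps onto the `Categorical` type's `ordered` flag; a small sketch with a hypothetical Likert sample:

```python
import pandas as pd

# Ordinal data: categories with a meaningful order but unequal intervals
responses = pd.Series(pd.Categorical(
    ["Agree", "Neutral", "Strongly Agree", "Disagree"],
    categories=["Strongly Disagree", "Disagree", "Neutral",
                "Agree", "Strongly Agree"],
    ordered=True,
))

# Order-aware operations make sense for ordinal (but not nominal) data
print(responses.min())
print(responses.sort_values().tolist())
```

For nominal data (e.g. hair color), `ordered=False` is appropriate, and order-based operations like `min()` are deliberately disallowed.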
Skewness
Skewness measures the lack of symmetry in a distribution.
Interpretation:
1. Positive skew (right skew): The tail on the right side of the
distribution is longer or fatter than the left side. The mean is
typically greater than the median.
2. Negative skew (left skew): The tail on the left side of the
distribution is longer or fatter than the right side. The mean is
typically less than the median.
3. Zero skew: The distribution is perfectly symmetrical.
Kurtosis
Kurtosis measures the "tailedness" of a distribution, or how much the distribution's tails differ from a normal distribution.
Interpretation:
1. Mesokurtic: A normal distribution (kurtosis = 3).
2. Leptokurtic: A distribution with heavier tails and a higher peak
than a normal distribution (kurtosis > 3).
3. Platykurtic: A distribution with lighter tails and a flatter peak
than a normal distribution (kurtosis < 3).
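Both measures can be computed with SciPy. Note that `scipy.stats.kurtosis` returns excess kurtosis (normal = 0) by default; passing `fisher=False` matches the kurtosis = 3 convention used above. The exponential sample here is invented for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
data = rng.exponential(scale=2.0, size=10_000)  # a right-skewed sample

skew = stats.skew(data)
# fisher=False reports kurtosis on the scale where a normal distribution = 3
kurt = stats.kurtosis(data, fisher=False)

print(f"skewness: {skew:.2f}")   # positive, so right skew (mean > median)
print(f"kurtosis: {kurt:.2f}")   # > 3, so leptokurtic (heavier tails)
```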

Relationship between Skewness and Kurtosis

Skewness and kurtosis, though distinct, work together to provide a more comprehensive understanding of a data distribution's shape. Skewness focuses on asymmetry, describing whether the data leans to one side or the other. Kurtosis focuses on "tailedness" and peak sharpness, indicating the likelihood of outliers and the data's concentration around the mean. Understanding both measures aids in identifying patterns, detecting anomalies, and making data-driven decisions, which is crucial in fields like data science and financial analysis.

Dr. M. Revathy (SOC, NMIMS, Bengaluru), Semester-V, July 23, 2025

Interquartile Range

The interquartile range (IQR) is a measure of the spread of the middle 50% of values in a dataset. It's essentially the range of values that lie between the first quartile (Q1) and the third quartile (Q3).

Outlier Detection: The IQR is a powerful tool for identifying outliers. Data points that fall outside the range Q1 − 1.5 × IQR to Q3 + 1.5 × IQR are often considered potential outliers.


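The 1.5 × IQR fence rule can be sketched in a few lines of NumPy (the data values are invented, with one obvious outlier):

```python
import numpy as np

data = np.array([10, 12, 12, 13, 12, 11, 14, 13, 15, 102])  # 102 is an outlier

q1, q3 = np.percentile(data, [25, 75])
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Points outside the fences are flagged as potential outliers
outliers = data[(data < lower) | (data > upper)]
print(outliers)   # [102]
```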
Standard Error

Take multiple samples from a population of students' test scores. The standard deviation tells you how much the scores vary within each sample. The standard error, on the other hand, tells you how much the average test scores of those samples vary from each other.

Sampling Distribution

In statistics, a sampling distribution is the probability distribution of a statistic obtained from multiple samples drawn from a larger population. It's a way to understand how a statistic, like the mean or standard deviation, will vary across different samples taken from the same population.

Constructing a sampling distribution:
1. Select a random sample: Choose a random sample of a specific size (denoted as 'n') from the overall population.
2. Calculate a statistic: Determine a statistic (like the mean, median, or standard deviation) for that sample.
3. Repeat the process: Repeat steps 1 and 2 many times, drawing new random samples each time and calculating the statistic for each sample.
4. Create a frequency distribution: Create a frequency distribution (a graph) of the calculated statistics from all the samples. This graph represents the sampling distribution.

Terminologies in Sampling Distribution

1. Statistic: A numerical summary of a sample, such as the mean, median, standard deviation, etc.
2. Parameter: A numerical summary of a population, often estimated using sample statistics.
3. Sample: A subset of individuals or observations selected from a population.
4. Population: The entire group of individuals or observations that a study aims to describe or draw conclusions about.
5. Standard Error: The standard deviation of a sampling distribution, representing the variability of sample statistics around the population parameter.
6. Bias: Systematic error in estimation or inference, leading to a deviation of the estimated statistic from the true population parameter.
7. Confidence Interval: A range of values calculated from sample data that is likely to contain the population parameter with a certain level of confidence.
8. Sampling Method: The technique used to select a sample from a population, such as simple random sampling, stratified sampling, cluster sampling, etc.
9. Inferential Statistics: Statistical methods and techniques used to draw conclusions or make inferences about a population based on sample data.
10. Hypothesis Testing: A statistical method for making decisions or drawing conclusions about a population parameter based on sample data and assumptions about the population.

Purpose of Sampling Distribution

1. Estimate Population Parameters: By analyzing the distribution of sample statistics, data scientists can make inferences about population parameters (e.g., the population mean or proportion).
2. Quantify Uncertainty: Sampling distributions provide a measure of the variability of a statistic, which is crucial for constructing confidence intervals and hypothesis tests.
3. Model Performance Evaluation: They help in understanding the variability and performance of models, especially when dealing with small datasets or conducting resampling techniques like the bootstrap.
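The construction steps for a sampling distribution can be simulated directly; a sketch in NumPy, assuming an invented uniform population:

```python
import numpy as np

rng = np.random.default_rng(0)
population = rng.uniform(0, 100, size=100_000)  # a non-normal population

n = 40                      # sample size for each draw
sample_means = []
for _ in range(5_000):      # repeat the draw many times
    sample = rng.choice(population, size=n, replace=False)  # select a sample
    sample_means.append(sample.mean())                      # compute its statistic
sample_means = np.array(sample_means)  # this is the sampling distribution of the mean

print(sample_means.mean())          # close to the population mean (~50)
print(sample_means.std(ddof=1))     # the standard error, roughly sigma/sqrt(n)
```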

Central Limit Theorem

The Central Limit Theorem (CLT) states that the distribution of sample means will approximate a normal distribution, regardless of the original population distribution, as the sample size becomes sufficiently large. This means that even if the population data isn't normally distributed, the averages of many samples from that population will tend to form a bell curve.

Confidence Interval

A Confidence Interval (CI) is a range of values that is likely to contain the true value of something we are trying to measure, such as the average height of students or the average income of a population. Instead of saying "The average height is 165 cm," we can say "We are 95% confident the average height is between 160 cm and 170 cm."

Interpretation of Confidence Interval

Let's say we take a sample of 50 students and calculate a 95% confidence interval for their average height, which turns out to be 160–170 cm. This means that if we repeatedly took similar samples, 95% of the resulting intervals would contain the true average height of all students in the population.
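This repeated-sampling interpretation can be checked by simulation; a sketch assuming a hypothetical height population (true mean 165 cm, standard deviation 8 cm):

```python
import numpy as np

rng = np.random.default_rng(7)
true_mean, sigma, n = 165.0, 8.0, 50   # hypothetical heights (cm)

covered = 0
for _ in range(1_000):
    sample = rng.normal(true_mean, sigma, size=n)
    half = 1.96 * sample.std(ddof=1) / np.sqrt(n)   # 95% z-interval half-width
    # Count how often the interval captures the true mean
    if sample.mean() - half <= true_mean <= sample.mean() + half:
        covered += 1

print(covered / 1_000)   # close to 0.95
```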
Margin of Error: If a poll reports that 60% of people prefer a product, with a margin of error of ±3%, it means that the true percentage of people in the population who prefer the product is likely between 57% and 63%.

Example

Question: A tree holds hundreds of mangoes. You randomly choose 40 mangoes, with a sample mean of 80 and a standard deviation of 4.3. Construct a 95% confidence interval for the mean mango size.

Solution: Mean = 80, standard deviation = 4.3, number of observations = 40, confidence level = 95%. Substituting into the formula:
80 ± 1.960 × (4.3 / √40) = 80 ± 1.960 × (4.3 / 6.32) = 80 ± 1.960 × 0.6803 = 80 ± 1.33
The margin of error is 1.33, so the mean size of all the mangoes on the tree is likely to lie between 78.67 and 81.33.
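The arithmetic of the mango example can be reproduced in a few lines (a z-interval, matching the 1.960 critical value used above):

```python
import numpy as np

mean, sd, n = 80, 4.3, 40
se = sd / np.sqrt(n)               # standard error of the mean

# 95% CI using the z critical value, as in the worked example
lower, upper = mean - 1.960 * se, mean + 1.960 * se
print(round(lower, 2), round(upper, 2))   # 78.67 81.33
```

With a sample this small, a t critical value (about 2.023 for 39 degrees of freedom) would give a slightly wider interval; the notes use the z value for simplicity.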
Shapiro-Wilk Test

Purpose: The Shapiro-Wilk test helps determine if a dataset is likely to have been drawn from a population with a normal distribution.
Null Hypothesis: The null hypothesis of the test is that the data is normally distributed.
Test Statistic (W): The test calculates a statistic (W) that ranges from 0 to 1. A value closer to 1 suggests a better fit to a normal distribution.

p-value Interpretation:
1. Small p-value (typically ≤ 0.05): Indicates that the data is likely not normally distributed, and the null hypothesis is rejected.
2. Large p-value (> 0.05): Suggests that there is no statistically significant evidence to reject the null hypothesis, and the data may be treated as normally distributed.
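The test is available as `scipy.stats.shapiro`; a sketch on invented data, one normal sample and one skewed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
normal_data = rng.normal(loc=50, scale=5, size=200)
skewed_data = rng.exponential(scale=5, size=200)

w1, p1 = stats.shapiro(normal_data)
w2, p2 = stats.shapiro(skewed_data)

print(f"normal sample: W={w1:.3f}, p={p1:.3f}")
print(f"skewed sample: W={w2:.3f}, p={p2:.3g}")
# A small p (<= 0.05) rejects the null of normality; expect rejection
# for the exponential sample but usually not for the normal one.
```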
