Data Analysis
Introduction
Outline
Relevance of Statistics
Introduction to Basic Concepts
Course Details
Relevance of Statistics
Case: Pepsi’s Exclusivity Agreement
• A large university with a total enrollment of
about 50,000 students has offered Pepsi an
exclusivity agreement that would give Pepsi
exclusive rights to sell its products at all
university facilities for the next year with an
option for future years.
• In return, the university would receive 35% of
the on-campus revenues and an additional lump
sum of $200,000 per year.
• Pepsi has been given 2 weeks to respond.
Case 1: Background Details
• The market for soft drinks is measured in terms of 12-ounce cans.
• Pepsi currently sells an average of 22,000 cans per week (over the 40 weeks of the year that the university operates).
• The cans sell for an average of 75 cents each. The costs, including labor, amount to 20 cents per can.
Case 1: A Problem
• Pepsi is unsure of its market share.
• However, it suspects that its share is considerably less
than 50%.
Source: https://99designs.com/icon-button-design/contests/icon-button-design-wanted-guessing-game-167222
Profit-Loss Calculation
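• Suppose the current market share were around
25%.
• The total market would then be 88,000 cans per week
(22,000 is 25% of 88,000). With exclusivity, Pepsi would
sell all 88,000 cans per week, or 3,520,000 cans per
year (88,000 × 40 weeks).
• The profit or loss can then be calculated.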
Source: https://www.score.org/resource/12-month-profit-and-loss-projection
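The calculation can be sketched in Python for the assumed 25% market share. The function and parameter names are illustrative, and amounts are kept in integer cents to avoid rounding issues. The sketch also assumes, which the case does not state, that under exclusivity Pepsi captures the entire current campus market at the same price and cost.

```python
def current_annual_profit_cents(cans_per_week=22_000, weeks=40,
                                price_cents=75, cost_cents=20):
    """Profit Pepsi earns today, without the agreement."""
    return cans_per_week * weeks * (price_cents - cost_cents)


def exclusive_annual_profit_cents(market_share_pct, cans_per_week=22_000,
                                  weeks=40, price_cents=75, cost_cents=20,
                                  university_share_pct=35,
                                  lump_sum_cents=200_000 * 100):
    """Profit under exclusivity, assuming Pepsi captures the whole market."""
    # Scale Pepsi's current sales up to the size of the whole campus market.
    total_cans_per_week = cans_per_week * 100 // market_share_pct
    annual_cans = total_cans_per_week * weeks
    revenue = annual_cans * price_cents
    university_cut = revenue * university_share_pct // 100
    production_cost = annual_cans * cost_cents
    return revenue - university_cut - production_cost - lump_sum_cents


current = current_annual_profit_cents()
exclusive = exclusive_annual_profit_cents(market_share_pct=25)
print(f"Current profit:   ${current / 100:,.0f}")
print(f"Exclusive profit: ${exclusive / 100:,.0f}")
```

Under these assumptions, annual profit rises from $484,000 to $812,000. The result depends heavily on the assumed market share, which is why the missing sales figure matters.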
Case 1: Market Survey
• The only problem is that Pepsi does not know
how many soft drinks are sold weekly at the
university.
• Pepsi assigned a recent university graduate to
survey the university's students to supply the
missing information.
• Accordingly, she organizes a survey that asks 500
students to keep track of the number of soft drinks
they purchase in the next 7 days.
Source: https://getthematic.com/insights/customer-survey-design/
Simple Random Sample
✓ A simple random sample is a sample
of n observations that has the
same probability of being selected
from the population as any other
sample of n observations.
• Most statistical methods presume
simple random samples.
• However, in some situations
other sampling methods have an
advantage over simple random
samples.
Source: https://www.statisticshowto.com/simple-random-sample/
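A minimal sketch of drawing a simple random sample with Python's standard library. The student ID list is hypothetical, and the helper name `simple_random_sample` is an illustration only.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct items so that every size-n subset is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


# Hypothetical IDs for the 50,000 enrolled students.
student_ids = list(range(1, 50_001))
surveyed = simple_random_sample(student_ids, 500, seed=42)
print(len(surveyed), len(set(surveyed)))
```

Sampling is without replacement, so no student is selected twice. Fixing the seed only makes the draw reproducible.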
Stratified Random Sampling
• Divide the population into mutually
exclusive and collectively exhaustive
groups, called strata.
• Randomly select observations from each
stratum, in numbers proportional to the
stratum’s size.
• Advantages:
✓ Guarantees that each population
subdivision is represented in the
sample.
✓ Parameter estimates have greater
precision than those estimated from
simple random sampling.
Source: https://www.netquest.com/blog/en/random-sampling-stratified-sampling
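The steps above can be sketched as follows, using proportional allocation. The strata (year of study) and their sizes are hypothetical, chosen only to illustrate the idea.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Proportional allocation: sample each stratum in proportion to its size.

    strata maps a stratum name to the list of its members. Allocations are
    rounded, so the total can differ slightly from n in general.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical strata: student IDs split by year of study.
strata = {
    "first_year": list(range(0, 20_000)),
    "second_year": list(range(20_000, 35_000)),
    "third_year": list(range(35_000, 45_000)),
    "fourth_year": list(range(45_000, 50_000)),
}
picked = stratified_sample(strata, 500, seed=1)
print({name: len(ids) for name, ids in picked.items()})
```

Every stratum appears in the sample, in the same proportion as in the population.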
Cluster Sampling
• Divide population into mutually exclusive
and collectively exhaustive groups, called
clusters.
• Randomly select clusters.
• Sample every observation in those randomly
selected clusters.
• Advantages and disadvantages:
✓ Less expensive than other sampling
methods.
✓ Less precision than simple random
sampling or stratified sampling.
✓ Useful when clusters occur naturally in
the population.
Source: https://www.netquest.com/blog/en/cluster-sampling
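A minimal sketch of cluster sampling. The clusters (residence halls) and their sizes are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters and keep every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: list(clusters[name]) for name in chosen}


# Hypothetical clusters: 100 residence halls with 500 students each.
halls = {f"hall_{h:03d}": list(range(h * 500, (h + 1) * 500))
         for h in range(100)}
picked = cluster_sample(halls, n_clusters=3, seed=7)
print(sorted(picked), sum(len(members) for members in picked.values()))
```

Unlike stratified sampling, only some groups are visited, but everyone in a chosen group is surveyed.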
A Simple Representation of Survey Data (First 8 Rows)
Student ID    No. of Cans Purchased in a Week
1             14
2             10
3             8
4             6
5             9
6             12
7             13
8             4
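As an illustration only, the eight rows shown can be scaled up to a campus-wide estimate. The real estimate would use all 500 responses. The sketch also assumes that the surveyed students are representative and that every purchase is a 12-ounce can bought on campus.

```python
from statistics import mean

# First 8 survey responses shown above (cans purchased in a week).
cans_per_student = [14, 10, 8, 6, 9, 12, 13, 4]

ENROLLMENT = 50_000            # from the case
PEPSI_CANS_PER_WEEK = 22_000   # from the case

avg_cans = mean(cans_per_student)           # sample statistic
est_weekly_volume = avg_cans * ENROLLMENT   # estimated campus-wide demand
est_market_share = PEPSI_CANS_PER_WEEK / est_weekly_volume

print(avg_cans, est_weekly_volume, round(est_market_share, 3))
```

With these eight rows, the average is 9.5 cans per student, which implies about 475,000 cans per week and a Pepsi share of roughly 4.6%.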
Decision-Making
Design a Market Survey and Collect Data → Estimate the Potential Volume → Profit and Loss Calculation
What is Statistics?
Data → Statistics → Information
Statistics is a tool for creating new understanding from a set of numbers.
Broad Framework
• Recognize the problem or question
• Review previous findings
• Select your variables
• Collect data
• Analyze data
• Present and act on results
Source: https://hbsp.harvard.edu/download?url=%2Fcontent%2Fsample%2FR1307L-PDF-ENG%2Fcontent&metadata=e30%3D
Basic Concepts
Population and Sample
• A sample is a subset of the population.
• Populations have parameters; samples have statistics.
Population and Sample
• Population
✓ A population is the group of all items of interest to a statistics
practitioner.
✓ Frequently very large.
• Sample
✓ A sample is a set of data drawn from the population.
✓ Can be large, but is smaller than the population.
Parameter and Statistic
• Parameter
✓A descriptive measure of a population.
• Statistic
✓A descriptive measure of a sample.
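A small simulation can make the distinction concrete. The population below is made up (random weekly purchases for 50,000 students). In practice the parameter is unknown and only the statistic can be computed.

```python
import random
from statistics import mean

rng = random.Random(0)

# Hypothetical population: weekly can purchases for all 50,000 students.
population = [rng.randint(0, 20) for _ in range(50_000)]
sample = rng.sample(population, 500)

parameter = mean(population)   # describes the population (usually unknown)
statistic = mean(sample)       # describes the sample (what we can compute)
print(f"parameter = {parameter:.2f}, statistic = {statistic:.2f}")
```

The two values are close but not identical. The gap between them is sampling error.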
Need for Sampling
• Too expensive to gather information on the entire population
• Often impossible to gather information on the entire population
Two Branches
Statistics
• Descriptive Statistics
• Inferential Statistics
Descriptive Statistics
• Descriptive Statistics provides a set of methods for
organizing, summarizing, and presenting data in a
convenient and informative way.
• These methods include:
✓ Graphical Techniques and
✓ Numerical Techniques.
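As a sketch of the numerical techniques, the first eight survey responses from the Pepsi case can be summarized with Python's standard library. The choice of summary measures is illustrative.

```python
from statistics import mean, median, stdev

cans = [14, 10, 8, 6, 9, 12, 13, 4]  # first 8 survey rows from the case

summary = {
    "n": len(cans),
    "mean": mean(cans),
    "median": median(cans),
    "min": min(cans),
    "max": max(cans),
    "sample_std_dev": round(stdev(cans), 2),
}
print(summary)
```

These numbers describe only the eight students listed. They say nothing yet about the other students.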
A Problem…
• Descriptive Statistics describes the data set that is
being analyzed but does not allow us to draw any
conclusions about the population.
Inferential Statistics
• Statistical inference is the process of making an estimate, prediction, or
decision about a population based on a sample.
What can we infer about a population’s parameters based on a sample’s statistics?
Sample (Statistic) → Inference → Population (Parameter)
Inferential Statistics
• We use statistics to make inferences about parameters.
• Therefore, we can make an estimate, prediction, or decision about a
population based on sample data.
• Thus, we can apply what we know about a sample to the larger
population from which it was drawn!
Data Types
Types of Data
• Cross-Sectional
• Time Series
Case 1: Survey Data
Cross-sectional Data
• Data collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time.
• Subjects might include individuals, households, firms, industries, regions, and countries.
Student ID    No. of Cans Purchased in a Week
1             14
2             10
3             8
4             6
5             9
6             12
7             13
8             4
Time Series Data
• Data collected by recording a characteristic of a subject over several time periods.
• Data can include daily, weekly, monthly, quarterly, or annual observations.
• The graph shows e-3 wheeler registrations in India.
• 3-wheeler EVs like e-autos and e-rickshaws account for close to 65% of all EVs registered in India.
• For more details, check our article: https://www.thehindu.com/opinion/op-ed/indias-ev-ambition-rides-on-three-wheels/article65480119.ece
[Figure: e-3 Wheeler Registrations in India. Horizontal axis: Jan 2013 to Jan 2022; vertical axis: registrations, 0 to 800,000.]
Variables and Scales of Measurement
Variable
• A variable is the general characteristic being observed on an object of
interest.
Types of Variables
Variables: Qualitative | Quantitative
Types of Variables
• Qualitative – gender, race, political affiliation
• Quantitative – test scores, age, weight
✓Discrete
✓Continuous
Discrete Variable
• A discrete variable assumes a countable number of distinct values.
• Examples: Number of children in a family, number of points scored in a
basketball game.
Continuous Variables
• A continuous variable can assume an uncountable number of values within
some interval (any value in the interval is possible).
• Examples: Weight, height, investment return.
Scales of Measurement
Qualitative Variables
- Nominal
- Ordinal
Quantitative Variables
- Interval
- Ratio
Nominal Scale
• The least sophisticated level of measurement.
• Data are simply categories for grouping the data.
• Qualitative values may be converted to quantitative values (numeric codes) for analysis purposes.
Ordinal Scale
• Ordinal data may be categorized and ranked with respect to some
characteristic or trait.
• For example, students are often evaluated on an ordinal scale
(excellent, good, fair, poor).
• Differences between categories are meaningless because the actual
numbers used may be arbitrary.
• There is no objective way to interpret the difference between categories,
for example between an “excellent” and a “good” student.
Interval Scale
• Differences between values are equal and meaningful. Thus, the
arithmetic operations of addition and subtraction are meaningful.
• No “absolute 0” or starting point defined. Meaningful ratios may not be
obtained.
Interval Scale
• For example, consider the Fahrenheit
scale of temperature.
• This scale is interval because the data
are ranked and differences (+ or -)
may be obtained.
• But there is no “absolute 0”: 0°F does not mean
“no temperature”, so 80°F is not twice as hot as 40°F.
Ratio Scale
• The strongest level of measurement.
• Differences between values are equal and meaningful.
• There is an “absolute 0” or defined starting point. “0” does mean
“the absence of …” Thus, meaningful ratios may be obtained.
Ratio Scale
•The following variables are measured on a ratio scale:
✓General Examples: Weight and Distance
✓Business Examples: Sales, Profits, and Inventory Levels
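A short numerical check of the interval/ratio distinction. The temperatures and weights are arbitrary example values. A ratio is meaningful only if it survives a change of unit.

```python
def f_to_c(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32) * 5 / 9


def kg_to_lb(kg):
    """Convert kilograms to pounds."""
    return kg * 2.20462


# Interval scale: the ratio 80°F / 40°F changes when the unit changes.
ratio_f = 80 / 40
ratio_c = f_to_c(80) / f_to_c(40)

# Differences remain comparable: ratios of differences survive the change.
diff_ratio_f = (80 - 60) / (60 - 40)
diff_ratio_c = (f_to_c(80) - f_to_c(60)) / (f_to_c(60) - f_to_c(40))

# Ratio scale: 10 kg is twice 5 kg in any unit.
ratio_kg = 10 / 5
ratio_lb = kg_to_lb(10) / kg_to_lb(5)

print(ratio_f, round(ratio_c, 2))
print(diff_ratio_f, round(diff_ratio_c, 2))
print(ratio_kg, round(ratio_lb, 2))
```

The temperature ratio is 2 in Fahrenheit but 6 in Celsius, so it carries no meaning. The weight ratio is 2 in both units.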
Course Details
Course Plan
• Introduction
• Descriptive Statistics
• Probability and Probability Distributions
• Introduction to Sampling Distribution and Interval Estimation
• Hypothesis Testing
• ANOVA
• Regression Analysis
Textbook & Reference Book
• Doane et al. (2020), Applied Statistics in Business and Economics, McGraw Hill (Textbook).
• Jaggia et al. (2021), Business Statistics, McGraw Hill Education (Reference Book).
Evaluation Components
Components Weightage
Quiz 20%
Mid-Term 25%
End-Term 35%
Project 20%
Quiz (Tentative)
• 3/4 In-Class Quizzes and 3/4 Scheduled Quizzes
• Best 2 In-Class Quizzes and Best 2 Scheduled Quizzes will be taken for final grading
• Quizzes will be mainly concept-based and may require minor computing
• In-Class Quizzes: 5 Questions, 5 Minutes
• Scheduled Quizzes: 10 Questions, 14 Minutes
Mid-Term & End-Term Exams (Tentative)
• Mainly Descriptive Questions
• 5/6 Questions
• Total Marks: 50
• Open book with Excel
• Duration: 2-3 Hours
Project
• Group project (group decision at the end of the second week; final date will be updated by TA)
• Analysis of primary data is preferred
• Two submissions: Project Proposal & Final Project Submission
• Project Proposal submission at the end of the 18th session (exact date will be updated by TA)
• Final Submission most likely on the day of the end-term exam (exact date will be updated by TA)
Project Proposal
• Project Proposal Preparation Details (one page)
✓Title of the project
✓Introduction and Motivation for the Problem
✓Data Source/ Data Collection (If it is a survey, then a brief discussion about the
questionnaire)
Final Project Submission
• Final project report should have sections as follows:
✓A title of the project with introduction and motivation for the problem
✓Data Source(s)/Data Collection
✓Descriptive Statistics
✓Methodology
✓Results
✓Conclusion
• The data set should be provided in the Appendix.
• Project submission should include the data set and the report.
Potential Project Topics
• Causal Effects of Maternal Education on Child Health Outcomes in India
(https://pmc.ncbi.nlm.nih.gov/articles/PMC7068132/)
• Validating the Inverted U-Shaped Curve of Inequality and Wellbeing
(https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2017.02052/full)
• Evaluating the Success of the Startup India Mission
(https://www.interesjournals.org/articles/an-analysis-of-government-of-indias-startup-india-initiatives-and-their-impact-on-entrepreneurship.pdf)
• Forecasting EV Market Growth and Infrastructure Needs in India
(https://www.niti.gov.in/sites/default/files/2022-06/ForecastingPenetration-ofElectric2W_28-06.pdf)
• Analyzing the Impact of ESG Scores on Company Performance
(https://www.sciencedirect.com/science/article/pii/S221484502200103X)
• Event Studies, such as the Impact of Budget Announcements on Nifty and Sectoral Index Returns
(https://economictimes.indiatimes.com/markets/stocks/news/how-has-nifty-performed-post-budget-in-the-last-10-years/market-watch/slideshow/117785793.cms?from=mdr)
Reading Materials
• Chapters 1 & 2 of Textbook