
Slide 1 DA Basics

The document outlines a data analysis course focused on statistics, specifically highlighting the relevance of statistics through a case study of Pepsi's exclusivity agreement with a university. It covers basic concepts of statistics, types of data, sampling methods, and evaluation components of the course, including quizzes and projects. Additionally, it provides potential project topics and reading materials for students.

Uploaded by

Disha Tiwari

Data Analysis

Introduction
Outline
Relevance of Statistics

Introduction to Basic Concepts

Course Details
Relevance of Statistics
CASE: PEPSI’S EXCLUSIVITY AGREEMENT
Case: Pepsi’s Exclusivity Agreement
•A large university with a total enrollment of
about 50,000 students has offered Pepsi an
exclusivity agreement that would give Pepsi
exclusive rights to sell its products at all
university facilities for the next year with an
option for future years.
• In return, the university would receive 35% of
the on-campus revenues and an additional lump
sum of $200,000 per year.
• Pepsi has been given 2 weeks to respond.
Case 1: Background Details
• The market for soft drinks is measured in terms of 12-ounce cans.
• Pepsi currently sells an average of 22,000 cans per week (over the 40 weeks of the year that the university operates).
• The cans sell for an average of 75 cents each. The costs including labor amount to 20 cents per can.
Case 1: A Problem
• Pepsi is unsure of its market share.
• However, they suspect that it is considerably less
than 50%.

Source: https://99designs.com/icon-button-design/contests/icon-button-design-wanted-guessing-game-167222
Profit-Loss Calculation
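• Suppose the current market share were around 25%.
• The campus market would then be 88,000 cans per week (22,000 is 25% of 88,000).
• With exclusive rights, Pepsi would sell all 88,000 cans per week, or 3,520,000 cans per year (over 40 weeks).
• From these volumes, the profit or loss under the agreement can be calculated.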

Source: https://www.score.org/resource/12-month-profit-and-loss-projection
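The profit-loss calculation can be sketched in a few lines of Python. This is a minimal sketch, not the course's official solution. It assumes that the 25% market share holds, that Pepsi captures the whole campus market under exclusivity, and that the price (75 cents) and cost (20 cents) per can stay unchanged. The function and variable names are illustrative. Amounts are kept in integer cents to avoid rounding noise.

```python
# Figures taken from the case; the 25% share is the slide's "suppose" value.
WEEKS_PER_YEAR = 40          # weeks the university operates
PRICE_CENTS = 75             # average selling price per can
COST_CENTS = 20              # cost per can, including labor
REVENUE_SHARE_PCT = 35       # share of on-campus revenue paid to the university
LUMP_SUM_DOLLARS = 200_000   # additional yearly payment to the university


def current_annual_profit(weekly_cans):
    """Annual profit in dollars without the agreement."""
    cans = weekly_cans * WEEKS_PER_YEAR
    return cans * (PRICE_CENTS - COST_CENTS) / 100


def exclusive_annual_profit(weekly_market_cans):
    """Annual profit in dollars if Pepsi serves the whole campus market."""
    cans = weekly_market_cans * WEEKS_PER_YEAR
    revenue_cents = cans * PRICE_CENTS
    cost_cents = cans * COST_CENTS
    university_cents = revenue_cents * REVENUE_SHARE_PCT // 100
    profit_cents = (revenue_cents - cost_cents - university_cents
                    - LUMP_SUM_DOLLARS * 100)
    return profit_cents / 100


weekly_pepsi = 22_000
assumed_share_pct = 25                                    # assumption, not a known value
weekly_market = weekly_pepsi * 100 // assumed_share_pct   # 88,000 cans per week

print(current_annual_profit(weekly_pepsi))      # 484000.0
print(exclusive_annual_profit(weekly_market))   # 812000.0
```

Under these assumptions the agreement looks attractive, but the result depends entirely on the unknown market share, which is why the case turns to a survey.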
Case 1: Market Survey
• The only problem is that Pepsi does not know
how many soft drinks are sold weekly at the
university.
• Pepsi assigned a recent university graduate to
survey the university's students to supply the
missing information.
• Accordingly, she organizes a survey that asks 500
students to keep track of the number of soft drinks
they purchase in the next 7 days.

Source: https://getthematic.com/insights/customer-survey-design/
Simple Random
Sample
✓ A simple random sample is a sample
of n observations that has the
same probability of being selected
from the population as any other
sample of n observations.
• Most statistical methods presume
simple random samples.
• However, in some situations
other sampling methods have an
advantage over simple random
samples.

Source: https://www.statisticshowto.com/simple-random-sample/
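A simple random sample can be drawn with Python's standard `random` module. The sketch below assumes the 50,000 students are identified by the numbers 1 to 50,000, which is an illustrative choice and not part of the case. The helper name `simple_random_sample` is also hypothetical.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)          # a seed makes the draw reproducible
    return rng.sample(population, n)   # sampling without replacement


# Hypothetical sampling frame: one id per enrolled student.
student_ids = list(range(1, 50_001))
surveyed = simple_random_sample(student_ids, 500, seed=42)

print(len(surveyed), len(set(surveyed)))   # 500 500
```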
Stratified Random
Sampling
• Divide the population into mutually
exclusive and collectively exhaustive
groups, called strata.
• Randomly select observations from each
stratum, in numbers proportional to the
stratum’s size.
• Advantages:
✓ Guarantees that each population
subdivision is represented in the
sample.
✓ Parameter estimates have greater
precision than those estimated from
simple random sampling.

Source: https://www.netquest.com/blog/en/random-sampling-stratified-sampling
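Proportional stratified sampling can be sketched as follows. The strata names and sizes are made up for illustration, and the helper `stratified_sample` is hypothetical. Rounding the per-stratum allocation may not always add up exactly to n; the example uses sizes where it does.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Sample each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / total)   # proportional allocation
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical strata: 1,000 students split by program level.
strata = {
    "undergraduate": list(range(0, 600)),
    "masters": list(range(600, 900)),
    "doctoral": list(range(900, 1000)),
}
picked = stratified_sample(strata, 50, seed=1)

print({name: len(ids) for name, ids in picked.items()})
# {'undergraduate': 30, 'masters': 15, 'doctoral': 5}
```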
Cluster Sampling
• Divide population into mutually exclusive
and collectively exhaustive groups, called
clusters.
• Randomly select clusters.
• Sample every observation in those randomly
selected clusters.
• Advantages and disadvantages:
✓ Less expensive than other sampling
methods.
✓ Less precision than simple random
sampling or stratified sampling.
✓ Useful when clusters occur naturally in
the population.

Source: https://www.netquest.com/blog/en/cluster-sampling
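Cluster sampling can be sketched in the same style. The clusters here are hypothetical residence halls of 20 students each, and `cluster_sample` is an illustrative helper. Note that the random step selects whole clusters, and every observation inside a selected cluster is then included.

```python
import random


def cluster_sample(clusters, k, seed=None):
    """Pick k clusters at random and keep every observation inside them."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)   # randomly select clusters
    observations = [member for name in chosen for member in clusters[name]]
    return chosen, observations


# Hypothetical clusters: 10 residence halls with 20 students each.
clusters = {
    f"hall_{i}": [f"hall_{i}_student_{j}" for j in range(20)]
    for i in range(10)
}
chosen, observations = cluster_sample(clusters, 3, seed=7)

print(len(chosen), len(observations))   # 3 60
```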
A Simple Representation of Survey Data
(First 8 Rows)
Student Id | No. of Cans Purchased in a Week
1 | 14
2 | 10
3 | 8
4 | 6
5 | 9
6 | 12
7 | 13
8 | 4
Decision-Making
Design a Market Survey and Collect Data → Estimate the Potential Volume → Profit and Loss Calculation
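The middle step, estimating the potential volume, can be sketched with the eight survey rows shown in the deck. This is only an illustration: eight rows are far too few for a reliable estimate, and scaling the sample mean by the enrollment of 50,000 assumes the surveyed students are representative of all students.

```python
from statistics import mean

# First 8 rows of the survey data shown in the slides.
cans_per_week = [14, 10, 8, 6, 9, 12, 13, 4]
ENROLLMENT = 50_000   # total enrollment stated in the case

sample_mean = mean(cans_per_week)                    # cans per student per week
estimated_weekly_market = sample_mean * ENROLLMENT   # cans per week on campus

print(sample_mean, estimated_weekly_market)   # 9.5 475000.0
```

The estimated weekly volume would then replace the assumed market size in the profit and loss calculation.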
What is Statistics?

Data → Statistics → Information

Statistics is a tool for creating new understanding from a set of numbers.


Broad Framework
• Recognize the problem or question
• Review previous findings
• Select your variables
• Collect data
• Analyze data
• Present and act on results
Source: https://hbsp.harvard.edu/download?url=%2Fcontent%2Fsample%2FR1307L-PDF-ENG%2Fcontent&metadata=e30%3D
Basic Concepts
Population and Sample

Population (Parameter) → Subset → Sample (Statistic)

Populations have Parameters, Samples have Statistics.


Population and Sample
• Population
✓ A population is the group of all items of interest to a statistics
practitioner.
✓Frequently very large.

• Sample
✓A sample is a set of data drawn from the population.
✓Large enough to be informative, but smaller than the population.
Parameter and Statistic
• Parameter
✓A descriptive measure of a population.

• Statistic
✓A descriptive measure of a sample.
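The distinction can be made concrete with a small simulation. The population below is synthetic (randomly generated weekly can counts between 0 and 14 for 50,000 students) and does not come from the case. It is only there so that the parameter is known and can be compared with a statistic computed from a sample of 500.

```python
import random
from statistics import mean

rng = random.Random(0)

# Synthetic population (an assumption for illustration, not real data).
population = [rng.randint(0, 14) for _ in range(50_000)]
parameter = mean(population)           # descriptive measure of the population

sample = rng.sample(population, 500)   # simple random sample of 500 students
statistic = mean(sample)               # descriptive measure of the sample

print(round(parameter, 2), round(statistic, 2))
```

In practice the parameter is unknown, and only the statistic can be computed.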
Need for Sampling
• Too expensive to gather information on the entire population
• Often impossible to gather information on the entire population
Two Branches

Statistics
• Descriptive Statistics
• Inferential Statistics


Descriptive Statistics
• Descriptive Statistics provides a set of methods for
organizing, summarizing, and presenting data in a
convenient and informative way.
• These methods include:
✓ Graphical Techniques and
✓ Numerical Techniques.
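As a small illustration of numerical techniques, the eight survey rows from the Pepsi case can be summarized with Python's standard `statistics` module. This sketch uses only those eight values, so the numbers describe that small data set and nothing more.

```python
from statistics import mean, median, variance, stdev

# First 8 rows of the survey data shown in the slides.
cans_per_week = [14, 10, 8, 6, 9, 12, 13, 4]

sample_mean = mean(cans_per_week)           # 9.5
sample_median = median(cans_per_week)       # 9.5
sample_variance = variance(cans_per_week)   # sample variance (divides by n - 1)
sample_std = stdev(cans_per_week)           # square root of the sample variance
data_range = max(cans_per_week) - min(cans_per_week)   # 10

print(sample_mean, sample_median, sample_variance, round(sample_std, 3), data_range)
```

For these eight values the sample variance works out to 12, so the sample standard deviation is about 3.46 cans per week.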
A Problem…
• Descriptive statistics describe the data set that is
being analyzed but do not allow us to draw any
conclusions about the population.
Inferential Statistics
• Statistical inference is the process of making an estimate, prediction, or
decision about a population based on a sample.

What can we infer about a Population’s Parameters based on a Sample’s Statistics?
Sample (Statistic) → Inference → Population (Parameter)
Inferential Statistics
• We use statistics to make inferences about parameters.
• Therefore, we can make an estimate, prediction, or decision about a population based on sample data.
• Thus, we can apply what we know about a sample to the larger population from which it was drawn!
Data Types
Types of Data
• Cross-Sectional
• Time Series
Case 1: Survey Data
Student Id | No. of Cans Purchased in a Week
1 | 14
2 | 10
3 | 8
4 | 6
5 | 9
6 | 12
7 | 13
8 | 4
Cross-sectional Data
• Data collected by recording a characteristic of many subjects at the same point in time, or without regard to differences in time.
• Subjects might include individuals, households, firms, industries, regions, and countries.
Time Series Data
• Data collected by recording a characteristic of a subject over several time periods.
• Data can include daily, weekly, monthly, quarterly, or annual observations.
• The graph shows e-3 wheeler registrations in India.
• 3-wheeler EVs like e-autos and e-rickshaws account for close to 65% of all EVs registered in India.
• For more details, check our article: https://www.thehindu.com/opinion/op-ed/indias-ev-ambition-rides-on-three-wheels/article65480119.ece
[Figure: e-3 Wheeler Registrations in India, monthly from Jan 2013 to Jan 2022; vertical axis from 0 to 800,000 registrations]
Variables and Scales of Measurement
Variable
• A variable is the general characteristic being observed on an object of
interest.
Types of Variables

Variables

Qualitative Quantitative
Types of Variables
• Qualitative – gender, race, political affiliation
• Quantitative – test scores, age, weight
✓Discrete
✓Continuous
Discrete Variable
• A discrete variable assumes a countable number of distinct values.
• Examples: Number of children in a family, number of points scored in a
basketball game.
Continuous Variables
• A continuous variable can assume an infinite number of values within
some interval.
• Examples: Weight, height, investment return.
Scales of Measurement
• Qualitative Variables: Nominal, Ordinal
• Quantitative Variables: Interval, Ratio
Nominal Scale
• The least sophisticated level of measurement.
• Data are simply categories for grouping the data.

• Qualitative values may be converted to quantitative values for analysis purposes.
Ordinal Scale
• Ordinal data may be categorized and ranked with respect to some
characteristic or trait.
• For example, students are often evaluated on an ordinal scale
(excellent, good, fair, poor).
• Differences between categories are meaningless because the actual
numbers used may be arbitrary.
• There is no objective way to interpret the difference between levels of
student quality.
Interval Scale
• Differences between values are equal and meaningful. Thus, the
arithmetic operations of addition and subtraction are meaningful.
• No “absolute 0” or starting point defined. Meaningful ratios may not be
obtained.
Interval Scale
• For example, consider the Fahrenheit scale of temperature.
• This scale is interval because the data are ranked and differences (+ or -) may be obtained.
• But there is no “absolute 0”: 0°F does not mean the absence of temperature.
Ratio Scale
• The strongest level of measurement.
• Differences between values are equal and meaningful.
• There is an “absolute 0” or defined starting point. “0” does mean
“the absence of …” Thus, meaningful ratios may be obtained.
Ratio Scale
•The following variables are measured on a ratio scale:
✓General Examples: Weight and Distance
✓Business Examples: Sales, Profits, and Inventory Levels
Course Details
Course Plan
• Introduction
• Descriptive Statistics
• Introduction to Probability and Probability Distributions
• Sampling Distribution and Interval Estimation
• Hypothesis Testing
• ANOVA
• Regression Analysis


Textbook & Reference Book
• Doane et al. (2020), Applied Statistics in Business and Economics, McGraw Hill (Textbook).
• Jaggia et al. (2021), Business Statistics, McGraw Hill Education (Reference Book).
Evaluation Components
Components | Weightage
Quiz | 20%
Mid-Term | 25%
End-Term | 35%
Project | 20%
Quiz (Tentative)
• 3-4 In-Class Quizzes and 3-4 Scheduled Quizzes
• Best 2 In-Class Quizzes and Best 2 Scheduled Quizzes will be taken for final grading
• Quizzes will be mainly concept-based and may require minor computing
• In-Class Quizzes: 5 Questions, 5 Minutes
• Scheduled Quizzes: 10 Questions, 14 Minutes
Mid-Term & End-Term Exams (Tentative)
• Mainly Descriptive Questions
• 5-6 Questions
• Total Marks: 50
• Open book with Excel
• Duration: 2-3 Hours


• Group project (groups to be decided by the end of the second week; final date will be updated by TA)
• Analysis on primary data is preferred
• Two Submissions: Project Proposal & Final Project Submission
• Project Proposal submission at the end of the 18th session (exact date will be updated by TA)
• Final Submission most likely on the day of the end-term exam (exact date will be updated by TA)
Project Proposal
• Project Proposal Preparation Details (one page)
✓Title of the project
✓Introduction and Motivation for the Problem
✓Data Source/ Data Collection (If it is a survey, then a brief discussion about the
questionnaire)
Final Project Submission
• Final project report should have sections as follows:
✓A title of the project with introduction and motivation for the problem
✓Data Source(s)/Data Collection
✓Descriptive Statistics
✓Methodology
✓Results
✓Conclusion
• The data set should be provided in the Appendix.
• Project submission should include the data set and the report.
Potential Project Topics
• Causal Effects of Maternal Education on Child Health Outcomes in India
(https://pmc.ncbi.nlm.nih.gov/articles/PMC7068132/)
• Validating the Inverted U-Shaped Curve of Inequality and Wellbeing
(https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2017.02052/full)
• Evaluating the Success of the Startup India Mission
(https://www.interesjournals.org/articles/an-analysis-of-government-of-indias-startup-india-initiatives-and-their-impact-on-entrepreneurship.pdf)
• Forecasting EV Market Growth and Infrastructure Needs in India
(https://www.niti.gov.in/sites/default/files/2022-06/ForecastingPenetration-ofElectric2W_28-06.pdf)
• Analyzing the Impact of ESG Scores on Company Performance
(https://www.sciencedirect.com/science/article/pii/S221484502200103X)
• Event Studies such as Impact of Budget Announcements on Nifty and Sectoral Index Returns
(https://economictimes.indiatimes.com/markets/stocks/news/how-has-nifty-performed-post-budget-in-the-last-10-years/market-watch/slideshow/117785793.cms?from=mdr)
Reading Materials
• Chapters 1 & 2 of Textbook
