MA 2140: Statistics
Dr. Sameen Naqvi
Department of Mathematics, IIT Hyderabad
Email id: [email protected]
Logistics
• Lecture timings and location:
  Tuesday (11 am - 12 noon), Wednesday (2.30 - 4 pm) and Friday (10 - 11 am) in the Auditorium.
• Office/office hours:
  C-Block 312/A (by appointment).
• Prerequisite:
  MA 2110: Probability
• Grading scheme: relative
  Surprise quiz: 20%
  Final exam: 80%
  Final exam date: March 20, 2020 (Friday), 3 - 5 pm (2 hours)
• Lecture slides: Google Classroom (class code: qk2bjds)
Reference books
• Ross, S.M., 2014. Introduction to Probability and Statistics for Engineers and Scientists. Academic Press.
• Walpole, R.E., Myers, R.H., Myers, S.L. and Ye, K., 1993. Probability and Statistics for Engineers and Scientists (Vol. 5). New York: Macmillan.
• Rohatgi, V.K. and Saleh, A.M.E., 2015. An Introduction to Probability and Statistics. John Wiley & Sons.
Course Contents
• Fundamentals of Data
• Sampling and Sampling Distributions
• Point and Confidence-Interval Estimation
• Hypothesis Testing
Fundamentals of Data
Agenda
• Understand the “why” and “what” of statistics.
• Review data basics: classify variables as numerical or categorical, and distinguish between observational and experimental studies.
• Learn various techniques of data collection.
• Identify measures that summarize data, and explore ways to visualize numerical and categorical data.
Overview of Statistics
Why study statistics?
What is statistics?
• Statistics is the study of how best to collect, analyze, and draw conclusions from data.
Branches of Statistics
There are two major branches of statistics:
• Descriptive statistics: devoted to the summarization and description of data. It includes the construction of graphs, charts, and tables, and the calculation of descriptive measures such as averages, measures of variation, and percentiles.
• Inferential statistics: concerned with using sample data to make inferences about a population. It includes methods such as point estimation, interval estimation, and hypothesis testing, all of which are based on probability theory.
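As a minimal sketch of the descriptive measures listed above, the snippet below uses Python's standard `statistics` module on a small, hypothetical sample (the scores are made up for illustration). It computes an average (mean and median), a measure of variation (the sample standard deviation), and percentile cut points (the quartiles, using the module's default "exclusive" method).

```python
import statistics

# Hypothetical sample (illustrative values only, not course data)
scores = [62, 71, 75, 78, 80, 84, 88, 91, 95]

mean = statistics.mean(scores)        # arithmetic average
median = statistics.median(scores)    # middle value of the sorted data
stdev = statistics.stdev(scores)      # sample standard deviation (variation)
# Quartiles = 25th, 50th and 75th percentiles (default method="exclusive")
q1, q2, q3 = statistics.quantiles(scores, n=4)

print(f"mean={mean:.2f}, median={median}, stdev={stdev:.2f}")
print(f"quartiles: {q1}, {q2}, {q3}")  # quartiles: 73.0, 80.0, 89.5
```

Other percentile conventions (for example `method="inclusive"`) can give slightly different cut points on small samples.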
Statistics vs. Probability
Data Basics
Data Basics Types of variables
Types of variables
All variables are either:
• Numerical (quantitative): take on numerical values; it is sensible to add, subtract, take averages, etc. with these values.
• Categorical (qualitative): take on a limited number of distinct categories; the categories can be identified with numbers, but it is not sensible to do arithmetic operations with them.
Numerical variables
Numerical variables are either:
• Continuous: take on any of an infinite number of values within a given range.
• Discrete: take on one of a specific set of numeric values.
• Examples
  Continuous: amount of water in a 1-gallon container
  Discrete: number of students in this class
Categorical variables
Categorical variables are either:
• Regular categorical (nominal): levels have no inherent ordering.
• Ordinal: levels have an inherent ordering.
• Examples
  Nominal: gender, hair color (no hierarchy is implied)
  Ordinal: level of education, economic status
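One way to see the nominal/ordinal distinction in code is sketched below, assuming categories are modeled with Python's `enum` module: a plain `Enum` for a nominal variable (levels support equality checks only) and an `IntEnum` for an ordinal one (levels can be compared and sorted). The class names and levels (`HairColor`, `Education`) are hypothetical. The integer values of the ordinal levels encode rank only; as with any categorical variable, doing arithmetic on them (e.g. averaging) is not meaningful.

```python
from enum import Enum, IntEnum

# Nominal: labels with no inherent order (hypothetical levels)
class HairColor(Enum):
    BLACK = "black"
    BROWN = "brown"
    BLONDE = "blonde"
    RED = "red"

# Ordinal: integer values encode the ranking of the levels (hypothetical levels)
class Education(IntEnum):
    PRIMARY = 1
    SECONDARY = 2
    BACHELORS = 3
    MASTERS = 4
    DOCTORATE = 5

# Nominal levels can only be tested for equality;
# HairColor.BLACK < HairColor.RED would raise TypeError
print(HairColor.BLACK == HairColor.BROWN)       # False

# Ordinal levels can be compared and sorted by rank
print(Education.MASTERS > Education.SECONDARY)  # True
levels = [Education.DOCTORATE, Education.PRIMARY, Education.BACHELORS]
print([e.name for e in sorted(levels)])         # ['PRIMARY', 'BACHELORS', 'DOCTORATE']
```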
Example
Data matrix (each row is an observation, or case; each column is a variable):

country        cr_req  cr_comply  ud_req  ud_comply  …  hemisphere  hdi
Argentina      21      100        134     32         …  southern    very high
Australia      10      40         361     73         …  southern    very high
Belgium        <10     100        90      67         …  northern    very high
Brazil         224     67         703     82         …  southern    high
…              …       …          …       …          …  …           …
United States  92      63         5950    93         …  northern    very high

*Google’s Transparency Report (2011).
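To make the row/column vocabulary concrete, here is a minimal sketch that stores the visible rows of the table above as a list of Python dictionaries: each dictionary is one observation (case) and each key is one variable. The elided `…` rows and columns are not included, and Belgium's `<10` is stored as `None` because only an upper bound is reported; that encoding is an assumption of this sketch, not part of the source data.

```python
# Each dict is one observation (a row of the data matrix);
# each key is one variable (a column). Values are copied from the visible
# rows of the table; the elided "…" rows and columns are left out.
# Belgium's cr_req is reported only as "<10", so it is stored as None here.
data = [
    {"country": "Argentina", "cr_req": 21, "cr_comply": 100, "ud_req": 134,
     "ud_comply": 32, "hemisphere": "southern", "hdi": "very high"},
    {"country": "Australia", "cr_req": 10, "cr_comply": 40, "ud_req": 361,
     "ud_comply": 73, "hemisphere": "southern", "hdi": "very high"},
    {"country": "Belgium", "cr_req": None, "cr_comply": 100, "ud_req": 90,
     "ud_comply": 67, "hemisphere": "northern", "hdi": "very high"},
    {"country": "Brazil", "cr_req": 224, "cr_comply": 67, "ud_req": 703,
     "ud_comply": 82, "hemisphere": "southern", "hdi": "high"},
    {"country": "United States", "cr_req": 92, "cr_comply": 63, "ud_req": 5950,
     "ud_comply": 93, "hemisphere": "northern", "hdi": "very high"},
]

# An observation is a whole row
brazil = next(row for row in data if row["country"] == "Brazil")
print(brazil)

# A variable is a whole column
hemisphere = [row["hemisphere"] for row in data]
print(hemisphere)  # ['southern', 'southern', 'northern', 'southern', 'northern']

# Size of this excerpt: observations x variables
print(len(data), "x", len(data[0]))  # 5 x 7
```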
Example contd.
• cr_req: number of content removal requests made to Google
• Discrete numerical
• cr_comply: percentage of content removal requests Google complied with
• Continuous numerical
• hemisphere: hemisphere that the country is located in (southern, northern)
• Nominal categorical
• hdi: Human Development Index (very high, high, medium, low)
• Ordinal categorical
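Because hdi is ordinal, its levels can be ranked even though they are not numbers. A minimal sketch, assuming a hypothetical rank mapping `HDI_RANK` (low < medium < high < very high) and using the hdi values from the visible rows of the table: sorting on the mapped rank orders countries by development level, whereas sorting the raw labels only orders them alphabetically.

```python
# Hypothetical rank mapping that encodes the order of the hdi levels
HDI_RANK = {"low": 0, "medium": 1, "high": 2, "very high": 3}

# hdi values from the visible rows of the table
hdi = {
    "Argentina": "very high",
    "Australia": "very high",
    "Belgium": "very high",
    "Brazil": "high",
    "United States": "very high",
}

# Sort countries from lowest to highest level; sorted() is stable,
# so countries with the same level keep their original order
by_level = sorted(hdi, key=lambda country: HDI_RANK[hdi[country]])
print(by_level)
# ['Brazil', 'Argentina', 'Australia', 'Belgium', 'United States']

# Sorting the labels as plain strings ignores the ordering of the levels
print(sorted(HDI_RANK))
# ['high', 'low', 'medium', 'very high']
```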
Relationship between variables
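Two variables that show some connection with one another are called associated (dependent); otherwise they are said to be independent.
Association can be either positive or negative: in a positive association, one variable tends to increase as the other increases; in a negative association, one variable tends to decrease as the other increases.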
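A common numerical summary of the direction of (linear) association between two numerical variables is the sample correlation coefficient r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √(Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²), which is positive for a positive association and negative for a negative one. The sketch below computes r from this definition in plain Python; the function name `pearson_r` and both data sets are hypothetical. Note that r captures only linear association, so a value near 0 does not by itself establish independence.

```python
import math

def pearson_r(x, y):
    """Sample correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))  # co-variation
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: hours studied vs. exam score (positive association)
hours = [1, 2, 3, 4, 5]
score = [52, 58, 61, 70, 74]
# Hypothetical data: hours of TV vs. the same scores (negative association)
tv = [5, 4, 3, 2, 1]

print(round(pearson_r(hours, score), 3))  # 0.99
print(round(pearson_r(tv, score), 3))     # -0.99
```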
Thank you for listening!