Week 1 Lecture Notes


SC2000/CZ2100/CE2100

Probability & Statistics

Week 1

Course Logistics
Part 1: Asst Prof. Themis Gouleakis
Email: [Link]@[Link]

Part 2: Assoc Prof. Kong Wai Kin Adams


Email: AdamsKong@[Link]
Room: LT2A-01-01 (LEVEL 1, NEAR MAE)
~ 24 Sessions of lectures
~ 11 Sessions of tutorials
~ 50 min per session

Accessing the Materials on NTU Learn

Quiz Logistics

Two quizzes for Part 1 – 25% each

Quizzes are closed-book and in person, during lecture hours.

You can do them using your own laptop. Labs will also be made available.

Quiz one: 11 February
Quiz two: 25 February
What we will talk about in Part 1 (Week 1-7):
Ch1 Introduction to Statistics
Ch2 Presenting Data
Ch3 Summarizing Distributions
Ch4 Bivariate Data
Ch5 Probability Theory
Ch6 Probability Distribution
- Random Variables
- Discrete Distribution
- Continuous Distribution

• TEL supporting Materials for Ch1-4 (short video
clips and notes) are available in NTULearn.
Good to watch the video clips before attending
the lectures for Part 1 (Ch1-4).

• Lectures (Week 1 to 7)
– Discussion & worked examples on topics
covered in the TEL materials
– Additional topics not covered in TEL materials
(particularly those in Ch 5 and 6)

• Recess week (Self-study non-exam)
– Use of R programming for analysis and
presentation of statistical data (Practice
materials will be available in NTULearn)

• Tutorial – One session per week, to begin in Week 3

All course materials are available in NTULearn.

Part 2 starts in Week 8.
Ch 1. Introduction to Statistics
● Descriptive & Inferential Statistics
● Types of Variables
● Percentiles
● Types of Measurement Scale
● Distributions
● Linear Transformations

Statistics involve gathering, organizing,
analyzing, interpreting and presenting data.

Statistics are used everywhere:
▪ Number of students enrolled in this course
▪ Indices measuring the stock market
▪ Singapore household income
▪ Live data on new Covid-19 cases
▪ Opinion polls, benchmark polls, tracking polls, etc.
Statistics are important:
▪ Predicting the spread of diseases
▪ Forecasting the weather
▪ Informing investment decisions
▪ Building AI or machine learning models from past statistical data
But statistics can be misleading.

Example: A toothpaste manufacturer claims that more than 80% of dentists
recommend a particular brand of toothpaste. The claim was based on surveys
in which dentists could select one or more brands.

Why is this misleading?
The claim may be read as saying that 80% of dentists recommend this brand
over the others. In fact, because each dentist could select several brands,
other brands were also recommended, possibly as often as this one.
More examples: spurious correlations

Source: [Link]
Ice Cream Sales and Shark Attacks

• Observation: There is a positive correlation between ice cream sales and shark attacks.
• Explanation: These two variables are not causally linked. Instead, a third variable, hot weather, drives both. During warmer months:
  • More people buy ice cream.
  • More people swim in the ocean, increasing the likelihood of shark encounters.

This is a classic example of a spurious correlation, where the observed relationship is due to a shared
underlying factor rather than a direct connection between the two variables.
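
The effect of a shared underlying factor can be illustrated with a small simulation. The sketch below is not from the lecture: all numbers (temperature range, slopes, noise levels) are made up for illustration, and `pearson` and `residuals` are hypothetical helper names. Temperature drives both simulated series, so they correlate strongly with each other, but the correlation largely disappears once the linear effect of temperature is removed from each.

```python
import math
import random


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def residuals(y, x):
    """Residuals of y after a least-squares straight-line fit on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return [b - (my + slope * (a - mx)) for a, b in zip(x, y)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Made-up model: daily temperature drives BOTH quantities, plus independent noise.
temperature = [rng.uniform(15, 35) for _ in range(365)]
ice_cream_sales = [10 * t + rng.gauss(0, 20) for t in temperature]
shark_attacks = [0.5 * t + rng.gauss(0, 2) for t in temperature]

raw_r = pearson(ice_cream_sales, shark_attacks)
# Correlation after removing the linear effect of temperature from each series.
partial_r = pearson(residuals(ice_cream_sales, temperature),
                    residuals(shark_attacks, temperature))

print(f"correlation(ice cream, sharks)          = {raw_r:.2f}")
print(f"same, after controlling for temperature = {partial_r:.2f}")
```

Neither simulated series depends on the other, yet the raw correlation is high. That is the sense in which the relationship is spurious.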
◼ Descriptive & Inferential Statistics
Descriptive statistics – summarize and describe important features of the
data collected. They do not generalize beyond the data collected.

Inferential statistics – use a sample to draw inferences about the
population, i.e. make formal guesses (estimates) of the population's
statistical parameters by looking at the sample.
A teacher wishes to know whether the males in his class have more
conservative attitudes than the females. A questionnaire is
distributed assessing attitudes and the males and the females are
compared.

Is this an example of descriptive or inferential statistics?

A cognitive psychologist is interested in comparing the effect of two ways
of presenting stimuli on subsequent memory. Twelve subjects are presented
with each method and a memory test is given.

What would be the roles of descriptive and inferential statistics in the
analysis of these data?

Descriptive statistics – we describe and analyze the data from the sample.
Inferential statistics – we use the data from the sample to generalize to a
larger population of people.
◼ Types of Variables
In statistics, we can broadly group variables into two
categories: qualitative and quantitative.
Examples:

Qualitative: categorical variables (e.g., gender, marital status, province).

Quantitative: numerical values (e.g., age, height). Can be discrete or continuous.
◼ Percentile
In certain experiments, an outcome is more meaningful when it is
compared with the other outcomes obtained.

E.g.: if you only know that your quiz mark is 80 out of 100, you do not
know how well you have done compared to others in your class.

A percentile compares a particular score with the scores of the rest of
a group.
◼ Percentile - calculation
Calculation of Pth Percentile for a set of N data, arranged in the order of magnitude:
1. Compute the rank R = P/100 x (N + 1)
2. Let IR = Integer part of R and FR = Fractional part of R
3. Pth Percentile = Data at rank IR +
   ( Data at rank (IR+1) – Data at rank IR ) x FR

[Figure: ranks 1, ..., IR, IR+1, ..., N marked on a line]
◼ Percentile - calculation

[Figure: illustration for a rank of R = 6.4]
E.g.: Given data: [3, 5, 7, 8, 9, 11, 13, 15], compute the
25th and 75th percentiles.

For the 25th percentile:
Step 0: Are the numbers in ascending order? Yes.
Step 1: R = P/100 x (N + 1) = 25/100 x (8 + 1) = 2.25
Step 2: IR = Integer part of R = 2, FR = Fractional part of R = 0.25
Step 3: 25th percentile = Data at rank IR + ( Data at rank (IR+1) – Data at rank IR ) x FR
        = 5 + (7 – 5) x 0.25 = 5.5

Practice at home: You should get 12.5 for the 75th percentile.
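
The three steps above can be sketched in Python as follows. The function name `percentile` is hypothetical. The handling of ranks that fall below 1 or above N (returning the smallest or largest value) is an assumption, since the slides do not cover that case.

```python
def percentile(data, p):
    """Pth percentile using the rank rule R = P/100 x (N + 1)."""
    values = sorted(data)                 # Step 0: put the data in ascending order
    n = len(values)
    if n == 0:
        raise ValueError("data must not be empty")

    rank = p * (n + 1) / 100              # Step 1: the rank R
    ir = int(rank)                        # Step 2: integer part of R
    fr = rank - ir                        #         fractional part of R

    # Assumption (not covered on the slides): if R falls outside 1..N,
    # return the smallest or largest value instead of interpolating.
    if ir < 1:
        return values[0]
    if ir >= n:
        return values[-1]

    lower = values[ir - 1]                # data at rank IR (ranks are 1-based)
    upper = values[ir]                    # data at rank IR + 1
    return lower + (upper - lower) * fr   # Step 3: linear interpolation


data = [3, 5, 7, 8, 9, 11, 13, 15]
print(percentile(data, 25))  # 5.5
print(percentile(data, 75))  # 12.5
```

Other software may use a different rank rule, for example one based on N − 1 instead of N + 1, so results from other tools can differ slightly from this method.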
Compute the 25th percentile for the data set of 20 values:

1 2 3 4 5 5 6 6 6 7
7 8 8 8 9 9 9 10 10 10

For the 25th percentile:
Step 0: Are the numbers in ascending order? Yes.
Step 1: R = P/100 x (N + 1) = 25/100 x (20 + 1) = 5.25
Step 2: IR = 5, FR = 0.25
Step 3: 25th percentile = Data at rank 5 + ( Data at rank 6 – Data at rank 5 ) x FR
        = 5 + (5 – 5) x 0.25 = 5
Compute the 85th percentile for the data set of 20 values:

1 2 3 4 5 5 10 10 10 6
6 9 9 9 7 7 7 8 8 8

Step 0: Are the numbers in ascending order?
No! You need to re-arrange them first.

1 2 3 4 5 5 6 6 7 7
7 8 8 8 9 9 9 10 10 10

Practice at home: You should get 9.85 for the 85th percentile.
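
One way to check the practice answer is with Python's standard library. The sketch below assumes Python 3.8 or later. To my understanding, `statistics.quantiles` with `method="exclusive"` interpolates at position P/100 x (N + 1), which matches the rank rule in these notes. Treat that correspondence as an assumption to verify against the library documentation.

```python
from statistics import quantiles

data = [1, 2, 3, 4, 5, 5, 10, 10, 10, 6,
        6, 9, 9, 9, 7, 7, 7, 8, 8, 8]

# Step 0: sort first (quantiles() also sorts internally; done here to mirror the slide).
ordered = sorted(data)

# n=100 returns the 99 cut points P1..P99, so index 84 holds the 85th percentile.
cut_points = quantiles(ordered, n=100, method="exclusive")
p85 = cut_points[84]

print(round(p85, 2))  # 9.85
```

By hand: R = 85/100 x 21 = 17.85, so IR = 17 and FR = 0.85. The values at ranks 17 and 18 are 9 and 10, giving 9 + (10 – 9) x 0.85 = 9.85.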
Practical example:
Graduate Management Admission Test (GMAT) - a standardized
test for application to graduate-level business programs.
Example: Percentile for benchmarking.
Suppose Jimmy earns a monthly wage of $6,200. He
would like to compare his wages with those working
in the same occupation and industry.

[Link]
◼ Types of Measurement Scales
Four basic levels of measurement scales:
Nominal – names or labels with no specific order
Ordinal – categories with a specific order
Interval – numerical scales in which intervals have the same
           interpretation throughout, but with no true zero
Ratio – has all the characteristics of an interval scale, plus a true
        zero indicating the absence of the quantity being measured
Specify the level of measurement used for the following
variables:

1. Calendar years
2. Colors
3. Amount of money in your pocket
4. Ranking of police

◼ Distributions
Frequency distribution
- Discrete variables
- Continuous variables

◼ Distributions

Probability distribution
- Probability mass function (pmf) for discrete variables
- Probability density function (pdf) for continuous variables

◼ Distributions

Shapes of distributions
- symmetric
- skewed to the right (positive skew)
- skewed to the left (negative skew)
◼ Distributions – other shapes:

Multi-modal distribution

Bimodal distribution
◼ Linear Transformations
Transform data from one measurement scale to another.

Examples:
Convert a length measured in X feet to a measurement in Y meters, i.e.
Y = 0.3048 X

Convert temperature from Fahrenheit to Centigrade:
C = 0.5556 F − 17.778
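
Both conversions have the form Y = aX + b. A minimal sketch follows, using hypothetical function names and the exact Fahrenheit coefficients 5/9 and −160/9, of which 0.5556 and −17.778 are rounded values.

```python
def linear_transform(x, slope, intercept=0.0):
    """Apply the linear transformation y = slope * x + intercept."""
    return slope * x + intercept


def feet_to_meters(feet):
    # Y = 0.3048 X (intercept is zero)
    return linear_transform(feet, 0.3048)


def fahrenheit_to_celsius(fahrenheit):
    # Exact form: C = (F - 32) * 5/9 = (5/9) F - 160/9
    return linear_transform(fahrenheit, 5 / 9, -160 / 9)


print(round(feet_to_meters(10), 4))                      # 10 feet in meters
print(round(fahrenheit_to_celsius(212), 4))              # boiling point of water
print(round(linear_transform(212, 0.5556, -17.778), 4))  # same, with the rounded coefficients
```

The rounded coefficients give a result within about 0.01 degrees of the exact one at 212 °F, which is precise enough for most uses.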
Ch 2. Presenting Data
● Frequency tables and Charts
● Bar Charts
● Stem and Leaf Displays
● Histograms
● Box Plots
● others

▪ Presenting qualitative or discrete data:
- Frequency table
- Bar chart
- Pie chart
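
As a rough sketch of how a frequency table and a simple text bar chart can be built for qualitative data, the example below uses a made-up list of marital-status responses. The data and the variable names are hypothetical and not taken from the lecture.

```python
from collections import Counter

# Hypothetical survey responses (qualitative data).
responses = ["single", "married", "single", "divorced",
             "married", "married", "single", "married"]

counts = Counter(responses)      # frequency of each category
total = sum(counts.values())

print(f"{'Category':<10}{'Freq':>5}{'Share':>7}")
for category, count in counts.most_common():
    share = count / total        # relative frequency
    print(f"{category:<10}{count:>5}{share:>7.0%}  {'#' * count}")
```

The relative frequencies in the last column are what a pie chart would display as slices.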
Examples of misleading presentations:

[Figure: two battery icons, labelled "Previous iPad" and "New iPad"]

The battery on the right has 70% more capacity than the one on the left.
Is there any misleading presentation?
▪ Stem-and-leaf Presentation
- Useful when data are not too numerous
E.g.: No. of touchdown passes by each of the 31 football
teams.
37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20,
20, 19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6

Stem | Leaf
   3 | 2337
   2 | 001112223889
   1 | 2244456888899
   0 | 69
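
The display above can be reproduced with a short function. This is a sketch with a hypothetical function name, and it assumes non-negative integers with the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return stem-and-leaf lines (largest stem first) for non-negative integers."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)      # stem = tens, leaf = units digit

    lines = []
    for stem in range(max(stems), min(stems) - 1, -1):
        leaves = "".join(str(d) for d in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines


touchdowns = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20,
              20, 19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]

for line in stem_and_leaf(touchdowns):
    print(line)
```

Stems with no data are still printed (with no leaves), so gaps in the distribution remain visible.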
▪ Presenting continuous data:
Data range is divided into class intervals or bins

Grouped frequency distribution of certain test scores
▪ Presenting continuous data:
Data range is divided into class intervals or bins

Histogram of the test scores
Presenting continuous data - Histogram:
Examples: Class interval size affects the visual presentation

A rule of thumb:
# of class intervals ≈ √(# of data)
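
The sketch below groups data into equal-width class intervals. It assumes the rule of thumb meant here is the common square-root rule (number of class intervals ≈ √n, rounded). The function name is hypothetical, and the touchdown counts from the stem-and-leaf example in these notes are reused purely as sample numbers.

```python
import math


def grouped_frequencies(data, num_bins=None):
    """Count how many values fall in each of num_bins equal-width class intervals."""
    n = len(data)
    if num_bins is None:
        # Assumed rule of thumb: number of class intervals ~ sqrt(number of data)
        num_bins = max(1, round(math.sqrt(n)))

    low, high = min(data), max(data)
    width = (high - low) / num_bins
    if width == 0:                 # all values identical: avoid dividing by zero
        width = 1.0

    counts = [0] * num_bins
    for x in data:
        # The maximum value is placed in the last interval.
        index = min(int((x - low) / width), num_bins - 1)
        counts[index] += 1

    edges = [low + i * width for i in range(num_bins + 1)]
    return edges, counts


touchdowns = [37, 33, 33, 32, 29, 28, 28, 23, 22, 22, 22, 21, 21, 21, 20,
              20, 19, 19, 18, 18, 18, 18, 16, 15, 14, 14, 14, 12, 12, 9, 6]

edges, counts = grouped_frequencies(touchdowns)
for i, count in enumerate(counts):
    print(f"{edges[i]:5.1f} to {edges[i + 1]:5.1f} | {'#' * count}")
```

Passing a different `num_bins` shows how the class interval size changes the apparent shape of the same data.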
Question: You have to decide between displaying data with a
histogram or a stem-and-leaf display. What factor would
affect your choice?

With more data, a histogram can be very useful since it shows the overall
shape of the distribution.
Stem-and-leaf display is better for smaller sets of data.

Question:
Suppose you are constructing a histogram for describing the
distribution of salaries for CTOs.
(a) What is on the Y-axis? (b) What is on the X-axis?
(c) What would be the probable shape of the salary
distribution? Explain why.

a) The Y-axis would show the frequency, i.e. the number of individuals
in each salary interval.
b) Salary would be on the X-axis because this is the
variable whose distribution is of interest.
c) The distribution is expected to be positively skewed,
with a few individuals earning well above the average salary.
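
To illustrate answer (c), the sketch below uses made-up salary figures (in thousands of dollars, not real data). A few very large values pull the mean well above the median, which is typical of a positively skewed distribution, and the moment-based skewness coefficient computed here is positive.

```python
from statistics import mean, median

# Hypothetical annual salaries in $1000s: most are similar, two are very high.
salaries = [180, 190, 200, 210, 220, 230, 250, 270, 300, 900, 1500]

avg = mean(salaries)
mid = median(salaries)

# Moment-based skewness: third central moment divided by (second central moment)^1.5
n = len(salaries)
m2 = sum((x - avg) ** 2 for x in salaries) / n
m3 = sum((x - avg) ** 3 for x in salaries) / n
skewness = m3 / m2 ** 1.5

print(f"mean = {avg:.1f}, median = {mid}, skewness = {skewness:.2f}")
```

Most of the hypothetical individuals earn less than the mean, so the median is the more representative "typical" salary here.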
Left source: Business Insider
A fun site on salaries:
[Link]
/07/salary-guide-2024/[Link]
