
Lecture 4

This lecture discusses different data distributions, focusing on normal distribution, its properties, and the concept of z-scores. It also covers skewness, kurtosis, binomial distribution, Poisson distribution, and correlation analysis using the chi-square test. Practical examples and problems are provided to illustrate these concepts.


CSE303

Lecture 4: Different Data Distributions


DATA DISTRIBUTION
NORMAL DISTRIBUTION

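• In statistics, a normal distribution or Gaussian distribution is a type of continuous probability distribution for a real-valued random variable. The general form of its probability density function is
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$
where μ is the mean and σ is the standard deviation.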

NORMAL DISTRIBUTION
• In probability theory, the normal (or Gaussian, Gauss, or Laplace-Gauss) distribution is a very common continuous probability distribution.
• The probability density of the normal distribution is a bell-shaped curve centered on the mean μ, with its spread set by the standard deviation σ.
The Normal Distribution has:
• mean = median = mode
• symmetry about the center
• 50% of values less than the mean and 50% greater than the mean
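A minimal Python sketch of the normal density, useful for checking two of the listed properties numerically. The function name `normal_pdf` and the parameters μ = 0, σ = 1 are illustrative choices, not part of the lecture.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma * sqrt(2 pi))."""
    coeff = 1 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))


mu, sigma = 0.0, 1.0  # illustrative parameters

# Symmetry about the center: equal density at equal distances either side of mu
print(math.isclose(normal_pdf(mu - 0.5, mu, sigma), normal_pdf(mu + 0.5, mu, sigma)))  # True

# The highest point of the curve (the mode) is at the mean
print(normal_pdf(mu, mu, sigma) > normal_pdf(mu + 0.01, mu, sigma))  # True
```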
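PROPERTIES OF NORMAL DISTRIBUTION
• About 68% of values lie within 1 standard deviation of the mean, about 95% within 2 standard deviations, and about 99.7% within 3 standard deviations (the 68-95-99.7 rule).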
EXAMPLE 1
• 95% of students at school are between 1.1 m and 1.7 m tall. Assuming this data is normally distributed, can you calculate the mean and standard deviation?
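One way to solve Example 1, sketched in Python under the assumption that "95% of students" corresponds to the range mean ± 2 standard deviations (the 68-95-99.7 rule of thumb). The function name `mean_sd_from_95_range` is hypothetical.

```python
def mean_sd_from_95_range(low: float, high: float) -> tuple[float, float]:
    """Estimate the mean and standard deviation of a normal distribution,
    assuming [low, high] is the central 95% range, i.e. mean +/- 2 sd."""
    mean = (low + high) / 2  # the range is symmetric about the mean
    sd = (high - low) / 4    # the range spans 4 standard deviations
    return mean, sd


mean, sd = mean_sd_from_95_range(1.1, 1.7)
print(f"mean = {mean:.2f} m, sd = {sd:.2f} m")  # mean = 1.40 m, sd = 0.15 m
```

Using the more precise 1.96 instead of 2 would give a slightly larger standard deviation; the factor of 2 matches the rule-of-thumb reasoning.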
STANDARD SCORE OR “Z-SCORE”

• The number of standard deviations from the mean is also called the "Standard Score", "sigma" or "z-score".
• Example 2: In that same school, one of your friends is 1.85 m tall. Find his z-score, using the mean (1.4 m) and standard deviation (0.15 m) from Example 1.
• z-score (for one sample) = (x - μ) / σ = (1.85 - 1.4) / 0.15 = 0.45 / 0.15 = 3.0
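The z-score calculation from Example 2 as a small Python function; the name `z_score` is illustrative.

```python
def z_score(x: float, mu: float, sigma: float) -> float:
    """Number of standard deviations that x lies above (+) or below (-) the mean mu."""
    return (x - mu) / sigma


# Example 2: height 1.85 m, mean 1.4 m, standard deviation 0.15 m
print(round(z_score(1.85, 1.4, 0.15), 2))  # 3.0
```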
WHY DO WE NEED Z-SCORE?

• Example 4: Professor Willoughby is marking a test. Here are the students' results (out of 60 points):
20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17
Most students didn't even get 30 out of 60, and most will fail.

• The professor decides to standardize all the scores and fail only those who are more than 1 standard deviation below the mean.
• The mean is 23 and the (population) standard deviation is 6.6, giving these standard scores:
-0.45, -1.21, 0.45, 1.36, -0.76, 0.76, 1.82, -1.36, 0.45, -0.15, -0.91
• Now only 2 students will fail (the ones who scored 15 and 14 on the test).
• Much fairer!
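A minimal Python sketch of the professor's standardization, using the standard-library `statistics` module. It assumes the population standard deviation (`pstdev`), which matches the slide's value of 6.6.

```python
import statistics

scores = [20, 15, 26, 32, 18, 28, 35, 14, 26, 22, 17]

mean = statistics.mean(scores)  # 23
sd = statistics.pstdev(scores)  # population standard deviation, about 6.63

# Standard score of each result
z_scores = [(x - mean) / sd for x in scores]
print([round(z, 2) for z in z_scores])

# Fail only the students more than 1 standard deviation below the mean
failing = [x for x, z in zip(scores, z_scores) if z < -1]
print(failing)  # [15, 14]
```

Because the code divides by the unrounded σ = √44 ≈ 6.63 rather than 6.6, a few z-scores differ from the slide's in the second decimal place, but the same two students fall below -1.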
STANDARD NORMAL DISTRIBUTION

ANOTHER EXAMPLE

• Your score in a recent test was 0.5 standard deviations above the average. What proportion of people scored lower than you did?
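The answer is the area under the standard normal curve to the left of z = 0.5. A minimal Python sketch, assuming the scores are normally distributed and computing the cumulative probability with the standard identity Φ(z) = ½(1 + erf(z/√2)); the function name `standard_normal_cdf` is illustrative.

```python
import math


def standard_normal_cdf(z: float) -> float:
    """P(Z < z) for a standard normal variable, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))


share_below = standard_normal_cdf(0.5)
print(f"{share_below:.1%} of people scored lower")  # 69.1% of people scored lower
```

Python 3.8+ also offers `statistics.NormalDist().cdf(0.5)`, which gives the same value.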
NORMAL DISTRIBUTIONS

SKEWNESS

• Skewness is the degree of distortion from the symmetrical bell curve of the normal distribution. It measures the lack of symmetry in a data distribution.
• It distinguishes extreme values in one tail from those in the other. A symmetrical distribution has a skewness of 0.

KURTOSIS

• Kurtosis is about the tails of the distribution, not its peakedness or flatness. It describes how heavy the tails are (how often extreme values occur) compared with a normal distribution, so in practice it is a measure of the outliers present in the distribution.

FORMULA FOR SKEWNESS AND KURTOSIS
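A minimal Python sketch of skewness and kurtosis. It assumes the common moment-based (population) definitions, skewness = m₃ / m₂^1.5 and kurtosis = m₄ / m₂², where mₖ is the k-th central moment; other textbook variants (sample-adjusted, or excess kurtosis = kurtosis - 3) give slightly different numbers. The function name is hypothetical.

```python
def moment_skew_kurtosis(data: list[float]) -> tuple[float, float]:
    """Population skewness m3 / m2**1.5 and kurtosis m4 / m2**2.

    A normal distribution has skewness 0 and kurtosis 3
    (excess kurtosis = kurtosis - 3 is 0 for a normal distribution).
    """
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # 2nd central moment (variance)
    m3 = sum((x - mean) ** 3 for x in data) / n  # 3rd central moment
    m4 = sum((x - mean) ** 4 for x in data) / n  # 4th central moment
    return m3 / m2 ** 1.5, m4 / m2 ** 2


skew, kurt = moment_skew_kurtosis([1, 2, 3, 4, 5])
print(skew, kurt)  # 0.0 1.7  (symmetric data, lighter tails than a normal curve)

# A long right tail gives positive skewness
print(moment_skew_kurtosis([1, 1, 1, 10])[0] > 0)  # True
```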

BINOMIAL DISTRIBUTION

• A binomial distribution gives the probability of each possible number of SUCCESS (as opposed to FAILURE) outcomes when an experiment or survey is repeated a fixed number of times, independently and with the same probability of success each time.
• The binomial is a type of distribution that has two possible outcomes (the prefix “bi”
means two, or twice). For example, a coin toss has only two possible outcomes: heads
or tails and taking a test could have two possible outcomes: pass or fail.
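• Binomial Distribution Function: $b(x; n, p) = \binom{n}{x}\, p^{x} (1 - p)^{n - x}$, where n is the number of trials, x the number of successes, and p the probability of success in a single trial
• Mean = $np$
• Variance = $np(1 - p)$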
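A minimal Python sketch of the binomial function, using `math.comb` (Python 3.8+) for nCx, plus a numerical check that the mean and variance formulas agree with the distribution. The values n = 10, p = 0.6 are illustrative.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p): probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 10, 0.6

# Mean and variance computed directly from the probabilities
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
var = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))

print(f"mean: {mean:.4f} vs n*p = {n * p:.4f}")                        # mean: 6.0000 vs n*p = 6.0000
print(f"variance: {var:.4f} vs n*p*(1-p) = {n * p * (1 - p):.4f}")    # variance: 2.4000 vs n*p*(1-p) = 2.4000
```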

PRACTICE PROBLEMS

• A coin is tossed 10 times. What is the probability of getting exactly 6 heads?

• 60% of people who purchase sports cars are men. If 10 sports car owners are
randomly selected, find the probability that exactly 7 are men.
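Both practice problems can be checked with a direct implementation of b(x; n, p); this is a sketch, and the function name `binomial_pmf` is illustrative.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p) = nCx * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Problem 1: exactly 6 heads in 10 tosses of a fair coin (p = 0.5)
print(f"{binomial_pmf(6, 10, 0.5):.4f}")  # 0.2051

# Problem 2: exactly 7 men among 10 randomly selected sports car owners (p = 0.6)
print(f"{binomial_pmf(7, 10, 0.6):.4f}")  # 0.2150
```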

POISSON DISTRIBUTION

• A Poisson distribution helps predict the probability of a certain number of events happening when you know how often the event occurs on average. It gives the probability of a given number of events happening in a fixed interval of time.
• Poisson Distribution Function: $P(x; \mu) = \dfrac{e^{-\mu}\, \mu^{x}}{x!}$, where μ is the average number of events in the interval

PRACTICE PROBLEMS

• The average number of major storms in your city is 2 per year. What is the probability
that exactly 3 storms will hit your city next year?
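A minimal Python sketch of P(x; μ) applied to the storm problem (μ = 2 storms per year, x = 3); the function name `poisson_pmf` is illustrative.

```python
import math


def poisson_pmf(x: int, mu: float) -> float:
    """P(x; mu) = e^(-mu) * mu^x / x!: probability of exactly x events when the mean is mu."""
    return math.exp(-mu) * mu ** x / math.factorial(x)


# Average of 2 storms per year; probability of exactly 3 storms next year
print(f"{poisson_pmf(3, 2):.4f}")  # 0.1804
```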

CORRELATION ANALYSIS (NOMINAL DATA)

• χ² (chi-square) test:
$$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$$
• The larger the χ² value, the more likely the variables are related
• The cells that contribute the most to the χ² value are those whose actual count is very different from the expected count
• Correlation does not imply causality
  • The number of hospitals and the number of car thefts in a city are correlated
  • Both are causally linked to a third variable: population

CHI-SQUARE CALCULATION: AN EXAMPLE

                           Play chess    Not play chess    Sum (row)
Like science fiction       250 (90)      200 (360)         450
Not like science fiction   50 (210)      1000 (840)        1050
Sum (col.)                 300           1200              1500

• χ² (chi-square) calculation (numbers in parentheses are expected counts, computed from the totals of the two categories as row total × column total / grand total; e.g. 450 × 300 / 1500 = 90):
$$\chi^2 = \frac{(250 - 90)^2}{90} + \frac{(50 - 210)^2}{210} + \frac{(200 - 360)^2}{360} + \frac{(1000 - 840)^2}{840} \approx 507.94$$
• It shows that like_science_fiction and play_chess are correlated in the group
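A minimal Python sketch of the χ² calculation for a contingency table, computing each expected count from the row and column totals. The function name `chi_square` is hypothetical. SciPy's `scipy.stats.chi2_contingency` offers a ready-made version, but for 2×2 tables it applies Yates' continuity correction by default, so its statistic would not match this hand calculation unless `correction=False` is passed.

```python
def chi_square(observed: list[list[float]]) -> tuple[float, list[list[float]]]:
    """Pearson chi-square statistic and expected counts for a contingency table.

    Expected count of each cell = row total * column total / grand total.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return chi2, expected


# Rows: like / do not like science fiction; columns: play / do not play chess
table = [[250, 200],
         [50, 1000]]
chi2, expected = chi_square(table)
print(expected)        # [[90.0, 360.0], [210.0, 840.0]]
print(round(chi2, 2))  # 507.94

# Degrees of freedom for an r x c table: (r - 1) * (c - 1)
dof = (len(table) - 1) * (len(table[0]) - 1)
print(dof)  # 1
```

With 1 degree of freedom, the critical value at the 0.05 significance level is about 3.841, so a statistic near 508 is far beyond it, consistent with the conclusion that the two attributes are correlated.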
PRACTICE PROBLEM

• Let's say you want to know if gender has anything to do with political party preference. You poll
440 voters in a simple random sample to find out which political party they prefer. The results
of the survey are shown in the table below:

CHI-SQUARE TABLE

THANK YOU

