
Week 6+7+8


STATISTICS

IN ECONOMICS AND BUSINESS

Nguyen Huyen Trang


Faculty of Statistics - National Economics University
[email protected]
LECTURE 6: DATA MEASUREMENT

Summary Measures
• Central Tendency: Mean, Median, Mode
• Measures of Dispersion: Variance, Standard Deviation, Coefficient of Variation
• Quartile
OUTLINE

• Central Tendency
• Percentiles - Quartile
• Measures of Dispersion
CENTRAL TENDENCY

A summary measure that attempts to describe a whole set of data with a single value that represents the middle or center of its distribution.
• Mean
• Median
• Mode
MEAN

• The most common measure of central tendency
• Applies to quantitative data only
• Has the same unit as the original data
• Notation: μ for the population mean, x̄ for the sample mean
• Formula:
➢ Arithmetic mean
➢ Geometric mean
ARITHMETIC MEAN

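• Example: Student A's grades in some courses. What is the GPA?

Course                  Grade Points
Algebra                 3.63
Introduction to Logic   4.20
Microeconomics          3.46
Statistics              4.00

• Population mean (N values) and sample mean (n values):

$$\mu = \frac{\sum x_i}{N}, \qquad \bar{x} = \frac{\sum x_i}{n}$$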
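As a minimal sketch in Python (the variable names are illustrative and not part of the lecture), the unweighted arithmetic mean of the four grade points could be computed like this:

```python
# Grade points from the example table; every course counts equally here.
grade_points = {
    "Algebra": 3.63,
    "Introduction to Logic": 4.20,
    "Microeconomics": 3.46,
    "Statistics": 4.00,
}

# Arithmetic mean: sum of the values divided by the number of values.
gpa = sum(grade_points.values()) / len(grade_points)
print(round(gpa, 4))
```

The sum of the four grades is 15.29, so the unweighted GPA is 15.29 / 4 = 3.8225.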
WEIGHTED ARITHMETIC MEAN

• Example: Does the result change if we also know the number of credits for each course?

Course                  Number of Credits (weight wi)   Grade Points (value xi)
Algebra                 3                               3.63
Introduction to Logic   2                               4.20
Microeconomics          3                               3.46
Statistics              3                               4.00

Each data value is given a weight that reflects its importance.
WEIGHTED ARITHMETIC MEAN

Course                  Number of Credits   Grade Points   Grade Points × Credits
Algebra                 3                   3.63           10.89
Introduction to Logic   2                   4.20           8.40
Microeconomics          3                   3.46           10.38
Statistics              3                   4.00           12.00
Total                   11                                 41.67

In general, for weighted data:

$$\bar{x} = \frac{\sum x_i w_i}{\sum w_i}$$

where:
xi = value of observation i
wi = weight for observation i
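The weighted formula can be sketched in Python as follows. The list layout and variable names are assumptions made for illustration; the numbers come from the table.

```python
# (credits, grade points) for each course, taken from the table.
courses = [(3, 3.63), (2, 4.20), (3, 3.46), (3, 4.00)]

weighted_sum = sum(w * x for w, x in courses)  # sum of x_i * w_i
total_weight = sum(w for w, _ in courses)      # sum of w_i
weighted_gpa = weighted_sum / total_weight

print(round(weighted_sum, 2), total_weight, round(weighted_gpa, 2))
```

The weighted GPA (41.67 / 11, about 3.79) is slightly lower than the unweighted mean of the same four grades (about 3.82), because the highest grade belongs to the course with the fewest credits.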
GROUPED DATA

• The weighted mean computation can be used to obtain approximations of the mean, variance, and standard deviation for grouped data.
• To compute the weighted mean, we treat the midpoint of each class as though it were the mean of all items in the class.
• We compute a weighted mean of the class midpoints using the class frequencies as weights.
• Similarly, in computing the variance and standard deviation, the class frequencies are used as weights.

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i}$$

where:
xi = midpoint of each class
fi = class frequency
MEAN FOR GROUPED DATA

Example: SCCoast, an Internet provider in the Southeast, developed the following frequency distribution on the age of Internet users.

Age           Number of users (fi)   Midpoint (xi)   xi·fi
10 up to 20   3                      15              45
20 up to 30   7                      25              175
30 up to 40   18                     35              630
40 up to 50   20                     45              900
50 up to 60   12                     55              660
Total         60                                     2410
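A short Python sketch of the grouped-data mean for this table is given below. The variable names are illustrative, and the result is an approximation because each class is represented by its midpoint.

```python
# Class midpoints and frequencies from the SCCoast age table.
midpoints = [15, 25, 35, 45, 55]
frequencies = [3, 7, 18, 20, 12]

n = sum(frequencies)                                        # total number of users
total = sum(x * f for x, f in zip(midpoints, frequencies))  # sum of x_i * f_i
grouped_mean = total / n                                    # approximate mean age

print(n, total, round(grouped_mean, 2))
```

With the table's totals this gives 2410 / 60, or about 40.17 years.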
THE MEAN

• Compare the means of the following data sets:
– Data 1: {10, 10, 11, 12, 12}
– Data 2: {2, 3, 4, 6, 40}
• The mean is easily affected by extreme values (outliers), which can lead to a misleading comparison
• Use another measure, such as the median


MEDIAN

• The median of a data set is the value in the middle when the data items are arranged in ascending order.
• For an odd number of observations, the median is the middle value.
• For an even number of observations, the median is the average of the two middle values.
MEDIAN

• The median is the cutoff point between the lower 50% and the upper 50% of the data
• Denoted as Me

[Figure: ordered data split at the median into a lower 50% and an upper 50%]
MEDIAN

Example:
• Data: {5, 6, 9, 5, 6}
Ordered data: {5, 5, 6, 6, 9}: Median = 6
• Ordered data: {6, 6, 7, 8, 9, 11}:

$$\text{Median} = \frac{7 + 8}{2} = 7.5$$
MEDIAN

• Compare the mean and median of the following data sets:
Data 1: {10, 10, 11, 12, 12}
Data 2: {2, 3, 4, 6, 40}
• The median is not affected by outliers
• It depends only on the position of the values in the ordered data
• Applies to quantitative variables only
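The comparison can be checked with a small Python sketch that uses the standard library's `statistics` module (the variable names are illustrative):

```python
from statistics import mean, median

data_1 = [10, 10, 11, 12, 12]
data_2 = [2, 3, 4, 6, 40]  # 40 is an extreme value

for name, data in (("Data 1", data_1), ("Data 2", data_2)):
    print(name, "mean =", mean(data), "median =", median(data))
```

Both data sets have a mean of 11, but the medians are 11 and 4. The single extreme value 40 pulls the mean of Data 2 upward, while the median stays with the bulk of the data.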
MODE

• Can be applied to both quantitative and qualitative variables
• The mode of a data set is the value that occurs with the greatest frequency
• Denoted as Mo
• Finding the mode:
➢ Qualitative data
➢ Quantitative data
MODE

• Qualitative Data
➢ Data: { Yellow, Yellow, Red, Blue, Green}
→ Mode is the category having the largest frequency
• Quantitative Data
➢ Data 1: { 5, 6, 6, 7, 7, 7, 9 }
➢ Data 2: { 5, 6, 7, 8, 9 }
➢ Data 3: { 5, 6, 9, 5, 6 }
➢ Data 4: { 5, 5, 5, 5, 5 }
→ Mode is the value having the largest frequency
There may be no mode or several modes
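The four quantitative examples can be explored with the Python sketch below. The helper `modes` is hypothetical. It assumes one common convention: when several distinct values all occur equally often, the data set is reported as having no mode. Other textbooks treat that case differently.

```python
from collections import Counter

def modes(values):
    """Return every most-frequent value, or [] when all distinct values tie."""
    counts = Counter(values)
    top = max(counts.values())
    # Assumed convention: if there is more than one distinct value and all
    # of them are equally frequent, report that there is no mode.
    if len(counts) > 1 and all(c == top for c in counts.values()):
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes(["Yellow", "Yellow", "Red", "Blue", "Green"]))  # qualitative data
print(modes([5, 6, 6, 7, 7, 7, 9]))  # Data 1
print(modes([5, 6, 7, 8, 9]))        # Data 2
print(modes([5, 6, 9, 5, 6]))        # Data 3
print(modes([5, 5, 5, 5, 5]))        # Data 4
```

Under this convention Data 1 has one mode (7), Data 2 has none, Data 3 has two modes (5 and 6), and Data 4 has the single mode 5.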
MEAN, MODE, MEDIAN

• Left skewed (negatively skewed): Mean < Median < Mode
• Symmetric: Mean = Median = Mode
• Right skewed (positively skewed): Mode < Median < Mean
PERCENTILES

❑ A percentile provides information about how the data are spread over the interval from the smallest value to the largest value.
❑ The pth percentile is a value that divides the data into two parts:
• At least p% of the observations are less than or equal to the pth percentile
• At least (100 – p)% of the observations are greater than or equal to the pth percentile
PERCENTILES

Example: your height is 1.85 m and 80% of people are shorter than you.

You are at the 80th percentile.

Approximately 80% of people are shorter than 1.85 m and 20% of people are taller than 1.85 m.
PERCENTILES

A total of 10,000 people visited the shopping mall over 12 hours:

Time (hours)   Cumulative Freq.
0              0
2              350
4              1100
6              2400
8              6500
10             8850
12             10000

[Figure: cumulative frequency curve; x-axis: time in hours, y-axis: number of people]

• Estimate the 30th percentile
• Estimate what percentile of visitors had arrived after 11 hours

QUARTILES

Quartiles are specific percentiles that divide the data into 4 equal parts using 3 cut-off points:
• First Quartile Q1 = 25th Percentile
• Second Quartile Q2 = 50th Percentile = Median
• Third Quartile Q3 = 75th Percentile

25% 25% 25% 25%


Q1 Q2 Q3
MEASURES OF VARIABILITY

Worker salaries at two firms:

            Firm A   Firm B
Worker 1    400      1480
Worker 2    400      1485
Worker 3    600      1486
Worker 4    600      1488
Worker 5    700      1490
Worker 6    800      1503
Worker 7    900      1505
Worker 8    2000     1520
Worker 9    2600     1521
Worker 10   6000     1522

Mean A = Mean B = 1500

• Which firm's salaries fluctuate more, and which are more stable?
• Central tendency alone may not provide sufficient information about the data.
• Data sets may have the same mean or median but differ in variability (dispersion, spread).
MEASURES OF VARIABILITY

Measures of variability describe the spread of the data and help us compare the spread of two or more distributions.

▪ Range
▪ Variance
▪ Standard Deviation
▪ Coefficient of Variation
RANGE

• The difference between the largest and the smallest value in a data set.

R = xmax - xmin

• Example:

            Firm A   Firm B
Worker 1    400      1480
Worker 2    400      1485
Worker 3    600      1486
Worker 4    600      1488
Worker 5    700      1490
Worker 6    800      1503
Worker 7    900      1505
Worker 8    2000     1520
Worker 9    2600     1521
Worker 10   6000     1522

Range (A) = 6000 – 400 = 5600
Range (B) = 1522 – 1480 = 42

• Pros: simple
• Cons: affected by outliers
INTERQUARTILE RANGE

• The interquartile range (IQR) is the difference between the 3rd quartile and the 1st quartile: IQR = Q3 – Q1
• The IQR is the width of the middle 50% of the data
• It overcomes the range's sensitivity to extreme data values
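To see the difference between the two measures, the sketch below computes the range and the IQR for the Firm A and Firm B salaries. It assumes the "inclusive" quartile method of Python's `statistics.quantiles`; other quartile conventions give slightly different values, and the lecture does not say which one it uses.

```python
from statistics import quantiles

firm_a = [400, 400, 600, 600, 700, 800, 900, 2000, 2600, 6000]
firm_b = [1480, 1485, 1486, 1488, 1490, 1503, 1505, 1520, 1521, 1522]

def value_range(data):
    """Largest value minus smallest value."""
    return max(data) - min(data)

def iqr(data):
    """Q3 minus Q1, using the 'inclusive' quartile convention (an assumption)."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

for name, data in (("Firm A", firm_a), ("Firm B", firm_b)):
    print(name, "range =", value_range(data), "IQR =", iqr(data))
```

With this convention Firm A has a range of 5600 but an IQR of 1125, so the single salary of 6000 inflates the range far more than the IQR. Firm B has a range of 42 and an IQR of 29.75.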
VARIANCE

• Overcomes the weakness of the range by using all the values
➢ Data: x1, x2, …, xn → the mean
➢ Deviation: the difference between the value of each observation (xi) and the mean (x̄ for a sample, μ for a population), for example xi – x̄



VARIANCE

• Formula:
➢ Population variance:

$$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$$

➢ Sample variance:

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$

• If $s_x^2 > s_y^2$ then:
• x is more dispersed (more widely spread, fluctuates more) than y
• y is more stable (more concentrated) than x
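Both formulas can be written out directly in Python. The sketch below is illustrative: the function names are hypothetical, and the two data sets are the ones used earlier in the lecture to compare means.

```python
def population_variance(data):
    """Sum of squared deviations from the mean, divided by N."""
    mu = sum(data) / len(data)
    return sum((x - mu) ** 2 for x in data) / len(data)

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    x_bar = sum(data) / len(data)
    return sum((x - x_bar) ** 2 for x in data) / (len(data) - 1)

x = [2, 3, 4, 6, 40]
y = [10, 10, 11, 12, 12]

print(sample_variance(x), sample_variance(y))
print(population_variance(x), population_variance(y))
```

Both data sets have mean 11, but the sample variances are 265 for x and 1 for y, so x is far more dispersed than y.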


VARIANCE FOR GROUPED DATA

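• Formula:
➢ Population variance:

$$\sigma^2 = \frac{\sum f_i (M_i - \mu)^2}{N}$$

➢ Sample variance:

$$s^2 = \frac{\sum f_i (M_i - \bar{x})^2}{n - 1}$$

where Mi = midpoint of class i and fi = frequency of class i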
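As an illustration, the sample formula can be applied to the SCCoast age distribution from the grouped-mean example. This is a sketch with illustrative variable names, and the result is an approximation because each class is represented by its midpoint.

```python
midpoints = [15, 25, 35, 45, 55]   # M_i
frequencies = [3, 7, 18, 20, 12]   # f_i

n = sum(frequencies)
x_bar = sum(f * m for f, m in zip(frequencies, midpoints)) / n

# Sum of f_i * (M_i - x_bar)^2, then divide by n - 1 for the sample variance.
squared_dev = sum(f * (m - x_bar) ** 2 for f, m in zip(frequencies, midpoints))
sample_var = squared_dev / (n - 1)

print(round(x_bar, 2), round(sample_var, 2))
```

This gives a mean of about 40.17 years and a sample variance of about 120.31.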
STANDARD DEVIATION

• It is the square root of the variance
• It is measured in the same units as the data, which makes it easier than the variance to compare with the mean
• Formula:
➢ Population standard deviation: $\sigma = \sqrt{\sigma^2}$
➢ Sample standard deviation: $s = \sqrt{s^2}$
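For the Firm A and Firm B salary data, the standard deviations can be obtained with Python's `statistics` module, as sketched below. Whether the ten workers count as a sample or as the whole population is not stated in the lecture, so both versions are printed.

```python
from statistics import pstdev, stdev

firm_a = [400, 400, 600, 600, 700, 800, 900, 2000, 2600, 6000]
firm_b = [1480, 1485, 1486, 1488, 1490, 1503, 1505, 1520, 1521, 1522]

for name, data in (("Firm A", firm_a), ("Firm B", firm_b)):
    # stdev divides by n - 1 (sample); pstdev divides by N (population).
    print(name, "s =", round(stdev(data), 2), "sigma =", round(pstdev(data), 2))
```

Either way, Firm A's standard deviation is much larger than Firm B's, even though both firms have the same mean salary of 1500.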


COEFFICIENT OF VARIATION

• Indicates how large the standard deviation is in relation to the mean

• This is the ratio of the standard deviation to the mean

$$CV = \frac{SD}{\text{mean}} \times 100$$

Business Decision Making – Nguyen Minh Thu – [email protected]


COEFFICIENT OF VARIATION

An investor is considering the relative risks associated with two projects:
• The first project has a mean expected profit of £5000 with a standard deviation of £707.11
• The second project has a mean expected profit of £500 with a standard deviation of £112.13
Use the measures of dispersion to establish which project has the lower degree of risk.
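Because the two projects have very different mean profits, comparing the standard deviations directly would be misleading. A minimal sketch of the comparison using the coefficient of variation (the function name is hypothetical):

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: (SD / mean) * 100."""
    return sd / mean * 100

cv_project_1 = coefficient_of_variation(707.11, 5000)
cv_project_2 = coefficient_of_variation(112.13, 500)

print(round(cv_project_1, 2), round(cv_project_2, 2))
```

The first project has a CV of about 14.14% and the second about 22.43%, so by this measure the first project carries the lower relative risk even though its standard deviation is larger in absolute terms.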
EXPLORATORY DATA ANALYSIS

• Five-Number Summary
• Box Plot
• Detecting Outliers
FIVE-NUMBER SUMMARY

• Smallest Value
• First Quartile
• Median
• Third Quartile
• Largest Value
=> used to draw a box plot
BOX PLOT

• A box is drawn with its ends located at the first and third
quartiles.
• A vertical line is drawn in the box at the location of the
median.
• Limits are located (not drawn) using the interquartile range
(IQR).
✓ The lower limit is located 1.5(IQR) below Q1.
✓ The upper limit is located 1.5(IQR) above Q3.
✓ Data outside these limits are considered outliers.
(Value < Q1 – 1.5 IQR or Value > Q3 + 1.5 IQR)
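The 1.5 × IQR rule can be sketched in Python as follows, using the Firm A salary data from the variability example. The helper `outlier_limits` is hypothetical, and the "inclusive" quartile method of `statistics.quantiles` is an assumption, since quartile conventions differ between textbooks and software.

```python
from statistics import quantiles

def outlier_limits(data):
    """Return (lower limit, upper limit) as Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

firm_a = [400, 400, 600, 600, 700, 800, 900, 2000, 2600, 6000]

lower, upper = outlier_limits(firm_a)
outliers = [x for x in firm_a if x < lower or x > upper]

print(lower, upper, outliers)
```

With this convention Q1 = 600 and Q3 = 1725, so the limits are -1087.5 and 3412.5, and the salary of 6000 is flagged as an outlier.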
BOX AND WHISKER PLOT

▪ Boxplot 1 [Figure: box plot labelled min, Q1, Q2, Q3, max]

▪ Boxplot 2 [Figure: box plot labelled Q1 – 1.5IQR, Q3 + 1.5IQR, and an outlier], with IQR = Q3 – Q1

Lower limit: the maximum of (min, Q1 – 1.5*IQR)
Upper limit: the minimum of (max, Q3 + 1.5*IQR)
BOX AND WHISKER PLOT

       A      B      C      D      E      F
Max    6      6      7      9      6      4
Q3     5      4      6      6      4      3
Q2     4.5    2.5    5.5    4.5    2.5    2.5
Q1     3      2      4      4      1      2
Min    1      1      1      3      -1     1
       4.2    2.8    5.16   4.84   2.5    2.5
