Applied Statistics
Dr. Aya Ahmed
Assistant Professor of Econometrics and Applied Statistics
Lecture Two
Types of Variables
Discrete
- Countable
- There is a CLEAR GAP between each two consecutive values
- Examples: # of children, # of patients, # of banks, # of stocks, # of companies
Continuous
- Measurable
- There is NO GAP between each two consecutive values
- Examples: salaries, profits, costs, revenues, weights
Level of Measurement
Qualitative Data
- Nominal: data is NOT ordered by nature
- Ordinal: data is ordered by nature
Quantitative Data (Dis. / Con.)
- Interval: zero is NOT defined
- Ratio: zero is defined
Graphical Presentation
Qualitative Data (Nominal, Ordinal)
- Pie chart
- Bar chart
Quantitative Data (Interval or ratio)
- Dis.: Bar chart
- Con.: Histogram
[Figure: Histogram: Age Of Students. Vertical axis: Frequency (0 to 8). Horizontal axis: age
bins 5, 15, 25, 35, 45, 55, More.]
Common Shapes for Histogram
• The majority of the frequencies are in the middle.
• This is neither good nor bad in itself.
• Example: the distribution of students' marks is symmetric, which means that most students
scored around the average and a small proportion got high or low marks.
Symmetric
Common Shapes for Histogram
• The majority of the values are in the low intervals
Skewed to the right
Positively Skewed
Common Shapes for Histogram
• Marks
• Costs
• Revenues
• Recovery
• Evaluation
• Taxes
Skewed to the right
Positively Skewed
Common Shapes for Histogram
• The majority of the values are in the high intervals
Skewed to the left
Negatively Skewed
Common Shapes for Histogram
• Marks
• Costs
• Revenues
• Recovery
• Evaluation
• Taxes
Skewed to the left
Negatively Skewed
Application
Histogram Construction
Steps
Step 1: Range = 90 - 40 = 50 million $
Step 2: # of intervals = 5
Step 3: Interval Length = 50/5 = 10 million $
Intervals
40 – 50
50 – 60
60 – 70
70 – 80
80 – 90
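The three steps above can be sketched in a few lines of Python. This is only an illustration using the figures on the slide (smallest debt 40, largest debt 90, 5 intervals); the variable names are our own.

```python
# Sketch of the three construction steps, using the figures from the slide.
smallest, largest = 40, 90      # debts in million $ (Step 1)
num_intervals = 5               # chosen in Step 2

value_range = largest - smallest                 # Step 1: Range
interval_length = value_range / num_intervals    # Step 3: Interval Length

# Build the (lower, upper) bounds of each interval.
intervals = [
    (smallest + i * interval_length, smallest + (i + 1) * interval_length)
    for i in range(num_intervals)
]

print("Range:", value_range)
print("Interval length:", interval_length)
for lower, upper in intervals:
    print(f"{lower:.0f} - {upper:.0f}")
```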
Distribution table for Debts in million $
Intervals   Tally        Frequency   Relative Frequency   Percentage
40 – 50     //           2           2/25 = 0.08          8 %
50 – 60     ///          3           3/25 = 0.12          12 %
60 – 70     ////         4           4/25 = 0.16          16 %
70 – 80     //// /       6           6/25 = 0.24          24 %
80 – 90     //// ////    10          10/25 = 0.40         40 %
Total                    25          1                    100 %
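As a quick check, the relative frequency and percentage columns can be recomputed from the frequency column alone. The sketch below uses only the frequencies from the table; the raw 25 debt values are not given on the slides, so they are not used here.

```python
# Recompute relative frequency and percentage from the frequency column.
labels = ["40 - 50", "50 - 60", "60 - 70", "70 - 80", "80 - 90"]
frequencies = [2, 3, 4, 6, 10]

total = sum(frequencies)                          # total number of debts
relative = [f / total for f in frequencies]       # relative frequency
percent = [100 * f / total for f in frequencies]  # percentage

for label, f, r, p in zip(labels, frequencies, relative, percent):
    print(f"{label}: frequency={f}, relative={r:.2f}, percent={p:.0f}%")
print("Total:", total)
```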
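Histogram of Debts
[Figure: Histogram of Debts. Horizontal axis: Debts (40 to 90 million $). Vertical axis:
Percent. Bar heights from the lowest to the highest interval: 8, 12, 16, 24, 40.]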
Comment
• The histogram is skewed to the left, which means that the majority of the debts are found in the
high intervals, and the percentage in each interval decreases as we move toward the lower
intervals.
• The majority of the debts are found in the two highest intervals, from 70 to 90 million
dollars, and they represent 64% of the total number of debts.
• Recommendation: start collecting the debts, especially those in the highest intervals, and
apply different payment methods and plans.
Question Two: The United Nations Relief
committee decided to collect the amounts of
money that 20 countries gave
directly to Lebanon during the last
year, as follows (numbers in millions):
65 98 55 62 79 59 48 90 72 56
70 62 66 80 94 79 63 73 71 86
Answer
Range = largest value – smallest value
= 98 − 48 = 50
Length of each class = 50 / 5 = 10
Answer
Classes   Tally     Frequency   Relative frequency   % frequency
48 - 58   \\\       3           3/20                 3/20 × 100% = 15%
58 - 68   \\\\ \    6           6/20                 6/20 × 100% = 30%
68 - 78   \\\\      4           4/20                 4/20 × 100% = 20%
78 - 88   \\\\      4           4/20                 4/20 × 100% = 20%
88 - 98   \\\       3           3/20                 3/20 × 100% = 15%
Total               20          1                    100%
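The answer to Question Two can be reproduced from the raw data with a short Python sketch. Two points are assumptions of ours and not stated on the slides: each class includes its lower bound and excludes its upper bound, and the largest value (98) is placed in the last class. No value in this dataset falls on an inner class boundary, so the first assumption does not change the counts.

```python
# Frequency distribution for Question Two (amounts in millions).
amounts = [65, 98, 55, 62, 79, 59, 48, 90, 72, 56,
           70, 62, 66, 80, 94, 79, 63, 73, 71, 86]

num_classes = 5                                # as used in the answer (50 / 5)
low, high = min(amounts), max(amounts)
class_length = (high - low) // num_classes     # (98 - 48) / 5 = 10

# Count the values in each class.
# Assumption: lower bound included, upper bound excluded,
# except that the maximum value goes into the last class.
frequencies = [0] * num_classes
for x in amounts:
    index = min((x - low) // class_length, num_classes - 1)
    frequencies[index] += 1

n = len(amounts)
percent = [100 * f / n for f in frequencies]

for i, (f, p) in enumerate(zip(frequencies, percent)):
    lower = low + i * class_length
    print(f"{lower} - {lower + class_length}: frequency={f}, "
          f"relative={f}/{n}, percent={p:.0f}%")
```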
Numerical Presentation
Marks are symmetric or Minimum 70 marks
Numerical Presentation
The main goal is to summarize all the values in the given dataset in one or more values,
so that by looking at these values we can tell what happened in the dataset.
What?
How?
When?
Numerical Presentation
            POPULATION   Sample
Measure     Parameter    Statistic
Mean        μ            x̄
Variance    σ²           s²
Size        N            n
Central Measures
The main goal is to summarize all the values in one value where the majority of the
values are around it.
** All three measures are equal if all the values in the dataset are the same
Mean Median Mode
Mean
What does it indicate?
It is the value at the center of dataset where the majority of the values are
around it
Mean
Mean = (Sum of the values) / (Count of the values)
\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}
Profits in million $
92, 85, 88, 95
\bar{X} = \frac{92 + 85 + 88 + 95}{4} = 90 \text{ million \$}
Comment: the mean of the profits is 90 million $ which represents the value at the
center of dataset where the majority of the values are around it
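The same calculation can be written as a minimal Python sketch, using the four profit values from the example (sum of the values divided by their count):

```python
# Mean of the profits (million $): sum of the values / count of the values.
profits = [92, 85, 88, 95]

mean_profit = sum(profits) / len(profits)
print(mean_profit)  # 90.0
```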
Mean
In case we have a company which has a profit of zero, the mean will be
\bar{X} = \frac{92 + 85 + 88 + 95 + 0}{5} = 72 \text{ million \$}
There is a big difference between 72 and zero.
In case we have another company which has a profit of zero, the mean will be
\bar{X} = \frac{92 + 85 + 88 + 95 + 0 + 0}{6} = 60 \text{ million \$}
Can we depend on 60 to represent the data?
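To see how strongly the zero-profit companies pull the mean down, the three cases can be compared side by side. This is an illustrative sketch; the helper name `mean` is our own.

```python
# Effect of zero-profit companies on the mean (profits in million $).
def mean(values):
    """Sum of the values divided by their count."""
    return sum(values) / len(values)

profits = [92, 85, 88, 95]

original = mean(profits)                  # 90.0
with_one_zero = mean(profits + [0])       # 72.0
with_two_zeros = mean(profits + [0, 0])   # 60.0

print(original, with_one_zero, with_two_zeros)
```

Each added zero leaves the sum at 360 but increases the count, so the mean moves away from the four original values, none of which is below 85.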
Mean
Outlier
It is a value whose nature differs from that of the other values in the given dataset.
By removing the outlier:
1- The sample size will be smaller
2- The estimates will be less reliable
3- We do not only remove a value; we also remove a feature from the sample that is
found in the population
Mean
When can we remove the outlier?
Technical problem
Data entry mistake
Mean
Advantages
• Easy to calculate
• Easy to explain
• Takes all the values into calculation
Disadvantages
• It is affected by outliers
Thank you
See you next lecture