Lecture 1 - Part 2 - Tabulation & Graphical Presentation

The document provides an overview of applied statistics, focusing on types of variables, levels of measurement, and graphical presentations. It discusses discrete and continuous variables, qualitative and quantitative data, and various histogram shapes to represent data distributions. Additionally, it covers histogram construction, numerical presentation, and the concept of mean, including its advantages and disadvantages.

Uploaded by

Michael Yousry

Applied Statistics

Dr. Aya Ahmed


Assistant Professor of Econometrics Applied Statistics

Lecture Two
Types of Variables

Discrete
- Countable
- There is a CLEAR GAP between each two consecutive values
- Examples: # of children, # of patients, # of banks, # of stocks, # of companies

Continuous
- Measurable
- There is NO GAP between each two consecutive values
- Examples: salaries, profits, costs, revenues, weights
Level of Measurement

Qualitative Data
- Nominal: data is NOT ordered by nature
- Ordinal: data is ordered by nature

Quantitative Data (discrete / continuous)
- Interval: zero is NOT defined
- Ratio: zero is defined
Graphical Presentation

Qualitative Data
- Nominal: pie chart
- Ordinal: bar chart

Quantitative Data (interval or ratio)
- Discrete: bar chart
- Continuous: histogram
[Figure: Histogram: Age of Students. Vertical axis: Frequency; horizontal axis: class boundaries 5, 15, 25, 35, 45, 55, More.]
Common Shapes for Histogram

• The majority of frequencies are in the middle.
• This shape is neither good nor bad in itself.
• Example: if the histogram of students' marks is symmetric, most students scored around the average, and only a small proportion got high marks or low marks.

Symmetric
Common Shapes for Histogram

• The majority of values are in the low intervals

Skewed to the right


Positively Skewed
Common Shapes for Histogram

• Marks
• Costs
• Revenues
• Recovery
• Evaluation
• Taxes

Skewed to the right


Positively Skewed
Common Shapes for Histogram

• The majority of values are in the high intervals

Skewed to the left


Negatively Skewed
Common Shapes for Histogram

• Marks
• Costs
• Revenues
• Recovery
• Evaluation
• Taxes

Skewed to the left


Negatively Skewed
Application
Histogram Construction
Steps
Step 1: Range = 90 - 40 = 50 million $
Step 2: # of intervals = 5
Step 3: Interval length = 50/5 = 10 million $

Intervals
40 – 50
50 – 60
60 – 70
70 – 80
80 – 90
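The three steps above can be sketched in Python. This is only an illustration: the function name `class_intervals` and its parameters are assumptions made for this example, and the inputs (smallest value 40, largest value 90, 5 intervals) are the ones used on the slide.

```python
def class_intervals(smallest, largest, n_intervals):
    """Return (range, interval length, list of (lower, upper) intervals)."""
    data_range = largest - smallest          # Step 1: range
    length = data_range / n_intervals        # Step 3: interval length
    intervals = [
        (smallest + i * length, smallest + (i + 1) * length)
        for i in range(n_intervals)          # Step 2 fixes how many intervals
    ]
    return data_range, length, intervals


# Values from the debts example (in million $).
data_range, length, intervals = class_intervals(40, 90, 5)
print(data_range, length)
for lower, upper in intervals:
    print(f"{lower:.0f} - {upper:.0f}")
```

The number of intervals is taken as given here, as on the slide; the sketch does not try to choose it.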
Distribution table for Debts in million $

Intervals   Tally        Frequency   Relative Frequency   Percentage
40 – 50     //           2           2/25 = 0.08          8%
50 – 60     ///          3           3/25 = 0.12          12%
60 – 70     ////         4           4/25 = 0.16          16%
70 – 80     //// /       6           6/25 = 0.24          24%
80 – 90     //// ////    10          10/25 = 0.40         40%
Total                    25          1                    100%
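As a rough check of the table above, the relative frequency and percentage columns can be recomputed from the frequency column. The dictionary name `frequencies` and the string labels are assumptions made for this sketch; the counts are the ones in the table.

```python
# Frequencies per interval, copied from the distribution table (debts in million $).
frequencies = {"40-50": 2, "50-60": 3, "60-70": 4, "70-80": 6, "80-90": 10}

total = sum(frequencies.values())

# relative frequency = frequency / total; percentage = relative frequency * 100
table = {
    label: (f, f / total, 100 * f / total)
    for label, f in frequencies.items()
}

for label, (f, rel, pct) in table.items():
    print(f"{label}: frequency={f}, relative={rel:.2f}, percentage={pct:.0f}%")
print("Total:", total)
```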
[Figure: Histogram of Debts. Vertical axis: Percent; horizontal axis: Debts, from 40 to 90. Bar heights, from the lowest interval to the highest: 8, 12, 16, 24, 40.]
Comment
• The histogram is skewed to the left, which means that the majority of the debts are found in the high intervals; as we move to the lower intervals, the percentage in each interval goes down.

• The majority of the debts are found in the two highest intervals, from 70 to 90 million dollars, and they represent 64% of the total number of debts.

• Recommendation: start collecting the debts, especially those found in the highest intervals, and apply different payment methods and plans.
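The 64% figure in the comment is the sum of the percentages of the two highest intervals. A minimal sketch, assuming the percentages from the debts distribution table (the variable names are illustrative only):

```python
# Percentages per interval, from the debts distribution table.
percentages = {"40-50": 8, "50-60": 12, "60-70": 16, "70-80": 24, "80-90": 40}

# The two highest intervals cover 70 to 90 million $.
top_two = ["70-80", "80-90"]
share = sum(percentages[label] for label in top_two)
print(share)  # 64
```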


Question Two: The United Nations relief committee collected data on the amounts of money that 20 countries gave directly to Lebanon during the last year, as follows (numbers in millions):

65 98 55 62 79 59 48 90 72 56

70 62 66 80 94 79 63 73 71 86
Answer

• Range = largest value − smallest value = 98 − 48 = 50

• Length of each class = 50/5 = 10
Answer

Classes   Tally     Frequency   Relative frequency   % frequency
48 - 58   \\\       3           3/20                 3/20 × 100% = 15%
58 - 68   \\\\ \    6           6/20                 6/20 × 100% = 30%
68 - 78   \\\\      4           4/20                 4/20 × 100% = 20%
78 - 88   \\\\      4           4/20                 4/20 × 100% = 20%
88 - 98   \\\       3           3/20                 3/20 × 100% = 15%
Total               20          1                    100%
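The frequency table for Question Two can be reproduced with a short Python sketch. Two conventions are assumed here because the slides do not state them: each class includes its lower bound and excludes its upper bound, and the largest value (98) is counted in the last class. The variable names are illustrative.

```python
# Amounts given by the 20 countries (in millions), from Question Two.
amounts = [65, 98, 55, 62, 79, 59, 48, 90, 72, 56,
           70, 62, 66, 80, 94, 79, 63, 73, 71, 86]

n_classes = 5
smallest, largest = min(amounts), max(amounts)
length = (largest - smallest) // n_classes  # range 50 divided by 5 classes

counts = [0] * n_classes
for x in amounts:
    # Assumed convention: lower bound inclusive, upper bound exclusive,
    # with the largest value placed in the last class.
    index = min((x - smallest) // length, n_classes - 1)
    counts[index] += 1

n = len(amounts)
for i, f in enumerate(counts):
    lower = smallest + i * length
    print(f"{lower} - {lower + length}: frequency={f}, "
          f"relative={f}/{n}, percent={100 * f / n:.0f}%")
```

None of the values in this dataset falls exactly on an interior class boundary (58, 68, 78, 88), so the choice of convention affects only the value 98.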


Numerical Presentation
Marks are symmetric or Minimum 70 marks
Numerical Presentation

The main goal is to summarize all the values in the given dataset in one or more values, so that looking at these values tells us what happened in the dataset.

What?

How?

When?
Numerical Presentation

            Population (Parameter)   Sample (Statistic)
Mean        \( \mu \)                \( \bar{x} \)
Variance    \( \sigma^2 \)           \( s^2 \)
Size        \( N \)                  \( n \)
Central Measures

The main goal is to summarize all the values in one value around which the majority of the values lie.
** All the central measures are equal if all the values in the dataset are the same
Mean Median Mode
Mean

What does it indicate?

It is the value at the center of the dataset, around which the majority of the values lie.
Mean

Mean = (Sum of the values) / (Count of the values)

\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Mean
Profits in million $: 92, 85, 88, 95

\[ \bar{X} = \frac{92 + 85 + 88 + 95}{4} = 90 \text{ million \$} \]

Comment: the mean of the profits is 90 million $, which represents the value at the center of the dataset, around which the majority of the values lie.
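The calculation above is the sum of the values divided by their count. A minimal Python sketch, where the helper name `mean` is an assumption for this example and the profits are the four values from the slide:

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by the count of the values."""
    return sum(values) / len(values)


profits = [92, 85, 88, 95]  # in million $
print(mean(profits))  # 90.0
```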


Mean

In case we have a company which has a profit of zero, the mean will be

\[ \bar{X} = \frac{92 + 85 + 88 + 95 + 0}{5} = 72 \text{ million \$} \]

There is a big difference between 72 and zero.

In case we have another company which has a profit of zero, the mean will be

\[ \bar{X} = \frac{92 + 85 + 88 + 95 + 0 + 0}{6} = 60 \text{ million \$} \]

Can we depend on 60 to represent the data?
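The effect of the zero-profit companies on the mean can be checked numerically. This sketch reuses the profit values from the slides; the helper name `mean` and the list names are assumptions for illustration.

```python
def mean(values):
    """Arithmetic mean: sum of the values divided by the count of the values."""
    return sum(values) / len(values)


profits = [92, 85, 88, 95]       # original companies, in million $
with_one_zero = profits + [0]    # one company with zero profit added
with_two_zeros = profits + [0, 0]  # a second zero-profit company added

print(mean(profits))         # 90.0
print(mean(with_one_zero))   # 72.0
print(mean(with_two_zeros))  # 60.0
```

Each added zero leaves the sum at 360 but increases the count, so the mean falls from 90 to 72 and then to 60 even though four of the companies earn between 85 and 95.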
Mean
Outlier

It is a value whose nature differs from that of the other values in the given dataset.
By removing the outlier:
1- The sample size will be smaller
2- The estimates will be less reliable
3- We do not only remove a value; we also remove from the sample a feature that is found in the population
Mean

When can we remove the outlier?

Technical problem

Data entry mistake


Mean

Advantages
• Easy to calculate
• Easy to explain
• Takes all the values into the calculation

Disadvantages
• It is affected by outliers
Thank you

See you next lecture
