Descriptive Statistics 1

Uploaded by

Aniza Arshad
Copyright
© All Rights Reserved

DESCRIPTIVE STATISTICS

Prof. Dr. Ayaz Muhammad Khan
Why use it?
• Descriptive Statistics is used in research to
answer five basic questions based on five
key concepts.
Concept: Finding middle scores
• Question: What is the middle set of scores
for any data set?
Concept: Finding the spread of
scores
• Question: How spread out are the scores of
any data set?
Concept: Finding relationships
between variables
• Question: How are different variables
related in this data set?
DESCRIPTIVE STATISTICS

Descriptive statistics are summary measures which
define some important characteristics of data.

1. Measures of location
i. Measures of central tendency
ii. Measures of position
2. Measures of dispersion (variation)
Measures of Location
i. Measures of central tendency
Measures of central tendency are numerical
values that tend to locate in some sense the
middle of a set of data.

• Mean
• Median
• Mode
Measures of central tendency
Mean (Arithmetic Mean)
The mean is the most common measure of
central tendency and is commonly used
for symmetrical distributions. It is used to
summarize quantitative data.
Sum of all the observations ($\sum_{i=1}^{n} x_i$) divided
by the number of the observations ($n$):

$$\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Example
The ages (in years) of seven children attending
a children's clinic are given below:
{1, 3, 6, 7, 2, 3, 5}

$$\bar{X} = \frac{\sum_{i=1}^{7} x_i}{7} = \frac{1 + 3 + 6 + 7 + 2 + 3 + 5}{7} = \frac{27}{7} \approx 3.9 \text{ years}$$
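As a quick check of the arithmetic in this example, the mean can be computed directly. This is a minimal sketch in Python; the variable names are illustrative and not part of the original slides.

```python
# Ages (in years) of the seven children from the example.
ages = [1, 3, 6, 7, 2, 3, 5]

# Mean = sum of all observations divided by the number of observations.
total = sum(ages)
n = len(ages)
mean_age = total / n

print(f"sum = {total}, n = {n}, mean = {mean_age:.1f} years")
```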
Features of the Mean

1. One advantage of the mean over the median is that it
uses all of the information in the data set.
2. It is affected by skewness in the distribution, and by
the presence of outliers in the data.
3. It cannot be used with ordinal data.
How Does SPSS Measure Means?

           Strongly Disagree   Disagree   Undecided   Agree   Strongly Agree   Mean
Code               1               2          3         4           5
Group A           13              25         44        19          57          3.51
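A mean for coded responses like these can be obtained as a frequency-weighted average of the codes. The sketch below shows only that arithmetic, assuming the codes 1 to 5 and the Group A counts from the table; it is not a description of SPSS internals, and the variable names are illustrative.

```python
# Likert codes (1 = Strongly Disagree ... 5 = Strongly Agree) and the
# Group A response counts from the table.
codes = [1, 2, 3, 4, 5]
counts = [13, 25, 44, 19, 57]

n = sum(counts)                                            # total respondents
weighted_sum = sum(c * f for c, f in zip(codes, counts))   # sum of code * frequency
mean_score = weighted_sum / n

print(f"n = {n}, weighted sum = {weighted_sum}, mean = {mean_score:.3f}")
```

The result is about 3.52, very close to the 3.51 shown in the table; the small gap may come from rounding or truncation in the original figure.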
Frequencies, Proportions and Means
Median
The median is the middle value of the set of
data when the data are ranked in order
according to magnitude.

When the data are put in order, 50% of the
observations are less than or equal to the
median, and the rest are greater than or equal to it.

The median is the $\left(\frac{n+1}{2}\right)$th observation.
Example
n is odd: 5, 28, 8, 10, 9
Ordered data 5, 8, 9, 10, 28
i =(5+1)/2=3
Median is 3rd value which is 9.

n is even: 19, 20, 17, 27, 6, 21


Ordered data 6, 17, 19, 20, 21, 27
i=(6+1)/2=3.5
Median is halfway between the 3rd and 4th
values, which is 19.5.
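The position rule used in both cases can be sketched in Python as follows. The helper name `median_by_position` is hypothetical; the standard library's `statistics.median` is printed alongside for comparison.

```python
import statistics

def median_by_position(values):
    """Return the median using the (n + 1) / 2 position rule."""
    ordered = sorted(values)
    n = len(ordered)
    position = (n + 1) / 2            # 1-based position of the median
    lower = ordered[int(position) - 1]
    if position.is_integer():         # odd n: a single middle value
        return lower
    upper = ordered[int(position)]    # even n: average the two middle values
    return (lower + upper) / 2

odd_data = [5, 28, 8, 10, 9]
even_data = [19, 20, 17, 27, 6, 21]

print(median_by_position(odd_data), statistics.median(odd_data))
print(median_by_position(even_data), statistics.median(even_data))
```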
The Features of Median

1. The median is not much affected
by skewness in the distribution, or by the
presence of outliers.
2. It discards a lot of information, because it
ignores most of the values, apart from
those in the centre of the distribution.
Mode
The mode is the value of x that occurs most
frequently.

Data {1,3,7,3,2,3,6,7}
• Mode : 3

Data {1,3,7,3,2,3,6,7,1,1}
• Mode : 1 and 3

Data {1,3,7,0,2,-3, 6,5,-1}


• Mode : No mode
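The three cases above (one mode, two modes, no mode) can be reproduced with a small counting sketch. The function name `modes` is hypothetical, and treating "every value occurs once" as "no mode" follows the convention of these slides; `statistics.multimode` in the standard library would instead return every value in that case.

```python
from collections import Counter

def modes(values):
    """Return all most-frequent values, or [] when every value occurs once."""
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:                  # no value repeats: report "no mode"
        return []
    return sorted(v for v, c in counts.items() if c == highest)

print(modes([1, 3, 7, 3, 2, 3, 6, 7]))
print(modes([1, 3, 7, 3, 2, 3, 6, 7, 1, 1]))
print(modes([1, 3, 7, 0, 2, -3, 6, 5, -1]))
```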
Features of the Mode

1. The mode is a measure of commonness
or typicalness.
2. The mode is not particularly useful
with metric continuous data, where no
two values may be the same.
Example
Suppose the age in years of the first 10 subjects enrolled in
your study are:
34, 24, 56, 52, 21, 44, 64, 44, 42, 46

Then the mean age of this group is 42.7 years

To find the median, first order the data:


21, 24, 34, 42, 44, 44, 46, 52, 56, 64

The median is (44+44)/2 = 44 years

The mode is 44 years.


Suppose the next patient enrolls and her age is
97 years.

How do the mean, median and mode change?

Ordered data:
21, 24, 34, 42, 44, 44, 46, 52, 56, 64, 97

Mean is 47.6
Median is 44
Mode is 44
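The effect of the eleventh patient can be checked with Python's standard `statistics` module. This is a minimal sketch; the list names are illustrative.

```python
import statistics

ages = [34, 24, 56, 52, 21, 44, 64, 44, 42, 46]
ages_with_outlier = ages + [97]   # the eleventh patient

for label, data in [("first 10 subjects", ages),
                    ("with the 97-year-old", ages_with_outlier)]:
    print(
        f"{label}: mean = {statistics.mean(data):.1f}, "
        f"median = {statistics.median(data)}, "
        f"mode = {statistics.mode(data)}"
    )
```

Only the mean moves when the extreme value is added; the median and the mode stay at 44.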
Comparison of Mean and Median
• Mean is sensitive to “outliers” (a few very
large or small values), so sometimes mean
does not reflect the quantity desired.
20, 21, 22, 23, 24, 25, 26, 90: $\bar{x} = 31.38$
87.5% of observations (7 of 8) lie below the mean.
• Median is “resistant” to outliers
Median = 23.5
• Mean is attractive mathematically.
• 50% of sample is above the median, 50% of
sample is below the median.
2. Measures of dispersion
Range
The range is the simplest measure of dispersion. It
is the difference between the highest value (H)
and the lowest value (L) of the observations.
Range = H - L
The range is sensitive to extreme scores.
Inter Quartile Range
The interquartile range defines the difference between the third
and the first quartile. Quartiles are the partitioned values that
divide the whole series into 4 equal parts. So, there are 3 quartiles.

Interquartile Range Formula


The difference between the upper and lower quartile is known as
the interquartile range. The formula for the interquartile range is
given below:
Interquartile range = Upper Quartile - Lower Quartile = Q3 - Q1

where Q1 is the first quartile and Q3 is the third quartile of the
series.
Median and Interquartile Range
The median is the middle value of the distribution of the given
data. The interquartile range (IQR) is the range of values that
covers the middle 50% of the scores. When a distribution is skewed,
and the median is used instead of the mean to show central
tendency, the appropriate measure of variability is the interquartile
range.
Q1 – Lower Quartile Part
Q2 – Median
Q3 – Upper Quartile Part
The quartile deviation is a measure of dispersion based on the lower and upper
quartiles. It is obtained by dividing the interquartile range
by 2, and is hence also known as the semi-interquartile range.
The below figure shows the occurrence of
median and interquartile range for the data
set.
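The range, quartiles, interquartile range and quartile deviation can be sketched in Python as follows, reusing the ten ages from the earlier mean/median/mode example. Note the assumption: quartile conventions differ between textbooks and software, and `statistics.quantiles` with its default method is only one of them, so hand-computed or SPSS values may differ slightly.

```python
import statistics

# Ages reused from the earlier ten-subject example.
ages = [21, 24, 34, 42, 44, 44, 46, 52, 56, 64]

data_range = max(ages) - min(ages)             # Range = H - L
q1, q2, q3 = statistics.quantiles(ages, n=4)   # three cut points, four equal parts
iqr = q3 - q1                                  # interquartile range = Q3 - Q1
quartile_deviation = iqr / 2                   # semi-interquartile range

print(f"range = {data_range}")
print(f"Q1 = {q1}, Q2 (median) = {q2}, Q3 = {q3}")
print(f"IQR = {iqr}, quartile deviation = {quartile_deviation}")
```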
Standard Deviation and Variance
The Standard Deviation is a measure of how spread out
numbers are.

Its symbol is σ (the Greek letter sigma, used for the population SD).

The formula is easy: it is the square root of the Variance.

Variance
The average of the squared differences from the Mean.
To calculate the variance, follow
these steps:
1. Work out the Mean (the simple average
of the numbers).
2. Then for each number: subtract the Mean
and square the result (the squared
difference).
3. Then work out the average of those
squared differences. (Why square? Because the raw
differences always sum to zero, as positive and negative
differences cancel.)
Measures of dispersion
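Standard deviation is roughly the average distance of the
observations from the arithmetic mean.

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \quad \text{or} \quad s = \sqrt{\frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}}$$

Variance is the square of the standard deviation.

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \quad \text{or} \quad s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$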
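The two forms of the sample standard deviation (the definitional form with squared deviations, and the computational form with sums of x and x squared) are algebraically equivalent. The sketch below checks this numerically; the function names are hypothetical, and the ages from the earlier ten-subject example are reused purely for illustration.

```python
import math

def sd_definitional(values):
    """s = sqrt( sum((x - mean)^2) / (n - 1) )."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))

def sd_computational(values):
    """s = sqrt( (sum(x^2) - (sum(x))^2 / n) / (n - 1) )."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return math.sqrt((sum_x2 - sum_x ** 2 / n) / (n - 1))

ages = [34, 24, 56, 52, 21, 44, 64, 44, 42, 46]

print(f"definitional:  s = {sd_definitional(ages):.4f}")
print(f"computational: s = {sd_computational(ages):.4f}")
```

The computational form avoids calculating each deviation, which is convenient by hand; the definitional form is usually preferred in software because it is less prone to rounding error.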
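Step 1     Step 3              Step 4
$x$        $(x - \bar{x})$     $(x - \bar{x})^2$
6           1                   1
3          -2                   4
8           3                   9
5           0                   0
3          -2                   4
Total 25    0                  18

Step 2: $\bar{x} = \frac{\sum x}{n} = \frac{25}{5} = 5$

Step 5: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{18}{4} = 4.5$, so $s = \sqrt{s^2} = \sqrt{4.5} \approx 2.12$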
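$x$        $(x - \bar{x})$     $(x - \bar{x})^2$
1          -4                  16
3          -2                   4
5           0                   0
6           1                   1
10          5                  25
Total 25    0                  46

$\bar{x} = \frac{\sum x}{n} = \frac{25}{5} = 5$

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{46}{4} = 11.5$, so $s = \sqrt{s^2} = \sqrt{11.5} \approx 3.39$

NOTE: The sum of the deviations, $\sum_{i=1}^{n} (x_i - \bar{x})$, is always zero.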
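The five steps of the two worked tables can be retraced in code. This is a sketch under the same sample formula (dividing by n - 1); the function name `variance_steps` and the labels "first data set" and "second data set" are illustrative only.

```python
import math

def variance_steps(values):
    """Follow the steps of the worked tables and return each piece."""
    n = len(values)
    mean = sum(values) / n                       # Step 2: mean
    deviations = [x - mean for x in values]      # Step 3: x - mean
    squared = [d ** 2 for d in deviations]       # Step 4: (x - mean)^2
    variance = sum(squared) / (n - 1)            # Step 5: sample variance
    return mean, deviations, squared, variance, math.sqrt(variance)

for name, scores in [("first data set", [6, 3, 8, 5, 3]),
                     ("second data set", [1, 3, 5, 6, 10])]:
    mean, deviations, squared, variance, sd = variance_steps(scores)
    print(f"{name}: mean = {mean}, sum of deviations = {sum(deviations)}, "
          f"sum of squares = {sum(squared)}, s^2 = {variance}, s = {sd:.2f}")
```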
Comparison of Class-wise Achievement

Classes   Mean Score   SD
A         5            2.12
B         5            3.39
[Bar chart: Mean Score and SD for Class A and Class B]
The table compares two classes, A and B, based on their Mean
Score and Standard Deviation (SD).
Both classes have the same mean score of 5, indicating that the
average performance in both classes is equal.
However, the standard deviation differs:
Class A has a lower SD of 2.12, suggesting that the scores are
more closely clustered around the mean, indicating less
variability.
Class B has a higher SD of 3.39, meaning that the scores are
more spread out from the mean, reflecting greater variability
among students' performance in this class.
In summary, even though the average scores are the same,
Class A shows more consistency in performance, while Class
B exhibits a wider range of scores.
Mean, Median and Mode with SPSS
• The normal distribution, also known as the
Gaussian distribution, is a probability
distribution that appears as a "bell curve"
when graphed. It describes a symmetrical
plot of data around its mean value, where
the width of the curve is defined by the
standard deviation.
• Key Takeaways
• The normal distribution is the proper term
for a probability bell curve.
• In a standard normal distribution, the mean
is zero and the standard deviation is 1. Every
normal distribution has zero skew and a
kurtosis of 3.
• Normal distributions are symmetrical, but
not all symmetrical distributions are
normal.
• Properties of Normal Distribution
• The normal distribution is the most common type
of distribution assumed in technical stock market
analysis. The normal distribution has two
parameters: the mean and the standard deviation.
In a normal distribution, mean (average), median
(midpoint), and mode (most frequent observation)
are equal. These values represent the peak or
highest point. The distribution then falls
symmetrically around the mean, the width of
which is defined by the standard deviation.
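These properties can be illustrated with `statistics.NormalDist` from the Python standard library. This is a sketch only: the second distribution's mean of 100 and standard deviation of 15 are hypothetical values chosen for illustration, not figures from the slides.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)     # the standard normal distribution
scores = NormalDist(mu=100, sigma=15)    # hypothetical mean and SD, for illustration only

for dist in (standard, scores):
    # Mean, median and mode coincide for any normal distribution.
    print(dist.mean, dist.median, dist.mode, dist.stdev)
    # Symmetry: the density is the same one SD below and one SD above the mean.
    print(dist.pdf(dist.mean - dist.stdev), dist.pdf(dist.mean + dist.stdev))
```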
Skewness of distributions
• When the median and the mean are different, the
distribution is skewed. The greater the difference, the
greater the skew.
• Distributions that trail away to the left are negatively
skewed and those that trail away to the right are positively
skewed
• If the skewness is extreme, the researcher should either
transform the data to make them better resemble a normal
curve or else use a different set of statistics—
nonparametric statistics—to carry out the analysis
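Positive and Negative Skewness

[Figure: positively skewed curve, ordered left to right as Mode, Median, Mean]

[Figure: negatively skewed curve, ordered left to right as Mean, Median, Mode]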
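The direction of skew can also be quantified. The sketch below uses one common moment-based definition of skewness (third central moment divided by the 1.5 power of the second); statistical packages such as SPSS typically apply a sample-size adjustment, so their values may differ somewhat. The function name and the mirrored data set are illustrative assumptions.

```python
def skewness(values):
    """Moment-based skewness: m3 / m2**1.5 (one common definition)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m3 = sum((x - mean) ** 3 for x in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Ages from the earlier example, including the 97-year-old (long right tail).
right_tailed = [21, 24, 34, 42, 44, 44, 46, 52, 56, 64, 97]
left_tailed = [-x for x in right_tailed]   # mirror image, for illustration

print(f"right-tailed: skewness = {skewness(right_tailed):.2f}")
print(f"left-tailed:  skewness = {skewness(left_tailed):.2f}")
print(f"symmetric:    skewness = {skewness([1, 2, 3, 4, 5]):.2f}")
```

A positive value corresponds to a tail on the right (mean above median), a negative value to a tail on the left, and a value near zero to a roughly symmetrical distribution.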
Kurtosis
• Kurtosis refers to the degree of presence of
outliers (extreme values) in the distribution.

• Kurtosis is a statistical measure of whether
the data are heavy-tailed or light-tailed
relative to a normal distribution.
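As a rough illustration, kurtosis can be computed as the fourth central moment divided by the square of the second; on this scale a normal distribution scores about 3. This is a sketch under that definition only: many packages report "excess" kurtosis (this value minus 3) and apply a sample-size adjustment. The function name, the seed and the simulated samples are assumptions made for the example.

```python
import random

def kurtosis(values):
    """Moment-based kurtosis: m4 / m2**2 (about 3 for normal data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in values) / n   # fourth central moment
    return m4 / m2 ** 2

rng = random.Random(0)   # fixed seed so the sketch is repeatable
normal_like = [rng.gauss(0, 1) for _ in range(100_000)]
light_tailed = [rng.uniform(-1, 1) for _ in range(100_000)]   # no extreme values

print(f"normal sample:  kurtosis = {kurtosis(normal_like):.2f}")
print(f"uniform sample: kurtosis = {kurtosis(light_tailed):.2f}")
```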
Shift Values
Transform > Shift Values > in Name section
Recode into Same Variables
Transform > Recode into Same Variables > Old and New Values
Change old value into new value
Check Data Correction

Analyze > Descriptive Statistics > Frequencies > Statistics > select Minimum and Maximum
Convert Output in SPSS into APA
Edit > Options > Pivot Tables > Academic > OK
SPSS File Export to Excel Format
Right-click on table > Export > change type to Excel
Table
Table 9: Problem-Solving
Cell entries are Frequency (Percentage).

Item                                                                    SD        D         N          A           SA          M     Std.
I analyze problems thoroughly before seeking solutions.                 5 (1.0)   18 (3.7)  65 (13.3)  232 (47.3)  170 (34.7)  4.11  .84
I encourage innovative thinking to solve challenges.                    4 (.8)    12 (2.4)  79 (16.1)  227 (46.3)  168 (34.3)  4.11  .86
I am adept at finding practical solutions to complex issues.            3 (.6)    9 (1.8)   78 (15.9)  240 (49.0)  160 (32.7)  4.11  .77
I think outside the box to generate innovative solutions to problems.   5 (1.0)   8 (1.6)   96 (19.6)  244 (49.8)  137 (28.0)  4.02  .79
I encourage and value creative thinking among team members.             1 (.2)    10 (2.0)  69 (14.1)  231 (47.1)  179 (36.5)  4.18  .75
Interpretation
• The largest share of respondents (47.3%) agree that they analyze
problems thoroughly before seeking solutions, and a further 34.7%
strongly agree. The mean score is 4.11, indicating a high level of
agreement. The standard deviation (0.84) is moderate, suggesting some
variability in respondents' self-assessment of their problem analysis
skills. A considerable proportion (46.3%) agree that they encourage
innovative thinking to solve challenges, and 34.3% strongly agree.
The mean score is 4.11, reflecting a positive self-assessment. The
standard deviation (0.86) suggests some variability, indicating
differing perceptions of the extent to which innovative thinking is
encouraged. Almost half of the respondents (49.0%) agree that they
are adept at finding practical solutions to complex issues, and
32.7% strongly agree.
Interpretation (continued)
• For practical problem-solving, the mean score is 4.11, indicating a
generally positive self-perception. The standard deviation (0.77) is
relatively low, suggesting a consistent understanding of practical
problem-solving abilities. Almost half of the respondents (49.8%)
agree that they think outside the box to generate innovative
solutions to problems, and 28.0% strongly agree. The mean score is
4.02, reflecting a positive self-assessment. The standard deviation
(0.79) suggests a moderate level of variability in responses,
indicating diverse perceptions of innovative thinking. Nearly half
(47.1%) agree that they encourage and value creative thinking among
team members, and 36.5% strongly agree. The mean score is 4.18,
indicating a high level of agreement. The standard deviation (0.75) is
low, suggesting a consistent perception of the encouragement and
value placed on creative thinking.
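The percentages, mean (M) and standard deviation (Std.) reported in Table 9 can be recovered from the frequencies alone. The sketch below does this for the first item, assuming the responses are coded 1 (SD) to 5 (SA) and that the reported Std. is the sample standard deviation (n - 1 in the denominator); both are assumptions, since the slides do not state the coding or the formula.

```python
import math

codes = [1, 2, 3, 4, 5]              # assumed coding for SD, D, N, A, SA
counts = [5, 18, 65, 232, 170]       # first item of Table 9

n = sum(counts)
percentages = [round(100 * f / n, 1) for f in counts]
mean = sum(c * f for c, f in zip(codes, counts)) / n
# Sample variance (n - 1 in the denominator), assumed to match the reported Std.
variance = sum(f * (c - mean) ** 2 for c, f in zip(codes, counts)) / (n - 1)
sd = math.sqrt(variance)

print(f"n = {n}, percentages = {percentages}")
print(f"M = {mean:.2f}, Std. = {sd:.2f}")
```

Reading the percentages column by column in this way makes it easier to state which response category each figure in the interpretation refers to.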
Exploratory data analysis (EDA)
• Exploratory data analysis (EDA) is used by
data scientists to analyze and investigate
data sets and summarize their main
characteristics, often employing data
visualization methods.
Exploratory Data Analysis
• EDA is primarily used to see what data can reveal beyond
the formal modeling or hypothesis testing task and
provides a better understanding of data set
variables and the relationships between them. It can also
help determine if the statistical techniques you are
considering for data analysis are appropriate. Originally
developed by American mathematician John Tukey in the
1970s, EDA techniques continue to be a widely used
method in the data discovery process today.
BOX PLOTS

Graphs > Legacy Dialogs > Boxplot

Select variable and continue
