Levels of Measurement in Statistics
NOMINAL SCALES
When measuring using a nominal scale, one simply names or categorizes responses. Gender, handedness,
favorite color, and religion are examples of variables measured on a nominal scale. The essential point about
nominal scales is that they do not imply any ordering among the responses. For example, when classifying
people according to their favorite color, there is no sense in which green is placed "ahead of" blue. Responses
are merely categorized. Nominal scales embody the lowest level of measurement.
ORDINAL SCALES
A researcher wishing to measure consumers' satisfaction with their microwave ovens might ask them to specify
their feelings as either "very dissatisfied," "somewhat dissatisfied," "somewhat satisfied," or "very satisfied."
The items in this scale are ordered, ranging from least to most satisfied. This is what distinguishes ordinal from
nominal scales. Unlike nominal scales, ordinal scales allow comparisons of the degree to which two subjects
possess the dependent variable. For example, our satisfaction ordering makes it meaningful to assert that one
person is more satisfied than another with their microwave ovens. Such an assertion reflects the first person's
use of a verbal label that comes later in the list than the label chosen by the second person.
On the other hand, ordinal scales fail to capture important information that will be present in the other
scales we examine. In particular, the difference between two levels of an ordinal scale cannot be assumed to be
the same as the difference between two other levels. In our satisfaction scale, for example, the difference
between the responses "very dissatisfied" and "somewhat dissatisfied" is probably not equivalent to the
difference between "somewhat dissatisfied" and "somewhat satisfied." Nothing in our measurement procedure
allows us to determine whether the two differences reflect the same difference in psychological satisfaction.
Statisticians express this point by saying that the differences between adjacent scale values do not necessarily
Course: Educational Statistics (8614)
Semester: Autumn, 2020
represent equal intervals on the underlying scale giving rise to the measurements. (In our case, the underlying
scale is the true feeling of satisfaction, which we are trying to measure.)
What if the researcher had measured satisfaction by asking consumers to indicate their level of satisfaction
by choosing a number from one to four? Would the difference between the responses of one and two necessarily
reflect the same difference in satisfaction as the difference between the responses two and three? The answer is
No. Changing the response format to numbers does not change the meaning of the scale. We still are in no
position to assert that the mental step from 1 to 2 (for example) is the same as the mental step from 3 to 4.
INTERVAL SCALES
Interval scales are numerical scales in which intervals have the same interpretation throughout. As an example,
consider the Fahrenheit scale of temperature. The difference between 30 degrees and 40 degrees represents the
same temperature difference as the difference between 80 degrees and 90 degrees. This is because each 10-
degree interval has the same physical meaning (in terms of the kinetic energy of molecules).
Interval scales are not perfect, however. In particular, they do not have a true zero point even if one of the
scaled values happens to carry the name "zero." The Fahrenheit scale illustrates the issue. Zero degrees
Fahrenheit does not represent the complete absence of temperature (the absence of any molecular kinetic
energy). In reality, the label "zero" is applied to its temperature for quite accidental reasons connected to the
history of temperature measurement. Since an interval scale has no true zero point, it does not make sense to
compute ratios of temperatures. For example, there is no sense in which the ratio of 40 to 20 degrees Fahrenheit
is the same as the ratio of 100 to 50 degrees; no interesting physical property is preserved across the two ratios.
After all, if the "zero" label were applied at the temperature that Fahrenheit happens to label as 10 degrees, the
two ratios would instead be 30 to 10 and 90 to 40, no longer the same! For this reason, it does not make sense to
say that 80 degrees is "twice as hot" as 40 degrees. Such a claim would depend on an arbitrary decision about
where to "start" the temperature scale, namely, what temperature to call zero (whereas the claim is intended to
make a more fundamental assertion about the underlying physical reality).
RATIO SCALES
The ratio scale of measurement is the most informative scale. It is an interval scale with the additional property
that its zero position indicates the absence of the quantity being measured. You can think of a ratio scale as the
three earlier scales rolled up in one. Like a nominal scale, it provides a name or category for each object (the
numbers serve as labels). Like an ordinal scale, the objects are ordered (in terms of the ordering of the
numbers). Like an interval scale, the same difference at two places on the scale has the same meaning. And in
addition, the same ratio at two places on the scale also carries the same meaning.
The Fahrenheit scale for temperature has an arbitrary zero point and is therefore not a ratio scale. However,
zero on the Kelvin scale is absolute zero. This makes the Kelvin scale a ratio scale. For example, if one
2
Course: Educational Statistics (8614)
Semester: Autumn, 2020
temperature is twice as high as another as measured on the Kelvin scale, then it has twice the kinetic energy of
the other temperature.
Another example of a ratio scale is the amount of money you have in your pocket right now (25 cents, 55
cents, etc.). Money is measured on a ratio scale because, in addition to having the properties of an interval scale,
it has a true zero point: if you have zero money, this implies the absence of money. Since money has a true zero
point, it makes sense to say that someone with 50 cents has twice as much money as someone with 25 cents (or
that Bill Gates has a million times more money than you do).
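The hierarchy above can be summarized in a short sketch. The mapping below is illustrative only (the statistic lists are our own labels, not a standard library API): each level of measurement permits all the statistics meaningful at the levels beneath it, plus some of its own.

```python
# Illustrative mapping (our own labels, not a standard API) from each
# level of measurement to statistics that are meaningful at that level.
# Each level inherits everything permitted at the levels below it.
NOMINAL = {"mode", "frequency counts"}
ORDINAL = NOMINAL | {"median", "percentiles"}
INTERVAL = ORDINAL | {"mean", "standard deviation", "differences"}
RATIO = INTERVAL | {"ratios", "coefficient of variation"}

LEVELS = {"nominal": NOMINAL, "ordinal": ORDINAL,
          "interval": INTERVAL, "ratio": RATIO}

def allowed(statistic, level):
    """True if the given statistic is meaningful at this level of measurement."""
    return statistic in LEVELS[level]

print(allowed("mean", "ordinal"))     # False: ordinal gaps are unequal
print(allowed("ratios", "interval"))  # False: no true zero point
print(allowed("ratios", "ratio"))     # True: Kelvin, money, etc.
```

For instance, the satisfaction scale above is ordinal, so its median is meaningful but its mean is not; the Kelvin and money examples are ratio scales, so ratios are meaningful as well.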
Q.2 Differentiate between primary and secondary data. Give meaningful examples with explanation.
Primary data
An advantage of using primary data is that researchers are collecting information for the specific purposes of
their study. In essence, the questions the researchers ask are tailored to elicit the data that will help them with
their study. Researchers collect the data themselves, using surveys, interviews and direct observations.
In the field of workplace health research, for example, direct observations may involve a researcher watching
people at work. The researcher could count and code the number of times she sees practices or behaviors
relevant to her interest; e.g. instances of improper lifting posture or the number of hostile or disrespectful
interactions workers engage in with clients and customers over a period of time.
To take another example, let’s say a research team wants to find out about workers’ experiences in return to
work after a work-related injury. Part of the research may involve interviewing workers by telephone about how
long they were off work and about their experiences with the return-to-work process. The workers’ answers–
considered primary data–will provide the researchers with specific information about the return-to-work
process; e.g. they may learn about the frequency of work accommodation offers, and the reasons some workers
refused such offers.
Primary data, in short, is information gathered first-hand by the researchers, who collect the data and complete the study process without relying on any second-hand sources. Organizations with good funding prospects can perform primary research such as surveys, face-to-face interviews, social media surveys/polls/feedback, analysis of customer feedback, gathering responses via email, etc.
Advantages
Research is oriented toward specific goals and purposes, cutting out the possibility of wasting resources. The researchers can change the course of the study whenever needed and choose observation methods well suited to the project. Primary research also gives the work original quality and does not carry the bias or opinions of third parties.
Secondary data
There are several types of secondary data. They can include information from the national population census
and other government information collected by Statistics Canada. One type of secondary data that’s used
increasingly is administrative data. This term refers to data that is collected routinely as part of the day-to-day
operations of an organization, institution or agency. There is any number of examples: motor vehicle
registrations, hospital intake and discharge records, workers’ compensation claims records, and more.
Compared to primary data, secondary data tends to be readily available and inexpensive to obtain. In addition,
administrative data tends to have large samples, because the data collection is comprehensive and routine.
What’s more, administrative data (and many types of secondary data) are collected over a long period. That
allows researchers to detect change over time.
Going back to the return-to-work study mentioned above, the researchers could also examine secondary data in
addition to the information provided by their primary data (i.e. survey results). They could look at workers’
compensation lost-time claims data to determine the amount of time workers were receiving wage replacement
benefits. With a combination of these two data sources, the researchers may be able to determine which factors
predict a shorter work absence among injured workers. This information could then help improve return to work
for other injured workers.
The type of data researchers choose can depend on many things including the research question, their budget,
their skills and available resources. Based on these and other factors, they may choose to use primary data,
secondary data–or both.
Secondary data, in short, is information that someone else has already collected, prepared, and analyzed. The results are available for use and can help future researchers as reference material for their studies. Examples of secondary sources include government censuses, public agency annual reports, magazines, newspapers, journals, online databases, etc.
Advantages
Secondary data is cost-effective and comes with ready-made observations, so less time is spent gathering information. It is often statistically reliable and requires less expertise from the internal team, and established, trustworthy practices exist to support and organize such research.
Q.3 Explain advantages and disadvantages of bar charts and scatter plot.
What Is a Bar Graph?
A bar graph represents data using a series of bars across two axes. The x-axis (the horizontal) classifies the
data by group, with one bar for each group. So for example, if you were displaying the number of beads of
each color in a jar, the x-axis would have a section for each color, and each color would have its own bar.
The y-axis (the vertical) shows the value for the category for each bar. In the bead example, this would be the
number of beads. So the bar for green beads might extend up to five, for example, whereas the bar for red
beads may extend up to only two. The y-axis can be many different values, though; for example, money, a
growth rate, an average speed, or even a percentage of the whole. The x-axis values and bars could also represent the same quantity at different points in time, which highlights one big difference between bar graphs and pie charts.
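The bead example can be made concrete with a few lines of Python. The jar contents below are hypothetical; `collections.Counter` does the grouping that the x-axis performs, and each count is the height of one bar.

```python
from collections import Counter

# Hypothetical jar of beads; each bar's height is the count for one color.
beads = ["green"] * 5 + ["red"] * 2 + ["blue"] * 3

counts = Counter(beads)  # x-axis: one entry per color; y-axis: the count
for color, n in counts.most_common():
    print(f"{color:>5} | {'#' * n}")  # a crude text rendering of the bars
```
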
What Is a Pie Chart?
Pie charts are circular graphs that display percentages of a whole as if they were slices of a pie. They are similar to bar graphs in that the data must be divisible into categories, one slice per category. The “slices” of the pie have sizes that indicate the proportion of the whole they represent (although
a legend beside the chart usually shows the precise figures), but unlike bar graphs, pie charts can’t be used to
explicitly show absolute number values for each group. The shape is the most obvious difference between pie
charts and bar graphs, but the restriction to proportions with pie charts is the most important.
When to Use a Bar Graph
Bar graphs have a wide range of uses, many more than pie charts. The flexibility of a bar chart allows you to
use it to present percentages, totals, counts and many other things, provided you can find a reasonable way to
group the contents of the x-axis, whether by category or by time (e.g., one bar per year or per month). Unless
you have a specific reason to choose a pie chart, then a bar chart is probably the better choice. For example, if
you were showing the results for a class president election in school, each candidate would have his or her
own bar on the x-axis and the values on the y-axis would be number of votes the candidate received. If you
were showing the revenues of various companies, you could use a bar chart with a bar for each company and
the length corresponding to its revenue in dollars. In both of these cases, you can easily see at a glance the
category (in the examples, the candidate or company) that has the highest value (in votes or dollars, in the
example), and the graph conveys the main information in a straightforward and easy-to-interpret fashion.
When to Use a Pie Chart
Pie charts are less likely to be useful because they display proportions of a whole, and when the proportions
are close to each other, it can be difficult to determine if a specific slice is bigger than another. However, you
can use pie charts when the proportions are important in your data and especially if the proportions are
substantially different. If you have a specific point to make – for example, showing that a certain household
expense makes up over half of your outgoings – then a pie chart can be the best way to do so clearly. For
example, a pie chart works well for displaying a breakdown of sales for each item for a business. Your total
sales are the whole “pie,” but the slices tell you how much each product contributes. You might sell fruit, for
example, and a pie chart of the different types of fruit you sell shows that apples make up the largest chunk of
your sales, followed by bananas. If you were planning a regular school social event for some fellow students, you could use a pie chart to show how your budget was being spent. For example, if you provide snacks, you might find that 20 percent of your total spending each month is on food; if the smaller slices of the pie include more important expenses, the chart could make it clear that more of the budget should be spent on those and less on food.
Bar graphs are a high-level visual aimed at executives or senior management, typically used in a presentation, report, or “macro” metric for management or another interested party; they are not meant to be “micro” or granular. Their defining trait is both a pro and a con: they are limited to an x-axis and a y-axis, so they are two-dimensional. For example, a bar graph can show time-based progression, or progression versus volume.
Pros: great for summaries and macro-level views, mostly for management or executives.
Cons: macro rather than micro, not granular, and limited to two dimensions (x-axis and y-axis); for engineers or other technical audiences, spreadsheets are the more granular, “micro” tool.
Pie Cons
First, let us concede to Few and his cohort the point that unlabeled pie charts are poor tools for conveying precise metrics. Without labels, it requires real mental exertion even to approximate the area percentage of each slice, and unless the labels include the category values, you have to glance back and forth between chart and legend to get the full picture.
Pie Pros
But even Few admits that there’s one thing the pie chart does better than any other visualization, and that is
convey the part-to-whole relationship. While bar charts do show the hierarchical relationship between values
better than pies do, they make the viewer work to picture the whole. Bruce Gabrielle's comparison says it best. First, looking at the bar chart, it's unclear at a glance that together the bars add up to 100%. Coming to
this conclusion would require good approximation skills and mental arithmetic. The pie chart, on the other
hand, clearly shows that companies B and C are dominating the market, with Company B controlling more than
companies F, E, A, and D combined. The only value of interest in this case is the 65%, which is conveyed in the
chart title; the pie isn’t telling a story about Company A, and so its precise value needn’t be conveyed. If this
visualization’s intended message were, rather, that Company A controls more of the market than Company E,
then a pie chart would be a poor tool for the purpose, since those slices are so similar in size. Gabrielle also cites
people’s affinity for rounded edges as a reason to use the pie chart, but basing the virtues of a data visualization
on the shifting sands of popular preference seems risky to us, so we’re going to stick with the utilitarian reasons
for supporting the pie chart. Still, Apple continues to capitalize on this love of roundness, so let’s not
completely ignore the pie chart’s ability to add diversity of shape to dashboards.
Q.4 Explain normal distribution. How does normality of data affect the analysis of data?
Normal distribution, also known as the Gaussian distribution, is a probability distribution that is symmetric
about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean.
In graph form, normal distribution will appear as a bell curve. The normal distribution is the most common type
of distribution assumed in technical stock market analysis and in other types of statistical analyses. The normal distribution has two parameters: the mean and the standard deviation. For a normal distribution, 68% of the observations are within +/- one standard deviation of the mean, 95% are within +/- two standard deviations, and 99.7% are within +/- three standard deviations. The normal distribution model is motivated by the Central Limit Theorem. This theorem states that averages calculated from independent, identically distributed random
variables have approximately normal distributions, regardless of the type of distribution from which the
variables are sampled (provided it has finite variance). Normal distribution is sometimes confused
with symmetrical distribution. Symmetrical distribution is one where a dividing line produces two mirror
images, but the actual data could be two humps or a series of hills in addition to the bell curve that indicates a
normal distribution.
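The 68/95/99.7 figures can be checked by simulation with the standard library alone. This is a sketch: the mean and standard deviation below are arbitrary choices, and the proportions are only approximate for a finite sample.

```python
import random

random.seed(0)
mu, sigma, n = 100.0, 15.0, 100_000
draws = [random.gauss(mu, sigma) for _ in range(n)]

def within(k):
    """Fraction of draws within k standard deviations of the mean."""
    return sum(abs(x - mu) <= k * sigma for x in draws) / n

print(within(1), within(2), within(3))  # roughly 0.68, 0.95, 0.997
```
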
Skewness and Kurtosis
Real life data rarely, if ever, follow a perfect normal distribution. The skewness and kurtosis coefficients
measure how different a given distribution is from a normal distribution. The skewness measures the symmetry
of a distribution. The normal distribution is symmetric and has a skewness of zero. If the distribution of a data
set has a skewness less than zero, or negative skewness, then the left tail of the distribution is longer than the
right tail; positive skewness implies that the right tail of the distribution is longer than the left.
The kurtosis statistic measures the thickness of the tail ends of a distribution in relation to the tails of the normal
distribution. Distributions with large kurtosis exhibit tail data exceeding the tails of the normal distribution (e.g.,
five or more standard deviations from the mean). Distributions with low kurtosis exhibit tail data that is
generally less extreme than the tails of the normal distribution. The normal distribution has a kurtosis of three,
which indicates the distribution has neither fat nor thin tails. Therefore, if an observed distribution has a kurtosis
greater than three, the distribution is said to have heavy tails when compared to the normal distribution. If the
distribution has a kurtosis of less than three, it is said to have thin tails when compared to the normal
distribution. The assumption of a normal distribution is applied to asset prices as well as price action. Traders
may plot price points over time to fit recent price action into a normal distribution. The further price action
moves from the mean, in this case, the more likelihood that an asset is being over or undervalued. Traders can
use the standard deviations to suggest potential trades. This type of trading is generally done on very short time
frames as larger timescales make it much harder to pick entry and exit points. Similarly, many statistical
theories attempt to model asset prices under the assumption that they follow a normal distribution. In reality,
price distributions tend to have fat tails, and, therefore, have kurtosis greater than three. Such assets have had
price movements greater than three standard deviations beyond the mean more often than would be expected
under the assumption of a normal distribution. Even if an asset has gone through a long period where it fits a normal distribution, there is no guarantee that past performance truly informs future prospects.
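A tiny pure-Python check of the "fat tails" idea: the fourth standardized moment of a sample with occasional extreme values comes out well above the normal distribution's value of 3. The data below are made up for illustration.

```python
def kurtosis(xs):
    """Fourth standardized central moment (equals 3 for a normal model)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - mean) / sd) ** 4 for x in xs) / n

# Hypothetical 'returns': mostly calm, with two extreme moves (heavy tails).
fat_tailed = [0, 0, 0, 0, 0, 0, 0, 0, -10, 10]
print(kurtosis(fat_tailed))  # greater than 3: heavier tails than normal
```
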
SKEWNESS
Skewness is usually described as a measure of a dataset’s symmetry – or lack of symmetry. A perfectly
symmetrical data set will have a skewness of 0. The normal distribution has a skewness of 0. The skewness is
defined (following Advanced Topics in Statistical Process Control) as:

skewness = (1/n) ∑ ((Xi − X̄)/s)³

where n is the sample size, Xi is the ith X value, X̄ is the average and s is the sample standard deviation. Note the exponent in the summation: it is 3. The skewness is referred to as the "third standardized central moment for the probability model."
Most software packages use a formula for the skewness that takes into account sample size:

skewness = [n / ((n − 1)(n − 2))] ∑ ((Xi − X̄)/s)³

This sample-size formula is used here; it is also what Microsoft Excel uses. The difference between the two formula results becomes very small as the sample size increases.
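Both versions can be written out directly; the definitions below follow the descriptions above (s is the sample standard deviation, and the second form carries the n/((n − 1)(n − 2)) adjustment used by Excel's SKEW). On a perfectly symmetric dataset both return zero.

```python
def skew_simple(xs):
    """Third standardized central moment: (1/n) * sum(((x - mean)/s)**3)."""
    n = len(xs)
    mean = sum(xs) / n
    s = (sum((x - mean) ** 2 for x in xs) / (n - 1)) ** 0.5  # sample sd
    return sum(((x - mean) / s) ** 3 for x in xs) / n

def skew_adjusted(xs):
    """Sample-size-adjusted skewness (the form used by Excel's SKEW)."""
    n = len(xs)
    mean = sum(xs) / n
    s = (sum((x - mean) ** 2 for x in xs) / (n - 1)) ** 0.5  # sample sd
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in xs)

symmetric = [65, 70, 75, 80, 85, 90, 95]
print(skew_simple(symmetric), skew_adjusted(symmetric))  # both 0 (symmetric)
```

For a right-skewed set such as [1, 2, 2, 3, 10], both formulas return a positive value, and the difference between them shrinks as the sample grows.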
Figure 1 is a symmetrical data set. It was created by generating a set of data from 65 to 135 in steps of 5 with
the number of each value as shown in Figure 1. For example, there are 3 65's, 6 70's, 9 75's, etc.
KURTOSIS
Kurtosis was originally thought to be a measure of the “peakedness” of a distribution. However, since the central portion of the distribution is virtually ignored by this parameter, kurtosis cannot be said to measure peakedness
directly. While there is a correlation between peakedness and kurtosis, the relationship is an indirect and
imperfect one at best.
Dr. Wheeler defines kurtosis as:
“The kurtosis parameter is a measure of the combined weight of the tails relative to the rest of the distribution.”
So, kurtosis is all about the tails of the distribution – not the peakedness or flatness. It measures the tail-
heaviness of the distribution. Kurtosis is defined as:

kurtosis = (1/n) ∑ ((Xi − X̄)/s)⁴

where n is the sample size, Xi is the ith X value, X̄ is the average and s is the sample standard deviation. Note the exponent in the summation: it is 4. The kurtosis is referred to as the "fourth standardized central moment for the probability model."
Here is where it gets a little tricky. If you use the above equation, the kurtosis for a normal distribution is 3. Most software packages (including Microsoft Excel) use the formula below:

kurtosis = [n(n + 1) / ((n − 1)(n − 2)(n − 3))] ∑ ((Xi − X̄)/s)⁴ − 3(n − 1)² / ((n − 2)(n − 3))

This formula does two things: it takes into account the sample size, and it subtracts 3 from the kurtosis. With this equation, the kurtosis of a normal distribution is 0. This is really the excess kurtosis, but most software packages refer to it as simply kurtosis. The last equation is used here. So, if a dataset has a positive kurtosis, it
has more in the tails than the normal distribution. If a dataset has a negative kurtosis, it has less in the tails than
the normal distribution. Since the exponent in the above is 4, the term in the summation will always be positive
– regardless of whether Xi is above or below the average. Xi values close to the average contribute very little to
the kurtosis. The tail values of Xi contribute much more to the kurtosis.
The kurtosis decreases as the tails become lighter. It increases as the tails become heavier. Figure 4 shows an
extreme case. In this dataset, each value occurs 10 times. The values are 65 to 135 in increments of 5. The
kurtosis of this dataset is -1.21. Since this value is less than 0, it is considered to be a “light-tailed” dataset. It
has as much data in each tail as it does in the peak. Note that this is a symmetrical distribution, so the skewness
is zero.
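The −1.21 figure for that flat dataset can be reproduced directly. The function below is a sketch using the plain fourth-moment computation with population moments, minus 3; the sample-size-adjusted formula gives almost the same value for a dataset this large.

```python
def excess_kurtosis(xs):
    """Fourth standardized central moment minus 3 (population moments)."""
    n = len(xs)
    mean = sum(xs) / n
    sd = (sum((x - mean) ** 2 for x in xs) / n) ** 0.5
    return sum(((x - mean) / sd) ** 4 for x in xs) / n - 3

# Values 65 to 135 in increments of 5, each occurring 10 times.
flat = [v for v in range(65, 140, 5) for _ in range(10)]
print(round(excess_kurtosis(flat), 2))  # -1.21: a light-tailed dataset
```
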
Q.5 How is mean different from median? Explain the role of level of measurement in measure of central
tendency.
A measure of central tendency is a single value that attempts to describe a set of data by identifying the central
position within that set of data. As such, measures of central tendency are sometimes called measures of central
location. They are also classed as summary statistics. The mean (often called the average) is most likely the
measure of central tendency that you are most familiar with, but there are others, such as the median and the
mode.
The mean, median and mode are all valid measures of central tendency, but under different conditions, some
measures of central tendency become more appropriate to use than others. In the following sections, we will
look at the mean, mode and median, and learn how to calculate them and under what conditions they are most
appropriate to be used.
Mean (Arithmetic)
The mean (or average) is the most popular and well known measure of central tendency. It can be used with
both discrete and continuous data, although its use is most often with continuous data (see our Types of
Variable guide for data types). The mean is equal to the sum of all the values in the data set divided by the
number of values in the data set. So, if we have n values in a data set and they have values x1,x2, …,xn, the
sample mean, usually denoted by x̄ (pronounced "x bar"), is:
x̄ = (x1 + x2 + ⋯ + xn) / n
This formula is usually written in a slightly different manner using the Greek capital letter, ∑, pronounced "sigma", which means "sum of...":
x̄ = (∑x) / n
You may have noticed that the above formula refers to the sample mean. So, why have we called it a sample
mean? This is because, in statistics, samples and populations have very different meanings and these differences
are very important, even if, in the case of the mean, they are calculated in the same way. To acknowledge that
we are calculating the population mean and not the sample mean, we use the Greek lower case letter "mu",
denoted as μ:
μ = (∑x) / n
The mean is essentially a model of your data set: a single value that summarizes it. You will notice, however, that the mean is not often one of the actual values that you have observed in your data set. However, one of its
important properties is that it minimises error in the prediction of any one value in your data set. That is, it is the
value that produces the lowest amount of error from all other values in the data set.
An important property of the mean is that it includes every value in your data set as part of the calculation. In
addition, the mean is the only measure of central tendency where the sum of the deviations of each value from
the mean is always zero.
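The zero-sum-of-deviations property is easy to verify with a few hypothetical values:

```python
data = [15, 18, 16, 14, 15, 15, 12, 17]  # hypothetical values
mean = sum(data) / len(data)

# Deviations above the mean exactly cancel deviations below it.
total_deviation = sum(x - mean for x in data)
print(total_deviation)  # 0 (up to floating-point rounding)
```
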
When not to use the mean
The mean has one main disadvantage: it is particularly susceptible to the influence of outliers. These are values
that are unusual compared to the rest of the data set by being especially small or large in numerical value. For
example, consider the wages of staff at a factory below:
Staff 1 2 3 4 5 6 7 8 9 10
Salary 15k 18k 16k 14k 15k 15k 12k 17k 90k 95k
The mean salary for these ten staff is $30.7k. However, inspecting the raw data suggests that this mean value
might not be the best way to accurately reflect the typical salary of a worker, as most workers have salaries in
the $12k to 18k range. The mean is being skewed by the two large salaries. Therefore, in this situation, we
would like to have a better measure of central tendency. As we will find out later, taking the median would be a
better measure of central tendency in this situation.
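Running the salary example through the standard library makes the contrast explicit (salaries in $k, as in the table above):

```python
import statistics

salaries = [15, 18, 16, 14, 15, 15, 12, 17, 90, 95]  # in $k, as above

print(statistics.mean(salaries))    # 30.7: pulled up by the two outliers
print(statistics.median(salaries))  # 15.5: closer to a typical salary
```
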
Another time when we usually prefer the median over the mean (or mode) is when our data is skewed (i.e., the
frequency distribution for our data is skewed). If we consider the normal distribution - as this is the most
frequently assessed in statistics - when the data is perfectly normal, the mean, median and mode are identical.
Moreover, they all represent the most typical value in the data set. However, as the data becomes skewed the
mean loses its ability to provide the best central location for the data because the skewed data is dragging it
away from the typical value. However, the median best retains this position and is not as strongly influenced by
the skewed values. This is explained in more detail in the skewed distribution section later in this guide.
Median
The median is the middle score for a set of data that has been arranged in order of magnitude. The median is
less affected by outliers and skewed data. In order to calculate the median, suppose we have the data below:
65 55 89 56 35 14 56 55 87 45 92
We first need to rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89 92
Our median mark is the middle mark - in this case, 56. It is the middle mark because there are 5 scores before it and 5 scores after it. This works fine when you have an odd number of scores, but what
happens when you have an even number of scores? What if you had only 10 scores? Well, you simply have to
take the middle two scores and average the result. So, if we look at the example below:
65 55 89 56 35 14 56 55 87 45
We again rearrange that data into order of magnitude (smallest first):
14 35 45 55 55 56 56 65 87 89
Only now we have to take the 5th and 6th score in our data set and average them to get a median of 55.5.
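Both cases can be reproduced with the standard library, which handles the odd/even distinction automatically:

```python
import statistics

odd_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45, 92]
print(statistics.median(odd_scores))   # 56: the single middle score of 11

even_scores = [65, 55, 89, 56, 35, 14, 56, 55, 87, 45]
print(statistics.median(even_scores))  # 55.5: average of the 5th and 6th
```
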
Mode
The mode is the most frequent score in our data set. It corresponds to the highest bar in a bar chart or histogram. You can, therefore, sometimes consider the mode as being the most popular option. An example of a
mode is presented below:
Normally, the mode is used for categorical data where we wish to know which is the most common category, as
illustrated below:
We can see above that the most common form of transport, in this particular data set, is the bus. However, one
of the problems with the mode is that it is not unique, so it leaves us with problems when we have two or more
values that share the highest frequency, such as below:
We are now stuck as to which mode best describes the central tendency of the data. This is particularly
problematic when we have continuous data because we are more likely not to have any one value that is more
frequent than the other. For example, consider measuring 30 peoples' weight (to the nearest 0.1 kg). How likely
is it that we will find two or more people with exactly the same weight (e.g., 67.4 kg)? The answer is probably
very unlikely - many people might be close, but with such a small sample (30 people) and a large range of
possible weights, you are unlikely to find two people with exactly the same weight; that is, to the nearest 0.1 kg.
This is why the mode is very rarely used with continuous data.
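For categorical data the mode is just the top frequency count, and `statistics.multimode` (Python 3.8+) reports every winner when there is a tie. The transport data below is hypothetical, echoing the bus example above:

```python
import statistics

# Hypothetical categorical data: forms of transport used by a class.
transport = ["bus", "car", "bus", "walk", "bus", "car", "train"]
print(statistics.mode(transport))  # 'bus': the most common category

# With a tie, multimode returns every value sharing the highest frequency.
tied = ["bus", "car", "bus", "car", "walk"]
print(statistics.multimode(tied))  # ['bus', 'car']
```
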
Another problem with the mode is that it will not provide us with a very good measure of central tendency when
the most common mark is far away from the rest of the data in the data set, as depicted in the diagram below:
In the above diagram the mode has a value of 2. We can clearly see, however, that the mode is not
representative of the data, which is mostly concentrated around the 20 to 30 value range. To use the mode to
describe the central tendency of this data set would be misleading.
Skewed Distributions and the Mean and Median
We often test whether our data is normally distributed because this is a common assumption underlying many
statistical tests. An example of a normally distributed set of data is presented below:
When you have a normally distributed sample you can legitimately use both the mean or the median as your
measure of central tendency. In fact, in any symmetrical distribution the mean, median and mode are equal.
However, in this situation, the mean is widely preferred as the best measure of central tendency because it is the
measure that includes all the values in the data set for its calculation, and any change in any of the scores will
affect the value of the mean. This is not the case with the median or mode.
However, when our data is skewed, for example, as with the right-skewed data set below:
We find that the mean is being dragged in the direction of the skew. In these situations, the median is generally
considered to be the best representative of the central location of the data. The more skewed the distribution, the
greater the difference between the median and mean, and the greater emphasis should be placed on using the
median as opposed to the mean. A classic example of the above right-skewed distribution is income (salary),
where higher-earners provide a false representation of the typical income if expressed as a mean and not a
median.
If you expect a normal distribution but tests of normality show that the data is non-normal, it is customary to use the median instead of the mean. However, this is more a rule of thumb than a strict guideline. Sometimes,
researchers wish to report the mean of a skewed distribution if the median and mean are not appreciably
different (a subjective assessment), and if it allows easier comparisons to previous research to be made.