Research Design
CORDA3
Quantitative data analysis
Eduvos (Pty) Ltd (formerly Pearson Institute of Higher Education) is registered with the Department of Higher Education and Training as a private higher education institution under the
Higher Education Act, 101, of 1997. Registration Certificate number: 2001/HE07/008
Chapter 15: Quantitative data analysis
Why do we need statistics?
Hypotheses
The null hypothesis
The alternative hypothesis
Data sets
Units, the sample and the population
Describing the data
Range
Central points
The spread of the data
The correlation coefficient
Probability
Drawing conclusions from data
Hypothesis testing
Presenting data
Summary
Why do we need statistics?
Applying statistical analysis to a set of data removes the
guesswork from the interpretation of data. Objective and
defensible conclusions can then be drawn from the
results of the analysis. Statistics consists of a set of
mathematical techniques to analyse a set of data.
Objective in this context means devoid of bias.
Defensible in this context means that the results show a
statistically significant difference between the status quo,
or an existing set of conditions (in this case, the use of
white plaster casts), and our proposed alternative position
(the use of red plaster casts).
Statistical analysis
– Inferential statistics
– Descriptive statistics: measures of central tendency,
measures of dispersion, correlation coefficient, tables and
graphic presentation
Hypotheses
Hypotheses are statements or proposed explanations
made on the basis of limited evidence as a starting point
for further investigation.
The null hypothesis is the statement supporting the
status quo. The null hypothesis is denoted by the symbol
H0.
The alternative hypothesis is a statement supporting a
change in the status quo, such as supporting a new
discovery or better technique (the game changer). The
alternative hypothesis is indicated by the symbol Ha or,
if we have more than one alternative hypothesis, by H1,
H2 and so on.
Data sets
A data set is a collection of data. It consists of separate
units that make up the entire set.
For example, if we wanted to improve people’s future
health, we could ask different individuals to provide us
with information about their eating and exercise habits,
the number of hours of sleep they get per night, how
often they go on holiday, and so on. If we put all the
information we collect from all the different individuals
together, we have a data set.
Units, the sample and the population
Units of measurement are associated with all numerical
measurements. The units of the International System of
Units, known as SI units, are often preferred.
Determining how to conduct a study, including
determining what sample size is sufficiently representative
of the population, may be considered a branch of statistics
in itself.
Describing the data
Descriptive statistics summarise the data and allow
some basic questions to be answered, for example:
What is the range of the data? This could refer to the
maximum and minimum value, for example: whose bones
healed the fastest and whose healed the slowest?
What is the central point of the data set? This can be
determined in terms of the mean (average), the median
(the middle value of a list) or the mode (the value that
occurs the most frequently).
Refresher
The "mean" is the "average" you're used to, where
you add up all the numbers and then divide by the
number of numbers.
The "median" is the "middle" value in the list of
numbers. ... If no number in the list is repeated, then
there is no mode for the list.
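As a minimal sketch, the range and the three central points can be computed with Python's standard library. The healing times below are hypothetical values made up for illustration, and the helper name describe is an assumption, not something from the chapter. The helper follows the refresher's convention that a list with no repeated value has no mode.

```python
from collections import Counter
import statistics


def describe(values):
    """Return the minimum, maximum, mean, median and mode(s) of a list."""
    counts = Counter(values)
    highest_count = max(counts.values())
    # Convention from the refresher: if nothing repeats, there is no mode.
    if highest_count > 1:
        modes = sorted(v for v, c in counts.items() if c == highest_count)
    else:
        modes = []
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "modes": modes,
    }


# Hypothetical healing times in days (illustrative only).
healing_days = [38, 42, 42, 45, 47, 50, 51]
summary = describe(healing_days)
print(summary)
```

A list can have more than one mode, which is why the helper returns a list rather than a single value.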
The correlation coefficient
Sometimes, we want to find out whether two values are
related in some way. To examine such a possible
relationship, we calculate the correlation coefficient (often
also called Pearson’s coefficient). The
correlation coefficient is represented by the symbol r.
The value of the correlation coefficient is always between
-1 and +1. If your answer is not between -1 and +1, you
have made a calculation error. The closer the number is to
-1 or +1, the stronger the correlation is.
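The slide's own formula is not reproduced here, so the sketch below assumes the standard definition of Pearson's r: the sum of the products of the deviations from each mean, divided by the square root of the product of the two sums of squared deviations. The function name pearson_r and the study-hours data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson's correlation coefficient for two equally long lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations, and sums of squared deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable does not vary")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data: hours studied and test scores (illustrative only).
hours_studied = [1, 2, 3, 4, 5]
test_scores = [52, 58, 61, 70, 74]
r = pearson_r(hours_studied, test_scores)
print(round(r, 3))
```

A result outside the range -1 to +1 would signal a calculation error, which makes the range a useful sanity check on any hand calculation.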
The spread of the data
The spread of the data may be expressed in many ways.
The most useful and most common way of expressing the
spread of data is through the standard deviation.
The standard deviation represents, roughly, the typical
distance of the data values from the mean.
If the standard deviation is low, it means all the results are
close to the mean. If it is high, it means the numbers are
far away from the mean. To calculate the standard
deviation, we subtract the mean from each value, square
each difference, average the squared differences (dividing
by n - 1 for a sample) and take the square root of the result.
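The calculation starts by subtracting the mean from each value. A minimal sketch of the full calculation follows, assuming the usual sample formula (squared differences divided by n - 1, then a square root). The function name sample_std_dev and the data are hypothetical. The result is compared with statistics.stdev from the standard library as a check.

```python
import math
import statistics


def sample_std_dev(values):
    """Sample standard deviation, computed step by step."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    # Step 1: subtract the mean from each value.
    differences = [v - mean for v in values]
    # Step 2: square each difference so negatives do not cancel positives.
    squared = [d ** 2 for d in differences]
    # Step 3: average the squared differences (n - 1 for a sample).
    variance = sum(squared) / (n - 1)
    # Step 4: take the square root to return to the original units.
    return math.sqrt(variance)


# Hypothetical measurements (illustrative only).
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_std_dev(data), statistics.stdev(data))
```

If the data are the whole population rather than a sample, the divisor would be n instead of n - 1 (statistics.pstdev in the standard library).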
A negative correlation coefficient indicates a negative
correlation. This means that as the independent variable
increases, the dependent variable tends to decrease. For
example, the more recreational drugs a student uses, the
lower her or his test scores tend to be.
A positive correlation coefficient indicates a positive
correlation. In this case, when the independent variable
increases, the dependent variable also tends to increase.
For example, the more a student studies, the higher her or
his test score tends to be.
Probability
Probability is the likelihood of a particular event
occurring. Mathematically, probabilities are stated to
be in the range of values from zero to one. The higher
the value, the higher the probability.
For example, we may be interested in knowing what
the probability is of getting heads or tails when we
toss a coin. If we think of tossing a coin, we know
intuitively that a toss will result in a head 50% of the
time (probability of 0.5), and a tail 50% of the time
(probability of 0.5).
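The coin example can be checked by simulation. The sketch below assumes a fair coin and uses Python's pseudo-random generator; the function name and the seed value are arbitrary choices for illustration. The proportion of heads should settle close to 0.5 as the number of tosses grows, without being exactly 0.5 in any one run.

```python
import random


def estimate_heads_probability(num_tosses, seed=0):
    """Estimate P(heads) as the proportion of heads in simulated tosses."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    # rng.random() is uniform on [0, 1); below 0.5 counts as heads.
    heads = sum(1 for _ in range(num_tosses) if rng.random() < 0.5)
    return heads / num_tosses


for tosses in (10, 1_000, 100_000):
    print(tosses, estimate_heads_probability(tosses))
```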
Drawing conclusions from data
Once a data set has been gathered, we can use it to draw
conclusions or draw inferences from the data, using
inferential statistics.
For example, if we want to know what the world’s
population will be in ten years’ time, we would use
inferential statistical techniques to arrive at an estimate.
Such an estimate would be based on our knowledge of
previous years’ population growth, models of population
growth and projections. Gaining new knowledge requires
us to draw inferences from sets of data — both old and
newly gathered data, and data that is the result of our
own work or that of others.
Hypothesis testing
In the vast majority of the world’s legal systems, innocence
is presumed and the burden of proof lies with the
prosecution; in other words, they have to disprove the
accused person’s innocence.
In hypothesis testing, the null hypothesis is presumed to be
true and the ‘burden of proof’ lies in attempting to
disprove the null hypothesis. We will follow an easy
four-step process for testing hypotheses.
Presenting data
Graphs are an effective and, generally, universally
understood means of conveying trends (an upwards
slope means that something is increasing, and vice
versa).
It is generally not sufficient to present graphics and
expect the reader to grasp the intended meaning
immediately. Explanatory text that explicitly states the
intended meaning is required to ensure that the reader
has understood the point being made.
Summary
Statistics provide us with a set of accepted methods for
analysing sets of data. The two primary types of analysis
allow us to characterise data and to draw conclusions
from that data.
Descriptive statistics allow us to develop summaries and
to determine various key characteristics of the data, while
inferential statistics provide us with means of drawing
conclusions about populations based on a sample of data.
Inferential statistics attempt to determine whether a
particular observed outcome is the result of randomness
(supporting our understanding of existing knowledge) or
whether we have discovered something new, supported
by our gathered data. The accepted method for making
such a discovery is to test hypotheses using the following
four steps:
1. State the hypotheses.
2. Set the decision criteria.
3. Compute the test statistic.
4. Record the decision.
End
Q and A