Introduction to Statistics
Exercises
December 2020
Below are the exercises that you will work on during the group work sessions, as explained
in the schedule for this module. Tackle these questions in your own time before you join
your group for the sessions. Ideally, work on each question immediately after watching the
relevant parts of the lecture recordings, so that you engage actively with the material.
Here are the points at which you should tackle the different questions: Question 1 after
Section 2; Questions 2 and 3 after Section 5; Question 4 after Section 6; Question 5 after
Section 7; Question 6 after Section 8; Questions 7 and 8 after Section 9.
1. Suppose you have been asked to examine whether smoking has negative health consequences.
Formulate a precise research question, describe your theory, and state which testable
hypotheses you might be able to derive from it.
2. How could you obtain a random sample of the voters in your country to ask them about
their voting intentions if there were a general election next week? What problems are
you likely to encounter in this process?
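If a complete list of eligible voters were available (a sampling frame, which is itself a strong assumption), the drawing step of a simple random sample could be sketched in Python as below. The register of voter IDs and the function name `simple_random_sample` are hypothetical; the sketch covers only the random selection, not contacting the selected voters.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Draw n distinct units from sampling_frame without replacement,
    so that every subset of size n is equally likely to be chosen."""
    rng = random.Random(seed)  # dedicated generator; a seed makes the draw reproducible
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: 100,000 voter IDs standing in for an electoral register
register = [f"voter-{i:06d}" for i in range(100_000)]
sample = simple_random_sample(register, 1000, seed=42)
print(len(sample), len(set(sample)))  # 1000 distinct voters drawn
```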
3. Allied commanders during WWII wanted to find ways to make their bombers more
resilient against German anti-aircraft guns. To make progress with this project, they
examined all planes returning from Germany over the course of a month for anti-aircraft
gun damage. The resulting dataset showed that planes typically had many bullet holes
in the wings but few in the engine or cockpit area. They concluded from this evidence
that additional steel armour on the wings would be the best way to improve the safety
of bomber crews. Do you agree with their judgement?
4. Consider again the sample mean $\bar{X}$, which is defined as:
$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
The estimator $\bar{X}$ can be shown to be an unbiased and consistent estimator of the
population mean $\mu_X$. What would be an example of an alternative estimator of the
population mean that is biased but nevertheless consistent? What would be an example of
an alternative estimator of the population mean that is unbiased but inconsistent?
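One way to explore candidate estimators is by simulation: draw many samples of size n from a population whose mean is known, then look at the average error of the estimates (the bias) and their spread (the variance) as n grows. The sketch below does this for $\bar{X}$; the normal population, its parameters (`mu=10`, `sigma=2`) and the number of replications are arbitrary assumptions, and you can pass your own estimator function in place of `sample_mean`.

```python
import random
import statistics

def sample_mean(xs):
    # The sample mean: (1/n) * sum of the x_i
    return sum(xs) / len(xs)

def simulate(estimator, n, mu=10.0, sigma=2.0, reps=2000, seed=0):
    """Approximate the bias and variance of `estimator` for samples of size n
    drawn from a normal population with mean mu and standard deviation sigma."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        estimates.append(estimator(xs))
    bias = statistics.fmean(estimates) - mu         # average estimate minus the true mean
    variance = statistics.pvariance(estimates)      # spread of the estimates around their mean
    return bias, variance

for n in (10, 100, 1000):
    bias, var = simulate(sample_mean, n)
    print(f"n={n:5d}  bias={bias:+.4f}  variance={var:.5f}")
```

An estimator is unbiased if the simulated bias stays close to zero at every n, and it behaves consistently if its estimates concentrate ever more tightly around $\mu_X$ as n increases.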
5. Suppose that scores on an aptitude test used to screen applicants to university courses
have a mean of 500 points. Furthermore, assume throughout this question that the estimated
standard deviation of test scores is 100 points. An evening school advertises that
it can improve the scores of those taking the test by roughly a third of a standard deviation,
or 30 points, if they attend a course which runs over three weeks. A statistician at the
consumer protection agency wants to evaluate whether the school is really that effective.
(a) The consumer protection agency evaluates the claim of the school by sending 50
students to attend the classes of the school. Assume that the 50 students are a
random sample of people who are planning to take the aptitude test. One of the
students becomes sick during the course and drops out. What is the standard error
of the average score of the remaining 49 students?
(b) Assume that after completing the course offered by the school the 49 students take
the aptitude test and score an average of 520 points. Is this convincing evidence
either in favour of or against the school's claim?
(c) How would your conclusions change if 196 students took the course offered by the
school and then scored an average of 520 points on the aptitude test? What is the
reason for this?
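The arithmetic in parts (a) to (c) can be checked with a short Python sketch. It assumes that the standard deviation of 100 points also applies to students who took the course, and it uses the normal approximation, measuring the gap between the observed average of 520 and two hypothesised means (500 for no effect, 530 for the advertised effect) in standard errors. The function names are illustrative.

```python
import math

def standard_error(sd, n):
    # Standard error of a sample mean: sd / sqrt(n)
    return sd / math.sqrt(n)

def z_statistic(observed_mean, hypothesised_mean, sd, n):
    # Distance between the sample average and a hypothesised population mean,
    # measured in standard errors
    return (observed_mean - hypothesised_mean) / standard_error(sd, n)

SD = 100  # standard deviation of test scores given in the question

for n in (49, 196):
    se = standard_error(SD, n)
    z_no_effect = z_statistic(520, 500, SD, n)  # against "the course has no effect"
    z_claim = z_statistic(520, 530, SD, n)      # against "the course adds 30 points"
    print(f"n={n}: SE={se:.2f}, z vs 500={z_no_effect:.2f}, z vs 530={z_claim:.2f}")
```

Compare each z value with the critical values you use for a test at your chosen significance level.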
6. Consider the quadratic function $y = a + bx + cx^2$. Suppose that $a = 1$, $b = 2$ and $c = 3$.
Use Excel to work out the shape of the function for values of x between −5 and 5.
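If you want to check your spreadsheet, the same table can be produced in Python. In Excel you would, for example, put the x values in column A starting at A2 and a formula such as =1+2*A2+3*A2^2 in column B; the step size of 0.5 used here is an arbitrary choice.

```python
def quadratic(x, a=1, b=2, c=3):
    # y = a + b*x + c*x^2 with the coefficients from the question
    return a + b * x + c * x ** 2

# x = -5.0, -4.5, ..., 5.0 (steps of 0.5), like filling down a column in Excel
xs = [i / 2 for i in range(-10, 11)]
for x in xs:
    print(f"{x:5.1f}  {quadratic(x):7.2f}")
```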
7. Use Excel to compute the following changes of a variable y both as a percentage change
and with the log approximation. Does the log approximation over- or underestimate the
true percentage change?
(a) From 98 to 100
(b) From 20 to 22
(c) From 23 to 36
(d) From 54 to 111
(e) From 100 to 272
(f) From 80 to 78
(g) From 80 to 75
(h) From 85 to 40
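A Python equivalent of the spreadsheet calculation can be used to check your Excel results. Both changes are expressed in percent, and `math.log` is the natural logarithm (the counterpart of Excel's LN function).

```python
import math

# (from, to) pairs listed in parts (a) to (h)
changes = [(98, 100), (20, 22), (23, 36), (54, 111),
           (100, 272), (80, 78), (80, 75), (85, 40)]

def percentage_change(y1, y2):
    # Exact relative change (y2 - y1) / y1, in percent
    return 100 * (y2 - y1) / y1

def log_change(y1, y2):
    # Log approximation ln(y2) - ln(y1), scaled to percent
    return 100 * (math.log(y2) - math.log(y1))

print(f"{'from':>5} {'to':>5} {'exact %':>9} {'log %':>9}")
for y1, y2 in changes:
    print(f"{y1:5d} {y2:5d} {percentage_change(y1, y2):9.2f} {log_change(y1, y2):9.2f}")
```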
8. Show that $\ln(y_2) - \ln(y_1) \approx (y_2 - y_1)/y_1$. What is the key approximation in this proof?