Advanced Statistics Study Guide
Topics in inference will be refreshed in the first week of the course, but at a fairly quick pace, while
introducing some new elements at the same time.
The first five chapters of O&L largely cover most of the prior knowledge required for this course. So,
going through these chapters may refresh your memory. Going through part 3 of this introduction
should also help you to be prepared.
If you have little prior knowledge of statistics, and a refresher from these chapters did not
help, you are well advised to postpone this course and first follow one of the basic courses in
statistics at this university (Basic Statistics or (in Dutch) S2).
This course focuses upon the ideas behind the statistical methodology and on the applications.
Mathematics is kept at a minimum. The learning outcomes of the course are presented in a file on
Blackboard (LearningGoals.doc). They are also found for the various topics in the “Check-yourself”
boxes in this Guide. The general learning goals and the way they are tested are given below in the
table “Assessment strategy”.
1.5 Examination
The written exam is a mix of open questions and multiple choice questions. It lasts 3 hours.
If there is proof (from attendance lists) that you attended the computer practicals, obtaining 45 credits
out of 90 in the exam will result in a pass. The unrounded mark is 1 + Credits/10.
You may bring to the exam: a pocket calculator (or graphics calculator), a dictionary, the study
guide, the book of O&L, and a hand-written summary of your own making of at most two pages.
PowerPoint handouts and telephones are typically forbidden.
When we are interested in more than two population means or treatments, e.g. we compare four diets,
we will often use a class of models and associated statistical methods referred to as ANOVA (analysis
of variance - situations 4, 5, 6). When, e.g., diets may work out differently for male and female animals,
we have a so-called factorial structure, with diet and gender as experimental factors. In that case it is
profitable to introduce the concepts of main effects and interactions (in two-way ANOVA).
When interested in the relationship between two numerical variables, we use correlation analysis or
simple linear regression (situation 7). If interest is in, e.g., predicting the percentage lean meat of a
pig carcass from several carcass measurements taken in the slaughter line, we use a class of models
and associated statistical methods referred to as multiple linear regression (situation 8). Both analysis
of variance and linear regression are part of a wider class of models referred to as the (general) linear
model. A large part of this course will be devoted to applications of the linear model. A special case is
the situation of a qualitative factor and a quantitative factor (situation 9).
In all these cases we will use statistical models, with specific model assumptions, usually comprising
the assumption that the data are independent observations from (a) Normal distribution(s), all with the
same variation, e.g. irrespective of the treatment applied. Model assumptions have to be checked.
When data are clearly not from a Normal distribution, other models and methods of analysis are
required. We will for instance discuss analysis of binary data (situations 10, 11): an individual is
diseased or healthy, an electronic circuit is functioning or not, etc. This actually amounts to
inference on probabilities, e.g. how do different hygienic measures at farms affect the probability for
an animal to be diseased. Analysis of categorical data, e.g. data collected in contingency tables, also
involves inference on probabilities, and will be discussed as well (situations 12, 13, 14).
In fairly simple and straightforward situations (1a, 2a, 3a, 4a), we may perform inference based on
rank numbers rather than on the actual data, to relax the assumption of Normality, when this
assumption is in doubt. Some well-known tests based on rank numbers will be discussed.
The first table below describes situations in which the observations on the response are assumed to be
random drawings from (a) Normal distribution(s). The second table describes situations in which the
Normality assumption is not used. At first reading, at the start of this course, these tables may seem
abstract and of little practical use. But later on they may help to fix the different techniques more
firmly in your mind. The tables also serve as a table of contents for this course. Make sure you
regularly consult them for an overview of the material.
WUR-Biometris Advanced Statistics 4
I Situations with data on (a) quantitative response variable(s), assuming Normal distribution(s)
For each situation: description and model, followed by rows of the form
parameter of interest or question | inference* | name of test or type of procedure.

Situation 1 (Lecture 1; O&L 5.3 – 5.7)
  Description: 1 random sample, 1 quantitative response y
  Model: yi's independent, yi ~ N(μ, σ), i = 1,..,n
    Population mean μ | E, CI, T | one-sample t-test or calculation of a CI for μ
    Population standard deviation σ | E |

Situation 2 (Lecture 1; O&L 6.4, 6.6)
  Description: 1 random sample, 2 quantitative responses x and y, paired data
  Model: d = x - y, d ~ N(μd, σd), independent di's; situation 1 applied to d (instead of y)
    μd (= μx - μy) | E, CI, T | paired sample t-test, equivalent to a one-sample t-test for μd, or calculation of a CI for μd
    σd | E |

Situation 3 (Lecture 2; O&L 6.2, 6.6)
  Description: 2 independent random samples (1 per population), with quantitative y, or CRD** with 2 treatments
  Model: y1j and y2j are all independent, y1j ~ N(μ1, σ1), j = 1,..,n1; y2j ~ N(μ2, σ2), j = 1,..,n2
    μ1 - μ2 | E, CI, T | two independent samples t-test or calculation of a CI for μ1 – μ2
    σ1 and σ2, usually a common σ is assumed | E |

Situation 4 (Lecture 8; O&L Ch8, 9.4, 14.2)
  Description: 1 quantitative response y, 1 qualitative factor (random samples from t sub-populations or CRD with t treatments, t > 2)
  Model (1-way ANOVA): yij ~ N(μi, σ), i = 1,..,t; j = 1,..,ni, or
    yij = μi + εij = μ + τi + εij, with εij's independent N(0, σ)
    All means equal? | T | F-test for the factor / the model
    μi | E, CI, T | t-procedure***
    μi – μj | E, CI, T | t-procedure; for all pairs: LSD / Tukey
    σ (assumed) common standard deviation | E |

Situation 5 (Lecture 10; O&L 15.1, 15.2)
  Description: experiment using RCBD**, b blocks, t treatments
  Model (two-way ANOVA without interaction): yij = μ + τi + βj + εij, εij's independent N(0, σ), i = 1..t, j = 1..b
    Treatment effect? Block effect? | T | F-test for treatment / block
    Treatment differences (pairs) μi. - μj. | E, CI, T | t-procedure; for all pairs: LSD / Tukey
    σ (assumed) common standard deviation | E |

Situation 6 (Lecture 9; O&L 14.3, 14.5)
  Description: 2 qualitative factors (CRD with 2 experimental factors, or 1 grouping factor in a population and 1 treatment factor, or 2 grouping factors in a population)
  Model (two-way ANOVA with interaction): yijk = μij + εijk = μ + τi + βj + τβij + εijk, εijk's independent N(0, σ)
    Any effect? / Interaction effect? | T / T | F-test for model / for interaction
    Main effect for factor A? | T | F-test for main effect
    Treatment mean μij | E, CI, T | t-procedure
    Treatment differences μij – μkl (pairs) | E, CI, T | t-procedure; for all pairs: LSD / Tukey
    Main effects μi.. - μj.., … (pairs) | | t-procedure; for all pairs: LSD / Tukey
    σ (assumed) common standard deviation | E |

Situation 7 (Lecture 5; O&L Ch11)
  Description: one quantitative factor; 1 data set on (x, y); observational: 1 sample on (x, y), or experimental: fixed values of x, observations on y
  Model (Simple Linear Regression): yi = β0 + β1xi + εi, εi's independent from N(0, σ), i = 1,..,n
    β0 or β1 | E, CI, T | t-procedure for intercept / slope
    H0: β1 = 0 (no relationship between y and x) | T | F-test for the model
    μy or y, for a given x-value | E, CI, T | t-procedure
    σ (assumed) common standard deviation | E |

Situation 8 (Lecture 6-7; O&L Ch12, Ch13)
  Description: more quantitative factors; data set on (x1, x2, .., xk, y); observational: 1 sample, or experimental (x-values fixed by experimenter)
  Model (Multiple Linear Regression): yi = β0 + β1x1i + … + βkxki + εi, εi's independent from N(0, σ), i = 1,..,n
    βj (slope) or β0 | E, CI, T | t-procedure
    All βj's are zero (except β0) | T | F-test for the model
    H0: some specific slopes are zero | T | F-test for comparing full vs. reduced model
    μy or y, for given x-values | E, CI, T | t-procedure
    σ (assumed) common standard deviation | E |

Situation 9 (Lecture 11; O&L 12.7, Ch16)
  Description: qualitative factor with covariate (x), or quantitative and qualitative factor
  Model (ANCOVA): yij = μ + τi + β1xij + εij, i = 1..t, j = 1..ni, εij independent N(0, σ)
  Model (interaction): yij = μ + τi + β1xij + λixij + εij, i = …, j = .., εij independent N(0, σ)
    Effect of qualitative factor? | T | F-procedure
    Influence of quantitative factor? | E, CI, T | t-procedure
    Lines per group | E, CI, T | read from SPSS Parameter Estimates
    Pairwise differences of treatment effects | E, CI | t-procedure; for all pairs: LSD / Tukey

* Type of inference: E = (point) estimation, CI = confidence interval, T = testing
** CRD and RCBD are names of experimental designs explained elsewhere
*** t-procedure is either a t-test or calculation of a CI
Main topics in ANOVA and ANCOVA (situations 4, 5, 6, 9): model (equation and assumptions), formulating the model , interpretation of model parameters, aims of analysis,
the ANOVA and ANCOVA table, F-test for equality of means/interaction/main effects; point estimation and confidence intervals for treatment means and differences between
treatment means, checking model assumptions.
In SPSS, producing ANOVA table and F-tests, table for parameter estimates, profile plots, LSD analysis, Levene's test, residual plots.
Main topics in Regression (situations 7, 8, 9): model (equation and assumptions), formulating the model, interpretation of model parameters, aims of analysis, checking model
assumptions, R² and adjusted R² (R²adj), judging how good the model is, which yardsticks to use to choose the best of several models, handling possible collinearity.
In SPSS: producing ANOVA table, table of estimated regression coefficients, saving predicted values and their standard errors / confidence bounds, saving residuals; producing
change statistic for reduced model (using two model blocks), residual plots.
II Situations where Normality is not assumed (because it does not seem to be appropriate)
For each situation: description, parameter(s) / question, inference, name / type of test.

Situation 2a (Lecture 3; O&L 6.5)
  Description: 1 random sample, quantitative responses x and y, paired data
  Question: systematic difference between distributions of x and of y?
  Inference: T
  Test: Wilcoxon signed rank test for di = xi – yi

Situation 3a (Lecture 3; O&L 6.3)
  Description: 2 independent samples / CRD with 2 treatments, 1 quantitative response
  Question: systematic difference in y between the 2 sub-populations / treatments? Shift alternative.
  Inference: T
  Test: Wilcoxon rank sum test (Mann-Whitney test)
3 Prior knowledge
This course builds upon and assumes knowledge from an introductory course on statistics. At the start of
this course, students are expected to have this knowledge. Some of the relevant topics will be refreshed
in the first week of the course, but at a fairly quick pace, while introducing some new elements at the
same time.
The first five chapters of O&L largely cover most of the prior knowledge required for this course. So,
going through these chapters may refresh your memory. If you have little prior knowledge of statistics,
and a refresher from these chapters did not help, you are well advised to postpone this course and
first follow one of the basic courses in statistics (see p. ii).
Check-yourself box 1. Topics from Chapters 1, 2, 3 and 4 (important terms are in italics)
Can I
Define what a random variable (RV) is and describe what a probability distribution is?
Distinguish between qualitative and quantitative variables?
Distinguish between discrete and continuous (quantitative) random variables?
Explain what a sample statistic is? Explain the (theoretical) definition and interpretation of a mean, variance, standard
deviation, median of a random variable?
Determine the right-tail and left-tail probability for a given value of a Standard Normal RV (z-value), using table 1?
Determine the z-value for a given right-tail or left-tail probability from the Standard Normal distribution, using table 1 or 2?
Carry out probability calculations for a random variable with a known distribution, with the help of PQRS?
Use the tables 1 and 2 in O&L to find tail probabilities (table 1) or quantiles (tables 1 and 2) for the standard Normal
distribution and t-distributions?
State how to check Normality using data on a random variable?
Interpret a QQ-plot?
Calculate the sample mean, sample variance and sample standard deviation for data from a given sample?
Give the definition of an unbiased estimator?
Give the definition of the standard error of an estimator?
Explain what a (1-α)-confidence interval is for a population parameter?
Explain the meaning of the confidence coefficient (or confidence level) of a confidence interval?
Explain the difference between the terms estimator and estimate?
Explain what a test is, and enumerate the steps of a test either when using a Rejection Region or P-value (see below, p. 8)?
Explain the meaning of, and use in an analysis, the following terms:
null hypothesis, alternative hypothesis (research hypothesis), test statistic, null-distribution, size of a test, rejection region (or
critical region), one-sided / two-sided alternative hypothesis, P-value (left-, right- or two-sided)?
Explain the terms used in experimental design: response, factors, factor levels, treatments, experimental units, measurement
units, blocks, covariate, completely randomized design (CRD) and randomized complete block design (RCBD). These
terms will be explained in week 1 of the course, so not all terms are prior knowledge.
Indicate for each of the terms to which they correspond from a description of the lay-out of an experiment.
Terminology
One of the hardest parts of Statistics appears to be terminology. As an illustration, when a group was asked
“What is Statistics?”, one answer came, full of frustration: “It is a language”.
Much of the statistical terminology that we use in this course, e.g. in doing a statistical test, see the box
on the previous page, is supposedly known when the student comes to this course. This terminology is
used in the first lecture. Other sources for statistical terms are the Check yourself boxes.
All in all, the number of statistical terms that we use in this course is not very large. So it is really
worthwhile to invest in learning them. Go through the terminology, make a list of terms with definitions
and examples. Language (terminology) directs our thinking and mastering terminology helps our
thinking. The terms were invented to make life easier, not more difficult. This means that if you
understand the terms, it becomes easier to understand the problems to which Statistics provides answers.
Inference
Inference is drawing conclusions about a population from a limited set of observations, e.g. a random
sample. Note that this can either be a physical population (observational research) or a hypothetical
population (in case of experimental research). In inference we distinguish:
• point estimation;
• making confidence intervals; in Check-yourself-box 2 it is indicated how limits of confidence
intervals are calculated in this course
• hypothesis testing, i.e. null and alternative hypothesis, test statistic, size or significance level (α),
critical region or rejection region, P-value. The general testing procedure is given below, and
steps 2 and 3 in the test are made more specific in case we use a t-test.
Point estimation
For a given population parameter (e.g. mean maize yield in a certain region, or the difference in mean
blood pressure in a population of patients between treated and non-treated patients), how should we
estimate it, based on sample data or experimental data? By specifying a method (e.g. “we will use the
sample mean”) we define the estimator. In the two examples the intuitive estimators are: mean yield
from the sample plots, and the difference in observed mean blood pressure between the two groups. An
estimator is called unbiased if it would, on average, give the correct value, if one would repeat the
experiment “one million” times. In this course, all estimators are unbiased, except the one for a standard
deviation.
The outcome of the estimator from the experiment or sample is the estimate. Finally, the standard error
associated with this estimate is an indication of how uncertain the estimate is (how close or how far off it
may be from the true parameter). It is the standard deviation of the estimator.
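The idea of repeating the experiment "one million" times can be imitated by simulation. The sketch below is only an illustration, not part of the course material: the population values (mean 50, standard deviation 10), the sample size of 5 and the number of repetitions are hypothetical choices. It draws many Normal samples and averages the sample mean, the sample variance and the sample standard deviation over the repetitions.

```python
import random
import statistics

random.seed(1)            # fixed seed so that the run is reproducible
MU, SIGMA = 50.0, 10.0    # hypothetical population mean and standard deviation
N, REPS = 5, 20000        # small sample size, many repeated "experiments"

means, variances, sds = [], [], []
for _ in range(REPS):
    sample = [random.gauss(MU, SIGMA) for _ in range(N)]
    means.append(statistics.mean(sample))          # estimate of the mean
    variances.append(statistics.variance(sample))  # sample variance (divisor n - 1)
    sds.append(statistics.stdev(sample))           # sample standard deviation

avg_mean = statistics.mean(means)      # should be close to MU (unbiased)
avg_var = statistics.mean(variances)   # should be close to SIGMA**2 (unbiased)
avg_sd = statistics.mean(sds)          # tends to fall below SIGMA (biased)
se_mean = statistics.stdev(means)      # empirical standard error of the sample mean

print(f"average of sample means:     {avg_mean:.2f} (true mean {MU})")
print(f"average of sample variances: {avg_var:.1f} (true variance {SIGMA**2})")
print(f"average of sample st. devs:  {avg_sd:.2f} (true st. dev. {SIGMA})")
print(f"st. dev. of sample means:    {se_mean:.2f} (theory: {SIGMA / N**0.5:.2f})")
```

The last line compares the spread of the simulated sample means with σ/√n, which is the standard error of the sample mean: the standard deviation of the estimator.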
Hypothesis testing
In Section 5.4, O&L present a statistical test as a procedure composed of five parts. In this course we will
use a procedure composed of eight steps, as shown below. This is an important topic which will come
back in nearly all lectures and practicals. Make sure to know and understand all the terms used, and
practice the correct notation in the various steps.
When asked to do a complete test, you are supposed to go through the steps listed in the table below.
4 Possible t-tests
For a given research situation, four t-tests can be distinguished. Which one we use depends on two
things.
1. Is the alternative hypothesis 1-sided or 2-sided? The choice depends on the research question, which
should be known before the data are collected.
2. Which method do we choose? Do we use RR (Rejection Region), or PV (P-value)?
When carried out properly, both methods lead to the same conclusion. So, usually we choose the method
that is easiest to carry out. We prefer to use the P-value method, if computer output is available. We then
reject H0 if PV ≤ α. Note that the relevant type of P-value (right-, left- or two-tailed) is determined in
steps 4 and 5 of the test. Using the computer output we read or derive the value of the relevant P-value.
NB.
In case of a 2-sided t-test we could also use the confidence interval for the parameter of interest, if
available. We reject H0 if the H0-value of the parameter of interest is not in the Confidence Interval.
Body height home-work exercise. This exercise is meant to help you get used to the possible t-tests.
Suppose we investigate the mean μ of the body height (y) of Wageningen male students in 2015.
We assume that in 1985, mean body height was (exactly) 180 cm. We consider 3 different research
questions (see below: questions 1/2, 3/4, and 5/6/7), but note that in reality there is usually only one. To
answer the research question, we take a random sample of 25 male Wageningen students.
In all tests below, use α=0.05. Each time, go through the full procedure. The sample results are: ȳ = 184,
sy = 9. Using these, check your manual calculations against the SPSS output below, or vice versa.
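As a check on the manual calculations, the arithmetic for the body height data can be sketched as follows. This is an illustration under stated assumptions: the two t-quantiles for 24 degrees of freedom are typed in from a standard t-table (they are not computed by the script), and the script only shows the right-sided and two-sided rejection-region variants. Which alternative belongs to which research question has to be taken from the exercise itself.

```python
from math import sqrt

n, ybar, s = 25, 184.0, 9.0    # sample size, sample mean and sample st. dev. from the exercise
mu0, alpha = 180.0, 0.05       # H0-value (mean body height in 1985) and significance level

se = s / sqrt(n)               # standard error of the sample mean
t_stat = (ybar - mu0) / se     # test statistic; t-distributed with n - 1 df under H0
df = n - 1

# Quantiles of the t-distribution with 24 df, copied by hand from a standard t-table.
T_UPPER_05 = 1.711             # right-tail probability 0.05  (one-sided tests)
T_UPPER_025 = 2.064            # right-tail probability 0.025 (two-sided test and 95% CI)

reject_right_sided = t_stat >= T_UPPER_05       # rejection region for Ha: mu > 180
reject_two_sided = abs(t_stat) >= T_UPPER_025   # rejection region for Ha: mu != 180

# Two-sided 95% confidence interval: estimate +/- quantile * standard error
ci_low = ybar - T_UPPER_025 * se
ci_high = ybar + T_UPPER_025 * se

print(f"se = {se:.2f}, t = {t_stat:.3f}, df = {df}")
print(f"reject H0 (right-sided): {reject_right_sided}")
print(f"reject H0 (two-sided):   {reject_two_sided}")
print(f"95% CI for mu: ({ci_low:.2f}, {ci_high:.2f})")
```

The confidence interval gives the same verdict as the two-sided test: H0 is rejected exactly when the H0-value 180 lies outside the interval.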
If you have never worked with SPSS before, it is best to go through the SPSS short guide, page 1. You can find the
document on Blackboard under the Practical tab. Go through p. 1 while sitting at a computer on which you run
SPSS. That should set you up for working with SPSS during the practicals. In the practicals it is often indicated
how to do a certain test, or make a graph, etc., when you do it for the first time. In subsequent cases, you are then
supposed to remember this or to be able to find that information again.
You can choose to use R instead of SPSS in one exercise per week. We will not teach how to write computer code
in R, but for six exercises we made a small program with commands that you can ‘run’, and that produce the
necessary output. Looking at the code, you may get some idea of how R works. A few generalities about R
programs will be discussed. You can download R and RStudio for free. You are not required to know anything at
all about R at the start of the course.
Can I
Mention the two general t-procedures?
Mention the four elements that define a t-procedure?
Recognize situations 1, 2, and 3 (one-sample, paired observations and two independent samples)?
Give a research example for situations 1, 2, or 3?
Mention the assumptions upon which a t-procedure is based, for situations 1, 2, and 3?
Specify the four defining elements of the t-procedures for situations 1, 2, and 3?
Give the general formula for the bounds of a (1 - α) confidence interval for a parameter of interest?
Apply this formula to a specific research situation when situation 1, 2, or 3 applies?
Find which quantile of t-distribution to use (know what the number of degrees of freedom is)?
Find this quantile in a table of the t-distribution or with a Graphics calculator, and with the use of PQRS?
Calculate the bounds of a (1 - α) confidence interval for a model parameter if relevant data are available?
Carry out a t-test, given the four defining elements, and given H0 and Ha?
Give the degrees of freedom of the t-distribution to be used in such a t-test?
Decide when to use a one-sided P-value or R.R. and when to use a two-sided P-value or R.R.?
Determine a one-sided R.R. and determine a two-sided R.R.?
Derive a one-sided P-value from the two-sided P-value that is (by default) provided by SPSS?
Some answers:
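The general form of the test-statistic of a t-test on a parameter is:
t = (estimate − H0-value of the parameter) / (standard error of the estimate)
The general form for the bounds of a two-sided (1 − α) confidence interval for a parameter is:
estimate ± t(α/2; df) × (standard error of the estimate),
where t(α/2; df) is the quantile of the t-distribution with df degrees of freedom and right-tail probability α/2.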
With this basis of formal knowledge, practicing old exams should help you deliver at the exam what is required.
Lecture 1. Experimental Design terms; t-procedures (CI estimation and testing)
This first lecture reviews two central aspects of statistical inference: 1) design and sampling with their
terminology and 2) t-procedures, applied to situation 1 (interest in the mean of a population or the
expected value of an experimentally observed response variable).
Students are advised to spend ample time going through part 3 of the introduction (Prior Knowledge).
t-procedures
The second part of the review contains the t-procedures: (i) determining the limits of a confidence
interval for a population characteristic or a parameter of the statistical model, (ii) t-test for a hypothesis
concerning a population characteristic or a parameter of the statistical model.
Students are supposed to be familiar with t-procedures. It is important to know both the applications in
the various situations (for week 1: situations 1, 2, and 3), and the general principles of these t-procedures.
All aspects are listed in the Prior Knowledge section, in particular Check-yourself box 2.
The two-sample t-test in case the two populations have unequal variances
A modified two-sample t-test compares two population means (two treatments), without assuming that
the variances of the two associated Normal distributions are the same. We will not discuss this test in
detail, but simply use the SPSS computer output (see e.g. output for O&L exercise 6.60 below) in the
following way: if Levene’s test indicates (P-value <0.05) that the variances in the two populations are
different, then we use the bottom line (equal variances not assumed) in the SPSS output for the two-
independent-samples t-test.
If the P-value of Levene’s test is > 0.05, then we use the top line (equal variances assumed).
Theory to study
O&L Sections 5.2, 5.4 up to p238, 5.6, 5.7 up to p256, 6.1, 6.2, and 6.4. For the moment, we skip the
parts that relate to the type II error, power and required sample size (to be discussed in Lecture 2).
[SPSS output for O&L Exercise 6.60]
• O&L, Exercise 6.61, p352. Flare illumination. Use SPSS output for Exercise 6.60.
Lecture 2. Sample size calculations. Wilcoxon tests.
Example: Suppose that we are interested in the mean difference in response (µ1-µ2), in a situation where
a two-sample t-test applies and that the hypotheses are H0: µ1-µ2=0, Ha: µ1-µ2>0. For instance, Ha states
that mean piglet growth is higher for a new diet (1). Suppose that the extra cost involved in changing
from the standard diet (2) to the new diet is not economically worthwhile for a difference µ1 - µ2 smaller
than δ. Then, we suppose that the true difference is δ, and calculate how many observations we need so
that we will reject H0 with a probability that is at least equal to a pre-specified value π (power, =1-β). If
µ1 - µ2 is larger than δ, the probability that in an experiment H0 will be rejected will exceed π. β is the
type II error probability: the probability not to reject a null hypothesis that is not true.
For a planned experiment, assuming σ known as well as µ1 - µ2 = δ, and given α and n, we can go the other
way around, and calculate the power of the test. If it is low, e.g. 0.25, we may decide that the
experiment will probably be a waste of money, because even if the real difference δ is relevant, our
experiment would only reject H0 (and so show that Ha is true) with 25% probability. In practice the choice often is: do not
do the experiment, or do a larger experiment.
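Both directions of this calculation can be sketched with the Normal approximation that goes with "σ known". The sketch below is an illustration, not the course's prescribed method: the values δ = 5, σ = 10, α = 0.05 and target power 0.80 are hypothetical, equal group sizes are assumed, and the function names are made up for this example. With an estimated σ and a t-test the true power is somewhat lower.

```python
from math import ceil, sqrt
from statistics import NormalDist

Z = NormalDist()  # standard Normal distribution


def power_one_sided(delta, sigma, n, alpha):
    """Approximate power of a right-sided two-sample test with n observations
    per group, when the true difference mu1 - mu2 equals delta (sigma known)."""
    z_alpha = Z.inv_cdf(1 - alpha)       # critical value of the test
    se_diff = sigma * sqrt(2 / n)        # standard error of ybar1 - ybar2
    return Z.cdf(delta / se_diff - z_alpha)


def n_per_group(delta, sigma, alpha, power):
    """Smallest n per group for which the approximate power is at least `power`."""
    z_alpha = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(power)            # power = 1 - beta
    return ceil(2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2)


# Hypothetical planning values, for illustration only
delta, sigma, alpha, target = 5.0, 10.0, 0.05, 0.80

n_required = n_per_group(delta, sigma, alpha, target)
print(f"required n per group: {n_required}")
print(f"power at that n:      {power_one_sided(delta, sigma, n_required, alpha):.3f}")
print(f"power with n = 5:     {power_one_sided(delta, sigma, 5, alpha):.3f}")
```

The last line shows the other direction: for a small planned experiment the power can be so low that the experiment is unlikely to reject H0 even when the relevant difference δ is real.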
In this course we will not follow the approach of O&L for the Wilcoxon tests.
In the Wilcoxon rank sum test (situation 3a) we choose the rank sum for one of the groups as test statistic
(no Normal approximation), and we use PQRS or SPSS-output to draw the conclusion. The student
should know its expected value, in case H0 is true in order to judge if the test statistic outcome is higher
or lower than expected under H0.
Similarly, for the Wilcoxon signed rank test we use either T+ or T- as test statistic, not the minimum of
the two (as on p. 320 of O&L), or the Normal approximation. We will again use SPSS output or the
relevant PQRS picture of the distribution. The student should know the expected value of T+ under H0.
We do not consider confidence intervals.
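The expected values mentioned above, and the exact null distribution that PQRS or SPSS displays, can be reproduced with a short script. This is a sketch under assumptions: there are no ties, H0 makes every set of ranks for group 1 equally likely, and the function names are invented for this illustration. The observed rank sums 64 and 129 (with n1 = n2 = 10) are the ones that appear in the SPSS outputs of the exercises for this lecture.

```python
from math import comb


def expected_rank_sum(n1, n2):
    """E(T) under H0 for the rank sum T of the group with n1 observations."""
    return n1 * (n1 + n2 + 1) / 2


def expected_signed_rank(n):
    """E(T+) under H0 for the Wilcoxon signed rank statistic with n non-zero differences."""
    return n * (n + 1) / 4


def rank_sum_counts(n1, n_total):
    """counts[s] = number of ways to pick n1 of the ranks 1..n_total with sum s."""
    max_sum = sum(range(n_total - n1 + 1, n_total + 1))
    counts = [[0] * (max_sum + 1) for _ in range(n1 + 1)]
    counts[0][0] = 1
    for rank in range(1, n_total + 1):
        # k runs downwards so that each rank is used at most once per subset
        for k in range(min(rank, n1), 0, -1):
            for s in range(rank, max_sum + 1):
                counts[k][s] += counts[k - 1][s - rank]
    return counts[n1]


n1, n2 = 10, 10
counts = rank_sum_counts(n1, n1 + n2)
total = comb(n1 + n2, n1)            # number of equally likely rank assignments under H0


def lower_tail(t):
    return sum(counts[: t + 1]) / total   # P(T <= t) under H0


def upper_tail(t):
    return sum(counts[t:]) / total        # P(T >= t) under H0


print(f"E(T) under H0: {expected_rank_sum(n1, n2)}")
print(f"P(T <= 64)  = {lower_tail(64):.4f}")
print(f"P(T >= 129) = {upper_tail(129):.4f}")
print(f"E(T+) for n = 12 pairs: {expected_signed_rank(12)}")
```

Comparing the observed rank sum with its expected value shows at once in which tail the outcome lies. The tail probability is then the exact one-sided P-value.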
[Figure: Normal Q-Q plot; axis label: Expected Normal Value]
• O&L, Example 6.5, p307 reaction time vs alcohol / placebo
Mann-Whitney Test (SPSS output, response: Flare)

Ranks
Mixture   N    Mean Rank   Sum of Ranks
1.00      10   6.40        64.00
2.00      10   14.60       146.00
Total     20

Test statistics (b)
Mann-Whitney U                    9.000
Wilcoxon W                        64.000
Z                                 -3.102
Asymp. Sig. (2-tailed)            .002
Exact Sig. [2*(1-tailed Sig.)]    .001 (a)
Exact Sig. (2-tailed)             .001
Exact Sig. (1-tailed)             .000
Point Probability                 .000
a. Not corrected for ties.
b. Grouping Variable: Mixture
• Exercise 6.29c academic vs non-academic twins. Extra question:
Test H0: the difference in score between the two members of a twin pair has a symmetric distribution
around zero. Use SPSS output.
• Extra exercise and SPSS output for small and large plant species on Dutch ‘kwelders’
“Kwelders” are pieces of land outside the sea dikes that were formed through sedimentation of clay
from the sea. They abound on the Wadden islands (north of The Netherlands). It appears that on
older kwelders, protected from the sea by natural sand dunes, small plant species tend to disappear
through competition with larger species. A possible intervention, that may increase plant diversity
again, is the introduction of grazing cows (J.H. van Wijnen, PhD thesis, RuG, 1999).
To investigate the effect of grazing, an experiment was carried out. From a list of old kwelders,
dominated by large plant species, ten were randomly chosen and cows grazed on these kwelders for
four years. Ten other such kwelders, also randomly selected, were not grazed. At the end of four
years an index of biodiversity, sensitive to small plants and small plant species (response y, on a 0 to
2000 scale) was measured and analyzed with SPSS.
Group Statistics (response y)
Treatment    N    Mean      Std. Deviation   Std. Error Mean
grazed       10   750.200   117.0772         37.0231
not grazed   10   655.500   67.4706          21.3361

Independent Samples Test (response y; F and its Sig. are Levene's test)
                              F       Sig.   t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference
Equal variances assumed       3.418   .081   2.216   18       .040              94.7000           42.7310
Equal variances not assumed                  2.216   14.384   .043              94.7000           42.7310

Ranks (response y)
Treatment    N    Mean Rank   Sum of Ranks
grazed       10   12.90       129.00
not grazed   10   8.10        81.00
Total        20

Test statistics (b) (response y)
Mann-Whitney U                    26.000
Wilcoxon W                        81.000
Z                                 -1.814
Asymp. Sig. (2-tailed)            .070
Exact Sig. [2*(1-tailed Sig.)]    .075 (a)
Exact Sig. (2-tailed)             .075
Exact Sig. (1-tailed)             .038
Point Probability                 .006
a. Not corrected for ties.
b. Grouping Variable: Treatment

a. Assuming Normally distributed observations, test if the mean plant diversity for grazed “kwelders” is higher
than for non-grazed “kwelders”.
b. Test if mean plant diversity for grazed “kwelders” is systematically higher than for non-grazed “kwelders”,
with a minimum set of model assumptions.
c. Compare the two P-values and conclusions. Is the result surprising?
Lecture 3 Inference about one population proportion
Inference about the difference between two proportions or probabilities
We will discuss the analysis of a binary response, i.e. a response that can only take two possible values:
often denoted by 1 and 0, or “true” and “false” or “success” and “failure”. For instance, we may
randomly select an individual from a population and establish whether that individual is diseased (x = 1)
or healthy (x = 0). The expected value of the response x (or long term mean) is the probability, say π,
that we will draw a diseased individual if we randomly draw one from the population. This probability is
the proportion of people in the population that are diseased. In formula: E(x) = μx = π. The variance of x
(square of the standard deviation) is Var(x) = σx² = π(1-π).
When we take a random sample of n individuals (the sampling units) the total number of successes y will
follow a Binomial distribution with parameters n (the number of observed units) and π (the individual
success probability). This summary statistic y is used in the binomial test for one proportion. The
theoretical mean and variance of variable y are:
E(y) = μy = nπ,
Var(y) = σy² = nπ(1-π)
Note that there is no separate parameter for the variance: if we know π, then both the mean and the
variance of x are known. The sample mean response x̄ = (x1 + … + xn)/n is the sample proportion of
successes y/n and can be used for inference about π.
The binomial distribution is discussed in Ch. 4 of O&L. The binomial test is discussed in class. Here we
give an example.
Example of a binomial test: test if the fraction of cows with walking problems is higher than 0.3, using a
random sample with size n=20, assuming that 9 cows in the sample have walking difficulty. The steps of
the test are as follows.
1. H0: π = 0.3, Ha: π > 0.3.
2. TS: y = number of cows that walk with difficulty in the sample.
3. If H0 is true, y ~ Binom(20, 0.3).
4. Under Ha y tends to larger values, so we use the right-tailed P-value (RPV).
5. Reject H0 if RPV ≤ 0.05.
6. Outcome TS: y = 9.
7. RPV = PH0(y ≥ 9) = PH0(y = 9) + PH0(y ≥ 10) = 0.0654 + 0.0480 = 0.1134 > 0.05,
so H0 is not rejected and Ha is not proven: it is not shown that more than 30% of the cows in the population
walk with difficulty. In other words, although it seems that the fraction is larger than 0.3 (9/20 = 0.45), the
evidence is not strong enough to consider Ha proven.
Note 1: LPV = PH0(y ≤ 9) = 0.0654 + 0.8867 = 0.9521, so here LPV + RPV ≠ 1, because this distribution is
discrete, not continuous: the outcome y = 9 is counted in both tails.
Note 2: the two-tailed PV = 2 × RPV = 2 × 0.1134 = 0.2268 if we apply the simple principle for a two-tailed PV,
2 × min(LPV, RPV) (with a maximum of 1), but this definition may not be appropriate for cases like this
with a non-symmetric distribution.
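The tail probabilities in this example can be checked without tables. The following sketch, which uses only the Python standard library and is not the PQRS or SPSS route taken in this course, computes the Binomial(20, 0.3) probabilities directly. Small differences in the fourth decimal with respect to the values above come from rounding of the individual terms.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(y = k) for y ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, pi0, y_obs, alpha = 20, 0.3, 9, 0.05   # values from the cow example

mean_y = n * pi0                 # E(y) = n * pi
var_y = n * pi0 * (1 - pi0)      # Var(y) = n * pi * (1 - pi)

rpv = sum(binom_pmf(k, n, pi0) for k in range(y_obs, n + 1))   # P(y >= 9) under H0
lpv = sum(binom_pmf(k, n, pi0) for k in range(0, y_obs + 1))   # P(y <= 9) under H0
two_tailed = min(1.0, 2 * min(lpv, rpv))                       # the simple doubling rule

print(f"E(y) = {mean_y:.1f}, Var(y) = {var_y:.2f}")
print(f"RPV = {rpv:.4f}, LPV = {lpv:.4f}, LPV + RPV = {lpv + rpv:.4f}")
print(f"two-tailed PV (2 * min) = {two_tailed:.4f}")
print("reject H0" if rpv <= alpha else "do not reject H0")
```

LPV + RPV exceeds 1 by exactly P(y = 9), the probability of the observed outcome, which is included in both tails.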
Comparing two population proportions: confidence interval and Fisher’s exact test (situation 11)
Suppose that we want to compare the probabilities π1 and π2 to be diseased for individuals that are not
vaccinated and individuals that are vaccinated. We collect a random sample of size n1 of non-vaccinated
and a random sample of size n2 of vaccinated individuals, and count the number of diseased individuals
y1 and y2 in each sample. For inference about the unknown π1 and π2 we use the sample proportions
y1 / n1 and y2 / n2. O&L explain how a confidence interval for π1 – π2 can be constructed. This is also
explained in class. The z-test explained by O&L is skipped in this course.
For testing equality of two proportions we will only discuss Fisher’s exact test. In contrast with O&L, we
will not use complex calculations to obtain a P-value (see p. 512), but we will use either SPSS output or
a PQRS picture of the relevant Hypergeometric distribution. O&L do not explain that the distribution
from which we can calculate the P-value is a Hypergeometric distribution, which is the distribution used
for the so-called ‘Vase model’.
The Vase model describes the following situation: N balls are placed in a vase, K of these are white, N-K
are red. We draw n balls (without replacement) from the vase, so N-n stay in the vase. The number of
white balls in the sample, X, has a Hypergeometric distribution: X ~ Hypergeometric (N, K, n).
Likewise, for the number V of white balls that stay in the vase: V ~ Hypergeometric (N, K, N-n), etc.
Example of Fisher’s exact test. Suppose, in the above example, we want to test the null hypothesis
H0: π1 – π2 = 0 versus the alternative hypothesis Ha: π1 – π2 > 0. (It is expected that non-vaccinated
individuals are diseased more often.) Also suppose that n1 = 12 and n2 = 10, and that in both groups 7
individuals are not diseased. See the tables below for the summary of the data.
1) H0: π1 – π2 = 0, Ha: π1 – π2 > 0.
2) We choose as the test statistic (for example): X = number of vaccinated cows that are diseased.
3) Under H0, X ~ Hypergeometric (22, 10, 8).
4) Under Ha, X tends to lower values, so
5) we use LPV and reject H0 if LPV ≤ 0.05.
6) Outcome: X = 3.
7) LPV = PH0(X≤3) = 0.1563 + 0.2972 = 0.4535 > 0.05, so H0 is not rejected, Ha is not proven. It is not
shown that vaccinated cows are diseased less frequently than non-vaccinated cows.
NB: we could have used X = number of vaccinated non-diseased cows, the Hypergeometric (22, 14, 10)
distribution (see first PQRS picture), and RPV. The P-value would be exactly the same.
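As a check on the Vase model calculation, the Hypergeometric probabilities can be computed directly. This is a minimal Python sketch under the parameter order (N, K, n) used in this guide; the function name hypergeom_pmf is an assumption made here and is not part of PQRS or SPSS.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x): x white balls among n drawn without replacement
    from a vase with N balls, of which K are white."""
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0   # outcome is impossible
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# X = number of vaccinated diseased cows, X ~ Hypergeometric(22, 10, 8), outcome 3
lpv = sum(hypergeom_pmf(x, 22, 10, 8) for x in range(0, 3 + 1))

# Alternative choice from the NB: number of vaccinated non-diseased cows,
# Hypergeometric(22, 14, 10), outcome 7, right tail
rpv_alt = sum(hypergeom_pmf(x, 22, 14, 10) for x in range(7, 10 + 1))

print(f"LPV = {lpv:.4f}, RPV (alternative statistic) = {rpv_alt:.4f}")
```

Both tail probabilities agree because 3 or fewer diseased among the 10 vaccinated cows is the same event as 7 or more non-diseased among them.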
In practice
People who work a lot with probabilities, e.g. in risk analysis or horse race gambling, do not usually
consider differences between probabilities, but rather the ratio π1/π2 or the odds ratio
[π1/(1−π1)] / [π2/(1−π2)]. These are not discussed in this course.
Theory to study
Review: O&L Section 4.8 up to p. 165. O&L Section 4.13, p191-193 (skip the continuity correction).
O&L p. 499 – 502, line 7, 504, line 28 – 505, line 6 (or: 10.1, 10.2, but skip WAC interval and z-test).
Study guide and lectures: material on Binomial test.
O&L section 10.3 up to the end of p. 509, Fisher’s exact test on p511, without P-value calculation. Study
guide and lectures: material on Fisher’s exact test.
Note on PQRS:
Any of the four cell values could be used for the test. In
PQRS, however, only for the parameters (150, 75, 56) will
the Hypergeometric distribution give results. If the parameter values are too large, the following message
is given:
COMPUTER PRACTICALS
Introduction:
Each lecture is followed one day later by a computer practical, in which you learn how to do in practice the
analyses discussed in class, that is, on the computer. In the computer practicals you will use software programs
SPSS, PQRS and R. You work in pairs (teams of two) during the computer practicals.
The text of the computer practical exercises should speak for themselves. At the first practical there will be a short
introduction, but from then on, you can work through the exercises on your own.
Writing answers
Writing answers helps you to learn how to formulate, and may reveal errors in your thinking. A computer program
does not give answers, it just generates output that you ask it to give. But you have to give answers to the questions.
That is why you have to write answers to the questions in a readable way in a separate note-book or sheet (not in
the study guide). Write it so that practical teacher / assistant can easily see whether your answer is correct.
Getting started with R-studio (for those who choose to do one exercise per week in R)
1. Open R-studio. In the right hand side of the screen (in the middle) you will see the following:
Computer practical 1 t-procedures (Answers should be written down in your notebook.)
AIM: Learn how to use SPSS to carry out t-tests and to make confidence intervals for situations 1, 2, 3. The
directions for how to do things are given in the SPSS short Guide, chapters 5 and 5a.
1. Dissolved oxygen (5.43) Data are in “Oxygen.sav”
In a river the amount of dissolved oxygen is observed. It is feared that the level is too low due to dumping of
pollutants by a sewage treatment plant (plant = factory). Over a 2-month period, 8 times a small bucket of water
was taken from a river at a location 1 mile downstream for which the amount of dissolved oxygen was determined
in parts per million (ppm). The data (yi, i=1,..,8) are in the table. Note that in SPSS they are given in one
(vertical) column, not in a (horizontal) row.
bucket        1    2    3    4    5    6    7    8
Oxygen (ppm)  5.1  4.9  5.6  4.2  4.8  4.5  5.3  5.2
a This is a case of “situation 1”, see p. 4/5. This means that we measured one variable in 1 sample of 8 buckets,
so in the data set we can expect one (vertical) column of 8 measurements. Open the data file in SPSS (File >
Open > Data) and check that the data set looks as was expected. [Note that in real life we might add columns for
e.g. location, date and time of the observation]
b Obtain the sample mean (ȳ) and the sample standard deviation (sy), using Analyze > Descriptive > Explore. Read
the 95% confidence interval for µ = mean dissolved oxygen level during the 2-month period for the location.
c Get these outcomes again (ȳ, s, and the 0.95-confidence interval for µ) using Analyze > Compare Means >
One-Sample T Test... This should result in the output given below.
Write down the formula for the confidence lower limit in numbers: 4.5735 = 4.95 – …… *……..
One-Sample Statistics
         N   Mean     Std. Deviation   Std. Error Mean
Oxygen   8   4.9500   .45040           .15924

One-Sample Test (Test Value = 0)
                                                        95% Confidence Interval
                                                        of the Difference
         t        df   Sig. (2-tailed)  Mean Difference  Lower     Upper
Oxygen   31.085   7    .000             4.95000          4.5735    5.3265
d Fish can only survive if the mean dissolved oxygen level is at least 5.0 parts per million (ppm). A lawsuit
against the sewage treatment plant will be started if it is proven that the mean oxygen level is less than 5 ppm.
Test this research hypothesis: write down 1) the null hypothesis and alternative hypothesis, 2) the test
statistic, 3) the distribution of the test statistic if H0 is true, 4) the outcome of the test statistic, 5) the relevant
(left, right or two-tailed) p-value, and 6) the conclusion, also in words. Check the p-value using PQRS.
[You can obtain the correct output by filling in the value “5” in the box for ‘test-value’ in the menu.
Answer: t=-0.31 and the relevant p-value is 0.38]
e The output obtained in question c (see tables above) shows a t-value of 31.1 and a 2-tailed p-value of 0.000.
What is the (nonsense) null-hypothesis and what is the alternative hypothesis that is tested with this outcome?
Answer: We can test H0: ....... =.......... vs. Ha: .................................
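The summary statistics and t-values quoted in this exercise can be reproduced from the eight observations. This is a minimal Python sketch for checking the SPSS output by hand; the helper name t_statistic is chosen here for illustration, and the p-values still have to be read from PQRS or a t-table.

```python
from math import sqrt
from statistics import mean, stdev

oxygen = [5.1, 4.9, 5.6, 4.2, 4.8, 4.5, 5.3, 5.2]   # ppm, buckets 1..8

n = len(oxygen)
ybar = mean(oxygen)        # sample mean
s = stdev(oxygen)          # sample standard deviation (divisor n - 1)
se = s / sqrt(n)           # standard error of the mean

def t_statistic(test_value):
    """Outcome of the one-sample t statistic for H0: mu = test_value."""
    return (ybar - test_value) / se

print(f"n = {n}, mean = {ybar:.4f}, sd = {s:.5f}, se = {se:.5f}")
print(f"t for test value 0: {t_statistic(0):.3f}")   # default SPSS setting
print(f"t for test value 5: {t_statistic(5):.2f}")   # the test asked for in d
```

Under H0 both t-values are compared with the t-distribution with n - 1 = 7 degrees of freedom.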
a What are the experimental units? Which situation is applicable here (see p. 4/5)? Take a look at the data also
using View > Value labels. The data are displayed in two ways, as 20 independent observations (20 rows) and as
10 pairs. Which one is correct?
b1 Is the mean load capacity of the new alloy different from the mean load capacity of the old alloy (α = 0.05)?
Go through steps 1 through 5 of the test. Then obtain the necessary SPSS output using: Analyze > Compare
Means > … T Test... Now finish the test by giving the outcome of the test statistic, the p-value of your test and
the conclusion. (Hint 1: in the SPSS menu use the columns with 20 values for strength and for group. Hint 2:
after entering the Grouping Variable, click Define Groups, and enter the values for the 2 groups as in the data
set. Here these values are 1 and 2.)
b2 Also give the 95% confidence interval for the mean difference in strength between the two alloys.
c The beams produced from the new alloy are more expensive than the beams produced from the currently used
alloy. Thus, the new alloy will be used only if the mean load capacity is more than 5 tons greater than the mean
load capacity of the currently used alloy. Based on this information, would you recommend that the company
use the new alloy ? What is H0 and Ha? In your formulation, use the parameter of interest:
H0: ................. = ....., Ha: ................. > .............
d Look at the result of Levene’s test. What is H0 in Levene’s test? What is the conclusion of the test?
Levene’s test is a test for testing the hypothesis H0: σ1² = σ2². A low p-value (sig.) indicates that the variances
differ. Levene’s test can also be used to compare more than two variances. The test will be explained in week 4.
For the case that the two variances are different, SPSS also provides the outcome of t' (notation of the book) and
the number of degrees of freedom that is being used in the so-called Satterthwaite-approximation (Equal variances
not assumed). This output is in the 2nd line of the table.
a Open the data file in SPSS and look at the way the data are organized.
What are the experimental units in this research (the units receiving a treatment)? What types of response
variables (binary, nominal, numerical discrete, numerical continuous) are used?
b First there is interest in the mean time required to read the passage. Test if the New reading program leads to
shorter required reading time at α=0.05. Write the outcome of the Test Statistic, the relevant distribution to
compare it with, the relevant P-value and the conclusion.
c Make a QQ-plot for the reading time per group (Analyze-Descriptives- Explore, use
factor = Group, and in the Plots-menu check the box “Normality plot with test”
see picture). Could the observations come from a Normal distribution?
d Give a 95% CI for the difference in mean comprehension. Argue why a
difference in mean comprehension is shown / is not shown?
Open the file with n=50. In Explore, choose all 10 variables in the dependent list, and no factor; then under display
choose Plots. Under Plots check the Normality plots with tests box; under Boxplots, choose Dependents together,
and do not check other plots. (Continue - OK)
In the output you will see three things of interest. 1. P-values for two tests on Normality are shown in the second
table in the output. Are the conclusions from the Shapiro-Wilk test correct all 10 times? 2. A QQ plot for each
variable: check that the first five show a pattern in line with Normality, the rest does not. 3. At the bottom: a side-
by-side Boxplot for the ten variables. You can clearly see the difference, e.g. in symmetry of the distributions.
Repeat the process for n=8 and n=20. Check that for n=8 it is much more difficult to distinguish the Normal data
from the Exponential data.
Computer practical 2 t-procedures and non-parametric tests
(Please write down your answers)
AIM: Learn how to use SPSS to carry out tests for situations 1a, 2a, 3a. The directions for how to do things are
given in the SPSS short Guide, chapters 5 and 5a.
1. Change in pH after mining Data are in “Coal mining.sav”
After mining for coal, the mining company is required to restore the land to its condition prior to mining. One of
the many factors that is considered is the pH of the soil. (The pH is important for the types of plants that will
survive). The area was divided into over 1000 grids before the mining took place. Fifteen grids were randomly
selected and the soil pH was measured before mining. When the mining was completed, the land was restored and
on the same 15 grids pH was measured again.
a What are the sampling units? How many units are there? Does it correspond to the number of rows in the data
set? (General Rule: Number of rows in SPSS data = Number of independent experimental/sampling units).
Which situation applies here: situation 2 or 3?
b Produce output for the t-test for H0: "no mean difference in pH before and after" against the alternative that
there is a difference.
b1 Carry out (α = 0.05) the appropriate t-test: give test statistic, and, using SPSS output, outcome of test-statistic,
p-value and conclusion.
b2 Give the confidence interval for the difference in mean pH before and after. Explain how the CI confirms or
does not confirm the conclusion drawn under b1.
b3 Generate the difference d between after and before using
Transform-Compute. Use the 1-sample t-test to arrive at the
same conclusion as in b1
c Which graph(s) is (are) useful to find out if the t-test is valid? Make this/these graph(s). Conclusion?
d Suppose you would want to do a new investigation with the aim to test for an increase in mean pH with
α=0.05. If the true increase is 0.06, you want the power of the test to be at least 0.8. Write the sample size
formula (end of chapter 6), and calculate the required sample size.
e Test if the median of the difference in pH between before and after is zero, using a non-parametric test. Click on Analyze >
Nonparametric Tests > …...(make the appropriate choices)
After obtaining the output, double-click on it in order to obtain more details.
Write down the (definition of the) test statistic in the test, the outcome of that test-statistic and the p-value. Is
the conclusion in line with your answer in question b1?
2. Plant density after oil spill (p 292 and 326) Data are in ”oil spill.sav”
On January 7, 1992, an underground oil pipeline broke and caused the contamination of a marsh along the Chiltipin
Creek in Texas, USA. The cleanup process consisted of burning the contaminated regions in the marsh. To evaluate
the influence of the oil spill on the mean flora density (μ), researchers studied plant growth 1 year after the burning.
They measured flora density in 40 randomly selected sites in the uncontaminated (1=control) region and in 40
randomly selected sites in the contaminated (2=burned) region.
Independent Samples Test
a Open the spreadsheet. Take a quick look at the data also using View > Value labels. Is the structure of the dataset
correct for the case of 2 independent samples? (yes / no; if no, why not?)
b Get the SPSS output for the 2-samples t-test. Inspect the summary statistics for the response variable in both
groups in the table Group Statistics. Compare the values with those on p327.
c1 For the outcome of the test statistic and the p-value: do you use the top line of the table or the bottom line?
Why? Does it matter in this case?
c2 Test H0 : µ1 = µ2 against Ha: µ1 > µ2 at α = 0.05, using the output that you generated under b. Give the
outcome of the test-statistic, the p-value for the one-sided problem and the conclusion of the test (in words).
d Determine the 0.95 confidence interval for µ1 – µ2 . Which conclusion follows for the test with Ha: µ1-µ2≠ 0?
e We now wish to apply a nonparametric test. Ha: the plant density after oil spill is systematically lower than in
similar, but uncontaminated, areas. Click on Analyze > Nonparametric Tests > …...(make the appropriate
choices). Note that the Mann-Whitney test is the equivalent of Wilcoxon’s rank sum test.
After obtaining the output, double-click on it in order to obtain more details.
Write down the (definition of the) test statistic of the Rank sum test, the outcome of that test-statistic and the p-
value. Is the conclusion in line with your answer in question c?
Computer practical 3 Binary data
AIM 1. Learn how to use SPSS / PQRS for doing inference on one proportion (Binomial test), and two
proportions (Fisher’s exact test). 2. Practice making a confidence interval for one proportion and the
difference between two proportions by hand.
a1 What are the sampling units in this investigation? How many are there? Use Analyze > Descriptive statistics >
Descriptives for the variable pass. The mean is 0.5, and n=2. Choose Data > Weight Cases. After weighting cases
by Freq, try Descriptives again. What is n this time? What does the mean (0.6) represent in practical terms?
a2 Give estimator (formula) and estimate for π.
b1 (binomial test using PQRS) We want to test H0: π=0.7 vs. Ha: π<0.7, where y is the number of ‘successes’ in the sample.
Write down steps 2, 3, 4 and 5 of the test. Give outcome of the test statistic and use PQRS to find the exact p-
value. See PQRS graph below.
Check that you get the same result in PQRS with p=0.3, and outcome 60 (the number of cars that fail the test.)
b2 (binomial test SPSS – approximate p-value) Produce output for the binomial test in SPSS: Analyze >
Nonparametric Tests > One sample. Under Settings choose Binomial and under Options Test Proportion 0.70 and
Success values 1. Double-click the output-table to get more details. This also helps to check the output. See
output at the bottom below.
b3 (binomial test SPSS - exact p-value) This time use: Analyze > Nonparametric Tests > Legacy Dialogs > ……
Choose 0.7 as test proportion and click on Exact, etc. In the output check the Observed vs. Test proportion.
This should be 0.6 vs 0.7. [If it is 0.4 vs. 0.7, you should change 0.7 into 0.3 in the menu choices.]
Double-click the table, read the exact 1-tailed p-value, and check that it is 0.0057. See Binomial test output below.
SPSS gives an asymptotic p-value (0.005) based on the z-test that we do not discuss in the course.
It can also give (b3) the exact one-tailed p-value. The exact p-value based on the Binomial test is 0.0057.
c Use the formula in the book for the approximate 0.95 confidence interval for π and calculate the limits using
e.g. Excel. An exact 0.95 confidence interval is given above. This so-called Clopper-Pearson confidence
interval is not discussed in the course.
2. Comparison of two probabilities. Data file: “Instruction and exam result.sav”
An educational researcher wants to compare teaching English using a computer software
program to the traditional classroom system. She thinks the computer aided method will be
better than the traditional one, that is, this method will result in a higher fraction of students
passing a test.
The researcher randomly assigns 60 students from a class of 100 to instruction using the
computer. The remaining 40 students are instructed using the traditional method. At the end of a 6-week period, all
100 students are given an exam with the results (1=pass, 0=fail) in the table.
b3 Use SPSS. Descriptive Stats > Cross tables; choose Statistics and check the Chi-Square box. Then Read the 1-
tailed p-value for Fisher’s exact test. Check that this p-value is in line with what you found using PQRS.
c1 Calculate the 0.95 confidence interval (z-procedure) for the difference in fraction of students passing the exam
between the two methods of instruction using calculator or Excel.
c2 Use the two-independent-samples t-test to compare the Instruction methods. Read the 0.95 confidence interval
for the difference in fraction of students passing the exam between the two methods of instruction. Check that
this interval is similar to the one you calculated in c1.
3. The Power of a test file “Simulation Power 2 samples.xlsx”
A new alloy is proposed for the manufacture of steel beams (Practical 1.3). The following procedure is carried out.
For both types of alloy, ten beams are randomly selected from the produced beams. Their strengths are measured,
the data are entered in SPSS and finally a t-test is done to conclude if the new alloy gives stronger beams.
Now we can ask how good this procedure is. For example, if the real mean difference between the two alloys is 2
(e.g. 26 vs. 24), will the test give a significant result? This question can be answered if we also know what the
spread (the standard deviation) is between the beams for each alloy.
Another, more general question is: which number of beams should we choose?
Open the file “Simulation Power 2 samples.xlsx”. In this file it is simulated that 200 experiments are carried out. In
this way we get an insight in what would happen if we repeat an experiment many times. To be able to simulate the
data we need to know what the means in the populations are and what the standard deviation is for both
populations. We assume here that the standard deviations are equal. We also have to specify the sample size. Then
we simulate what happens if we repeat the procedure 200 times. We can then see how many times out of 200,
H0:µ1- µ2 =0 is rejected vs Ha: μ1-μ2>0. The relative frequency of rejecting H0 is a simulated value of the power of
the test. You can repeat the calculations by pressing F9 (Calculate now).
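The idea behind the spreadsheet can also be sketched in a few lines of code. The following Python sketch is not the Excel file itself: the function name simulate_power and its parameters are assumptions made here, and the critical value 1.734 (upper 5% point of the t-distribution with 18 degrees of freedom, read from a t-table) is hard-coded, so it is only valid for n = 10 beams per alloy.

```python
import random
from math import sqrt
from statistics import mean, variance

T_CRIT_DF18 = 1.734   # assumption: table value t(0.05; 18), valid for n = 10 per group only

def simulate_power(mean_diff, sigma, n=10, n_experiments=200,
                   t_crit=T_CRIT_DF18, seed=None):
    """Fraction of simulated experiments in which H0: mu1 - mu2 = 0 is
    rejected in favour of Ha: mu1 - mu2 > 0 (pooled two-sample t-test)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_experiments):
        # Only the difference between the population means matters,
        # so the second population is centred at 0.
        new = [rng.gauss(mean_diff, sigma) for _ in range(n)]
        old = [rng.gauss(0.0, sigma) for _ in range(n)]
        pooled_var = (variance(new) + variance(old)) / 2   # equal sample sizes
        se = sqrt(pooled_var * 2 / n)                      # se of difference in means
        t = (mean(new) - mean(old)) / se
        if t > t_crit:
            rejections += 1
    return rejections / n_experiments

# Each call plays the role of pressing F9 once: the result varies from run to run.
print(simulate_power(mean_diff=2, sigma=2))
```

With only 200 simulated experiments the simulated power is itself uncertain; a larger n_experiments gives a more stable value. For another sample size the critical value has to be replaced by the table value for 2n - 2 degrees of freedom.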
a1 Choose n = 10. Find out what the power of the t-test is if the real mean difference between the two alloys is
equal to 1 and if the standard deviation is 2.
Finish (in your notebook or on your note-sheet) the following statement:
Assuming σ = 2, α=0.05, if the real mean difference in strength between the two alloys is 1, in an experiment
with 10 beams for each treatment we will reject H0 of equality of the two mean strengths with probability .....
In other words, the power of this test-procedure is……………..
So, assuming σ=2, if a real mean difference of 1 would be relevant, is this a good experiment?
c In a and b it is demonstrated that the power of a test depends on the true mean difference and on the spread in
the population. These values cannot be varied by the researcher. The only thing that the researcher can change
is the sample size, n.
To decide on n, one has to have an idea of the standard deviation σ. This information has to come from
previous research (own research or published research). Let us assume that σ=2.
For µ1-µ2 we usually choose for the minimum relevant difference. So we ask ourselves: what is the smallest
µ1-µ2 that would be relevant in practice. For example, the difference in mean yield of 1 kg/ha (8000 vs. 8001
kg/ha) is not relevant, a difference of 1000 would be very relevant. Somewhere in between these two values
one needs to pick the minimum difference that is still relevant.
In our example, let us choose 1.5 to be the minimum relevant difference. If we require a power of 0.8 for the
one-sided test, what would be the required sample size? Try various values for n, and choose the sample size
that fulfils your precision requirements. Note that the simulated power can vary somewhat. To get a somewhat
more precise value of the power, you can press F9 a few times.
n      ..  ..  ..  ..  ..  ..
Power
d Compare the outcome of c with the outcome of the formula (p. 332) for the sample size (for a 1-sided Ha).
4. Marker for greening in potato Data are in: “Marker for greening.sav”
In an experiment 151 potato cultivars are used to find associations of all kinds of traits
with genetic markers. One such genetic marker is binary: it has levels A and B. One of
the investigated traits is greening: the phenomenon that the potato tuber turns green if it is
exposed to sunlight. Out of 151 cultivars 96 have marker value A (we call them A-cultivars),
out of which 63 have the greening trait, and 33 don’t. Of the 55 B-cultivars,
26 have the greening trait, 29 don’t have the greening trait.
a Open the data file. What are the units in this study? Are they randomly selected? How many units do we
have? Use Weight cases to feed this information to SPSS. [Note: after using the variable Freq in this way, one
should use it no more in any analysis. ]
b Give a 95% confidence interval for the fraction of greening cultivars for the A-cultivars and the B-cultivars.
You can do that in one command in Explore using marker as factor. Do the two intervals overlap? Is overlap
indicative of a difference that is not significant? [Note: you can also do this with Data- Split File by Marker
followed by one-sample T-test. This gives output separate for each Marker level.]
c1 Test if the fraction of A-cultivars with the greening trait is the same as for B-cultivars. Use the exact test of
Fisher as follows: Descriptive Stats > Cross tables; choose Statistics and check the Chi-Square box.
c2 Use PQRS and the hypergeometric (N=151, N1=96, n=62)-distribution to find the p-value of Fisher’s exact
test, when the outcome is 33. Check that the p-value for outcome 29 in the Hypergeometric(151, 62, 55) is the
same. See also the pictures of this distribution in the study guide, Lecture 3.
Note: using e.g. hypergeometric (N=151, N1=62, n=96)
should give the same result but because the numbers are so
large you will for that case receive an error-message.
d Give an approximate 95% confidence interval for the difference in fractions of greening cultivars between A
and B (using the two-independent samples t-test SPSS Guide 5.3). How is the conclusion of c confirmed?
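The check asked for in c2 can be sketched outside PQRS as well. The following minimal Python sketch uses the parameter order (N, K, n) of the Vase model; the function name hypergeom_pmf is chosen here for illustration. It compares a left-tail probability for outcome 33 with a right-tail probability for outcome 29; which tail (or which two-tailed rule) is reported as the p-value depends on the alternative hypothesis of the test.

```python
from math import comb

def hypergeom_pmf(x, N, K, n):
    """P(X = x): x white balls among n drawn without replacement
    from a vase with N balls, of which K are white."""
    if x < max(0, n - (N - K)) or x > min(n, K):
        return 0.0   # outcome is impossible
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# Number of A-cultivars among the 62 non-greening cultivars:
# Hypergeometric(151, 96, 62), outcome 33, left tail
left_tail_a = sum(hypergeom_pmf(x, 151, 96, 62) for x in range(0, 33 + 1))

# Number of non-greening cultivars among the 55 B-cultivars:
# Hypergeometric(151, 62, 55), outcome 29, right tail
right_tail_b = sum(hypergeom_pmf(x, 151, 62, 55) for x in range(29, 55 + 1))

print(f"P(X <= 33) = {left_tail_a:.4f}")
print(f"P(Y >= 29) = {right_tail_b:.4f}")
```

The two probabilities coincide because 33 or fewer A-cultivars among the 62 non-greening cultivars is the same event as 29 or more non-greening cultivars among the 55 B-cultivars. Python integers have no size limit, so the large parameter values cause no problem here.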
The aim of the PPPs is to help you digest the material offered. The form is such that it closely resembles the
set-up of the exam. The PPP meetings on Friday are meant for working on these exercises. You can also
(partly) prepare them at home and ask questions about parts that are not clear, or you can ask
the teachers to check your answer.
PPP week 0
These are home exercises for probabilities and quantiles with tables 1 and 2. Make a sketch (with a
few numbers on the x-axis) of the relevant distribution.
In all cases, you can use PQRS or your graphics calculator to check the answers.
A. Use table 1. Calculate the following probabilities using table 1 or your calculator. X ~N(0,1)
P(X>1.5) P(0<X<1.5) P(-1<X≤1.5)
B. Suppose Y ~ N(30, 10). Use the z-transformation z=(Y-30)/10 and table 1 to calculate:
P(Y>45) P(20 < Y < 40)
D. In a t-test:
1) Under H0, t ~ t(20), the t-distribution with 20 degrees of freedom. Outcome is 1.3. RPV=0.104. Give LPV and 2-tailed PV.
2) Under H0, t ~ t(10). Outcome is -0.7, 2-tailed PV is 0.5. Give LPV and RPV.
3) Under H0, t ~ t(1945). Outcome of t is positive, 2-tailed PV is 0.09. Give LPV and RPV.
4) Under H0, t ~ t(15). LPV is 0.968. Give RPV and 2-tailed PV. What is the outcome
approximately? Use table 2 in O&L.
5) Under H0, t ~ t(15). Give the Rejection Region for the 2-sided test, α=0.05.
6) Under H0, t ~ t(15). Give the Rejection Region for the left-sided test, α=0.05.
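For exercises A and B above, a computer check is also possible without PQRS or a graphics calculator. This is a minimal Python sketch using the standard library; it assumes that the second parameter in N(30, 10) is the standard deviation, as the z-transformation z=(Y-30)/10 suggests. The variable names are chosen here for illustration.

```python
from statistics import NormalDist

# Exercise A: X ~ N(0, 1)
X = NormalDist(mu=0, sigma=1)
p_a1 = 1 - X.cdf(1.5)             # P(X > 1.5)
p_a2 = X.cdf(1.5) - X.cdf(0)      # P(0 < X < 1.5)
p_a3 = X.cdf(1.5) - X.cdf(-1)     # P(-1 < X <= 1.5)

# Exercise B: Y ~ N(30, 10), with 10 taken as the standard deviation (assumption)
Y = NormalDist(mu=30, sigma=10)
p_b1 = 1 - Y.cdf(45)              # P(Y > 45), same as P(z > 1.5)
p_b2 = Y.cdf(40) - Y.cdf(20)      # P(20 < Y < 40), same as P(-1 < z < 1)

for label, p in [("P(X>1.5)", p_a1), ("P(0<X<1.5)", p_a2), ("P(-1<X<=1.5)", p_a3),
                 ("P(Y>45)", p_b1), ("P(20<Y<40)", p_b2)]:
    print(f"{label} = {p:.4f}")
```

For a continuous distribution it makes no difference whether the inequalities are strict or not. The standard library has no t-distribution, so exercise D still needs table 2, PQRS or a calculator.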
PPP week 1
In all tests, if α is not mentioned, use α =0.05
1. We take a random sample of 4 maize plants and measure N-content in the leaf. Observations for
y are: 12, 7, 8, 5. We assume y ~ N (μ, σ).
A. Calculate sy and ȳ.
B. Give a 0.95 Confidence Interval (CI) for µ, the population mean of y.
C. Test with the CI if µ could be equal to 10, or not. Give H0 and Ha, also give the conclusion and
the argument for the conclusion.
D. Test with the 8 steps, Rejection Region method, the research hypothesis that μ differs from 10.
E. Which sample size is needed to make a .95 CI for μ with an Error Margin of 2.5? Assume σ=3.
2. On a field 16 maize plants are randomly selected. For each plant the N-content in the stem is
determined. The research hypothesis is that the mean of all plants in the field is higher than 5.
To test this hypothesis we will use a t-test.
A. What are the four defining elements for this t-test? (The 4 elements are: Parameter of interest,
estimator, standard error of the estimator, and degrees of freedom for the relevant t-
distribution.)
From the data, we find: ȳ = 6 and sy=2.
B. Go through the 8 steps of a test using a Rejection Region.
C. Write the 8 steps of the test again, but now with the p-value method. Find P-value with PQRS.
D. Suppose you want to do another experiment to test if the mean is more than 5, so a one-sided
test. Which sample size is needed, to achieve the following precision. If the real mean is 6 (or
more), then the power of the test should be at least 90% (while α=0.05).
3. In India in a region where irrigation is often applied, 18 farms with irrigation are randomly
selected and so are 18 farms where no irrigation is applied, 36 farms in total. For a one-acre plot
the labour input is measured on each farm.
A. The question is whether, on average, irrigated farms use more labour for maize production than farms
where there is no irrigation. Use the output below to test this hypothesis. Write down all the
steps.
B. What are the four defining elements in the test mentioned in question A?
C. To make a 95% confidence interval for mean difference in labour use with a width of at most 4,
what sample size would be needed per group?
D. Which analysis would you do if we had used 18 farms, each with one irrigated and one non-irrigated
one-acre plot, so that on every farm measurements were done on both plots?
4. In the previous exercise suppose we had only 2x4=8 farms, with outcomes
non-irrig  12, 21, 17, 19
irrigated  25, 28, 20, 27
A. Use a non-parametric test at α = 0.10 to test if labour use on irrigated fields is systematically larger than on
non-irrigated fields. Go through all the steps. Use PQRS to get the p-value.
B. Now assume that a t-procedure is appropriate. Which assumptions should then be
(approximately) valid? What are the four defining elements of the t-procedures?
C. Carry out the t-test, using the Rejection Region method. Do your own calculations. Check that
the relevant standard error is 1.92.
D. Make a 0.95 CI for mean difference in labour use between irrigated and non-irrigated 1-acre
fields.
The same data are used, but now we assume that the Normality assumption is violated.
E. What is the situation (see p. 4/5 of the study guide)?
F. Calculate the outcome of the appropriate test statistic.
G. If available, use PQRS to derive the two-sided p-value.
PPP week 2 (in all tests use α=0.05)
Part I
A. We first focus on the fraction of smokers among female students. Which situation applies?
B. Give a .95 Confidence interval for the fraction smokers among female students.
C. Test (two-sided) if the fraction of smokers among female students is 0.3 using the result of B.
D. As C, but now with an exact test. Write down all the steps of the test. Use PQRS (if available) or
your graphics calculator to find the relevant p-value.
E. Check with the SPSS output above that your result is correct.
Part II
F. Test with an exact test if the fractions of smokers for male and female students are the same.
Give outcome of the chosen test-statistic. Use PQRS to find the relevant p-value, and give your
conclusion.
G. Calculate the standard error of the difference between the two fractions.
H. Which situation applies for questions F and G?
2. In the US the 3 most common cancers according to their relative frequency are: breast (25%),
lung (21%), and colon (16%) of all cancer patients. The rest forms 38% of this patient group. (We
suppose that these numbers are exact and we ignore that a cancer patient can have more than one
cancer.) In the Netherlands 200 cancer patients are randomly selected. It is tested if the relative
frequencies in the Netherlands are the same as in the US.
We also obtained data from Canada and Sweden.
Type     Breast  Lung  Colon  Other  Total
Canada   20      40    40     50     150
Sweden   40      40    40     40     160
D. Are the differences in observed relative frequencies for the 3 countries (NL, Sweden, Canada) significant?
Give the situation, the full name of the test, and carry out the test using the SPSS output given below.
E. For Canada-Breast and Sweden-Lung: calculate the expected frequencies used in D.
F. The SPSS output says: 0% have expected count less than 5. Is that good? Explain!
3. Correlation calculations
Data on (x, y) for 4 French students are given. They represent the mark for Statistics that they scored in
Wageningen, and the one they scored for an earlier course in France (where the maximum score is 20).
Unit  1   2   3   4
x     2   5   8   1
y     12  14  17  5
[Figure: empty scatter plot grid, y (0 to 20) against x (0 to 10)]
          x    y    x-x̄   y-ȳ   (x-x̄)²   (y-ȳ)²   (x-x̄)⋅(y-ȳ)
          2    12
          5    14
          8    17
          1    5
sum                 0     0     Sxx=     Syy=     Sxy=
average             --    --    --       --
A. Show by calculating it, that the sample correlation coefficient between x and y is rxy = 0.89.
B. Why is it not meaningful to compare the averages of x and y?
C. The sample correlation is high; is it also significant? Show that t=2.74, but that the Rejection
Region for α = 0.05 is |t| > 4.303.
D. Without redoing the calculations, but just looking at the data/graph: what is the Spearman rank
correlation coefficient?
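The calculation table for questions A and C can be checked with a short script. This is a minimal Python sketch that follows the columns of the table (deviations, squares and cross products); the variable names are chosen here for illustration.

```python
from math import sqrt

x = [2, 5, 8, 1]      # marks x for the 4 students
y = [12, 14, 17, 5]   # marks y for the 4 students
n = len(x)

xbar = sum(x) / n
ybar = sum(y) / n

# Sums of squares and cross products, as in the last three table columns
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

r = sxy / sqrt(sxx * syy)                  # sample correlation coefficient
t = r * sqrt(n - 2) / sqrt(1 - r ** 2)     # test statistic for H0: rho = 0, df = n - 2

print(f"Sxx = {sxx}, Syy = {syy}, Sxy = {sxy}")
print(f"r = {r:.2f}, t = {t:.2f} with {n - 2} degrees of freedom")
```

The outcome of t is compared with the rejection region |t| > 4.303 given in question C; that critical value comes from the t-table and is not computed here.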
C. Strong positive correlation (!?) Can you give an example of such data?
[Figure: scatter plot, x-axis 8 to 20, y-axis 4 to 13; wide r = 0.92, narrow r = 0.59]