Advanced Statistics Study Guide

This document provides an introduction to the Advanced Statistics course, including information about course structure, learning goals, assessment, and an overview of statistical situations covered. The course consists of lectures, computer practicals using software like SPSS, and pen-and-paper practicals. Students will learn to choose experimental designs, conduct statistical analyses, interpret results, and draw conclusions. The exam consists of open-ended and multiple choice questions. Statistical situations covered include one-sample, paired, two-sample, and ANOVA models for comparing multiple groups or factorial designs.

Uploaded by

vandal licious

WUR-Biometris Advanced Statistics

Introduction to the course Advanced Statistics (MAT-20306)


1. Practicalities and learning goals for the course
1.1 The setup of the course
This course runs for eight weeks, with six weeks of classes. Each week, there are three types of
classes: two lectures, two computer practicals (compulsory), and one pen&paper practical (PPP). Each
class lasts two hours.
Lectures consist of a presentation of the theory with examples. They will also specify for
each topic what is expected of you (learning outcomes). In this study guide it is stated for each lecture
what material will be covered, including examples to be discussed in class. Homework exercises are
suggested for you to do before or after the lectures.
In computer practicals, students do exercises on the computer, using statistical
software (SPSS, but also PQRS and R) to do statistical inference. In doing the exercises, the links
between the data collection method, the model used in the analysis, the computer commands, the
computer output and the final conclusion(s) are important.
The computer practical exercises are found in the 2nd part of this study guide. If you are new to
SPSS, download “SPSS short guide.pdf” before the first practical. It can be found on the Blackboard
site of MAT-20306, under “Practical”. Start SPSS on any WUR computer, and go through the introduction
of this SPSS guide. It is only one page: an example with data on three students. Do this example
before the first practical. It should help you to get started in SPSS.
Pen&Paper practical: a class where students will do exercises that should help to ‘digest’
the material discussed in the lectures and practiced in the computer practicals. Using computer output,
you will do tests, determine confidence intervals, check your understanding of how computer output
is calculated, discuss design issues, recognize “situations” (see p. 4/5) and the extent to which data
analysis conclusions are valid, based on the sampling or experimental design. It contains the kind of
exercises you will do at the exam, hence PPP helps to prepare for the exam.

1.2 Study material


The study material consists of
• this study guide. The study guide gives a brief introduction to each lecture. It also tells you
which parts of O&L to read, which examples and exercises will be discussed in class, and which
exercises are suggested as homework. It sometimes provides extra SPSS output for the
exercises, and some sections are followed by SPSS output and/or extra exercises (in
addition to O&L).
• material put on Blackboard (website), such as the lecture PowerPoints, answers to exercises,
old exams with answers, and the data and answers used in the computer practicals. Link:
https://edu6.wur.nl. For certain parts a username and password are needed.
• the book, “An Introduction to Statistical Methods and Data Analysis” (6th edition), by R. Lyman
Ott and Michael Longnecker. It is referred to as “O&L” from now on. O&L is the principal
source of information for this course.
• PQRS. Both for the practicals and for use at home, it is important that you know how to work with
the probability calculus program PQRS. PQRS visualizes distributions and helps you to see the
meaning of values that you can look up in tables (e.g. tables 1, 2, 7 and 8 in the book). PQRS is
easy to find online and download, and is available on most WUR computers.

1.3 Prior knowledge (important!)


This course builds upon and assumes knowledge from an introductory course on statistics. At the start
of this course, students are expected to be familiar with
• the idea of probability distributions and basic probability calculus for relevant families of
distributions: Normal, Binomial, F, chi-square and t, using PQRS, a calculator, or a table.
• the concept of random sampling
• graphs, e.g. box plot, histogram, Q-Q plot, bar chart, and tables with (relative) frequencies
• calculation of a mean and standard deviation from a sample
• the concepts of correlation and simple linear regression


• the basic ideas of statistical inference. Inference is drawing conclusions about a population
based on a limited set of observations, e.g. a random sample.

Topics in inference will be refreshed in the first week of the course, but at a fairly quick pace, while
introducing some new elements at the same time.

The first five chapters of O&L cover most of the prior knowledge required for this course. So,
going through these chapters may refresh your memory. Going through part 3 of this introduction
should also help you to be prepared.
If you have little prior knowledge of statistics, and a refresher from these chapters did not
help, you are well advised to postpone this course and first follow one of the basic courses in
statistics at this university (Basic Statistics or (in Dutch) S2).

1.4 Aim of the course and learning outcomes


The aim of the course is that after successful completion of this course, students will be able to
• choose a design and decide on the sample size for an experiment, given some typical research
question, for the (fairly simple) situations discussed in the course
• in such situations choose a statistical model and carry out an appropriate analysis,
• draw appropriate conclusions, given the research question, the choice of experiment (or sampling
scheme), the data collected, and the computer output for the appropriate analysis

This course focuses upon the ideas behind the statistical methodology and on the applications.
Mathematics is kept to a minimum. The learning outcomes of the course are presented in a file on
Blackboard (LearningGoals.doc). They are also found for the various topics in the “Check-yourself
boxes” in this guide. The general learning goals and the way they are tested are given below in the
table “Assessment strategy”.

1.5 Examination
The written exam is a mix of open questions and multiple choice questions. It lasts 3 hours.
If there is proof (from attendance lists) that you attended the computer practicals, obtaining 45 credits
out of 90 in the exam will result in a pass. The unrounded mark is 1 + Credits/10.
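
As a small illustration of the mark formula above, the arithmetic can be sketched as follows. The function name is only an illustrative choice; the formula and the 0 to 90 credit range are taken from the text.

```python
def unrounded_mark(credits: float) -> float:
    """Unrounded exam mark, using mark = 1 + credits / 10 (credits run from 0 to 90)."""
    if not 0 <= credits <= 90:
        raise ValueError("credits must be between 0 and 90")
    return 1 + credits / 10


print(unrounded_mark(45))  # the pass threshold of 45 credits gives 5.5
print(unrounded_mark(90))  # the maximum of 90 credits gives 10.0
```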

You are allowed to bring to the exam: a pocket calculator (or graphics calculator), a dictionary, the study
guide, the book of O&L, and a hand-written summary of your own making of at most two pages.
PowerPoints and telephones are not allowed.

Assessment strategy (examination)

learning outcomes \ assessed at: Computer practical / Written exam

comprehend basic ideas of statistical inference, experimental design and data collection for experimental and observational studies x
determine an appropriate statistical model and associated statistical inference procedure, given the description of the experiment, the research question and the type of data x
carry out the analysis, for a given problem, usually with the help of SPSS output x x
interpret the results, and formulate conclusions, in terms of the actual problem x x

2 Overview of the contents of the course


On the next two pages an overview is given of the “situations” covered in this course. By a “situation”
we mean a type of experiment or study that is performed with a particular goal in mind. Situations are
distinguished according to the type of response (qualitative or quantitative), the research aim, the
number of populations (observational research) or treatments (experimental research) involved, and the
type of factor(s) involved (qualitative/categorical, or quantitative/numerical).

Short description of the situations discussed in the course


The simplest situation (1) is the “one-sample situation”, where we assume that the data are (or can be
regarded as) a random sample from some (large) population. In the “two-sample situation” (3),
interest is in the difference between two population means (in case of observational research) or two
treatment means (experimental research). Data are obtained from two independent random samples,
each from a different population, or a variable (the response) is measured on “units” that undergo one
of two treatments after random allocation of these “units” to the treatments. An example is the
comparison of the mean weight gain of animals for two different diets.
The situation of paired observations (2) applies when we have one sample (of plants, animals, …) and
two measurements are taken for each sampling unit, e.g. before and after a treatment, or a
measurement on the leaf from the top of a plant, and one from a leaf near the bottom of the plant. The
interest focuses on the mean of the difference between the variables.

When we are interested in more than two population means or treatments, e.g. we compare four diets,
we will often use a class of models and associated statistical methods referred to as ANOVA (analysis
of variance - situations 4, 5, 6). When, for example, diets may work out differently for male and female animals,
we have a so-called factorial structure, with diet and gender as experimental factors. In that case it is
profitable to introduce the concepts of main effects and interactions (in two-way ANOVA).

When interested in the relationship between two numerical variables, we use correlation analysis or
simple linear regression (situation 7). If interest is in, e.g., predicting the percentage lean meat of a
pig carcass from several carcass measurements taken in the slaughter line, we use a class of models
and associated statistical methods referred to as multiple linear regression (situation 8). Both analysis
of variance and linear regression are part of a wider class of models referred to as the (general) linear
model. A large part of this course will be devoted to applications of the linear model. A special case is
the situation of a qualitative factor and a quantitative factor (situation 9).

In all these cases we will use statistical models, with specific model assumptions, usually comprising
the assumption that the data are independent observations from (a) Normal distribution(s), all with the
same variation, e.g. irrespective of the treatment applied. Model assumptions have to be checked.

When data are clearly not from a Normal distribution, other models and methods of analysis are
required. We will for instance discuss analysis of binary data (situations 10, 11): an individual is
diseased or healthy, an electronic circuit is functioning or not, … etc. This actually amounts to
inference on probabilities, e.g. how do different hygienic measures at farms affect the probability for
an animal to be diseased. Analysis of categorical data, e.g. data collected in contingency tables, also
involves inference on probabilities, and will be discussed as well (situations 12, 13, 14).

In fairly simple and straightforward situations (1a, 2a, 3a, 4a), we may perform inference based on
rank numbers rather than on the actual data, to relax the assumption of Normality, when this
assumption is in doubt. Some well-known tests based on rank numbers will be discussed.

The first table below describes situations in which the observations on the response are assumed to be
random drawings from (a) Normal distribution(s). The second table describes situations in which the
Normality assumption is not used. At first reading, at the start of this course, these tables may seem
abstract and of little practical use. But later on they may help to fix the different techniques more
firmly in your mind. The tables also serve as a table of contents for this course. Make sure you
regularly consult them for an overview of the material.
I Situations with data on (a) quantitative response variable(s), assuming Normal distribution(s)
Per situation: situation description / model, followed by, per parameter of interest or question: type of inference*; name or type of test or procedure.

Situation 1 (Lecture 1; O&L 5.3 – 5.7)
1 random sample, 1 quantitative response y
Model: yi's independent, yi ~ N(μ, σ), i = 1, .., n
• Population mean μ: E, CI, T; one-sample t-test or calculation of a CI for μ
• Population standard deviation σ: E

Situation 2 (Lecture 1; O&L 6.4, 6.6)
1 random sample, 2 quantitative responses x and y, paired data
Model: d = x - y, d ~ N(μd, σd), independent di's; this is situation 1 applied to d (instead of y)
• μd (= μx - μy): E, CI, T; paired-sample t-test, equivalent to a one-sample t-test for μd, or calculation of a CI for μd
• σd: E

Situation 3 (Lecture 2; O&L 6.2, 6.6)
2 independent random samples (1 per population), with quantitative y, or CRD** with 2 treatments
Model: y1j and y2j are all independent, y1j ~ N(μ1, σ1), j = 1, .., n1, y2j ~ N(μ2, σ2), j = 1, .., n2
• μ1 - μ2: E, CI, T; two independent samples t-test or calculation of a CI for μ1 - μ2
• σ1 and σ2, usually a common σ is assumed: E

Situation 4 (Lecture 8; O&L Ch8, 9.4, 14.2)
1 quantitative response y, 1 qualitative factor (random samples from t sub-populations, or CRD with t treatments, t > 2)
1-way ANOVA model: yij ~ N(μi, σ), i = 1, .., t; j = 1, .., ni, or
yij = μi + εij = μ + τi + εij, with εij's independent N(0, σ)
• All means equal?: T; F-test for the factor / the model
• μi: E, CI, T; t-procedure***
• μi - μj: E, CI, T; t-procedure; for all pairs: LSD / Tukey
• σ, (assumed) common standard deviation: E

Situation 5 (Lecture 10; O&L 15.1, 15.2)
Experiment using RCBD**, b blocks, t treatments (two-way ANOVA model without interaction)
Model: yij = μ + τi + βj + εij, εij's independent N(0, σ), i = 1, .., t, j = 1, .., b
• Treatment effect? Block effect?: T; F-test for treatment / block
• Treatment differences (pairs) μi. - μj.: E, CI, T; t-procedure; for all pairs: LSD / Tukey
• σ, (assumed) common standard deviation: E

Situation 6 (Lecture 9; O&L 14.3, 14.5)
2 qualitative factors (CRD with 2 experimental factors, or 1 grouping factor in a population and 1 treatment factor, or 2 grouping factors in a population)
Two-way ANOVA model with interaction: yijk = μij + εijk = μ + τi + βj + (τβ)ij + εijk, εijk's independent N(0, σ)
• Any effect? / Interaction effect?: T / T; F-test for model / for interaction
• Main effect for factor A?: T; F-test for main effect
• Treatment mean μij: E, CI, T; t-procedure
• Treatment differences μij - μkl (pairs): E, CI, T; t-procedure; for all pairs: LSD / Tukey
• Main effects μi.. - μj.., … (pairs): t-procedure; for all pairs: LSD / Tukey
• σ, (assumed) common standard deviation: E

Situation 7 (Lecture 5; O&L Ch11)
One quantitative factor; 1 data set on (x, y); observational: 1 sample on (x, y), or experimental: fixed values of x, observations on y
Simple Linear Regression model: yi = β0 + β1xi + εi, εi's independent from N(0, σ), i = 1, .., n
• β0 or β1: E, CI, T; t-procedure for intercept / slope
• H0: β1 = 0 (no relationship between y and x): T; F-test for the model
• μy or y, for a given x-value: E, CI, T; t-procedure
• σ, (assumed) common standard deviation: E

Situation 8 (Lecture 6-7; O&L Ch12, Ch13)
More quantitative factors; data set on (x1, x2, .., xk, y); observational: 1 sample, or experimental (x-values fixed by experimenter)
Multiple Linear Regression model: yi = β0 + β1x1i + … + βkxki + εi, εi's independent from N(0, σ), i = 1, .., n
• βj (slope) or β0: E, CI, T; t-procedure
• All βj's are zero (except β0): T; F-test for the model
• H0: some specific slopes are zero: T; F-test for comparing full vs. reduced model
• μy or y, for given x-values: E, CI, T; t-procedure
• σ, (assumed) common standard deviation: E

Situation 9 (Lecture 11; O&L 12.7, Ch16)
Qualitative factor with covariate (x), or quantitative and qualitative factor
ANCOVA model: yij = μ + τi + β1xij + εij, i = 1, .., t, j = 1, .., ni, εij independent N(0, σ)
Interaction model: yij = μ + τi + β1xij + λixij + εij, i=…, j=.., εij independent N(0, σ)
• Effect of qualitative factor?: T; F-procedure
• Influence of quantitative factor?: E, CI, T; t-procedure
• Lines per group: E, CI, T; read from SPSS Parameter Estimates
• Pairwise differences of treatment effects: E, CI; t-procedure; for all pairs: LSD / Tukey

* Type of inference: E = (point) estimation, CI = confidence interval, T = testing
** CRD and RCBD are names of experimental designs explained elsewhere
*** t-procedure is either a t-test or calculation of a CI

Main topics in ANOVA and ANCOVA (situations 4, 5, 6, 9): model (equation and assumptions), formulating the model , interpretation of model parameters, aims of analysis,
the ANOVA and ANCOVA table, F-test for equality of means/interaction/main effects; point estimation and confidence intervals for treatment means and differences between
treatment means, checking model assumptions.
In SPSS, producing ANOVA table and F-tests, table for parameter estimates, profile plots, LSD analysis, Levene's test, residual plots.
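
To make the ANOVA table and F-test for equality of means more concrete, here is a minimal sketch of the one-way ANOVA computation (situation 4) in plain Python. The function name and the data are hypothetical illustrations; in the course these numbers are read from SPSS output. The P-value is not computed here, because that needs the F distribution (available in PQRS, SPSS or R).

```python
from statistics import mean


def one_way_anova(groups):
    """Return (SS_treat, df_treat, SS_error, df_error, F) for a one-way ANOVA.

    groups: list of lists, one list of observations per treatment.
    """
    all_values = [y for g in groups for y in g]
    grand_mean = mean(all_values)
    t = len(groups)            # number of treatments
    n_total = len(all_values)  # total number of observations

    # Between-treatment and within-treatment (error) sums of squares
    ss_treat = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_error = sum((y - mean(g)) ** 2 for g in groups for y in g)

    df_treat = t - 1
    df_error = n_total - t
    f_stat = (ss_treat / df_treat) / (ss_error / df_error)
    return ss_treat, df_treat, ss_error, df_error, f_stat


# Hypothetical weight gains for three diets (made-up numbers)
diets = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [6.0, 7.0, 8.0]]
ss_treat, df_treat, ss_error, df_error, f_stat = one_way_anova(diets)
print(ss_treat, df_treat, ss_error, df_error, f_stat)  # 42.0 2 6.0 6 21.0
```

The mean square for error, SS_error / df_error, is the estimate of the common variance σ², so its square root estimates the (assumed) common standard deviation σ.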

Main topics in Regression (situation 7, 8, 9): model (equation and assumptions), formulating the model, interpretation of model parameters, aims of analysis, checking model
assumptions, R2 and R2adj. Judge how good the model is, what yardsticks to use to choose the best of several models, handling possible collinearity.
In SPSS: producing ANOVA table, table of estimated regression coefficients, saving predicted values and their standard errors / confidence bounds, saving residuals; producing
change statistic for reduced model (using two model blocks), residual plots.
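
As an illustration of the regression quantities listed above, the following sketch computes the least-squares estimates, R2 and R2adj for simple linear regression (situation 7) by hand. The function name and the data are hypothetical; in the course these values are read from SPSS output.

```python
from statistics import mean


def simple_linear_regression(x, y):
    """Least-squares fit of y = b0 + b1 * x; returns (b0, b1, r2, r2_adj)."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

    b1 = sxy / sxx           # estimated slope
    b0 = y_bar - b1 * x_bar  # estimated intercept

    ss_total = sum((yi - y_bar) ** 2 for yi in y)
    ss_resid = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

    r2 = 1 - ss_resid / ss_total
    # Adjusted R2 compares mean squares: residual df is n - 2 (one slope plus intercept)
    r2_adj = 1 - (ss_resid / (n - 2)) / (ss_total / (n - 1))
    return b0, b1, r2, r2_adj


# Hypothetical data set on (x, y)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]
b0, b1, r2, r2_adj = simple_linear_regression(x, y)
print(f"b0={b0:.3f} b1={b1:.3f} R2={r2:.3f} R2adj={r2_adj:.3f}")
```

R2adj is never larger than R2, because it penalizes the number of estimated slopes. This is one reason it can serve as a yardstick when comparing several models.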

II Situations where Normality is not assumed (because it does not seem to be appropriate)
Per situation: situation description, followed by: parameter(s) / questions; type of inference; name or type of test.

Inference based on ranks of (a) numerical, continuous, variable(s)

Situation 1a (Lecture -; O&L 5.9)
1 random sample, 1 quantitative response
• Population median. H0: median = m: T; sign test (or: Wilcoxon signed rank test for di = yi - m)

Situation 2a (Lecture 3; O&L 6.5)
1 random sample, quantitative responses x and y, paired data
• Systematic difference between distributions of x and of y?: T; Wilcoxon signed rank test for di = xi - yi

Situation 3a (Lecture 3; O&L 6.3)
2 independent samples / CRD with 2 treatments, 1 quantitative response
• Systematic difference in y between the 2 sub-populations / treatments? Shift alternative: T; Wilcoxon rank sum test (Mann-Whitney test)

Situation 4a (Lecture 8; O&L 8.6)
1 quantitative response y, 1 qualitative factor (random samples from t sub-populations, or CRD with t treatments, t > 2)
• Systematic differences in distribution of y between the treatments? Shift alternative: T; Kruskal-Wallis test

Inference for binary data and categorical data

Situation 10 (Lecture 3; O&L 10.2)
1 random sample, binary variable X
Model: P(Xi = 1) = π, P(Xi = 0) = 1 - π, i = 1, .., n
• Population fraction or success probability π: E, CI; z-procedure
• Population fraction or success probability π: T; Binomial test (SPSS / PQRS)

Situation 11 (Lecture 3; O&L 10.3)
2 independent samples, binary variable X, or 2 treatments with CRD
• π1 - π2, difference in population fraction or success probability between sub-populations / treatments: E, CI; z-procedure
• π1 - π2: T; Fisher's exact test (SPSS / PQRS)

Situation 12 (Lecture 4; O&L 10.4)
1 random sample, 1 nominal variable (variable with outcomes in k classes)
• π1, π2, …, πk; H0: π1 = π10, π2 = π20, …, πk = πk0: E, T; Pearson's chi-square test for goodness of fit

Situation 13 (Lecture 4; O&L 10.5)
1 random sample, 2 nominal variables (outcomes in contingency table with r rows and c columns)
• πij (i = 1…r; j = 1…c), probabilities in one population; H0: πij = πi. * π.j: E, T; chi-square test for independence

Situation 14 (Lecture 4; O&L 10.5)
r samples, 1 nominal variable with c classes (outcomes in contingency table with r rows and c columns)
• πij (i = 1…r; j = 1…c), probabilities per population; H0: π11 = π21 = … = πr1 … π1c = π2c = … = πrc: E, T; chi-square test for homogeneity

3 Prior knowledge
This course builds upon and assumes knowledge from an introductory course on statistics. At the start of
this course, students are expected to have this knowledge. Some of the relevant topics will be refreshed
in the first week of the course, but at a fairly quick pace, while introducing some new elements at the
same time.

The first five chapters of O&L largely cover most of the prior knowledge required for this course. So,
going through these chapters may refresh your memory. If you have little prior knowledge of statistics,
and a refresher from these chapters did not help, you are well advised to postpone this course and
first follow one of the basic courses in statistics (see p. ii).

General topics students should be familiar with:


• basic probability calculus and the idea of probability distributions.
[Using the program PQRS or a calculator, you should be able to answer any question of the type:
what is P(X ≥ v) or P(X ≤ v), for a given value v and a given probability distribution of X;
and for which value W is P(X ≥ W) = p, for a given p and a given distribution of X.
The relevant distribution types are: Normal, t, χ² (chi-square) and F.]
A file with a description of how to use a Graphics Calculator is provided on Blackboard.
• the concept of random sampling
• graphs, e.g. box plot, histogram, Q-Q plot, bar chart, and tables with (relative) frequencies
• the concepts of correlation and simple linear regression
• the basic ideas of statistical inference (see below)

Check-yourself box 1. Topics from Chapters 1, 2, 3 and 4 (important terms are in italics)

Can I

Define what a random variable (RV) is and describe what a probability distribution is?
Distinguish between qualitative and quantitative variables?
Distinguish between discrete and continuous (quantitative) random variables?
Explain what a sample statistic is? Explain the (theoretical) definition and interpretation of a mean, variance, standard
deviation, median of a random variable?
Determine the right-tail and left-tail probability for a given value of a Standard Normal RV (z-value), using table 1?
Determine the z-value for a given right-tail or left-tail probability from the Standard Normal distribution, using table 1 or 2?
Carry out probability calculations for a random variable with a known distribution, with the help of PQRS?
Use tables 1 and 2 in O&L to find tail probabilities (table 1) or quantiles (tables 1 and 2) for the standard Normal
distribution and t-distributions?
State how to check Normality using data on a random variable?
Interpret a Q-Q plot?

Calculate the sample mean, sample variance and sample standard deviation for data from a given sample?
Give the definition of an unbiased estimator?
Give the definition of the standard error of an estimator?
Explain what a (1-α)-confidence interval is for a population parameter?
Explain the meaning of the confidence coefficient (or confidence level) of a confidence interval?
Explain the difference between the terms estimator and estimate?

Explain what a test is, and enumerate the steps of a test when using either a Rejection Region or a P-value (see below, p. 8)?
Explain the meaning of the following terms, and use them in an analysis:
null hypothesis, alternative hypothesis (research hypothesis), test statistic, null-distribution, size of a test, rejection region (or
critical region), one-sided / two-sided alternative hypothesis, P-value (left-, right- or two-sided)?

Explain the terms used in experimental design: response, factors, factor levels, treatments, experimental units, measurement
units, blocks, covariate, completely randomized design (CRD) and randomized complete block design (RCBD)? These
terms will be explained in week 1 of the course, so not all terms are prior knowledge.
Indicate, from a description of the layout of an experiment, what each of these terms corresponds to?

Terminology
One of the hardest parts of Statistics appears to be terminology. As an illustration, when a group was
asked “What is Statistics?”, one answer came, full of frustration: “It is a language”.
Much of the statistical terminology that we use in this course, e.g. in doing a statistical test (see the box
on the previous page), is assumed to be known when the student comes to this course. This terminology is
used in the first lecture. Other sources for statistical terms are the Check-yourself boxes.

All in all, the number of statistical terms that we use in this course is not very large. So it is really
worthwhile to invest in learning them. Go through the terminology, make a list of terms with definitions
and examples. Language (terminology) directs our thinking and mastering terminology helps our
thinking. The terms were invented to make life easier, not more difficult. This means that if you
understand the terms, it becomes easier to understand the problems to which Statistics provides answers.

Inference
Inference is drawing conclusions about a population from a limited set of observations, e.g. a random
sample. Note that this can either be a physical population (observational research) or a hypothetical
population (in case of experimental research). In inference we distinguish:

• point estimation;
• making confidence intervals; in Check-yourself-box 2 it is indicated how limits of confidence
intervals are calculated in this course
• hypothesis testing, i.e. null and alternative hypothesis, test statistic, size or significance level (α),
critical region or rejection region, P-value. The general testing procedure is given below, and
steps 2 and 3 in the test are made more specific in case we use a t-test.

Point estimation
For a given population parameter (e.g. mean maize yield in a certain region, or the difference in mean
blood pressure in a population of patients between treated and non-treated patients), how should we
estimate it, based on sample data or experimental data? By specifying a method (e.g. “we will use the
sample mean”) we define the estimator. In the two examples the intuitive estimators are: mean yield
from the sample plots, and the difference in observed mean blood pressure between the two groups. An
estimator is called unbiased if it would, on average, give the correct value, if one would repeat the
experiment “one million” times. In this course, all estimators are unbiased, except the one for a standard
deviation.
The outcome of the estimator from the experiment or sample is the estimate. Finally, the standard error
associated with this estimate is an indication of how uncertain the estimate is (how close or how far off it
may be from the true parameter). It is the standard deviation of the estimator.
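
The distinction between estimator, estimate and standard error can be sketched in a few lines of Python. The data are hypothetical maize yields, invented for this illustration. The estimator is the method (the sample mean), the estimate is its outcome for these data, and the standard error of a sample mean is estimated by s/√n.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical yields (t/ha) from n = 5 randomly sampled plots
yields = [8.0, 9.0, 10.0, 11.0, 12.0]
n = len(yields)

estimate = mean(yields)       # outcome of the estimator "sample mean" for these data
s = stdev(yields)             # sample standard deviation (divisor n - 1)
standard_error = s / sqrt(n)  # estimated standard deviation of the estimator

print(f"estimate = {estimate:.2f}, s = {s:.3f}, standard error = {standard_error:.3f}")
```

A larger sample gives a smaller standard error, that is, a less uncertain estimate.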

The Confidence Interval (C.I.)


We only discuss two-sided confidence intervals. Bounds of confidence intervals in this course always
take the form: estimate ± (table-value) × (the standard error of the estimator),
where the table-value is a quantile from an appropriate t-distribution (Table 2 in O&L) or the standard
Normal distribution (Table 1, or bottom line of Table 2). The point estimate is the central value in the
interval (the middle of the interval). The width of the interval indicates how precisely we have estimated
the unknown parameter. A narrow interval corresponds to a precise estimate. A wide interval
corresponds to an imprecise estimate.
For situations 1 through 9, the table-value is always a value from a t-distribution. There are four defining
elements for calculating a confidence interval (or doing a t-test):
1) the parameter of interest 2) the estimator 3) the standard error of the estimator
4) the degrees of freedom of the relevant t-distribution. Knowing these is sufficient to be able to calculate
the confidence interval limits, or do the t-test.
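
The general form estimate ± (table-value) × (standard error) can be sketched as follows for situation 1. The summary numbers are hypothetical, and the table value 2.131 is the t quantile with 15 degrees of freedom and right-tail area 0.025, as would be read from a t-table. The function name is only an illustrative choice.

```python
from math import sqrt


def t_confidence_interval(estimate, standard_error, table_value):
    """Two-sided confidence interval: estimate ± table_value * standard_error."""
    margin = table_value * standard_error
    return estimate - margin, estimate + margin


# Hypothetical one-sample summary: the four defining elements
# 1) parameter of interest: population mean μ
# 2) estimator: sample mean, with outcome 50.0
# 3) standard error: s / sqrt(n) with s = 6.0 and n = 16
# 4) degrees of freedom: n - 1 = 15, so the 95% table value is 2.131
y_bar, s, n = 50.0, 6.0, 16
standard_error = s / sqrt(n)
lower, upper = t_confidence_interval(y_bar, standard_error, table_value=2.131)
print(f"95% CI for μ: ({lower:.2f}, {upper:.2f})")
```

The point estimate 50.0 sits in the middle of the interval, and the width is twice the margin of 2.131 × 1.5.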

Hypothesis testing
In Section 5.4, O&L present a statistical test as a procedure composed of five parts. In this course we will
use a procedure composed of eight steps, as shown below. This is an important topic which will come
back in nearly all lectures and practicals. Make sure to know and understand all the terms used, and
practice the correct notation in the various steps.
When asked to do a complete test, you are supposed to go through the steps listed in the table below.

Test: list of the 8 steps of a test procedure

1. the null hypothesis H0 and the alternative hypothesis Ha
2. the definition of the test statistic (the formula of what to calculate from the data)
3. the behavior of the test statistic under H0 (null distribution)
4. the qualitative behavior of the test statistic under Ha and (from this)
5. the type of P-value (left-, right- or two-tailed) or the type of Rejection Region (RR);
if the RR method is used, specify the RR using the value chosen for α (usually 0.05)
Only in steps 6-8 will the data be used.
6. the outcome of the test statistic
7. the appropriate P-value (left-, right- or two-tailed), compared to α (when using the P-value method),
or whether the outcome is in the RR or not (when using the Rejection Region method)
8. State whether H0 is rejected or not, whether Ha is proven or not, and your conclusion in words, in
terms relevant for the particular problem. (Always use Ha for the conclusion in words.)

4 Possible t-tests
For a given research situation, four t-tests can be distinguished. Which one we use depends on two
things.
1. Is the alternative hypothesis 1-sided or 2-sided? The choice depends on the research question, which
should be known before the data are collected.
2. Which method do we choose? Do we use RR (Rejection Region), or PV (P-value)?
When carried out properly, both methods lead to the same conclusion. So, usually we choose the method
that is easiest to carry out. We prefer to use the P-value method, if computer output is available. We then
reject H0 if PV≤ α. Note that the relevant type of P-value (right-, left- or two-tailed) is determined in
steps 4 and 5 of the test. Using the computer output we read or derive the value of the relevant P-value.

If no computer output is available, we use the Rejection Region. For t-tests, this requires the use of
table 2 in the book (or computer software). We reject H0 if the outcome of the Test Statistic is in the
Rejection Region.

Four t-tests
Research question    Method: Rejection Region    Method: P-value
1-sided Ha           1-sided RR                  1-tailed PV
2-sided Ha           2-sided RR                  2-tailed PV

NB.
In case of a 2-sided t-test we could also use the confidence interval for the parameter of interest, if
available. We reject H0 if the H0-value of the parameter of interest is not in the Confidence Interval.
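
The Rejection Region decisions for the one-sided and two-sided t-tests can be sketched as below. The function name, the labels for the alternatives and the example numbers are hypothetical choices for this illustration. The table values 1.753 and 2.131 are the t quantiles with 15 degrees of freedom for right-tail areas 0.05 and 0.025.

```python
def reject_h0(t_outcome, table_value, alternative):
    """Rejection Region decision for a t-test.

    alternative: "greater" (Ha: parameter > H0-value), "less" (Ha: parameter < H0-value)
                 or "two-sided" (Ha: parameter != H0-value).
    table_value: positive t quantile with right-tail area α (one-sided Ha)
                 or α/2 (two-sided Ha).
    """
    if alternative == "greater":
        return t_outcome >= table_value       # RR is the right tail
    if alternative == "less":
        return t_outcome <= -table_value      # RR is the left tail
    if alternative == "two-sided":
        return abs(t_outcome) >= table_value  # RR consists of both tails
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")


# Hypothetical outcome of the test statistic with 15 degrees of freedom, α = 0.05
t_outcome = 1.90
right_sided = reject_h0(t_outcome, table_value=1.753, alternative="greater")
left_sided = reject_h0(t_outcome, table_value=1.753, alternative="less")
two_sided = reject_h0(t_outcome, table_value=2.131, alternative="two-sided")
print(right_sided, left_sided, two_sided)  # True False False
```

The same outcome leads to rejection under the right-sided Ha but not under the two-sided Ha. This is why the alternative hypothesis has to follow from the research question, before the data are collected.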

Body height home-work exercise. This exercise is aimed to get used to the possible t- tests.
Suppose we investigate the mean μ of the body height (y) of Wageningen male students in 2015.
We assume that in 1985, mean body height was (exactly) 180 cm. We consider 3 different research
questions (see below: questions 1/2, 3/4, and 5/6/7), but note that in reality there is usually only one. To
answer the research question, we take a random sample of 25 male Wageningen students.
In all tests below, use α=0.05. Each time, go through the full procedure. The sample results are: ȳ=184,
sy=9. Using these, check your manual calculations against the SPSS output below, or vice versa.

1. Test if μ > 180, using the RR.
2. Test if μ > 180, using the P-value.
3. Test if μ < 180, using the RR.
4. Test if μ < 180, using the P-value.
5. Test if μ ≠ 180, using the RR.
6. Test if μ ≠ 180, using the P-value.
7. Calculate a 2-sided 95% Confidence Interval for μ.
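
As a way to check the manual calculations for the Rejection Region questions and the confidence interval, the arithmetic can be sketched in a few lines of Python. This is only an illustration, not part of the course software (SPSS, PQRS, R). The two t-quantiles for 24 degrees of freedom are typed in from a standard t-table (assumed values, rounded to three decimals), and all variable names are made up for this sketch.

```python
import math

# Sample results quoted in the exercise
n, ybar, s, mu0 = 25, 184.0, 9.0, 180.0

se = s / math.sqrt(n)      # estimated standard error of the sample mean
t = (ybar - mu0) / se      # outcome of the test statistic, df = n - 1 = 24

# Upper quantiles of the t-distribution with 24 df, copied from a t-table
# (assumption: table values rounded to three decimals)
T24_05, T24_025 = 1.711, 2.064

reject_right = t >= T24_05             # Ha: mu > 180, right-sided RR
reject_left = t <= -T24_05             # Ha: mu < 180, left-sided RR
reject_two_sided = abs(t) >= T24_025   # Ha: mu != 180, two-sided RR

# Two-sided 95% confidence interval: estimate +/- quantile * standard error
ci = (ybar - T24_025 * se, ybar + T24_025 * se)

print(f"se = {se:.2f}, t = {t:.3f}")
print("reject H0 (right, left, two-sided):", reject_right, reject_left, reject_two_sided)
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```

The two-sided test and the confidence interval should agree: H0 is rejected in the two-sided test exactly when 180 lies outside the interval.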

Expected / desired prior skills for the computer practicals

SPSS, PQRS and R.


It is assumed that you know how PQRS works. Best is to google it, download it (very easy), and practice with it in
your own time; it will not take long before you know how it works. It is available on all WUR computers.
There are 2 important skills: for a random variable X that has some known distribution (e.g. t20, or N(25, 4),
or F4,20), you should be able to
1) calculate probabilities: e.g. P(X≥4), or P(X≤0), or….
2) determine quantiles: e.g. find the value V for which P(X≥V) = 0.025, or P(X≤V) = 0.05.
See page 79 (PPP 0) for a few specific exercises.

If you have never worked with SPSS before, best is to go through the SPSS short guide, page 1. You can find the
document on Blackboard under the Practical-tab. Go through p.1 while you sit with a computer on which you run
SPSS. That should set you up for working with SPSS during the practicals. In the practicals it is often indicated
how to do a certain test, or make a graph, etc., when you do it for the first time. In subsequent cases, you are then
supposed to remember this or to be able to find that information back.

You can choose to use R instead of SPSS in one exercise per week. We will not teach how to write computer code
in R, but for six exercises we made a small program with commands that you can ‘run’, and that produce the
necessary output. Looking at the code, you may get some idea of how R works. A few generalities about
R programs will be discussed. You can download R and RStudio for free. You are not required to know anything at
all about R at the start of the course.

Check-yourself box 2. Using a t-test or making a confidence interval using a t-distribution

Can I
Mention the two general t-procedures?
Mention the four elements that define a t-procedure?
Recognize situations 1, 2, and 3 (one-sample, paired observations and two independent samples)?
Give a research example for situations 1, 2, or 3?
Mention the assumptions upon which a t-procedure is based, for situations 1, 2, and 3?
Specify the four defining elements of the t-procedures for situations 1, 2, and 3?

Give the general formula for the bounds of a (1 - α) confidence interval for a parameter of interest?
Apply this formula to a specific research question when situation 1, 2, or 3 applies?
Find which quantile of the t-distribution to use (know what the number of degrees of freedom is)?
Find this quantile in a table of the t-distribution or with a graphing calculator, and with the use of PQRS?
Calculate the bounds of a (1 - α) confidence interval for a model parameter if relevant data are available?

Carry out a t-test, given the four defining elements, and given H0 and Ha?
Give the degrees of freedom of the t-distribution to be used in such a t-test?
Decide when to use a one-sided P-value or R.R. and when to use a two-sided P-value or R.R.?
Determine a one-sided R.R. and determine a two-sided R.R.?
Derive a one-sided P-value from the two-sided P-value that is (by default) provided by SPSS?

Some answers:
The general form of the test statistic of a t-test on a parameter is:

$$ t = \frac{\text{parameter estimator} - \text{parameter value under } H_0}{\text{(estimated) standard error of the parameter estimator}} $$

A few examples of test statistics for a t-test

2 independent samples; response: y
The pop. means of y are called µ1 and µ2. The parameter of interest is µ1-µ2.
H0: µ1-µ2=0
T.S.: $$ t = \frac{(\bar{y}_1 - \bar{y}_2) - 0}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
(assuming equal variances)

Multiple linear regression
µy=β0+β1x1+β2x2+β3x3. The parameter of interest is, e.g., β2.
H0: β2 = 1
T.S.: $$ t = \frac{\hat{\beta}_2 - 1}{se(\hat{\beta}_2)} $$
where β̂2 is the Least Squares estimator of β2

One-sample situation; response y
The parameter of interest: µ=µy
H0: µ=36
T.S.: $$ t = \frac{\bar{y} - 36}{s_y / \sqrt{n}} $$

The general form for the bounds of a two-sided (1 − α) confidence interval for a parameter is:

parameter estimate ± tdf(α/2) * standard error of estimator,


with tdf (α/2) a quantile from the relevant t-distribution, to be found in table 2, Ott&Longnecker.
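
One item in the check-yourself box asks how to derive a one-sided P-value from the two-sided P-value that SPSS reports. A minimal sketch of the usual rule follows, assuming the two-sided P-value comes from a symmetric t-distribution: halve it if the test statistic points in the direction of Ha, otherwise take one minus that half. The function name and the numbers in the example call are hypothetical illustrations, not course material.

```python
def one_sided_p(p_two_sided, t, alternative):
    """One-sided P-value from a two-sided P-value of a t-test.

    alternative: 'greater' (Ha: parameter > H0-value, right-tailed P-value)
                 or 'less' (Ha: parameter < H0-value, left-tailed P-value).
    Assumes a symmetric reference distribution, as for the t-test.
    """
    half = p_two_sided / 2
    if alternative == "greater":
        return half if t > 0 else 1 - half
    if alternative == "less":
        return half if t < 0 else 1 - half
    raise ValueError("alternative must be 'greater' or 'less'")


# Illustrative values: t = 2.216 with a two-sided P-value of 0.040
print(one_sided_p(0.040, 2.216, "greater"))  # right-tailed P-value
print(one_sided_p(0.040, 2.216, "less"))     # left-tailed P-value
```

With these illustrative numbers, the right-tailed P-value is half the two-sided value, while the left-tailed P-value is close to 1 because the outcome points away from that alternative.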

4 Advice on how to study


A) The basis is prior knowledge on: I. Probability – see page 6 above: PQRS and tables 1, 2, 7, 8 in the book and
II. t-procedures (confidence interval and t-test). These t-procedures are always based on: 1. the population of
interest, and in this population, the parameter of interest (a mean, difference between two means, a regression
coefficient,...), 2. The estimator 3. Its standard error 4. Df = degrees of freedom (for estimating the spread)
B) A good overview of the material using p. 4/5. There is a situation-exercise-file on Blackboard.
C) Mastering the check-yourself boxes.
D) For each of the situations, using book and slides, know how the various tests work.
E) Understanding of the design aspects of experiments and observational studies
F) Know how to build and use and interpret models for regression, ANOVA and ANCOVA.

With this basis of formal knowledge, practicing old exams should help you deliver at the exam what is required.

Lecture 1. Experimental Design terms; t-procedures (CI estimation and testing)

This first lecture reviews two central aspects of statistical inference: 1) design and sampling with their
terminology and 2) t-procedures, applied to situation 1 (interest in the mean of a population or the
expected value of an experimentally observed response variable).
Students are advised to spend ample time going through part 3 of the introduction (Prior Knowledge).

Experimental design and sampling and their terminology


First some essentials are discussed about the data collection procedures, in particular sampling and
design of experiments. In this course, the design of experiments will receive (much) more attention.
Sampling is associated with observational research and experimental design with experimental research.
It is important to know the terminology used in the data collection procedures, see last part of Check-
yourself box 1. Best is to learn the definitions of the various terms by heart. The relevant chapter is
chapter 2 in O&L.

t-procedures
The second part of the review contains the t-procedures: (i) determining the limits of a confidence
interval for a population characteristic or a parameter of the statistical model, (ii) t-test for a hypothesis
concerning a population characteristic or a parameter of the statistical model.
Students are supposed to be familiar with t-procedures. It is important to know both the applications in
the various situations (for week 1: situations 1, 2, and 3), and the general principles of these t-procedures.
All aspects are listed in the Prior Knowledge section, in particular Check-yourself box 2.

The statistical model for Situations 1, 2 and 3


A statistical model comprises all assumptions that are made about the observed responses (e.g.
independence, drawings from a Normal distribution, equal spread across sub-groups). When doing a
t-test in SPSS, we should realize that SPSS output on P-values and confidence intervals is based on these
assumptions. If one of the assumptions is not correct, the calculated confidence interval or P-value will
not be (exactly) correct.
In situation 1 we assume that data on the response are independent observations from a (one)
Normal distribution.
In situation 2 (paired data), with two response variables x and y and interest in μx – μy, it
is assumed that the differences d between x and y are Normally distributed. So the model is (or: the
assumptions are): the di’s are independent drawings from N(μd, σd).
In situation 3 interest is in the difference between two population means, µ1 and µ2. It is often
assumed that the observations are independent random drawings from two Normal distributions, with the
same variance.

Checking model assumptions


Model assumptions have to be checked, to make sure that conclusions from the analysis are valid.
The independence assumption (which is the most important assumption) holds if randomization was done
correctly, that is, if a random sample is taken (situations 1 and 2), if two random samples are taken
(situation 3, observational research), or if the units used in the experiment are randomly assigned to the 2
treatments (situation 3, experimental research). Independence can therefore only be checked if one
knows how the experiment or sampling was set up and carried out.
Normality will be checked by looking at a QQ plot (normal quantile plot, O&L Section 4.14, p196) of
the observations (situation 1), of the di’s (situation 2), of the observations per sample (situation 3). The
equal variance assumption in situations 1 and 2 is automatically correct, if observations come from one
random sample (from the same population). In situation 3, the two sample standard deviations s1 and s2,
will give an impression of the validity of the assumption of equal variances for the two-sample t-test.
Another tool that is often used is a side-by-side box plot (p. 308), or scatterplot with on the x-axis 1 and
2, and on the y-axis the response. But in situation 3 we will base the decision to use or not use the
assumption of equal variances on the outcome of Levene’s test for equality of variances, as provided by
SPSS output for the two independent samples t-test. Levene’s test will not be discussed in detail; we
will just read its P-value.

The two-sample t-test in case the two populations have unequal variances
A modified two-sample t-test compares two population means (two treatments), without assuming that
the variances of the two associated Normal distributions are the same. We will not discuss this test in
detail, but simply use the SPSS computer output (see e.g. output for O&L exercise 6.60 below) in the
following way: if Levene’s test indicates (P-value <0.05) that the variances in the two populations are
different, then we use the bottom line (equal variances not assumed) in the SPSS output for the two-
independent-samples t-test.
If the P-value of Levene’s test is > 0.05, then we use the top line (equal variances assumed).

Theory to study
O&L Sections 5.2, 5.4 up to p238, 5.6, 5.7 up to p256, 6.1, 6.2, and 6.4. For the moment, we skip the
parts that relate to the type II error, power and required sample size (to be discussed in Lecture 2).

Exercises for homework (all tests: do the 8-steps procedure)


• Body Height example: to practice 4 t-tests, see above.
• O&L, Exercise 5.34, p280. Test on nicotine content.
• O&L, Exercise 5.44, p282. Recycled paper. Use the RR-method for a, use PQRS for c.
• O&L, Exercise 5.70, p287. Oxygen, only b.
• O&L, Exercise 6.29abd, p342. Twins (acad. and non-acad. environment). In b, give a 95% C.I.
• O&L, Exercise 6.57, p351. Strip mining for coal. You can use the output below.
[SPSS output for Exercise 6.57: table contents not recoverable in this copy]

• O&L Exercise 6.59, p351 Potency of drugs


• O&L, Exercise 6.60, p352 2 mixtures for flare illumination. (no plot required)
Choose which test (equal/unequal variances), carry it out using P-value. Use SPSS output.
Extra questions: Give the Rejection Region for the 2-sided test
Show how to calculate the confidence interval limits from the other output
Independent Samples Test (illumination)

Levene's Test for Equality of Variances: F = 1.063, Sig. = .316

t-test for Equality of Means
                                                 Sig.        Mean        Std. Error  95% Confidence Interval of the Difference
                             t       df      (2-tailed)  Difference  Difference  Lower      Upper
Equal variances assumed      -4.088  18      .001        -23.4000    5.7236      -35.4249   -11.3751
Equal variances not assumed  -4.088  16.417  .001        -23.4000    5.7236      -35.5086   -11.2914

• O&L, Exercise 6.61, p352. Flare illumination. Use the SPSS output for Exercise 6.60.
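
To see where the numbers in the two lines of the Independent Samples Test output for Exercise 6.60 come from, the sketch below recomputes them from summary statistics. It uses the sample standard deviations quoted for this exercise in the Lecture 2 exercises (s1=14.65, s2=10.63, n1=n2=10) and the mean difference from the output. It assumes that the "equal variances not assumed" line uses the Welch-Satterthwaite approximation for the degrees of freedom, and it takes the 0.025 upper quantile of the t-distribution with 18 df from a t-table (2.101, rounded). Small deviations from the SPSS values are due to rounding of s1 and s2.

```python
import math

# Summary statistics for the two flare mixtures
n1, n2 = 10, 10
s1, s2 = 14.65, 10.63
mean_diff = -23.4          # Mean Difference as read from the SPSS output

# Top line (equal variances assumed): pooled variance and standard error
sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_pooled = math.sqrt(sp2 * (1 / n1 + 1 / n2))
df_pooled = n1 + n2 - 2

# Bottom line (equal variances not assumed): separate variances,
# degrees of freedom by the Welch-Satterthwaite approximation (assumption)
v1, v2 = s1**2 / n1, s2**2 / n2
se_welch = math.sqrt(v1 + v2)
df_welch = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

t_stat = mean_diff / se_pooled

# Two-sided 95% CI from the top line; 2.101 is a t-table value for 18 df
T18_025 = 2.101
ci = (mean_diff - T18_025 * se_pooled, mean_diff + T18_025 * se_pooled)

print(f"pooled: se = {se_pooled:.4f}, df = {df_pooled}, t = {t_stat:.3f}")
print(f"not pooled: se = {se_welch:.4f}, df = {df_welch:.3f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
```

Because n1 = n2 here, the pooled and the separate-variance standard errors coincide, which is why both lines of the output show the same t-value and differ only in the degrees of freedom and the interval limits.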

Lecture 2. Sample size calculations. Wilcoxon tests.

Sample size calculations


Before we start an experiment, we have to decide how many observations will be taken. For this decision,
goals have to be specified: the criteria that the resulting experiment and analysis should satisfy. We call
these criteria the precision criteria. The first learning goal is that the student knows which criteria to specify.
This depends on the research aim: (1) constructing a confidence interval or (2) testing.
• If the aim is to make a confidence interval for a mean or a mean difference, the precision criteria
are: 1) the (average) width of the confidence interval or the error margin (which is half the interval
width) and 2) the confidence level 1-α (usually 95%). The number of observations that is required to
obtain a (1-α) confidence interval narrower than some pre-chosen width is calculated.
• If the aim is to carry out a test, the precision criteria are: 1) the significance level α, also known as the
type-I error probability (usually 5%), 2) the minimum relevant difference δ between the true value
and the H0-value of the parameter of interest and 3) the power of the test (π) or the type-II
error probability β (=1-π), for that given value of δ.
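
For the confidence interval case, the calculation can be sketched as follows. This is a rough illustration that assumes σ is known and uses a Normal quantile instead of a t-quantile; the formulas in the book and lectures remain leading. The function name and the example values (σ=12, error margin 5) are hypothetical.

```python
import math
from statistics import NormalDist


def n_for_margin(sigma, margin, conf=0.95, two_samples=False):
    """Observations needed so that the error margin (half the CI width)
    is at most `margin`, assuming sigma is known (Normal approximation).

    One mean:             margin = z * sigma / sqrt(n)
    Difference of means:  margin = z * sigma * sqrt(2 / n), n per group
    """
    z = NormalDist().inv_cdf(1 - (1 - conf) / 2)   # e.g. about 1.96 for 95%
    factor = 2 if two_samples else 1
    return math.ceil(factor * (z * sigma / margin) ** 2)


# Hypothetical example: sigma = 12, desired error margin 5 (CI width 10)
print(n_for_margin(12, 5))                     # one mean
print(n_for_margin(12, 5, two_samples=True))   # difference of two means, per group
```

Halving the desired error margin roughly quadruples the required number of observations, because n is proportional to 1/margin².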

Example: Suppose that we are interested in the mean difference in response (µ1-µ2), in a situation where
a two-sample t-test applies, and that the hypotheses are H0: µ1-µ2=0 and Ha: µ1-µ2>0. For instance, Ha states
that mean piglet growth is higher for a new diet (1). Suppose that the extra cost involved in changing
from the standard diet (2) to the new diet is not economically worthwhile for a difference µ1-µ2 smaller
than δ. Then, we suppose that the true difference is δ, and calculate how many observations we need so
that we will reject H0 with a probability that is at least equal to a pre-specified value π (power, =1-β). If
µ1-µ2 is larger than δ, the probability that H0 will be rejected in an experiment will exceed π. β is the
type II error probability: the probability of not rejecting a null hypothesis that is not true.
For a planned experiment, assuming that σ is known and that µ1-µ2=δ, and given α and n, we can go the other
way around and calculate the power of the test. If it is low, e.g. 0.25, we may decide that the
experiment will probably be a waste of money, because even if the real difference δ is relevant, our
experiment would only show that Ha is true with 25% probability. In practice the choice often is: do not
do the experiment, or do a larger experiment.
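
Both directions of this reasoning (from power to sample size, and from sample size to power) can be sketched for the one-sided two-sample situation of the example. The sketch assumes a known common σ and uses the Normal approximation, so it is an approximation to the t-test, not the exact calculation. The function names and the numbers (σ=10, δ=5) are hypothetical.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard Normal distribution


def n_per_group(sigma, delta, alpha=0.05, power=0.90):
    """Observations per group for a one-sided two-sample test
    (Normal approximation, sigma assumed known)."""
    z_alpha = Z.inv_cdf(1 - alpha)
    z_beta = Z.inv_cdf(power)          # power = 1 - beta
    return math.ceil(2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2)


def power_for_n(sigma, delta, n, alpha=0.05):
    """Approximate power of the same test for n observations per group,
    if the true difference mu1 - mu2 equals delta."""
    z_alpha = Z.inv_cdf(1 - alpha)
    se = sigma * math.sqrt(2 / n)      # standard error of the mean difference
    return Z.cdf(delta / se - z_alpha)


# Hypothetical planning values: sigma = 10, minimum relevant difference 5
n = n_per_group(10, 5)
print("n per group:", n)
print("power with that n:", round(power_for_n(10, 5, n), 3))
print("power with n = 10 per group:", round(power_for_n(10, 5, 10), 3))
```

The last line illustrates the "waste of money" argument: with a small n, the power for the same δ can be far below the target, even though the difference is relevant.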

Situations 1a, 2a and 3a (see p. 4/5)


If in situations 1, 2 or 3, the Normality of the observations is in doubt, two problems may surface. First,
the parameter of interest may change, for example from the mean income in a region to the median
income. (Mean, mode and median are the same in symmetric distributions, but not in non-symmetric
distributions.) Second, if the data are not Normally distributed, the P-value of the t-test and/or the limits
of a confidence interval for µ will no longer be exactly right. The latter problem is negligible if the
number of observations is ‘large’, but may be relevant if the sample size is ‘small’.
If the sample size(s) is/are small and the observations non-Normal (situations 1a, 2a, 3a), we can
use procedures based on rank numbers (Wilcoxon rank sum test / Wilcoxon signed rank test). In these
tests, the original observations are replaced by rank numbers. The rank procedures are called distribution
free (no distributional assumption), or non-parametric (H0/ Ha do not use specific parameters, but the
inference is about the two distributions in general). However, we do make additional assumptions. In the
two-sample case we often adopt the idea of a shift alternative: we assume that the distributions involved
for the two populations (treatments) have the same form, but may be shifted relative to each other. Note
that in situation 3, under the assumption of Normality, the assumption of equal variances also ensures
that the form of the two Normal distributions is the same. In the paired observations situation 2a we
assume that the difference d has a symmetric distribution around the median of d.
In situation 1a (and therefore 2a as well) we could also use the sign-test, based on counting
plus-signs or minus-signs for the differences between observation and median M0 under H0 (situation 1a) or
differences between x and y (situation 2a). The null-hypothesis about the median (M=M0) is equivalent to a
null-hypothesis about e.g. the probability of an observation above M0, which under H0 should be 0.5. This means
we can apply a binomial test, see Ch. 4, but to be discussed again in Lecture 3.

In this course we will not follow the approach of O&L for the Wilcoxon tests.
In the Wilcoxon rank sum test (situation 3a) we choose the rank sum for one of the groups as test statistic
(no Normal approximation), and we use PQRS or SPSS-output to draw the conclusion. The student
should know its expected value under H0, in order to judge whether the test statistic outcome is higher
or lower than expected under H0.
Similarly, for the Wilcoxon signed rank test we use either T+ or T- as test statistic, not the minimum of
the two (as on p. 320 of O&L), or the Normal approximation. We will again use SPSS output or the
relevant PQRS picture of the distribution. The student should know the expected value of T+ under H0.
We do not consider confidence intervals.
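
The expected values mentioned above follow from the fact that, under H0, every rank is equally likely to end up in either group (rank sum test) or to get a plus sign (signed rank test). The sketch below computes these expected values and, for the signed rank statistic, an exact lower-tail probability by counting sign patterns. It assumes no ties and no zero differences. In the course the distribution is read from PQRS or SPSS output; the function names and the example sizes are illustrative only.

```python
def expected_rank_sum(n_group, n_total):
    """Expected rank sum of a group of size n_group under H0:
    the mean rank is (n_total + 1) / 2 for every observation."""
    return n_group * (n_total + 1) / 2


def expected_signed_rank_sum(n):
    """Expected value of T+ (or T-) under H0: half of 1 + 2 + ... + n."""
    return n * (n + 1) / 4


def signed_rank_lower_tail(n, t):
    """Exact P(T <= t) under H0 for the signed rank statistic,
    assuming no ties and no zero differences."""
    max_sum = n * (n + 1) // 2
    counts = [1] + [0] * max_sum       # counts[s]: subsets of ranks with sum s
    for rank in range(1, n + 1):
        for s in range(max_sum, rank - 1, -1):
            counts[s] += counts[s - rank]
    return sum(counts[: t + 1]) / 2**n


# Illustrative sizes: two groups of 12, and 10 paired differences
print(expected_rank_sum(12, 24))        # expected rank sum per group
print(expected_signed_rank_sum(10))     # expected T+ (and T-)
print(signed_rank_lower_tail(10, 2))    # P(T <= 2) for n = 10
```

Comparing an observed rank sum with its expected value shows directly whether the outcome lies in the lower or the upper tail, which determines whether a left- or right-tailed P-value is relevant.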

Theory to study: O&L Sections 5.3, 5.5, 6.6 (sample sizes),
6.3, 6.5 (without parts on confidence intervals, p 307, etc., and p. 322).
Erratum in the book: page 333, sample size formulas 12a and 12b: the “2” should be deleted (the
formulas on page 325 are correct).

• O&L, Exercise 5.15, p277 sample size


• O&L, Exercise 5.26, p279 sample size
• See output for O&L 6.57 in exercises for Lecture 1.1: Extra questions:
1. We want to detect a mean difference in pH before and after mining of 0.1 with a probability of
0.90 and a size α = 0.05 test. How many grids do we need?
2. Check the assumption of normality with an appropriate QQ-plot (see output below).
[Figure: Normal Q-Q plots of BEFORE, AFTER and DIFF; horizontal axis: Observed Value, vertical axis: Expected Normal Value]

• See output for O&L 6.60 in exercises for Lecture 1.1:
How many blends are needed such that the expected width of the 0.95 confidence interval for the
difference between the mixtures is 10? First derive a pooled estimate for a common variance σ²
from the data, when s1=14.65, s2=10.63, n1=n2=10.
• O&L, Example 6.6, p311; oxygen before / after cleanup
carry out Wilcoxon rank sum test, with SPSS output
Group Statistics (Rank of oxygen)
trt      N    Mean      Std. Deviation   Std. Error Mean
before   12   18.0000   4.30116          1.24164
after    12   7.0000    4.43642          1.28068

Ranks (oxygen)
trt      N    Mean Rank   Sum of Ranks
before   12   18.00       216.00
after    12   7.00        84.00
Total    24

Test Statistics (b) (oxygen)
Mann-Whitney U                   6.000
Wilcoxon W                       84.000
Z                                -3.817
Asymp. Sig. (2-tailed)           .000
Exact Sig. [2*(1-tailed Sig.)]   .000 (a)
Exact Sig. (2-tailed)            .000
Exact Sig. (1-tailed)            .000
Point Probability                .000
a. Not corrected for ties.
b. Grouping Variable: trt
• O&L, Example 6.5, p307 reaction time vs alcohol / placebo

Extra questions (see output):


The QQ-plot displays observations from which the treatment mean is subtracted (thus creating
residuals). Is the Normality assumption reasonable?
Carry out the Wilcoxon Rank Sum test using the SPSS output.

• O&L, Exercise 6.60, p352 flare illumination


Suppose that the researcher is not sure about the Normality assumption and decides to perform an
analysis based on ranks.
Argue whether this is a situation with two independent samples or with paired data.
Select the appropriate output and test (α=0.05) whether systematic differences exist between the two
mixtures with respect to the mean flare-illumination value.
Wilcoxon Signed Ranks Test

Ranks (Flare_Mix2 - Flare_Mix1)
                 N       Mean Rank   Sum of Ranks
Negative Ranks   1 (a)   2.00        2.00
Positive Ranks   9 (b)   5.89        53.00
Ties             0 (c)
Total            10
a. Flare_Mix2 < Flare_Mix1
b. Flare_Mix2 > Flare_Mix1
c. Flare_Mix2 = Flare_Mix1

Test Statistics (b) (Flare_Mix2 - Flare_Mix1)
Z                        -2.599 (a)
Asymp. Sig. (2-tailed)   .009
a. Based on negative ranks.
b. Wilcoxon Signed Ranks Test

Mann-Whitney Test

Ranks (Flare)
Mixture   N    Mean Rank   Sum of Ranks
1.00      10   6.40        64.00
2.00      10   14.60       146.00
Total     20

Test Statistics (b) (Flare)
Mann-Whitney U                   9.000
Wilcoxon W                       64.000
Z                                -3.102
Asymp. Sig. (2-tailed)           .002
Exact Sig. [2*(1-tailed Sig.)]   .001 (a)
Exact Sig. (2-tailed)            .001
Exact Sig. (1-tailed)            .000
Point Probability                .000
a. Not corrected for ties.
b. Grouping Variable: Mixture
• Exercise 6.29c academic vs non-academic twins. Extra question:
Test H0: the difference in score between the two members of a twin pair has a symmetric distribution
around zero. Use SPSS output.

• Extra exercise and SPSS output for small and large plant species on Dutch ‘kwelders’
“Kwelders” are pieces of land outside the sea dikes that were formed through sedimentation of clay
from the sea. They abound on the Wadden islands (north of The Netherlands). It appears that on
older kwelders, protected from the sea by natural sand dunes, small plant species tend to disappear
through competition with larger species. A possible intervention, that may increase plant diversity
again, is the introduction of grazing cows (J.H. van Wijnen, PhD thesis, RuG, 1999).
To investigate the effect of grazing, an experiment was carried out. From a list of old kwelders,
dominated by large plant species, ten were randomly chosen and cows grazed on these kwelders for
four years. Ten other such kwelders, also randomly selected, were not grazed. At the end of four
years an index of biodiversity, sensitive to small plants and small plant species (response y, on a 0 to
2000 scale) was measured and analyzed with SPSS.

Group Statistics (y)
Treatment    N    Mean      Std. Deviation   Std. Error Mean
grazed       10   750.200   117.0772         37.0231
not grazed   10   655.500   67.4706          21.3361

Independent Samples Test (y)

Levene's Test for Equality of Variances: F = 3.418, Sig. = .081

t-test for Equality of Means
                                                Sig.         Mean         Std. Error
                              t       df       (2-tailed)   Difference   Difference
Equal variances assumed       2.216   18       .040         94.7000      42.7310
Equal variances not assumed   2.216   14.384   .043         94.7000      42.7310

Mann-Whitney Test

Ranks (y)
Treatment    N    Mean Rank   Sum of Ranks
grazed       10   12.90       129.00
not grazed   10   8.10        81.00
Total        20

Test Statistics (b) (y)
Mann-Whitney U                   26.000
Wilcoxon W                       81.000
Z                                -1.814
Asymp. Sig. (2-tailed)           .070
Exact Sig. [2*(1-tailed Sig.)]   .075 (a)
Exact Sig. (2-tailed)            .075
Exact Sig. (1-tailed)            .038
Point Probability                .006
a. Not corrected for ties.
b. Grouping Variable: Treatment

a. Assuming Normally distributed observations, test if the mean plant diversity
for grazed “kwelders” is higher than for non-grazed “kwelders”.
b. Test if mean plant diversity for grazed “kwelders” is systematically higher
than for non-grazed “kwelders”, with a minimum set of model assumptions.
c. Compare the two P-values and conclusions. Is the result surprising?

Lecture 3 Inference about one population proportion
Inference about the difference between two proportions or probabilities

Inference about a population proportion; the binomial distribution (situation 10)


For situation 10 the student should know three aspects: construct a confidence interval, do a test, and
calculate the required sample size if the aim is to derive a confidence interval for a proportion in a
physical population (observational research) or hypothetical population (experimental research).

We will discuss the analysis of a binary response, i.e. a response that can only take two possible values:
often denoted by 1 and 0, or “true” and “false” or “success” and “failure”. For instance, we may
randomly select an individual from a population and establish whether that individual is diseased (x = 1)
or healthy (x = 0). The expected value of the response x (or long term mean) is the probability, say π,
that we will draw a diseased individual if we randomly draw one from the population. This probability is
the proportion of people in the population that are diseased. In formula: E(x) = μx = π. The variance of x
(square of the standard deviation) is Var(x) = σx² = π(1-π).

When we take a random sample of n individuals (the sampling units) the total number of successes y will
follow a Binomial distribution with parameters n (the number of observed units) and π (the individual
success probability). This summary statistic y is used in the binomial test for one proportion. The
theoretical mean and variance of variable y are:
E(y) = μy = nπ,
Var(y) = σy² = nπ(1-π)

The PQRS image presents the Binomial(20, 0.3) distribution. It shows the ‘less than’-, the ‘equal to’- and
‘larger than’-probabilities for the outcome y=9. These 3 probabilities add up to 1.

Note that there is no separate parameter for the variance: if we know π, then both the mean and the
variance of x are known. The sample mean response x̄ = (x1 + ⋯ + xn)/n is the sample proportion of
successes y/n and can be used for inference about π.

The binomial distribution is discussed in Ch. 4 of O&L. The binomial test is discussed in class. Here we
give an example.

Example of a binomial test: test if the fraction of cows with walking problems is higher than 0.3, using a
random sample of size n=20, assuming that 9 cows in the sample have walking difficulty. The steps of
the test are as follows.
1. H0: π=0.3, Ha: π>0.3.
2. TS: y = number of cows in the sample that walk with difficulty.
3. If H0 is true, y ~ Binomial(20, 0.3).
4. Under Ha, y tends to larger values, so we use the right-tailed P-value (RPV).
5. Reject H0 if RPV≤0.05.
6. Outcome TS: y=9.
7. RPV = PH0(y≥9) = 0.0654+0.048 = 0.1134 > 0.05.
8. H0 is not rejected, Ha is not proven: it is not shown that more than 30% of the cows in the population
walk with difficulty. In other words, although it seems that the fraction is larger than 0.3 (9/20=0.45), the
evidence is not strong enough to consider Ha proven.

Note 1: LPV = PH0(y≤9) = 0.0654 + 0.8867 = 0.9521, so here LPV+RPV≠1: because this distribution is
discrete, not continuous, the outcome y=9 is counted in both tails.
Note 2: two-tailed PV = 2×RPV = 2×0.1134 = 0.2268, if we apply the simple principle for the 2-tailed PV:
2*min(LPV, RPV) (with a maximum of 1). However, this definition may not be appropriate for cases like this,
with a non-symmetric distribution.
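
The probabilities in this example are normally read from PQRS, but they can also be reproduced with a short calculation. The sketch below is an illustration using only the Python standard library; the variable names are made up. Because it adds unrounded probabilities, its RPV and LPV can differ in the fourth decimal from the values above, which are sums of rounded terms.

```python
from math import comb


def binom_pmf(k, n, p):
    """P(y = k) for y ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, pi0, y_obs = 20, 0.3, 9   # sample size, H0-value of pi, observed outcome

p_equal = binom_pmf(y_obs, n, pi0)                             # P(y = 9)
rpv = sum(binom_pmf(k, n, pi0) for k in range(y_obs, n + 1))   # P(y >= 9)
lpv = sum(binom_pmf(k, n, pi0) for k in range(0, y_obs + 1))   # P(y <= 9)
two_tailed = min(1.0, 2 * min(lpv, rpv))                       # simple 2-tailed rule

print(f"P(y = 9) = {p_equal:.4f}")
print(f"RPV = {rpv:.4f}, LPV = {lpv:.4f}, LPV + RPV = {lpv + rpv:.4f}")
print(f"two-tailed PV (simple rule) = {two_tailed:.4f}")
```

The sum LPV + RPV exceeds 1 by exactly P(y = 9), which is the point of Note 1.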

Comparing two population proportions: confidence interval and Fisher’s exact test (situation 11)
Suppose that we want to compare the probabilities π1 and π2 to be diseased for individuals that are not
vaccinated and individuals that are vaccinated. We collect a random sample of size n1 of non-vaccinated
and a random sample of size n2 of vaccinated individuals, and count the number of diseased individuals
y1 and y2 in each sample. For inference about the unknown π1 and π2 we use the sample proportions
y1 / n1 and y2 / n2. O&L explain how a confidence interval for π1 – π2 can be constructed. This is also
explained in class. The z-test explained by O&L is skipped in this course.
For testing equality of two proportions we will only discuss Fisher’s exact test. In contrast with O&L, we
will not use complex calculations to obtain a P-value (see p. 512), but we will use either SPSS output or
a PQRS picture of the relevant Hypergeometric distribution. O&L do not explain that the distribution
from which we can calculate the P-value is a Hypergeometric distribution, which is the distribution used
for the so-called ‘Vase model’.

The Vase model describes the following situation: N balls are placed in a vase, K of these are white, N-K
are red. We draw n balls (without replacement) from the vase, so N-n stay in the vase. The number of
white balls in the sample, X, has a Hypergeometric distribution: X ~ Hypergeometric (N, K, n).
Likewise, for the number V of white balls that stay in the vase: V ~ Hypergeometric (N, K, N-n), etc.

Example of Fisher’s exact test. Suppose, in the above example, we want to test the null hypothesis
1) H0: π1 – π2 = 0 versus the alternative hypothesis Ha: π1 – π2 > 0. (It is expected that non-vaccinated
individuals are diseased more often.) Also suppose that n1 = 12 and n2 = 10, and that in both groups 7
individuals are not diseased. See the tables below for the summary of the data.

Summary of the data:
                 Diseased
                 Yes   No   Total
Non-vaccinated   5     7    12
Vaccinated       3     7    10
Total            8     14   22

Illustration of how the experimental data fit into the Vase model
(NB: color and presence in the sample could have been swapped):
                 In sample?
                 Yes   No   Total
White            5     7    12
Red              3     7    10
Total            8     14   22

2) We choose as the test statistic (for example): X = number of vaccinated individuals that are diseased.
3) Under H0, X ~ Hypergeometric (22, 10, 8).
4) Under Ha, X tends to lower values, so
5) we use the LPV.
6) Outcome: X = 3, so
7) LPV = 0.1563 + 0.2972 = 0.4535 > 0.05, so H0 is not rejected, Ha is not proven. It is not
shown that vaccinated individuals are diseased less frequently than non-vaccinated individuals.
NB: we could have used X = number of vaccinated non-diseased individuals, the Hypergeometric (22, 14, 10)
distribution (see first PQRS picture), and RPV. The P-value would be exactly the same.
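
The P-value of this example, and the claim in the NB that the alternative choice of test statistic gives exactly the same P-value, can be checked with a small calculation. The sketch below is an illustration with the Python standard library, using the (N, K, n) parameter order of the Vase model above; the function and variable names are made up for this sketch. In the course the probabilities are read from PQRS or SPSS output.

```python
from math import comb


def hypergeom_pmf(x, N, K, n):
    """P(X = x) for X ~ Hypergeometric(N, K, n): x white balls in a sample
    of n, drawn without replacement from N balls of which K are white."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)


# X = number of vaccinated individuals that are diseased:
# N = 22 individuals, K = 10 vaccinated, n = 8 diseased, outcome X = 3
lpv = sum(hypergeom_pmf(x, 22, 10, 8) for x in range(0, 3 + 1))   # P(X <= 3)

# Alternative: X = number of vaccinated individuals that are not diseased:
# N = 22 individuals, K = 14 not diseased, n = 10 vaccinated, outcome X = 7
rpv_alt = sum(hypergeom_pmf(x, 22, 14, 10) for x in range(7, 10 + 1))  # P(X >= 7)

print(f"P(X <= 2) = {sum(hypergeom_pmf(x, 22, 10, 8) for x in range(3)):.4f}")
print(f"P(X = 3)  = {hypergeom_pmf(3, 22, 10, 8):.4f}")
print(f"LPV = {lpv:.4f}, RPV (alternative statistic) = {rpv_alt:.4f}")
```

Both tail probabilities describe the same set of 2x2 tables with the given margins, which is why they coincide.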

In practice
People who work a lot with probabilities, e.g. in risk analysis or horse races gambling, do not usually
consider differences between probabilities, but rather the ratio π1/π2 or the odds ratio
$\frac{\pi_1/(1-\pi_1)}{\pi_2/(1-\pi_2)}$. These are not discussed in this course.

Theory to study
Review: O&L Section 4.8 up to p. 165. O&L Section 4.13, p191-193 (skip the continuity correction).
O&L p. 499 – 502, line 7, 504, line 28 – 505, line 6 (or: 10.1, 10.2, but skip WAC interval and z-test).
Study guide and lectures: material on Binomial test.
O&L section 10.3 up to the end of p. 509, Fisher’s exact test on p511, without P-value calculation. Study
guide and lectures: material on Fisher’s exact test.

Exercises for homework


• O&L, Exercise 10.4, p546
• O&L, Exercise 10.6a, p547 NB: consider what a good definition is of P(false positive)
• O&L, Exercise 10.6b, p547 using the Confidence interval from 10.6 a.
• O&L, Exercise 10.10bc, p548 (part b: so π is at most 0.2)
• O&L, Exercise 10.12, p548 π= Prob. of finding a spider. Test Ha: π≠ 0.1, binomial test with PQRS.
Repeat the test if the PhD student (b) found 4 spiders (c) found 8 spiders
• O&L, Exercise 10.14c, p548
• O&L, Exercise 10.15ab, p549

Extra question: A seed producer guarantees that the emergence rate of the seeds is at least 0.8. You buy a
package of 20 seeds, and only 12 emerge. Using a test with α=0.05, can you prove the statement to be false?
See PQRS picture.

• O&L, Exercise 10.20c, p550.


• O&L, Exercise 10.21cd, p550, for c: see PQRS picture
• O&L, Exercise 10.22b, p550
• O&L, Exercise 10.24bc, p551. Use the data in the table below.

10.24 data: Killed all insects?

              yes   no   total
NewFormula    49    26   75
AntKiller     45    30   75
Total         94    56   150

• O&L, Exercise 10.24b, p550. This time suppose that the NewFormula killed all spider ants in 53 out of 75
containers. Again use PQRS to find the P-value for the one-sided test.

Note on PQRS:
Any of the four cell values could be used for the test. In PQRS, however, the Hypergeometric distribution only
gives results for the parameters (150, 75, 56). If the parameter values are too large, PQRS gives an error message.
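
The remark that any of the four cell values can be used for Fisher's exact test can be illustrated with a small Python sketch (not course software; the function name and the parameter order (N, N1, n) are chosen here to resemble the PQRS notation). The sketch assumes a one-sided alternative in which NewFormula kills all insects more often than AntKiller.

```python
from math import comb

def hypergeom_pmf(k, N, N1, n):
    """P(X = k): k marked items in a sample of n drawn without
    replacement from N items, of which N1 are marked."""
    if k < max(0, n - (N - N1)) or k > min(n, N1):
        return 0.0
    return comb(N1, k) * comb(N - N1, n - k) / comb(N, n)

# Cell (NewFormula, yes) = 49: N=150, N1=75 NewFormula containers, n=94 'yes'.
# Large values support Ha, so the p-value is the right tail P(X >= 49).
p_yes = sum(hypergeom_pmf(k, 150, 75, 94) for k in range(49, 76))

# Cell (NewFormula, no) = 26: N=150, N1=75, n=56 'no'.
# Small values support Ha, so the p-value is the left tail P(X <= 26).
p_no = sum(hypergeom_pmf(k, 150, 75, 56) for k in range(0, 27))

print(p_yes, p_no)  # the two one-sided p-values should coincide
```

Both parameter sets describe the same 2x2 table, which is why the two tail probabilities agree.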

COMPUTER PRACTICALS
Introduction:
Each lecture is followed one day later by a computer practical, in which you learn how to carry out on the computer
the analyses discussed in class. In the computer practicals you will use the software programs SPSS, PQRS and R.
You work in pairs (teams of two) during the computer practicals.
The text of the computer practical exercises should speak for itself. At the first practical there will be a short
introduction, but from then on, you can work through the exercises on your own.

Writing answers
Writing answers helps you to learn how to formulate, and may reveal errors in your thinking. A computer program
does not give answers; it just generates the output that you ask for. You have to give the answers to the questions
yourself. That is why you have to write your answers in a readable way in a separate notebook or on a sheet (not in
the study guide). Write them so that the practical teacher or assistant can easily see whether your answer is correct.

Preparation required for every practical


It is important to come prepared, or you will not be able to finish the exercises in time.
Preparation means:
1) study the material discussed in the previous lecture;
2) read the introductions to the exercises of the computer practical and specify which situation (see p. 4/5)
applies for that exercise. This is often sub-question a.

SPSS, PQRS and R.


It is assumed that you know how PQRS works. If you do not, search for it online, download it (very easy), and
practice with it in your own time; it will not take long before you know how it works. It is available on all
WUR computers.
If you have never worked with SPSS before, it is best to go through the SPSS short guide, page 1. You can
find the document on Blackboard under the Practical-tab. Go through p. 1 while you sit at a computer on which
you run SPSS. That should set you up for working with SPSS during the practicals. In the practicals it is often
indicated how to do a certain test, make a graph, etc., when you do it for the first time. In subsequent cases, you
are supposed to remember this or to be able to find that information again.
You can choose to use R instead of SPSS in one exercise per week. We will not teach how to write
computer code in R, but for six exercises we made a small program with commands that you can ‘run’ and that
produce the necessary output. Looking at the code, you may get some idea of how R works. A few generalities
about R programs will be discussed.

Preparation before/at Practical 1.


0. Best is to do this before the first practical, so at the practical you will be all set to go.
1. Create a folder on your M-drive, e.g. “M:\AdvStats\Practical”. The local hard-drive is emptied every day, so do
not place the files there.
2. “Open” the data zip-file from the MAT20306 Blackboard, under the tab “Practical (CP and PPP)”.
3. Then “extract” “all files” into the folder that you created. Working with the data from the zipped file may/will
create problems, e.g. SPSS or R stops for no apparent reason.
4. The files are: data files for SPSS (.sav), Excel files (.xlsx), data files for R (.prn), and script files for R (.r)

Getting started with R-studio (for those who choose to do one exercise per week in R)
1. Open R-studio. In the right hand side of the screen (in the middle) you will see the following:

a) Choose the Files tab.


b) Then click on the 3 dots on the very right side. A screen resembling Windows Explorer appears. Select the
folder you created in the Preparation described above (“M:\AdvStats\Practical”).
c) Click on More (see picture above) and select Set As Working Directory. You will now see in the “console” on the
left that R has recognized a command and executed it: setwd("M:/AdvStats/Practical"). If you type getwd() (get
the working directory) in the console, R will tell you what the current working directory is.
2. Now, in the folder on the right, find the file ASPractical1.R. Click on it, and it will open in the top left part of
the screen. Text preceded by ## gives explanation and comment. The other lines contain commands that can be
executed one by one, or as a group, by selecting the line(s) and choosing Run or pressing Ctrl-R or Ctrl-Enter.
Computer practical 1 t-procedures (Answers should be written down in your notebook.)

AIM: Learn how to use SPSS to carry out t-tests and to make confidence intervals for situations 1, 2 and 3. The
directions for how to do things are given in the SPSS short Guide, chapters 5 and 5a.
1. Dissolved oxygen (5.43) Data are in “Oxygen.sav”
In a river the amount of dissolved oxygen is observed. It is feared that the level is too low due to dumping of
pollutants by a sewage treatment plant (plant = factory). Over a 2-month period, a small bucket of water was taken
from the river 8 times at a location 1 mile downstream, and for each bucket the amount of dissolved oxygen was
determined in parts per million (ppm). The data (yi, i=1,..,8) are in the table. Note that in SPSS they are given in
one (vertical) column, not in a (horizontal) row.

bucket          1     2     3     4     5     6     7     8
Oxygen (ppm)    5.1   4.9   5.6   4.2   4.8   4.5   5.3   5.2

a This is a case of “situation 1”, see p. 4/5. This means that we measured one variable in 1 sample of 8 buckets,
so in the data set we can expect one (vertical) column of 8 measurements. Open the data file in SPSS (File >
Open > Data) and check that the data set looks as was expected. [Note that in real life we might add columns for
e.g. location, date and time of the observation]
b Obtain the sample mean (ȳ) and the sample standard deviation (sy), using Analyze > Descriptive Statistics > Explore.
Read the 95% confidence interval for µ = mean dissolved oxygen level during the 2-month period for the location.
c Get these outcomes again (ȳ, s, and the 0.95-confidence interval for µ) using Analyze > Compare Means >
One-Sample T Test... This should result in the output given below.
Write down the formula for the lower confidence limit in numbers: 4.5735 = 4.95 – …… *……..
One-Sample Statistics

          N    Mean     Std. Deviation   Std. Error Mean
Oxygen    8    4.9500   .45040           .15924

One-Sample Test (Test Value = 0)

                                                            95% Confidence Interval of the Difference
          t        df   Sig. (2-tailed)   Mean Difference   Lower    Upper
Oxygen    31.085   7    .000              4.95000           4.5735   5.3265

d Fish can only survive if the mean dissolved oxygen level is at least 5.0 parts per million (ppm). A lawsuit
against the sewage treatment plant will be started if it is proven that the mean oxygen level is less than 5 ppm.
Test this research hypothesis. Write down: 1) the null hypothesis and the alternative hypothesis, 2) the test
statistic, 3) the distribution of the test statistic if H0 is true, 4) the outcome of the test statistic, 5) the relevant
(left, right or two-tailed) p-value, and 6) the conclusion, also in words. Check the p-value using PQRS.
[You can obtain the correct output by filling in the value “5” in the box for ‘test-value’ in the menu.
Answer: t=-0.31 and the relevant p-value is 0.38]

e The output obtained in question c (see tables above) shows a t-value of 31.1 and a 2-tailed p-value of 0.000.
What is the (nonsense) null-hypothesis and what is the alternative hypothesis that is tested with this outcome?
Answer: We can test H0: ....... =.......... vs. Ha: .................................
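
The numbers in the SPSS output for the oxygen data can be reproduced by hand. The following Python sketch (not course software) shows the arithmetic. The critical value 2.365 is an assumption read from a t-table for 7 degrees of freedom, because the Python standard library has no t-distribution.

```python
from math import sqrt
from statistics import mean, stdev

oxygen = [5.1, 4.9, 5.6, 4.2, 4.8, 4.5, 5.3, 5.2]  # ppm, the 8 buckets

n = len(oxygen)
ybar = mean(oxygen)      # sample mean
s = stdev(oxygen)        # sample standard deviation (divisor n - 1)
se = s / sqrt(n)         # standard error of the mean

def t_stat(mu0):
    """One-sample t statistic for the test value mu0."""
    return (ybar - mu0) / se

T_CRIT = 2.365  # assumed: upper 0.025 point of t with 7 df, from a t-table
lower = ybar - T_CRIT * se
upper = ybar + T_CRIT * se

print(f"mean={ybar:.4f}  sd={s:.5f}  se={se:.5f}")
print(f"t (test value 0) = {t_stat(0):.3f}")
print(f"t (test value 5) = {t_stat(5):.3f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f})")
```

The t-value for test value 0 corresponds to the SPSS default output, and the t-value for test value 5 to the test in question d.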

2. New and old alloy steel beams Data are in “Steel_beams.sav”.


A new alloy is proposed for the manufacture of steel beams. A study is designed to compare the strength of the new
alloy to the currently used alloy. Ten beams of each type of alloy are manufactured. The load capacities (y, in tons)
for the 20 beams are determined and are given here. It is assumed that the observations can be regarded as
independent random drawings from two Normal populations with the same standard deviations.

a What are the experimental units? Which situation is applicable here (see p. 4/5)? Take a look at the data also
using View > Value labels. The data are displayed in two ways, as 20 independent observations (20 rows) and as
10 pairs. Which one is correct?
b1 Is the mean load capacity of the new alloy different from the mean load capacity of the old alloy (α = 0.05)?
Go through steps 1 through 5 of the test. Then obtain the necessary SPSS output using: Analyze > Compare
Means > … T Test... Now finish the test by giving the outcome of the test statistic, the p-value of your test and
the conclusion. (Hint 1: in the SPSS menu use the columns with 20 values for strength and for group. Hint 2:
after entering the Grouping Variable, click Define Groups, and enter the values for the 2 groups as in the data
set. Here these values are 1 and 2.)
b2 Also give the 95% confidence interval for the mean difference in strength between the two alloys.
c The beams produced from the new alloy are more expensive than the beams produced from the currently used
alloy. Thus, the new alloy will be used only if the mean load capacity is more than 5 tons greater than the mean
load capacity of the currently used alloy. Based on this information, would you recommend that the company
use the new alloy ? What is H0 and Ha? In your formulation, use the parameter of interest:
H0: ................. = ....., Ha: ................. > .............
d Look at the result of Levene’s test. What is H0 in Levene’s test? What is the conclusion of the test?

Levene’s test is a test of the hypothesis H0: σ1² = σ2². A low p-value (sig.) indicates that the variances
differ. Levene’s test can also be used to compare more than two variances. The test will be explained in week 4.
For the case that the two variances are different, SPSS also provides the outcome of t' (notation of the book) and
the number of degrees of freedom that is used in the so-called Satterthwaite approximation (Equal variances
not assumed). This output is in the 2nd line of the table.
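
To make the difference between the two lines of the SPSS table concrete, the following Python sketch computes the pooled t statistic and the t' statistic with Satterthwaite degrees of freedom from summary statistics. It is an illustration only: the function names are made up, and the means and standard deviations used in the example call are hypothetical, not the values in “Steel_beams.sav”.

```python
from math import sqrt

def pooled_t(m1, s1, n1, m2, s2, n2):
    """t statistic and df when equal variances are assumed."""
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, n1 + n2 - 2

def satterthwaite_t(m1, s1, n1, m2, s2, n2):
    """t' statistic and approximate df when equal variances are not assumed."""
    v1, v2 = s1**2 / n1, s2**2 / n2
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return (m1 - m2) / sqrt(v1 + v2), df

# Hypothetical summary statistics: mean, sd and n for each of two groups
t_p, df_p = pooled_t(26.0, 2.0, 10, 24.0, 3.0, 10)
t_w, df_w = satterthwaite_t(26.0, 2.0, 10, 24.0, 3.0, 10)
print(f"pooled: t={t_p:.3f}, df={df_p}")
print(f"t':     t={t_w:.3f}, df={df_w:.2f}")
```

With equal sample sizes the two t-values coincide and only the degrees of freedom differ.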

3. Reading time and comprehension Data are in “Reading time.sav”


A new program for individual reading is evaluated in the fourth grade at a very large elementary school. Half of the
pupils use the old program, the other half use the new program. From each group a random sample of pupils is
taken. At the end of the year, all sampled pupils read a standardized passage, and answer a set of questions.
Reading times (minutes) and comprehension scores (0-100) are given in the data file.

a Open the data file in SPSS and look at the way the data are organized.
What are the experimental units in this research (the units receiving a treatment)? What types of response
variables (binary, nominal, numerical discrete, numerical continuous) are used?

b First there is interest in the mean time required to read the passage. Test if the New reading program leads to
shorter required reading time at α=0.05. Write the outcome of the Test Statistic, the relevant distribution to
compare it with, the relevant P-value and the conclusion.
c Make a QQ-plot for the reading time per group (Analyze > Descriptive Statistics > Explore; use factor = Group, and
in the Plots menu check the box “Normality plots with tests”; see picture). Could the observations come from a
Normal distribution?
d Give a 95% CI for the difference in mean comprehension. Argue why a difference in mean comprehension is
shown or is not shown.

4. Checking the assumption of Normality


3 data files: “Normal and Exponential data n=….sav”
If a t-procedure is applied, the calculated confidence interval and the calculated P-value are only exactly correct if
the data come from a Normal distribution. This assumption can be checked using the data, e.g. through a Q-Q plot.
We will demonstrate that this is only useful if the number of observations (n) is large enough, e.g. n≥20. We will
use 3 data sets, each with 10 simulated variables, five from a N(20,5) and five from an Exponential(0.05)
distribution.

Open the file with n=50. In Explore, choose all 10 variables in the dependent list, and no factor; then under display
choose Plots. Under Plots check the Normality plots with tests box; under Boxplots, choose Dependents together,
and do not check other plots. (Continue - OK)

In the output you will see three things of interest. 1. P-values for two tests on Normality are shown in the second
table in the output. Are the conclusions from the Shapiro-Wilk test correct all 10 times? 2. A QQ plot for each
variable: check that the first five show a pattern in line with Normality and the rest do not. 3. At the bottom: a
side-by-side Boxplot for the ten variables. You can clearly see the difference, e.g. in symmetry of the distributions.
Repeat the process for n=8 and n=20. Check that for n=8 it is much more difficult to distinguish the Normal data
from the Exponential data.

[PS. To obtain random Normal data, use: Transform-Compute


Computer practical 2 t-procedures and non-parametric tests
(Please write down your answers)
AIM: Learn how to use SPSS to carry out tests for situations 1a, 2a, 3a. The directions for how to do things are
given in the SPSS short Guide, chapters 5 and 5a.
1. Change in pH after mining Data are in “Coal mining.sav”
After mining for coal, the mining company is required to restore the land to its condition prior to mining. One of
the many factors that is considered is the pH of the soil. (The pH is important for the types of plants that will
survive). The area was divided into over 1000 grids before the mining took place. Fifteen grids were randomly
selected and the soil pH was measured before mining. When the mining was completed, the land was restored and
on the same 15 grids pH was measured again.

a What are the sampling units? How many units are there? Does it correspond to the number of rows in the data
set? (General Rule: Number of rows in SPSS data = Number of independent experimental/sampling units).
Which situation applies here: situation 2 or 3?
b Produce output for the t-test for H0: "no mean difference in pH before and after" against the alternative that
there is a difference.
b1 Carry out (α = 0.05) the appropriate t-test: give test statistic, and, using SPSS output, outcome of test-statistic,
p-value and conclusion.
b2 Give the confidence interval for the difference in mean pH before and after. Explain how the CI confirms or
does not confirm the conclusion drawn under b1.
b3 Generate the difference d between after and before using
Transform-Compute. Use the 1-sample t-test to arrive at the
same conclusion as in b1
c Which graph(s) is (are) useful to find out if the t-test is valid? Make this/these graph(s). Conclusion?
d Suppose you would want to do a new investigation with the aim to test for an increase in mean pH with
α=0.05. If the true increase is 0.06, you want the power of the test to be at least 0.8. Write the sample size
formula (end of chapter 6), and calculate the required sample size.
e Test if the median of the difference in pH between before and after is zero, using a non-parametric test. Click on
Analyze > Nonparametric Tests > …...(make the appropriate choices)
After obtaining the output, double-click on it in order to obtain more details.
Write down the (definition of the) test statistic in the test, the outcome of that test-statistic and the p-value. Is
the conclusion in line with your answer in question b1?

2. Plant density after oil spill (p 292 and 326) Data are in “oil spill.sav”
On January 7, 1992, an underground oil pipeline broke and caused the contamination of a marsh along the Chiltipin
Creek in Texas, USA. The cleanup process consisted of burning the contaminated regions in the marsh. To evaluate
the influence of the oil spill on the mean flora density (μ), researchers studied plant growth 1 year after the burning.
They measured flora density in 40 randomly selected sites in the uncontaminated (1=control) region and in 40
randomly selected sites in the contaminated (2=burned) region.
Independent Samples Test (plant density)

Levene's Test for Equality of Variances: F = 5.209, Sig. = .025

t-test for Equality of Means (Lower and Upper: 90% Confidence Interval of the Difference)

                              t       df       Sig. (2-tailed)   Mean Difference   Std. Error Difference   Lower   Upper
Equal variances assumed       3.821   78       .000              11.55             3.023                   6.518   16.582
Equal variances not assumed   3.821   64.103   .000              11.55             3.023                   6.505   16.595

a Open the spreadsheet. Take a quick look at the data also using View > Value labels. Is the structure of the dataset
correct for the case of 2 independent samples? (yes / no; if no, why not?)
b Get the SPSS output for the 2-samples t-test. Inspect the summary statistics for the response variable in both
groups in the table Group Statistics. Compare the values with those on p327.
c1 For the outcome of the test statistic and the p-value: do you use the top line of the table or the bottom line?
Why? Does it matter in this case?
c2 Test H0 : µ1 = µ2 against H1: µ1 > µ2 at α = 0.05, using the output that you generated under b. Give the
outcome of the test-statistic, the p-value for the one-sided problem and the conclusion of the test (in words).
d Determine the 0.95 confidence interval for µ1 – µ2 . Which conclusion follows for the test with Ha: µ1-µ2≠ 0?
e We now wish to apply a nonparametric test. Ha: the plant density after oil spill is systematically lower than in
similar, but uncontaminated, areas. Click on Analyze > Nonparametric Tests > …...(make the appropriate
choices). Note that the Mann-Whitney test is the equivalent of Wilcoxon’s rank sum test.
After obtaining the output, double-click on it in order to obtain more details.
Write down the (definition of the) test statistic of the Rank sum test, the outcome of that test-statistic and the p-
value. Is the conclusion in line with your answer in question c?
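
The confidence limits in the SPSS table follow directly from the mean difference and its standard error. The Python sketch below (not course software) redoes that arithmetic for the line “Equal variances assumed”. The two critical values are assumptions taken from a t-table for 78 degrees of freedom.

```python
diff = 11.55   # Mean Difference from the SPSS table
se = 3.023     # Std. Error Difference from the SPSS table

T_05 = 1.6646   # assumed: upper 0.05 point of t with 78 df (for a 90% CI)
T_025 = 1.9908  # assumed: upper 0.025 point of t with 78 df (for a 95% CI)

t_value = diff / se
ci90 = (diff - T_05 * se, diff + T_05 * se)
ci95 = (diff - T_025 * se, diff + T_025 * se)

print(f"t = {t_value:.3f}")
print(f"90% CI: ({ci90[0]:.3f}, {ci90[1]:.3f})")
print(f"95% CI: ({ci95[0]:.3f}, {ci95[1]:.3f})")
```

The 90% interval should match the limits shown in the table; the 95% interval is wider because a larger critical value is used.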

3. Price discrimination Data are in “Car repair costs.sav”.


A study was conducted to determine whether automobile repair charges are higher for female than for male
customers. Twenty car repair shops were randomly selected from the telephone book. Two cars of the same age,
brand and engine problem were used in the study. For each repair shop, the two cars were randomly assigned to a
man and a woman participant and then taken to the shop for an estimate of repair cost. These estimated repair costs
(in dollars) are given in the data file.
a What are the units in this study? Are they sampling units or experimental units? Which situation applies here?
Is the lay-out of the data set in line with your answers?
b1 Have SPSS calculate (Transform – Compute) the variable d, the difference in costs between Female and Male
customers. Which procedure, t or Wilcoxon is more appropriate in this situation? Why ?
b2 Are repair costs generally higher for female customers than for male customers ? Use α = 0.05.
Give the outcome of the test-statistic, the relevant distribution to which the outcome is compared, the relevant
p-value (L/R/2-tailed?) and the conclusion in words.
c1 The time that it takes before the repair is done is also observed. We want to test if there is a difference in repair
time. Again, check which test (t-test or a non-parametric test) is most appropriate.
c2 Carry out the most appropriate test. Give outcome of the test statistic. What is the expected value of the test
statistic if there is no systematic difference in repair time for female and male customers? Give the relevant p-
value, and the conclusion.
c3 Also carry out the less appropriate test. Give p-value, and compare with the p-value found in c2.
Computer practical 3 Binary data
AIM 1. Learn how to use SPSS / PQRS for doing inference on one proportion (Binomial test), and two
proportions (Fisher’s exact test). 2. Practice making a confidence interval for one proportion and for the
difference between two proportions by hand.

1. Unfair inspection of sports cars? Data are in “Sports car inspection.sav”


Sports car owners complain that the state of Arizona inspects their cars more strictly than family type cars. Previous
records indicate that 70% of all family type non-sports cars pass the inspection in Arizona. (This is taken as a fixed
known number.) Now it appears that in a random sample of 150 sports cars in Arizona, 90 pass the inspection,
and 60 fail. The question is whether the population fraction of passes for sports cars (π) is lower than 0.7. We define
the variable pass as 1 if the car passes the inspection, and as 0 if it does not.
In the data set it seems as if there are only 2 observations.

a1 What are the sampling units in this investigation? How many are there? Use Analyze > Descriptive Statistics >
Descriptives for the variable pass. The mean is 0.5, and n=2. Choose Data > Weight Cases. After weighting cases
by Freq, try Descriptives again. What is n this time? What does the mean (0.6) represent in practical terms?
a2 Give estimator (formula) and estimate for π.
b1 (binomial test using PQRS) We want to test H0: π=0.7 vs. Ha: π<0.7. The test statistic is the number of ‘successes’
in the sample. Write down steps 2, 3, 4 and 5 of the test. Give the outcome of the test statistic and use PQRS to find
the exact p-value. See PQRS graph below.
Check that you get the same result in PQRS with p=0.3 and outcome 60 (the number of cars that fail the test).
b2 (binomial test SPSS – approximate p-value) Produce output for the binomial test in SPSS: Analyze >
Nonparametric Tests > One sample. Under Settings choose Binomial and under Options Test Proportion 0.70 and
Success values 1. Double-click the output-table to get more details. This also helps to check the output. See
output at the bottom below.
b3 (binomial test SPSS - exact p-value) This time use: Analyze > Nonparametric Tests > Legacy Dialogs > ……
Choose 0.7 as test proportion and click on Exact, etc. In the output check the Observed vs. Test proportion.
This should be 0.6 vs 0.7. [If it is 0.4 vs. 0.7, you should change 0.7 into 0.3 in the menu choices.] Double-click
the table, read the exact 1-tailed p-value and check that it is 0.0057. See Binomial test output below.

SPSS gives an asymptotic p-value (0.005) based on the z-test that we do not discuss in the course.
It can also give (b3) the exact one-tailed p-value. The exact p-value based on the Binomial test is 0.0057.
c Use the formula in the book for the approximate 0.95 confidence interval for π and calculate the limits using
e.g. Excel. An exact 0.95 confidence interval is given above. This so-called Clopper-Pearson confidence
interval is not discussed in the course.
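
The exact binomial p-value of exercise 1 and the approximate confidence interval of question c can be checked with a few lines of Python (not course software; the helper name is made up). The interval below assumes the usual large-sample formula p̂ ± 1.96·√(p̂(1−p̂)/n), which is presumably the formula meant in the book.

```python
from math import comb, sqrt

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, passes = 150, 90

# H0: pi = 0.7 vs Ha: pi < 0.7, test statistic = number of passes
p_pass = sum(binom_pmf(k, n, 0.7) for k in range(0, passes + 1))

# Same test seen from the failures: p = 0.3, outcome 60, right tail
p_fail = sum(binom_pmf(k, n, 0.3) for k in range(n - passes, n + 1))

# Approximate 0.95 confidence interval for pi
p_hat = passes / n
half_width = 1.96 * sqrt(p_hat * (1 - p_hat) / n)

print(f"exact p-value: {p_pass:.4f} (via failures: {p_fail:.4f})")
print(f"approx. 95% CI: ({p_hat - half_width:.4f}, {p_hat + half_width:.4f})")
```

Counting passes with π=0.7 or failures with π=0.3 describes the same event, so both tail probabilities are identical.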
2. Comparison of two probabilities. Data file: “Instruction and exam result.sav”
An educational researcher wants to compare teaching English using a computer software
program to the traditional classroom system. She thinks the computer aided method will be
better than the traditional one, that is, this method will result in a higher fraction of students
passing a test.
The researcher randomly assigns 60 students from a class of 100 to instruction using the
computer. The remaining 40 students are instructed using the traditional method. At the end of a 6-week period, all
100 students are given an exam with the results (1=pass, 0=fail) in the table.

a What are the experimental units in this study?


Open the data set. How many rows does it have? Is that in line with the number of experimental units? (No!)
Choose Data > Weight Cases. After weighting cases by Count, and again choosing Data > Weight Cases, you should
see this picture. By weighting the cases using “Count”, SPSS understands that there are not just 4 experimental
units, but 100 (16+44+17+23).
[If you see the Current Status as in this picture, click “Cancel”; otherwise choose: Weight cases by Count.]
b1 Fisher’s exact test. First write the observed numbers of observations nij in the contingency table below.
Then define an appropriate test-statistic for that test and give the outcome. Write these down. Note: there are 4
possible correct definitions and outcomes. What is the behavior of the test statistic, if the researcher’s idea is
correct? (Conclusion of the test will follow in b2 and b3.)

                Result
Instruction     Pass   Fail   Total
Computer
Traditional
Total                         100
b2 Use PQRS: Open PQRS and choose Hypergeometric distribution with N=100, N1=Number of students with
one of the instruction methods, n=number of students with a certain result (either fail or pass). Then fill up the
outcome and read the appropriate p-value. Is the educational researcher right in what she thinks about the two
methods (α=0.05)?

b3 Use SPSS: Analyze > Descriptive Statistics > Crosstabs; choose Statistics and check the Chi-square box. Then read
the 1-tailed p-value for Fisher’s exact test. Check that this p-value is in line with what you found using PQRS.
c1 Calculate the 0.95 confidence interval (z-procedure) for the difference in fraction of students passing the exam
between the two methods of instruction using calculator or Excel.
c2 Use the two-independent-samples t-test to compare the Instruction methods. Read the 0.95 confidence interval
for the difference in fraction of students passing the exam between the two methods of instruction. Check that
this interval is similar to the one you calculated in c1.
3. The Power of a test file “Simulation Power 2 samples.xlsx”
A new alloy is proposed for the manufacture of steel beams (Practical 1, exercise 2). The following procedure is
carried out. For both types of alloy, ten beams are randomly selected from the produced beams. Their strengths are
measured, the data are entered in SPSS and finally a t-test is done to conclude if the new alloy gives stronger beams.

Now we can ask how good this procedure is. For example, if the real mean difference between the two alloys is 2
(e.g. 26 vs. 24), will the test give a significant result? This question can be answered if we also know what the
spread (the standard deviation) is between the beams for each alloy.
Another, more general question is: which number of beams should we choose?

Open the file “Simulation Power 2 samples.xlsx”. In this file it is simulated that 200 experiments are carried out. In
this way we get an insight in what would happen if we repeat an experiment many times. To be able to simulate the
data we need to know what the means in the populations are and what the standard deviation is for both
populations. We assume here that the standard deviations are equal. We also have to specify the sample size. Then
we simulate what happens if we repeat the procedure 200 times. We can then see how many times out of 200,
H0:µ1- µ2 =0 is rejected vs Ha: μ1-μ2>0. The relative frequency of rejecting H0 is a simulated value of the power of
the test. You can repeat the calculations by pressing F9 (Calculate now).
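
The idea behind the Excel simulation can also be sketched in Python (not course software). This is a minimal illustration under stated assumptions, not a copy of the spreadsheet: the function name is made up, the number of repetitions is set to 2000 instead of 200, and the critical value 1.734 is read from a t-table for 18 degrees of freedom (n = 10 per group, one-sided α = 0.05).

```python
import random
from math import sqrt
from statistics import mean, variance

def simulate_power(delta, sigma, n, t_crit, reps=2000, seed=1):
    """Fraction of simulated experiments in which H0: mu1 - mu2 = 0 is
    rejected in favour of Ha: mu1 - mu2 > 0 (pooled two-sample t-test)."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        new = [rng.gauss(delta, sigma) for _ in range(n)]  # population 1
        old = [rng.gauss(0.0, sigma) for _ in range(n)]    # population 2
        sp2 = (variance(new) + variance(old)) / 2          # pooled variance (equal n)
        t = (mean(new) - mean(old)) / sqrt(sp2 * 2 / n)
        if t > t_crit:
            rejections += 1
    return rejections / reps

T_CRIT_18 = 1.734  # assumed: upper 0.05 point of t with 18 df
power = simulate_power(delta=1, sigma=2, n=10, t_crit=T_CRIT_18)
print(f"simulated power: {power:.3f}")
```

Changing the seed plays the same role as pressing F9 in Excel: the simulated power varies somewhat from run to run. For another sample size n, the critical value has to be replaced by the one for 2n−2 degrees of freedom.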

a1 Choose n = 10. Find out what the power of the t-test is if the real mean difference between the two alloys is
equal to 1 and if the standard deviation is 2.
Finish (in your notebook or on your note-sheet) the following statement:
Assuming σ = 2, α=0.05, if the real mean difference in strength between the two alloys is 1, in an experiment
with 10 beams for each treatment we will reject H0 of equality of the two mean strengths with probability .....
In other words, the power of this test-procedure is……………..

So, assuming σ=2, if a real mean difference of 1 would be relevant, is this a good experiment?

a2 The power of the test is often displayed as a function of the true mean difference between the two treatments or
populations. Find the simulated values of the power in case the true mean difference is 2, 3, 4, 6, 10 (still
assuming that n=10, α=0.05, and σ=2).

µ1-µ2    1    2    3    4    6    10
Power

b We now vary the standard deviation σ (values: 1, 2, 3, and 4). We see how the power changes. Fill in the table
below, assuming µ1-µ2 = 2. Is it correct to say that the power depends on σ and µ1-µ2 through their ratio
σ/(µ1-µ2)?

σ        1    2    3    4
Power

c In a and b it is demonstrated that the power of a test depends on the true mean difference and on the spread in
the population. These values cannot be varied by the researcher. The only thing that the researcher can change
is the sample size, n.
To decide on n, one has to have an idea of the standard deviation σ. This information has to come from
previous research (own research or published research). Let us assume that σ=2.
For µ1-µ2 we usually choose the minimum relevant difference. So we ask ourselves: what is the smallest
µ1-µ2 that would be relevant in practice? For example, a difference in mean yield of 1 kg/ha (8000 vs. 8001
kg/ha) is not relevant, whereas a difference of 1000 kg/ha would be very relevant. Somewhere in between these two
values one needs to pick the minimum difference that is still relevant.

In our example, let us choose 1.5 to be the minimum relevant difference. If we require a power of 0.8 for the
one-sided test, what would be the required sample size? Try various values for n, and choose the sample size
that fulfils your precision requirements. Note that the simulated power can vary somewhat. To get a somewhat
more precise value of the power, you can press F9 a few times.

n        ..   ..   ..   ..   ..   ..
Power

d Compare the outcome of c with the outcome of the formula (p. 332) for the sample size (for a 1-sided Ha).
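
For the comparison in d, the sample size calculation can be sketched as follows (Python, not course software). The sketch assumes that the formula referred to is the usual normal-approximation formula for two independent samples with a one-sided alternative, n = 2σ²(zα + zβ)²/Δ² per group; check this against the book before relying on it.

```python
from math import ceil
from statistics import NormalDist

def n_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Sample size per group for a one-sided two-sample test,
    using n = 2 * sigma^2 * (z_alpha + z_beta)^2 / delta^2 (assumed formula)."""
    z = NormalDist()                 # standard Normal distribution
    z_alpha = z.inv_cdf(1 - alpha)   # upper alpha quantile
    z_beta = z.inv_cdf(power)        # quantile belonging to the required power
    return ceil(2 * sigma**2 * (z_alpha + z_beta) ** 2 / delta**2)

n_required = n_per_group(sigma=2, delta=1.5)
print(n_required)
```

Because the formula uses Normal quantiles instead of t quantiles, the result can be slightly smaller than the sample size found by simulation.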
4. Marker for greening in potato Data are in: “Marker for greening.sav”
In an experiment 151 potato cultivars are used to find associations of all kinds of traits with genetic markers. One
such genetic marker is binary: it has levels A and B. One of the investigated traits is greening: the phenomenon that
the potato tuber turns green if it is exposed to sunlight. Out of the 151 cultivars, 96 have marker value A (we call
them A-cultivars), of which 63 have the greening trait and 33 do not. Of the 55 B-cultivars, 26 have the greening
trait and 29 do not.
a Open the data file. What are the units in this study? Are they randomly selected? How many units do we
have? Use Weight cases to feed this information to SPSS. [Note: after using the variable Freq in this way, one
should use it no more in any analysis. ]
b Give a 95% confidence interval for the fraction of greening cultivars for the A-cultivars and for the B-cultivars.
You can do that in one command in Explore using marker as factor. Do the two intervals overlap? Is overlap
indicative of a difference that is not significant? [Note: you can also do this with Data > Split File by Marker
followed by a one-sample T-test. This gives separate output for each Marker level.]
c1 Test if the fraction of A-cultivars with the greening trait is the same as for B-cultivars. Use the exact test of
Fisher as follows: Analyze > Descriptive Statistics > Crosstabs; choose Statistics and check the Chi-square box.
c2 Use PQRS and the hypergeometric (N=151, N1=96, n=62)-distribution to find the p-value of Fisher’s exact
test, when the outcome is 33. Check that the p-value for outcome 29 in the Hypergeometric(151, 62, 55) is the
same. See also the pictures of this distribution in the study guide, Lecture 3.
Note: using e.g. hypergeometric (N=151, N1=62, n=96) should give the same result, but because the numbers are so
large you will receive an error message in that case.
d Give an approximate 95% confidence interval for the difference in fractions of greening cultivars between A
and B (using the two-independent samples t-test SPSS Guide 5.3). How is the conclusion of c confirmed?
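
The equivalence claimed in c2 and an approximate interval for d can be checked with the Python sketch below (not course software; the function name is made up, and the parameter order (N, N1, n) is chosen to resemble PQRS). The p-values are one-tailed. The interval uses the large-sample z-procedure for two proportions, which is an assumption here: the t-test route suggested in d gives a similar but not identical interval.

```python
from math import comb, sqrt

def hypergeom_pmf(k, N, N1, n):
    """P(X = k): k marked items in a sample of n drawn without
    replacement from N items, of which N1 are marked."""
    if k < max(0, n - (N - N1)) or k > min(n, N1):
        return 0.0
    return comb(N1, k) * comb(N - N1, n - k) / comb(N, n)

# (N=151, N1=96, n=62), outcome 33: A-cultivars among the 62 non-greening ones
p_via_a = sum(hypergeom_pmf(k, 151, 96, 62) for k in range(0, 34))   # P(X <= 33)

# (N=151, N1=62, n=55), outcome 29: non-greening cultivars among the 55 B-cultivars
p_via_b = sum(hypergeom_pmf(k, 151, 62, 55) for k in range(29, 56))  # P(X >= 29)

# Approximate 95% CI for the difference in greening fractions (A minus B)
p_a, n_a = 63 / 96, 96
p_b, n_b = 26 / 55, 55
se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
lower = (p_a - p_b) - 1.96 * se
upper = (p_a - p_b) + 1.96 * se

print(p_via_a, p_via_b)
print(f"approx. 95% CI: ({lower:.4f}, {upper:.4f})")
```

An outcome of 33 A-cultivars among the 62 non-greening cultivars implies 29 non-greening B-cultivars, so the left tail in the first distribution equals the right tail in the second.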

Pen and Paper Practicals

The aim of the PPPs is to help you digest the material offered. Their form closely resembles the
set-up of the exam. The PPP meetings on Friday are meant for working on these exercises. You can also
(partly) prepare them at home and ask questions about parts that are not clear, or you can ask
the teachers to check your answers.

PPP week 0

These are home exercises for probabilities and quantiles with tables 1 and 2. Make a sketch (with a
few numbers on the x-axis) of the relevant distribution.
In all cases, you can use PQRS or your graphics calculator to check the answers.
A. Use table 1. Calculate the following probabilities using table 1 or your calculator. X ~N(0,1)
P(X>1.5) P(0<X<1.5) P(-1<X≤1.5)

B. Suppose Y ~ N(30, 10). Use the z-transformation z=(Y-30)/10 and table 1 to calculate:
P(Y>45) P(20 < Y < 40)

C. For which V is: P(X>V) = 0.2, if X ~ N(0,1)?
For which W is: P(X<W) = 0.4?
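If neither PQRS nor a graphics calculator is at hand, the answers to A, B and C can also be checked with Python's standard library. This is only a sketch: the variable names are illustrative, and N(30, 10) is read as mean 30 and standard deviation 10, in line with the z-transformation given in B.

```python
from statistics import NormalDist

Z = NormalDist()                  # standard normal, N(0, 1)
Y = NormalDist(mu=30, sigma=10)   # assumption: 10 is the standard deviation

# A: probabilities for X ~ N(0, 1)
p_a1 = 1 - Z.cdf(1.5)             # P(X > 1.5)
p_a2 = Z.cdf(1.5) - Z.cdf(0)      # P(0 < X < 1.5)
p_a3 = Z.cdf(1.5) - Z.cdf(-1)     # P(-1 < X <= 1.5)

# B: probabilities for Y ~ N(30, 10)
p_b1 = 1 - Y.cdf(45)              # P(Y > 45), same as P(Z > 1.5)
p_b2 = Y.cdf(40) - Y.cdf(20)      # P(20 < Y < 40), same as P(-1 < Z < 1)

# C: quantiles; P(X > V) = 0.2 is the same as P(X <= V) = 0.8
v = Z.inv_cdf(0.8)
w = Z.inv_cdf(0.4)

for name, value in [("P(X>1.5)", p_a1), ("P(0<X<1.5)", p_a2),
                    ("P(-1<X<=1.5)", p_a3), ("P(Y>45)", p_b1),
                    ("P(20<Y<40)", p_b2), ("V", v), ("W", w)]:
    print(f"{name:14s} {value:.4f}")
```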

D. In a t-test:
1) under H0 t ~ t(20). Outcome is 1.3. RPV=0.104. Give LPV and 2-tailed PV.
2) under H0 t ~ t(10). Outcome is -0.7, 2-tailed PV is 0.5. Give LPV and RPV.
3) under H0 t ~ t(1945). Outcome of t is positive, 2-tailed PV is 0.09. Give LPV and RPV.
4) under H0 t ~ t(15). LPV is 0.968. Give RPV and 2-tailed PV. What is the outcome
approximately? Use table 2 in O&L.
5) under H0 t ~ t(15). Give the Rejection Region for the 2-sided test, α=0.05.
6) under H0 t ~ t(15). Give the Rejection Region for the left-sided test, α=0.05.
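The relations between the left p-value (LPV), the right p-value (RPV) and the 2-tailed p-value used in part D can be written out as a small sketch. It relies only on the t-distribution being continuous and symmetric around 0. The helper names are illustrative.

```python
def tails_from_rpv(rpv):
    """From the right p-value, return (LPV, RPV, 2-tailed PV)."""
    lpv = 1 - rpv                      # both tails together cover everything
    return lpv, rpv, 2 * min(lpv, rpv)

def tails_from_two_tailed(pv2, outcome_positive):
    """From the 2-tailed p-value and the sign of the outcome, return (LPV, RPV)."""
    small = pv2 / 2                    # tail on the same side as the outcome
    if outcome_positive:
        return 1 - small, small        # small tail is the right tail
    return small, 1 - small            # small tail is the left tail

# Item 1: t ~ t(20), outcome 1.3, RPV = 0.104
lpv, rpv, pv2 = tails_from_rpv(0.104)
print(round(lpv, 3), round(pv2, 3))

# Item 2: t ~ t(10), outcome -0.7, 2-tailed PV = 0.5
print(tails_from_two_tailed(0.5, outcome_positive=False))
```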

E. Use PQRS to answer the following questions


1) Suppose X1 ~ F(3, 24). What is P(X1>2) and P(X1>4) and P(X1≥4)?
2) Suppose X2 ~ χ²(3) (chi-square distribution with df=3): what is P(X2>2), P(X2>5.5)?
3) Suppose X3 ~ Binomial(24, 0.3). What is P(X3=3.5), P(X3> 7.2), P(X3>4), P(X3≥4)?
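Item 3 turns on the fact that the binomial distribution is discrete, so ">" and "≥" give different answers and non-integer values have probability 0. A minimal Python sketch of this (helper names are illustrative; PQRS remains the tool the exercise asks for):

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p); 0 for values X cannot take."""
    if k != int(k) or k < 0 or k > n:
        return 0.0
    k = int(k)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_upper(c, n, p, strict=True):
    """P(X > c) if strict is True, otherwise P(X >= c)."""
    return sum(binom_pmf(k, n, p) for k in range(n + 1)
               if (k > c if strict else k >= c))

n, p = 24, 0.3
print("P(X=3.5) =", binom_pmf(3.5, n, p))             # not an integer
print("P(X>7.2) =", round(binom_upper(7.2, n, p), 4))  # same as P(X >= 8)
print("P(X>4)   =", round(binom_upper(4, n, p), 4))
print("P(X>=4)  =", round(binom_upper(4, n, p, strict=False), 4))
```

For the continuous F-distribution in item 1 there is no such difference: P(X1>4) and P(X1≥4) coincide.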

PPP week 1
In all tests, if α is not mentioned, use α =0.05

1. We take a random sample of 4 maize plants and measure N-content in the leaf. Observations for
y are: 12, 7, 8, 5. We assume y ~ N(μ, σ).
A. Calculate sy and ȳ.
B. Give a 0.95 Confidence Interval (CI) for µ, the population mean of y.
C. Test with the CI if µ could be equal to 10, or not. Give H0 and Ha, also give the conclusion and
the argument for the conclusion.
D. Test with the 8 steps, Rejection Region method, the research hypothesis that μ differs from 10.
E. Which sample size is needed to make a 0.95 CI for μ with an Error Margin of 2.5? Assume σ=3.
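The calculations for parts A, B and E can be checked with the sketch below. The t-value 3.182 (df = 3, right-tail area 0.025) is copied from a t table by hand, and part E uses the normal quantile because σ is assumed known. Both choices are assumptions of this sketch, not prescriptions from the course.

```python
from math import sqrt, ceil
from statistics import mean, stdev, NormalDist

y = [12, 7, 8, 5]
n = len(y)

ybar = mean(y)          # sample mean
s = stdev(y)            # sample standard deviation (divisor n - 1)
se = s / sqrt(n)        # standard error of the mean

T_CRIT = 3.182          # t table: df = n - 1 = 3, right-tail area 0.025
ci = (ybar - T_CRIT * se, ybar + T_CRIT * se)

# Part E: sigma assumed known, so the normal quantile is used
sigma, margin = 3, 2.5
z = NormalDist().inv_cdf(0.975)
n_needed = ceil((z * sigma / margin) ** 2)   # round up to a whole unit

print(f"ybar = {ybar}, s = {s:.3f}, se = {se:.3f}")
print(f"0.95 CI for mu: ({ci[0]:.2f}, {ci[1]:.2f})")
print("sample size for error margin 2.5:", n_needed)
```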

2. On a field 16 maize plants are randomly selected. For each plant the N-content in the stem is
determined. The research hypothesis is that the mean of all plants in the field is higher than 5.
To test this hypothesis we will use a t-test.
A. What are the four defining elements for this t-test? (The 4 elements are: Parameter of interest,
estimator, standard error of the estimator, and degrees of freedom for the relevant t-
distribution.)
From the data, we find: ȳ = 6 and sy = 2.
B. Go through the 8 steps of a test using a Rejection Region.
C. Write the 8 steps of the test again, but now with the p-value method. Find P-value with PQRS.
D. Suppose you want to do another experiment to test if the mean is more than 5, so a one-sided
test. Which sample size is needed to achieve the following precision: if the real mean is 6 (or
more), then the power of the test should be at least 90% (while α=0.05)?
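A sketch of the arithmetic behind parts B and D. The critical value 1.753 (df = 15, right-tail area 0.05) is taken from a t table. For D the usual normal-approximation formula n = σ²(z_α + z_β)²/Δ² is used, with σ = 2 borrowed from the sample; both the formula and that choice of σ are assumptions of this sketch.

```python
from math import sqrt, ceil
from statistics import NormalDist

# Part B: one-sample t-test of H0: mu <= 5 against Ha: mu > 5
n, ybar, s, mu0 = 16, 6, 2, 5
se = s / sqrt(n)
t_stat = (ybar - mu0) / se
T_CRIT = 1.753                 # t table: df = 15, right-tail area 0.05
reject_h0 = t_stat > T_CRIT    # Rejection Region: t > 1.753

# Part D: sample size for power 0.90 at a true mean of 6 (delta = 1)
alpha, power, delta, sigma = 0.05, 0.90, 1, 2   # sigma = 2 is an assumption
z = NormalDist()
z_alpha = z.inv_cdf(1 - alpha)
z_beta = z.inv_cdf(power)
n_needed = ceil(sigma**2 * (z_alpha + z_beta) ** 2 / delta**2)

print(f"t = {t_stat:.2f}, reject H0: {reject_h0}")
print("sample size needed:", n_needed)
```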

3. In India in a region where irrigation is often applied, 18 farms with irrigation are randomly
selected and so are 18 farms where no irrigation is applied, 36 farms in total. On each farm the
labour input is measured for a one-acre plot.
A. The question is whether, on average, irrigated farms use more labour for maize production than farms
where there is no irrigation. Use the output below to test this hypothesis. Write down all the
steps.

B. What are the four defining elements in the test mentioned in question A?
C. To make a 95% confidence interval for mean difference in labour use with a width of at most 4,
what sample size would be needed per group?
D. Which analysis would you do if we had used 18 farms, each with one irrigated and one
non-irrigated one-acre plot, so that on each farm measurements were done on both plots?

4. In the previous exercise suppose we had only 2x4=8 farms, with outcomes
non-irrig 12, 21, 17, 19
irrigated 25, 28, 20, 27
Use a non-parametric test at α =0.10 to see if labour use on irrigated fields is systematically
larger than on non-irrigated fields. Go through all the steps. Use PQRS to get the p-value.
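With samples this small, the exact null distribution of a rank-sum statistic can be enumerated by hand or by computer. The sketch below assumes that the intended non-parametric test is Wilcoxon's rank-sum test with the sum of the ranks of the irrigated farms as test statistic. PQRS should give the same right p-value.

```python
from itertools import combinations

non_irrig = [12, 21, 17, 19]
irrigated = [25, 28, 20, 27]

# Rank all 8 observations together (there are no ties in these data)
pooled = sorted(non_irrig + irrigated)
rank = {value: position + 1 for position, value in enumerate(pooled)}
t_obs = sum(rank[value] for value in irrigated)

# Under H0 every set of 4 ranks out of 1..8 is equally likely for the irrigated group
all_sums = [sum(c) for c in combinations(range(1, len(pooled) + 1), len(irrigated))]
p_right = sum(t >= t_obs for t in all_sums) / len(all_sums)

print("rank sum irrigated:", t_obs)
print(f"right p-value: {p_right:.4f} (based on {len(all_sums)} rank sets)")
```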

5. In another “investigation”, 5 farms are used where labour use is measured on a 1-acre field with
irrigation and on a 1-acre field without irrigation. Suppose the results are as follows:

Farm        1   2   3   4   5
Irrigated  27  21  27  22  26
Non-irrig  22  18  19  23  16

A. Use a non-parametric test to test if labour use on irrigated fields is systematically larger than on
non-irrigated fields. Go through all the steps. Use PQRS to get the p-value.
B. Now assume that a t-procedure is appropriate. Which assumptions should then be
(approximately) valid? What are the four defining elements of the t-procedures?
C. Carry out the t-test, using the Rejection Region method. Do your own calculations. Check that
the relevant standard error is 1.92.
D. Make a 0.95 CI for mean difference in labour use between irrigated and non-irrigated 1-acre
fields.
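The "own calculations" for parts C and D can be checked with the following sketch, which works on the differences per farm. The t-value 2.776 (df = 4, right-tail area 0.025) is read from a t table; variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

irrigated = [27, 21, 27, 22, 26]
non_irrig = [22, 18, 19, 23, 16]

# Paired design: analyse the difference per farm
d = [a - b for a, b in zip(irrigated, non_irrig)]
n = len(d)
dbar = mean(d)
se = stdev(d) / sqrt(n)        # standard error of the mean difference
t_stat = dbar / se             # test statistic under H0: mean difference = 0

T_CRIT = 2.776                 # t table: df = n - 1 = 4, right-tail area 0.025
ci = (dbar - T_CRIT * se, dbar + T_CRIT * se)

print("differences:", d)
print(f"mean difference = {dbar}, se = {se:.2f}, t = {t_stat:.2f}")
print(f"0.95 CI: ({ci[0]:.2f}, {ci[1]:.2f})")
```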

6. Four 75+ couples are randomly selected from an elderly home. They are asked to get up from their
own chair. For husband and wife of one couple the same chair is used. Between couples this varies. The
number of seconds this takes (y = duration) is measured. We test if there is a difference in
mean duration of getting up between men and women of over 75 years old (in that home). Data
are given below. Of the two SPSS output tables from different analyses, only one should be
used.

Couple   1   2   3   4
ym       6  18  11   7
yf       9  14  20   2

A. Which situation applies ? (see p. 4/5 of the study guide).


B. What are the 4 defining elements of the t-procedure in this case?
C. Read the outcome of the test statistic.
D. Give relevant p-value and state your conclusion.

The same data are used, but now we assume that the Normality assumption is violated.
E. What is the situation? (see p. 4/5 of the study guide).
F. Calculate the outcome of the appropriate test statistic.
G. If available, use PQRS to derive the two-sided p-value.
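If PQRS is not available for part G, the exact distribution can be enumerated. The sketch below assumes that the appropriate procedure is Wilcoxon's signed-rank test with T+ (the sum of the ranks of the positive differences) as test statistic; whether that is the situation meant on p. 4/5 of the study guide is for you to decide. The two-sided p-value is taken as twice the smaller tail, capped at 1, which is also an assumption.

```python
from itertools import product

ym = [6, 18, 11, 7]
yf = [9, 14, 20, 2]

# Differences per couple (no zeros and no tied absolute values here)
d = [m - f for m, f in zip(ym, yf)]
abs_sorted = sorted(abs(x) for x in d)
ranks = [abs_sorted.index(abs(x)) + 1 for x in d]
t_plus = sum(r for r, x in zip(ranks, d) if x > 0)

# Under H0 each difference is positive or negative with probability 1/2
n = len(d)
null = [sum(r for r, sign in zip(range(1, n + 1), signs) if sign)
        for signs in product([0, 1], repeat=n)]
p_left = sum(t <= t_plus for t in null) / len(null)
p_right = sum(t >= t_plus for t in null) / len(null)
p_two_sided = min(1.0, 2 * min(p_left, p_right))

print("differences:", d, "ranks:", ranks)
print("T+ =", t_plus, " two-sided p-value =", p_two_sided)
```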

PPP week 2 (in all tests use α=0.05)

In all tests, if α is not mentioned, use α =0.05.


When doing a test, make a sketch (with a few numbers on the x-axis) of the relevant distribution. It
will help you to visualize what kind of values to expect for the Test Statistic in the case that H0 is
true. To check the distribution, you can use PQRS or your Graphics calculator.

1. We are interested in the proportion of students that smoke in a big university. Two random samples,
one of male and one of female students, are drawn, each of size 100. Of the female students 34 smoke,
of the male students 25.

Part I
A. We first focus on the fraction of smokers among female students. Which situation applies?
B. Give a 0.95 Confidence Interval for the fraction of smokers among female students.
C. Test (two-sided) if the fraction of smokers among female students is 0.3 using the result of B.
D. As C, but now with an exact test. Write down all the steps of the test. Use PQRS (if available) or
your graphics calculator to find the relevant p-value.
E. Check with the SPSS output above that your result is correct.

Part II
F. Test with an exact test if the fractions of smokers for male and female students are the same.
Give outcome of the chosen test-statistic. Use PQRS to find the relevant p-value, and give your
conclusion.

G. Calculate the standard error of the difference between the two fractions.
H. Which situation applies for questions F and G?
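For parts B and G, the arithmetic can be sketched as follows. The interval below is the normal-approximation interval p ± z·se; the course or SPSS may use a slightly different formula (for instance a t-based one), so small differences in the last decimals are to be expected. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n_f, x_f = 100, 34      # female students: sample size, number of smokers
n_m, x_m = 100, 25      # male students

p_f = x_f / n_f
p_m = x_m / n_m
z = NormalDist().inv_cdf(0.975)

# Part B: approximate 0.95 CI for the fraction of smokers among female students
se_f = sqrt(p_f * (1 - p_f) / n_f)
ci_f = (p_f - z * se_f, p_f + z * se_f)

# Part G: standard error of the difference between two independent fractions
se_diff = sqrt(p_f * (1 - p_f) / n_f + p_m * (1 - p_m) / n_m)

print(f"p_f = {p_f}, se = {se_f:.4f}, CI = ({ci_f[0]:.3f}, {ci_f[1]:.3f})")
print(f"difference = {p_f - p_m:.2f}, se of difference = {se_diff:.4f}")
```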

2. In the US the 3 most common cancers according to their relative frequency are: breast (25%),
lung (21%), and colon (16%) of all cancer patients. The rest forms 38% of this patient group. (We
suppose that these numbers are exact and we ignore that a cancer patient can have more than one
cancer.) In the Netherlands 200 cancer patients are randomly selected. It is tested if the relative
frequencies in the Netherlands are the same as in the US.

Type     Breast  Lung  Colon  Other  Total
Number       40    80     40     40    200

A. Which situation applies?
B. Define the Test Statistic and calculate it.
C. Give the null-distribution of the test statistic and the critical value, and give your conclusion.
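The calculation for B and C can be set up as in the sketch below, assuming the Pearson chi-square goodness-of-fit statistic is the one intended. The critical value 7.815 (df = 3, α = 0.05) is taken from a chi-square table.

```python
observed = {"Breast": 40, "Lung": 80, "Colon": 40, "Other": 40}
p_us = {"Breast": 0.25, "Lung": 0.21, "Colon": 0.16, "Other": 0.38}

n = sum(observed.values())
# Expected counts if the Dutch fractions equal the US fractions
expected = {k: n * p_us[k] for k in observed}

# Pearson chi-square: sum over categories of (observed - expected)^2 / expected
chi2 = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
df = len(observed) - 1

CHI2_CRIT = 7.815       # chi-square table: df = 3, right-tail area 0.05
reject_h0 = chi2 > CHI2_CRIT

for k in observed:
    print(f"{k:7s} observed {observed[k]:3d}  expected {expected[k]:6.1f}")
print(f"chi-square = {chi2:.2f}, df = {df}, reject H0: {reject_h0}")
```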

We also obtained data from Canada and Sweden.

Type     Breast  Lung  Colon  Other  Total
Canada       20    40     40     50    150
Sweden       40    40     40     40    160

D. Are the differences in observed relative frequencies for the 3 countries (NL, Sweden,
Canada) significant? Give the situation, the full name of the test, and carry out the test using
the SPSS output given below.
E. For Canada-Breast and Sweden-Lung: calculate
the expected frequencies used in D.
F. The SPSS output says: 0% have expected count less than 5. Is that good? Explain!
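For part E, expected frequencies in a two-way table follow from row total × column total / grand total. A sketch for the three countries (the layout of the dictionary is illustrative; the SPSS output remains the reference for the test itself):

```python
types = ["Breast", "Lung", "Colon", "Other"]
table = {
    "NL":     [40, 80, 40, 40],
    "Canada": [20, 40, 40, 50],
    "Sweden": [40, 40, 40, 40],
}

row_total = {country: sum(counts) for country, counts in table.items()}
col_total = [sum(counts[j] for counts in table.values()) for j in range(len(types))]
grand_total = sum(row_total.values())

# Expected count under H0 (same distribution in all countries)
expected = {country: [row_total[country] * col_total[j] / grand_total
                      for j in range(len(types))]
            for country in table}

chi2 = sum((table[c][j] - expected[c][j]) ** 2 / expected[c][j]
           for c in table for j in range(len(types)))
df = (len(table) - 1) * (len(types) - 1)
smallest_expected = min(min(row) for row in expected.values())

print(f"Canada-Breast expected: {expected['Canada'][0]:.2f}")
print(f"Sweden-Lung expected:   {expected['Sweden'][1]:.2f}")
print(f"chi-square = {chi2:.2f}, df = {df}, smallest expected = {smallest_expected:.2f}")
```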

3. Correlation calculations
Data on (x, y) for 4 French students are given. They represent the mark for Statistics that they
scored in Wageningen, and the one they scored for an earlier course in France (where the maximum
score is 20).

Unit   1   2   3   4
x      2   5   8   1
y     12  14  17   5

[Figure: scatterplot of y (axis from 0 to 20) against x (axis from 0 to 10)]

           x    y    x-x̄    y-ȳ    (x-x̄)²    (y-ȳ)²    (x-x̄)⋅(y-ȳ)
           2   12
           5   14
           8   17
           1    5
sum                   0      0     Sxx=      Syy=      Sxy=
average              --     --     --        --

A. Show by calculating it, that the sample correlation coefficient between x and y is rxy = 0.89.
B. Why is it not meaningful to compare the averages of x and y?
C. The sample correlation is high; is it also significant? Show that t=2.74, but that the Rejection
Region for α = 0.05 is |t| > 4.303.
D. Without redoing the calculations, but just looking at the data/graph: what is the Spearman rank
correlation coefficient?
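The table calculations for A and C, and the rank correlation of D, can be verified with the sketch below. The helper names are illustrative; the t-statistic uses the usual formula t = r·√(n-2)/√(1-r²) with n-2 degrees of freedom.

```python
from math import sqrt

x = [2, 5, 8, 1]
y = [12, 14, 17, 5]

def pearson(u, v):
    """Return (Suu, Svv, Suv, r) for two equally long lists."""
    n = len(u)
    ubar, vbar = sum(u) / n, sum(v) / n
    suu = sum((a - ubar) ** 2 for a in u)
    svv = sum((b - vbar) ** 2 for b in v)
    suv = sum((a - ubar) * (b - vbar) for a, b in zip(u, v))
    return suu, svv, suv, suv / sqrt(suu * svv)

def ranks(values):
    """Ranks 1..n; valid only when there are no ties."""
    ordered = sorted(values)
    return [ordered.index(value) + 1 for value in values]

sxx, syy, sxy, r = pearson(x, y)
n = len(x)
t_stat = r * sqrt(n - 2) / sqrt(1 - r**2)

# Spearman's rank correlation is Pearson's r applied to the ranks
r_spearman = pearson(ranks(x), ranks(y))[3]

print(f"Sxx = {sxx}, Syy = {syy}, Sxy = {sxy}")
print(f"r = {r:.2f}, t = {t_stat:.2f} (compare with 4.303, df = {n - 2})")
print("Spearman:", r_spearman)
```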

4. Correlation and internet


A. Explain the joke on the right.

B. Which lesson can be learnt from the scatterplot below: correlation coefficients
for situations with a wide range and a narrow range of X-values.

[Scatterplot: wide range of X-values, r = 0.92; narrow range of X-values, r = 0.59]

C. Strong positive correlation (!?) Can you give an example of such data?

5. Passing the exam


Students from 4 different very large programs at a University do an exam. From each program we
obtain the results for 40 randomly chosen students. Numbers of passed students:
Program 1: 26, Program 2: 20, Program 3: 18, Program 4: 28.
A. We want to test if the proportions of students that pass the exam are different for the 4
programs. Which test should we do?
B. Set up the calculations for the Chi-square test: make tables with observed and expected values.
Check that the outcome of Chi-square is 6.957.
C. What is the relevant distribution, the rejection region and the
conclusion of the test?
D. To check the most extreme case, we could test if Programs 3
and 4 have different proportions, using Fisher exact test. Why
would this be methodologically wrong?
[As an aside: Which would be the 3 parameters for the
hypergeometric distribution? ]
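For part B, the tables of observed and expected values and the chi-square outcome can be checked with the following sketch. The critical value 7.815 (df = 3, α = 0.05) is taken from a chi-square table; variable names are illustrative.

```python
passed = [26, 20, 18, 28]          # programs 1 to 4
n_per_program = 40
failed = [n_per_program - p for p in passed]

total = n_per_program * len(passed)
# Under H0 all programs share the same pass fraction, estimated from the pooled data
exp_pass = n_per_program * sum(passed) / total
exp_fail = n_per_program * sum(failed) / total

chi2 = (sum((o - exp_pass) ** 2 / exp_pass for o in passed)
        + sum((o - exp_fail) ** 2 / exp_fail for o in failed))
df = (len(passed) - 1) * (2 - 1)

CHI2_CRIT = 7.815                  # chi-square table: df = 3, right-tail area 0.05
reject_h0 = chi2 > CHI2_CRIT

print("observed passed:", passed, " failed:", failed)
print(f"expected passed: {exp_pass}, expected failed: {exp_fail}")
print(f"chi-square = {chi2:.3f}, df = {df}, reject H0: {reject_h0}")
```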
