01 - Introduction

STAT 7010 is a course on Experimental Statistics II at Auburn University, focusing on the design and analysis of experiments using SAS software. The course covers various experimental designs, statistical models, and methodologies essential for conducting experiments and analyzing data. Key principles include replication, randomization, and blocking to ensure valid experimental results.


STAT 7010: Experimental Statistics II

1. Introduction

Peng Zeng

Department of Mathematics and Statistics


Auburn University

Spring 2024

Peng Zeng (Auburn University) STAT 7010 – Lecture Notes Spring 2024 1 / 30
Outline

1 Course information

2 Introduction
Experiments
Statistics in experiments

Course information

Prerequisite: STAT 7000
    two-sample t-test, one-way ANOVA, linear regression
Software: SAS
Contents: design and analysis of experiments
    simple comparative experiments (two treatments, t-test)
    one factor (multiple treatments, one-way ANOVA)
    two or more factors (factorial designs, ANOVA)
    blocking factors (block designs, Latin squares, ANOVA)
    more advanced experiments and models (nested designs,
    split-plot designs, random-effect models, mixed-effect models)
Level of difficulty: focus on application and interpretation.

About Software
Throughout this course, all data analyses are conducted using SAS.
powerful functions, relatively easy to use
many sample codes provided in this course

You can use other software for homework, exams, and the project.
    Minitab, SPSS, R, Python, ...
    Excel does not count as statistical software.

Why not R?
May produce wrong answers.
Limited support for models with random effects.

Checklist for SAS Knowledge


Create a SAS dataset
using data step (input and datalines)
from an external file (csv files, txt files)
Data steps
define new variables
create a subset
if/then statement
Proc steps
proc print
proc sort
proc means
proc sgplot
A brief tutorial on SAS is provided for those new to the software.

Organization of Lecture Notes

I will try my best to organize the lecture notes into modules.

A typical module contains
    objective
    example
    statistical model
    SAS code

Pay attention to experimental designs and the associated statistical
models. Make sure that you understand how to analyze similar data
using SAS.

Introduction to experiments and role of Statistics

Experimentation and Experiment


Experimentation is an indispensable tool in scientific and engineering
research to explore a particular process or system.

[Figure: a process or system turns inputs into an output y. Controllable factors x1, x2, ..., xp and uncontrollable factors z1, z2, ..., zp both act on the process.]

An experiment is a test or series of tests in which purposeful changes are
made to the input variables of a process or system, so that we may
observe and identify the reasons for changes that may be observed in
the output response.

Example: Quenching
Quenching is the rapid cooling of a workpiece in water, oil or air to
obtain certain material properties.

Output: hardness of alloy
Controllable factors: quenching medium, time, etc.
Uncontrollable factors: operator, environment, etc.

Objectives of Experiments
Variable screening: Which x’s affect the response y?
Treatment comparison: Does the value of y differ at different
levels of x?
Response surface exploration: Where to set the x’s so that y is near
the desired nominal value?
System optimization: Where to set the x’s so that y is maximized?
System robustness: Where to set the x’s so that the variability of y
is minimized? Where to set the x’s so that the effects of the
uncontrollable variables z’s are minimized?

Example: The objective of the quenching experiment is to determine
which quenching medium (oil or saltwater) produces the maximum
hardness for a particular alloy.

Treatment Factor and Treatment

Treatment factors: factors of particular interest to the experimenter.
Treatment: a level of a treatment factor in a single-factor
experiment, or a treatment combination when there are two or
more treatment factors.

Example: The quenching experiment has one treatment factor
(quenching medium) with two levels (oil or saltwater).

Experimental Units

Experimental units are the material to which the levels of the
treatment factor(s) are applied. For example,
    in agriculture these would be individual plots of land,
    in medicine they would be human or animal subjects,
    in industry they might be batches of raw material, factory
    workers, etc.
Experimental units should be representative of the material and
conditions to which the conclusions of the experiment will be applied.

Example: In the quenching experiment, an experimental unit is one
piece of alloy to be tested.

Nuisance Factors
A nuisance factor is a variable that is not of primary interest but can
affect the outcome of the experiment.
    blocking factors: factors that define groups of relatively
    homogeneous experimental units or conditions.
        plots of land, location, or times of day
        human subjects with similar characteristics (gender, age group)
        materials made in the same factory
    covariate: a property of the experimental units that can be
    measured before the experiment takes place.
        the blood pressure of a patient in a medical experiment
        the IQ of a student in an educational experiment
        the acidity of a plot of land in an agricultural experiment
Example: In the quenching experiment, blocking factors may include
operator or batch of alloy, and covariates can be the concentrations
of certain ingredients in the alloy.

A Checklist for Planning Experiments


1 Define the objective of the experiment.
2 Identify all sources of variation, including:
    1 treatment factors and their levels
    2 experimental units
    3 nuisance factors (blocking factors, noise factors, and covariates)
3 Choose a rule for assigning the experimental units to the
  treatments.
4 Specify the measurements to be made, the experimental
  procedure, and the anticipated difficulties.
5 Run a pilot experiment.
6 Specify the model.
7 Outline the analysis.
8 Calculate the number of observations that need to be taken.
9 Review the above decisions. Revise, if necessary.

Why Statistics?

When the problem involves data that are subject to experimental
errors, statistical methodology is the only objective approach to
analysis.

Two aspects of any experimental problem:
    design of the experiment
    statistical analysis of the data

A Brief History of Experimental Design


Agricultural era began in the 1920s and early 1930s.
    R.A. Fisher, Rothamsted Agricultural Experimental Station, UK.
    Systematically introduced statistical thinking and principles into
    designing experimental investigations.
    Three fundamental principles of experimental design.
    Factorial design and analysis of variance (ANOVA).
    Focus on treatment comparison.
Industrial era began in the 1950s.
    George Box and collaborators.
    Response surface methodology, central composite designs,
    optimal designs.
    Chemical and process industries.
    Focus on process modeling and optimization.

Quality improvement era began in the late 1970s.
    Genichi Taguchi advocated robust parameter design.
    Quality improvement and variation reduction.
    Although Taguchi’s engineering concepts and objectives are
    well-founded, there were substantial problems with his
    experimental strategy and methods of data analysis.
Current state of experimental design
    Popular outside of statistics and an indispensable tool in many
    areas of scientific and engineering research.
    New challenges:
        Large and complex experiments, e.g., screening designs in the
        pharmaceutical industry, experimental design in biotechnology
        Computer experiments: efficient ways to model complex systems
        based on computer simulation (e.g., weather and climate models)
        A/B tests in marketing or web development

Typing Efficiency Experiment

Compare the typing efficiency of two keyboards denoted by A and B.
One typist uses the keyboards on six different manuscripts (1–6).
Which design is the best?
manuscript
design 1 2 3 4 5 6
I A B
II A B A B A B
III AB AB AB AB AB AB
IV AB BA AB BA AB BA

Three Fundamental Principles

Three fundamental principles
    Replication, Randomization, and Blocking

Replication
    Each treatment is applied to a number of units that are
    representative of the population (of units).
    Enables the estimation of experimental error.
    Increases the power to detect important effects.
    Replication vs. repetition: repeated measurements on the same
    unit are not replicates, because they capture only measurement
    error and not unit-to-unit variation.

Randomization

Randomization
    Allocation (of units to treatments) and run order need to be
    randomized.
    Protects against latent variables and subjective biases.
    Ensures the validity of experimental error estimation.
    Ensures the validity of statistical inferences (iid data).
    Complete randomization makes it possible to derive the
    randomization distribution and perform a randomization test.
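
The randomization test mentioned above can be sketched in a few lines. The block below is a minimal illustration in Python rather than SAS, and the two small samples are hypothetical numbers invented for the example, not data from this course. It enumerates every possible assignment of the pooled observations to two treatments and reports the proportion of assignments whose absolute difference in means is at least as large as the observed one.

```python
from itertools import combinations
from statistics import mean

def randomization_test(group_a, group_b):
    """Exact two-sided randomization test for a difference in means."""
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    count = total = 0
    # Every way of choosing which units could have received treatment A.
    for idx in combinations(range(len(pooled)), n_a):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        # Small tolerance guards against floating-point ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            count += 1
    return count / total

# Hypothetical responses under treatments A and B (illustration only).
p_value = randomization_test([12, 15, 14, 16], [10, 11, 13, 9])
print(f"randomization p-value: {p_value:.4f}")
```

The test is valid only because the units were assigned to treatments at random; the enumeration simply replays the assignments that the randomization could have produced.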

Blocking

Blocking
    A block refers to a group of homogeneous units.
    Used to reduce or eliminate the variability transmitted from
    nuisance factors.
    Within-block variation needs to be much smaller than
    between-block variation.
    Trade-off between reduced variation and lost degrees of freedom.

Block what you can and randomize what you cannot.

Some Standard Experimental Designs

An experimental design is a rule that determines the assignment of
experimental units to treatments.

There are four basic design structures:
    completely randomized design
    completely randomized block design
    split-plot design
    strip-plot design
Most complex design structures are combinations of these four.

Example: Cupcakes
Professor Z wants to evaluate the volume of cupcakes after baking.
Suppose that he is interested in the following factors:
    recipe τ: three levels (A, B, C)
    temperature β: two levels (a, b)
It is desired to have 3 replicates (1, 2, 3).

         temperature a    temperature b
recipe     1    2    3      1    2    3
A         81   70   80     68   76   73
B         73   77   68     59   62   59
C         73   82   61     61   64   75

Understand the procedure of data collection before analyzing data,
because how to analyze the data depends on the experimental design.

Cupcakes: Completely Randomized Design

In a completely randomized design, the experimenter assigns the
experimental units to the treatments completely at random, subject
only to the number of observations to be taken on each treatment.

Get a random sequence of 18 runs:

Ba, Cb, Aa, Ab, Ba, Ca, . . .

Follow the above order, and for each run, prepare cake mix using the
given recipe, fill a cupcake, and bake it at the given temperature.
    prepare cake mix 18 times (one for each run)
    bake cupcake 18 times (one cupcake in one oven)
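
One way to obtain such a random sequence is sketched below in Python (the course itself uses SAS; this is only an illustration). The function name `crd_run_order` and the seed value are assumptions made for the example. The sketch lists each of the 6 treatment combinations 3 times and shuffles the 18 runs.

```python
import random

recipes = ["A", "B", "C"]
temperatures = ["a", "b"]
n_replicates = 3

def crd_run_order(seed=None):
    """Return the 18 runs of the cupcake experiment in random order."""
    rng = random.Random(seed)
    # Each treatment combination (e.g. "Ba") appears once per replicate.
    runs = [r + t for r in recipes for t in temperatures] * n_replicates
    rng.shuffle(runs)  # complete randomization of the run order
    return runs

order = crd_run_order(seed=7010)  # seed fixed only for reproducibility
for run_number, treatment in enumerate(order, start=1):
    print(run_number, treatment)
```

Fixing the seed makes the plan reproducible; omit it to draw a fresh randomization.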

Completely Randomized Design


The statistical model is

volume ∼ recipe + temperature

The 3 × 2 × 3 = 18 experimental runs are completely randomized.

Aa Aa Aa Ab Ab Ab

Ba Ba Ba Bb Bb Bb

Ca Ca Ca Cb Cb Cb
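
As a rough illustration of what fitting this additive model involves, the Python sketch below (not SAS, and not the course's own code) uses the cupcake volumes from the example. It relies on the fact that in a balanced design the least-squares effect estimates, under sum-to-zero constraints, are the deviations of the marginal means from the grand mean. All variable names are choices made for this example.

```python
from statistics import mean

# Cupcake volumes keyed by (recipe, temperature); replicates 1, 2, 3.
volume = {
    ("A", "a"): [81, 70, 80], ("A", "b"): [68, 76, 73],
    ("B", "a"): [73, 77, 68], ("B", "b"): [59, 62, 59],
    ("C", "a"): [73, 82, 61], ("C", "b"): [61, 64, 75],
}
obs = [(r, t, y) for (r, t), ys in volume.items() for y in ys]
grand = mean(y for _, _, y in obs)

# Effect estimate = marginal mean minus grand mean (balanced design).
recipe_effect = {r: mean(y for rr, _, y in obs if rr == r) - grand
                 for r in "ABC"}
temp_effect = {t: mean(y for _, tt, y in obs if tt == t) - grand
               for t in "ab"}

def fitted(recipe, temp):
    """Fitted volume under volume ~ recipe + temperature."""
    return grand + recipe_effect[recipe] + temp_effect[temp]

# Residual sum of squares and its degrees of freedom: 18 - 1 - 2 - 1.
sse = sum((y - fitted(r, t)) ** 2 for r, t, y in obs)
df_error = len(obs) - 1 - (3 - 1) - (2 - 1)
mse = sse / df_error

print("recipe effects:", recipe_effect)
print("temperature effects:", temp_effect)
print("error mean square:", mse)
```

The shortcut through marginal means works only because every treatment combination has the same number of observations; unbalanced data need a general least-squares fit.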

Cupcake: Completely Randomized Block Design


In a completely randomized block design, the experimenter partitions
the experimental units into blocks, determines the allocation of
treatments to blocks, and assigns the experimental units within each
block to treatments completely at random.

Professor Z runs the experiment over three days (blocking factor).
    On each day, generate a random sequence of the 6 treatment
    combinations (a different sequence on each day).
    For each run, prepare cake mix using the given recipe, fill a
    cupcake, and bake it at the given temperature.
Notice that:
    prepare cake mix 18 times (one for each run)
    bake cupcakes 18 times (one cupcake in one oven)
    one complete replicate in each day
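
The within-day randomization can be sketched as follows, in Python for illustration only. The function name `rcbd_plan`, the day labels D1 to D3, and the seed are assumptions of this example. Each day receives its own independent random ordering of the same 6 treatment combinations.

```python
import random

# The 6 treatment combinations: recipe (A, B, C) x temperature (a, b).
treatments = [r + t for r in "ABC" for t in "ab"]

def rcbd_plan(n_days=3, seed=None):
    """Return a dict mapping each day (block) to its random run order."""
    rng = random.Random(seed)
    plan = {}
    for day in range(1, n_days + 1):
        # A fresh permutation of all 6 combinations within each block.
        plan[f"D{day}"] = rng.sample(treatments, k=len(treatments))
    return plan

plan = rcbd_plan(seed=7010)  # seed fixed only for reproducibility
for day, runs in plan.items():
    print(day, runs)
```

Unlike the completely randomized design, the shuffle never moves a run from one day to another, so each day remains one complete replicate.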

Completely Randomized Block Design


The statistical model is

volume ∼ recipe + temperature + replicate

The 3 × 2 × 3 = 18 experimental runs are randomized within each
block. Each block contains a complete set of treatment combinations.

D1        D2        D3
Aa Ab     Aa Ab     Aa Ab
Ba Bb     Ba Bb     Ba Bb
Ca Cb     Ca Cb     Ca Cb
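
To show how the block term enters the analysis, the Python sketch below (an illustration, not the course's SAS code) splits the total sum of squares for the cupcake data into recipe, temperature, block, and error parts. It assumes, purely for the example, that replicate k in the data table was run on day k, so that the replicate index plays the role of the block.

```python
from statistics import mean

# Cupcake volumes keyed by (recipe, temperature); replicates 1, 2, 3.
volume = {
    ("A", "a"): [81, 70, 80], ("A", "b"): [68, 76, 73],
    ("B", "a"): [73, 77, 68], ("B", "b"): [59, 62, 59],
    ("C", "a"): [73, 82, 61], ("C", "b"): [61, 64, 75],
}
# One tuple per observation: (recipe, temperature, day, volume).
# Assumption: replicate k was baked on day k.
obs = [(r, t, d, y) for (r, t), ys in volume.items()
       for d, y in enumerate(ys, start=1)]
grand = mean(o[3] for o in obs)

def ss_factor(index, levels):
    """Sum of squares for one factor in a balanced design."""
    total = 0.0
    for level in levels:
        ys = [o[3] for o in obs if o[index] == level]
        total += len(ys) * (mean(ys) - grand) ** 2
    return total

ss_total = sum((o[3] - grand) ** 2 for o in obs)
ss_recipe = ss_factor(0, "ABC")
ss_temp = ss_factor(1, "ab")
ss_block = ss_factor(2, [1, 2, 3])
ss_error = ss_total - ss_recipe - ss_temp - ss_block
df_error = (len(obs) - 1) - 2 - 1 - 2  # total df minus model df

print("SS recipe:", ss_recipe, "SS temperature:", ss_temp)
print("SS block:", ss_block, "SS error:", ss_error, "df error:", df_error)
```

Compared with the completely randomized analysis, the block term removes day-to-day variation from the error sum of squares at the cost of 2 error degrees of freedom.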

Cupcake: Split-Plot Design


The preparation of cake mix is randomized as in the completely
randomized design. However, when baking cupcakes, three cupcakes
are baked together. The order of baking is randomized.
prepare cake mix 18 times
bake cupcake 6 times (3 cupcakes in one oven)

Aa Ab Aa Ab Aa Ab

Ba Bb Ba Bb Ba Bb

Ca Cb Ca Cb Ca Cb
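
A possible randomization for this layout is sketched below in Python, for illustration only. The slide does not say which three cupcakes share an oven run, so the sketch assumes that each oven run is set to one temperature and holds one cupcake of each recipe, each from its own separately prepared mix. The function name `split_plot_plan` is likewise hypothetical.

```python
import random

def split_plot_plan(seed=None):
    """Return 6 oven runs as (run number, temperature, recipe order)."""
    rng = random.Random(seed)
    # Whole plots: 6 oven runs, 3 at each temperature, in random order.
    oven_temps = ["a", "b"] * 3
    rng.shuffle(oven_temps)
    plan = []
    for oven_run, temp in enumerate(oven_temps, start=1):
        # Subplots (assumed): one cupcake per recipe in each oven run,
        # with the order of mix preparation randomized within the run.
        recipe_order = rng.sample(["A", "B", "C"], k=3)
        plan.append((oven_run, temp, recipe_order))
    return plan

plan = split_plot_plan(seed=7010)  # seed fixed only for reproducibility
for oven_run, temp, recipe_order in plan:
    print(oven_run, temp, recipe_order)
```

Under this assumption temperature is randomized over oven runs while recipe is randomized within them, which is why the two factors have different experimental units.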

Cupcake: Strip-plot Design


Professor Z wants to save even more time. Once he prepares cake mix
from a given recipe, he fills two cupcakes, one to be baked at each
temperature level.
prepare cake mix 9 times (one cake mix fills 2 cupcakes)
bake cupcakes 6 times (3 cupcakes in one oven)

Aa Ab Aa Ab Aa Ab

Ba Bb Ba Bb Ba Bb

Ca Cb Ca Cb Ca Cb
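
The sketch below, in Python for illustration only, generates one possible plan consistent with these counts. It assumes that each of the 3 replicates uses one mix per recipe (3 mixes) and two oven runs (one per temperature), and that each oven run holds one cupcake from each of that replicate's mixes. That arrangement and the function name `strip_plot_plan` are assumptions of this example, not details stated on the slide.

```python
import random

def strip_plot_plan(n_replicates=3, seed=None):
    """Return, per replicate, the mix order, oven order, and cupcakes."""
    rng = random.Random(seed)
    plan = []
    for rep in range(1, n_replicates + 1):
        # Row factor: order in which the 3 mixes are prepared.
        mix_order = rng.sample(["A", "B", "C"], k=3)
        # Column factor: order of the 2 oven runs (one per temperature).
        oven_order = rng.sample(["a", "b"], k=2)
        # Each mix contributes one cupcake to each oven run.
        cupcakes = [r + t for t in oven_order for r in mix_order]
        plan.append({"replicate": rep, "mix_order": mix_order,
                     "oven_order": oven_order, "cupcakes": cupcakes})
    return plan

plan = strip_plot_plan(seed=7010)  # seed fixed only for reproducibility
for replicate in plan:
    print(replicate)
```

Across the three replicates this gives 9 mixes, 6 oven runs, and 18 cupcakes, matching the counts above.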

Using Statistics in Experimentation


Talk with a statistician before collecting data, not after.
Use your nonstatistical knowledge of the problem.
Keep the design and analysis as simple as possible.
Experiments are usually iterative.
Recognize the difference between practical and statistical
significance.

Example. An engineer may determine that a modification to an
automobile fuel injection system may produce a true mean
improvement in gasoline mileage of 0.1 mile/gal and be able to
determine that this is a statistically significant result.
However, if the cost of the modification is $1000, the 0.1 mile/gal
difference is probably too small to be of any practical value.
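
A back-of-the-envelope calculation makes the point concrete. In the Python sketch below, only the 0.1 mile/gal gain and the $1000 cost come from the example. The baseline mileage (30 mile/gal), lifetime distance (150,000 miles), and fuel price ($3.50 per gallon) are hypothetical values assumed for illustration.

```python
def fuel_savings(miles, mpg_before, mpg_gain, price_per_gal):
    """Dollar value of the fuel saved over `miles` of driving."""
    gallons_before = miles / mpg_before
    gallons_after = miles / (mpg_before + mpg_gain)
    return (gallons_before - gallons_after) * price_per_gal

# Assumed inputs (hypothetical): 150,000 miles, 30 mile/gal, $3.50/gal.
savings = fuel_savings(miles=150_000, mpg_before=30.0,
                       mpg_gain=0.1, price_per_gal=3.50)
cost = 1000.0  # cost of the modification, from the example
worthwhile = savings > cost

print(f"lifetime fuel savings: ${savings:.2f}")
print("modification pays for itself:", worthwhile)
```

Under these assumptions the lifetime savings fall far short of the cost, so the effect can be statistically significant and still have no practical value.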