INTRODUCTION TO

STATISTICS

Fabio Morellini
Behavioral Biology Group, ZMNH
AND NOT THE BIOSTATISTICS DEPARTMENT!
Structure of the course
• What is statistics? Why statistics?
• Descriptive statistics: how to summarize the data
• Inferential statistics: how to analyze the data
• Parametric vs nonparametric analyses
• Multifactorial analysis
• Designing an experiment
• Type I and type II errors, power
• Statistical software
• GraphPad Prism
• (Statistica)
I do not need statistics
I use statistics when I have data to analyze
I constantly use statistics for my research
I know what these terms indicate:
Nominal, Ordinal, Interval, Ratio measurements
Median
Percentile
Parametric
Null hypothesis
Mann-Whitney test
"P" value
WHAT IS STATISTICS?

WHAT IS STATISTICS FOR?


"The secret language of statistics, so appealing in a fact-minded culture, is employed to sensationalize, inflate, confuse, and oversimplify."
"There are three kinds of lies: lies, damned lies, and statistics."
Benjamin Disraeli (1804 - 1881)
MISTRUST IN STATISTICS

"Two men sit in a tavern. One eats a whole knuckle of veal, the other drinks two liters of beer. Statistically, each of them had one liter of beer and half a knuckle, but one has overeaten and the other is drunk." Franz Josef Strauß
"A few years ago, every tenth driver drove too fast; today it is 'only' every fifth."
"Use of estrogen supplements increases the risk of breast cancer by 30% in menopausal women."
But: the absolute increase is from 2% to 2.6% (+0.6 percentage points)
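The gap between the relative and the absolute figure can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function name `risk_change` is made up here, and the two risks are the 2% and 2.6% quoted on the slide.

```python
def risk_change(baseline_risk, new_risk):
    """Return (absolute change in percentage points, relative change in percent)."""
    absolute_points = (new_risk - baseline_risk) * 100
    relative_percent = (new_risk - baseline_risk) / baseline_risk * 100
    return absolute_points, relative_percent


# Risks taken from the slide: 2% without and 2.6% with estrogen supplements.
absolute, relative = risk_change(0.02, 0.026)
print(f"absolute: +{absolute:.1f} percentage points, relative: +{relative:.0f}%")
```

Both numbers describe the same data; which one is reported changes how dramatic the claim sounds.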
Statistical methods are scientific and
with proper education anyone should
be able to recognize the good
statisticians from the charlatans.
Jessica Rabbit
STATISTICS

I'm not bad, I'm just misused

WHAT IS STATISTICS?
WHAT IS STATISTICS FOR?

STATISTICS

Statistics is a mathematical science pertaining to the collection, analysis and presentation of data.

✓Collect the data
✓Summarize the data
✓Represent the data in meaningful ways
✓Determine whether the data show a pattern different from chance or not
✓Interpret the data
STATISTICS provides conventional expressions for an international audience

Enhances the degree of objectivity in the analysis

Reproducible

The procedure is arguable

BASIC TOOL IN EXPERIMENTAL SCIENCE

Science is not about truth, it's about testability

Scientific theories are models of the real world (or parts of it)

The vocabulary of science concerns the models rather than reality.
[Figure: sampling and inference]
Even if you can't find a source
of demonstrable bias, allow
yourself some degree of
skepticism about the results
as long as there is a possibility
of bias somewhere.
There always is.
SCIENTIFIC (inductive) METHOD
1. Observe some aspects of the universe, "free from bias"

2. Ask the right question

3. Create alternative hypotheses that are consistent with your empirically described observations and could answer your question.

4. Based on the hypotheses, make predictions.

5. Test the predictions by experiments or further observations.

6. If only one hypothesis is supported by the results, put it into a (falsifiable) theory and…

7. Publish your findings in a peer-reviewed journal

8. Consider criticisms offered, and revise your theory

9. Go to step 3.
1. Statistics gives no information about the real world.
2. Statistics "only" describes, summarizes and eventually interprets observations
3. Statistics is as good (or as bad) as the observations
4. Statistics is useful only if correctly used and described
Which of the following are part of statistics?

O Numerical calculations

O Graphs

O Interpretations and decisions based on numbers and graphs
You hear in a commercial that 80% of dentists recommend a certain kind of toothpaste.
Your conclusion is:

O According to the majority of the dentists, that toothpaste is superior to all others

O 20% of the dentists recommend other toothpastes

O You need to know more about how the data was collected and analyzed before drawing any conclusion
What should you take into consideration when evaluating statistical claims (= final conclusions)?

O The statistics presented

O The sources of the statistical findings

O The procedures used to generate the claims

DESCRIPTIVE AND INFERENTIAL
STATISTICS

• Descriptive statistics
Used to summarize and represent results
• Central tendency
• Variability

• Inferential statistics
Used to determine whether relationships or differences
within and between samples are not caused by chance
(“statistically significant”)
DIFFERENT TYPES OF DATA
Four Levels of Measurement

Nominal level: eye color, gender, religious affiliation
Ordinal level: grades (1, 2, 3, 4, 5, 6); steak (rare, medium-rare, medium, well done)
Interval level: temperature on the Fahrenheit scale, intelligence quotient (IQ)
Ratio level: body weight, length

Nominal and Ordinal scales are discrete or categorical

Interval and Ratio scales are continuous scales

84 42 66 67 35 88 72 84 59 45 94 94 54 88 25 78 99 57 90 88 64 57 57 51 90 85 79 76 56 30 29 30 72 47 35 53 55 77 33 44 88 59 78 32 33 68 49
73 34 42 87 59 30 44 74 90 73 70 66 43 89 31 38 30 90 26 52 34 51 39 52 96 38 23 21 55 78 77 21 43 72 60 91 29 41 88 26 70 24 86 31 63 34 67
22 53 54 28 76 29 70 42 84 24 73 86 69 38 66 27 26 40 72 31 83 60 88 62 41 52 46 92 78 46 60 90 29 63 84 24 76 59 56 63 31 96 68 95 99 49 100
64 70 82 34 24 84 27 40 42 51 78 50 93 26 24 89 46 49 30 33 81 36 70 38 96 38 74 32 75 30 44 93 55 28 91 31 52 69 44 86 34 65 29 97 62 37 59
39 64 64 63 80 73 66 46 95 26 55 59 37 93 28 60 68 35 79 78 89 67 97 87 89 57 56 36 95 79 85 36 78 88 59 86 92 48 59 56 56 51 74 43 22 70 56
62 97 69 39 79 34 46 23 58 98 90 87 70 72 75 23 89 51 50 72 90 46 40 32 62 57 29 29 67 52 86 76 94 30 44 82 80 27 42 70 21 60 71 38 47 42 68
50 35 25 37 86 35 77 22 84 76 81 52 46 75 77 29 66 58 53 66 91 96 32 82 59 49 80 88 51 95 59 81 20 91 62 91 25 29 74 97 43 21 49 46 53 85 47
43 37 80 62 71 86 67 21 89 76 47 60 71 49 43 97 94 75 39 47 58 46 76 96 96 80 68 82 95 29 25 95 27 27 87 69 89 77 21 37 83 73 35 75 24 86 32
23 21 86 36 25 74 38 83 86 27 39 22 29 72 76 84 65 44 90 62 78 42 87 30 24 50 44 69 52 55 87 67 66 28 30 74 44 73 27 26 51 22 75 61 56 58 33
52 52 70 85 56 99 55 40 72 39 84 71 55 99 47 48 51 92 39 72 63 83 22 71 48 68 40 27 71 92 61 69 83 57 59 76 38 74 44 80 99 39 98 87 27 29 40
71 73 81 72 91 82 96 47 69 57 61 34 64 42 25 94 22 28 94 34 99 29 92 34 57 71 38 94 69 32 44 45 31 20 29 89 61 24 92 85 64 87 40 27 90 68 92
58 99 37 47 66 72 79 94 54 96 33 28 44 93 44 41 25 37 87 88 38 30 95 81 63 93 90 65 72 92 51 61 94 32 33 37 93 43 32 90 55 37 23 97 98 31 66
38 51 71 56 70 23 34 57 54 94 43 95 32 40 63 37 50 47 54 81 97 95 31 44 80 51 74 42 47 39 64 24 56 69 59 95 44 80 31 24 78 36 57 54 54 97 73
44 81 48 87 66 44 40 83 48 71 89 64 89 77 73 41 83 84 68 43 64 65 56 78 59 75 55 39 90 97 91 57 51 20 52 69 98 89 91 51 71 67 46 94 46 92 55
26 44 76 51 71 67 44 29 45 92 87 54 46 62 34 40 26 63 74 31 53 33 86 61 86 49 36 22 82 81 28 74 30 78 22 58 79 73 63 27 28 28 55 50 47 94 87
36 86 20 92 89 86 46 30 96 23 23 99 53 40 30 82 29 42 70 69 57 24 79 40 73 21 60 32 33 93 86 56 39 33 36 30 48 100 80 82 31 54 23 56 84 48 45
66 91 20 58 27 24 75 28 85 56 69 20 51 55 68 92 100 34 62 59 92 31 61 36 97 84 64 47 88 24 83 61 40 75 80 89 92 56 86 92 75 90 47 54 68 67 43
46 86 58 47 98 95 54 56 70 69 83 29 46 62 79 23 94 78 87 79 65 40 95 40 28 28 97 67 55 42 42 20 44 64 29 82 67 74 55 70 70 68 29 58 67 54 62
58 54 64 59 80 33 48 97 92 54 48 27 80 47 75 38 45 29 65 25 53 45 86 81 75 83 96 45 60 42 89 34 44 48 64 59 82 21 33 99 79 50 58 72 81 72 49
41 37 34 45 30 67 44 96 95 87 99 26 49 98 74 68 62 72 22 98 43 34 78 97 54 51 45 33 89 94 26 73 38 85 97 33 29 67 70 31 39 36 73 39 87 75 91
95 99 31 65 99 24 98 38 87 92 51 40 37 70 30 25 52 36 22 46 42 85 76 27 74 75 85 99 33 94 97 82 49 82 81 38 66 77 23 73 31 73 82 33 26 23 84
80 51 79 80 46 80 61 28 55 68 66 68 28 36 62 89 92 41 56 48 77 75 55 22 63 49 78 97 39 99 87 30 85 49 54 97 43 98 29 96 65 42 31 86 66 79 28
57 71 37 89 49 49 87 71 30 83 68 79 31 68 56 95 72 80 90 24 85 59 74 83 82 28 27 57 30 89 84 100 51 63 96 40 96 85 60 82 31 32 35 41 76 55 59
66 86 47 80 95 96 44 37 64 96 27 76 44 24 91 41 57 56 72 43 99 69 63 99 57 54 36 90 57 62 59 82 21 74 88 65 30 68 55 40 99 46 87 75 78 35 64
44 61 55 99 21 60 25 70 77 96 62 21 40 22 98 58 91 60 77 69 86 64 69 41 73 39 26 72 88 65 84 49 33 97 71 83 31 34 78 83 78 58 61 97 90 94 52
86 91 100 79 81 95 40 65 21 78 35 21 61 94 91 93 22 96 53 84 54 48 28 22 93 36 34 36 50 39 84 29 37 49 53 80 96 27 75 70 88 80 88 23 80 23 64
38 54 38 45 32 98 89 63 95 87 32 72 84 89 46 99 52 80 88 95 58 35 63 25 24 43 89 96 69 44 56 33 92 50 49 35 66 59 31 87 83 29 32 95 68 97 38
43 43 60 85 77 23 91 74 47 55 98 33 71 37 70 42 41 23 58 58 37 22 45 69 95 73 82 92 73 83 27 97 38 56 27 84 72 27 51 53 84 87 92 54 49 67 63
49 74 35 100 97 32 37 92 45 97 23 22 20 97 85 41 95 41 43 72 58 99 25 20 28 40 48 52 73 83 76 56 33 65 86 78 29 90 90 42 69 37 53 69 99 65 46
38 38 97 32 26 45 72 67 97 78 92 24 85 82 35 54 64 35 25 34 96 94 59 61 58 72 92 48 39 92 49 95 75 43 68 96 95 75 26 35 40 77 30 63 87 65 34
99 79 71 65 30 58 68 45 63 38 30 25 51 42 69 51 21 77 96 65 41 64 93 63 24 41 97 62 41 27 37 58 98 64 41 49 32 61 85 83 69 47 81 66 69 45 62
61 94 65 100 42 51 47 64 62 92 99 47 41 54 23 87 47 28 66 85 83 41 76 30 25 94 82 26 74 20 44 90 36 43 73 67 50 61 64 98 22 55 45 59 72 51 26
86 49 88 22 46 47 28 34 22 47 93 80 99 46 82 74 77 37 89 32 30 53 28 56 66 77 25 61 92 43 71 97 80 92 21 45 91 97 28 88 95 75 82 98 26 88 28
23 69 56 99 65 82 64 97 89 97 39 48 71 39 65 97 92 57 78 44 56 26 57 58 56 52 80 37 40 51 36 88 92 38 84 32 50 80 53 21 25 97 77 76 73 78 94
25 93 75 57 98 88 82 78 27 53 53 55 94 61 26 47 47 69 72 42 77 49 46 80 88 20 100 86 91 77 81 78 21 72 61 34 40 76 40 78 26 80 97 99 34 56 88
74 83 56 47 99 68 27 36 26 56 92 50 44 68 72 72 83 92 59 80 23 34 67 83 53 80 28 55 33 72 41 72 91 37 21 21 23 83 82 59 60 27 87 61 78 91 91
40 57 95 92 56 52 54 96 43 22 69 77 63 24 88 26 34 42 28 75 33 52 20 78 96 66 96 24 67 48 88 84 65 29 53 78 47 78 88 47 76 64 57 22 84 52 47
58 70 52 75 62 90 95 82 66 67 61 79 31 21 85 79 69 65 90 35 73 36 77 29 69 28 79 86 38 70 65 93 29 21 64 54 28 68 89 29 88 21 46 77 87 43 84
66 58 20 50 29 69 72 57 100 75 66 96 51 95 70 43 68 96 97 60 37 50 71 92 67 87 67 20 81 98 91 93 89 66 34 36 27 78 83 76 27 76 70 96 26 87 41
60 74 92 31 62 87 83 93 35 21 44 67 60 47 25 81 71 66 40 93 93 54 55 85 77 81 54 51 98 46 79 32 38 82 38 79 77 55 45 24 67 78 57 32 90 67 53
67 52 64 41 51 41 50 25 74 31 45 28 45 93 88 99 90 45 24 50 55 71 90 28 73 69 53 77 24 89 21 30 62 37 92 92 34 66 43 24 57 55 61 82 77 73 36
25 45 49 45 46 66 80 77 35 82 22 76 71 78 89 32 86 21 96 64 45 57 88 52 57 94 55 30 43 31 25 81 96 98 35 24 48 34 97 96 92 89 43 59 94 78 88
82 43 53 65 40 95 78 34 91 47 41 28 23 85 62 29 95 25 61 54 97 71 95 30 76 92 29 72 27 60 23 74 68 31 81 33 64 77 58 34 51 70 56 36 39 33 43
65 51 85 86 46 76 22 92 68 66 69 54 45 82 44 80 73 67 20 37 68 67 32 66 46 43 52 87 51 96 77 37 49 73 97 50 79 85 95 76 96 28 91 59 29 58 72
42 74 67 30 44 25 60 78 74 72 69 45 44 37 97 50 44 95 62 72 21 87 87 49 80 52 66 71 46 42 28 88 55 29 69 62 89 38 82 53 70 37 55 55 92 36 65
75 60 79 27 32 23 56 71 58 31 92 91 24 22 97 48 61 65 83 62 25 84 35 91 97 39 23 85 61 41 23 59 91 70 61 63 92 44 72 58 42 93 92 73 46 71 81
93 73 59 86 78 87 99 73 33 97 47 48 36 74 53 48 69 64 40 92 54 97 24 93 75 74 22 75 29 48 61 84 99 72 44 99 22 21 92 73 71 88 74 75 28 37 41
64 26 38 95 66 28 68 39 52 76 77 70 63 75 74 80 91 45 40 30 86 39 77 95 50 25 70 73 22 23 65 28 41 71 97 20 40 60 86 98 88 76 74 80 44 68 62
99 74 58 53 64 99 52 82 27 46 20 83 62 80 24 91 92 64 33 33 58 83 94 43 61 53 93 59 22 89 87 38 39 97 40 41 38 23 91 76 37 45 30 78 27 36 76
21 32 68 88 47 65 31 31 31 42 63 87 84 61 49 23 37 88 77 53 30 35 28 61 99 45 25 39 33 55 64 69 36 66 29 70 97 34 90 34 51 99 47 90 80 50 36
DESCRIPTIVE STATISTICS

✓Histograms

✓Central tendency

✓Level of dispersion

✓Normal distribution

✓Plotting the results


Histogram
A frequency distribution in graphical form
Bar graph
CENTRAL TENDENCY

AND

LEVEL OF DISPERSION
CENTRAL TENDENCY

What is the "heart of the data"?

• Three most common measures of central tendency

– Mean – average

– Median – middle score

– Mode – most frequent score

MEAN
Simply add up all of the scores and divide by the number of scores in the sample.
Sum of scores / number of scores
15+12+23+30+22+5 = 107; 107 : 6 = 17.83
Advantages
– Summarizes data in a way that is easy to understand

– Uses all the data

– Used in many statistical applications

Disadvantages
– Affected by extreme values
• Example: Average salary at a company:
12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000;
12,000; 12,000; 12,000; 20,000; 390,000 Mean = 44,167
MEDIAN
The middle score in the data: half the scores are above it, half of the scores are below it.

– Scores are ranked. Find the one in the middle.

50 56 66 68 70 72 76 76 76 78 78 78 78 80 80 86 86 86 88 96 98 100 100
Median = 78

– If there is an even number of scores, the median is the average of the two middle scores.
Example: 10, 10, 9, 9 Median = 9.5

• Example: Average salary at a company:
12,000; 12,000; 12,000; 12,000; 12,000; 12,000; 12,000;
12,000; 12,000; 12,000; 20,000; 390,000 Median = 12,000 Euro
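The salary example can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are chosen here for illustration and the values are the twelve salaries from the slide.

```python
from statistics import mean, median

# Ten salaries of 12,000 plus one of 20,000 and one of 390,000 (from the slide).
salaries = [12_000] * 10 + [20_000, 390_000]

salary_mean = mean(salaries)      # pulled upward by the single extreme salary
salary_median = median(salaries)  # average of the two middle scores

print(f"mean   = {salary_mean:,.0f}")
print(f"median = {salary_median:,.0f}")
```

A single extreme value moves the mean far away from what eleven of the twelve employees earn, while the median stays at the typical salary.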
MEDIAN

Advantages
– Not affected by extreme values
– Easy to compute

Disadvantages
– Doesn't use all of the data values
– Many reviewers and editors are not familiar with it

Median is preferred when data is skewed (not symmetrically distributed) or has extreme scores.
50,50,50,50,50,50,50,50 → MEAN = 50 (MEDIAN = 50)

10,20,30,40,60,70,80,90 → MEAN = 50 (MEDIAN = 50)

LEVEL OF DISPERSION

How spread out are the data?

• Measures of variability:
– Range
– Variance (variability around the mean)
– Standard deviation (average variability)

• Range
– the simplest variability statistic = highest score – lowest score.

• Variance and Standard deviation
– measurements which indicate how far away from the middle the scores are.
VARIANCE & STANDARD DEVIATION

VARIANCE

10,20,30,40,60,70,80,90 → MEAN = 50

$$s^2 = \frac{(10-50)^2 + (20-50)^2 + \dots + (90-50)^2}{8-1}$$

The standard deviation is the square root of the variance.

• The larger the standard deviation, the more spread out the scores are.
• The smaller the standard deviation, the closer the scores are to the mean.
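As a check on the arithmetic, the sample variance (dividing by n - 1) and the standard deviation of the eight example scores can be computed with the standard `statistics` module. This is a sketch for illustration; the variable names are not from the slides.

```python
from statistics import stdev, variance

scores = [10, 20, 30, 40, 60, 70, 80, 90]  # example scores from the slide, mean = 50

sample_variance = variance(scores)  # sum of squared deviations / (n - 1) = 6000 / 7
sample_sd = stdev(scores)           # square root of the sample variance

print(f"variance = {sample_variance:.2f}")
print(f"SD       = {sample_sd:.2f}")
```

Both functions use the n - 1 denominator shown in the formula; `statistics.pvariance` and `statistics.pstdev` would divide by n instead.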
STANDARD ERROR OF THE MEAN

$$\mathrm{SEM} = \frac{\text{Standard Deviation}}{\sqrt{n}}$$

IF THE SD IS UNCHANGED:
THE LARGER THE SAMPLE SIZE, THE LOWER THE SEM
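The effect of sample size on the SEM can be seen by holding the SD fixed and varying n. In this sketch the helper `sem_from_sd` and the SD of 10 are hypothetical values chosen only for illustration.

```python
from math import sqrt


def sem_from_sd(sd, n):
    """Standard error of the mean for a given standard deviation and sample size."""
    return sd / sqrt(n)


# Same (hypothetical) SD of 10, increasing sample size.
for n in (4, 16, 100):
    print(f"n = {n:3d}  SEM = {sem_from_sd(10, n):.2f}")
```

Quadrupling the sample size halves the SEM, because n enters through its square root.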
PERCENTILES

• A percentile reflects the percentage of scores that are below your data point of interest.

[Figure: Neurons (n), with the 0th, 25th, 50th (median), 75th and 100th percentiles marked]
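The definition above translates directly into a small function. This is a sketch of that definition only (the name `percentile_rank` is an assumption made here); statistical packages often use slightly different conventions, for example counting ties or interpolating between scores.

```python
def percentile_rank(scores, value):
    """Percentage of scores that lie strictly below the given value."""
    below = sum(1 for score in scores if score < value)
    return 100 * below / len(scores)


scores = [10, 20, 30, 40, 60, 70, 80, 90]
rank_of_60 = percentile_rank(scores, 60)  # four of the eight scores are below 60
print(rank_of_60)
```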
NORMAL (GAUSSIAN) DISTRIBUTION

[Figure: normal curve with mean and median marked]

• As sample size increases, the distribution of the sample mean becomes closer to normal.

• Importance of the normal distribution
– Symmetrical
– Mean and median are the same
– The further away from the mean, the less likely the score is to occur
– Probabilities can be calculated

0 SD above the mean is the 50th percentile
1 SD above the mean is the 84th percentile
1.96 SDs above the mean is the 97.5th percentile (mean ± 1.96 SD covers 95% of the scores)
2.58 SDs above the mean is the 99.5th percentile (mean ± 2.58 SD covers 99% of the scores)
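The link between SDs and probabilities can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). The sketch separates two quantities that are easy to confuse: the share of scores below a point z SDs above the mean, and the share of scores within ± z SDs of the mean.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

for z in (0, 1, 1.96, 2.58):
    below = standard_normal.cdf(z)            # share of scores below mean + z SD
    within = below - standard_normal.cdf(-z)  # share of scores within mean ± z SD
    print(f"z = {z:4}:  below = {below:.3f}   within ± z = {within:.3f}")
```

The familiar 95% and 99% figures refer to the two-sided intervals around the mean, which is how 1.96 and 2.58 are used in significance testing.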
DISTRIBUTION

CENTRAL TENDENCY

LEVEL OF DISPERSION

PLOTTING THE RESULTS


PLOTTING THE RESULTS

• Visually representing the data can make it more understandable for you as well as anyone else looking at your results.

• The best graph is the one that presents the data clearly and completely.
GRAPHICAL REPRESENTATIONS
OF CENTRAL TENDENCY AND DISPERSION

[Figure: four plots of Neurons (n): scatterplot (median), box and whiskers (percentiles), bar (mean and SD), bar (mean and SEM)]

Scatterplot, box and whiskers: with small sample size or non-normal distribution (non-parametric tests)
Bars with mean and SD or SEM: large sample size or normal distribution (parametric tests)
Axon length (m)

0
250
500
750

A
B
Axon length (m)

0
250
500
750

A
B

Axon length (m)


0
100
200
300
400
500

A
B

Axon length (m)


0
100
200
300
400

A
B
Keys to making figures
The most general standards of charting data:
• Present meaningful data
• Define the data unambiguously
• Present the data efficiently
• Do not distort the data

KEEP IT SIMPLE!
DON'T "LIE"!
Two basic divisions of statistics are

O Inferential and descriptive

O Mean and median

O Range and standard deviation

Which of the following are descriptive statistics?

O The mean age of men living in Hamburg

O The number of people who watched ARD-Sportschau on Sunday

O The prediction of next semester's unemployment rate

O The length of the longest river in Europe

INFERENTIAL STATISTICS

[Figure: population and sample (a subset of the population); inference about the population from the sample]

• Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.

Determine whether our data show a pattern different from chance or not.
INFERENTIAL STATISTICS

IT TESTS WHETHER THERE ARE:

– DIFFERENCES FROM A HYPOTHETICAL DISTRIBUTION (e.g., normal distribution)

– DIFFERENCES BETWEEN GROUPS (POPULATIONS)

– DIFFERENCES WITHIN A GROUP (e.g., between two time points)

– RELATIONSHIPS BETWEEN VARIABLES / PARAMETERS

INFERENTIAL STATISTICS TESTS WHETHER
THE NULL HYPOTHESIS SHOULD BE
REJECTED

NULL HYPOTHESIS

The hypothesis that an apparent difference is due to chance


INFERENTIAL STATISTICS

Determines the probability that the differences arise due to chance.

If the probability that the observed differences are due to chance is very low (e.g., < 5%) we say that the difference is statistically significant.
THE P VALUE

The probability of obtaining a difference at least as large as the one observed when the null hypothesis is true

Translation to English:

If there were really no difference between your groups, how often would chance alone produce a difference this large? (Loosely: the risk of being wrong when you state that there is a difference between your groups.)

P = 1.0 → 100%
P = 0.1 → 10%
P = 0.05 → 5%
P = 0.01 → 1%
THE P VALUE

[Figure: axon length in groups A and B, p = 0.025]

p = 0.025 means that, if there were no real difference between groups A and B, a difference in axon length at least this large would arise by chance in 2.5% of experiments

(loosely: a 2.5% risk of making a mistake when you conclude that the length of axons is shorter in group B than in group A)
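One way to make "due to chance" concrete is an exact permutation test: shuffle the group labels in every possible way and count how often the relabeled groups differ at least as much as the real ones. The sketch below is not the test behind the figure; the function name and the two small data sets are hypothetical and serve only to show the idea.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p(group_a, group_b):
    """Two-sided exact permutation p value for a difference in means."""
    pooled = list(group_a) + list(group_b)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    # Every way of relabeling len(group_a) of the pooled values as "group A".
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        relabeled_a = [pooled[i] for i in chosen]
        relabeled_b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point rounding in the comparison.
        if abs(mean(relabeled_a) - mean(relabeled_b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total


# Hypothetical axon lengths, invented for this illustration.
group_a = [152, 148, 155, 150, 158]
group_b = [140, 138, 145, 142, 147]
p_value = exact_permutation_p(group_a, group_b)
print(f"p = {p_value:.4f}")
```

Here only 2 of the 252 possible relabelings give a difference as large as the observed one, so p is about 0.008: a difference this large would rarely arise if the labels A and B were meaningless.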


PARAMETRIC vs NON-PARAMETRIC

Nominal level: eye color, gender, religious affiliation
Ordinal level: grades (1, 2, 3, 4, 5, 6); steak (rare, medium-rare, medium, well done)
Interval level: temperature on the Fahrenheit scale, intelligence quotient (IQ)
Ratio level: body weight, length

PARAMETRIC

ASSUMPTIONS
• The variables must be measured at interval or ratio scale
• A normally distributed population
• Equal variances among the populations

ADVANTAGES
• Powerful
• Multi-factorial
• Good interpretation of the results

DISADVANTAGES
• Assumptions must be met
• Requires a relatively large sample size

NON-PARAMETRIC

ASSUMPTIONS
• Nonparametric (distribution free) techniques make no assumptions about the population

ADVANTAGES
• May be the only test when the sample size is small
• Require fewer assumptions
• The only choice when the measurement scales are nominal or ordinal (e.g. using categories or rankings)

DISADVANTAGES
• They are less powerful
• Do not allow multifactorial analysis
• They are unfamiliar to many researchers and editors
How can I test whether my data are normally distributed with equal variances among the populations?

NORMAL (GAUSSIAN) DISTRIBUTION

Kolmogorov-Smirnov

Lilliefors

Shapiro-Wilk W (better power, preferred)

EQUAL VARIANCES

[Figure: Body weight (g)]

-MOSES TEST FOR EQUAL VARIANCES

-GOOD-BAKER
RELATIONSHIPS BETWEEN VARIABLES

[Figure: rearing in the open field (n) against ChAT+ neurons in medial septal / diagonal band of Broca complex (n); Spearman r = -0.72 *]

PARAMETRIC Pearson r

NON-PARAMETRIC Spearman r
• Correlation simply measures relationships!
(not causality!)

• All methods used to calculate correlation are constructed so that it can vary between –1 and +1.

– Represented by "r"

• Strength of the correlation

– The closer to +1 or -1, the stronger the correlation
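To see how the parametric and the non-parametric coefficient can differ, here is a minimal sketch in plain Python. The function names and the data are chosen for illustration; Spearman r is computed as Pearson r on the ranks, with tied values given their average rank.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman_r(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


# Hypothetical data: y rises with x, but not along a straight line.
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
print(f"Pearson r  = {pearson_r(x, y):.3f}")
print(f"Spearman r = {spearman_r(x, y):.3f}")
```

Spearman r only asks whether the ordering is preserved, so a perfectly monotonic but curved relationship gives exactly 1, while Pearson r stays below 1 because the points do not fall on a line.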
[Figure: rearing in the open field (n) against ChAT+ neurons in medial septal / diagonal band of Broca complex (n); Spearman r = -0.72 *]

Cohen's rule of thumb for r values:
.10 = no relationship
.30 = weak relationship
.50 = moderate relationship
> .60 = strong relationship
Importance of Sample Size

• Any nonzero value of r becomes significant with a large enough sample size.

[Figure: rearing in the open field (n) against ChAT+ neurons in medial septal / diagonal band of Broca complex (n); Spearman r = 0.29 *]
INDEPENDENT CORRELATIONS
FOR DIFFERENT POPULATIONS

[Figure, left: rearing in the open field (n) against ChAT+ neurons in medial septal / diagonal band of Broca complex (n); WT r = -0.72 *, KO r = -0.78 *]
[Figure, right: jumping (n) against ChAT+ neurons in medial septal / diagonal band of Broca complex (n), WT and KO; Spearman r = 0.57 *]
Parametric and non-parametric tests

[Figures: two scatterplots of rearing in the open field (n) against ChAT+ neurons in medial septal / diagonal band of Broca complex (n)]

Left plot:
Pearson r = -0.76 p = 0.01
Spearman r = -0.36 p = 0.33

Right plot:
Pearson r = -0.46 p = 0.14
Spearman r = -0.36 p = 0.33

DIFFERENCES BETWEEN GROUPS

[Figure: axon length in groups A and B]

2 GROUPS, unpaired (independent)
PARAMETRIC: Unpaired t-test
NON-PARAMETRIC: Mann-Whitney U, Kolmogorov-Smirnov, Wald-Wolfowitz runs

2 GROUPS, paired (dependent)
PARAMETRIC: Paired t-test
NON-PARAMETRIC: Wilcoxon matched pairs, Sign test

> 2 GROUPS, unpaired (independent)
PARAMETRIC: 1-way ANOVA
NON-PARAMETRIC: Kruskal-Wallis ANOVA

> 2 GROUPS, paired (dependent)
PARAMETRIC: 1-way ANOVA for repeated measurements
NON-PARAMETRIC: Friedman ANOVA
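The decision table can also be written as a small lookup, which makes the three questions explicit: parametric or not, two groups or more, paired or unpaired. This is only a sketch of the table; the function and dictionary names are made up here, and for cells that list several non-parametric tests only the first one is returned.

```python
# Keys: (parametric?, more than two groups?, paired?)
TESTS = {
    (True, False, False): "Unpaired t-test",
    (True, False, True): "Paired t-test",
    (True, True, False): "1-way ANOVA",
    (True, True, True): "1-way ANOVA for repeated measurements",
    (False, False, False): "Mann-Whitney U",
    (False, False, True): "Wilcoxon matched pairs",
    (False, True, False): "Kruskal-Wallis ANOVA",
    (False, True, True): "Friedman ANOVA",
}


def choose_test(parametric, n_groups, paired):
    """Return the test named in the table for a comparison between groups."""
    if n_groups < 2:
        raise ValueError("a comparison between groups needs at least two groups")
    return TESTS[(parametric, n_groups > 2, paired)]


print(choose_test(parametric=True, n_groups=2, paired=False))
print(choose_test(parametric=False, n_groups=3, paired=True))
```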
DIFFERENCES WITHIN A GROUP

[Figure: axon length in one group, BEFORE (A) and AFTER (B)]
MULTIFACTORIAL ANOVA
DIFFERENCE TO A HYPOTHETICAL VALUE

PARAMETRIC        One-sample t-test

NON-PARAMETRIC    Wilcoxon signed-rank

[Figure: preference index for NaCl (n = 8) and Anisomycin (n = 9); x-axis labels: Rec NaCl, Rec ANI]
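
As a minimal sketch of the non-parametric option, the exact one-sample Wilcoxon signed-rank test can be computed by enumerating every possible pattern of signs. The data and the hypothetical value of 50 (a preference index expressed in percent) are assumptions for illustration, not the values from the figure, and the function name is hypothetical. The sketch assumes no zero differences and no tied absolute differences.

```python
from itertools import product

def wilcoxon_signed_rank_exact(values, hypothetical):
    """Exact two-sided one-sample Wilcoxon signed-rank test.

    Returns (W+, p), where W+ is the sum of ranks of positive differences.
    """
    diffs = [v - hypothetical for v in values]
    magnitudes = sorted(abs(d) for d in diffs)
    if 0 in magnitudes or len(set(magnitudes)) != len(magnitudes):
        raise ValueError("this sketch does not handle zeros or ties")
    rank_of = {m: i + 1 for i, m in enumerate(magnitudes)}
    w_plus = sum(rank_of[abs(d)] for d in diffs if d > 0)

    n = len(diffs)
    expected = n * (n + 1) / 4          # mean of W+ under the null hypothesis
    observed_deviation = abs(w_plus - expected)

    # Under the null hypothesis each rank is positive or negative with
    # probability 1/2, so all 2**n sign patterns are equally likely.
    as_extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if abs(w - expected) >= observed_deviation:
            as_extreme += 1
    return w_plus, as_extreme / 2 ** n

# Hypothetical preference indices in percent, tested against 50 (no preference).
preference = [62, 58, 71, 66, 55, 69, 60, 47]
w_plus, p_value = wilcoxon_signed_rank_exact(preference, 50)
print(w_plus, p_value)
```

Full enumeration is only practical for small samples; statistical software switches to an approximation for larger n.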
DIFFERENCE BETWEEN TWO GROUPS IN
FREQUENCY OF OCCURRENCE (2 x 2 tables)

Chi-square test
Fisher exact probability test

           yes   no
Group 1    12    8
Group 2     4    16

[Figure: mice being able to climb down (%), wt (n = 16) vs ko (n = 18), **]
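
Both tests can be sketched for the 2 x 2 table on the slide (Group 1: 12 yes, 8 no; Group 2: 4 yes, 16 no). The function names are illustrative. The chi-square version assumes no continuity correction, and the Fisher version assumes the common two-sided definition (summing all tables that are no more likely than the observed one); statistical packages may use other conventions.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom the upper-tail probability has a closed form.
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test with the table margins held fixed."""
    row1, col1, col2 = a + b, a + c, b + d

    def weight(k):
        # Number of ways to obtain k "yes" answers in the first row.
        return math.comb(col1, k) * math.comb(col2, row1 - k)

    lowest = max(0, row1 - col2)
    highest = min(row1, col1)
    weights = [weight(k) for k in range(lowest, highest + 1)]
    observed = weight(a)
    # Integer arithmetic keeps the "as or less likely" comparison exact.
    return sum(w for w in weights if w <= observed) / sum(weights)

chi2, p_chi = chi_square_2x2(12, 8, 4, 16)
p_fisher = fisher_exact_2x2(12, 8, 4, 16)
print(f"chi-square = {chi2:.2f}, p = {p_chi:.4f}")
print(f"Fisher exact p = {p_fisher:.4f}")
```

The Fisher test is usually preferred when expected cell counts are small, because the chi-square p value relies on a large-sample approximation.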
Our data come from ____, but we really care most about
_____.

O theories; mathematical models

O populations; samples

O samples; populations
A researcher believes that students in Hamburg will score
higher than students in Munich in a mathematics test.
Which of the following is the null hypothesis?

O mean score in Hamburg < mean score in Munich

O mean score in Hamburg > mean score in Munich

O mean score in Hamburg = mean score in Munich




A researcher hypothesizes that there is a correlation
between blood testosterone concentration and
aggressive behavior. The null hypothesis is that the
correlation is equal to:

O 0

O 1

O 0.5
Which of the following probability values gives you the
most confidence that the null hypothesis is false?

O p = .28

O p = .05

O p = .042

O p = .003
You have just analyzed the results from your experiment
and you calculated p = 0.13.
What can you conclude?

O You reject the null hypothesis

O You accept the null hypothesis

O You fail to reject the null hypothesis

O You accept the alternative hypothesis




You test whether two parameters correlate with each
other and obtain a Spearman r = +0.1 and p < 0.001.
You conclude that:

O The two parameters correlate with each other

O The two parameters do not correlate with each other

O You must enlarge the sample size to get a significant correlation
You observe that blood testosterone concentration
correlates with the amount of aggressive behavior with a
Spearman r = +0.8 and p < 0.001. You conclude that:

O Higher testosterone concentration causes more aggressive behavior

O More aggressive behavior causes higher testosterone concentration

O Testosterone concentration and aggressive behavior are controlled by a third factor

O Testosterone concentration and aggressive behavior covary positively, and any of the three options above could be true
The goal of statistics is to prove that the null hypothesis
is true

O True

O False
I know what these terms indicate: 9:30 10:55
Nominal, Ordinal, Interval, Ratio measurements
Median
Percentile
Parametric
Null hypothesis
Mann-Whitney test
"P" value
Structure of the course
• What is statistics? Why statistics?

• Descriptive statistics: how to summarize the data

• Inferential statistics: how to analyze the data


• Parametric vs nonparametric analyses
• Multifactorial analysis, ANOVA, repeated measures

• Designing an experiment
• Type I and type II errors, power

• Statistical software
• GraphPad Prism
• (Statistica)
