Week 4 Bioscience

The document provides an overview of advanced study skills in biological sciences, focusing on data handling and statistics. It covers descriptive and inferential statistics, measures of central tendency, variability in biological data, and the impact of outliers on statistical measures. Additionally, it discusses standard deviation, standard error, and various statistical analyses such as T-tests and Chi-squared tests.

Birmingham International Academy

Advanced Study Skills in Biological Sciences:
Data Handling, Statistics & Describing Data
Richard Banks
[email protected]
Statistics

• Statistics is a collection of mathematical techniques that help to analyse and present data

• Vital to the scientific method
  - used to support or reject a hypothesis

• Classified into ‘Descriptive statistics’ and ‘Inferential statistics’
Descriptive Statistics
Used to summarise the basic features of a data set

• measures of central tendency
  - mean, mode, median

• measures of spread
  - range, standard deviation, standard error

• measures of distribution
  - skewness

Variability in ‘biological’ data

Biological data often has a ‘Normal distribution’,
i.e. a frequency distribution with the most frequent number near the middle: central tendency
Frequency distribution.
• Number of times an observation occurs in the data set
• Often presented in a table or a histogram
• % Frequency can be calculated:
  % frequency = (frequency of an observation / total number of observations) × 100

Result   Frequency
0        2
1        9
2        26
3        25
4        10
5        3

  % frequency of 0 = 2 / (2+9+26+25+10+3) × 100 = 2.67%
  % frequency of 2 = 26 / (2+9+26+25+10+3) × 100 = 34.67%

• % Frequency can then be used to create a distribution histogram
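
The % frequency calculation can be sketched in Python. This is a minimal illustration using the frequency table from the slide; the variable names are chosen here for illustration, and the row of '#' characters is only a rough text stand-in for a histogram.

```python
# Frequency table from the slide: result -> number of times it was observed
frequency = {0: 2, 1: 9, 2: 26, 3: 25, 4: 10, 5: 3}

total = sum(frequency.values())  # total number of observations

# % frequency = frequency of an observation / total number of observations x 100
percent_frequency = {
    result: count / total * 100 for result, count in frequency.items()
}

for result, pct in percent_frequency.items():
    # One '#' per percentage point gives a rough text histogram
    print(f"{result}: {pct:5.2f}%  {'#' * round(pct)}")
```

The values for results 0 and 2 come out as 2.67% and 34.67%, matching the worked examples.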


Task:
• Calculate the % Frequency of the data set.
• Produce a sketch diagram of the percentage
distribution graph of the table from the
previous slide (also below):
Normal distribution: data with central tendency

Biological data often has a normal distribution,
i.e. a frequency distribution with the most frequent number near the middle (central tendency).
Therefore, measuring the "middle" value of the data set is useful.
Measures of central tendency
• Mean: x̄ = Σx / n   (sum of observations ÷ number of observations)

• Median: equal number of values above and below (= Middle)

• Mode: value with the highest frequency (= Most)

• A data set can be bimodal or even multimodal, with 2 or more values being equally frequent.

*sample mean used as an estimate of the population mean


The Mean
• The mean (or average) is calculated by adding up all the
individual values and dividing the total by the number of values

The Median
• The median value is identified by putting all of the individual
values in size order (smallest to largest) to find the middle value
(if there is an even number of individual values, take the mean of the two
middle values)

The Mode
• The mode is the value that occurs most often in the data set
Task
• Make a start on
completing the
questions 1-5 on the
worksheet.

• 10 minutes
An example ‘data set’:
2 , 4 , 2 , 0 , 40 , 2 , 4 , 3 , 6

Calculate the mean, median and mode


Mean: x̄ = Σx / n      Σx =        n =        x̄ =

Median: sort data

Mode: (Occurs the most times)

Which is most representative of the centre of the data?


An example ‘data set’:
2 , 4 , 2 , 0 , 40 , 2 , 4 , 3 , 6

Calculate the mean, median and mode


Mean: x̄ = Σx / n      Σx = 63    n = 9    x̄ = 63 / 9 = 7

Is the value 40 an error or a ‘real’ data point?

Median: sort data   0 2 2 2 3 4 4 6 40   → median = 3

Mode: 2 (Occurs the most times)

Which is most representative of the centre of the data?


What happens if we exclude the ‘outlier’ ?

2 4 2 0 [40 excluded] 2 4 3 6
New data set:
2 4 2 0 2 4 3 6

Mean: x̄ = Σx / n      Σx = 23    n = 8    x̄ = 23 / 8 = 2.875    (it was 7)

Median: sort data   0 2 2 2 3 4 4 6   → median = 2.5    (it was 3)

Mode: 2 (occurs the most times)    (it was 2)

An ‘outlier’ can have a disproportionate effect on the mean

Median is a reasonably typical value (resistant to outliers)
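
The effect of the outlier can be checked with Python's standard `statistics` module. This is a minimal sketch using the example data set from the slide; the helper name `summarise` is chosen here for illustration.

```python
import statistics

data = [2, 4, 2, 0, 40, 2, 4, 3, 6]             # example data set from the slide
without_outlier = [x for x in data if x != 40]  # same data with 40 excluded


def summarise(values):
    """Return (mean, median, mode) for a list of numbers."""
    return (
        statistics.mean(values),
        statistics.median(values),
        statistics.mode(values),
    )


print(summarise(data))
print(summarise(without_outlier))
```

With the outlier the result is (7, 3, 2); without it the result is (2.875, 2.5, 2). The mean changes by more than 4, the median by only 0.5, and the mode not at all.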


Range: Difference between the maximum & minimum
• An estimate of the spread of the data (= ‘dispersion’)
e.g. experimental data of weight of lab rats
320, 367, 423, 471, 480 grams

Range is calculated as maximum − minimum: 480 − 320 = 160 g

• Useful, BUT some data can be very different from other data
points – outliers
e.g. a small baby rat added to the data set
150, 320, 367, 423, 471, 480 g      Range: 480 − 150 = 330 g

• So, not always an accurate description of the overall data set
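
The range calculation can be sketched as follows, using the rat weights from the slide. The function name `data_range` is chosen here for illustration.

```python
weights = [320, 367, 423, 471, 480]  # adult lab rats, grams (from the slide)
with_baby = [150] + weights          # same data plus the small baby rat


def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


print(data_range(weights))    # 160
print(data_range(with_baby))  # 330
```

One extra data point more than doubles the range, which is why the range alone can mislead.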


Data with outliers: The mean and the range are altered
to a greater extent by outliers.
‘Normal’ distribution
Symmetrical
Mean = Median = Mode

‘Skewed’ distributions
- caused by ‘outliers’

Positive (right) skew
Long upper tail (high values)
Mean > Median > Mode
Mean is moved in the direction of the skew

Negative (left) skew
Long lower tail (low values)
Mean < Median < Mode
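
The ordering of mean, median and mode can be used as a rough check for skew. The sketch below applies it to the example data set from the earlier slide (the one containing the outlier 40); the function name and the returned labels are illustrative, and this rule of thumb is not a formal skewness statistic.

```python
import statistics


def skew_direction(values):
    """Classify skew from the ordering of mean, median and mode (rule of thumb)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)
    if mean > median > mode:
        return "positive (right) skew"
    if mean < median < mode:
        return "negative (left) skew"
    return "no clear skew from this rule"


# Example data set from the earlier slide: mean 7 > median 3 > mode 2
print(skew_direction([2, 4, 2, 0, 40, 2, 4, 3, 6]))
```

The high value 40 pulls the mean above the median, so the data set is reported as positively skewed.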
Task
• Make a start on
completing the
questions 6-8 on the
worksheet.

• 5 minutes.
Break
Standard deviation (SD)
A measure of how data is distributed about the mean

• Standard deviation is a measure of the typical distance of individual values from the overall sample mean

• Allows us to quantify the variability within the data

• Expressed as Mean ± SD

• The lower the standard deviation, the less uncertainty
  [or] more confidence in the experimental result
Standard deviation (SD)
A measure of how data is distributed about the mean
• Mean ± SD
• E.g. 55.3 ± 3.3
• For normally distributed data, this means that about 68% of the values in the data set lie within 1 SD (3.3) of the mean value, i.e. from 52.0 to 58.6
• About 95% of the values fall within 2 SD (6.6) of the mean, i.e. from 48.7 to 61.9
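
The 1 SD and 2 SD ranges can be worked out with simple arithmetic. This is a minimal sketch using the example values from the slide, and it assumes the data are normally distributed; the variable names are illustrative.

```python
mean, sd = 55.3, 3.3  # example values from the slide

# Roughly 68% of values lie within 1 SD of the mean
within_1sd = (round(mean - sd, 1), round(mean + sd, 1))

# Roughly 95% of values lie within 2 SD of the mean
within_2sd = (round(mean - 2 * sd, 1), round(mean + 2 * sd, 1))

print(within_1sd)
print(within_2sd)
```

The two ranges are 52.0 to 58.6 and 48.7 to 61.9.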
Standard deviation (SD)
a measure of how data is distributed about the mean

• Less spread of data around the mean = small standard deviation
  • More confidence in the data set.

• High spread around the mean = higher standard deviation
  • Lower confidence in the data set.
Standard Deviation and Variance

Sample Variance (s²)

s² = Σ(x − x̄)² / (n − 1)

x = each score/value
x̄ = mean (average)
n = number of scores/values
Σ = sum of…
Standard Deviation = √Variance
Standard deviation (SD):
Calculation Task:

• In a learning behaviour study, rats had to press a lever to gain a food reward.
• The number of lever presses before each rat gave up trying to access the food reward is given on the next slide.
• Can you work out the standard deviation of the data set?
Standard deviation (SD)
calculation
Task 10 minutes:
Repetition of lab rat lever pressing in a reward experiment:
Number of lever presses: 9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4.
n = 20
To calculate SD

Step 1: Calculate the mean, x̄. Add up all the numbers and divide by
the total number of data points. x̄ = 7
Step 2: Subtract the mean from each data point and then square
each value.
Step 3: Calculate the sum of the squared values.
Step 4: To calculate the variance, divide the sum of the squared
values by n-1.
Step 5: The standard deviation is the square root of the variance.
Use a calculator to obtain this number.
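
The five steps can be followed one at a time in Python. This is a minimal sketch using the lever-press data from the task; the variable names are chosen here for illustration, and the code can be used to check a hand calculation.

```python
import math

presses = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4]
n = len(presses)                                   # n = 20

mean = sum(presses) / n                            # Step 1: the mean
squared_devs = [(x - mean) ** 2 for x in presses]  # Step 2: (x - mean) squared
sum_sq = sum(squared_devs)                         # Step 3: sum of squared values
variance = sum_sq / (n - 1)                        # Step 4: divide by n - 1
sd = math.sqrt(variance)                           # Step 5: square root

print(f"mean = {mean}, sum of squares = {sum_sq}")
print(f"variance = {variance:.2f}, SD = {sd:.2f}")
```

For this data set the sum of squared values is 178, the variance is about 9.37 and the SD is about 3.06.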
Usefulness of Standard Deviation

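Gives a ‘reliability’ measure: the range covering about 95% of measurements
= mean ± 2 × SD
= range above and below the mean within which about 95% of
the measurements lie (assuming a normal distribution)
Note: a 95% confidence interval (CI) for the mean itself is based on the standard error, approximately mean ± 2 × SE.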
Expressing data points with SD error bars

Symbol or bar that indicates Mean value


Vertical line representing size of standard deviation
Standard Error (SE)

• SE is related to, but is not the same as, the standard deviation (SD)

• SE = SD/√N
  N = sample size

• Expressed as Mean ± SE, N = (sample size)
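
The SE formula can be applied to the lever-press data from the standard deviation task. This is a minimal sketch using Python's standard `statistics` module, where `statistics.stdev` returns the sample SD (dividing by n − 1); the variable names are illustrative.

```python
import math
import statistics

presses = [9, 2, 5, 4, 12, 7, 8, 11, 9, 3, 7, 4, 12, 5, 4, 10, 9, 6, 9, 4]

n = len(presses)                 # N = sample size
mean = statistics.mean(presses)
sd = statistics.stdev(presses)   # sample SD (divides by n - 1)
se = sd / math.sqrt(n)           # SE = SD / sqrt(N)

print(f"Mean ± SE = {mean} ± {se:.2f}, N = {n}")
```

Because SE divides the SD by √N, it shrinks as the sample size grows, while the SD does not.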


Overlap between error bars

If SE bars do not overlap, this suggests that the difference in means may be meaningful
• Requires an appropriate statistical test to confirm
Task
• Make a start on
completing the
questions 9-13 on the
worksheet.

• 10 minutes.
Statistical Analyses
- A hypothesis can be supported by statistical approaches if sufficient
data has been collected.
- A statistical test determines whether the difference between data sets is
statistically significant.
- The test used depends on whether the data collected is independent
or matched/paired, and on the level of data collected (nominal, categorical,
ordinal, quantitative).
Statistical Analyses
T-Test
Can be used to test a hypothesis by determining whether there is a significant
difference between the means of two data sets that are normally distributed.

If the difference between the two data sets is significant, then the null
hypothesis can be rejected and the alternative hypothesis, which states that
there is a significant difference between the sets of data, can be
accepted (see lecture 1, The Scientific Method).
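
The idea behind an independent two-sample T-test can be sketched with the Python standard library. The two groups below are hypothetical values invented for illustration (they are not from the lecture), the sketch assumes both groups have a similar spread (a pooled-variance test), and the critical value 2.306 is the two-tailed value for 8 degrees of freedom at p = 0.05 from a standard t-table.

```python
import math
import statistics

# Hypothetical measurements for two independent groups (not from the lecture)
control = [10, 12, 11, 13, 14]
treated = [15, 17, 16, 18, 19]

n1, n2 = len(control), len(treated)
var1, var2 = statistics.variance(control), statistics.variance(treated)

# Pooled variance, assuming both groups share a similar spread
pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t = difference between the two means / standard error of that difference
t_stat = (statistics.mean(control) - statistics.mean(treated)) / se_diff
df = n1 + n2 - 2  # degrees of freedom

# Two-tailed critical value for df = 8 at p = 0.05 (from a standard t-table)
T_CRITICAL = 2.306
reject_null = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.2f}, df = {df}, reject null hypothesis: {reject_null}")
```

Here the size of t (5.00) exceeds the critical value, so the null hypothesis of no difference between the means would be rejected for these made-up data. In practice a statistics package would also report the exact p-value.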
Statistical Analyses

Chi-Squared Test
Is used to determine whether the observed set of data obtained
from an investigation is significantly different
from that which was originally expected and stated in the
hypothesis.
The null hypothesis will always state that there will be no
difference between the observed and expected values.
This test is often used in inheritance studies to see if
observable characteristics follow Mendelian ratios
(e.g. 3:1 or 9:3:3:1)
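
A Chi-squared calculation against a 3:1 ratio can be sketched as follows. The observed counts are hypothetical values invented for illustration (they are not from the lecture), and the critical value 3.841 is the value for 1 degree of freedom at p = 0.05 from a standard Chi-squared table.

```python
# Hypothetical counts from a monohybrid cross (not from the lecture)
observed = [290, 110]  # dominant phenotype, recessive phenotype
ratio = [3, 1]         # Mendelian 3:1 ratio stated in the hypothesis

total = sum(observed)
expected = [total * r / sum(ratio) for r in ratio]  # 300 and 100

# Chi-squared = sum of (observed - expected)^2 / expected
chi_squared = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1  # degrees of freedom = number of categories - 1

# Critical value for df = 1 at p = 0.05 (from a standard Chi-squared table)
CHI_CRITICAL = 3.841
reject_null = chi_squared > CHI_CRITICAL

print(f"chi-squared = {chi_squared:.2f}, df = {df}, reject null: {reject_null}")
```

Here Chi-squared is about 1.33, which is below the critical value, so the null hypothesis is not rejected and these made-up counts are consistent with a 3:1 ratio.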
Task
• Complete the questions on the worksheet (14-16), so you have them ready for revision.

• Answer sheets will be made available on Canvas after the session.
Useful links for more information
• University of Birmingham Academic Skills Gateway
http://libguides.bham.ac.uk/asg

• http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html

• http://explorable.com/statistics-tutorial
• http://www.engageinresearch.ac.uk/section_4/step_by_step_statistics.shtml

• http://www.statstutor.ac.uk/topics/
• https://www.bmj.com/about-bmj/resources-readers/public
