Basic Biostatistics
Haramaya University
College of Health and Medical Sciences
School of Public Health
Continuous Data Analysis
By Adisu Birhanu (Assistant prof. of Biostatistics)
Feb 2025
Session Objectives
Describe continuous variables and methods of analysis
Describe relationships between continuous variables
Interpret the outputs from linear regression models
Analysis of Continuous Data
A continuous variable is one that can take uncountably many
possible values within a range of real numbers.
Descriptive methods such as scatter plots, line graphs and
histograms are used to describe numerical data.
More advanced methods for inferential analysis of continuous data
include correlation, t-test, ANOVA and linear regression.
Comparison of the means
The t-test is appropriate for comparing two means from two
populations
There are three different t-tests
One sample t-test
Two independent sample t-test
Paired sample t-test
ANOVA is used when the independent variable (IV) has more than two groups
One sample t-test
It is used to compare a sample mean with a hypothesized
population mean to see whether the sample mean is
significantly different.
There is one group being compared against a standard value
(see the sketch below).
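A minimal STATA sketch, assuming the infant dataset with a birth weight variable weight and a hypothesized population mean of 3.0 kg (both illustrative):
STATA CODE: ttest weight == 3.0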
Independent two sample t-test
Used to compare the means of two unrelated or independent
groups
The groups come from two different populations (e.g., people
from two separate cities).
Hypothesis: Ho: Mean of group 1 = Mean of group 2 vs
HA: Mean of group 1 ≠ Mean of group 2
Example
Research question: to test whether there is a significant difference in
the birth weight of male and female infants → an independent t-test is
appropriate (see the sketch below)
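A minimal STATA sketch, assuming variables weight (birth weight) and sex (coded 1 = male, 2 = female) in the infant dataset; the variable names are illustrative:
STATA CODE: ttest weight, by(sex)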
Interpretation
The 95% confidence interval for the difference of means
does not contain 0.
The p-value is less than 0.05.
Hence, we conclude that there is a significant difference in
birth weight between male and female infants.
Paired t-test
Compares means when each observation in one sample has
one and only one pair in the other sample, so the two
samples are dependent on each other.
In this case the groups come from a single population (e.g.,
measurements before and after an experimental treatment), so we
perform a paired t-test (see the sketch below).
Hypothesis: Ho: Mean difference = 0 vs HA: Mean
difference ≠ 0
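A minimal STATA sketch, assuming paired before/after measurements stored as variables wt_before and wt_after (illustrative names); giving ttest two variable names makes STATA treat the data as paired:
STATA CODE: ttest wt_after == wt_before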
One way ANOVA (Analysis of Variance)
For two normal distributions, the two sample means are
compared by a t-test.
When the means of more than two distributions need to be
compared, we use ANOVA.
One way ANOVA…
The t-test methodology generalizes to the one-way analysis
of variance (ANOVA) for categorical variables with more
than two categories.
ANOVA does not tell you which group is different, only
whether a difference exists.
To know which group is different, we use post hoc tests
(Bonferroni, Tukey, Scheffé).
One way ANOVA…
For k means (k ≥ 3):
Ho: µ1 = µ2 = … = µk
HA: at least one of the means is different.
There is one factor of grouping (one way ANOVA)
One way ANOVA…
Consider infant data: Outcome variable: birth
weight
Factor variable: residence (urban = 1, semi-urban = 2, rural = 3)
Objective: compare weight among the three place
categories
STATA CODE: oneway weight place
One way ANOVA…
We reject the null hypothesis (p-value < 0.05) and
we can conclude that at least one of the groups' means differs
on body weight.
Now the question is: which groups are different?
Answering this question requires multiple comparisons (post
hoc tests).
Bonferroni, Tukey and Scheffé are commonly used methods.
The Bonferroni method corrects the probability of a Type I error
for the number of comparisons made.
Interpretation:
All pairwise comparisons are statistically significant at the 0.05
level:
urban versus semi-urban, urban versus rural, semi-urban
versus rural.
STATA CODE: oneway weight place, bonferroni
Correlation
Correlation is used to quantify the degree to which two
continuous random variables are related.
Common correlation measure:
Pearson correlation coefficient: for the linear relationship
between two variables
Scatterplot
A helpful tool for exploring the relationship between two variables
If there is no relationship between the proposed explanatory and
dependent variables,
then fitting a linear regression model to the data will probably
not provide a useful model
Before attempting to fit a linear model to observed data, a
modeler should first determine whether or not there is a
relationship between the variables of interest
This does not necessarily imply that one variable causes the
other, but that there is some significant association between the
two variables
Scatter plot and correlation of two variables
[Figure: scatter plot of CD4 count versus age of patients]
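A minimal STATA sketch for producing this plot and the corresponding correlation, assuming variables cd4 and age (illustrative names):
STATA CODE: scatter cd4 age
STATA CODE: pwcorr cd4 age, sig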
Correlation coefficient
A valuable numerical measure of relationship between
two variables
A value between -1 and 1 indicating the strength of the
linear relationship for two variables
Population correlation coefficient ρ (rho) measures the
strength of linear relationship between two variables
Sample correlation coefficient, r, is an estimate of ρ and is used
to measure the strength of the linear relationship in the
sample observations.
Correlation coefficient
Basic features of sample and population correlation
are:
It is unit-free; it ranges between -1 and 1
The closer to -1, the stronger the negative linear relationship
The closer to 1, the stronger the positive linear relationship
The closer to 0, the weaker the linear relationship
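For reference, the sample Pearson correlation coefficient r is computed as:
r = Σ(xᵢ − x̄)(yᵢ − ȳ) / √[Σ(xᵢ − x̄)² · Σ(yᵢ − ȳ)²]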
Coefficient of determination (R squared)
The coefficient of determination is a measure of the strength of the
model
The variation in the dependent variable is split into two parts:
Total variation in y (SST) = SSE + SSR
Sum of Squares Error (SSE):
Measures amount of variation in y that remains unexplained
(i.e. due to error)
Sum of Squares Regression (SSR) :
Measures amount of variation in y explained by variation in
the independent variable x
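Putting these standard definitions together, the coefficient of determination is:
R² = SSR / SST = 1 − SSE / SST, where SST = SSE + SSR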
Coefficient of determination…
Coefficient of determination does not have a critical value that enables
us to draw conclusions
The higher the value of R squared, the better the model fits the data
If R² = 1, there is a perfect match between the line and the data points
If R² = 0, there is no linear relationship between x and y
Quantitative measure of how well the independent variables account for
the outcome
When R2 is multiplied by 100 it can be thought of as the percentage of the
variance in the dependent variable explained by the independent
variables
Linear Regression
We frequently measure two or more variables on the same individual
to explore the nature of the relationship among these variables.
Regression analysis is a predictive modelling technique that
investigates the relationship between a dependent and an independent
variable.
Questions to be answered
What is the relationship between Y and X?
How can changes in Y be explained by changes in X?
Linear regression…
Linear regression attempts to model the relationship
between two variables by fitting a linear equation to
observed data
Explanatory variable (X): can be any type of variable
Dependent variable: Y
Dependent variable for linear regression should be
numeric (continuous)
Linear regression…
Goal of linear regression is to find the line that best
predicts dependent variable from independent variables
Linear regression does this by finding the line that
minimizes the sum of the squares of the vertical distances
of the points from the line
How does linear regression work?
Least-squares method (ordinary least squares, OLS)
Calculates the best-fitting line for the observed data by
minimizing the sum of the squares of the vertical deviations from
each data point to the line
If a point lies on the fitted line exactly, then its vertical deviation is 0
Goal of regression is to minimize the sum of the squares of the
vertical distances of the points from the line
Linear Regression Model
To understand linear regression, therefore, you must
understand the model
Y = intercept + slope·X = α + β·X + ε
When X equals 0, the equation gives Y equal to α
The slope, β, is the change in Y for every unit change in X
Epsilon (ε) represents random variability
The simplest way to express the dependence of the expected
response Yi on the predictor xi is to assume that it is a linear function, say
E(Yi) = α + β·xi
Constant or intercept:
Parameter α represents the expected response when xi = 0
Slope:
Parameter β represents the expected increment in the response per
unit change in xi
Note: Both α and β are population parameters which are usually
unknown and hence estimated from the data by a and b
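A minimal STATA sketch for fitting a simple linear regression, assuming the infant dataset with outcome weight and a continuous predictor age (illustrative names):
STATA CODE: regress weight age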
Assumptions of linear regression
Linearity :- Relationship between independent and dependent variable is
linear
To check this assumption, we draw a scatter plot of the residuals
against the fitted values
If the scatter plot shows no systematic (e.g., curvilinear) pattern,
the linearity assumption is met (see the sketch below)
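A minimal STATA sketch: after fitting the model, rvfplot draws the residual-versus-fitted plot used for this check (variable names are illustrative):
STATA CODE: regress weight age
STATA CODE: rvfplot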
Linear Regression Assumptions
Normality (Normally Distributed Error Terms): - Error terms follow
the normal distribution. We can use `qnorm' and `pnorm' to check
the normality of the residuals.
The Shapiro–Wilk test (swilk) can also be used (see the sketch below)
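A minimal STATA sketch, assuming a model has just been fitted with regress; predict stores the residuals, and qnorm, pnorm and swilk assess their normality (res is an illustrative name):
STATA CODE:
predict res, residuals
qnorm res
pnorm res
swilk res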
Homoscedasticity of Residuals
Homoscedasticity: - Variance of the error terms is constant.
Is about homogeneity of variance of the residuals.
If the model is well-fitted, there should be no pattern to the
residuals plotted against the fitted values.
If the variance of the residuals is non-constant, it is heteroscedastic.
Homoscedasticity …
The Breusch–Pagan test is used:
if the p-value < 0.05, we reject the null hypothesis that the
variance is homogeneous (see the sketch below).
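A minimal STATA sketch: the Breusch–Pagan/Cook–Weisberg test is available after regress:
STATA CODE: estat hettest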
Multicollinearity
When there is a perfect linear relationship among the
predictors, the estimates cannot be uniquely computed.
The term collinearity implies that two variables are near perfect
linear combinations of one another.
The regression model estimates of the coefficients become
unstable.
The standard errors for the coefficients can get wildly inflated.
We can use the vif or tolerance to check for multicollinearity.
Multicollinearity…
As a rule of thumb, a variable whose VIF is greater than 5
may need further investigation.
Tolerance, defined as 1/VIF, is used by many researchers to
check the degree of collinearity (see the sketch below).
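A minimal STATA sketch: VIF and tolerance (reported as 1/VIF) are available after regress:
STATA CODE: estat vif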
Multiple Linear Regression
Simple linear regression can be extended to multiple linear
regression models
Two or more independent variables which could be categorical
or continuous
The response variable is a function of k explanatory
variables x1, x2, …, xk
Its purposes are mainly:
Prediction, explanation
Adjusting effects of confounders
Multiple Linear Regression
Best fitting model:
Minimizes the sum of squared residuals
Residuals are deviations between the observed response values
and the values predicted by the fitted model
The smaller the residuals, the closer the fitted line
Note that the residuals ei are given by: ei = yi − ŷi
Coefficients in multiple linear regression
A beta coefficient measures the amount of increase or decrease in the
dependent variable for a one-unit difference in a continuous
independent variable
If an independent variable has a nominal scale with more
than two categories,
dummy variables are needed
Each dummy should be considered as an independent
variable (see the sketch below)
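A minimal STATA sketch, assuming the infant dataset: factor-variable notation (i.place) makes STATA create the dummy variables for the categories of place automatically:
STATA CODE: regress weight age i.place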
Assumption: Specification of the model (model building)
Strategies to identify a subset of variables:
Option 1: Variable selection based on significance in
univariable models (simple linear regression):
All variables that show a significant effect in univariable
models are included
A variable with a p-value of less than 0.25 is taken into the MLR
model
Option 2: Variable selection based on significance in
multivariable model:
Backward
stepwise
forward selection
Backward/stepwise/forward selection
Backward selection:
All variables will be entered in the model
Then remove step by step until significantly contributing
variables are left in model
Least contributing variable will be removed first
Then second least contributor will be removed and so on
Forward selection:
Model starts empty (the null model)
Then the most significantly contributing variable enters first
This continues step by step until only significantly
contributing variables have entered the model
Stepwise selection:
Similar to forward selection
But even after a variable is included in the model, its
contribution is re-tested after the inclusion of other variables
Variables are added but can subsequently be removed if
they no longer contribute to the prediction (see the sketch below)
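A minimal STATA sketch of automated selection; weight, age, sex and parity are illustrative variable names, and note that the stepwise prefix does not accept factor-variable notation, so categorical predictors must be entered as pre-built dummies. pr() sets the significance level for removal (backward elimination); pe() instead sets the level for entry (forward selection):
STATA CODE: stepwise, pr(0.05): regress weight age sex parity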
Option 3: Variable selection based on subject matter
knowledge:
The best way to select variables, as it is not data-driven and is
therefore considered to yield unbiased results
Practical session for Multiple linear
regression using STATA
Thank you!!