Introduction to SPSS
Pimrapat Gebert, MPH
Institute of Biometry and Clinical Epidemiology
Universitätsmedizin Berlin
Contents
Time Day 1
9.15 – 9.30 Introduction
- Starting SPSS
- SPSS windows: Data editor, SPSS viewer, Syntax
9.30 – 10.15 Getting the data into SPSS
- Manually entering data
- Opening from SPSS file
- Opening from Excel file
Data management
- Variable name
- Value label
- Compute
- Recode into same variables
- Recode into different variables
- Visual Binning (continuous -> categorical)
10.15 – 11.00 Describing data
- Frequency
- Explore (EXAMINE)
- Histogram
- Skewness/Kurtosis
- Bar chart/Graphic
11.00 – 11.15 Break
11.15 – 12.30 Data analysis
- Crosstabs and Chi-square test, McNemar's test
- Comparing means: Independent t-test, Paired t-test
- Non-parametric: Mann-Whitney U test, Wilcoxon signed-rank test
Contents
Time Day 2
9.15 – 10.15 Comparing >2 Groups
- One-Way Analysis of Variance (ANOVA) + Post-hoc tests
- Kruskal-Wallis test (non-parametric)
10.15 – 11.00 Correlation
- Pearson correlation
- Spearman's rank correlation
- Linear regression
- Scatter plots with a regression line
11.00 – 11.15 Break
11.15 – 11.45
- ROC
- Logistic regression
11.45 – 12.30 Survival analysis
- Kaplan-Meier Curve
- Log-rank test
- Cox regression
Introduction to SPSS
SPSS
IBM® SPSS® Statistics
Statistical Package for Social Sciences
Superior Performing Software System
Statistical Product and Service Solutions
PASW (Predictive Analytics Software)
IBM SPSS Statistics Data File Structure
•Rows (records) are cases. Each row represents
a case or an observation.
•Columns (fields) are variables. Each column represents a variable
or characteristic that is being measured.
Ex. Age or Gender etc…
          Variable 1  Variable 2  Variable 3  Variable 4  Variable 5  Variable 6  Variable 7
Record 1  R1/V1       R1/V2       R1/V3       R1/V4       R1/V5       R1/V6       R1/V7
Record 2  R2/V1       R2/V2       R2/V3       R2/V4       R2/V5       R2/V6       R2/V7
Record 3  R3/V1       R3/V2       R3/V3       R3/V4       R3/V5       R3/V6       R3/V7
Record 4  R4/V1       R4/V2       R4/V3       R4/V4       R4/V5       R4/V6       R4/V7
Record 5  R5/V1       R5/V2       R5/V3       R5/V4       R5/V5       R5/V6       R5/V7
Record 6  R6/V1       R6/V2       R6/V3       R6/V4       R6/V5       R6/V6       R6/V7
(R = record, V = variable)
Dependent Sample (Paired-Sample)
Dependent tests are always calculated between two variables (columns).
The paired values (before/after) of a case must always be in the same row!
Example: NPAR TESTS /WILCOXON=WT0 WITH WT1 (PAIRED).
Weight (kg): WT0 = before, WT1 = after
ID  GROUP  WT0  WT1
1   1      75   73
2   1      88   79
3   2      69   71
4   1      93   88
5   1      71   71
6   2      65   63
7   2      80   75
8   1      83   86
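To try the paired test without opening a file, the eight example cases can be typed in with syntax. This is a minimal sketch: the variable names come from the table, and the value labels for GROUP assume the Placebo/Verum coding (1/2) used for the independent-samples example.

```spss
* Enter the eight example cases inline (free format, one case per line).
DATA LIST FREE / ID GROUP WT0 WT1.
BEGIN DATA
1 1 75 73
2 1 88 79
3 2 69 71
4 1 93 88
5 1 71 71
6 2 65 63
7 2 80 75
8 1 83 86
END DATA.
* Assumed coding: 1 = Placebo, 2 = Verum.
VALUE LABELS GROUP 1 'Placebo' 2 'Verum'.
* Paired comparison: before (WT0) vs. after (WT1) within each row.
NPAR TESTS /WILCOXON=WT0 WITH WT1 (PAIRED).
```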
Independent Samples
For independent tests, the existing cases must be divided into two or more groups.
Group membership ("Placebo" or "Verum") must be coded and entered as a grouping variable.
In SPSS: NPAR TESTS /M-W=WT1 BY GROUP(1 2).
Example: Weight after experiment
1. Placebo: 73, 79, 88, 71, 86
2. Verum: 71, 63, 75
ID  GROUP  WT0  WT1  Group = 1 (Placebo)  Group = 2 (Verum)
1   1      75   73   73
2   1      88   79   79
3   2      69   71                        71
4   1      93   88   88
5   1      71   71   71
6   2      65   63                        63
7   2      80   75                        75
8   1      83   86   86
Log-in to the Computer
• Using Charité account
• Using Local log-in
User = passwort
Download data file: https://biometrie.charite.de/
Service Unit Biometrie
Interne Fortbildungskurse
Einführung in SPSS
“Kurse in englische Sprache”
Accompanying material (Download example data file)
(Right-click the link and choose "Save as …")
***Change the file extension from .htm to .sav***
Download
SPSS data file
Download Handout
Changing the language in SPSS
Menu bar
Edit
Options…
SPSS Windows: Data View
SPSS Windows: Variable View
SPSS Windows: Output
SPSS Windows: Syntax
Save SPSS files
Each window (data file, output, syntax) is saved separately.
In each window, from the menus choose:
File
Save as …
Data file -> datafile.sav
Output file -> outputfile.spv
Syntax file -> syntaxfile.sps
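The data file and the output can also be saved from a syntax window. A minimal sketch, assuming a placeholder folder C:\spss_course (not from the course material); the syntax window itself is saved through its own File menu.

```spss
* Save the active data file (placeholder path).
SAVE OUTFILE='C:\spss_course\datafile.sav'.
* Save the currently open output (Viewer) document.
OUTPUT SAVE OUTFILE='C:\spss_course\outputfile.spv'.
```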
Getting the data into SPSS
Manually entering data
Opening from SPSS file
Opening from Excel file
Manually entering data
Case Record Form (CRF)
Field                                          Entry        Variable name  Data type
Demographic data:
Patient number                                 □□□          No
Clinic [1=University clinic 2=Local hospital]  □            Clinic         Categorical
Gender [0=Male 1=Female]                       □            Gender         Categorical
Age (years)                                    □□           Age            Continuous
Height (cm)                                    □□□          Height         Continuous
Weight (kg)                                    □□□.□        Weight         Continuous
Baseline:
Enrollment date [DD, MM, YYYY]                 □□.□□.□□□□   Enroll_date
Low blood count:
- after 1 hr                                   □□           LBC_1h_v0
- after 2 hrs                                  □□□          LBC_2h_v0
GOT [U/l]                                      □□□          GOT_v0
GPT [U/l]                                      □□□          GPT_v0
Manually entering data
Menu Bar
File
New
Data
*Variable names should be short, meaningful, and contain no spaces.
Ref: https://www.ibm.com/support/knowledgecenter/en/SSLVMB_25.0.0/statistics_mainhelp_ddita/spss/base/idh_defvar_type.html
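The settings made in Variable View can also be written as syntax. A minimal sketch for the demographic CRF variables; the label texts are taken from the CRF, and the measurement levels follow the categorical/continuous split shown there.

```spss
* Variable labels for the manually entered CRF variables.
VARIABLE LABELS Clinic 'Clinic' Gender 'Gender' Age 'Age (years)'
  Height 'Height (cm)' Weight 'Weight (kg)'.
* Value labels for the coded (categorical) variables.
VALUE LABELS Clinic 1 'University clinic' 2 'Local hospital'
  /Gender 0 'Male' 1 'Female'.
* Measurement level: categorical -> nominal, continuous -> scale.
VARIABLE LEVEL Clinic Gender (NOMINAL) /Age Height Weight (SCALE).
```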
Opening from Excel file
In the Open Data dialog, change the file type to Excel.
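An Excel sheet can also be read with syntax. A minimal sketch: the file path and the sheet name 'Sheet1' are placeholders, and the first row is assumed to hold the variable names.

```spss
* Read an Excel worksheet (placeholder path and sheet name).
GET DATA
  /TYPE=XLSX
  /FILE='C:\spss_course\SPSS_course.xlsx'
  /SHEET=NAME 'Sheet1'
  /READNAMES=ON.
EXECUTE.
```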
Getting the data into SPSS
Manually entering data
Opening from SPSS file
Opening from Excel file
Data management
Variable name
Value label
Compute
Recode into same variables
Recode into different variables
Visual Binning (continuous -> categorical)
Compute
• You can compute values for numeric or string (alphanumeric) variables.
• You can create new variables or replace the values of existing variables.
For new variables, you can also specify the variable type and label.
• You can compute values selectively for subsets of data based on logical
conditions.
• You can use a large variety of built-in functions, including arithmetic
functions, statistical functions, distribution functions, and string functions.
From the menus choose:
Transform
Compute Variable
Compute $\mathrm{BMI}\,(\mathrm{kg/m^2}) = \dfrac{\text{Weight (kg)}}{\left(\text{Height (m)}\right)^2}$
From the menus choose:
Transform
Compute Variable
Variable name:
Height = Height (cm)
Weight = Weight (kg)
Syntax:
COMPUTE bmi=Weight / ((Height / 100) ** 2).
EXECUTE.
Recode
Recode into same variables: reassigns the values of an existing variable.
The old values are lost!
gender -> gender (same variable name)
0 = Male   -> 1 = Male
1 = Female -> 2 = Female
Recode into different variables: writes the new values into a new variable and
keeps the existing variable unchanged. (Recommended!)
gender -> gender_n (new variable name)
0 = Male   -> 1 = Male
1 = Female -> 2 = Female
The old variable is kept.
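The recommended variant, recoding into a different variable, looks like this in syntax. A minimal sketch using the variable names gender and gender_n from the slide.

```spss
* Recode into a different variable: gender itself stays unchanged.
RECODE gender (0=1) (1=2) INTO gender_n.
VALUE LABELS gender_n 1 'Male' 2 'Female'.
EXECUTE.
```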
Recode into different variables
From the menus choose:
Transform
Recode into Different Variables…
Recode into same variables
From the menus choose:
Transform
Recode into Same Variables…
Create categorical variables from continuous variables
Age  Age_g
23   1      (1 = <40 yrs)
35   1
40   2      (2 = 40 – 49 yrs)
45   2
50   3      (3 = ≥50 yrs)
60   3
Commands:
Recode into Different Variables
Visual Binning
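The age grouping can be sketched in syntax as follows. It assumes Age is recorded in whole years, so that 39 is the largest value below 40; the label texts are taken from the table.

```spss
* Continuous Age -> three age groups in the new variable Age_g.
RECODE Age (LO THRU 39=1) (40 THRU 49=2) (50 THRU HI=3) INTO Age_g.
VALUE LABELS Age_g 1 '<40 yrs' 2 '40 - 49 yrs' 3 '>=50 yrs'.
VARIABLE LEVEL Age_g (ORDINAL).
EXECUTE.
```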
Create categorical variables from continuous variables
From the menus choose:
Transform
Recode into Different Variables…
Create categorical variables from continuous variables
From the menus choose:
Transform
Visual Binning…
Exercise 1:
Using SPSS_course.sav
• Create a new variable: $\mathrm{BMI}\,(\mathrm{kg/m^2}) = \dfrac{\text{Weight (kg)}}{\left(\text{Height (m)}\right)^2}$
• Create BMI as a categorical variable with value labels:
Code  Category    BMI (kg/m2)
1     Normal      <25.0
2     Overweight  25.0 – 29.9
3     Obese I     >=30.0
• Recode the BMI group into a new variable: Normal vs. Overweight+Obese
Compute $\mathrm{BMI}\,(\mathrm{kg/m^2}) = \dfrac{\text{Weight (kg)}}{\left(\text{Height (m)}\right)^2}$
From the menus choose:
Transform
Compute Variable
Variable name:
Height = Height (cm)
Weight = Weight (kg)
Syntax:
COMPUTE bmi=Weight / ((Height / 100) ** 2).
EXECUTE.
Create BMI as a categorical variable with value labels
From the menus choose:
Transform
Visual Binning…
Recode the BMI group into a new variable
Normal vs. Overweight+Obese
Syntax
***Exercise 1
*Calculate BMI.
COMPUTE bmi=weight / ((height / 100) ** 2).
EXECUTE.
*Create BMI group.
* Visual Binning.
*bmi.
RECODE bmi (MISSING=COPY) (30 THRU HI=3) (25 THRU HI=2) (LO THRU HI=1) (ELSE=SYSMIS)
INTO bmi_g.
VARIABLE LABELS bmi_g 'bmi (Binned)'.
FORMATS bmi_g (F5.0).
VALUE LABELS bmi_g 1 '< 25,00' 2 '25,00 - 29,99' 3 '30,00+'.
VARIABLE LEVEL bmi_g (ORDINAL).
EXECUTE.
*Create BMI into 2 groups.
RECODE bmi_g (1=1) (MISSING=Copy) (2 thru 3=2) INTO bmi_2g.
EXECUTE.
Break!
Describing data
Frequency
Explore
Histogram
Skewness/Kurtosis
Bar chart/Graphic
Frequency: Categorical variables
From the menus choose:
Analyze
Descriptive Statistics
Frequencies…
Frequency: Continuous variables
From the menus choose:
Analyze
Descriptive Statistics
Frequencies…
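For a continuous variable, the frequency table is usually switched off and summary statistics plus a histogram are requested instead. A minimal sketch, assuming the variable bmi computed earlier; the choice of statistics is illustrative.

```spss
* Summary statistics and histogram with normal curve, no frequency table.
FREQUENCIES VARIABLES=bmi
  /FORMAT=NOTABLE
  /STATISTICS=MEAN STDDEV MEDIAN MINIMUM MAXIMUM SKEWNESS KURTOSIS
  /HISTOGRAM NORMAL.
```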
Descriptive statistics separately by a categorical variable
From the menus choose:
Analyze
Descriptive Statistics
Explore…
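The Explore dialog corresponds to the EXAMINE command. A minimal sketch, assuming bmi is described separately for each gender; the plot selection is illustrative.

```spss
* Descriptives, boxplots, histograms and normal Q-Q plots of bmi per gender.
EXAMINE VARIABLES=bmi BY gender
  /PLOT BOXPLOT HISTOGRAM NPPLOT
  /STATISTICS DESCRIPTIVES
  /NOTOTAL.
```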
Statistics overview
Categorical vs. Categorical
Independent: Chi-square test or Fisher’s exact test
Non-independent: McNemar’s test or Binomial exact test
Continuous vs. Categorical
2 Groups (Independent): Independent t-test or Mann-Whitney U test
2 Groups (Paired): Paired t-test or Wilcoxon signed-rank test
>2 Groups: ANOVA or Kruskal-Wallis test
Continuous vs. Continuous
Pearson’s correlation or Spearman’s rank correlation
Data analysis: Crosstab + Chi-square test
To compare BMI groups between Male and Female
From the menus choose:
Analyze
Descriptive Statistics
Crosstabs…
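In syntax, the same comparison can be sketched as below. It assumes the BMI group variable is called bmi_g (as in Exercise 1) and requests column percentages so that the groups are compared within each gender.

```spss
* BMI group by gender with chi-square test and column percentages.
CROSSTABS
  /TABLES=bmi_g BY gender
  /STATISTICS=CHISQ
  /CELLS=COUNT COLUMN.
```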
Data analysis: Independent t-test and Paired t-test
• Independent t-test: to compare BMI between Male and Female
• Paired t-test: to compare Cholesterol at baseline and Visit 2
Data analysis: Independent t-test
From the menus choose:
Analyze
Compare Means
Independent-Samples T Test…
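A minimal syntax sketch of the same test, assuming gender is coded 0 = Male and 1 = Female as in the CRF and bmi is the computed BMI variable.

```spss
* Independent-samples t-test: bmi compared between the two gender codes.
T-TEST GROUPS=gender(0 1)
  /VARIABLES=bmi
  /CRITERIA=CI(.95).
```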
Report
There was no statistically significant difference in BMI between males and females
(mean difference = 0.41, 95% CI: -0.57 to 1.39; p = 0.405).
Data analysis: Paired t-test
From the menus choose:
Analyze
Compare Means
Paired-Samples T Test…
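A minimal syntax sketch of the paired test. Chol_v0 is the baseline cholesterol variable used in the course data; Chol_v2 is an assumed name for the Visit 2 measurement and may differ in the actual file.

```spss
* Paired t-test: cholesterol at baseline vs. Visit 2 (Chol_v2 is an assumed name).
T-TEST PAIRS=Chol_v0 WITH Chol_v2 (PAIRED)
  /CRITERIA=CI(.95).
```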
Data analysis: Mann-Whitney U Test
From the menus choose:
Analyze
Nonparametric tests
Legacy Dialogs
2 Independent Samples
Effect size:
$\text{Cohen's } r = \dfrac{Z}{\sqrt{N}} = \dfrac{5.079}{\sqrt{132}} \approx 0.442$
Z      N    Effect size
5.079  132  0.442
Intermediate effect size
Online ES calculation: https://www.psychometrica.de/effect_size.html
Gender  Alcohol consumption (g), median (IQR)  P-value  Effect size (Cohen's r)
Male    80 (22, 110)                           <0.001   0.442
Female  0 (0, 60)
From the menus choose:
Data
Split File…
Then, from the menus choose:
Analyze
Descriptive Statistics
Frequencies…
Don't forget to set Split File back to "Analyze all cases" afterwards!
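The Split File workflow can be sketched in syntax as follows. The example uses bmi as the analysis variable purely for illustration; any continuous variable can be substituted.

```spss
* Split File requires the data to be sorted by the grouping variable.
SORT CASES BY gender.
SPLIT FILE LAYERED BY gender.
* Median and quartiles (for the IQR), reported separately for each gender.
FREQUENCIES VARIABLES=bmi
  /FORMAT=NOTABLE
  /STATISTICS=MEDIAN
  /NTILES=4.
* Switch Split File off again (equivalent to "Analyze all cases").
SPLIT FILE OFF.
```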
Without Split file With Split file by gender
To switch Split File off again, from the menus choose:
Data
Split File…
and select "Analyze all cases".
Data analysis: Wilcoxon signed-rank test
From the menus choose:
Analyze
Nonparametric tests
Legacy Dialogs
2 Related Samples
Effect size:
$\text{Cohen's } r = \dfrac{Z}{\sqrt{N}} = \dfrac{6.207}{\sqrt{132}} \approx 0.54$
Z      N    Effect size
6.207  132  0.54
Large effect size
Online ES calculation: https://www.psychometrica.de/effect_size.html
Statistics overview: Regression models
Continuous outcome: Linear regression model
Binary outcome: Logistic regression model
Ordinal outcome: Ordinal logistic regression model
Categorical outcome with more than two categories: Multinomial logistic regression model
Time-to-event outcome: Cox proportional hazards model
Longitudinal study: Generalized estimating equations (GEE) or mixed/multilevel model
From the menus choose:
Transform
Visual Binning…
Data analysis: One-Way ANOVA
From the menus choose:
Analyze
Compare Means
One-Way ANOVA…
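A minimal syntax sketch of a one-way ANOVA with a post-hoc test. It assumes bmi is compared across a three-level age group variable Age_g; the Bonferroni correction is one possible post-hoc choice, not necessarily the one shown in the course.

```spss
* One-way ANOVA with descriptives, Levene test and Bonferroni post-hoc comparisons.
ONEWAY bmi BY Age_g
  /STATISTICS DESCRIPTIVES HOMOGENEITY
  /POSTHOC=BONFERRONI ALPHA(0.05).
```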
Data analysis: Kruskal-Wallis Test
From the menus choose:
Analyze
Nonparametric tests
Legacy Dialogs
K Independent Samples
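The legacy dialog corresponds to the following syntax sketch, again assuming bmi and a grouping variable Age_g with codes 1 to 3.

```spss
* Kruskal-Wallis test: bmi across the groups coded 1 through 3.
NPAR TESTS /K-W=bmi BY Age_g(1 3).
```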
From the menus choose:
Analyze
Nonparametric tests
Independent Samples…
Double-click the result table to open the editor window.
Then change View from "Independent Samples Test View"
to "Pairwise Comparisons".
Data analysis: Correlation
From the menus choose:
Analyze
Correlate
Bivariate…
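Pearson and Spearman correlations use two different commands in syntax. A minimal sketch, assuming the variables Age and Chol_v0 from the course data.

```spss
* Pearson correlation.
CORRELATIONS
  /VARIABLES=Age Chol_v0
  /PRINT=TWOTAIL NOSIG.
* Spearman's rank correlation.
NONPAR CORR
  /VARIABLES=Age Chol_v0
  /PRINT=SPEARMAN TWOTAIL NOSIG.
```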
Data analysis: Linear regression
From the menus choose:
Analyze
Regression
Linear…
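A minimal syntax sketch of a simple linear regression. The choice of Chol_v0 as outcome and Age as predictor is an assumption for illustration.

```spss
* Linear regression of cholesterol on age with 95% CIs for the coefficients.
REGRESSION
  /STATISTICS COEFF OUTS CI(95) R ANOVA
  /DEPENDENT Chol_v0
  /METHOD=ENTER Age.
```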
From the menus choose:
Graphs
Chart Builder…
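A simple scatter plot can also be requested with the legacy GRAPH command, sketched here for the assumed variables Age and Chol_v0. The regression line is then added in the Chart Editor (double-click the chart and add a fit line).

```spss
* Scatter plot of cholesterol against age.
GRAPH /SCATTERPLOT(BIVAR)=Age WITH Chol_v0.
* Same plot with the points marked by gender.
GRAPH /SCATTERPLOT(BIVAR)=Age WITH Chol_v0 BY gender.
```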
Scatter plot by gender
From the menus choose:
Graphs
Chart Builder…
Break!
Data analysis: ROC
From the menus choose:
Analyze
ROC Curve…
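A minimal syntax sketch of a ROC analysis. Both variable names are assumptions: bmi stands for the test variable and chol_g for a binary state variable in which 1 marks the positive state.

```spss
* ROC curve with diagonal reference line, standard error and curve coordinates.
ROC bmi BY chol_g (1)
  /PLOT=CURVE(REFERENCE)
  /PRINT=SE COORDINATES.
```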
Data analysis: Logistic Regression
From the menus choose:
Analyze
Regression
Binary Logistic…
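A minimal syntax sketch of a binary logistic regression. The outcome name alcohol_yn (1 = drinks alcohol) and the predictor Age_g are assumed names; Indicator(1) makes the first category (here Male = 0) the reference, so the odds ratio for gender reads as female vs. male.

```spss
* Binary logistic regression with odds ratios and 95% CIs.
LOGISTIC REGRESSION VARIABLES alcohol_yn
  /METHOD=ENTER gender Age_g
  /CONTRAST (gender)=Indicator(1)
  /CONTRAST (Age_g)=Indicator
  /PRINT=CI(95).
```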
The odds of alcohol drinking in females were 87% lower than in males (odds ratio = 0.132).
Equivalently, the odds of alcohol drinking in males were 7.58 (1/0.132) times the odds in females.
The odds of alcohol drinking in patients younger than 50 years were 1.2 times the odds in
patients older than 60 years.
Calculating time from date variables
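Follow-up time can be computed from two date variables with the DATEDIFF function. A minimal sketch: Enroll_date is the enrollment date from the CRF, while End_date is an assumed name for the date of the event or last follow-up.

```spss
* Time in days from enrollment to event/last follow-up (End_date is an assumed name).
COMPUTE time_days = DATEDIFF(End_date, Enroll_date, "days").
EXECUTE.
```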
Data analysis: Kaplan-Meier Curve
From the menus choose:
Analyze
Survival
Kaplan-Meier…
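A minimal syntax sketch of a Kaplan-Meier analysis with a log-rank test. The names time_days (follow-up time) and death (1 = event occurred) are assumptions.

```spss
* Kaplan-Meier curves by gender with log-rank test.
KM time_days BY gender
  /STATUS=death(1)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL
  /TEST LOGRANK
  /COMPARE OVERALL POOLED.
```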
Data analysis: Cox regression
From the menus choose:
Analyze
Survival
Cox regression…
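A minimal syntax sketch of a Cox regression. The names time_days and death (1 = event) are assumptions; with gender coded 0 = Male, 1 = Female, the default Indicator contrast uses the last category (Female) as reference, and Indicator(1) makes the first BMI group (Normal) the reference.

```spss
* Cox regression with hazard ratios and 95% CIs.
COXREG time_days
  /STATUS=death(1)
  /CONTRAST (gender)=Indicator
  /CONTRAST (bmi_g)=Indicator(1)
  /METHOD=ENTER gender bmi_g
  /PRINT=CI(95).
```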
At any time point during the study period, males were 1.94 times as likely to die as
females (hazard ratio = 1.94; 95% CI: 0.86 to 4.37). Because the interval includes 1,
this difference is not statistically significant.
At any time point during the study period, overweight patients had 3.33 times and obese
patients 4.29 times the risk of death of patients with normal BMI.
Exercise:
• Create a group variable from "No":
1 = <45
2 = 45 – 100
3 = >100
• Compare "Chol_v0", "TG_v0" and "Hb_v0" between the groups by
choosing the appropriate statistical test
(ANOVA or Kruskal-Wallis test).
• Compute the correlation between "Age" and "Chol_v0" separately
by group
(Pearson correlation and scatter plot separately by group).
• Create a "Cholesterol group" from "Chol_v0":
0 = Normal (Chol_v0 < 200)
1 = High cholesterol (Chol_v0 >= 200)
• Which factors (Age, gender, bmi_2g) are associated with high
cholesterol?
Which statistical method will you use?
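The two grouping steps of this exercise can be sketched in syntax. The new variable names No_g and chol_g are assumptions, and No is assumed to hold whole numbers. RECODE applies the first matching rule, so 200 is caught by the first specification and only values below 200 reach the second.

```spss
* Three groups from the patient number No.
RECODE No (LO THRU 44=1) (45 THRU 100=2) (101 THRU HI=3) INTO No_g.
VALUE LABELS No_g 1 '<45' 2 '45 - 100' 3 '>100'.
* Cholesterol group: 0 = Normal (<200), 1 = High cholesterol (>=200).
RECODE Chol_v0 (200 THRU HI=1) (LO THRU 200=0) INTO chol_g.
VALUE LABELS chol_g 0 'Normal' 1 'High cholesterol'.
EXECUTE.
```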
From the menus choose:
Transform
Visual Binning…
From the menus choose:
Analyze
Descriptive Statistics
Explore…
From the menus choose:
Analyze
Compare Means
One-Way ANOVA…
From the menus choose:
Analyze
Nonparametric tests
Legacy Dialogs
K Independent Samples
From the menus choose:
Data
Split File…
Then, from the menus choose:
Analyze
Correlate
Bivariate…
From the menus choose:
Transform
Visual Binning…
From the menus choose:
Analyze
Regression
Binary Logistic…