Survival analysis
Recommended texts
Survival analysis: A Self-learning text
David G. Kleinbaum, Springer.
An excellent tutorial on survival analysis,
including Cox proportional hazards
Primer of Biostatistics
Stanton Glantz, McGraw-Hill
Chapter 11 gives a very understandable
introduction to survival analysis
We use survival analysis:
To describe the survival times of
members of a group
Survival function
Hazard function
Kaplan-Meier curves
We use survival analysis:
To compare the survival times of two or
more groups
Log-rank test
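As a rough illustration of what the log-rank test computes, the sketch below compares the observed number of events in group A with the number expected if both groups shared one survival curve. The function name and the toy data are assumptions made for this example. The p-value uses the chi-square distribution with 1 degree of freedom.

```python
import math

def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test. Returns (chi-square statistic, p-value) on 1 df.

    events_* use 1 = event observed, 0 = censored.
    """
    # Distinct times at which at least one event occurred in either group
    event_times = sorted(
        {t for t, e in zip(times_a, events_a) if e}
        | {t for t, e in zip(times_b, events_b) if e}
    )
    observed_a = expected_a = variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in A just before t
        n_b = sum(1 for x in times_b if x >= t)   # at risk in B just before t
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        observed_a += d_a
        expected_a += d * n_a / n                 # expected events in A under H0
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = (observed_a - expected_a) ** 2 / variance
    p_value = math.erfc(math.sqrt(chi2 / 2.0))    # chi-square tail, 1 df
    return chi2, p_value

# Toy data (made up): group A has events early, group B has events late
chi2, p = logrank_test([1, 2, 3, 4, 5], [1, 1, 1, 1, 1],
                       [6, 7, 8, 9, 10], [1, 1, 1, 1, 1])
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

This sketch assumes the two groups are not identical at every event time; otherwise the variance term can be zero.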
We use survival analysis:
To describe the effect on survival times of
a continuous variable (e.g., gene
expression)
Cox proportional hazards regression
Gives relative risk for a unit change in the
variable
E.g., a unit change in the expression of
gene A gives a 2-fold increase in relative
risk
Survival analysis: definitions
Event: Death, disease occurrence,
disease recurrence, recovery, or other
experience of interest
Time: The time from the beginning of an
observation period (e.g., surgery) to (a) an
event, or (b) end of the study, or (c) loss of
contact or withdrawal from the study.
Censoring / Censored observation: When
a subject does not have an event during
the observation time, they are described
as censored, meaning that we cannot
observe what has happened to them
subsequently.
A censored subject may or may not have
an event after the end of observation time.
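One way to see how these definitions become data: each subject is reduced to a pair (time, status), where status is 1 if the event was observed and 0 if the subject was censored. The class and field names below are hypothetical, chosen only for this sketch.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Subject:
    start_day: int               # start of observation (e.g., day of surgery)
    event_day: Optional[int]     # day of the event, or None if none was observed
    last_contact_day: int        # end of study, withdrawal, or loss of contact

def to_time_and_status(s: Subject) -> tuple[int, int]:
    """Return (time, status): status 1 = event observed, 0 = censored."""
    if s.event_day is not None and s.event_day <= s.last_contact_day:
        return s.event_day - s.start_day, 1
    return s.last_contact_day - s.start_day, 0

subjects = [
    Subject(0, 120, 120),     # event on day 120
    Subject(0, None, 365),    # event-free at the end of the study: censored
    Subject(30, None, 200),   # lost to follow-up on day 200: censored
]
records = [to_time_and_status(s) for s in subjects]
print(records)
```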
Survivor function S(t):
S(t) = The probability that a subject
survives longer than time t.
The survivor function is often
expressed as a Kaplan-Meier curve
Vertical drop indicates an event
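A minimal sketch of the Kaplan-Meier (product-limit) estimate, written in plain Python so each step is visible. The function name and the toy data are assumptions for this example. At each event time, the curve is multiplied by the fraction of at-risk subjects who did not have the event, which produces the vertical drops.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of S(t).

    times:  observed time for each subject
    events: 1 = event observed, 0 = censored
    Returns [(t, S(t))] at each distinct event time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Gather every subject whose observed time equals t
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk   # vertical drop at an event time
            curve.append((t, surv))
        at_risk -= removed                   # events and censored subjects both leave the risk set
    return curve

# Toy data (made up): 8 subjects, 2 of them censored
times = [2, 3, 3, 5, 8, 8, 9, 12]
events = [1, 1, 0, 1, 1, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
```

Censored subjects never cause a drop; they only shrink the number at risk for later event times.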
Kaplan-Meier survival curves
Kaplan-Meier curves and the logrank test
are useful when the predictor variable is
categorical (e.g., drug vs. placebo), or
takes a small number of values (e.g., drug
doses 0, 20, 50, and 100 mg/day) that can
be considered to be categorical.
The logrank test and K-M curves don't
work easily with continuous predictors
such as gene expression values.
To do survival analysis using continuous
variables such as gene expression or
White Blood Count (WBC), we use Cox
Proportional Hazards Regression Analysis
(Cox PH regression).
Hazard function h(t)
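(1) A conceptual, approximate definition:
h(t) is the probability of an event in a short
time interval [t, t+i], per unit time, given that the
individual has survived up to time t
(2) The formal calculus definition:
h(t) = \lim_{i \to 0} \frac{P(t \le T < t+i \mid T \ge t)}{i}
where T is the subject's event time. The hazard
is a rate, not a probability, so it can exceed 1.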
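To make the limit concrete, the sketch below assumes an exponential survivor function S(t) = exp(-rate * t), for which the hazard is known to be constant and equal to the rate. The function names and the rate of 0.1 are illustrative assumptions. As the interval width i shrinks, the conditional probability per unit time approaches the rate.

```python
import math

def survivor(t, rate=0.1):
    """S(t) for an exponential event time (assumed only for illustration)."""
    return math.exp(-rate * t)

def approx_hazard(t, i, S=survivor):
    """P(event in [t, t+i] | survived to t), divided by the interval width i."""
    return (S(t) - S(t + i)) / S(t) / i

# Shrinking the interval moves the approximation toward the true hazard (0.1)
widths = (1.0, 0.1, 0.001)
estimates = [approx_hazard(5.0, i) for i in widths]
for i, h in zip(widths, estimates):
    print(f"i={i}: h(5) is approximately {h:.5f}")
```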
Proportional hazards
Consider two groups, A and B
If the hazard function h(t) for group A = 2*h(t) for
group B, then the hazard rates for the two
groups are proportional. The proportionality
constant is 2. The hazards are proportional at all
times t.
In general, if the hazard rate for one group is a
constant multiple of the hazard for a second
group, then the hazards are proportional.
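A small numeric sketch of the two-group example, assuming a Weibull-shaped hazard for group B (the shape value is arbitrary). Both hazards change over time, yet their ratio stays at 2 at every t. A consequence shown in the last column is that the group A survivor function equals the group B survivor function squared.

```python
import math

SHAPE = 1.5  # assumed Weibull shape parameter, chosen only for illustration

def hazard_b(t):
    """Hazard for group B: a Weibull hazard with scale 1."""
    return SHAPE * t ** (SHAPE - 1)

def hazard_a(t):
    """Hazard for group A: twice the group B hazard at every time t."""
    return 2.0 * hazard_b(t)

def survivor_b(t):
    """S(t) = exp(-cumulative hazard); here the cumulative hazard is t**SHAPE."""
    return math.exp(-(t ** SHAPE))

def survivor_a(t):
    """Doubling the hazard doubles the cumulative hazard."""
    return math.exp(-2.0 * t ** SHAPE)

times = [0.5, 1.0, 2.0, 4.0]
ratios = [hazard_a(t) / hazard_b(t) for t in times]
for t, r in zip(times, ratios):
    print(f"t={t}: h_A/h_B = {r:.1f}, S_A = {survivor_a(t):.4f}, "
          f"S_B^2 = {survivor_b(t) ** 2:.4f}")
```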
Cox Proportional Hazards
regression
For continuous predictors such as gene
expression values we can use Cox
Proportional Hazards (PH) regression
models.
Cox PH can handle categorical data (e.g.,
treatment groups) by encoding as dummy
{0,1} variables.
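A minimal sketch of dummy encoding, assuming one category is chosen as the reference level (coded all zeros) and every other category gets its own {0,1} column. The function name is hypothetical.

```python
def dummy_encode(values, reference):
    """Encode a categorical variable as {0,1} columns, one per non-reference level.

    Returns (levels, rows): levels names the columns, rows holds one list per subject.
    """
    levels = sorted(set(values) - {reference})
    rows = [[1 if v == level else 0 for level in levels] for v in values]
    return levels, rows

# Two treatment groups: placebo is the reference, so one column is enough
levels, rows = dummy_encode(["placebo", "drug", "drug", "placebo"], reference="placebo")
print(levels, rows)

# Four dose groups treated as categories: three columns, dose "0" is the reference
dose_levels, dose_rows = dummy_encode(["0", "20", "50", "100", "20"], reference="0")
print(dose_levels, dose_rows)
```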
Cox PH is a special type of survival
analysis (it handles censoring)
and a special type of regression analysis
(it handles continuous and categorical
predictor variables)
Cox proportional hazards regression
To describe the effect on survival times of
a continuous variable such as gene
expression
Cox PH gives relative risk for a unit change in
the variable
E.g., a unit change in the expression of
gene A gives a 2-fold increase in relative
risk
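In a Cox model each predictor has a coefficient, here called beta, and exp(beta) is the hazard ratio for a one-unit increase in that predictor. The sketch below works backwards from the gene A example: a 2-fold increase per unit corresponds to beta = ln 2. The coefficient and function name are assumptions for illustration, not fitted values.

```python
import math

def hazard_ratio(beta, delta=1.0):
    """Hazard ratio for a change of `delta` units in a predictor with coefficient beta."""
    return math.exp(beta * delta)

beta_gene_a = math.log(2.0)        # assumed coefficient: 2-fold risk per unit

hr_one_unit = hazard_ratio(beta_gene_a)
hr_three_units = hazard_ratio(beta_gene_a, 3.0)
print(f"1 unit: HR = {hr_one_unit:.1f}, 3 units: HR = {hr_three_units:.1f}")
```

The effect is multiplicative: a 3-unit change multiplies the hazard by 2 three times over, not by 2 + 2 + 2.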
How does the Cox model work?
Hazard ratio (HR)
Similar concept to odds ratio and relative
risk
HR = The ratio of two hazard functions
To get the Hazard for Group 2, multiply the
Hazard for Group 1 by the Hazard ratio
HR = 4.5 for Rx, meaning that the risk (of
relapse) for group 2 is 4.5 times that of
group 1.
If HR = 1 then Group 1 h(t) = Group 2 h(t)
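The p-value for the hazard ratio tests the
null hypothesis that HR = 1 (no difference
in the hazard rates). It is the probability of
seeing an HR at least this far from 1 if the
true HR were 1; it is not the probability
that HR = 1.
A p-value near 1 means the data are
consistent with no difference in the hazard rates.
A p-value < 0.05 is conventionally taken as a
statistically significant difference in the hazard rates.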
In the Cox model, when all the Xs are
zero, the formula reduces to the baseline
hazard, h0(t)
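For reference, the standard form of the Cox model is written out below, with p predictors X_1, ..., X_p and coefficients \beta_1, ..., \beta_p (the symbol names are the usual textbook ones, assumed here). The second line shows the reduction to the baseline hazard, and the third shows why the hazard ratio between two individuals does not depend on time.

```latex
h(t, X) = h_0(t) \, \exp\!\left( \sum_{i=1}^{p} \beta_i X_i \right)

% All predictors equal to zero:
h(t, 0) = h_0(t) \, \exp(0) = h_0(t)

% Hazard ratio for predictor values X^* versus X (the baseline hazard cancels):
\mathrm{HR} = \frac{h(t, X^*)}{h(t, X)}
            = \exp\!\left( \sum_{i=1}^{p} \beta_i \, (X_i^* - X_i) \right)
```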
In a Cox PH model, we can determine the
significant coefficients (variables) without
specifying the baseline hazard.
Use maximum likelihood (ML) to
determine coefficients
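The sketch below illustrates why the baseline hazard is not needed to estimate a coefficient. It assumes a single predictor and maximizes the Cox partial likelihood with Newton's method. Only the predictor values of subjects still at risk at each event time enter the calculation. Function names and data are made up, and tied event times are handled in the simple Breslow fashion.

```python
import math

def score_and_information(beta, times, events, x):
    """First derivative (score) and negative second derivative (information)
    of the log partial likelihood for a single predictor."""
    score = 0.0
    information = 0.0
    for t_i, e_i, x_i in zip(times, events, x):
        if not e_i:
            continue                      # censored subjects only appear in risk sets
        risk = [x_j for t_j, x_j in zip(times, x) if t_j >= t_i]
        weights = [math.exp(beta * x_j) for x_j in risk]
        s0 = sum(weights)
        s1 = sum(w * x_j for w, x_j in zip(weights, risk))
        s2 = sum(w * x_j * x_j for w, x_j in zip(weights, risk))
        mean = s1 / s0                    # weighted mean of x over the risk set
        score += x_i - mean
        information += s2 / s0 - mean * mean
    return score, information

def fit_cox_one_covariate(times, events, x, iterations=25, tol=1e-9):
    """Newton iterations starting from beta = 0."""
    beta = 0.0
    for _ in range(iterations):
        score, information = score_and_information(beta, times, events, x)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta

# Toy data (made up): x = 1 subjects tend to have earlier events
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 0, 1]
x = [1, 1, 0, 1, 0, 1, 0, 0]
beta_hat = fit_cox_one_covariate(times, events, x)
print(f"beta = {beta_hat:.3f}, hazard ratio = {math.exp(beta_hat):.2f}")
```

This sketch does not guard against data where the estimate is infinite (for example, when every event occurs in the subject with the largest x still at risk).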
Given a Cox model and the coefficients,
we can subsequently estimate the
baseline hazard function and the survival
curves.
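One common way to carry out this second step is the Breslow estimator of the cumulative baseline hazard. The sketch assumes a single predictor and uses a coefficient value that is assumed for illustration, not fitted here. The survival curve for a given predictor value x then follows as S(t | x) = exp(-H0(t) * exp(beta * x)).

```python
import math

def breslow_baseline(times, events, x, beta):
    """Cumulative baseline hazard H0(t) at each distinct event time."""
    cumulative = 0.0
    steps = []
    for t in sorted({t for t, e in zip(times, events) if e}):
        deaths = sum(1 for t_j, e_j in zip(times, events) if t_j == t and e_j)
        # Sum of exp(beta * x) over everyone still at risk at time t
        denom = sum(math.exp(beta * x_j) for t_j, x_j in zip(times, x) if t_j >= t)
        cumulative += deaths / denom
        steps.append((t, cumulative))
    return steps

def survival_curve(baseline, beta, x_value):
    """S(t | x) = exp(-H0(t) * exp(beta * x)) at each baseline step."""
    return [(t, math.exp(-h0 * math.exp(beta * x_value))) for t, h0 in baseline]

# Toy data (made up) and an assumed coefficient
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 0, 1]
x = [1, 1, 0, 1, 0, 1, 0, 0]
beta = 1.24  # assumed value for illustration

baseline = breslow_baseline(times, events, x, beta)
curve_x0 = survival_curve(baseline, beta, 0)
curve_x1 = survival_curve(baseline, beta, 1)
for (t, s0), (_, s1) in zip(curve_x0, curve_x1):
    print(f"t={t}: S(t | x=0) = {s0:.3f}, S(t | x=1) = {s1:.3f}")
```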
Cox Proportional Hazards regression
analysis assumes that the hazards are
proportional (constant ratio) over time.
If the hazards are not proportional, then
the model is wrong, and conclusions are
likely to be wrong.
Most survival analysis software provides
checks of the proportional hazards
assumption (e.g., tests based on
Schoenfeld residuals).
Cox models can also be used with time-varying
covariates, such as gene
expression values that change over time.
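Time-varying covariates are commonly supplied to software by splitting each subject's follow-up into intervals over which the covariate is constant. The sketch below shows that reshaping for one subject. The function name and the (start, stop, value, event) row layout are assumptions for illustration, and all measurement times are assumed to fall before the end of follow-up.

```python
def to_intervals(measurements, end_time, event):
    """Split one subject's follow-up into (start, stop, value, event) rows.

    measurements: [(time, value)] sorted by time, with the first at time 0
    end_time:     time of the event or of censoring
    event:        1 if the follow-up ended with an event, 0 if censored
    """
    rows = []
    for k, (start, value) in enumerate(measurements):
        is_last = k + 1 == len(measurements)
        stop = end_time if is_last else measurements[k + 1][0]
        # Only the final interval can end in an event
        rows.append((start, stop, value, event if is_last else 0))
    return rows

# Gene expression measured on days 0, 30, and 90; event on day 120
rows = to_intervals([(0, 1.2), (30, 2.5), (90, 3.1)], end_time=120, event=1)
for row in rows:
    print(row)
```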