840 BRITISH MEDICAL JOURNAL VOLUME 283 26 SEPTEMBER 1981
Statistics in Question SHEILA M GORE
ASSESSING METHODS: SURVIVAL

The organisation and graphical representation of survival data are emphasised in this article. The comparison of lifetables using the logrank test1 is recommended because it is more efficient than the comparison of survival rates at one point in time. Moreover, comparison of n-year survival rates can be misleading if investigators decide to make the comparison at a given time just because the observed difference is greatest then. Bad practice such as this does not escape notice if lifetables are necessarily reported.

Looking at lifetables readily suggests which factors are important for prognosis; plotting the rates of mortality year by year in addition to lifetables gives insight to the disease process and is the first step in assessing statistical models for describing survival. Diversity of times to peak hazard and the unimportance in later follow-up of prognostic factors that were relevant at the time of diagnosis may describe the pattern of mortality when there are long-term survivors, as in breast cancer.2 Plotting the lifetable on a logarithmic scale is one method of assessing graphically whether survival is exponential. Other methods may be easier to interpret, however.

Organising survival data

(17) Summarise the survival data at the time of analysis for patients A to D who were entered serially in a clinical trial comparing treatments for advanced cancer of the pancreas.

-patient A entered the trial on 14 January 1980 and was still alive at the date of analysis, 30 June 1981, giving survival time of at least 533 days; the information that the patient died on 24 August 1981 is not available at the time of preparing the trial report
-patient B entered the trial on 31 January 1980 and died on 9 October 1980; exact survival time is 252 days
-patient C entered the trial on 27 February 1980 and was lost to follow-up on 1 April 1980, the latest date of contact with the patient; survival time for patient C is (right) censored at 34 days
-patient D was admitted to the trial on 16 August 1980 and was still alive at the date of analysis, though lost to follow-up later; being alive on 30 June 1981 means that the survival of this patient is (right) censored at 318 days, another way of saying that the patient's survival time is at least 318 days
-at the time of analysis survival is (right) censored for patients A, D, and C because patients A and D are still alive and patient C was lost to follow-up; survival time is exact for patient B, who has died

[Figure: Serial patient entry. Entry, death, and loss to follow-up are marked for patients A to D against a date axis running from the start of the trial (7 Jan 1980) through 1 Nov 1980 to the date of analysis (30 June 1981).]

COMMENT

The following discussion is about time to death, but the event that is most important could as well be tumour regression, development of metastasis, rehabilitation after a stroke, reinfarction, discharge from hospital, or regaining birth weight for a preterm infant. Several of these events are less easily defined than death.
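The survival times quoted for patients A to D in question 17 can be reproduced with simple date arithmetic. The following is a minimal sketch only: the function name `survival_time` and the record layout are illustrative assumptions, not part of the article, and patient D is recorded as last known alive on the date of analysis because the later date of loss to follow-up is not given.

```python
from datetime import date

ANALYSIS_DATE = date(1981, 6, 30)  # date of analysis stated in question 17


def survival_time(entry, analysis, death=None, last_known_alive=None):
    """Return (days, exact): exact is True for a death observed before
    analysis and False for a right-censored survival time.
    (Hypothetical helper written for this illustration.)"""
    if death is not None and death <= analysis:
        return (death - entry).days, True
    if death is not None:
        # Death after the analysis date: the patient was alive at analysis.
        censor = analysis
    elif last_known_alive is not None:
        # Never count follow-up beyond the date of analysis.
        censor = min(last_known_alive, analysis)
    else:
        raise ValueError("need a date of death or a date last known alive")
    return (censor - entry).days, False


# Dates as given in question 17.
patients = {
    "A": dict(entry=date(1980, 1, 14), death=date(1981, 8, 24)),
    "B": dict(entry=date(1980, 1, 31), death=date(1980, 10, 9)),
    "C": dict(entry=date(1980, 2, 27), last_known_alive=date(1980, 4, 1)),
    "D": dict(entry=date(1980, 8, 16), last_known_alive=ANALYSIS_DATE),
}

results = {
    name: survival_time(analysis=ANALYSIS_DATE, **record)
    for name, record in patients.items()
}

for name, (days, exact) in results.items():
    print(name, days, "exact" if exact else "censored")
```

Patient A's death on 24 August 1981 falls after the date of analysis, so the sketch treats it as unavailable and censors survival at 30 June 1981, as the answer to question 17 does.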
In practice, therefore, a study of such events may be more difficult, but in principle the same type of information is being recorded: namely, the time from entering the study to the occurrence of the particular event.

In most clinical trials patients enter serially, in the order in which they are referred. The first step in organising survival data, after fixing the date for analysis, is to update the follow-up on all patients so that each case history is summarised by only one of the following three descriptions. Description (1): patient died before analysis on ____ (give the date, and cause of death if inferences about competing risks are to be made). Description (2): patient is "known" to be alive at the time of analysis (presumption is not good enough, because of the possibility of bias). Description (3): latest date before analysis on which the patient was known to be alive is ____ (give date and reason for information being incomplete, such as the patient is abroad or no reply has been received to letters requesting follow-up information or the case notes have been mislaid, etc). The next step is to check that the sequence of dates is correct for a given patient and also that trial numbers and entry dates correspond. Any inversion, such as patient 12 was entered on 2 May 1981 while patient 13 was entered earlier on 16 April 1981, suggests an error in transcribing dates or in the trial procedures. Having checked the orderliness of the data, the third step is to convert dates into survival time in days. Survival time is exact for patients who have died and is right-censored for patients who satisfy descriptions (2) and (3), these patients having survived for a time of at least so many days. The information is now in a form that makes it straightforward to estimate the lifetable for different treatment groups as they were randomised and to compare the duration of survival, using, for example, the logrank test. An excellent guide to the comparison of lifetables has been published by Peto et al.1

Censored survival times are dealt with easily when computing lifetables (experimental survival curves) for different treatment groups. We assume, however, that the reasons for censoring are independent of treatment. If this is not so the comparison of lifetables could be biased. In particular, ensure that the follow-up of patients who have been withdrawn from treatment (because of toxicity, for example) is as intense as for patients who continue on treatment. The former group may be disadvantaged also in terms of survival, so that if toxic reactions occur more often in one treatment group failure to follow-up such patients shows that group in a relatively better light than is warranted. Notice also that the detection of a local recurrence, say, or metastasis tends to be earlier when the interval between follow-up visits is short, so that routine appointments should be arranged similarly for all treatment groups. The reasons for censoring are important even when estimating a single lifetable as the following example shows. In the Western General Hospital breast cancer series,2 patients were dismissed from follow-up at the 20th anniversary of first treatment if there was no recurrent disease. Patients who continued to be followed up after the 20th anniversary had an unfavourable prognosis therefore, and the duration of their survival would be a poor reflection of the outcome for all 20-year survivors. Langlands et al2 avoided potential bias by censoring survival at the 20th anniversary for all patients in the Western General Hospital breast cancer series.

Certified cause of death is often unreliable in cancer series because deaths from other causes are underreported,3 a proportion of them being described as deaths from cancer. When analysing overall survival all deaths are accounted for irrespective of the cause. One justification for this type of analysis is that what is important to the patient is life. To assess curability, however, it is useful to have some measure,2 such as the excess death rate or the ratio of observed to expected deaths, which allows for the normal mortality in the general population. In one sense, therefore, death from cancer and deaths from other causes are competing risks. Analyses of duration of survival corrected for age and the study of curability and of competing risks are problems that should be referred to a statistician.

Lifetables and annual rate of mortality

(18) Figure 1 shows the lifetable for breast cancer patients referred to the Western General Hospital, Edinburgh in 1956. Estimate the probability that a patient referred for treatment of breast cancer survives for 10 years or more from date of diagnosis.

-for any time t, the lifetable gives the estimated probability of surviving for at least time t
-always plot survival data as a lifetable (another name for the lifetable is Kaplan-Meier experimental survival curve)
-plot also the annual proportion dying, which is related to what statisticians call the hazard function
-28% of patients with breast cancer survive for 10 years or more from date of diagnosis

[Figure 1: Lifetable for 347 patients with breast cancer referred to the Western General Hospital, Edinburgh. Horizontal axis: survival time, t (years), from 0 to 20. The number of patients alive at time t is printed beneath the axis (347, 240, 150, 98, 65, 44).]

COMMENT

A simple identity gives the principle on which calculation of lifetables is based. The identity is that the probability of surviving for (t + 1) days from diagnosis equals the probability of surviving for t days, multiplied by the probability of surviving the next day. The last term, the probability of surviving day (t + 1), is got by counting (a) how many patients are at risk (t + 1) days after diagnosis (only patients whose recorded survival time is (t + 1) days or more are in the group at risk); (b) how many patients died on day (t + 1); and then computing (c) the probability of surviving that day as one minus the proportion who died, the proportion who died being the number of deaths on day (t + 1) divided by the number of patients in the group at risk. Notice that the lifetable is a step function.
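The identity described in the comment on question 18 can be sketched in a few lines. This is an illustration under stated assumptions rather than the author's own procedure: the function name `lifetable` and the second data set are hypothetical, and only days on which a death occurs are visited, because a day without deaths multiplies the estimate by one.

```python
def lifetable(times, died):
    """Product-limit (Kaplan-Meier) estimate.

    times: recorded survival time of each patient, in days
    died:  True if the time is exact (a death), False if right-censored
    Returns a list of (day, estimated probability of surviving that day),
    one entry per distinct day on which a death occurred.
    """
    surviving = 1.0
    steps = []
    death_days = sorted({t for t, d in zip(times, died) if d})
    for day in death_days:
        # (a) patients whose recorded survival time is `day` or more
        at_risk = sum(1 for t in times if t >= day)
        # (b) patients who died on `day`
        deaths = sum(1 for t, d in zip(times, died) if d and t == day)
        # (c) multiply by one minus the proportion who died
        surviving *= 1 - deaths / at_risk
        steps.append((day, surviving))
    return steps


# Patients A to D from question 17: only B's survival time is exact.
trial = lifetable([533, 252, 34, 318], [False, True, False, False])

# Hypothetical data, invented purely to show several steps.
toy = lifetable([2, 3, 3, 5, 8], [True, True, False, True, False])

print(trial)
print(toy)
```

For patients A to D the only step is at day 252, when three patients (A, B, and D) are at risk and one dies; patient C, censored at 34 days, has already left the group at risk.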
The lifetable estimates the true survival curve in an unbiased way, and its steps occur only at times of death. Readers are referred for a worked example and further explanation to the expository paper by Peto et al.1 This gives sound advice about the design4 and analysis1 of randomised clinical trials that require prolonged observation of each patient.

By plotting the lifetable authors report survival data more completely than by summarising 5-year and 10-year survival rates. The lifetable gives for any time t in the range of the observations the estimated probability of surviving for at least time t. In addition, comparison of lifetables gives a preliminary impression of which factors are important for prognosis. Figure 2 shows that international stage separates patients with breast cancer into four groups with progressively poorer survival: 69% and 57% of patients with stage 1 and 2 breast cancer survive for five years or more compared with 35% of patients with stage 3 and 7% of patients with stage 4.

[Figure 2: Life table by international stage: stage 1 (n=1301), stage 2 (n=641), stage 3 (n=1417), stage 4 (n=474). Horizontal axis: time (years), from 1 to 20.]

Lifetables are not the only graphical representation of survival data and, because they show cumulative information, questions about how the rate of mortality changes year by year cannot be answered directly, although the answers are implicit in figure 2. I have in mind questions such as this: is the relative disadvantage of stage 3 breast cancer as great 15 years after diagnosis as it was at five years? The solution is to plot for each year of follow-up or other interval covering a sufficient number of deaths the estimated proportion of those alive at the beginning of a year who die within the year. To ensure proper handling of censored survival times, derive these proportions from the lifetable: for example, 41% of stage 3 patients survived for four years and 35% for five years, making the estimated proportion who died in the fifth year of follow-up equal to 6/41 = 0.15, or 15% (see figure 3). Obvious from figure 3 but concealed in figure 2 is that the rate of mortality decreases for stage 4 patients: more than 60% die within one year of diagnosis. For patients with international stage 1, 2, or 3 breast cancer a different pattern emerges. Hazard increases during the first one to four years after diagnosis and then slowly declines. Peak hazard is not only greater but also occurs earlier in stage 3 and stage 2 disease. By the 10th year after diagnosis the annual percentage dying is similar for survivors from all three stages, remembering that a standard error qualifies the estimators. In particular, survivors from stage 3 breast cancer seem to experience the same rate of mortality at 15 years after diagnosis as do survivors from stages 1 and 2.

[Figure 3: Mortality year by year by international stage: stage 1 (n=1301), stage 2 (n=641), stage 3 (n=1417), stage 4 (n=474). Horizontal axis: time from diagnosis (years), from 2 to 20.]

Reference points for choosing a description of survival are exponential survival (constant annual proportion dying), the Weibull family (risk of dying is monotone: it increases (or decreases) steadily from time of diagnosis), and the proportional hazards model (irrespective of the elapsed time since diagnosis the ratio of the risks of dying in given prognostic groups remains constant, in particular the time when the force of mortality is greatest is the same for all prognostic groups). The problem of fitting a parametric distribution to survival needs to be referred to a statistician, but an initial assessment can be made from figure 3. The annual proportion dying is neither constant nor monotone and so the exponential model and the Weibull are not in the running. Diversity of times to peak hazard and the relative unimportance in later follow-up of description of the tumour (international stage) that was relevant initially are a denial of the proportional hazards model. These features suggest that a log-logistic distribution might fit the data. Careful statistical analysis is called for to substantiate these impressions.

When analysing long-term follow-up of patients investigators should plot mortality year by year in addition to the cumulative proportion of patients who survive. They should be on the look out for prognostic factors which, although relevant at diagnosis, are less important when it comes to describing later follow-up. Variable time to peak hazard indicates accelerated failure in some groups compared to others.

(19) If the graph of log pt (where pt is the estimated probability of surviving for t years or more) plotted against time is a straight line then the survival distribution is exponential. What do you infer from the figure below about the survival of patients with breast cancer?

-survival is not exponential
-rate of dying for breast cancer patients as a whole is rapid during the first four to five years after diagnosis; thereafter the annual proportion dying is less, but moderately constant
-the hazard pattern for important subgroups of patients (according to international stage, for example) is different from the pattern for the series as a whole

COMMENT

The figure shows the lifetable from figure 1 in question 18 plotted on a logarithmic scale. I have mentioned already that the exponential distribution is one reference standard for survival data, and a particularly simple one.
[Figure (question 19): Distribution of survival in 347 patients with breast cancer referred to the Western General Hospital, Edinburgh, plotted on a logarithmic scale. Horizontal axis: survival time (years), from 0 to 18.]

For the exponential distribution the time-specific risk of dying, or hazard, is constant irrespective of the elapsed time since diagnosis. This means that the rate of mortality neither intensifies nor mitigates at any time during follow-up. Fairly simple statistical theory tells us that if survival is exponential, then the graph of log pt against time looks like a straight line. For any other survival distribution a similar theoretical exercise indicates the appropriate transformation of pt which induces a straight-line relationship and so gives a visual test of goodness of fit. The exponential example is most familiar, however. The slope or gradient is identified with the rate of mortality, which is most intense when the gradient is steepest.

The difficulty of translating from gradient to hazard is avoided in figure 3 in question 18 by plotting annual mortality directly. The logarithmic plot is the more commonly reported, however, and so I include this example dealing with survival in the 1956 cohort as a whole. I prefer the method described in question 18.

From the figure we see that the rate of dying is rapid and probably constant during the first four to five years after diagnosis; by placing a ruler along the line of the early part of the slope observe the more gentle gradient in the later part of the graph. This lies above the ruler and approximates to a different straight line. Our conclusions are (a) that survival is not exponential, and (b) that the rate of dying does not increase for the series as a whole. Recall, however, the initially increasing mortality associated with international stage 1, 2, and 3 breast cancer (see figure 3 under question 18). It is not unusual for the pattern of mortality in subgroups to be different from the overall impression. Investigation of prognostic factors should include looking at graphs of mortality plotted year by year.

Inefficient or misleading comparison

(20) Comment on the comparison of survival at the fixed times shown in the figure below.

-comparison of three-year survival rates misses the emergent difference between treatments A and B
-comparison of two-year survival rates favours treatment A, whereas by seven years after diagnosis the advantage is with patients randomised to treatment B; the logrank test would identify neither treatment as superior

[Figure (question 20): Undesirable comparisons of survival rates. Two panels, each showing survival curves for treatments A and B against survival time (years).]

COMMENT

Comparison of n-year survival rates is inefficient because it ignores the structure of the lifetable and is open to the criticism that if the time of analysis had been chosen differently the results might be slanted differently. Peto et al1 list comparison of survival rates at one point in time as the first of 13 bad methods of analysis. Avoid comparison of n-year survival rates. The inefficiency of this method is illustrated in the figure on the left. By comparing three-year survival rates a difference between treatments that becomes apparent in time is missed. Comparison of lifetables over the entire follow-up period by the logrank test makes efficient use of the data and is likely to identify the superiority of treatment B.

In the second example, on the right of the figure, the treatment that is singled out as advantageous is different at two years and seven years. Treatment B is associated with higher initial mortality. In this case the logrank test is likely to report no significant difference between treatments. Only by plotting lifetables will the investigator discover the cross-over of the survival curves, an unusual contingency in practice, and one that the logrank test does not deal with happily, nor does any other simple test statistic.

A third example, not illustrated, is when an initial treatment advantage (in the first year, for instance) is not sustained thereafter. This means that the rates of mortality in the two treatment groups are likely to cross over, being first higher in one treatment group and then in the other before approaching a common level. Significant difference in disease-free interval but not in overall survival is the guise in which this problem commonly presents.5 Statistical advice should be sought.

Comparison of n-year survival rates is at best inefficient, at worst misleading. Unless lifetables are plotted the two extremes cannot be identified separately. Organising survival data to compute lifetables and calculation of the logrank statistic are so much the same exercise that there is little justification for presenting an inferior analysis, and lifetables should of course be reported always.

References

1 Peto R, Pike MC, Armitage P, et al. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. II Analysis and examples. Br J Cancer 1977;35:1-39.
2 Langlands AO, Pocock SJ, Kerr GR, Gore SM. Long-term survival of patients with breast cancer: a study of the curability of the disease. Br Med J 1979;ii:1247-51.
3 Cutler SJ, Axtell LM, Schottenfeld D. Adjustment of long-term survival rates for deaths due to intercurrent disease. J Chron Dis 1969;22:485-91.
4 Peto R, Pike MC, Armitage P, et al. Design and analysis of randomized clinical trials requiring prolonged observation of each patient. I Introduction and design. Br J Cancer 1976;34:585-612.
5 Henk JM, Kunkler PB, Smith CW. Radiotherapy and hyperbaric oxygen in head and neck cancer. Final report of first controlled clinical trial. Lancet 1977;i:101-3.

Sheila M Gore, MA, is a statistician in the MRC Biostatistics Unit, Medical Research Council Centre, Hills Road, Cambridge CB2 2QH.

No reprints will be available from the author.