ADVANCING IN-HOSPITAL CLINICAL DETERIORATION PREDICTION MODELS
By Alvin D. Jeffery, PhD, RN, Mary S. Dietrich, PhD, MS, Daniel Fabbri, PhD,
Betsy Kennedy, PhD, RN, Laurie L. Novak, PhD, Joseph Coco, MS, and
Lorraine C. Mion, PhD, RN
American Journal of Critical Care, September 2018, Volume 27, No. 5
Optimal statistical approaches to embed within decision support tools and assist clinicians with recognition are still being identified. Most statistical approaches are simply classification models that attempt to identify the likelihood of an event. Researchers have focused on increasingly accurate models, but accuracy is not the only important feature of a statistical method's performance. For example, a model resulting in a single probability as opposed to probability trends over time might yield weaker models for implementation into the clinical environment. For nurses, especially those in a hospital, identifying when an event is likely to occur (or at least monitoring trends over time) might be equally as important as the classification outcome of whether an event will occur at any point.

Effectively predicting clinical deterioration requires both mathematical accuracy and a consideration of clinicians' needs.

In this study, we compared the accuracy of 2 traditional statistical modeling strategies (logistic regression and Cox proportional hazards regression) and 2 related machine learning strategies (random forest and random survival forest) for in-hospital cardiopulmonary arrest (CPA). We selected these 4 strategies on the basis of their common use in the scientific literature and because 2 of the strategies (logistic regression and random forest) predict a binary outcome, whereas the other 2 strategies (Cox proportional hazards regression and random survival forest) predict a time-to-event outcome (Table 1). The traditional statistical strategies leverage regression methods for classification and survival analyses, and the machine learning strategies average the results of many decision trees created by splitting a random selection of predictor variables in each tree.16 We evaluated each of the approaches for accuracy and discrimination, the expected number of alarms at select thresholds, and the differences in model outputs with respect to what was being predicted. We hypothesized that the machine learning strategies would provide improved accuracy and discrimination and that the time-to-event models would provide outputs more amenable to human interpretation for evaluation in future work.

About the Authors
Alvin D. Jeffery is a medical informatics fellow at the US Department of Veterans Affairs, Tennessee Valley Healthcare System, Nashville, Tennessee, and a postdoctoral research fellow, Department of Biomedical Informatics, Vanderbilt University, Nashville, Tennessee. Mary S. Dietrich is a professor of statistics and measurement, Schools of Medicine (Biostatistics, Vanderbilt-Ingram Cancer Center, Psychiatry) and Nursing, Vanderbilt University. Daniel Fabbri is an assistant professor, Department of Biomedical Informatics, Vanderbilt University. Betsy Kennedy is a professor, School of Nursing, Vanderbilt University. Laurie L. Novak is an assistant professor and Joseph Coco is a senior application developer, Department of Biomedical Informatics, Vanderbilt University. Lorraine C. Mion is a professor, College of Nursing, The Ohio State University, Columbus, Ohio.

Corresponding author: Alvin D. Jeffery, 2525 West End Ave, #1475, Nashville, TN 37203 (email: alvinjeffery@[Link]).

Methods
Design and Setting
For this retrospective cohort study, we collected data from deidentified copies of the electronic health records of adults admitted to a large urban academic medical center from 2006 through 2015. A start date of 2006 accounted for changes in the rapid response team's organizational policy, which could have influenced the outcome of interest given that these changes placed increased emphasis on early recognition and management of clinical deterioration
Figure 2 Receiver operating characteristic curves (left: false-positive rate vs sensitivity) and recall-precision curves (right) for logistic regression, Cox proportional hazards regression, random forest, and random survival forest approaches. Evaluation of survival approaches is provided at the median time point, day 2.
sion vs machine learning methods) has recently been published,21 and the findings differed slightly from ours in that the random forest approach outperformed logistic regression with respect to AUROC (0.801 vs 0.770). The investigators also found that respiratory rate, heart rate, and age were the 3 most important predictor variables, whereas we found several laboratory values to be the most important clinical variables in our models. Of note, they used a composite outcome of non–intensive care unit CPA, unexpected intensive care unit transfer, and death rather than a single end point of CPA.

Conversely, the statistical performance of all modeling approaches was more dissimilar for recall and precision, with F1 scores of 0.170 to 0.325. The 2 regression models (Cox proportional hazards for time-to-event outcomes and logistic for classification outcomes) performed similarly, with F1 scores of 0.284 and 0.273, respectively. In contrast, the time-to-event machine learning approach (random survival forest) performed worse than the classification machine learning approach (random forest), with F1 scores of 0.170 and 0.325, respectively. Unfortunately, we were not able to compare our F1 scores with those of other studies because these metrics are not frequently reported in CPA prediction literature. With rare events, comparing precision (ie, positive predictive value) is preferable to specificity because of precision's sensitivity to event rate, which can provide insight into the clinical burden of false alarms.22

Evaluation of the 4 prediction models emphasized potential impact on false alarms, in addition to accuracy.

The potential clinical influence of the models with respect to number of alarms varied as well. At all thresholds, machine learning approaches produced more clinical alarms than regression approaches (Figure 3). This finding was accompanied by the benefit of increased sensitivity, but too many alarms could contribute to clinicians' alert fatigue. Increased thresholds decrease the positive prediction rate and recall (sensitivity) while increasing precision (positive predictive value). In our study, increases in precision occurred at increasingly higher thresholds but eventually returned to zero in 3 of the 4 approaches (Figure 2). The random forest model did not exhibit the same behavior, and in fact, precision reached 1 at the most extreme threshold before returning to values similar to those generated by other approaches. For clinical environments where precision is valued more than recall (ie, where certainty in a positive prediction is more important than a false-negative result), the random forest approach could be more appropriate.

Figure 3 Comparison of positive prediction rate and sensitivity among all models (Cox, logistic, random forest, random survival forest) at thresholds comprising the event rate in this data set (0.006) and several of its multiples (0.012, 0.018, 0.06, 0.12).

In terms of clinical interpretability, we used this study to generate the hypothesis that prediction trends of time-to-event models might be more likely to influence clinicians' decisions. Time-to-event models produce trajectory curves that align more closely with the underlying deterioration phenomenon than does a single probability that is expressed as a straight line on a graph (Figure 4). The display of graphical probability trends offers a potential solution to alarm fatigue that might result from simple numerical cutoffs. Although there does not appear to be a single superior approach
Figure 4 Comparison of estimated probability of a cardiopulmonary arrest (CPA) event from 2 fictitious patients, plotted over hospital days 0 to 14. Top: average patient defined as all model variables' values set at the median value. Bottom: ill patient characterized by several abnormal values (ie, creatinine = 2 mg/dL [177 μmol/L], glucose = 300 mg/dL [16.6 mmol/L], potassium = 5 mEq/L [5 mmol/L], sodium = 150 mEq/L [150 mmol/L], hemoglobin = 7 g/dL [70 g/L], red cell distribution width = 20%, respiratory rate = 24/min, pulse = 115/min, and age = 80 years). The y-axis scales are different in the 2 graphs (probability of CPA, 0 to 0.025 in the top graph and 0 to 0.6 in the bottom graph).
learning methods have several advantages (ie, fewer assumptions and increased variability in prediction trends) over the traditional statistical regression models and the time-to-event models allow prediction trends, the random survival forest model might provide the best option for further model development work for in-hospital CPA. Future research to determine what is most likely to influence clinicians' decisions would be helpful.

Strengths and Limitations
We leveraged robust prediction model methods, including flexible regression models and newer machine learning methods. Random forest models have the benefit of fewer predictor variable assumptions than traditional modeling strategies (eg, linearity, interaction effects) and minimal overfitting compared with simple classification and regression trees. A benefit of using survival models is the
Supplemental Table 1 Candidate variables and inclusion in the modelsa

Variable | Included? | Reason for exclusion
Age | Yes |
Sex | Yes |
Race | No | Small sample in some categories resulted in a singular matrix during model fits.
Ethnicity | No | Small sample in some categories resulted in a singular matrix during model fits.
Body mass index | Yes |
Heart rate | Yes |
Respiratory rate | Yes |
Blood pressure | No | Data source listed all timestamps at 00:00, so we were unable to determine first value.
Sodium | Yes |
Potassium | Yes |
Chloride | No | Could be predicted by other variables in a regression model with R² > 0.9.
Glucose | Yes |
Blood urea nitrogen | No | Collinear with creatinine (Spearman r ≈ 0.4).
Creatinine | Yes |
Anion gap | Yes |
Calcium | Yes |
Carbon dioxide | Yes |
White blood cell count | Yes |
Red blood cell count | No | Collinear with hemoglobin (Spearman r ≈ 0.8).
Hemoglobin | Yes |
Platelet count | Yes |
Red cell distribution width | Yes |
Blood gas panelb | No | Missing in > 80% of patients.
Braden score | No | Missing in > 80% of patients.
ICD-9 codes | Most | The obstetrical procedure category was removed because it resulted in a singular matrix during model fits.
CPT codes | No | Only used for outcome variables.

Abbreviations: CPT, Current Procedural Terminology; ICD-9, International Classification of Diseases, Ninth Revision.
a Temperature and pulse oximetry (variables frequently collected for hospitalized patients) were not available in the data set used for this study. All laboratory values were obtained from serum collections. Raw ICD-9 codes were collapsed into 19 diagnostic categories and 16 procedural categories.
b Blood gas panel comprised pH, Pco2, base excess, Po2, lactic acid, and methemoglobin.
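The collinearity screening noted in the table (eg, Spearman correlations between laboratory values) can be sketched in a few lines. The study's analyses used R packages, so this Python version, with hypothetical patient values and no correction for rank ties, is only an illustration of the check, not the study's code:

```python
def spearman(x, y):
    """Spearman rank correlation: the Pearson correlation of the ranks
    (no tie correction). Values near +/-1 flag collinear predictors,
    eg, hemoglobin vs red blood cell count in the table above."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda i: v[i])
        r = [0.0] * len(v)
        for rank, i in enumerate(order, start=1):
            r[i] = float(rank)
        return r

    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Hypothetical paired laboratory values from 6 patients
hemoglobin = [7.2, 13.5, 11.1, 9.8, 14.2, 12.0]
red_cells = [2.5, 4.8, 4.0, 3.4, 5.0, 4.3]
```

A correlation near 0.8, as reported in the table for hemoglobin and red blood cell count, would justify dropping one of the pair.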
Supplemental Table 2
Data analysis software: R packages
[Supplemental figure: three histogram panels plotting frequency against days (0 to 40); caption not recovered.]
Supplemental Figure 2 Data sets used for model training, development, and validation. [Diagram: the development data set was split into training (50%), testing (25%), and validation (25%) portions; multiply imputed sets m (multiple imputation with bootstrap) and a MEDIAN-imputed set are shown.] *Validation portion of MEDIAN imputed data set used for direct comparison of all approaches.
[Supplemental figure: clinical variables ranked by importance (rank scale 0 to 50, larger is more important); red cell distribution width (RDW) and creatinine ranked among the highest.]
Abbreviations: BMI, body mass index; Dx, diagnostic code; ICD, International Classification of Diseases; Proc, procedural code; RDW, red cell distribution width; WBC, white blood cells.
In all modeling approaches, the predicted cardiopulmonary arrest event $E$ is said to occur if the probability estimate $\hat{Y}$ meets or exceeds the threshold $c$, set at the event rate (0.006) and several of its multiples:

$$E = \begin{cases} 1, & \text{if } \hat{Y} \ge c, \quad c \in \{0.006, 0.012, 0.018, 0.06, 0.12\} \\ 0, & \text{otherwise} \end{cases}$$
This formulation creates a binary classification for direct comparison of predicted events $E$ with actual events $A$ in a sample of $n$ patients with the following metrics:

$$\text{Sensitivity (recall, true-positive rate)} = \frac{\sum (E = 1 \mid A = 1)}{\sum A} \quad (1)$$

$$\text{Positive prediction rate} = \frac{\sum E}{n} \quad (2)$$

$$\text{Positive predictive value (precision)} = \frac{\sum (E = 1 \mid A = 1)}{\sum E} \quad (3)$$

$$\text{False-positive rate} = \frac{\sum (E = 1 \mid A = 0)}{n - \sum A} \quad (4)$$

$$F_1 \text{ score} = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} \quad (5)$$
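As a concrete illustration of the thresholding of the probability estimates and of equations (1) through (5), the metrics can be computed from small arrays. The study's analyses were performed with R packages; this Python sketch with hypothetical data is only a stand-in:

```python
def classification_metrics(y_hat, a, c):
    """Apply threshold c to probability estimates y_hat to form predicted
    events E, then compute the metrics in equations (1)-(5) against the
    actual events a (both lists of 0/1 per patient)."""
    n = len(y_hat)
    E = [1 if y >= c else 0 for y in y_hat]
    tp = sum(e for e, t in zip(E, a) if t == 1)   # E = 1 and A = 1
    fp = sum(e for e, t in zip(E, a) if t == 0)   # E = 1 and A = 0
    sensitivity = tp / sum(a)                      # eq (1)
    positive_prediction_rate = sum(E) / n          # eq (2)
    precision = tp / sum(E) if sum(E) else 0.0     # eq (3)
    false_positive_rate = fp / (n - sum(a))        # eq (4)
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)     # eq (5)
    return (sensitivity, positive_prediction_rate, precision,
            false_positive_rate, f1)

# Hypothetical probability estimates and actual CPA events for 8 patients
y_hat = [0.9, 0.1, 0.7, 0.8, 0.2, 0.1, 0.05, 0.3]
a = [1, 0, 0, 1, 0, 0, 0, 0]
```

Sweeping `c` over the thresholds above also illustrates the trade-off discussed earlier: a higher threshold flags fewer patients (lower positive prediction rate and recall) while generally raising precision.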
The area under the receiver operating characteristic curve metric AUROC was calculated with a trapezoidal approximation using a plot comparing the false-positive rate FPR to the true-positive rate TPR at each unique predicted probability $i$ in $\{\hat{Y}\}$:

$$\mathrm{AUROC} = \sum_{i \in \{2, 3, \ldots, |\hat{Y}|\}} \frac{(\mathrm{FPR}_i - \mathrm{FPR}_{i-1})(\mathrm{TPR}_i + \mathrm{TPR}_{i-1})}{2} \quad (6)$$

2 Logistic Regression

Probability estimates for logistic regression models given a vector of coefficients $\beta$ and new data $X$ are calculated by:

$$\hat{Y} = \frac{1}{1 + \exp(-X\beta)} \quad (7)$$
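The trapezoidal AUROC of equation (6) and the logistic probability of equation (7) can likewise be sketched. This is a hedged Python illustration with hypothetical inputs (the study itself used R):

```python
import math

def auroc(y_hat, a):
    """Trapezoidal AUROC per equation (6): threshold at each unique predicted
    probability (descending), collect (FPR, TPR) points along the ROC curve,
    and sum the trapezoid areas between consecutive points."""
    pos = sum(a)
    neg = len(a) - pos
    points = [(0.0, 0.0)]
    for c in sorted(set(y_hat), reverse=True):
        E = [1 if y >= c else 0 for y in y_hat]
        tpr = sum(e for e, t in zip(E, a) if t == 1) / pos
        fpr = sum(e for e, t in zip(E, a) if t == 0) / neg
        points.append((fpr, tpr))
    points.append((1.0, 1.0))
    return sum((f2 - f1) * (t2 + t1) / 2
               for (f1, t1), (f2, t2) in zip(points, points[1:]))

def logistic_probability(x, beta):
    """Equation (7): probability = 1 / (1 + exp(-X*beta)) for a single
    patient's predictor vector x and coefficient vector beta."""
    return 1.0 / (1.0 + math.exp(-sum(xi * bi for xi, bi in zip(x, beta))))
```

A model that ranks every event above every non-event yields an AUROC of 1; chance-level ranking yields about 0.5.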
3 Cox Proportional Hazards Regression

In this study, t = 2 was used for comparisons because that was the median time to both the event and censoring.
4 Random Forests

For each of the $R$ trees $T_r$ and new data $X$, the event probability $\hat{Y}$ becomes:

$$\hat{Y} = \frac{1}{R} \sum_{r=1}^{R} T_r(x) \quad (9)$$

For the random survival forest, the event probability at time $t$ is obtained from the estimated survival probability $\hat{S}_t$:

$$\hat{Y}_t = 1 - \hat{S}_t = 1 - \frac{1}{R} \sum_{r=1}^{R} T_{r,t}(x) \quad (10)$$

Once again, t = 2 was used because it was the median time.
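A minimal sketch of the tree averaging in equations (9) and (10), using trivial stand-in "trees" as fixed decision rules on a single hypothetical risk feature (the study used R random forest packages, and real trees would be learned from data):

```python
def forest_probability(trees, x):
    """Equation (9): average the R trees' predictions T_r(x) (here, 0/1
    votes) to obtain the event probability."""
    return sum(tree(x) for tree in trees) / len(trees)

def survival_forest_probability(trees, x):
    """Equation (10): the event probability at time t is 1 minus the average
    of the trees' survival estimates T_{r,t}(x) at that time."""
    return 1.0 - sum(tree(x) for tree in trees) / len(trees)

# Hypothetical stand-in trees: fixed threshold rules on one risk feature
classification_trees = [lambda x: 1 if x > 0.5 else 0,
                        lambda x: 1 if x > 0.3 else 0,
                        lambda x: 0]
# Hypothetical survival estimates at t = 2 from two trees
survival_trees_day2 = [lambda x: 0.9, lambda x: 0.7]
```

Averaging many trees built on random subsets of predictors is what gives the forest its resistance to overfitting relative to a single classification or regression tree.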
The American Journal of Critical Care is an official peer-reviewed journal of the American Association of Critical-Care Nurses
(AACN) published bimonthly by AACN, 101 Columbia, Aliso Viejo, CA 92656. Telephone: (800) 899-1712, (949) 362-2050, ext.
532. Fax: (949) 362-2049. Copyright ©2016 by AACN. All rights reserved.