Machine Learning in CKD & CVD Analysis
ABSTRACT Chronic kidney disease (CKD) describes a long-term decline in kidney function and has many causes. It affects hundreds of millions of people worldwide every year. It can have a strong negative impact on patients, especially when combined with cardiovascular disease (CVD): patients with both conditions have lower survival chances. In this context, computational intelligence applied to electronic health records can provide insights to physicians that can help them make better decisions about prognoses or therapies. In this study we applied machine learning to medical records of patients with CKD and CVD. First, we predicted whether patients would develop severe CKD, both including and excluding information about the year it occurred or the date of the last visit. Our methods achieved a top mean Matthews correlation coefficient (MCC) of +0.499 in the former case and a mean MCC of +0.469 in the latter case. Then, we performed a feature ranking analysis to understand which clinical factors are most important: age, eGFR, and creatinine when the temporal component is absent; hypertension, smoking, and diabetes when the year is present. We then compared our results with the current scientific literature, and discussed the different results obtained when the time feature is excluded or included. Our results show that our computational intelligence approach can provide insights about diagnosis and about the relative importance of different clinical variables that would otherwise be impossible to observe.
INDEX TERMS machine learning; computational intelligence; feature ranking; electronic health records;
chronic kidney disease; CKD; cardiovascular diseases; CVD.
I. INTRODUCTION

Chronic kidney disease (CKD) kills around 1.2 million people and affects more than 700 million people worldwide every year [1]. Its main causes include diabetes and high blood pressure, and it is more likely to develop in subjects with a family history of CKD. Individuals with chronic kidney disease are at higher risk of cardiovascular disease (CVD) events, such as stroke and heart failure [2], and patients with both diseases are more likely to have worse prognoses [3].

In this context, computational intelligence methods applied to the electronic medical records of patients can provide interesting and useful information to doctors and physicians, helping them to more precisely predict the trend of the condition and, consequently, to make decisions on the therapies.

Several studies applying machine learning to clinical records of patients with CKD have appeared in the biomedical literature in the recent past [4]–[26]. Among these studies, a large number involves applications of machine learning methods to the Chronic Kidney Disease dataset of the University of California Irvine Machine Learning Repository [27].

On this dataset, Shawan [16] and Abrar [18] employed several data mining methods for patient classification in their PhD theses. Wibawa et al. [8] applied correlation-based feature selection methods and AdaBoost to this dataset, while Al Imran et al. [13] employed deep learning techniques to the same end.
Rashed-al-Mahfuz and colleagues [24] also employed a number of machine learning methods for patient classification and described the dataset precisely. Syed Imran Ali and coauthors [21] applied several machine learning methods to the same dataset to determine a global threshold to discriminate between useful clinical factors and irrelevant ones. Salekin and Stankovic [6] used Lasso for feature selection, while Belina and colleagues [15] applied a hybrid wrapper- and filter-based feature selection for the same purpose. Tazin et al. [5] employed several data mining methods for patient classification. Ogunleye and Wang [11] used an enhanced XGBoost method for patient classification. Satukumati and coauthors [17] used several techniques for feature extraction. Elhoseny and colleagues [19] developed a method called Density-based Feature Selection (DFS) with the Ant Colony based Optimization (D-ACO) algorithm for the classification of patients with CKD. Polat et al. [7] showed an application of a Support Vector Machine variant for patient classification on the same dataset. Chittora and colleagues [22] applied numerous machine learning classifiers and their variants for patient classification.

In the original study presenting the dataset that we analyze here, the authors employed multivariable Cox proportional hazards models to identify the independent risk factors causing CKD at stages 3-5 [28]. Although this analysis was interesting, it did not involve a data mining step, which instead could retrieve additional information or unseen patterns in these data.

To fill this gap, we perform here two analyses: first, we apply machine learning methods to the binary classification of serious CKD development, and then we rank the clinical features by importance. Moreover, differently from what Al-Shamsi and colleagues [28] did, we also performed the same analysis excluding the year when the disease happened to each patient (Figure 1).

As major results, we show that computational intelligence is capable of predicting a serious CKD development with or without the time information, and that the most important clinical features change depending on whether the temporal component is considered or not.

We organize the rest of the paper as follows. After this Introduction, we describe the dataset we analyzed (section II) and the methods we employed (section III). We then report the binary classification and feature ranking results (section IV) and discuss them afterwards (section V). Finally, we recap the main points of this study and mention limitations and future developments (section VI).

II. DATASET

In this study, we examine a dataset of electronic medical records of 491 patients collected at the Tawam Hospital in Al-Ain city (Abu Dhabi, United Arab Emirates), between 1st January and 31st December 2008 [28].
The patients included 241 women and 250 men, with an average age of 53.2 years (Table 2 and Table 3). Each patient has a chart of 13 clinical variables, expressing her/his values of laboratory tests and exams or data about her/his medical history (Table 1). Each patient included in the dataset was treated at Tawam Hospital [28].
Several features regard the personal history of the patient: diabetes history, dyslipidemia history, hypertension history, obesity history, smoking history, and vascular disease history (Table 2); they state if the patient's biography includes those specific diseases or conditions. Dyslipidemia indicates an excessive presence of lipids in the blood. Two variables refer to the blood pressure (diastolic blood pressure and systolic blood pressure), and other variables refer to blood levels obtained through laboratory tests (cholesterol, creatinine). A few features state if the patients have taken medicines for specific diseases (dyslipidemia medications, diabetes medications, and hypertension medications) or inhibitors (angiotensin-converting-enzyme inhibitors, or angiotensin II receptor blockers), which are known to be effective against cardiovascular diseases [29] and hypertension [30]. The remaining factors describe the physical conditions of each patient: age, body-mass index, and biological sex (Table 2 and Table 3).
Among the clinical features available for this dataset, the EventCKD35 binary variable states if the patient had chronic kidney disease at a high stage (3rd, 4th, or 5th stage). According to the Kidney Disease Improving Global Outcomes (KDIGO) organization [31], CKD can be grouped into 5 stages:
• Stage 1: normal kidney function, no CKD;
• Stage 2: mildly decreased function of kidney, mild CKD;
• Stage 3: moderate decrease of kidney function, moderate CKD;
• Stage 4: severe decrease of kidney function, severe CKD;
• Stage 5: extreme CKD and kidney failure.
When the EventCKD35 variable has value 0, the patient's kidney condition is at stage 1 or 2. Instead, when EventCKD35 equals 1, the patient's kidney condition is at stage 3, 4, or 5.
Even if the value of eGFR has a role in the definition of the CKD stages in the KDIGO guidelines [31], we found a weak correlation between the eGFRBaseline variable and the target variable EventCKD35 in this dataset. The two variables have a Pearson correlation coefficient equal to −0.36 and a Kendall distance of −0.3, both in the [−1, +1] interval, where −1 indicates perfectly opposite correlation, 0 indicates no correlation, and +1 indicates perfect correlation.
The time year derived factor indicates in which year the patient had a serious chronic kidney disease, or the year when he/she had his/her last outpatient visit, whichever occurred first (Supplementary information), in the follow-up period. All the dataset features refer to the first visits of the patients in January 2008, except the EventCKD35 and the time year variables, which refer to the end of the follow-up period, in June 2017. More information about this dataset can be found in the original article [28].

III. METHODS

A. BINARY CLASSIFICATION

The problem described earlier (section I) can be addressed as a conventional binary classification problem, where the goal is to predict EventCKD35, using the data described earlier (section II). This target feature indicates if the patient has chronic kidney disease at stage 3 to 5, which represents an advanced stage.

In binary classification, the problem is to identify the unknown relation R between the input space X (in our case: the features described in section II) and an output space Y ⊆ {0, 1} (in our case: the EventCKD35 target) [32]. Once a relation is established, one can find a way to discover what the most influential factors in the input space are for predicting the associated element in the output space, namely to determine the feature importance [33].

Note that X can be composed of categorical features (the values of the features belong to a finite unsorted set) and numerical features (the values of the features belong to a possibly infinite sorted set). In the case of categorical features, one-hot encoding [34] can map them into a series of numerical features. The resulting feature space is X ⊆ R^d.

A set of data D_n = {(x_1, y_1), ..., (x_n, y_n)}, with x_i ∈ X and y_i ∈ Y, is available in a binary classification framework. Moreover, some values of x_i might be missing [35]. In this case, if the missing value is categorical, we introduce an additional category for missing values for the specific feature. Instead, if the missing value is associated with a numerical feature, we replace the missing value with the mean value of the specific feature, and we introduce an additional logical feature to indicate if the value of the feature is missing for a specific sample.
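As an illustration of this preprocessing, the following R sketch (with a toy data frame and made-up feature names, not taken from our dataset) shows one possible way to one-hot encode a categorical feature and to mean-impute a numerical feature while adding the missing-value indicator:

    # Toy records with missing values (illustrative feature names).
    d <- data.frame(
      smoking = c("yes", "no", NA, "yes"),
      cholesterol = c(5.2, NA, 6.1, 4.8)
    )

    # Categorical feature: add an explicit category for the missing values,
    # then one-hot encode it with model.matrix().
    d$smoking[is.na(d$smoking)] <- "missing"
    d$smoking <- factor(d$smoking)
    one_hot <- model.matrix(~ smoking - 1, data = d)

    # Numerical feature: flag the missing entries, then impute the mean.
    d$cholesterol_missing <- as.integer(is.na(d$cholesterol))
    d$cholesterol[is.na(d$cholesterol)] <- mean(d$cholesterol, na.rm = TRUE)

    # Resulting numerical feature space (X is a subset of R^d).
    X <- cbind(one_hot, d[, c("cholesterol", "cholesterol_missing")])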
Our goal is to identify a model M : X → Y which best approximates R, through an algorithm A_H characterized by its set of hyper-parameters H. The accuracy of the model M in representing the unknown relation R is measured using different performance metrics. Since the hyper-parameters H influence the ability of A_H to estimate R, we need to adopt a proper Model Selection (MS) procedure [36]. In this work, we exploited the Complete Cross Validation (CCV) procedure [36]. CCV relies on a simple idea: we resample the original dataset D_n many (n_r = 500) times without replacement, to build a training set L^r_l of size l, while the remaining samples are kept in the validation set V^r_v, with r ∈ {1, ..., n_r}. To perform the MS phase, that is, to select the best combination H* of the hyper-parameters in the set of possible ones H = {H_1, H_2, ...} for the algorithm A_H, we select the hyper-parameters which minimize the average error of the model trained on the training set and evaluated on the validation set. Since the data in L^r_l are independent from the ones in V^r_v, the idea is that H* should be the set of hyper-parameters which achieves a small error on a data set that is independent from the training set.
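The CCV scheme can be sketched in a few lines of R; the snippet below uses synthetic data and the polynomial degree of a logistic regression as a stand-in hyper-parameter, only to illustrate the resampling logic (the grids and classifiers actually used are listed later in this section):

    set.seed(42)
    n <- 200
    x <- runif(n, -2, 2)
    y <- as.integer(x^2 + rnorm(n, sd = 0.5) > 1)   # synthetic labels
    d <- data.frame(x = x, y = y)

    n_r <- 500        # number of resamplings, as in the text
    l <- 150          # size of each training set L^r_l
    degrees <- 1:4    # toy hyper-parameter grid H

    val_error <- matrix(NA, nrow = n_r, ncol = length(degrees))
    for (r in 1:n_r) {
      idx <- sample(n, l)   # L^r_l, drawn without replacement
      for (j in seq_along(degrees)) {
        m <- glm(y ~ poly(x, degrees[j]), data = d[idx, ], family = binomial)
        p <- predict(m, newdata = d[-idx, ], type = "response")
        val_error[r, j] <- mean((p > 0.5) != d$y[-idx])   # error on V^r_v
      }
    }
    best_degree <- degrees[which.min(colMeans(val_error))]   # H*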
TABLE 1: Meaning, measurement unit, and possible values of each feature of the dataset. ACEI: Angiotensin-converting
enzyme inhibitors. ARB: Angiotensin II receptor blockers. mmHg: millimetre of mercury. kg: kilogram. mmol: millimoles.
Finally, we need to estimate the error (EE) of the optimal model with a separate set of data T_m = {(x^t_1, y^t_1), ..., (x^t_m, y^t_m)}, since the error that our model commits over D_n would be optimistically biased, because D_n has been used to find M.
Additionally, another aspect to consider in this analysis is the class imbalance of the dataset [37]–[39]: most classifiers do not work well with imbalanced datasets and tend to perform poorly on the minority class. For these reasons, several techniques have been developed to address this issue [40].
Currently, the most practical and effective approach involves resampling the data in order to synthesize a balanced dataset [40]. For this purpose, we can under-sample or over-sample the dataset. Under-sampling balances the dataset by reducing the size of the abundant class: by keeping all the samples of the rare class and randomly selecting an equal number of samples of the abundant class, a new balanced dataset can be retrieved for further modeling. Note that this method wastes a lot of information, since many samples might be discarded. For this reason, scientists take advantage of the over-sampling strategy more often. Over-sampling tries to balance the dataset by increasing the number of rare samples: rather than removing abundant samples, new rare samples are generated (for example by repetition, by bootstrapping, or by synthetic minority oversampling). The latter method is the one that we employed in this study: synthetic minority oversampling [41], [42].
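To make the idea concrete, here is a didactic re-implementation of the core SMOTE interpolation step in R (a simplified sketch, not the exact routine used in our experiments): each synthetic sample lies on the segment between a minority sample and one of its k nearest minority neighbours.

    # X_min: numeric matrix of minority-class samples (one row per sample).
    smote_sketch <- function(X_min, n_new, k = 5) {
      synthetic <- matrix(NA, nrow = n_new, ncol = ncol(X_min))
      for (s in 1:n_new) {
        i <- sample(nrow(X_min), 1)                      # random minority sample
        dist <- sqrt(colSums((t(X_min) - X_min[i, ])^2)) # distances to the others
        dist[i] <- Inf                                   # exclude the sample itself
        nn <- order(dist)[1:min(k, nrow(X_min) - 1)]     # its k nearest neighbours
        j <- nn[sample.int(length(nn), 1)]
        gap <- runif(1)                                  # interpolation coefficient
        synthetic[s, ] <- X_min[i, ] + gap * (X_min[j, ] - X_min[i, ])
      }
      synthetic
    }

    set.seed(1)
    X_min <- matrix(rnorm(20), nrow = 10)            # toy minority class
    new_samples <- smote_sketch(X_min, n_new = 20)   # oversample it to 30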
Another important property of M is its interpretability, namely the possibility to understand how it behaves. There are two options to investigate this property. The first one is to learn an M whose functional form is, by construction, interpretable [43] (for example, Decision Trees and rule-based models); this solution, however, usually results in poor generalization performance. The second one, used when the functional form of M is not interpretable by construction [43] (for example, kernel methods or neural networks), is to derive its interpretability a posteriori. A classical method for reaching this goal is to perform a feature ranking procedure [33], [44], which gives a hint to the users of M about the most important features that influence its results.
In this paper, for the algorithm A_H, we exploit different state-of-the-art models. In particular, we exploit Random Forests [45], Support Vector Machines (linear and kernelized with the Gaussian kernel) [46], [47], Neural Networks [48], Decision Trees [49], XGBoost [50], and One Rule [51].

We tried a number of different hyper-parameter configurations for the machine learning methods employed in this study. For Random Forests, we set the number of trees to 1000, and we searched the number of variables randomly sampled as candidates at each split in {1, 2, 4, 8, 16}, the minimum size of samples in the terminal nodes of the trees in {1, 2, 4, 8}, and the percentage of samples (sampled with bootstrap) during the creation of each tree in {60, 80, 100, 120} [52]–[54]. For the linear and kernelized Support Vector Machines [46], we searched the regularization hyper-parameters in {10^−6.0, 10^−5.8, ..., 10^4}; for the kernelized Support Vector Machines, we used the Gaussian kernel [47] and we searched the kernel hyper-parameters in {10^−6.0, 10^−5.8, ..., 10^4}. For the Neural Network, we used a single-hidden-layer network (hyperbolic tangent as activation function in the hidden layer) with dropout (mlpKerasDropout in the caret [55] R package); we trained it with adaptive subgradient methods (batch size equal to 32), and we tuned the following hyper-parameters: the number of neurons in the hidden layer in {10, 20, 40, 80, 160, 320, 640, 1280}, the dropout rate of the hidden layer in {0.001, 0.002, 0.004, 0.008}, the learning rate in {0.001, 0.002, 0.005, 0.01, 0.02, 0.05}, the fraction of gradient to keep at each step in {0.01, 0.05, 0.1, 0.5}, and the learning rate decay in {0.01, 0.05, 0.1, 0.5}. For the Decision Tree, we searched the maximum depth of the trees in {4, 8, 16, 24, 32} (rpart2 in the caret [55] R package). For XGBoost, we set tree gradient boosting, and we searched the booster parameters in {0.001, 0.002, 0.004, 0.008, 0.01, 0.02, 0.04, 0.08}, the minimum loss reduction needed to make a split in {0, 0.001, 0.005, 0.01}, the fraction of samples in {1, 0.9, 0.7} and of features in {1, 0.5, 0.2, 0.1} used to train the trees, the maximum number of leaves in {1, 2, 4, 8, 16}, and the regularization hyper-parameters in {10^−6.0, 10^−5.8, ..., 10^4} [50]. For One Rule, we did not have to tune any hyper-parameters (OneR in the caret [55] R package).

Note that these methods have been shown to be among the simplest yet best performing methods available in the scientific literature [56], [57]. The difference between the methods is just the functional form of the model, which tries to better approximate a learning principle. For example, Random Forests and XGBoost try to implement the wisdom-of-the-crowd principle, Support Vector Machines are robust maximum-margin classifiers, and Decision Tree and One Rule represent very easy to interpret models. In this paper we tested multiple algorithms, since the no-free-lunch theorem [58] assures us that, for a specific application, it is not possible to know a priori what algorithm will perform better on a specific task. We therefore tested the ones which, in the past, have been shown to perform well on many tasks, and identified the best one for our application.

feature              value  #    %
ACEIARB              0      272  55.397
ACEIARB              1      219  44.603
DLDmeds              0      220  44.807
DLDmeds              1      271  55.193
DMmeds               0      330  67.210
DMmeds               1      161  32.790
HistoryCHD           0      446  90.835
HistoryCHD           1      45   9.165
HistoryDiabetes      0      276  56.212
HistoryDiabetes      1      215  43.788
HistoryDLD           0      174  35.438
HistoryDLD           1      317  64.562
HistoryHTN           0      156  31.772
HistoryHTN           1      335  68.228
HistoryObesity       0      243  49.491
HistoryObesity       1      248  50.509
HistorySmoking       0      416  84.725
HistorySmoking       1      75   15.275
HistoryVascular      0      462  94.094
HistoryVascular      1      29   5.906
HTNmeds              0      188  38.289
HTNmeds              1      303  61.711
Sex                  0      241  49.084
Sex                  1      250  50.916
[target] EventCKD35  0      435  88.595
[target] EventCKD35  1      56   11.405

TABLE 2: Binary feature value counts. All the binary features have meaning true for the value 1 and false for the value 0, except sex (0 = female and 1 = male). The dataset contains the medical records of 491 patients.

feature       median  mean     range        σ
dBPBaseline   77      76.872   [41, 112]    10.711
eGFRBaseline  98.1    98.116   [60, 242.6]  18.503
sBPBaseline   131     131.375  [92, 180]    15.693
time year     8       7.371    [0, 10]      2.175

TABLE 3: Numeric feature quantitative characteristics. σ: standard deviation.

B. FEATURE RANKING

Feature ranking methods based on Random Forests are among the most effective techniques [59], [60], particularly in the context of bioinformatics [61], [62] and health informatics [63]. Since Random Forests obtained the top prediction scores for binary classification, we focus on this method for feature ranking as well. In particular, we employed the Permutation Importance, or Mean Decrease in Accuracy (MDA), where the importance is assessed for each feature by removing the association between that feature and the target. This effect is achieved by randomly permuting [64] the values of the feature and measuring the resulting increase in error. The influence of the correlated features is also removed.
In detail, for every tree, the method computes two quantities: the first one is the error on the out-of-bag samples as they are used during prediction, while the second one is the error on the out-of-bag samples after a random permutation of the values of a variable. These two values are then subtracted, and the average of the result over all the trees in the ensemble is the raw importance score for the variable under exam.
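The same permutation mechanism can be sketched model-agnostically in base R (here on a held-out set rather than on the out-of-bag samples, for brevity, and with a logistic regression on synthetic data instead of a Random Forest):

    set.seed(7)
    n <- 300
    X <- data.frame(a = rnorm(n), b = rnorm(n), noise = rnorm(n))
    y <- as.integer(X$a + 0.5 * X$b + rnorm(n, sd = 0.3) > 0)
    train <- 1:200; test <- 201:300

    model <- glm(y ~ ., data = cbind(X, y = y)[train, ], family = binomial)
    pred_err <- function(newX) {
      mean((predict(model, newX, type = "response") > 0.5) != y[test])
    }
    base_err <- pred_err(X[test, ])

    # Shuffle one feature at a time and measure the increase in error.
    importance <- sapply(names(X), function(f) {
      Xp <- X[test, ]
      Xp[[f]] <- sample(Xp[[f]])   # break the feature-target association
      pred_err(Xp) - base_err
    })
    sort(importance, decreasing = TRUE)   # informative features rank first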
Despite the effectiveness of MDA, when the number of samples is small these methods might be unstable [65]–[67]. For this reason, in this work, instead of running the Feature Ranking (FR) procedure just once, analogously to what we have done for MS and EE, we sub-sample the original dataset and we repeat the procedure many times. The final rank of a feature is then the aggregation of the different rankings through Borda's method [68].
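The aggregation step amounts to sorting the features by their average position across the repeated rankings, as in this small sketch (toy positions; 1 = most important):

    # Each row: the position assigned to each feature by one repetition.
    positions <- rbind(
      c(AgeBaseline = 1, eGFRBaseline = 2, DMmeds = 3),
      c(AgeBaseline = 2, eGFRBaseline = 1, DMmeds = 3),
      c(AgeBaseline = 1, eGFRBaseline = 3, DMmeds = 2)
    )

    # Borda-style aggregation: order the features by mean position.
    final_ranking <- sort(colMeans(positions))
    final_ranking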
C. BIOSTATISTICS UNIVARIATE TESTS

Before employing machine learning algorithms, we applied traditional univariate biostatistics techniques to evaluate the relationship between the EventCKD35 target and each feature. We made use of the Mann–Whitney U test (also known as Wilcoxon rank–sum test) [69] for the numerical features, and of the chi–squared test [70] for the binary features. The p-values of both these tests range between 0 and 1: a low p-value means that the analyzed variable strongly relates to the target feature, while a high p-value means that there is no evident relation. These tests are therefore useful to assess the importance of each feature with respect to the target: the lower the p-value of a feature, the stronger its association with the target. Following the recent advice of Benjamin and colleagues [71], we use 0.005 (that is, 5 × 10^−3) as the threshold of significance for the p-values. If the p-value of a test applied to a variable and the target is lower than 0.005, we consider the association between the variable and the target significant.
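In R, these two tests correspond to the built-in wilcox.test() and chisq.test() functions; a sketch on toy data:

    set.seed(3)
    target <- rbinom(100, 1, 0.3)                       # toy binary target
    age <- rnorm(100, mean = 50 + 10 * target, sd = 8)  # related numerical feature
    smoker <- rbinom(100, 1, 0.2)                       # unrelated binary feature

    p_age <- wilcox.test(age ~ factor(target))$p.value  # Mann-Whitney U test
    p_smoker <- chisq.test(table(smoker, target))$p.value

    # Associations are considered significant below the 0.005 threshold.
    c(age = p_age, smoker = p_smoker) < 0.005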
D. PREDICTION AND FEATURE RANKING INCLUDING TEMPORAL FEATURE

In the second analysis we performed for chronic kidney disease prediction, we decided to include the temporal component expressing in which year the disease occurred for the CKD patients, or in which year they had their last outpatient visit (Supplementary information). We applied a Stratified Logistic Regression [72], [73] to this complete dataset, including all the original clinical features and the derived year feature, both for supervised binary classification and for feature ranking. We measured the prediction with the typical confusion matrix rates (MCC, F1 score, and others), and the importance of each variable as the corresponding coefficient of the logistic regression model. This method has no significant hyper-parameters, so we did not perform any optimization (glm function of the stats R package).
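A minimal sketch of this step with glm() on toy data (the actual analysis repeated the fit over stratified resamplings of the real records):

    set.seed(9)
    d <- data.frame(
      age = rnorm(100, 55, 10),
      year = sample(0:10, 100, replace = TRUE),
      event = rbinom(100, 1, 0.2)
    )

    model <- glm(event ~ age + year, data = d, family = binomial)

    # Variable importance read from the absolute model coefficients.
    sort(abs(coef(model)[-1]), decreasing = TRUE)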
IV. RESULTS

A. BINARY CLASSIFICATION RESULTS
CKD prediction. We report the results obtained for the static prediction of CKD, measured with traditional confusion matrix indicators, in Table 4. We rank our results by the Matthews correlation coefficient (MCC), because it is the only confusion matrix rate that generates a high score only if the classifier was able to correctly predict most of the data instances and to correctly make most of the predictions, both on the positive class and on the negative class [74]–[77].
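For reference, the MCC can be computed directly from the four confusion matrix entries, as in this short R sketch:

    # Matthews correlation coefficient from confusion matrix entries.
    mcc <- function(tp, tn, fp, fn) {
      num <- tp * tn - fp * fn
      den <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
      if (den == 0) return(0)   # conventional value for degenerate cases
      num / den
    }

    mcc(tp = 19, tn = 100, fp = 10, fn = 5)   # toy imbalanced example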
Random Forests outperformed all the other methods for MCC, F1 score, accuracy, sensitivity, negative predictive value, precision-recall AUC, and receiver operating characteristic AUC (Table 4), while the Support Vector Machine with Gaussian kernel achieved the top specificity and precision.

Because of the imbalance of the dataset (section II), all the classifiers attained better results among the negative data instances (specificity and NPV) than among the positive elements (sensitivity and precision). This happens because each classifier can observe and learn to recognize more individuals without CKD during training, and is therefore more capable of recognizing them than of recognizing patients with CKD during testing.

XGBoost and One Rule obtained Matthews correlation coefficients close to 0, meaning that their performance was similar to random guessing. Random Forests, linear SVM, and Decision Tree were the only methods able to correctly classify most of the true positives (TP rate = 0.792, 0.6, and 0.588, respectively). No technique was capable of correctly making most of the positive predictions: all the PPVs are below 0.5 (Table 4). Regarding the negative instances, the SVM with Gaussian kernel obtained an almost perfect specificity (0.940), while Random Forests achieved the top negative predictive value (Table 4).

These results show that the machine learning classifiers Random Forests and SVM with Gaussian kernel can efficiently predict patients with CKD and patients without CKD from their electronic health records, with high prediction scores, in a few minutes.

Since Random Forests was the best performing classifier, we also included the calibration curve plot [78] of its predictions (Figure 2), for the sake of completeness. The curve follows the trend of the perfect x = y line, translated on the x axis, between approximately 5% and approximately 65%, indicating well calibrated predictions in this interval.

CKD prediction including temporal component. To show a scenario where no previous disease history of a patient is available, we did not include any temporal component providing information about the progress of the disease in the previous analysis. We then performed a stratified prediction including a time feature indicating the year when the patient developed serious CKD or had the last outpatient visit: the Stratified Logistic Regression achieved a mean MCC of +0.469, better than all the MCCs achieved by the classifiers applied to the static dataset version, except Random Forests (Table 5). Also in this case, specificity and NPV resulted much higher than sensitivity and precision, because of the imbalance of the dataset.

This result comes as no surprise: it makes complete sense that the inclusion of a temporal feature describing the trend of a disease could improve the prediction quality.

To better understand the prediction obtained by the Stratified Logistic Regression, we plotted a calibration curve [78] of its predictions (Figure 3).
As one can notice, the Stratified Logistic Regression returns well calibrated predictions, as its trend follows the x = y line, which represents perfect calibration, from approximately 5% to approximately 75% of the probabilities. This calibration curve confirms that the Stratified Logistic Regression made a good prediction.
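A calibration curve of this kind can be sketched in base R by binning the predicted probabilities and comparing, in each bin, the mean predicted probability with the observed fraction of positives:

    set.seed(5)
    p <- runif(500)          # toy predicted probabilities
    y <- rbinom(500, 1, p)   # outcomes, perfectly calibrated by construction

    bins <- cut(p, breaks = seq(0, 1, by = 0.1), include.lowest = TRUE)
    pred_mean <- tapply(p, bins, mean)   # mean predicted probability per bin
    obs_freq <- tapply(y, bins, mean)    # observed positive fraction per bin

    plot(pred_mean, obs_freq, xlim = c(0, 1), ylim = c(0, 1),
         xlab = "mean predicted probability", ylab = "observed frequency")
    abline(0, 1, lty = 2)    # the x = y line of perfect calibration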
B. FEATURE RANKING RESULTS

CKD predictive feature ranking. After verifying that computational intelligence is able to predict CKD developments among patients, we applied a feature ranking approach to detect the most predictive features in the clinical records. We employed two techniques: one based on traditional univariate biostatistics tests, and one based on machine learning.

Regarding the biostatistics phase, we applied the Mann–Whitney test and the chi-squared test to each variable in relationship with the CKD target (subsection III-C), and ranked the features by p-value (Table 6). The application of these biostatistics univariate tests, although useful, shows a large number of relevant variables: 13 variables out of 19 resulted significant, having a p-value smaller than 0.005 (Table 6). Since the biostatistics tests affirm that 68.42% of the clinical factors are important, this information does not help us to detect the relevance of the features with enough precision. For this reason, we decided to calculate the feature ranking with machine learning, by employing Random Forests, which is the method that achieved the top performance results in the binary classification phase. We therefore applied the Random Forests feature ranking, and ranked the results by mean accuracy decrease position (Table 7 and Figure 4).

The two rankings show some common aspects, both listing AgeBaseline and eGFRBaseline in top positions, but they also show some significant differences. The biostatistics standing, for example, lists dBPBaseline as an irrelevant predictive feature (Table 6), while Random Forests puts it in the 4th position out of 19 (Table 7). Also, the biostatistics tests stated that HistoryDiabetes is one of the most significant factors, with a p-value of 0.0005 (Table 6), while the machine learning approach put the same feature in the last position of its ranking. The two rankings contain other minor differences that we consider unimportant.
Mann–Whitney U test
position  feature               p-value
1         *AgeBaseline          0
2         *CreatinineBaseline   0
3         *eGFRBaseline         0
4         *CholesterolBaseline  9.490 × 10^−04
5         *sBPBaseline          4.379 × 10^−03
6         dBPBaseline           1.083 × 10^−01
7         BMIBaseline           9.134 × 10^−01

chi–squared test
position  feature           p-value
1         *HistoryDiabetes  5 × 10^−04
2         *HistoryCHD       5 × 10^−04
3         *HistoryHTN       5 × 10^−04
4         *DLDmeds          5 × 10^−04
5         *DMmeds           5 × 10^−04
6         *ACEIARB          5 × 10^−04
7         *HistoryDLD       1.999 × 10^−03
8         *HTNmeds          1.999 × 10^−03
9         HistoryVascular   3.698 × 10^−02
10        Sex               4.398 × 10^−02
11        HistorySmoking    5.397 × 10^−02
12        HistoryObesity    4.948 × 10^−01

TABLE 6: Feature ranking through biostatistics univariate tests. We employed the Mann–Whitney U test [69] for the numerical features and the chi–squared test [70] for the binary features. We report in blue and with an asterisk (*) the features having a p-value lower than the 0.005 threshold, that is 5 × 10^−03.

FIGURE 2: Calibration curve and plots for the results obtained by the Random Forests predictions applied on the dataset excluding the temporal component (Table 4).
          MDA average
position  position     feature
1         1.2          AgeBaseline
2         1.8          eGFRBaseline
3         3.3          DMmeds
4         3.7          dBPBaseline
5         5.2          CholesterolBaseline
6         6.0          HistoryVascular
7         7.0          HistoryCHD
8         8.3          sBPBaseline
9         8.7          CreatinineBaseline
10        11.4         HistoryHTN
11        11.6         HistorySmoking
12        11.9         DLDmeds
13        12.1         Sex
14        13.4         HTNmeds
15        14.6         HistoryObesity
16        15.9         HistoryDLD
17        17.4         ACEIARB
18        17.7         BMIBaseline
19        18.8         HistoryDiabetes

TABLE 7: Feature ranking generated by Random Forests. MDA average position: average position obtained by each feature through the accuracy decrease feature ranking of Random Forests.

FIGURE 3: Calibration plot for the Stratified Logistic Regression predictions applied on the dataset including the temporal component (Table 5).
CKD predictive feature ranking considering the temporal component. As we did earlier for the CKD prediction, we re-ran the feature ranking procedure by including the temporal component regarding the year when the patient developed chronic kidney disease or the year of the last visit. Again, we employed the Stratified Logistic Regression. The ranking generated considering the time component (Table 8) showed several differences with respect to the previously described ranking generated without it (Table 7). The most relevant differences in ranking positions are the following:
• HTNmeds is at the 1st position in this ranking, while it was at the 14th position in the static ranking (Table 7);
• ACEIARB and HistoryDiabetes gained 13 positions each with respect to the static ranking;
• BMIBaseline gained 10 positions;
• AgeBaseline dropped from the 1st position of the static ranking to the last position of this ranking.
FIGURE 4: Barplot of the Random Forests feature ranking. MDA average position: average position obtained by each feature through the accuracy decrease feature ranking of Random Forests.

TABLE 8: Clinical feature ranking generated by the Stratified Logistic Regression, depending on the temporal component (the year when the CKD happened or of the patient's last visit). Importance: average coefficient of the trained logistic regression model out of 100 executions.
We also measured the difference between these two rankings through two traditional metrics: Spearman's rank correlation coefficient and the Kendall distance [80]–[82]. Both these metrics range between −1 and +1, with −1 meaning opposite rank orders, 0 meaning no correlation between the lists, and +1 meaning identical rankings. The comparison between the ranking without time (Table 7) and the ranking considering time (Table 8) generated Spearman's ρ = −0.209 and Kendall τ = −0.146.
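Both coefficients are available through R's built-in cor() function; for example, on two toy rankings of the same five features:

    rank_static <- c(1, 2, 3, 4, 5)   # toy positions in the static ranking
    rank_year <- c(5, 3, 1, 4, 2)     # toy positions in the year ranking

    cor(rank_static, rank_year, method = "spearman")
    cor(rank_static, rank_year, method = "kendall")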
V. DISCUSSION

CKD prediction. Our results show that machine learning methods are capable of predicting chronic kidney disease from the medical records of patients at risk of cardiovascular disease, both including the temporal information about the year when the patient developed the CKD and without it. These findings can have an immediate impact in clinical settings: physicians, in fact, can take advantage of our methods to forecast the likelihood of a patient having chronic kidney disease, in a few minutes, and then use this information to support their decisions. Our predictions, of course, do not replace laboratory exams and tests, which will remain necessary.

CKD feature ranking. The most important clinical features differ between the ranking generated without the temporal component by Random Forests (Table 7) and the one generated with it by the Stratified Logistic Regression (Table 8). The features HTNmeds, ACEIARB, and HistoryDiabetes had an increase of 13 positions in the year standing (Table 8), compared to their original position in the static ranking (Table 7). Also, the feature BMIBaseline had an increase of 10 positions. The AgeBaseline variable, instead, had the biggest position drop possible: it moved from the most important feature of the static standing (Table 7) to the least relevant position of the year standing (Table 8). The other variables in the year standing did not show such large position changes.

These results show that taking medication for hypertension, taking ACE inhibitors, having a personal history of diabetes, and body-mass index have an important role in predicting if a patient will develop serious CKD, when the information about the disease event is included. The age of the patient is very important when the CKD year is unknown, but becomes irrelevant here.

Difference between temporal feature ranking and non-temporal feature ranking.
The significant differences that emerge suggest a strong overlap between the information contained within the time variable and certain variables in the previous model. It is plausible that some predictors encode a 'baseline' level of risk of developing CKD, which is negated if the model knows in which year the CKD developed. The variables which decrease most significantly between the models are age, eGFR, and creatinine, which are all clinical indicators of an individual's baseline risk of CKD.

In our Stratified Logistic Regression standing, HistoryCHD and HistoryDLD occupied unimportant positions (10th and 16th ranks out of 19 variables, respectively). Smoking history, instead, occupied a high rank both in our standing and in the original study standing: our approach, in fact, listed it as 5th out of 19.
Comparison with results of other studies. Several published studies describe applications of feature ranking to chronic kidney disease diagnosis prediction from electronic medical records. Most of them, however, use feature ranking to reduce the number of variables for the binary classification, without reporting a final standing of the clinical factors ranked by importance [10], [12], [21]. Only the article of Salekin and colleagues [6] reports the most relevant variables found in their study: specific gravity, albumin, diabetes, hypertension, hemoglobin, serum creatinine, red blood cells count, packed cell volume, appetite, and sodium resulted at the top positions. Even if the clinical features present in our dataset mainly differ from theirs, we can compare the ranking positions between the two studies. Hypertension resulted the 4th most important factor in Salekin's study [6], confirming the importance of the HistoryHTN variable, which is ranked at the 3rd position in our Stratified Logistic Regression ranking (Table 8). Also diabetes history has a high rank in both standings: 3rd position in the ranking of Salekin's study [6], and 6th position of importance in our Stratified Logistic Regression ranking, as HistoryDiabetes (Table 8).

VI. CONCLUSIONS

In the future, we plan to further explore the quality of diagnosis prediction in this dataset through classifier calibration and calibration plots [84], and to perform the feature ranking with a different feature ranking method such as SHapley Additive exPlanations (SHAP) [85]. Moreover, we also plan to study chronic kidney disease by applying our methods to CKD datasets of other types, such as microarray gene expression [86], [87] and ultrasonography images [88].

ABBREVIATIONS

AUC: area under the curve. BP: blood pressure. CHD: coronary heart disease. CKD: chronic kidney disease. CVD: cardiovascular disease. DLD: dyslipidemia. EE: error estimation. FR: feature ranking. KDIGO: Kidney Disease Improving Global Outcomes. HTN: hypertension. MCC: Matthews correlation coefficient. MDA: Mean Decrease in Accuracy. MS: model selection. NPV: negative predictive value. p-value: probability value. PPV: positive predictive value. PR: precision-recall. ROC: receiver operating characteristic. SHAP: SHapley Additive exPlanations. SVM: Support Vector Machine. TN rate: true negative rate. TP rate: true positive rate.
COMPETING INTERESTS

The authors declare they have no competing interests.

ACKNOWLEDGMENTS

The authors thank Saif Al-Shamsi (United Arab Emirates University) for having provided additional information about the dataset.

DATA AND SOFTWARE AVAILABILITY

The dataset used in this study is publicly available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license at: https://figshare.com/articles/dataset/Chronic_kidney_disease_in_patients_at_high_risk_of_cardiovascular_disease_in_the_United_Arab_Emirates_A_population-based_study/6711155?file=12242270

Our software code is publicly available under the GNU General Public License v3.0 at: https://github.com/davidechicco/chronic_kidney_disease_and_cardiovascular_disease

REFERENCES
[1] Valerie A Luyckx, Marcello Tonelli, and John W Stanifer. The global burden of kidney disease and the sustainable development goals. Bulletin of the World Health Organization, 96(6):414, 2018.
[2] Sarmad Said and German T Hernandez. The link between chronic kidney disease and cardiovascular disease. Journal of Nephropathology, 3(3):99, 2014.
[3] Kevin Damman, Mattia A Valente, Adriaan A Voors, Christopher M O'Connor, Dirk J van Veldhuisen, and Hans L Hillege. Renal impairment, worsening renal function, and outcome in patients with heart failure: an updated meta-analysis. European Heart Journal, 35(7):455–469, 2014.
[4] Anusorn Charleonnan, Thipwan Fufaung, Tippawan Niyomwong, Wandee Chokchueypattanakit, Sathit Suwannawach, and Nitat Ninchawee. Predictive analytics for chronic kidney disease using machine learning techniques. In Proceedings of 2016 MITicon – the 2016 Management and Innovation Technology International Conference, pages 80–83, Bang-Saen, Chonburi, Thailand, 2016. IEEE.
[5] Nusrat Tazin, Shahed Anzarus Sabab, and Muhammed Tawfiq Chowdhury. Diagnosis of chronic kidney disease using effective classification and feature selection technique. In Proceedings of MediTec 2016 – the 2016 International Conference on Medical Engineering, Health Informatics and Technology, pages 1–6, Dhaka, Bangladesh, 2016. IEEE.
[6] Asif Salekin and John Stankovic. Detection of chronic kidney disease and selecting important predictive attributes. In Proceedings of IEEE ICHI 2016 – the 2016 IEEE International Conference on Healthcare Informatics, pages 262–270, Chicago, Illinois, USA, 2016. IEEE.
[7] Huseyin Polat, Homay Danaei Mehr, and Aydin Cetin. Diagnosis of chronic kidney disease based on support vector machine by feature selection methods. Journal of Medical Systems, 41(4):55, 2017.
[8] Made Satria Wibawa, I Made Dendi Maysanjaya, and I Made Agus Wirahadi Putra. Boosted classifier and features selection for enhancing chronic kidney disease diagnose. In Proceedings of CITSM 2017 – the 5th International Conference on Cyber and IT Service Management, pages 1–6, Denpasar, Bali, Indonesia, 2017. IEEE.
[9] Abdulhamit Subasi, Emina Alickovic, and Jasmin Kevric. Diagnosis of chronic kidney disease by using random forest. In Proceedings of CMBEBIH 2017 – the 2017 International Conference on Medical and Biological Engineering, pages 589–594. Springer, 2017.
[10] Sirage Zeynu and Shruti Patil. Prediction of chronic kidney disease using data mining feature selection and ensemble method. International Journal of Data Mining in Genomics & Proteomics, 9(1):1–9, 2018.
[11] Adeola Ogunleye and Qing-Guo Wang. Enhanced XGBoost-based automatic diagnosis system for chronic kidney disease. In Proceedings of IEEE ICCA 2018 – the 14th IEEE International Conference on Control and Automation, pages 805–810, Anchorage, Alaska, USA, 2018. IEEE.
[12] Sirage Zeynu and Shruthi Patil. Survey on prediction of chronic kidney disease using data mining classification techniques and feature selection. International Journal of Pure and Applied Mathematics, 118(8):149–156, 2018.
[13] Abdullah Al Imran, MD Nur Amin, and Fatema Tuj Johora. Classification of chronic kidney disease using logistic regression, feedforward neural network and wide & deep learning. In Proceedings of ICIET 2018 – the 2018 International Conference on Innovation in Engineering and Technology, pages 1–6, Osaka, Japan, 2018. IEEE.
[14] AK Shrivas, Sanat Kumar Sahu, and HS Hota. Classification of chronic kidney disease with proposed union based feature selection technique. In Proceedings of ICIoTCT 2018 – the 3rd International Conference on Internet of Things and Connected Technologies, pages 26–27, Jaipur, India, 2018.
[15] S Belina VJ Sara and K Kalaiselvi. Ensemble swarm behaviour based feature selection and support vector machine classifier for chronic kidney disease prediction. International Journal of Engineering & Technology, 7(2.31):190–195, 2018.
[16] Naveed Rahman Shawan, Syed Samiul Alam Mehrab, Fardeen Ahmed, and Mohammad Sharatul Hasmi. Chronic kidney disease detection using ensemble classifiers and feature set reduction. PhD thesis, BRAC University, 2019.
[17] Suresh Babu Satukumati and Raghu Kogila Shivaprasad Satla. Feature extraction techniques for chronic kidney disease identification. Kidney, 15:29, 2019.
[18] Tahmid Abrar, Samiha Tasnim, and MD Hossain. Early detection of chronic kidney disease using machine learning. PhD thesis, BRAC University, 2019.
[19] Mohamed Elhoseny, K Shankar, and J Uthayakumar. Intelligent diagnostic prediction and classification system for chronic kidney disease. Scientific Reports, 9(1):1–14, 2019.
[20] Stefan Ravizza, Tony Huschto, Anja Adamov, Lars Böhm, Alexander Büsser, Frederik F Flöther, Rolf Hinzmann, Helena König, Scott M McAhren, Daniel H Robertson, Titus Schleyer, Bernd Schneidinger, and Wolfgang Petrich. Predicting the early risk of chronic kidney disease in patients with diabetes using real-world data. Nature Medicine, 25(1):57–59, 2019.
[21] Syed Imran Ali, Gwang Hoon Park, and Sungyoung Lee. Cost-sensitive ensemble feature ranking and automatic threshold selection for chronic kidney disease diagnosis. Preprints, 2020050458, 2020.
[22] Pankaj Chittora, Sandeep Chaurasia, Prasun Chakrabarti, Gaurav Kumawat, Tulika Chakrabarti, Zbigniew Leonowicz, Michał Jasiński, Łukasz Jasiński, Radomir Gono, Elżbieta Jasińska, and Vadim Bolshev. Prediction of chronic kidney disease – a machine learning perspective. IEEE Access, 9:17312–17334, 2021.
[23] Piervincenzo Ventrella, Giovanni Delgrossi, Gianmichele Ferrario, Marco Righetti, and Marco Masseroli. Supervised machine learning for the assessment of chronic kidney disease advancement. Computer Methods and Programs in Biomedicine, 209:106329, 2021.
[24] Md Rashed-Al-Mahfuz, Abedul Haque, Akm Azad, Salem A Alyami, Julian MW Quinn, and Mohammad Ali Moni. Clinically applicable machine learning approaches to identify attributes of chronic kidney disease (CKD) for use in low-cost diagnostic screening. IEEE Journal of Translational Engineering in Health and Medicine, 9:1–11, 2021.
[25] Surya Krishnamurthy, Kapeleshh Ks, Erik Dovgan, Mitja Luštrek, Barbara Gradišek Piletič, Kathiravan Srinivasan, Yu-Chuan Jack Li, Anton Gradišek, and Shabbir Syed-Abdul. Machine learning prediction models for chronic kidney disease using national health insurance claim data in Taiwan. In Healthcare, volume 9, page 546. Multidisciplinary Digital Publishing Institute, 2021.
[26] Monika Gupta and Parul Gupta. Predicting chronic kidney disease using machine learning. Emerging Technologies for Healthcare: Internet of Things and Deep Learning Models, pages 251–277, 2021.
[27] University of California Irvine Machine Learning Repository. Chronic kidney disease data set. https://archive.ics.uci.edu/ml/datasets/chronic_kidney_disease URL visited on 4th October 2021.
[28] Saif Al-Shamsi, Dybesh Regmi, and Romona D Govender. Chronic kidney disease in patients at high risk of cardiovascular disease in the United Arab Emirates: a population-based study. PLoS One, 13(6), 2018.
[29] Gary S Francis. ACE inhibition in cardiovascular disease. New England Journal of Medicine, 342:201–202, 2000.
[30] Jun Agata, Daigo Nagahara, Shuichi Kinoshita, Yoshitoki Takagawa, Norihito Moniwa, Daisuke Yoshida, Nobuyuki Ura, and Kazuaki Shimamoto. Angiotensin II receptor blocker prevents increased arterial stiffness in patients with essential hypertension. Circulation Journal, 68(12):1194–1198, 2004.
[31] Kidney Disease: Improving Global Outcomes (KDIGO) Transplant Work Group. KDIGO clinical practice guideline for the care of kidney transplant recipients. American Journal of Transplantation, 9:S1, 2009.
[32] Shai Shalev-Shwartz and Shai Ben-David. Understanding machine learning: from theory to algorithms. Cambridge University Press, 2014.
[33] André Altmann, Laura Toloşi, Oliver Sander, and Thomas Lengauer. Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10):1340–1347, 2010.
[34] Melissa A Hardy. Regression with Dummy Variables. Sage, 1993.
[35] A Rogier Donders, Geert J van der Heijden, Theo Stijnen, and Karel G Moons. Review: a gentle introduction to imputation of missing values. Journal of Clinical Epidemiology, 59(10):1087–1091, 2006.
[36] Luca Oneto. Model Selection and Error Estimation in a Nutshell. Springer, Berlin, Germany, 2020.
[37] Kathleen F Kerr. Comments on the analysis of unbalanced microarray data. Bioinformatics, 25(16):2035–2041, 2009.
[38] Rosalía Laza, Reyes Pavón, Miguel Reboiro-Jato, and Florentino Fdez-Riverola. Evaluating the effect of unbalanced data in biomedical document classification. Journal of Integrative Bioinformatics, 8(3):105–117, 2011.
[39] Kyunghee Han, Kyee Zu Kim, Jung Mi Oh, In Wha Kim, Kyungim Kim, and Taesung Park. Unbalanced sample size effect on the genome-wide population differentiation studies. International Journal of Data Mining and Bioinformatics, 6(5):490–504, 2012.
[40] Guo Haixiang, Li Yijing, Jennifer Shang, Gu Mingyun, Huang Yuanyue, and Gong Bing. Learning from class-imbalanced data: review of methods and applications. Expert Systems with Applications, 73:220–239, 2017.
[41] Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.
[42] Tuanfei Zhu, Yaping Lin, and Yonghe Liu. Synthetic minority oversampling technique for multiclass imbalance problems. Pattern Recognition, 72:327–340, 2017.
[43] Christoph Molnar. Interpretable machine learning. https://christophm.github.io/book/, 2018.
[44] Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3(Mar):1157–1182, 2003.
[45] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[46] John Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, England, United Kingdom, 2004.
[47] S Sathiya Keerthi and Chih-Jen Lin. Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation, 15(7):1667–1689, 2003.
[48] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, Cambridge, Massachusetts, USA, 2016.
[49] Mohammed J Zaki and Wagner Meira Jr. Data Mining and Machine Learning: Fundamental Concepts and Algorithms. Cambridge University Press, Cambridge, England, United Kingdom, 2019.
[50] Tianqi Chen and Carlos Guestrin. XGBoost: a scalable tree boosting system. In Proceedings of KDD '16 – the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA, 2016.
[51] Robert C Holte. Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11(1):63–90, 1993.
[52] Ilenia Orlandi, Luca Oneto, and Davide Anguita. Random forests model selection. In European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pages ES2016–48, Bruges, Belgium, 2016.
[53] Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. An efficient approach for assessing hyperparameter importance. In Proceedings of ICML 2014 – the 31st International Conference on Machine Learning, pages 754–762, Beijing, China, 2014.
[57] Michael Wainberg, Babak Alipanahi, and Brendan J Frey. Are random forests truly the best classifiers? Journal of Machine Learning Research, 17(1):3837–3841, 2016.
[58] David H Wolpert. The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7):1341–1390, 1996.
[59] Yvan Saeys, Thomas Abeel, and Yves Van de Peer. Robust feature selection using ensemble feature selection techniques. In Proceedings of ECML PKDD 2008 – the 2008 Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Antwerp, Belgium, 2008. Springer.
[60] Robin Genuer, Jean-Michel Poggi, and Christine Tuleau-Malot. Variable selection using random forests. Pattern Recognition Letters, 31(14):2225–2236, 2010.
[61] Yanjun Qi. Random forest for bioinformatics. In Ensemble Machine Learning, Boston, Massachusetts, USA, 2012.
[62] Ramón Díaz-Uriarte and Sara Alvarez De Andres. Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7(1):3, 2006.
[63] Davide Chicco and Cristina Rovelli. Computational prediction of diagnosis and feature selection on mesothelioma patient health records. PLoS One, 14(1):e0208737, 2019.
[64] Phillip Good. Permutation tests: a practical guide to resampling methods for testing hypotheses. Springer Science & Business Media, 2013.
[65] M Luz Calle and Victor Urrea. Letter to the editor: stability of random forest importance measures. Briefings in Bioinformatics, 12(1):86–89, 2010.
[66] Miron Bartosz Kursa. Robustness of random forest-based gene selection methods. BMC Bioinformatics, 15(1):8, 2014.
[67] Huazhen Wang, Fan Yang, and Zhiyuan Luo. An experimental study of the intrinsic stability of random forest variable importance measures. BMC Bioinformatics, 17(1):60, 2016.
[68] D Sculley. Rank aggregation for similar items. In Proceedings of the 2007 SIAM International Conference on Data Mining, Minneapolis, Minnesota, USA, 2007. SIAM.
[69] Thomas W MacFarland and Jan M Yates. Mann–Whitney U test. In Introduction to Nonparametric Statistics for the Biological Sciences using R, pages 103–132. Springer, Berlin, Germany, 2016.
[70] Priscilla E Greenwood and Michael S Nikulin. A guide to chi-squared testing, volume 280. John Wiley & Sons, Hoboken, New Jersey, USA, 1996.
[71] Daniel J Benjamin, James O Berger, Magnus Johannesson, Brian A Nosek, E-J Wagenmakers, Richard Berk, Kenneth A Bollen, Björn Brembs, Lawrence Brown, Colin Camerer, David Cesarini, Christopher D Chambers, Merlise Clyde, Thomas D Cook, Paul De Boeck, Zoltan Dienes, Anna Dreber, Kenny Easwaran, Charles Efferson, Ernst Fehr, Fiona Fidler, Andy P Field, Malcolm Forster, Edward I George, Richard Gonzalez, Steven Goodman, Edwin Green, Donald P Green, Anthony G Greenwald, Jarrod D Hadfield, Larry V Hedges, Leonhard Held, Teck Hua Ho, Herbert Hoijtink, Daniel J Hruschka, Kosuke Imai, Guido Imbens, John P A Ioannidis, Minjeong Jeon, James Holland Jones, Michael Kirchler, David Laibson, John List, Roderick Little, Arthur Lupia, Edouard Machery, Scott E Maxwell, Michael McCarthy, Don A Moore, Stephen L Morgan, Marcus Munafó, Shinichi Nakagawa, Brendan Nyhan, Timothy H Parker, Luis Pericchi, Marco Perugini, Jeff Rouder, Judith Rousseau, Victoria Savalei, Felix D Schönbrodt, Thomas Sellke, Betsy Sinclair, Dustin Tingley, Trisha Van Zandt, Simine Vazire, Duncan J Watts, Christopher Winship, Robert L Wolpert, Yu Xie, Cristobal Young, Jonathan Zinman, and Valen E Johnson. Redefine statistical significance. Nature Human Behaviour, 2(1):6–10, 2018.
[72] Cyrus R Mehta and Nitin R Patel. Exact logistic regression: theory and examples. Statistics in Medicine, 14(19):2143–2160, 1995.
[73] Davide Chicco and Giuseppe Jurman. Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Medical Informatics and Decision Making, 20(1):16, 2020.
887 ference on Machine Learning, pages 754–762, Beijing, China, 2014. 2020. 960
888 [54] Simon Bernard, Laurent Heutte, and Sébastien Adam. Influence of [74] Davide Chicco. Ten quick tips for machine learning in computational 961
889 hyperparameters on random forest accuracy. In International Workshop biology. BioData Mining, 10(35):1–17, 2017. 962
890 on Multiple Classifier Systems, pages 171–180, Reykjavik, Iceland, 2009. [75] Davide Chicco, Matthijs J Warrens, and Giuseppe Jurman. The Matthews 963
891 [55] Max Kuhn. Building predictive models in R using the caret package. correlation coefficient (MCC) is more informative than Cohen’s Kappa and 964
892 Journal of Statistical Software, 28(1):1–26, 2008. Brier score in binary classification assessment. IEEE Access, 9:78368– 965
893 [56] M Fernández-Delgado, E Cernadas, S Barro, and D Amorim. Do we need 78381, 2021. 966
894 hundreds of classifiers to solve real world classification problems? Journal [76] Davide Chicco, Niklas Tötsch, and Giuseppe Jurman. The Matthews 967
895 of Machine Learning Research, 15(1):3133–3181, 2014. correlation coefficient (MCC) is more reliable than balanced accuracy, 968
[77] Davide Chicco, Valery Starovoitov, and Giuseppe Jurman. The benefits of the Matthews correlation coefficient (MCC) over the diagnostic odds ratio (DOR) in binary classification assessment. IEEE Access, 9:47112–47124, 2021.
[78] Peter C Austin and Ewout W Steyerberg. Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers. Statistics in Medicine, 33(3):517–535, 2014.
[79] Norman E Breslow, Lue P Zhao, Thomas R Fears, and Charles C Brown. Logistic regression for stratified case–control studies. Biometrics, 44(3):891–899, 1988.
[80] Jerrold H Zar. Spearman rank correlation. In Encyclopedia of Biostatistics, volume 7. Wiley Online Library, Hoboken, New Jersey, USA, 2005.
[81] Franz J Brandenburg, Andreas Gleißner, and Andreas Hofmeier. Comparing and aggregating partial orders with Kendall tau distances. In Proceedings of WALCOM 2012 – the 6th International Workshop on Algorithms and Computation, pages 88–99, Dhaka, Bangladesh, 2012. Springer.
[82] Davide Chicco, Eleonora Ciceri, and Marco Masseroli. Extended Spearman and Kendall coefficients for gene annotation list correlation. In Proceedings of CIBB 2014 – the 11th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics, volume 8623 of Lecture Notes in Computer Science, pages 19–32, Cambridge, England, United Kingdom, 2015. Springer.
[83] David Clayton and Jack Cuzick. Multivariate generalizations of the proportional hazards model. Journal of the Royal Statistical Society: Series A (General), 148(2):82–108, 1985.
[84] Peter A Flach. Classifier calibration. In Encyclopedia of Machine Learning and Data Mining. Springer, Berlin, Germany, 2016.
[85] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Proceedings of NIPS 2017 – the 31st International Conference on Neural Information Processing Systems, pages 4768–4777, 2017.
[86] Le-Ting Zhou, Shen Qiu, Lin-Li Lv, Zuo-Lin Li, Hong Liu, Ri-Ning Tang, Kun-Ling Ma, and Bi-Cheng Liu. Integrative bioinformatics analysis provides insight into the molecular mechanisms of chronic kidney disease. Kidney and Blood Pressure Research, 43(2):568–581, 2018.
[87] Zhi Zuo, Jian-Xiao Shen, Yan Pan, Juan Pu, Yong-Gang Li, Xing-Hua Shao, and Wan-Peng Wang. Weighted gene correlation network analysis (WGCNA) detected loss of MAGI2 promotes chronic kidney disease (CKD) by podocyte damage. Cellular Physiology and Biochemistry, 51(1):244–261, 2018.
[88] Chih-Yin Ho, Tun-Wen Pai, Yuan-Chi Peng, Chien-Hung Lee, Yung-Chih Chen, Yang-Ting Chen, and Kuo-Su Chen. Ultrasonography image analysis for detection and classification of chronic kidney disease. In Proceedings of CISIS 2012 – the 6th International Conference on Complex, Intelligent, and Software Intensive Systems, pages 624–629, Palermo, Italy, 2012. IEEE.

DAVIDE CHICCO (ORCID: 0000-0001-9655-7142) obtained his Bachelor of Science and Master of Science degrees in computer science at Università di Genova (Genoa, Italy) in 2007 and 2010, respectively. He then started the PhD program in computer engineering at Politecnico di Milano (Milan, Italy), where he graduated in spring 2014. He also spent a semester as a visiting doctoral scholar at the University of California Irvine (USA). From September 2014 to September 2018, he was a post-doctoral researcher at the Princess Margaret Cancer Centre and a guest at the University of Toronto. From September 2018 to December 2019, he was a scientific associate researcher at the Peter Munk Cardiac Centre (Toronto, Ontario, Canada). From January 2020 to January 2021, he was a scientific associate researcher at the Krembil Research Institute (Toronto, Ontario, Canada). Since January 2021, he has been working as a scientific research associate at the Institute of Health Policy, Management and Evaluation of the University of Toronto.

CHRISTOPHER A. LOVEJOY (ORCID: 0000-0003-0919-1264) is a medical doctor with interests in applied machine learning and bioinformatics. He undertook his undergraduate medical degree at the University of Cambridge (United Kingdom) before completing a post-graduate degree in data science and machine learning at University College London (United Kingdom).

LUCA ONETO (ORCID: 0000-0002-8445-395X) received his Bachelor of Science and Master of Science degrees in electronic engineering at Università di Genova (Italy) in 2008 and 2010, respectively. In 2014 he received his PhD from the same university in the School of Sciences and Technologies for Knowledge and Information Retrieval with the thesis "Learning based on empirical data". In 2017 he obtained the Italian National Scientific Qualification for the role of associate professor in computer engineering, and in 2018 he obtained the one in computer science. He worked as an assistant professor in computer engineering at Università di Genova from 2016 to 2019. In 2018 he co-founded the ZenaByte s.r.l. spin-off company. In 2019 he obtained the Italian National Scientific Qualification for the role of full professor in computer science and computer engineering. In 2019 he became associate professor in computer science at Università di Pisa, and he is currently associate professor in computer engineering at Università di Genova. He has been involved in several Horizon 2020 projects (S2RJU, ICT, DS), and he has received the Amazon AWS Machine Learning Award and the Somalvico Award (best Italian young AI researcher). His first main research topic is statistical learning theory, with particular focus on the theoretical aspects of (semi-)supervised model selection and error estimation. His second main research topic is data science, with particular reference to trustworthy AI and to the solution of real-world problems by exploiting and improving the most recent learning algorithms and theoretical results in the fields of machine learning and data mining.
\[
\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP) \cdot (TP + FN) \cdot (TN + FP) \cdot (TN + FN)}} \tag{1}
\]
(worst value = −1; best value = +1)
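As an illustration of Equation (1), the minimal Python sketch below computes the MCC directly from the four confusion-matrix counts. The function name, the zero-denominator convention, and the example counts are ours for illustration, not taken from the study.

```python
import math

def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient (Equation 1) from confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # When any marginal sum is zero the denominator vanishes; we follow the
    # common convention of returning 0 in that degenerate case.
    return (tp * tn - fp * fn) / denominator if denominator else 0.0

# Illustrative counts only: 30 true positives, 50 true negatives,
# 10 false positives, 20 false negatives.
print(round(mcc(tp=30, tn=50, fp=10, fn=20), 3))  # prints 0.449
```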
\[
\text{F1 score} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} \tag{2}
\]
(worst value = 0; best value = 1)
\[
\text{accuracy} = \frac{TP + TN}{TP + FN + TN + FP} \tag{3}
\]
(worst value = 0; best value = 1)
\[
\text{true positive rate, recall, sensitivity} = \frac{TP}{TP + FN} \tag{4}
\]
(worst value = 0; best value = 1)
\[
\text{true negative rate, specificity} = \frac{TN}{TN + FP} \tag{5}
\]
(worst value = 0; best value = 1)
\[
\text{positive predictive value, precision} = \frac{TP}{TP + FP} \tag{6}
\]
(worst value = 0; best value = 1)
\[
\text{negative predictive value} = \frac{TN}{TN + FN} \tag{7}
\]
(worst value = 0; best value = 1)
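The rates of Equations (2)–(7) all derive from the same four counts, so they can be computed together. Below is a compact sketch; the helper name, the dictionary keys, and the zero-denominator handling are our choices, not part of the original study.

```python
def confusion_rates(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Equations (2)-(7) from binary confusion-matrix counts.
    Zero denominators are mapped to 0.0 to keep the sketch total."""
    def div(num, den):
        return num / den if den else 0.0
    return {
        "F1 score":                  div(2 * tp, 2 * tp + fp + fn),    # Eq. (2)
        "accuracy":                  div(tp + tn, tp + fn + tn + fp),  # Eq. (3)
        "sensitivity (recall, TPR)": div(tp, tp + fn),                 # Eq. (4)
        "specificity (TNR)":         div(tn, tn + fp),                 # Eq. (5)
        "precision (PPV)":           div(tp, tp + fp),                 # Eq. (6)
        "negative predictive value": div(tn, tn + fn),                 # Eq. (7)
    }

# Same illustrative counts as above.
print(confusion_rates(tp=30, tn=50, fp=10, fn=20))
```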
\[
\text{Precision-Recall (PR) curve} =
\begin{cases}
\text{true positive rate on the } x \text{ axis} \\
\text{precision on the } y \text{ axis}
\end{cases} \tag{8}
\]
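Unlike Equations (1)–(7), Equation (8) describes a curve rather than a single score: each classification threshold yields one (recall, precision) point. A sketch of how such a curve could be traced with scikit-learn (assumed to be available; the labels and predicted scores are invented for illustration and are not data from the study):

```python
from sklearn.metrics import auc, precision_recall_curve

# Invented ground-truth labels and predicted scores, for illustration only.
y_true  = [0, 0, 1, 1, 0, 1, 0, 1]
y_score = [0.10, 0.40, 0.35, 0.80, 0.20, 0.70, 0.55, 0.90]

# One (recall, precision) point per threshold: recall on the x axis,
# precision on the y axis, as in Equation (8).
precision, recall, thresholds = precision_recall_curve(y_true, y_score)
print(f"Area under the PR curve: {auc(recall, precision):.3f}")
```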