
This article has been accepted for publication in IEEE Access. Digital Object Identifier 10.1109/ACCESS.2021.3133700

A machine learning analysis of health records of patients with chronic kidney disease at risk of cardiovascular disease
DAVIDE CHICCO*1, CHRISTOPHER A. LOVEJOY2,3, and LUCA ONETO4,5
1 University of Toronto, Toronto, Ontario, Canada
2 University College London, London, United Kingdom
3 University College London Hospital, London, United Kingdom
4 Università di Genova, Genoa, Italy
5 ZenaByte Srl, Genoa, Italy
*Corresponding author: Davide Chicco (e-mail: [email protected]).

ABSTRACT Chronic kidney disease (CKD) describes a long-term decline in kidney function and has many causes. It affects hundreds of millions of people worldwide every year. It can have a strong negative impact on patients, especially when combined with cardiovascular disease (CVD): patients with both conditions have lower survival chances. In this context, computational intelligence applied to electronic health records can provide insights that help physicians make better decisions about prognoses or therapies. In this study we applied machine learning to the medical records of patients with CKD and CVD. First, we predicted whether patients develop severe CKD, both including and excluding information about the year it occurred or the date of the last visit. Our methods achieved a top mean Matthews correlation coefficient (MCC) of +0.499 in the former case and a mean MCC of +0.469 in the latter case. Then, we performed a feature ranking analysis to understand which clinical factors are most important: age, eGFR, and creatinine when the temporal component is absent; hypertension, smoking, and diabetes when the year is present. We then compared our results with the current scientific literature, and discussed the different results obtained when the time feature is excluded or included. Our results show that our computational intelligence approach can provide insights about diagnosis and the relative importance of different clinical variables that otherwise would be impossible to observe.

INDEX TERMS machine learning; computational intelligence; feature ranking; electronic health records;
chronic kidney disease; CKD; cardiovascular diseases; CVD.

I. INTRODUCTION
Chronic kidney disease (CKD) kills around 1.2 million people and affects more than 700 million people worldwide every year [1]. CKD is commonly caused by diabetes and high blood pressure, and it is more likely to develop in subjects with a family history of CKD.
Individuals with chronic kidney disease are at higher risk of cardiovascular disease (such as myocardial infarction, stroke, or heart failure) [2], and patients with both diseases are more likely to have worse prognoses [3].
In this context, computational intelligence methods applied to the electronic medical records of patients can provide interesting and useful information to doctors and physicians, helping them to more precisely predict the trend of the condition and consequently to make decisions on the therapies.
Several studies involving analyses done with machine learning applied to clinical records of patients with CKD have appeared in the biomedical literature in the recent past [4]–[26].
Among these studies, a large number involves applications of machine learning methods to the Chronic Kidney Disease dataset of the University of California Irvine Machine Learning Repository [27].
On this dataset, Shawan [16] and Abrar [18] employed several data mining methods for patient classification in their PhD theses. Wibawa et al. [8] applied a correlation-based feature selection method and AdaBoost to this dataset, while Al Imran et al. [13] employed deep learning techniques to the same end.


Rashed-al-Mahfuz and colleagues [24] also employed a number of machine learning methods for patient classification and described the dataset precisely. Syed Imran Ali and coauthors [21] applied several machine learning methods to the same dataset to determine a global threshold to discriminate between useful clinical factors and irrelevant ones.
Salekin and Stankovic [6] used Lasso for feature selection, while Belina and colleagues [15] applied a hybrid wrapper- and filter-based feature selection for the same purpose.
Tazin et al. [5] employed several data mining methods for patient classification. Ogunleye and Wang [11] used an enhanced XGBoost method for patient classification. Satukumati and coauthors [17] used several techniques for feature extraction. Elhoseny and colleagues [19] developed a method called Density-based Feature Selection (DFS) with an Ant Colony based Optimization (D-ACO) algorithm for the classification of patients with CKD. Polat et al. [7] showed an application of a Support Vector Machine variant for patient classification on the same dataset. Chittora and colleagues [22] applied numerous machine learning classifiers and their variants for patient classification. Zeynu and Patil [12] published a survey on computational intelligence methods for binary classification and feature selection applied to the same dataset. Charleonnan and coauthors [4] applied numerous machine learning classifiers and their variants for patient classification. Subas et al. [9] focused on Random Forests for patient classification and feature ranking. Zeynu and colleagues [10] applied numerous machine learning classifiers for patient classification and clinical feature selection. All these studies focused more on the improvement and enhancement of computational intelligence methods than on the clinical implications of the results.
A few recently published studies employed datasets different from the UC Irvine ML Repository one. Ventrella and coauthors [23] applied several machine learning methods to an original dataset of EHRs collected at the hospital of Vimercate (Italy) for assessing chronic kidney disease progression. This study indicated creatinine level, urea, red blood cell count, and eGFR trend among the most relevant clinical factors for CKD advancement, highlighting that eGFR was not the top most important one.
Ravizza and colleagues [20] employed machine learning methods on a dataset of patients with diabetes from the IBM Explorys database to predict if they will develop CKD. This study states that the usage of diabetes-related data can generate better predictions on data of patients with CKD.
To the best of our knowledge, no previously published study involves the usage of machine learning methods to investigate a dataset of patients with both CKD and CVD.
In this manuscript, we analyzed a dataset of 491 patients from the United Arab Emirates, released by Al-Shamsi and colleagues [28] in 2018 (section II). In their original study, the authors employed a multivariable Cox proportional hazards analysis to identify the independent risk factors causing CKD at stages 3-5. Although this analysis was interesting, it did not involve a data mining step, which could instead retrieve additional information or unseen patterns in these data.
To fill this gap, we perform two analyses here: first, we apply machine learning methods to binary classification of serious CKD development, and then we rank the clinical features by importance. In addition to what Al-Shamsi and colleagues [28] did, we also performed the same analysis excluding the year when the disease happened to each patient (Figure 1).
As major results, we show that computational intelligence is capable of predicting a serious CKD development with or without the time information, and that the most important clinical features change depending on whether the temporal component is considered.
We organize the rest of the paper as follows. After this Introduction, we describe the dataset we analyzed (section II) and the methods we employed (section III). We then report the binary classification and feature ranking results (section IV) and discuss them afterwards (section V). Finally, we recap the main points of this study and mention limitations and future developments (section VI).

[Figure 1: flowchart of the pipeline. EHRs dataset → data reading → dataset without time and dataset with time → binary classification and feature ranking → binary classification results and feature ranking results.]
FIGURE 1: Flowchart of the computational pipeline of this study. Cylinder shape: dataset. Rectangular shape: process. Parallelogram shape: input/output.

II. DATASET


In this study, we examine a dataset of electronic medical records of 491 patients collected at the Tawam Hospital in Al-Ain city (Abu Dhabi, United Arab Emirates), between 1st January and 31st December 2008 [28]. The patients included 241 women and 250 men, with an average age of 53.2 years (Table 2 and Table 3).
Each patient has a chart of 13 clinical variables, expressing her/his values of laboratory tests and exams, or data about her/his medical history (Table 1). Each patient included in this study had cardiovascular disease or was at risk of cardiovascular disease, according to the standards of Tawam Hospital [28].
Several features regard the personal history of the patient: diabetes history, dyslipidemia history, hypertension history, obesity history, smoking history, and vascular disease history (Table 2) state whether the patient's biography includes those specific diseases or conditions. Dyslipidemia indicates an excessive presence of lipids in the blood. Two variables refer to blood pressure (diastolic blood pressure and systolic blood pressure), and other variables refer to blood levels obtained through laboratory tests (cholesterol, creatinine). A few features state if the patients have taken medications for specific diseases (dyslipidemia medications, diabetes medications, and hypertension medications) or inhibitors (angiotensin-converting-enzyme inhibitors, or angiotensin II receptor blockers), which are known to be effective against cardiovascular diseases [29] and hypertension [30]. The remaining factors describe the physical conditions of each patient: age, body–mass index, and biological sex (Table 2 and Table 3).
Among the clinical features available for this dataset, the EventCKD35 binary variable states whether the patient had chronic kidney disease at a high stage (3rd, 4th, or 5th stage). According to the Kidney Disease Improving Global Outcomes (KDIGO) organization [31], CKD can be grouped into 5 stages:
• Stage 1: normal kidney function, no CKD;
• Stage 2: mildly decreased kidney function, mild CKD;
• Stage 3: moderate decrease of kidney function, moderate CKD;
• Stage 4: severe decrease of kidney function, severe CKD;
• Stage 5: extreme CKD and kidney failure.
When the EventCKD35 variable has value 0, the patient's kidney condition is at stage 1 or 2. Instead, when EventCKD35 equals 1, the patient's kidney is at stage 3, 4, or 5 (Table 1).
Even if the value of eGFR has a role in the definition of the CKD stages in the KDIGO guidelines [31], we found a weak correlation between the eGFRBaseline variable and the target variable EventCKD35 in this dataset. The two variables have a Pearson correlation coefficient equal to −0.36 and a Kendall distance of −0.3, both in the [−1, +1] interval, where −1 indicates perfectly opposite correlation, 0 indicates no correlation, and +1 indicates perfect correlation.
The time year derived factor indicates the year in which the patient had a serious chronic kidney disease event, or the year when he/she had his/her last outpatient visit, whichever occurred first (Supplementary information), in the follow-up period.
All the dataset features refer to the patients' first visits in January 2008, except the EventCKD35 and time year variables, which refer to the end of the follow-up period, in June 2017.
More information about this dataset can be found in the original article [28].
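For readers who want to reproduce this correlation check, a minimal R sketch follows. The column names come from Table 1, but the data file name is a hypothetical placeholder, not the paper's actual file:

```r
# Minimal sketch: correlation between the eGFR baseline values and the
# EventCKD35 target. The file name "ckd_dataset.csv" is hypothetical;
# the column names follow Table 1.
dataset <- read.csv("ckd_dataset.csv")

# Pearson correlation coefficient, in the [-1, +1] interval
cor(dataset$eGFRBaseline, dataset$EventCKD35, method = "pearson")

# Kendall rank correlation, in the [-1, +1] interval
cor(dataset$eGFRBaseline, dataset$EventCKD35, method = "kendall")
```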

III. METHODS
The problem described earlier (section I) can be addressed as a conventional binary classification framework, where the goal is to predict EventCKD35, using the data described earlier (section II). This target feature indicates whether the patient has chronic kidney disease in stage 3 to 5, which represents an advanced stage.
In binary classification, the problem is to identify the unknown relation R between the input space X (in our case: the features described in section II) and an output space Y ⊆ {0, 1} (in our case: the EventCKD35 target) [32]. Once a relation is established, one can find a way to discover what the most influential factors in the input space are for predicting the associated element in the output space, namely to determine the feature importance [33].
Note that X can be composed of categorical features (the values of the features belong to a finite unsorted set) and numerical-valued features (the values of the features belong to a possibly infinite sorted set). In the case of categorical features, one-hot encoding [34] can map them into a series of numerical features. The resulting feature space is X ⊆ R^d.
A set of data D_n = {(x_1, y_1), ..., (x_n, y_n)}, with x_i ∈ X and y_i ∈ Y, is available in a binary classification framework. Moreover, some values of x_i might be missing [35]. In this case, if the missing value is categorical, we introduce an additional category for missing values for the specific feature. Instead, if the missing value is associated with a numerical feature, we replace the missing value with the mean value of the specific feature, and we introduce an additional logical feature to indicate whether the value of the feature is missing for a particular sample [35].
Our goal is to identify a model M : X → Y which best approximates R, through an algorithm A_H characterized by its set of hyper-parameters H. The accuracy of the model M in representing the unknown relation R is measured using different indices of performance (Supplementary information). Since the hyper-parameters H influence the ability of A_H to estimate R, we need to adopt a proper Model Selection (MS) procedure [36]. In this work, we exploited the Complete Cross Validation (CCV) procedure [36]. CCV relies on a simple idea: we resample the original dataset D_n many (n_r = 500) times without replacement to build a training set L_l^r of size l, while the remaining samples are kept in the validation set V_v^r, with r ∈ {1, ..., n_r}.
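As an illustration of the missing-value policy just described, here is a minimal R sketch. The helper names and the example feature are ours, not from the original paper, and `dataset` is assumed to be a data frame with the Table 1 columns:

```r
# Sketch of the missing-value handling described above.
# Numerical feature: impute the mean and add a logical missingness indicator.
impute_numeric <- function(df, feature) {
  miss <- is.na(df[[feature]])
  df[[paste0(feature, "_missing")]] <- miss                  # indicator feature
  df[[feature]][miss] <- mean(df[[feature]], na.rm = TRUE)   # mean imputation
  df
}

# Categorical feature: add an extra "missing" category.
impute_categorical <- function(df, feature) {
  x <- as.character(df[[feature]])
  x[is.na(x)] <- "missing"
  df[[feature]] <- factor(x)
  df
}

dataset <- impute_numeric(dataset, "BMIBaseline")   # example numeric feature
```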


feature | explanation | measurement unit | values
ACEIARB | if the patient has taken ACEI or ARB | boolean | [0, 1]
AgeBaseline | age of the patient | integer | [23, 24, ..., 80, 89]
BMIBaseline | body–mass index of the patient | kg/m2 | [13, 16, 17, ..., 53, 57]
CholesterolBaseline | level of cholesterol | mmol/L | [2.23, 2.40, ..., 8.20, 9.30]
CreatinineBaseline | level of creatinine in the blood | µmol/L | [6, 27, ..., 113, 123]
dBPBaseline | diastolic blood pressure | mmHg | [41, 45, ..., 110, 112]
DLDmeds | if the patient has taken dyslipidemia medications | boolean | [0, 1]
DMmeds | if the patient has taken diabetes medications | boolean | [0, 1]
eGFRBaseline | estimated glomerular filtration rate | ml/min/1.73m2 | [60, 60.4, ..., 242.6]
HistoryCHD | patient history of coronary heart disease | boolean | [0, 1]
HistoryDiabetes | patient history of diabetes | boolean | [0, 1]
HistoryDLD | patient history of dyslipidemia | boolean | [0, 1]
HistoryHTN | patient history of hypertension | boolean | [0, 1]
HistoryObesity | patient history of obesity | boolean | [0, 1]
HistorySmoking | patient history of smoking | boolean | [0, 1]
HistoryVascular | patient history of vascular diseases | boolean | [0, 1]
HTNmeds | if the patient has taken hypertension medications | boolean | [0, 1]
sBPBaseline | systolic blood pressure | mmHg | [92, 95, ..., 177, 180]
Sex | if the patient is a woman (0) or a man (1) | binary | [0, 1]
time year | years from follow-up start to severe CKD event or last visit | integer | [0, 1, ..., 9, 10]
[target] EventCKD35 | if the patient had moderate–extreme CKD | boolean | [0, 1]

TABLE 1: Meaning, measurement unit, and possible values of each feature of the dataset. ACEI: angiotensin-converting enzyme inhibitors. ARB: angiotensin II receptor blockers. mmHg: millimetre of mercury. kg: kilogram. mmol: millimoles.

To perform the MS phase, that is, to select the best combination of the hyper-parameters H in the set of possible ones H = {H_1, H_2, ...} for the algorithm A_H, we select the hyper-parameters which minimize the average error of the model trained on the training set and evaluated on the validation set. Since the data in L_l^r are independent from the ones in V_v^r, the idea is that H* should be the set of hyper-parameters which allows achieving a small error on a data set that is independent from the training set.
Finally, we need to estimate the error (EE) of the optimal model with a separate set of data T_m = {(x_1^t, y_1^t), ..., (x_m^t, y_m^t)}, since the error that our model commits over D_n would be optimistically biased, given that D_n has been used to find M.
Additionally, another aspect to consider in this analysis is that the data available in health informatics are often unbalanced [37]–[39], and most learning algorithms do not work well with imbalanced datasets and tend to perform poorly on the minority class. For these reasons, several techniques have been developed to address this issue [40]. Currently the most practical and effective approach involves resampling the data in order to synthesize a balanced dataset [40]. For this purpose, we can under-sample or over-sample the dataset. Under-sampling balances the dataset by reducing the size of the abundant class: by keeping all samples in the rare class and randomly selecting an equal number of samples in the abundant class, a new balanced dataset can be retrieved for further modeling. Note that this method wastes a lot of information (many samples might be discarded). For this reason, scientists take advantage of the over-sampling strategy more often. Over-sampling tries to balance the dataset by increasing the size of the rare class: rather than removing abundant samples, new rare samples are generated (for example by repetition, by bootstrapping, or by synthetic minority techniques). The latter method is the one that we employed in this study: synthetic minority oversampling [41], [42].
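The study cites synthetic minority oversampling [41], [42] without listing an implementation; the sketch below shows the core interpolation idea in base R. It is a simplified illustration under our own assumptions (Euclidean distance, minority rows only), not the authors' exact code:

```r
# Minimal SMOTE-style oversampling sketch: each synthetic minority sample is
# interpolated between a minority sample and one of its k nearest minority
# neighbours. X is assumed to be a numeric matrix containing only the
# minority-class rows, with nrow(X) > k; returns n_new synthetic rows.
smote_minority <- function(X, n_new, k = 5) {
  d <- as.matrix(dist(X))                  # pairwise Euclidean distances
  synthetic <- matrix(NA, n_new, ncol(X))
  for (s in 1:n_new) {
    i <- sample(nrow(X), 1)                # pick a random minority sample
    nn <- order(d[i, ])[2:(k + 1)]         # its k nearest neighbours (skip self)
    j <- sample(nn, 1)
    gap <- runif(1)                        # interpolation factor in (0, 1)
    synthetic[s, ] <- X[i, ] + gap * (X[j, ] - X[i, ])
  }
  colnames(synthetic) <- colnames(X)
  synthetic
}
```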

Another important property of M is its interpretability, namely the possibility to understand how it behaves. There are two options to investigate this property. The first one is to learn an M such that its functional form is, by construction, interpretable [43] (for example, Decision Trees and rule-based models); this solution, however, usually results in poor generalization performance. The second one, used when the functional form of M is not interpretable by construction [43] (for example, kernel methods or neural networks), is to derive its interpretability a posteriori. A classical method for reaching this goal is to perform a feature ranking procedure [33], [44], which gives a hint to the users of M about the most important features that influence its results.

A. BINARY CLASSIFICATION ALGORITHMS
In this paper, for the algorithm A_H, we exploit different state-of-the-art models. In particular, we exploit Random Forests [45], Support Vector Machines (linear and kernelized with the Gaussian kernel) [46], [47], Neural Networks [48], Decision Trees [49], XGBoost [50], and One Rule [51].
We tried a number of different hyper-parameter configurations for the machine learning methods employed in this study.
For Random Forests, we set the number of trees to 1000 and we searched the number of variables randomly sampled as candidates at each split in {1, 2, 4, 8, 16}, the minimum size of samples in the terminal nodes of the trees in {1, 2, 4, 8}, and the percentage of samples (sampled with bootstrap) during the creation of each tree in {60, 80, 100, 120} [52]–[54]. For the linear and kernelized Support Vector Machines [46], we searched the regularization hyper-parameter in {10^−6.0, 10^−5.8, ..., 10^4}; for the kernelized Support Vector Machines, we used the Gaussian kernel [47] and searched the kernel hyper-parameter in {10^−6.0, 10^−5.8, ..., 10^4}.


For the Neural Network, we used a single hidden layer network (hyperbolic tangent as activation function in the hidden layer) with dropout (mlpKerasDropout in the caret [55] R package); we trained it with adaptive subgradient methods (batch size equal to 32), and we tuned the following hyper-parameters: the number of neurons in the hidden layer in {10, 20, 40, 80, 160, 320, 640, 1280}, the dropout rate of the hidden layer in {0.001, 0.002, 0.004, 0.008}, the learning rate in {0.001, 0.002, 0.005, 0.01, 0.02, 0.05}, the fraction of gradient to keep at each step in {0.01, 0.05, 0.1, 0.5}, and the learning rate decay in {0.01, 0.05, 0.1, 0.5}. For the Decision Tree, we searched the maximum depth of the trees in {4, 8, 16, 24, 32} (rpart2 in the caret [55] R package). For XGBoost, we set tree gradient boosting and we searched the booster parameters in {0.001, 0.002, 0.004, 0.008, 0.01, 0.02, 0.04, 0.08}, the number of trees in {100, 500, 1000}, the minimum loss reduction to make a split in {0, 0.001, 0.005, 0.01}, the fraction of samples in {1, 0.9, 0.7} and of features in {1, 0.5, 0.2, 0.1} used to train the trees, the maximum number of leaves in {1, 2, 4, 8, 16}, and the regularization hyper-parameters in {10^−6.0, 10^−5.8, ..., 10^4} [50]. For One Rule, we did not have to tune hyper-parameters (OneR in the caret [55] R package).
Note that these methods have been shown to be among the simplest yet best performing methods available in the scientific literature [56], [57]. The difference between the methods is just the functional form of the model, which tries to better approximate a learning principle. For example, Random Forests and XGBoost implement the wisdom-of-the-crowd principle, Support Vector Machines are robust maximum-margin classifiers, and Decision Tree and One Rule represent very easy to interpret models. In this paper we tested multiple algorithms, since the no-free-lunch theorem [58] assures us that, for a specific application, it is not possible to know a priori which algorithm will perform better on a specific task. We therefore tested the ones which, in the past, have been shown to perform well on many tasks, and identified the best one for our application.
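To make the model-selection procedure concrete, here is a minimal sketch of the resampling-based grid search for Random Forests in R. It is a simplified version of the CCV loop described in section III: the grid values follow the text above, while the 80/20 split fraction and the reduced number of repetitions are our illustrative choices (the study uses n_r = 500 resamplings):

```r
# Grid search over Random Forests hyper-parameters with repeated
# training/validation resampling. `x` is a data frame of predictors and
# `y` a factor target; both are assumed to exist already.
library(randomForest)

grid  <- expand.grid(mtry = c(1, 2, 4, 8, 16), nodesize = c(1, 2, 4, 8))
n_rep <- 10   # reduced for brevity; the paper uses 500 resamplings

scores <- apply(grid, 1, function(h) {
  errs <- replicate(n_rep, {
    train_idx <- sample(nrow(x), floor(0.8 * nrow(x)))   # training split
    rf <- randomForest(x[train_idx, ], y[train_idx],
                       ntree = 1000,
                       mtry = h["mtry"], nodesize = h["nodesize"])
    pred <- predict(rf, x[-train_idx, ])
    mean(pred != y[-train_idx])                          # validation error
  })
  mean(errs)                                             # average over resamplings
})

best <- grid[which.min(scores), ]   # hyper-parameters with the lowest mean error
```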


feature | value | # | %
ACEIARB | 0 | 272 | 55.397
ACEIARB | 1 | 219 | 44.603
DLDmeds | 0 | 220 | 44.807
DLDmeds | 1 | 271 | 55.193
DMmeds | 0 | 330 | 67.210
DMmeds | 1 | 161 | 32.790
HistoryCHD | 0 | 446 | 90.835
HistoryCHD | 1 | 45 | 9.165
HistoryDiabetes | 0 | 276 | 56.212
HistoryDiabetes | 1 | 215 | 43.788
HistoryDLD | 0 | 174 | 35.438
HistoryDLD | 1 | 317 | 64.562
HistoryHTN | 0 | 156 | 31.772
HistoryHTN | 1 | 335 | 68.228
HistoryObesity | 0 | 243 | 49.491
HistoryObesity | 1 | 248 | 50.509
HistorySmoking | 0 | 416 | 84.725
HistorySmoking | 1 | 75 | 15.275
HistoryVascular | 0 | 462 | 94.094
HistoryVascular | 1 | 29 | 5.906
HTNmeds | 0 | 188 | 38.289
HTNmeds | 1 | 303 | 61.711
Sex | 0 | 241 | 49.084
Sex | 1 | 250 | 50.916
[target] EventCKD35 | 0 | 435 | 88.595
[target] EventCKD35 | 1 | 56 | 11.405
total | | 491 | 100

TABLE 2: Binary feature quantitative characteristics. All the binary features mean true for the value 1 and false for the value 0, except Sex (0 = female and 1 = male). The dataset contains the medical records of 491 patients.

feature | median | mean | range | σ
AgeBaseline | 54 | 53.204 | [23, 89] | 13.821
BMIBaseline | 30 | 30.183 | [13, 57] | 6.237
CholesterolBaseline | 5 | 4.979 | [2.23, 9.3] | 1.097
CreatinineBaseline | 66 | 67.857 | [6, 123] | 17.919
dBPBaseline | 77 | 76.872 | [41, 112] | 10.711
eGFRBaseline | 98.1 | 98.116 | [60, 242.6] | 18.503
sBPBaseline | 131 | 131.375 | [92, 180] | 15.693
time year | 8 | 7.371 | [0, 10] | 2.175

TABLE 3: Numeric feature quantitative characteristics. σ: standard deviation.

B. FEATURE RANKING
Feature ranking methods based on Random Forests are among the most effective techniques [59], [60], particularly in the context of bioinformatics [61], [62] and health informatics [63]. Since Random Forests obtained the top prediction scores for binary classification, we focus on this method for feature ranking.
Several measures are available for feature importance in Random Forests. A powerful approach is the one based on Permutation Importance or Mean Decrease in Accuracy (MDA), where the importance is assessed for each feature by removing the association between that feature and the target. This effect is achieved by randomly permuting [64] the values of the feature and measuring the resulting increase in error. The influence of correlated features is also removed.
In detail, for every tree, the method computes two quantities: the first one is the error on the out-of-bag samples as they are used during prediction, while the second one is the error on the out-of-bag samples after a random permutation of the values of a variable. These two values are then subtracted, and the average of the result over all the trees in the ensemble is the raw importance score for the variable under exam.
Despite the effectiveness of MDA, when the number of samples is small these methods might be unstable [65]–[67]. For this reason, in this work, instead of running the Feature Ranking (FR) procedure just once, analogously to what we have done for MS and EE, we sub-sample the original dataset and repeat the procedure many times. The final rank of a feature is the aggregation of the different rankings using Borda's method [68].
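A minimal sketch of this repeated MDA ranking with Borda-style aggregation, using the randomForest R package, is shown below. Predictors `x` and a factor target `y` are assumed to exist; the sub-sampling fraction and the number of repetitions are our illustrative choices:

```r
# Repeated permutation-importance (MDA) feature ranking with Borda-style
# aggregation of the per-run positions.
library(randomForest)

n_runs <- 10
ranks <- replicate(n_runs, {
  idx <- sample(nrow(x), floor(0.9 * nrow(x)))   # sub-sample the dataset
  rf  <- randomForest(x[idx, ], y[idx], ntree = 1000, importance = TRUE)
  mda <- importance(rf, type = 1)[, 1]           # mean decrease in accuracy
  rank(-mda)                                     # position 1 = most important
})

# Borda-style aggregation: average the positions over the runs and re-sort.
avg_position  <- rowMeans(ranks)
final_ranking <- sort(avg_position)
```

This mirrors the structure of Table 7, where each feature is reported with its average MDA position across runs.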



C. BIOSTATISTICS UNIVARIATE TESTS
Before employing machine learning algorithms, we applied traditional univariate biostatistics techniques to evaluate the relationship between the EventCKD35 target and each feature.
We made use of the Mann–Whitney U test (also known as the Wilcoxon rank-sum test) [69] for the numerical features and of the chi-squared test [70] for the binary features. The p-values of both these tests range between 0 and 1: a low p-value means that the analyzed variable strongly relates to the target feature, while a high p-value means no evident relation. These tests are also useful to detect the importance of each feature with respect to the target: the lower the p-value of a feature, the stronger its association with the target. Following the recent advice of Benjamin and colleagues [71], we use 0.005 as the threshold of significance for the p-values, that is 5 × 10^−3. If the p-value of a test applied to a variable and the target is lower than 0.005, we consider the association between the variable and the target significant.
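Both tests are available in base R; a minimal sketch follows, using two example features from Table 1 and a data frame `dataset` assumed to be already loaded:

```r
# Univariate tests between each feature and the EventCKD35 target.
target <- dataset$EventCKD35

# Mann-Whitney U test (Wilcoxon rank-sum) for a numerical feature:
wilcox.test(dataset$AgeBaseline ~ target)$p.value

# Chi-squared test for a binary feature:
chisq.test(table(dataset$HistoryDiabetes, target))$p.value
```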

D. PREDICTION AND FEATURE RANKING INCLUDING TEMPORAL FEATURE
In the second analysis we performed for chronic kidney disease prediction, we decided to include the temporal component expressing in which year the disease occurred for the CKD patients, or in which year they had their last outpatient visit (Supplementary information).
We applied a Stratified Logistic Regression [72], [73] to this complete dataset, including all the original clinical features and the derived year feature, both for supervised binary classification and for feature ranking. We measured the prediction with the typical confusion matrix rates (MCC, F1 score, and others), and the importance of each variable as the logistic regression model coefficient. This method has no significant hyper-parameters, so we did not perform any optimization (glm method of the stats R package).
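A minimal sketch of this step in R follows. The paper reports using the glm method of the stats package; the 80/20 split fraction and the seed are our illustrative choices for the stratified sampling:

```r
# Logistic regression with a stratified train/test split that preserves
# the class ratio of the EventCKD35 target. `dataset` includes the clinical
# features, the derived year feature, and EventCKD35 coded as 0/1.
set.seed(42)
pos <- which(dataset$EventCKD35 == 1)
neg <- which(dataset$EventCKD35 == 0)
train_idx <- c(sample(pos, floor(0.8 * length(pos))),   # stratified sampling
               sample(neg, floor(0.8 * length(neg))))

model <- glm(EventCKD35 ~ ., data = dataset[train_idx, ], family = binomial)
probs <- predict(model, dataset[-train_idx, ], type = "response")

coef(model)   # coefficients usable as feature-importance scores
```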


IV. RESULTS
In this section, we report the results for the prediction of chronic kidney disease (subsection IV-A) and its feature ranking (subsection IV-B).


A. CHRONIC KIDNEY DISEASE PREDICTION RESULTS
CKD prediction. We report the results obtained for the static prediction of CKD, measured with traditional confusion matrix indicators, in Table 4. We rank our results by the Matthews correlation coefficient (MCC), because it is the only confusion matrix rate that generates a high score only if the classifier was able to correctly predict most of the data instances and to correctly make most of the predictions, both on the positive class and on the negative class [74]–[77].
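For reference, the MCC is computed from the four confusion matrix entries as MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)); a direct R translation follows (the zero-denominator convention is our own choice):

```r
# Matthews correlation coefficient from predicted and true binary labels
# (for probabilistic outputs, threshold at 0.5 first, as in Table 4).
mcc <- function(pred, truth) {
  tp <- as.numeric(sum(pred == 1 & truth == 1))
  tn <- as.numeric(sum(pred == 0 & truth == 0))
  fp <- as.numeric(sum(pred == 1 & truth == 0))
  fn <- as.numeric(sum(pred == 0 & truth == 1))
  den <- sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
  if (den == 0) return(0)   # convention when a confusion matrix margin is empty
  (tp * tn - fp * fn) / den
}
```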

Random Forests outperformed all the other methods for MCC, F1 score, accuracy, sensitivity, negative predictive value, precision-recall AUC, and receiver operating characteristic AUC (Table 4), while the Support Vector Machine with Gaussian kernel achieved the top specificity and precision.
Because of the imbalance of the dataset (section II), all the classifiers attained better results on the negative data instances (specificity and NPV) than on the positive elements (sensitivity and precision). This happens because each classifier can observe and learn to recognize more individuals without CKD during training, and is therefore more capable of recognizing them than of recognizing patients with CKD during testing.
XGBoost and One Rule obtained Matthews correlation coefficients close to 0, meaning that their performance was similar to random guessing. Random Forests, linear SVM, and Decision Tree were the only methods able to correctly classify most of the true positives (TP rate = 0.793, 0.600, and 0.588, respectively). No technique was capable of correctly making most of the positive predictions: all PPVs are below 0.5 (Table 4).
Regarding the negative class, the SVM with Gaussian kernel obtained an almost perfect specificity (0.940), while Random Forests achieved an almost perfect NPV of 0.971 (Table 4).
These results show that the machine learning classifiers Random Forests and SVM with Gaussian kernel can efficiently predict patients with CKD and patients without CKD from their electronic health records, with high prediction scores, in a few minutes.
Since Random Forests was the best performing classifier, we also included the calibration curve plot [78] of its predictions (Figure 2), for the sake of completeness. The curve follows the trend of the perfect x = y line, translated on the x axis, between approximately 5% and approximately 65%, indicating well calibrated predictions in this interval.
CKD prediction including the temporal component. The previous analysis did not include any temporal component providing information about the progress of the disease, in order to show a scenario where no previous disease history of a patient is available. We then decided to perform a stratified prediction including a time feature indicating the year when the patient developed chronic kidney disease, or the year of the last visit for non-CKD patients (Supplementary information). After having included the year information in the dataset, we applied a Stratified Logistic Regression [73], [79], as described earlier (section III).
The presence of the temporal feature actually improved the prediction, allowing the regression to obtain an MCC of +0.469, better than all the MCCs achieved by the classifiers applied to the static dataset version except Random Forests (Table 5). Also in this case, specificity and NPV are much higher than sensitivity and precision, because of the imbalance of the dataset.
This result comes as no surprise: it makes complete sense that the inclusion of a temporal feature describing the trend of a disease could improve the prediction quality.
To better understand the prediction obtained by the Stratified Logistic Regression, we plotted a calibration curve [78] of its predictions (Figure 3).


method MCC F1 score accuracy TP rate TN rate


Random Forests *+0.501 ± 0.035 *0.550 ± 0.034 *0.843 ± 0.012 *0.793 ± 0.038 0.852 ± 0.012
Gaussian SVM +0.319 ± 0.065 0.387 ± 0.063 0.873 ± 0.018 0.353 ± 0.082 *0.940 ± 0.009
Neural Network +0.302 ± 0.075 0.353 ± 0.065 0.840 ± 0.028 0.436 ± 0.173 0.882 ± 0.048
Linear SVM +0.266 ± 0.113 0.340 ± 0.077 0.767 ± 0.032 0.600 ± 0.193 0.785 ± 0.032
Decision Tree +0.253 ± 0.085 0.345 ± 0.078 0.747 ± 0.036 0.588 ± 0.079 0.767 ± 0.036
XGBoost +0.160 ± 0.033 0.286 ± 0.033 0.767 ± 0.035 0.368 ± 0.056 0.824 ± 0.044
One Rule +0.145 ± 0.063 0.267 ± 0.044 0.707 ± 0.034 0.471 ± 0.094 0.737 ± 0.035
PPV NPV PR AUC ROC AUC
Random Forests 0.422 ± 0.034 *0.971 ± 0.006 *0.475 ± 0.046 *0.885 ± 0.015
Gaussian SVM *0.429 ± 0.068 0.919 ± 0.021 0.275 ± 0.032 0.646 ± 0.040
Neural Network 0.313 ± 0.046 0.912 ± 0.052 0.257 ± 0.032 0.669 ± 0.066
Linear SVM 0.237 ± 0.051 0.946 ± 0.023 0.280 ± 0.018 0.693 ± 0.095
Decision Tree 0.244 ± 0.067 0.936 ± 0.015 0.274 ± 0.013 0.678 ± 0.050
XGBoost 0.233 ± 0.036 0.900 ± 0.012 0.267 ± 0.017 0.596 ± 0.019
One Rule 0.186 ± 0.035 0.916 ± 0.021 0.179 ± 0.017 0.604 ± 0.051
TABLE 4: CKD development binary classification results. Linear SVM: Support Vector Machine with linear kernel.
Gaussian SVM: Support Vector Machine with Gaussian kernel. MCC: Matthews correlation coefficient (worst value = −1
and best value = +1). TP rate: true positive rate, sensitivity, recall. TN rate: true negative rate, specificity. PR: precision-recall
curve. PPV: positive predictive value, precision. NPV: negative predictive value. ROC: receiver operating characteristic curve.
AUC: area under the curve. F1 score, accuracy, TP rate, TN rate, PPV, NPV, PR AUC, ROC AUC: worst value = 0 and best
value = +1. Confusion matrix threshold for TP rate, TN rate, PPV, and NPV: 0.5. We highlighted in blue and with an asterisk
* the top results for each score. We report the formulas of these rates in the Supplementary Information.

method MCC F1 score accuracy TP rate TN rate


Stratified Logistic Regression +0.469 ± 0.141 0.507 ± 0.130 0.903 ± 0.031 0.604 ± 0.177 0.933 ± 0.025
PPV NPV PR AUC ROC AUC
Stratified Logistic Regression 0.458 ± 0.141 0.960 ± 0.024 0.345 ± 0.122 0.768 ± 0.089
TABLE 5: CKD prediction results including the temporal feature. The dataset analyzed for these tests contains the time
year feature indicating in which year after the baseline visits the patient developed the CKD. All the abbreviations have the
same meaning described in the caption of Table 4.

As one can notice, the Stratified Logistic Regression returns well calibrated predictions: its trend follows the x = y line, which represents perfect calibration, from approximately 5% to approximately 75% of the probabilities. This calibration curve confirms that the Stratified Logistic Regression made a good prediction.
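A calibration curve of this kind can be reconstructed with a simple binning procedure; the sketch below is a generic illustration under our own choices (10 equal-width bins), not the exact plotting code of the paper:

```r
# Calibration points: group predicted probabilities into bins and compare
# the mean predicted probability with the observed event frequency.
# `probs` (predicted probabilities) and `truth` (0/1 labels) are assumed
# to come from a fitted classifier.
calibration_points <- function(probs, truth, n_bins = 10) {
  bins <- cut(probs, breaks = seq(0, 1, length.out = n_bins + 1),
              include.lowest = TRUE)
  data.frame(mean_predicted = tapply(probs, bins, mean),
             observed_rate  = tapply(truth, bins, mean))
}
# Well calibrated predictions lie near the x = y diagonal.
```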

B. FEATURE RANKING RESULTS
CKD predictive feature ranking. After verifying that computational intelligence is able to predict CKD development among patients, we applied a feature ranking approach to detect the most predictive features in the clinical records. We employed two techniques: one based on traditional univariate biostatistics tests, and one based on machine learning.
Regarding the biostatistics phase, we applied the Mann–Whitney test and the chi-squared test to each variable in relationship with the CKD target (subsection III-C), and ranked the features by p-value (Table 6).
The application of these biostatistics univariate tests, although useful, shows a huge number of relevant variables: 13 variables out of 19 result significant, having a p-value smaller than 0.005 (Table 6). Since the biostatistics tests affirm that 68.42% of the clinical factors are important, this information does not help us detect the relevance of the features with enough precision. For this reason, we decided to calculate the feature ranking with machine learning, by employing Random Forests, which is the method that achieved the top performance results in the binary classification earlier (subsection IV-A).
We therefore applied the Random Forests feature ranking, and ranked the results by mean accuracy decrease position (Table 7 and Figure 4).
The two rankings show some common aspects, both listing AgeBaseline and eGFRBaseline in top positions, but they also show some significant differences. The biostatistics standing, for example, lists dBPBaseline as an irrelevant predictive feature (Table 6), while Random Forests puts it in the 4th position out of 19 (Table 7). Also, the biostatistics tests stated that HistoryDiabetes is one of the most significant factors, with a p-value of 0.0005 (Table 6), while the machine learning approach put the same feature in the last position of its ranking.
The two rankings contain other minor differences that we consider unimportant.


[Figure 2: calibration curve plot for the Random Forests predictions; see caption.]
FIGURE 2: Calibration curve and plots for the results obtained by the Random Forests predictions applied on the dataset excluding the temporal component (Table 4).

Mann–Whitney U test
position | feature | p-value
1 | *AgeBaseline | 0
2 | *CreatinineBaseline | 0
3 | *eGFRBaseline | 0
4 | *CholesterolBaseline | 9.490 × 10^−04
5 | *sBPBaseline | 4.379 × 10^−03
6 | dBPBaseline | 1.083 × 10^−01
7 | BMIBaseline | 9.134 × 10^−01

chi-squared test
position | feature | p-value
1 | *HistoryDiabetes | 5 × 10^−04
2 | *HistoryCHD | 5 × 10^−04
3 | *HistoryHTN | 5 × 10^−04
4 | *DLDmeds | 5 × 10^−04
5 | *DMmeds | 5 × 10^−04
6 | *ACEIARB | 5 × 10^−04
7 | *HistoryDLD | 1.999 × 10^−03
8 | *HTNmeds | 1.999 × 10^−03
9 | HistoryVascular | 3.698 × 10^−02
10 | Sex | 4.398 × 10^−02
11 | HistorySmoking | 5.397 × 10^−02
12 | HistoryObesity | 4.948 × 10^−01

TABLE 6: Feature ranking through biostatistics univariate tests. We employed the Mann–Whitney U test [69] for the numerical features and the chi-squared test [70] for the binary features. We reported in blue and with an asterisk * the features having a p-value lower than the 0.005 threshold, that is 5 × 10^−03.

[Figure 3: calibration plot for the Stratified Logistic Regression predictions; see caption.]
FIGURE 3: Calibration plot for the Stratified Logistic Regression predictions applied on the dataset including the temporal component (Table 5).

position | MDA average position | feature
1 | 1.2 | AgeBaseline
2 | 1.8 | eGFRBaseline
3 | 3.3 | DMmeds
4 | 3.7 | dBPBaseline
5 | 5.2 | CholesterolBaseline
6 | 6.0 | HistoryVascular
7 | 7.0 | HistoryCHD
8 | 8.3 | sBPBaseline
9 | 8.7 | CreatinineBaseline
10 | 11.4 | HistoryHTN
11 | 11.6 | HistorySmoking
12 | 11.9 | DLDmeds
13 | 12.1 | Sex
14 | 13.4 | HTNmeds
15 | 14.6 | HistoryObesity
16 | 15.9 | HistoryDLD
17 | 17.4 | ACEIARB
18 | 17.7 | BMIBaseline
19 | 18.8 | HistoryDiabetes

TABLE 7: Feature ranking generated by Random Forests. MDA average position: average position obtained by each feature through the accuracy decrease feature ranking of Random Forests.
CKD predictive feature ranking considering the temporal component. As we did earlier for the CKD prediction, we decided to re-run the feature ranking procedure by including the temporal component regarding the year when the patient developed chronic kidney disease or the year of the last visit. Again, we employed the Stratified Logistic Regression. The ranking generated considering the time component (Table 8) showed several differences with respect to the previously described ranking generated without it (Table 7). The most relevant differences in ranking positions are the following:


• HTNmeds is in the 1st position in this ranking, while it is 14th without considering time;
• HistoryHTN is in the 3rd position in this ranking, while it is 10th without considering time;
• ACEIARB is in the 4th position in this ranking, while it is 17th without considering time;
• AgeBaseline is in the last position in this ranking, while it is 1st without considering time;
• CreatinineBaseline is in the 18th position in this ranking, while it is 9th without considering time.

We also decided to measure the difference between these two rankings through two traditional metrics: Spearman's rank correlation coefficient and the Kendall distance [80]–[82]. Both these metrics range between −1.0 and +1.0, with −1.0 meaning opposite rank orders, 0.0 meaning no correlation between the lists, and +1.0 meaning identical rankings. The comparison between the ranking without time (Table 7) and the ranking considering time (Table 8) generated a Spearman's ρ = −0.209 and a Kendall's τ = −0.146.

[Figure 4: barplot of the Random Forests feature ranking by MDA; see caption.]
FIGURE 4: Barplot of the Random Forests feature ranking. MDA average position: average position obtained by each feature through the accuracy decrease feature ranking of Random Forests.

position | importance | clinical feature
1 | 2.56 | HTNmeds
2 | 2.43 | dBPBaseline
3 | 2.32 | HistoryHTN
4 | 1.85 | ACEIARB
5 | 1.68 | HistorySmoking
6 | 1.52 | HistoryDiabetes
7 | 1.42 | sBPBaseline
8 | 1.31 | BMIBaseline
9 | 0.88 | CholesterolBaseline
10 | 0.87 | HistoryCHD
11 | 0.80 | eGFRBaseline
12 | 0.45 | DMmeds
13 | 0.35 | Sex
14 | 0.19 | HistoryVascular
15 | 0.18 | HistoryObesity
16 | 0.16 | HistoryDLD
17 | 0.14 | DLDmeds
18 | 0.01 | CreatinineBaseline
19 | 0.00 | AgeBaseline

TABLE 8: Clinical feature ranking generated by the Stratified Logistic Regression, considering the temporal component (the year when the CKD happened or of the patient's last visit). Importance: average coefficient of the trained logistic regression model over 100 executions.
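This rank-agreement computation can be reproduced with base R's cor() on the two vectors of ranking positions; in the sketch below, the three example features and their positions are taken from Tables 7 and 8 (the full comparison would use all 19 features):

```r
# Comparison between the two feature rankings, aligned by feature name.
# Example positions taken from Table 7 (without time) and Table 8 (with time).
rank_without_time <- c(AgeBaseline = 1,  eGFRBaseline = 2,  DMmeds = 3)
rank_with_time    <- c(AgeBaseline = 19, eGFRBaseline = 11, DMmeds = 12)

cor(rank_without_time, rank_with_time, method = "spearman")   # Spearman's rho
cor(rank_without_time, rank_with_time, method = "kendall")    # Kendall's tau
```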

579 emerge suggest strong overlap between the information con- Hypertension resulted being the 4th most important factor 635

580 tained within the time variable with certain variables in the in Salekin’s study [6], confirming the importance of the 636

581 previous model. It is plausible that some predictors encode a HistoryHTN variable which is ranked at the 3rd position in 637

582 ‘baseline’ level of risk of developing CKD, which is negated our Stratified Logistic Regression ranking (Table 8). Also 638

583 if the model knows in which year the CKD developed. diabetes history has high ranking in both the standings: 3rd 639

584 The variables which reduce most significantly between the position in the ranking of Salekin’s study [6], and 6th of 640

585 models are age, eGFR and creatinine, which are all clinical importance in our Stratified Logistic Regression ranking, as 641

586 indicators of an individual’s baseline risk of CKD. Inspection HistoryDiabetes (Table 8). 642

587 of variables which maintain or increase their position when


588 the year feature is added identifies hypertension, smoking VI. CONCLUSIONS 643
589 and diabetes as key predictive factors in the model (sub- Chronic kidney disease affects more than 700 millions people 644
590 section IV-B). These are all known to play a central role in the world annually, and kills approximately 1.2 million 645
591 in the pathogenesis of micro- and macrovascular disease, of them. Computational intelligence can be an effective 646
592 including of the kidney. While the former variables may means to quickly analyze electronic health records of patients 647
593 encode baseline risk, the latter are stronger indicators for rate affected by this disease, providing information about how 648
594 of progression. likely they will develop severe stages of this disease, or 649
595 It is also worth noting that without the temporal informa- stating which clinical variables are the most important for 650
596 tion, the model is tasked with predicting whether the indi- diagnosis. 651
597 vidual will develop CKD within the next 10 years. Here, the
In this article, we analyzed a medical record dataset of 491 652
598 baseline is highly relevant as it indicates how much further
patients from UAE with CKD and at risk of cardiovascular 653
599 the renal function needs to deteriorate. However, when the
disease, and developed machine learning methods able to 654
600 configuration is altered to include the year in which year the
predict the likelihood they will develop CKD at stages 3-5, 655
601 CKD developed, the relative importance of risk factors may
with high accuracy. Afterwards, we employed machine learn- 656
602 be expected to increase – and indeed, we observed this in our
ing to detect the most important variables contained in the 657
603 models.
Comparison with results of the original study. The original study of Al-Shamsi and colleagues [28] included a feature ranking phase generated through a multivariable Cox proportional hazards analysis, which included the temporal component [83]. Their ranking listed older age (AgeBaseline), personal history of coronary heart disease (HistoryCHD), personal history of dyslipidemia (HistoryDLD), and personal history of smoking (HistorySmoking) as the most important factors for the risk of a serious CKD event.

In contrast to their findings, AgeBaseline was ranked in the last position in our Stratified Logistic Regression standing, while HistoryCHD and HistoryDLD occupied unimportant positions: the 10th and 16th ranks out of 19 variables, respectively. Smoking history, instead, occupied a high rank both in our standing and in the original study standing: our approach, in fact, listed it as 5th out of 19.
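The agreement between two feature standings of this kind can be quantified with rank correlation measures such as the Spearman coefficient [80] and the Kendall tau [81], [82]. In the minimal sketch below, the ranks attributed to the original study are an assumed 1-4 ordering of its top factors (the study reported an ordering, not explicit rank positions), while our ranks follow the positions reported above.

from scipy.stats import kendalltau, spearmanr

# Rank positions (1 = most important, 19 = least important) of the four
# features shared by the two standings. "ours" follows the Stratified
# Logistic Regression positions described in the text; "original" is an
# assumed 1-4 ordering of the top factors of Al-Shamsi and colleagues.
features = ["AgeBaseline", "HistoryCHD", "HistoryDLD", "HistorySmoking"]
original = [1, 2, 3, 4]
ours = [19, 10, 16, 5]

tau, tau_p = kendalltau(original, ours)
rho, rho_p = spearmanr(original, ours)
print(f"Kendall tau = {tau:.2f} (p = {tau_p:.2f})")
print(f"Spearman rho = {rho:.2f} (p = {rho_p:.2f})")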

Comparison with results of other studies. Several published studies include a feature ranking phase to detect the most relevant variables to predict chronic kidney disease from electronic medical records. Most of them, however, use feature ranking to reduce the number of variables for the binary classification, without reporting a final standing of clinical factors ranked by importance [10], [12], [21].

Only the article of Salekin and colleagues [6] reports the most relevant variables found in their study: specific gravity, albumin, diabetes, hypertension, hemoglobin, serum creatinine, red blood cell count, packed cell volume, appetite, and sodium were found at the top positions. Even if the clinical features present in our dataset mainly differ from theirs, we can notice the difference in the ranking positions between the two studies.

[...] HistoryDiabetes (Table 8).

VI. CONCLUSIONS
Chronic kidney disease affects more than 700 million people worldwide every year, and kills approximately 1.2 million of them. Computational intelligence can be an effective means to quickly analyze the electronic health records of patients affected by this disease, providing information about how likely they are to develop its severe stages, or indicating which clinical variables are the most important for diagnosis.

In this article, we analyzed a medical record dataset of 491 patients from the UAE with CKD and at risk of cardiovascular disease, and developed machine learning methods able to predict, with high accuracy, the likelihood that they will develop stage 3-5 CKD. Afterwards, we employed machine learning to detect the most important variables contained in the dataset, first excluding the temporal component indicating the year when the CKD event or the patient's last visit occurred, and then including it. Our results confirmed the effectiveness of our approach.

Regarding limitations, we have to report that we performed our analysis on a single dataset only. We looked for alternative public datasets to use as validation cohorts, but unfortunately we could not find any with the same clinical features.

In the future, we plan to further investigate the probability of diagnosis prediction in this dataset through classifier calibration and calibration plots [84], and to perform the feature ranking with an alternative method such as SHapley Additive exPlanations (SHAP) [85]. Moreover, we plan to study chronic kidney disease by applying our methods to CKD datasets of other types, such as microarray gene expression [86], [87] and ultrasonography images [88].
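As a preview of that planned SHAP analysis, the sketch below shows how SHAP values could be obtained for a tree-based classifier with the shap package. The data are synthetic and the column names are assumptions for illustration only.

import numpy as np
import pandas as pd
import shap
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data: in the real analysis, X would hold the clinical
# features and y the binary stage 3-5 CKD outcome. Column names are assumed.
rng = np.random.default_rng(42)
X = pd.DataFrame(rng.normal(size=(200, 4)),
                 columns=["AgeBaseline", "eGFRBaseline",
                          "CreatinineBaseline", "HistoryHTN"])
y = (X["eGFRBaseline"] + rng.normal(scale=0.5, size=200) < 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=42).fit(X, y)

explainer = shap.TreeExplainer(model)   # exact attributions for tree ensembles
shap_values = explainer.shap_values(X)  # per-class attributions for classifiers

# Depending on the shap version, shap_values is a list of per-class matrices
# or a single (samples, features, classes) array; both cases select the
# positive class below, for a global feature importance summary.
positive = shap_values[1] if isinstance(shap_values, list) else shap_values[..., 1]
shap.summary_plot(positive, X)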


LIST OF ABBREVIATIONS
AUC: area under the curve. BP: blood pressure. CHD: coronary heart disease. CKD: chronic kidney disease. CVD: cardiovascular disease. DLD: dyslipidemia. EE: error estimation. FR: feature ranking. HTN: hypertension. KDIGO: Kidney Disease Improving Global Outcomes. MCC: Matthews correlation coefficient. MDA: Mean Decrease in Accuracy. MS: model selection. NPV: negative predictive value. p-value: probability value. PPV: positive predictive value. PR: precision–recall. ROC: receiver operating characteristic. SHAP: SHapley Additive exPlanations. SVM: Support Vector Machine. TN rate: true negative rate. TP rate: true positive rate. UAE: United Arab Emirates.


COMPETING INTERESTS
The authors declare they have no competing interests.

ACKNOWLEDGMENTS
The authors thank Saif Al-Shamsi (United Arab Emirates University) for providing additional information about the dataset.

DATA AND SOFTWARE AVAILABILITY
The dataset used in this study is publicly available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license at: https://figshare.com/articles/dataset/Chronic_kidney_disease_in_patients_at_high_risk_of_cardiovascular_disease_in_the_United_Arab_Emirates_A_population-based_study/6711155?file=12242270

Our software code is publicly available under the GNU General Public License v3.0 at: https://github.com/davidechicco/chronic_kidney_disease_and_cardiovascular_disease

REFERENCES
[1] Valerie A Luyckx, Marcello Tonelli, and John W Stanifer. The global burden of kidney disease and the sustainable development goals. Bulletin of the World Health Organization, 96(6):414, 2018.
[2] Sarmad Said and German T Hernandez. The link between chronic kidney disease and cardiovascular disease. Journal of Nephropathology, 3(3):99, 2014.
[3] Kevin Damman, Mattia A Valente, Adriaan A Voors, Christopher M O'Connor, Dirk J van Veldhuisen, and Hans L Hillege. Renal impairment, worsening renal function, and outcome in patients with heart failure: an updated meta-analysis. European Heart Journal, 35(7):455–469, 2014.
[4] Anusorn Charleonnan, Thipwan Fufaung, Tippawan Niyomwong, Wandee Chokchueypattanakit, Sathit Suwannawach, and Nitat Ninchawee. Predictive analytics for chronic kidney disease using machine learning techniques. In Proceedings of 2016 MITicon – the 2016 Management and Innovation Technology International Conference, pages 80–83, Bang-Saen, Chonburi, Thailand, 2016. IEEE.
[5] Nusrat Tazin, Shahed Anzarus Sabab, and Muhammed Tawfiq Chowdhury. Diagnosis of chronic kidney disease using effective classification and feature selection technique. In Proceedings of MediTec 2016 – the 2016 International Conference on Medical Engineering, Health Informatics and Technology, pages 1–6, Dhaka, Bangladesh, 2016. IEEE.
[6] Asif Salekin and John Stankovic. Detection of chronic kidney disease and selecting important predictive attributes. In Proceedings of IEEE ICHI 2016 – the 2016 IEEE International Conference on Healthcare Informatics, pages 262–270, Chicago, Illinois, USA, 2016. IEEE.
[7] Huseyin Polat, Homay Danaei Mehr, and Aydin Cetin. Diagnosis of chronic kidney disease based on support vector machine by feature selection methods. Journal of Medical Systems, 41(4):55, 2017.
[8] Made Satria Wibawa, I Made Dendi Maysanjaya, and I Made Agus Wirahadi Putra. Boosted classifier and features selection for enhancing chronic kidney disease diagnose. In Proceedings of CITSM 2017 – the 5th International Conference on Cyber and IT Service Management, pages 1–6, Denpasar, Bali, Indonesia, 2017. IEEE.
[9] Abdulhamit Subasi, Emina Alickovic, and Jasmin Kevric. Diagnosis of chronic kidney disease by using random forest. In Proceedings of CMBEBIH 2017 – the 2017 International Conference on Medical and Biological Engineering, pages 589–594. Springer, 2017.
[10] Sirage Zeynu and Shruti Patil. Prediction of chronic kidney disease using data mining feature selection and ensemble method. International Journal of Data Mining in Genomics & Proteomics, 9(1):1–9, 2018.
[11] Adeola Ogunleye and Qing-Guo Wang. Enhanced XGBoost-based automatic diagnosis system for chronic kidney disease. In Proceedings of IEEE ICCA 2018 – the 14th IEEE International Conference on Control and Automation, pages 805–810, Anchorage, Alaska, USA, 2018. IEEE.
[12] Sirage Zeynu and Shruthi Patil. Survey on prediction of chronic kidney disease using data mining classification techniques and feature selection. International Journal of Pure and Applied Mathematics, 118(8):149–156, 2018.
[13] Abdullah Al Imran, MD Nur Amin, and Fatema Tuj Johora. Classification of chronic kidney disease using logistic regression, feedforward neural network and wide & deep learning. In Proceedings of ICIET 2018 – the 2018 International Conference on Innovation in Engineering and Technology, pages 1–6, Osaka, Japan, 2018. IEEE.
[14] AK Shrivas, Sanat Kumar Sahu, and HS Hota. Classification of chronic kidney disease with proposed union based feature selection technique. In Proceedings of ICIoTCT 2018 – the 3rd International Conference on Internet of Things and Connected Technologies, pages 26–27, Jaipur, India, 2018.
[15] S Belina VJ Sara and K Kalaiselvi. Ensemble swarm behaviour based feature selection and support vector machine classifier for chronic kidney disease prediction. International Journal of Engineering & Technology, 7(2.31):190–195, 2018.
[16] Naveed Rahman Shawan, Syed Samiul Alam Mehrab, Fardeen Ahmed, and Mohammad Sharatul Hasmi. Chronic kidney disease detection using ensemble classifiers and feature set reduction. PhD thesis, BRAC University, 2019.
[17] Suresh Babu Satukumati and Raghu Kogila Shivaprasad Satla. Feature extraction techniques for chronic kidney disease identification. Kidney, 15:29, 2019.
[18] Tahmid Abrar, Samiha Tasnim, and MD Hossain. Early detection of chronic kidney disease using machine learning. PhD thesis, BRAC University, 2019.
[19] Mohamed Elhoseny, K Shankar, and J Uthayakumar. Intelligent diagnostic prediction and classification system for chronic kidney disease. Scientific Reports, 9(1):1–14, 2019.
[20] Stefan Ravizza, Tony Huschto, Anja Adamov, Lars Böhm, Alexander Büsser, Frederik F Flöther, Rolf Hinzmann, Helena König, Scott M McAhren, Daniel H Robertson, Titus Schleyer, Bernd Schneidinger, and Wolfgang Petrich. Predicting the early risk of chronic kidney disease in patients with diabetes using real-world data. Nature Medicine, 25(1):57–59, 2019.
[21] Syed Imran Ali, Gwang Hoon Park, and Sungyoung Lee. Cost-sensitive ensemble feature ranking and automatic threshold selection for chronic kidney disease diagnosis. Preprints, 2020050458, 2020.
[22] Pankaj Chittora, Sandeep Chaurasia, Prasun Chakrabarti, Gaurav Kumawat, Tulika Chakrabarti, Zbigniew Leonowicz, Michał Jasiński, Łukasz Jasiński, Radomir Gono, Elżbieta Jasińska, and Vadim Bolshev. Prediction of chronic kidney disease – a machine learning perspective. IEEE Access, 9:17312–17334, 2021.
[23] Piervincenzo Ventrella, Giovanni Delgrossi, Gianmichele Ferrario, Marco Righetti, and Marco Masseroli. Supervised machine learning for the assessment of chronic kidney disease advancement. Computer Methods and Programs in Biomedicine, 209:106329, 2021.
[24] Md Rashed-Al-Mahfuz, Abedul Haque, Akm Azad, Salem A Alyami, Julian MW Quinn, and Mohammad Ali Moni. Clinically applicable machine learning approaches to identify attributes of chronic kidney disease (CKD) for use in low-cost diagnostic screening. IEEE Journal of Translational Engineering in Health and Medicine, 9:1–11, 2021.
[25] Surya Krishnamurthy, Kapeleshh Ks, Erik Dovgan, Mitja Luštrek, Barbara Gradišek Piletič, Kathiravan Srinivasan, Yu-Chuan Jack Li, Anton Gradišek, and Shabbir Syed-Abdul. Machine learning prediction models for chronic kidney disease using national health insurance claim data in Taiwan. In Healthcare, volume 9, page 546. Multidisciplinary Digital Publishing Institute, 2021.
[26] Monika Gupta and Parul Gupta. Predicting chronic kidney disease using machine learning. Emerging Technologies for Healthcare: Internet of Things and Deep Learning Models, pages 251–277, 2021.
[27] University of California Irvine Machine Learning Repository. Chronic kidney disease data set. https://archive.ics.uci.edu/ml/datasets/chronic_kidney_disease. URL visited on 4th October, 2021.
[28] Saif Al-Shamsi, Dybesh Regmi, and Romona D Govender. Chronic kidney disease in patients at high risk of cardiovascular disease in the United Arab Emirates: a population-based study. PLoS One, 13(6), 2018.
[29] Gary S Francis. ACE inhibition in cardiovascular disease. New England Journal of Medicine, 342:201–202, 2000.
[30] Jun Agata, Daigo Nagahara, Shuichi Kinoshita, Yoshitoki Takagawa, Norihito Moniwa, Daisuke Yoshida, Nobuyuki Ura, and Kazuaki Shimamoto. Angiotensin II receptor blocker prevents increased arterial stiffness in patients with essential hypertension. Circulation Journal, 68(12):1194–1198, 2004.


[31] Kidney Disease: Improving Global Outcomes (KDIGO) Transplant Work Group. KDIGO clinical practice guideline for the care of kidney transplant recipients. American Journal of Transplantation, 9:S1, 2009.
[32] Shai Shalev-Shwartz and Shai Ben-David. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014.
[33] André Altmann, Laura Toloşi, Oliver Sander, and Thomas Lengauer. Permutation importance: a corrected feature importance measure. Bioinformatics, 26(10):1340–1347, 2010.
[34] Melissa A Hardy. Regression with Dummy Variables. Sage, 1993.
[35] A Rogier Donders, Geert J van der Heijden, Theo Stijnen, and Karel G Moons. Review: a gentle introduction to imputation of missing values. Journal of Clinical Epidemiology, 59(10):1087–1091, 2006.
[36] Luca Oneto. Model Selection and Error Estimation in a Nutshell. Springer, Berlin, Germany, 2020.
[37] Kathleen F Kerr. Comments on the analysis of unbalanced microarray data. Bioinformatics, 25(16):2035–2041, 2009.
[38] Rosalía Laza, Reyes Pavón, Miguel Reboiro-Jato, and Florentino Fdez-Riverola. Evaluating the effect of unbalanced data in biomedical document classification. Journal of Integrative Bioinformatics, 8(3):105–117, 2011.
[39] Kyunghee Han, Kyee Zu Kim, Jung Mi Oh, In Wha Kim, Kyungim Kim, and Taesung Park. Unbalanced sample size effect on the genome-wide population differentiation studies. International Journal of Data Mining and Bioinformatics, 6(5):490–504, 2012.
[40] Guo Haixiang, Li Yijing, Jennifer Shang, Gu Mingyun, Huang Yuanyue, and Gong Bing. Learning from class-imbalanced data: review of methods and applications. Expert Systems with Applications, 73:220–239, 2017.
[41] Nitesh V Chawla, Kevin W Bowyer, Lawrence O Hall, and W Philip Kegelmeyer. SMOTE: synthetic minority over-sampling technique. Journal of Artificial Intelligence Research, 16:321–357, 2002.
[42] Tuanfei Zhu, Yaping Lin, and Yonghe Liu. Synthetic minority oversampling technique for multiclass imbalance problems. Pattern Recognition, 72:327–340, 2017.
[43] Christoph Molnar. Interpretable machine learning. https://christophm.github.io/book/, 2018.
[44] Isabelle Guyon and André Elisseeff. An introduction to variable and feature selection. Journal of Machine Learning Research, 3(Mar):1157–1182, 2003.
[45] Leo Breiman. Random forests. Machine Learning, 45(1):5–32, 2001.
[46] John Shawe-Taylor and Nello Cristianini. Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, England, United Kingdom, 2004.
[47] S Sathiya Keerthi and Chih-Jen Lin. Asymptotic behaviors of support vector machines with Gaussian kernel. Neural Computation, 15(7):1667–1689, 2003.
[48] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. MIT Press, Cambridge, Massachusetts, USA, 2016.
[49] Mohammed J Zaki and Wagner Meira Jr. Data Mining and Machine Learning: Fundamental Concepts and Algorithms. Cambridge University Press, Cambridge, England, United Kingdom, 2019.
[50] Tianqi Chen and Carlos Guestrin. XGBoost: a scalable tree boosting system. In Proceedings of KDD '16 – the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, California, USA, 2016.
[51] Robert C Holte. Very simple classification rules perform well on most commonly used datasets. Machine Learning, 11(1):63–90, 1993.
[52] Ilenia Orlandi, Luca Oneto, and Davide Anguita. Random forests model selection. In European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, pages ES2016–48, Bruges, Belgium, 2016.
[53] Frank Hutter, Holger Hoos, and Kevin Leyton-Brown. An efficient approach for assessing hyperparameter importance. In International Conference on Machine Learning, pages 754–762, Beijing, China, 2014.
[54] Simon Bernard, Laurent Heutte, and Sébastien Adam. Influence of hyperparameters on random forest accuracy. In International Workshop on Multiple Classifier Systems, pages 171–180, Reykjavik, Iceland, 2009.
[55] Max Kuhn. Building predictive models in R using the caret package. Journal of Statistical Software, 28(1):1–26, 2008.
[56] M Fernández-Delgado, E Cernadas, S Barro, and D Amorim. Do we need hundreds of classifiers to solve real world classification problems? Journal of Machine Learning Research, 15(1):3133–3181, 2014.
[57] Michael Wainberg, Babak Alipanahi, and Brendan J Frey. Are random forests truly the best classifiers? Journal of Machine Learning Research, 17(1):3837–3841, 2016.
[58] David H Wolpert. The lack of a priori distinctions between learning algorithms. Neural Computation, 8(7):1341–1390, 1996.
[59] Yvan Saeys, Thomas Abeel, and Yves Van de Peer. Robust feature selection using ensemble feature selection techniques. In Proceedings of ECML PKDD 2008 – the 2008 Joint European Conference on Machine Learning and Knowledge Discovery in Databases, Antwerp, Belgium, 2008.
[60] Robin Genuer, Jean-Michel Poggi, and Christine Tuleau-Malot. Variable selection using random forests. Pattern Recognition Letters, 31(14):2225–2236, 2010.
[61] Yanjun Qi. Random forest for bioinformatics. In Ensemble Machine Learning, Boston, Massachusetts, USA, 2012.
[62] Ramón Díaz-Uriarte and Sara Alvarez De Andres. Gene selection and classification of microarray data using random forest. BMC Bioinformatics, 7(1):3, 2006.
[63] Davide Chicco and Cristina Rovelli. Computational prediction of diagnosis and feature selection on mesothelioma patient health records. PLoS One, 14(1):e0208737, 2019.
[64] Phillip Good. Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer Science & Business Media, 2013.
[65] M Luz Calle and Victor Urrea. Letter to the editor: stability of random forest importance measures. Briefings in Bioinformatics, 12(1):86–89, 2010.
[66] Miron Bartosz Kursa. Robustness of random forest-based gene selection methods. BMC Bioinformatics, 15(1):8, 2014.
[67] Huazhen Wang, Fan Yang, and Zhiyuan Luo. An experimental study of the intrinsic stability of random forest variable importance measures. BMC Bioinformatics, 17(1):60, 2016.
[68] D Sculley. Rank aggregation for similar items. In Proceedings of the 2007 SIAM International Conference on Data Mining, Minneapolis, Minnesota, 2007.
[69] Thomas W MacFarland and Jan M Yates. Mann–Whitney U test. In Introduction to Nonparametric Statistics for the Biological Sciences Using R, pages 103–132. Springer, Berlin, Germany, 2016.
[70] Priscilla E Greenwood and Michael S Nikulin. A Guide to Chi-Squared Testing, volume 280. John Wiley & Sons, Hoboken, New Jersey, USA, 1996.
[71] Daniel J Benjamin, James O Berger, Magnus Johannesson, Brian A Nosek, E-J Wagenmakers, Richard Berk, Kenneth A Bollen, Björn Brembs, Lawrence Brown, Colin Camerer, David Cesarini, Christopher D Chambers, Merlise Clyde, Thomas D Cook, Paul De Boeck, Zoltan Dienes, Anna Dreber, Kenny Easwaran, Charles Efferson, Ernst Fehr, Fiona Fidler, Andy P Field, Malcolm Forster, Edward I George, Richard Gonzalez, Steven Goodman, Edwin Green, Donald P Green, Anthony G Greenwald, Jarrod D Hadfield, Larry V Hedges, Leonhard Held, Teck Hua Ho, Herbert Hoijtink, Daniel J Hruschka, Kosuke Imai, Guido Imbens, John P A Ioannidis, Minjeong Jeon, James Holland Jones, Michael Kirchler, David Laibson, John List, Roderick Little, Arthur Lupia, Edouard Machery, Scott E Maxwell, Michael McCarthy, Don A Moore, Stephen L Morgan, Marcus Munafó, Shinichi Nakagawa, Brendan Nyhan, Timothy H Parker, Luis Pericchi, Marco Perugini, Jeff Rouder, Judith Rousseau, Victoria Savalei, Felix D Schönbrodt, Thomas Sellke, Betsy Sinclair, Dustin Tingley, Trisha Van Zandt, Simine Vazire, Duncan J Watts, Christopher Winship, Robert L Wolpert, Yu Xie, Cristobal Young, Jonathan Zinman, and Valen E Johnson. Redefine statistical significance. Nature Human Behaviour, 2(1):6–10, 2018.
[72] Cyrus R Mehta and Nitin R Patel. Exact logistic regression: theory and examples. Statistics in Medicine, 14(19):2143–2160, 1995.
[73] Davide Chicco and Giuseppe Jurman. Machine learning can predict survival of patients with heart failure from serum creatinine and ejection fraction alone. BMC Medical Informatics and Decision Making, 20(1):16, 2020.
[74] Davide Chicco. Ten quick tips for machine learning in computational biology. BioData Mining, 10(35):1–17, 2017.
[75] Davide Chicco, Matthijs J Warrens, and Giuseppe Jurman. The Matthews correlation coefficient (MCC) is more informative than Cohen's Kappa and Brier score in binary classification assessment. IEEE Access, 9:78368–78381, 2021.
[76] Davide Chicco, Niklas Tötsch, and Giuseppe Jurman. The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation. BioData Mining, 14, 2021.


[77] Davide Chicco, Valery Starovoitov, and Giuseppe Jurman. The benefits of the Matthews correlation coefficient (MCC) over the diagnostic odds ratio (DOR) in binary classification assessment. IEEE Access, 9:47112–47124, 2021.
[78] Peter C Austin and Ewout W Steyerberg. Graphical assessment of internal and external calibration of logistic regression models by using loess smoothers. Statistics in Medicine, 33(3):517–535, 2014.
[79] Norman E Breslow, Lue P Zhao, Thomas R Fears, and Charles C Brown. Logistic regression for stratified case–control studies. Biometrics, 44(3):891–899, 1988.
[80] Jerrold H Zar. Spearman rank correlation. In Encyclopedia of Biostatistics, volume 7, Hoboken, New Jersey, USA, 2005. Wiley Online Library.
[81] Franz J Brandenburg, Andreas Gleißner, and Andreas Hofmeier. Comparing and aggregating partial orders with Kendall tau distances. In Proceedings of WALCOM 2012 – the 6th International Workshop on Algorithms and Computation, pages 88–99, Dhaka, Bangladesh, 2012. Springer.
[82] Davide Chicco, Eleonora Ciceri, and Marco Masseroli. Extended Spearman and Kendall coefficients for gene annotation list correlation. In Proceedings of CIBB 2014 – the 11th International Meeting on Computational Intelligence Methods for Bioinformatics and Biostatistics, volume 8623 of Lecture Notes in Computer Science, pages 19–32, Cambridge, England, United Kingdom, 2015. Springer.
[83] David Clayton and Jack Cuzick. Multivariate generalizations of the proportional hazards model. Journal of the Royal Statistical Society: Series A (General), 148(2):82–108, 1985.
[84] Peter A Flach. Classifier calibration. In Encyclopedia of Machine Learning and Data Mining. Springer, Berlin, Germany, 2016.
[85] Scott M Lundberg and Su-In Lee. A unified approach to interpreting model predictions. In Proceedings of NIPS 2017 – the 31st International Conference on Neural Information Processing Systems, pages 4768–4777, 2017.
[86] Le-Ting Zhou, Shen Qiu, Lin-Li Lv, Zuo-Lin Li, Hong Liu, Ri-Ning Tang, Kun-Ling Ma, and Bi-Cheng Liu. Integrative bioinformatics analysis provides insight into the molecular mechanisms of chronic kidney disease. Kidney and Blood Pressure Research, 43(2):568–581, 2018.
[87] Zhi Zuo, Jian-Xiao Shen, Yan Pan, Juan Pu, Yong-Gang Li, Xing-hua Shao, and Wan-Peng Wang. Weighted gene correlation network analysis (WGCNA) detected loss of MAGI2 promotes chronic kidney disease (CKD) by podocyte damage. Cellular Physiology and Biochemistry, 51(1):244–261, 2018.
[88] Chih-Yin Ho, Tun-Wen Pai, Yuan-Chi Peng, Chien-Hung Lee, Yung-Chih Chen, Yang-Ting Chen, and Kuo-Su Chen. Ultrasonography image analysis for detection and classification of chronic kidney disease. In Proceedings of CISIS 2012 – the 6th International Conference on Complex, Intelligent, and Software Intensive Systems, pages 624–629, Palermo, Italy, 2012. IEEE.

DAVIDE CHICCO (ORCID: 0000-0001-9655-7142) obtained his Bachelor of Science and Master of Science degrees in computer science at Università di Genova (Genoa, Italy), respectively in 2007 and 2010. He then started the PhD program in computer engineering at the Politecnico di Milano university (Milan, Italy), where he graduated in spring 2014. He also spent a semester as a visiting doctoral scholar at the University of California Irvine (USA). From September 2014 to September 2018, he was a post-doctoral researcher at the Princess Margaret Cancer Centre and a guest at the University of Toronto. From September 2018 to December 2019, he was a scientific associate researcher at the Peter Munk Cardiac Centre (Toronto, Ontario, Canada). From January 2020 to January 2021, he was a scientific associate researcher at the Krembil Research Institute (Toronto, Ontario, Canada). Since January 2021, he has been working as a scientific research associate at the Institute of Health Policy Management and Evaluation of the University of Toronto.

CHRISTOPHER A. LOVEJOY (ORCID: 0000-0003-0919-1264) is a medical doctor with interests in applied machine learning and bioinformatics. He undertook his undergraduate medical degree at the University of Cambridge (United Kingdom) before completing a post-graduate degree in data science and machine learning at University College London (United Kingdom).

LUCA ONETO (ORCID: 0000-0002-8445-395X) received his Bachelor of Science degree and Master of Science degree in electronic engineering at Università di Genova, Italy, respectively in 2008 and 2010. In 2014 he received his PhD from the same university in the School of Sciences and Technologies for Knowledge and Information Retrieval, with the thesis "Learning based on empirical data". In 2017 he obtained the Italian National Scientific Qualification for the role of associate professor in computer engineering, and in 2018 he obtained the one in computer science. He worked as an Assistant Professor in computer engineering at Università di Genova from 2016 to 2019. In 2018 he co-founded the ZenaByte s.r.l. spin-off company. In 2019 he obtained the Italian National Scientific Qualification for the role of full professor in computer science and computer engineering. In 2019 he became an associate professor in computer science at Università di Pisa, and he is currently an associate professor in computer engineering at Università di Genova. He has been involved in several Horizon 2020 projects (S2RJU, ICT, DS) and has been awarded the Amazon AWS Machine Learning and Somalvico (best Italian young AI researcher) awards. His first main topic of research is statistical learning theory, with particular focus on the theoretical aspects of the problems of (semi-)supervised model selection and error estimation. His second main topic of research is data science, with particular reference to the problem of trustworthy AI and the solution of real-world problems by exploiting and improving the most recent learning algorithms and theoretical results in the fields of machine learning and data mining.


SUPPLEMENTARY INFORMATION

DATA ENGINEERING
We derived the time year feature from the TimeToEventMonths variable present in the original dataset. We assigned time year 1 to all the patients who had TimeToEventMonths between 0 and 12 months, time year 2 to all the patients who had TimeToEventMonths between 13 and 24 months, time year 3 to all the patients who had TimeToEventMonths between 25 and 36 months, and so on. If a patient has time year = x (where x ∈ ℕ), it means that the CKD development occurred in the xth year of the follow-up (since 2008), for each patient who developed stage 3-5 CKD, or that the subject's last visit happened in the xth year, for healthy controls.
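In code, this mapping reduces to a ceiling division over twelve months; a minimal Python sketch (the function name is ours, for illustration):

import math

def time_year(time_to_event_months: int) -> int:
    """Map TimeToEventMonths to the follow-up year of the event:
    0-12 months -> year 1, 13-24 -> year 2, 25-36 -> year 3, and so on."""
    return max(1, math.ceil(time_to_event_months / 12))

assert time_year(0) == 1 and time_year(12) == 1
assert time_year(13) == 2 and time_year(24) == 2
assert time_year(25) == 3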

BINARY STATISTICAL RATES
List of the statistical rates used to evaluate confusion matrices, with their formulas:

$$\text{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP+FP) \cdot (TP+FN) \cdot (TN+FP) \cdot (TN+FN)}} \quad (1)$$

(worst value = −1; best value = +1)

$$\text{F}_1~\text{score} = \frac{2 \cdot TP}{2 \cdot TP + FP + FN} \quad (2)$$

(worst value = 0; best value = 1)

$$\text{accuracy} = \frac{TP + TN}{TP + FN + TN + FP} \quad (3)$$

(worst value = 0; best value = 1)

$$\text{true positive rate, recall, sensitivity} = \frac{TP}{TP + FN} \quad (4)$$

(worst value = 0; best value = 1)

$$\text{true negative rate, specificity} = \frac{TN}{TN + FP} \quad (5)$$

(worst value = 0; best value = 1)

$$\text{positive predictive value, precision} = \frac{TP}{TP + FP} \quad (6)$$

(worst value = 0; best value = 1)

$$\text{negative predictive value} = \frac{TN}{TN + FN} \quad (7)$$

(worst value = 0; best value = 1)

$$\text{Precision-Recall (PR) curve} = \begin{cases} \text{true positive rate on the } x \text{ axis} \\ \text{precision on the } y \text{ axis} \end{cases} \quad (8)$$

(worst value = 0; best value = 1)

$$\text{ROC curve} = \begin{cases} \text{false positive rate on the } x \text{ axis} \\ \text{true positive rate on the } y \text{ axis} \end{cases} \quad (9)$$

(worst value = 0; best value = 1)
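For reference, all of these rates can be computed directly from the four confusion matrix counts; a minimal Python sketch (the example counts are arbitrary, and zero-division guards are omitted for brevity):

import math

def confusion_matrix_rates(TP: int, TN: int, FP: int, FN: int) -> dict:
    """Compute the binary statistical rates listed above from the four
    confusion matrix counts."""
    mcc = (TP * TN - FP * FN) / math.sqrt(
        (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
    return {
        "MCC": mcc,                                    # worst -1, best +1
        "F1 score": 2 * TP / (2 * TP + FP + FN),       # worst 0, best 1
        "accuracy": (TP + TN) / (TP + FN + TN + FP),   # worst 0, best 1
        "sensitivity (TP rate)": TP / (TP + FN),       # worst 0, best 1
        "specificity (TN rate)": TN / (TN + FP),       # worst 0, best 1
        "precision (PPV)": TP / (TP + FP),             # worst 0, best 1
        "NPV": TN / (TN + FN),                         # worst 0, best 1
    }

print(confusion_matrix_rates(TP=50, TN=40, FP=5, FN=10))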