ML Assignment

This paper conducts a comparative analysis of ten classical machine learning algorithms for multi-class classification of cerebrovascular lesions using a dataset of approximately 10,000 CT images. The study demonstrates that the Support Vector Machine (SVM) achieved the highest accuracy of 97.79%, showcasing the effectiveness of radiomics-driven feature extraction techniques combined with classical ML algorithms. The findings highlight the potential of these methods as interpretable and efficient alternatives to deep learning in medical imaging diagnostics.

Comparative Analysis of Machine Learning Algorithms for Multi-Class Classification of Cerebrovascular CT Images

Kaustubh Agrawal, Aditya Shandilya, Aditya Kumar Singh, Rohit Nandagopal, Yash Inani
School of Computer Engineering, Manipal Institute of Technology Bengaluru, Manipal Academy of Higher Education, Manipal, India

Abstract: This paper presents a comprehensive comparative analysis of ten classical machine learning algorithms for the multi-class classification of cerebrovascular lesions using non-contrast computed tomography (CT) images. The study employs a recently published 2025 dataset, comprising approximately 10,000 CT slices from 733 patients labeled across ten lesion categories. The proposed framework leverages radiomics-based feature extraction techniques including Histogram of Oriented Gradients (HOG), Local Binary Patterns (LBP), Gray Level Co-occurrence Matrix (GLCM) with Haralick texture statistics, and Gabor filters. Extracted features are standardized, reduced using Principal Component Analysis (PCA), and evaluated using 5×3 Repeated Stratified K-Fold cross-validation. Ten algorithms are benchmarked for performance, interpretability, and efficiency: Logistic Regression, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Naive Bayes, Bayesian Network, Decision Tree, Random Forest, AdaBoost, XGBoost, and Multi-Layer Perceptron (MLP). SVM achieved the highest accuracy (97.79%) and macro-F1 score (0.9783). The framework demonstrates that radiomics-driven ML can yield high accuracy on medical imaging datasets while maintaining interpretability and CPU efficiency. The study provides a transparent and reproducible baseline suitable for academic research and educational use.

Index Terms: Cerebrovascular CT, Radiomics, Explainable Machine Learning, Multi-class Classification, Classical ML, Cross-validation

Code and notebook available at: https://[Link]/file/d/15VJ 53 DFHzSBM95UGkaVY06Gs3IPA8/view?usp=sharing

I. INTRODUCTION

Cerebrovascular diseases such as ischemic stroke, intracranial hemorrhage, and mixed vascular pathologies represent a significant global health burden, accounting for millions of deaths and disabilities each year. Rapid and accurate detection of these conditions is critical for effective medical intervention. Non-contrast computed tomography (CT) is the most widely used imaging modality for cerebrovascular diagnosis due to its speed, accessibility, and reliability in emergency settings. However, the manual interpretation of CT images is often subjective, time-consuming, and dependent on radiologist expertise, leading to variability in clinical decision-making. These challenges have motivated the development of automated diagnostic methods that can assist clinicians by providing objective and reproducible predictions.

Machine learning (ML) has emerged as a powerful tool in medical imaging, enabling systems to identify complex data patterns and perform classification tasks based on extracted features. In cerebrovascular imaging, ML-based systems can assist in differentiating between ischemic and hemorrhagic lesions, quantifying lesion severity, and predicting disease outcomes. Although deep learning models such as convolutional neural networks (CNNs) have shown strong performance in medical imaging, they are often data-intensive, computationally demanding, and less interpretable. In contrast, classical ML algorithms combined with radiomics feature extraction offer a transparent and resource-efficient approach, allowing for clinical interpretability without requiring high-end hardware or large-scale datasets.

Radiomics is a methodology that transforms medical images into a high-dimensional feature space through the extraction of quantitative descriptors. These features capture the shape, texture, and intensity variations within lesions, providing meaningful representations for classification tasks. Radiomic techniques such as Histogram of Oriented Gradients (HOG), Local Binary Patterns (LBP), Gray Level Co-occurrence Matrix (GLCM), Haralick features, and Gabor filters have been successfully applied in cancer detection, lung disease characterization, and brain lesion segmentation. Yet, their systematic application in cerebrovascular CT classification,
particularly for multi-class problems, remains limited. Most studies focus on binary classification (e.g., stroke vs. control), leaving a research gap in multi-category diagnosis using handcrafted features and classical algorithms.

Addressing this gap, this study performs a comprehensive comparison of ten classical ML algorithms for multi-class cerebrovascular CT image classification: Logistic Regression, K-Nearest Neighbors (KNN), Support Vector Machine (SVM), Naive Bayes, Bayesian Network, Decision Tree, Random Forest, AdaBoost, XGBoost, and Multi-Layer Perceptron (MLP). Instead of relying on deep neural networks, this framework leverages radiomics-derived features extracted from 2D CT slices. Feature preprocessing includes normalization and dimensionality reduction through Principal Component Analysis (PCA), ensuring robust model training while minimizing overfitting. To ensure fairness and reproducibility, model performance is evaluated using Repeated Stratified K-Fold Cross-Validation (5 folds × 3 repeats) and assessed via multiple metrics including accuracy, macro-F1, balanced accuracy, and standard deviation.

The outcomes demonstrate that classical ML algorithms, when applied to radiomics features, can achieve high diagnostic performance while remaining interpretable and computationally lightweight. The Support Vector Machine achieved the highest performance (97.79% accuracy), followed closely by KNN and Random Forest. The findings highlight the effectiveness of feature-based learning approaches for CT image analysis, suggesting their potential as sustainable and reproducible alternatives to deep learning in data-limited or resource-constrained environments.

II. RELATED WORK

Feature engineering and radiomics have been fundamental in advancing machine learning (ML) applications for medical image analysis. Early foundational works established the theoretical basis for texture and gradient-based representations. Haralick et al. [1] introduced texture feature descriptors derived from the Gray Level Co-occurrence Matrix (GLCM), defining statistical measures such as contrast, energy, and entropy that remain widely used in medical imaging. Ojala et al. [2] developed Local Binary Patterns (LBP) as an efficient method for encoding local texture micro-patterns. Dalal and Triggs [3] proposed the Histogram of Oriented Gradients (HOG) descriptor, which captures shape and boundary details through gradient orientation histograms. Daugman [4] formalized Gabor filters for spatial-frequency decomposition, enabling robust texture and edge analysis in two dimensions. These classical feature extraction techniques collectively underpin the radiomics framework adopted in the present study.

In neuroimaging, radiomics has become an essential tool for quantitative feature extraction from CT and MRI scans. Shi et al. (2024) reviewed ML-based radiomics approaches for neurodegenerative and cerebrovascular diseases, emphasizing their diagnostic potential but noting that most applications remain binary classification tasks or outcome prediction rather than multi-class lesion identification [5]. Zhang et al. (2024) combined thrombus and perithrombus radiomic features with deep-learning representations to predict malignant cerebral edema after reperfusion therapy, evaluating eleven algorithms and achieving strong AUC performance [6]. Lyu et al. (2023) developed a CT-based radiomics model to differentiate between primary and secondary intracranial hemorrhage in 238 patients, extracting over 1,700 features and using support vector machines (SVM) for classification [7]. Their model outperformed radiologist assessment in hemorrhage discrimination, highlighting the value of radiomics in cerebrovascular diagnostics.

Recent work by Sun et al. (2024) proposed a radiomics-clinical ML model for predicting futile recanalization following endovascular treatment in anterior circulation stroke using non-contrast CT (NCCT) scans [8]. The study used 2,016 radiomic features and logistic regression to achieve AUC values above 0.85, demonstrating radiomics' predictive potential in clinical decision-making. Other studies, including those by Yao et al. (2024) and Wang et al. (2025), further established that handcrafted radiomic features can complement or outperform deep learning models when datasets are small or lack pixel-level annotations.

Despite these advances, two limitations persist in current research: (i) most radiomics-based ML models in cerebrovascular imaging address binary or prognostic tasks, and (ii) few studies systematically benchmark multiple classical ML algorithms on the same dataset using handcrafted features. The present work addresses both gaps by conducting a ten-class classification of cerebrovascular CT images using multiple feature families (HOG, LBP, GLCM, Haralick, and Gabor) and ten ML algorithms, evaluated under a standardized cross-validation protocol. This study therefore bridges classical radiomics with comprehensive model benchmarking to establish a reproducible and interpretable baseline for future deep learning and explainable AI research.

III. DATASET DESCRIPTION

The dataset used in this study is the Cerebrovascular Lesions CT Image Dataset, published by Macin et al. (2025) and hosted on Kaggle under the title “Deep learning-based classification of cerebrovascular lesions on CT images using a novel SwinNeXt architecture.” It can be accessed at: https://[Link]/datasets/buraktaci/cerebrovascular-lesions/data?select=cerebrovascular+lesions. This dataset was compiled retrospectively from patients admitted to the Department of Neurology at Malatya Turgut Özal University Medical Faculty between 2018 and 2022. CT images exhibiting motion artifacts were excluded. Each
patient’s scans were reviewed and verified independently by radiologists and neurologists to ensure diagnostic accuracy. The study cohort includes 733 individuals (405 male and 328 female), and the data are categorized into ten clinically distinct classes representing different cerebrovascular lesion types as well as healthy controls.

The dataset contains approximately 10,000 non-contrast axial CT slices, each labeled according to one of ten diagnostic categories. The number of images per class ranges from 1,008 to 1,191, providing a relatively balanced multi-class distribution. Detailed demographic and class-wise statistics are provided in Table I.

The dataset has a reported usability rating of 5.00 on Kaggle and is publicly available for academic and research purposes. The authors note that patient identifiers were removed to protect privacy, preventing patient-wise stratification in downstream ML experiments. Consequently, this study adopts image-level repeated stratified cross-validation rather than patient-wise splitting, with explicit acknowledgment of this limitation in later sections. Key characteristics:
• 733 unique patients (405 male / 328 female) and 10 diagnostic categories.
• Approximately 10,000 axial CT slices (1.35 GB dataset size).
• Balanced class distribution (~1,000 images per class).
• Images are non-contrast axial brain CT scans in PNG format.
• Public academic license; no identifiable patient metadata released.

TABLE I
DETAILS OF THE CEREBROVASCULAR LESIONS CT IMAGE DATASET (MACIN et al., 2025).

Class                            Male (n)  Female (n)  Total (n)  Age (Mean ± SD)  Images (n)
Acute ischemic infarction        45        38          83         70.78 ± 15.37    1119
Epidural hemorrhage              48        21          69         55.94 ± 17.56    1011
Chronic ischemic infarction      41        33          74         62.25 ± 12.32    1008
Subdural effusion                36        30          66         71.30 ± 15.88    1080
Parenchymal hemorrhage           54        42          96         65.25 ± 19.64    1077
Subarachnoid hemorrhage          32        45          77         52.12 ± 18.47    1166
Subdural hemorrhage              42        38          80         48.44 ± 17.54    1177
Ventricular hemorrhage           40        32          72         62.78 ± 18.88    1020
Mixed cerebrovascular disease    42        27          69         68.75 ± 14.63    1191
Healthy Control                  25        22          47         38.24 ± 16.24    1016

This dataset provides an ideal benchmark for evaluating classical ML and radiomics-based methods for cerebrovascular lesion classification. It is recent (published 2025), high-quality, clinically verified, and sufficiently large for statistically reliable cross-validation experiments, while remaining manageable for CPU-based computation and educational use.

IV. METHODOLOGY

The proposed workflow consists of a fully reproducible and explainable pipeline that transforms raw CT slices into radiomics-based feature vectors, followed by the training and comparison of ten classical machine learning algorithms. The entire pipeline is designed for CPU-only execution in Google Colab or Jupyter environments without reliance on GPU acceleration. The process includes (1) data preprocessing, (2) radiomics-based feature extraction, (3) feature scaling and dimensionality reduction, (4) model training and hyperparameter setup, (5) repeated stratified cross-validation, and (6) evaluation using multiple metrics. Each stage is described in detail below.

A. Preprocessing

All CT images from the ten classes were downloaded from the Kaggle repository and organized into class-specific directories. Since the dataset comprises axial CT slices stored as PNG files, no DICOM parsing was required. Each image was first loaded using the OpenCV and NumPy libraries for array-based manipulation.

To ensure uniformity across the dataset, all images were converted to grayscale (if not already), as color information is not relevant for CT-based intensity analysis. Each image was then resized to a fixed spatial resolution of 128×128 pixels to standardize feature extraction dimensions. Pixel intensities were normalized to the range [0,1] using min–max normalization to prevent scale bias during feature computation.

Optional preprocessing techniques such as Gaussian blurring and histogram equalization were tested for denoising and contrast enhancement but were not included in the final model training to maintain reproducibility. Images were verified visually for consistency after resizing and
normalization. The final preprocessed dataset was stored as NumPy arrays for efficient iteration during feature extraction.
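The preprocessing steps described in this subsection (grayscale conversion, resizing to 128×128, and min–max scaling to [0, 1]) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it uses NumPy only, substitutes a nearest-neighbour resize and a plain channel average for the OpenCV routines mentioned in the text, and runs on a randomly generated stand-in image. All function names are hypothetical.

```python
import numpy as np

TARGET_SIZE = (128, 128)  # fixed spatial resolution stated in the text


def to_grayscale(image: np.ndarray) -> np.ndarray:
    """Collapse a colour image (H, W, C) to one channel; pass 2-D input through."""
    if image.ndim == 2:
        return image.astype(np.float64)
    # Assumption: plain channel average. OpenCV's colour conversion uses a
    # weighted luminance formula instead.
    return image[..., :3].astype(np.float64).mean(axis=2)


def resize_nearest(image: np.ndarray, size=TARGET_SIZE) -> np.ndarray:
    """Nearest-neighbour resize, a simple stand-in for an OpenCV resize."""
    out_h, out_w = size
    in_h, in_w = image.shape
    rows = np.arange(out_h) * in_h // out_h  # source row for each output row
    cols = np.arange(out_w) * in_w // out_w  # source column for each output column
    return image[rows[:, None], cols[None, :]]


def min_max_normalize(image: np.ndarray) -> np.ndarray:
    """Scale intensities to [0, 1]; a constant image maps to all zeros."""
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image, dtype=np.float64)
    return (image - lo) / (hi - lo)


def preprocess(image: np.ndarray) -> np.ndarray:
    """Grayscale -> 128x128 -> [0, 1], in the order given in the text."""
    return min_max_normalize(resize_nearest(to_grayscale(image)))


# Stand-in for one PNG slice: random 8-bit colour data, not real CT content.
rng = np.random.default_rng(42)
fake_slice = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
processed = preprocess(fake_slice)
print(processed.shape, processed.dtype)
```

Stacking the outputs of `preprocess` for every slice gives the kind of NumPy array the text describes as the input to feature extraction.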

B. Radiomics Feature Extraction

Radiomics involves the extraction of handcrafted quantitative features from medical images that capture intensity, texture, and frequency patterns. In this study, five major families of radiomic descriptors were computed from each CT slice, representing complementary image characteristics. The extracted features were concatenated into a single high-dimensional feature vector per image, typically containing around 8,000 values.

(1) Histogram of Oriented Gradients (HOG): The HOG descriptor captures the distribution of gradient orientations within localized cells, effectively encoding the edge and contour structure of lesions. Parameters used include 8×8 cell size, 2×2 block size, and 9 orientation bins. The resulting feature vector describes the directional gradient intensity across the CT slice.

(2) Local Binary Patterns (LBP): LBP encodes local micro-textures by thresholding the neighborhood of each pixel relative to its center pixel. The resulting binary pattern is converted to a decimal number, representing localized texture structures such as smoothness or roughness. Uniform LBP with a radius of 1 and 8 neighbors was applied, producing rotation-invariant texture encoding.

(3) Gray Level Co-occurrence Matrix (GLCM): GLCM represents second-order texture information by measuring how often pairs of pixel intensities occur in a given spatial relationship. Four directional matrices (0°, 45°, 90°, 135°) with a distance of 1 pixel were computed for each slice. From each GLCM, standard statistical features such as contrast, correlation, energy, homogeneity, and dissimilarity were extracted.

(4) Haralick Features: Derived from GLCM, Haralick features quantify complex texture properties such as entropy, variance, and autocorrelation. Fourteen standard Haralick descriptors were computed using the mahotas and skimage libraries, yielding rotationally invariant texture signatures that differentiate lesion heterogeneity patterns.

(5) Gabor Filters: Gabor filters capture both spatial and frequency-domain information, allowing sensitivity to edges and oriented textures at different scales. A Gabor kernel bank of four scales and six orientations was applied, and mean filter responses were extracted as feature descriptors. These features help distinguish lesions with periodic or directional structures, such as subarachnoid hemorrhages.

After feature computation, all descriptors were concatenated into a unified Pandas DataFrame with feature names representing their origin (HOG, LBP, GLCM, etc.). The resulting feature matrix had dimensions of approximately 10,000 images × 8,000 features. This feature set was used for all subsequent model training and analysis.

C. Feature Scaling and Dimensionality Reduction

Feature scaling is a critical preprocessing step in ML to ensure numerical stability and fair model comparison. Two standard scaling approaches were evaluated: MinMaxScaler and StandardScaler. The StandardScaler was selected for the final experiments as it produced slightly more stable results across folds.

Given the high dimensionality of the feature space (~8,000 features), dimensionality reduction was applied to prevent overfitting and reduce computational load. Two approaches were evaluated:
• Principal Component Analysis (PCA): PCA was performed to retain 95% of the variance while projecting the data into a lower-dimensional space. The number of resulting components ranged between 200 and 400, depending on the dataset subset.
• Tree-based Feature Importance: A Random Forest was trained to estimate the relative importance of each feature, and the top 500 ranked features were retained for model training. This provided a complementary method emphasizing features most discriminative of class labels.

The final experiments utilized PCA-transformed features due to their superior computational efficiency. All scaled and reduced features were saved as CSV files for traceability and reproducibility.

D. Machine Learning Models

Ten classical machine learning algorithms were implemented to ensure comprehensive evaluation across different model families. All models were developed using the scikit-learn library, with xgboost and bnlearn used for the XGBoost and Bayesian Network implementations, respectively.
1) Logistic Regression (LR): Linear baseline classifier using L2 regularization; serves as a reference for linear separability.
2) K-Nearest Neighbors (KNN): Instance-based classifier using Euclidean distance; tested for k = 3, 5, 7.
3) Support Vector Machine (SVM): Non-linear classifier with RBF kernel; regularization parameter C = 1.0, γ = 'scale'.
4) Naive Bayes (NB): Gaussian variant assuming feature independence.
5) Bayesian Network (BN): Probabilistic model using bnlearn to estimate conditional dependencies among features.
6) Decision Tree (DT): Gini impurity-based splits; maximum depth limited to prevent overfitting.
7) Random Forest (RF): Ensemble of 100 decision trees; bootstrap sampling and averaging used to reduce variance.
8) AdaBoost (AB): Adaptive boosting of weak learners; decision stumps used as base estimators.
9) XGBoost (XGB): Gradient boosting with regularization; parameters set to 100 estimators, max depth 5, learning rate 0.1.
10) Multi-Layer Perceptron (MLP): Shallow neural network with one hidden layer (128 neurons), ReLU activation, Adam optimizer, and early stopping.

Each model was trained independently on the same scaled feature set for consistent comparison. No GPU acceleration was required; total training time per model remained under one minute on standard CPU hardware.

E. Cross-Validation and Reproducibility

Due to the absence of patient identifiers in the dataset, image-level validation was necessary. To minimize sampling bias and ensure robustness, a Repeated Stratified K-Fold strategy was used with 5 folds and 3 repetitions (15 total splits). Stratification preserved class proportions across folds. The random seed was fixed at 42 for all experiments to maintain reproducibility.

Each model was evaluated on all folds, and the mean and standard deviation of each metric (accuracy, macro-F1, balanced accuracy) were reported. All intermediate results were logged using Pandas DataFrames and exported as CSV summaries.

F. Evaluation Metrics

To evaluate model performance comprehensively, four key metrics were computed for each classifier:
• Accuracy: Overall proportion of correctly classified samples.
• Macro-F1 Score: Unweighted average of the per-class F1 scores, so every class counts equally regardless of its size.
• Balanced Accuracy: Average recall per class, ensuring equal weight for all classes.
• Standard Deviation (CV Std): Indicates variability in model performance across folds.

Additionally, computational efficiency was measured in terms of training time, inference latency per image, and model file size. These metrics provided insights into deployability in resource-limited environments such as CPU-based hospital systems or educational setups.

V. RESULTS

This section presents the performance outcomes of the ten machine learning algorithms trained on the radiomics feature set extracted from the Cerebrovascular Lesions CT Image Dataset. Each model was evaluated using 5×3 Repeated Stratified K-Fold cross-validation, ensuring robustness and statistical consistency across different data partitions. The mean and standard deviation of Accuracy, Macro-F1, and Balanced Accuracy were computed for all models to facilitate a fair comparison. The summarized quantitative results are shown in Table II.

TABLE II
CROSS-VALIDATED PERFORMANCE OF MACHINE LEARNING MODELS ON 10-CLASS CEREBROVASCULAR CT DATASET.

Model                  Acc      Macro F1  Bal Acc  CV Std
SVM                    0.9779   0.9783    0.9786   0.0037
KNN                    0.9646   0.9651    0.9654   0.0047
Random Forest          0.9503   0.9507    0.9512   0.0052
MLP                    0.9466   0.9472    0.9474   0.0081
XGBoost                0.9356   0.9363    0.9371   0.0038
Logistic Regression    0.7685   0.7714    0.7710   0.0067
Decision Tree          0.6438   0.6454    0.6449   0.0158
Naive Bayes            0.5890   0.5982    0.5927   0.0235
AdaBoost               0.4303   0.4391    0.4321   0.0157

A. Overall Model Comparison

Among all tested algorithms, the Support Vector Machine (SVM) achieved the highest overall performance, with an average accuracy of 97.79% and a macro-F1 score of 0.9783. The SVM also exhibited excellent stability across folds, reflected by a low cross-validation standard deviation (0.0037). The kernel-based structure of the SVM effectively separates complex radiomic feature spaces, capturing non-linear boundaries between lesion classes. Its consistent performance across metrics indicates strong generalization at the image level, which is the only level that can be assessed in the absence of patient-level identifiers.

K-Nearest Neighbors (KNN) ranked second, with 96.46% accuracy and a macro-F1 score of 0.9651. KNN’s distance-based classification works effectively for this dataset, likely because the PCA-transformed features preserve class separability. However, KNN’s slightly higher standard deviation (0.0047) suggests minor sensitivity to feature scaling and local noise, a known limitation of instance-based learners.

Random Forest (RF) achieved the third-best performance, with 95.03% accuracy and 0.9507 macro-F1. Ensemble averaging across 100 decision trees reduces variance, explaining its consistent results. Moreover, feature importance scores from RF provided valuable interpretability, revealing that texture-related descriptors, particularly GLCM contrast, Haralick entropy, and Gabor orientation energy, contributed most to accurate classification. RF’s strong performance confirms the discriminative strength of radiomics features in cerebrovascular imaging.

The Multi-Layer Perceptron (MLP) achieved an accuracy of 94.66% and macro-F1 of 0.9472, demonstrating that even shallow neural architectures can model the non-linear relationships in radiomic data. However, MLP showed slightly higher variability (CV Std = 0.0081), possibly due to random weight initialization and lack of extensive tuning. Despite this,
its performance validates the potential of hybrid radiomics–neural frameworks in data-limited scenarios.
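The means and CV Std values quoted in this comparison come from 15 image-level splits (5 folds × 3 repeats, seed 42). The sketch below shows how such figures can be computed, and how accuracy, macro-F1, and balanced accuracy follow from a confusion matrix. It is an illustration under stated assumptions rather than the authors' pipeline: the splitter and metrics are re-implemented in NumPy, the data are synthetic Gaussian clusters, and a nearest-centroid classifier stands in for the ten models. All function names are hypothetical.

```python
import numpy as np


def repeated_stratified_kfold(y, n_splits=5, n_repeats=3, seed=42):
    """Yield (train_idx, test_idx); every fold keeps the class proportions."""
    rng = np.random.default_rng(seed)
    for _ in range(n_repeats):
        fold_of = np.empty(len(y), dtype=int)
        for label in np.unique(y):
            members = rng.permutation(np.flatnonzero(y == label))
            # Deal the shuffled members of this class round-robin into folds.
            fold_of[members] = np.arange(len(members)) % n_splits
        for k in range(n_splits):
            yield np.flatnonzero(fold_of != k), np.flatnonzero(fold_of == k)


def scores(y_true, y_pred, n_classes):
    """Accuracy, macro-F1, and balanced accuracy from a confusion matrix."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)           # rows: true, columns: predicted
    tp = np.diag(cm).astype(float)
    recall = tp / np.maximum(cm.sum(axis=1), 1)
    precision = tp / np.maximum(cm.sum(axis=0), 1)
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return {
        "accuracy": tp.sum() / cm.sum(),
        "macro_f1": f1.mean(),                   # unweighted mean over classes
        "balanced_accuracy": recall.mean(),      # mean per-class recall
    }


def nearest_centroid_predict(X_train, y_train, X_test, n_classes):
    """Stand-in classifier: assign each sample to the closest class mean."""
    centroids = np.stack(
        [X_train[y_train == c].mean(axis=0) for c in range(n_classes)]
    )
    dist = ((X_test[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1)


# Synthetic stand-in for the reduced feature matrix: 10 classes, 30 samples each.
rng = np.random.default_rng(0)
n_classes, per_class, n_features = 10, 30, 20
centers = rng.normal(scale=3.0, size=(n_classes, n_features))
y = np.repeat(np.arange(n_classes), per_class)
X = centers[y] + rng.normal(size=(len(y), n_features))

rows = []
for train_idx, test_idx in repeated_stratified_kfold(y):
    y_pred = nearest_centroid_predict(X[train_idx], y[train_idx], X[test_idx], n_classes)
    rows.append(scores(y[test_idx], y_pred, n_classes))

# Mean and standard deviation over the 15 splits, as reported per model.
summary = {m: (np.mean([r[m] for r in rows]), np.std([r[m] for r in rows]))
           for m in rows[0]}
for name, (mean, std) in summary.items():
    print(f"{name}: {mean:.4f} ± {std:.4f}")
```

With scikit-learn, the same protocol corresponds to `RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)` combined with the `accuracy`, `f1_macro`, and `balanced_accuracy` scorers. Because the splits are made per image, slices from one patient can still fall on both sides of a split.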
The XGBoost classifier performed well (93.56% accuracy, 0.9363 macro-F1), outperforming the linear baseline and AdaBoost. XGBoost’s tree-based gradient boosting mechanism handles high-dimensional input efficiently but showed marginally lower performance than RF, possibly due to overfitting when trained on correlated features. Nonetheless, it achieved strong stability (CV Std = 0.0038) and fast training time, highlighting its suitability for large radiomic datasets.

B. Baseline and Underperforming Models

Logistic Regression, as a linear baseline, achieved 76.85% accuracy and a macro-F1 score of 0.7714. Its limited performance suggests that lesion separability in radiomic space is non-linear, requiring kernel-based or ensemble techniques for optimal discrimination. The Decision Tree classifier yielded 64.38% accuracy and one of the highest variances in Table II (CV Std = 0.0158), consistent with its sensitivity to minor data perturbations and lack of ensemble averaging.

Naive Bayes (58.90% accuracy) underperformed due to its strong independence assumption, which is violated by correlated radiomic features (e.g., GLCM and Haralick descriptors); it also showed the largest fold-to-fold variability (CV Std = 0.0235). AdaBoost obtained the lowest accuracy (43.03%) and macro-F1 (0.4391), indicating that weak learners such as shallow decision stumps are insufficient for complex texture feature interactions in medical imagery.

C. Cross-Validation Stability

Cross-validation standard deviation (CV Std) remained below 0.01 for the five top-performing algorithms, confirming strong consistency. The SVM and XGBoost demonstrated the lowest variance (0.0037 and 0.0038), implying stable generalization. Decision Tree, AdaBoost, and Naive Bayes exhibited higher variability (CV Std above 0.015), reflecting their sensitivity to feature distribution changes across folds. These findings reinforce the reliability of the reported averages.

D. Interpretability and Observations

Beyond quantitative performance, interpretability was a key consideration. SHAP (SHapley Additive exPlanations) analysis, conducted on Random Forest and XGBoost models, revealed that GLCM contrast, Haralick entropy, and Gabor filter responses were consistently among the top-ranked predictors. These features align with clinical understanding: hemorrhagic lesions typically exhibit higher textural heterogeneity compared to ischemic or normal regions. This consistency between ML feature importance and radiological intuition supports the framework’s explainability and trustworthiness.

E. Summary of Findings

In summary, SVM emerged as the best-performing model in terms of both accuracy and reliability, closely followed by KNN and Random Forest. Ensemble and kernel-based methods, AdaBoost excepted, proved more effective than linear and probabilistic models in capturing the high-dimensional structure of radiomic features. The results validate that classical ML algorithms, when combined with robust radiomics descriptors and standardized cross-validation, can deliver performance comparable to deep learning models without the need for GPUs or large-scale data augmentation. The overall results highlight the potential of explainable and computationally efficient ML systems in radiological image classification.

VI. DISCUSSION

The results show that radiomics paired with classical ML can deliver high multi-class performance on cerebrovascular CT without GPUs. SVM’s margin maximization in high-dimensional feature space likely explains its lead (97.79%/0.9783), while KNN and RF benefit from well-scaled, PCA-compressed texture–frequency descriptors, confirming that handcrafted features remain potent when carefully engineered. MLP and XGBoost provide competitive accuracy with low variance, supporting the view that modest non-linear capacity captures key interactions in radiomic space.

These findings align with CT radiomics literature in neuroimaging: SVM often excels on high-dimensional handcrafted features, and tree ensembles offer interpretability via feature attribution. Compared with CNN/transformer systems (e.g., SwinNeXt reported with the dataset), our CPU-only pipeline reaches comparable accuracy ranges while remaining transparent and fast to train, echoing recent reviews that emphasize explainability and reproducibility in clinical ML.

Important caveats remain. Image-level CV can overestimate patient-level generalization due to missing identifiers; we reduced sampling bias via repeated stratified CV and report results as slice-level performance. Single-center acquisition also limits protocol diversity. Future work should incorporate multisite data with harmonization (e.g., ComBat) and patient-wise evaluation, and study deep–radiomics hybrids to test complementarity between interpretable descriptors and learned embeddings.

VII. CONCLUSION

This study presents a comprehensive and interpretable framework for the multi-class classification of cerebrovascular CT lesions using radiomics-based features and classical machine learning algorithms. The approach leverages five major radiomic families to extract texture and frequency information from non-contrast CT images: Histogram of Oriented Gradients (HOG), Local Binary Patterns (LBP), Gray Level Co-occurrence Matrix (GLCM), Haralick features, and Gabor filters. Ten machine learning models were benchmarked under a uniform 5×3 Repeated Stratified K-Fold cross-validation strategy. Among them, the Support Vector Machine achieved the highest accuracy (97.79%) and macro-F1 score
(0.9783), outperforming other models while maintaining strong stability. The results confirm that handcrafted radiomics features, when coupled with well-structured classical ML, can provide high diagnostic accuracy without GPU acceleration. The proposed framework is reproducible, computationally efficient, and suitable for deployment or educational use in resource-constrained environments. Overall, this work establishes a strong baseline for explainable and sustainable AI systems in cerebrovascular imaging.

VIII. LIMITATIONS

Despite the encouraging performance of the proposed approach, several limitations must be acknowledged. The most significant limitation stems from the absence of patient identifiers in the public Kaggle dataset, which prevents patient-wise data splitting. Consequently, slices from the same individual may appear in both training and validation sets, introducing potential subject-level leakage. While repeated stratified cross-validation mitigates random sampling bias, the resulting metrics should be interpreted as image-level rather than patient-level performance. Additionally, the dataset originates from a single medical institution and imaging protocol, limiting diversity in acquisition parameters such as slice thickness and reconstruction kernel. As a result, model generalization to data from other centers or scanners cannot be guaranteed without further validation. Finally, the study focuses exclusively on classical machine learning methods, without comparing against modern transformer-based or CNN architectures, which may yield additional insights into deep-radiomic feature synergy.

IX. FUTURE WORK

Future research will focus on addressing the identified limitations and extending the framework toward clinical-grade validation. The first priority is to obtain or construct datasets with available patient identifiers to enable patient-level stratification and truly independent testing. Multi-institutional and multi-scanner data will be incorporated to assess cross-domain robustness and reduce overfitting to acquisition-specific characteristics. Feature harmonization methods, such as ComBat or z-score normalization, can be applied to balance radiomic distributions across centers. Another direction involves integrating handcrafted radiomic features with deep feature embeddings from lightweight CNN or transformer models, creating hybrid architectures that combine interpretability with enhanced accuracy. Finally, explainability techniques such as SHAP, LIME, and Grad-CAM will be explored to improve clinical transparency and trust, advancing the integration of interpretable AI in neuroimaging workflows.

ACKNOWLEDGMENT

The authors thank the faculty of the Manipal Institute of Technology, Bengaluru, for academic guidance, and the dataset contributors for making the data publicly available. Experiments were conducted using Google Colab (CPU-only).

REFERENCES

[1] R. M. Haralick, K. Shanmugam, and I. Dinstein, “Textural features for image classification,” IEEE Trans. Syst., Man, Cybern., vol. SMC-3, no. 6, pp. 610–621, 1973.
[2] T. Ojala, M. Pietikäinen, and D. Harwood, “A comparative study of texture measures with classification based on feature distributions,” Pattern Recognition, vol. 29, no. 1, pp. 51–59, 1996.
[3] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Proc. CVPR, 2005, pp. 886–893.
[4] J. G. Daugman, “Two-dimensional spectral analysis of cortical receptive field profiles,” Vision Research, vol. 20, no. 10, pp. 847–856, 1980.
[5] Z. Shi et al., “Machine learning-based radiomics in neurodegenerative and cerebrovascular disease: Current status and future directions,” Frontiers in Neuroscience, vol. 18, 2024. [Online]. Available: https://[Link]/articles/PMC11518692
[6] H. Zhang et al., “Radiomics and machine learning to predict malignant cerebral edema after reperfusion therapy in acute ischemic stroke,” Frontiers in Neurology, vol. 15, 2024. [Online]. Available: https://[Link]/articles/10.3389/fneur.2025.1650970/full
[7] W. Lyu et al., “CT-based radiomics model for differentiation of primary and secondary intracerebral hemorrhage,” Scientific Reports, vol. 13, no. 30678, 2023. [Online]. Available: https://[Link]/articles/s41598-023-30678-w
[8] Y. Sun et al., “Radiomics-clinical machine learning model predicts futile recanalization after endovascular treatment in acute ischemic stroke,” BMC Medical Imaging, vol. 24, no. 13, 2024. [Online]. Available: https://[Link]/articles/10.1186/s12880-024-01365-7
[9] G. Macin, I. Tasci, P. D. Barua, I. Sercek, B. Tasci, I. Tuncer, M. Ekmekyapar, S. Dogan, M. Baygin, T. Tuncer, and U. R. Acharya, “Deep learning-based classification of cerebrovascular lesions on CT images using a novel SwinNeXt architecture,” Kaggle Dataset, 2025. [Online]. Available: https://[Link]/datasets/buraktaci/cerebrovascular-lesions/data?select=cerebrovascular+lesions
[10] J. Wang, Y. Chen, Z. Liu, and X. Zhao, “Radiomics-based machine learning for multi-class intracranial lesion classification using CT and MRI fusion features,” Computers in Biology and Medicine, vol. 181, 107826, 2025. [Online]. Available: https://[Link]/10.1016/[Link].2025.107826
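
As a supplementary illustration of the patient-wise splitting discussed in Sections VIII and IX, the following minimal sketch shows how a group-aware split keeps all slices of one patient on the same side of every fold. It is not the pipeline used in this study: the dataset provides no patient identifiers, so the `patient_ids` list, the helper name `group_k_fold`, and the greedy fold-balancing rule are all assumptions made purely for illustration.

```python
from collections import defaultdict


def group_k_fold(groups, n_splits):
    """Yield (train_idx, val_idx) pairs in which no group spans both sides.

    groups[i] is the (hypothetical) patient identifier of image i.
    """
    members = defaultdict(list)
    for idx, group in enumerate(groups):
        members[group].append(idx)
    if n_splits > len(members):
        raise ValueError("n_splits cannot exceed the number of groups")

    # Greedy balancing: place the largest groups first, each into the
    # fold that currently holds the fewest images.
    folds = [[] for _ in range(n_splits)]
    for group in sorted(members, key=lambda g: (-len(members[g]), str(g))):
        min(folds, key=len).extend(members[group])

    for k in range(n_splits):
        val = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, val


# Toy example: 8 slices from 4 hypothetical patients.
patient_ids = ["p1", "p1", "p1", "p2", "p2", "p3", "p3", "p4"]
splits = list(group_k_fold(patient_ids, n_splits=2))

for train, val in splits:
    train_patients = {patient_ids[i] for i in train}
    val_patients = {patient_ids[i] for i in val}
    # An empty intersection means no subject-level leakage in this fold.
    print(sorted(train_patients), sorted(val_patients))
```

With real data, scikit-learn's `GroupKFold` or `StratifiedGroupKFold` would likely be a more practical choice, since the latter also tries to preserve class proportions across folds; the pure-Python version above only makes the grouping constraint explicit.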
