Autism Detection Hybrid ML Model

This document presents a hybrid machine learning model, BDML-MDCASD, designed to improve the accuracy and efficiency of Autism Spectrum Disorder (ASD) diagnosis by utilizing big data and advanced algorithms. The model incorporates an improved Squirrel Search Algorithm for feature selection and combines Autoencoder with the Butterfly Optimization Algorithm for enhanced classification, achieving a classification accuracy of 92%. The study highlights the potential of this model to automate and streamline ASD detection, addressing significant challenges in the healthcare industry related to data management and diagnosis.


Received 6 November 2024, accepted 5 December 2024, date of publication 18 December 2024,
date of current version 30 December 2024.
Digital Object Identifier 10.1109/ACCESS.2024.3520009

A Hybrid Machine Learning Model for Accurate Autism Diagnosis

DURGA PRASAD KAVADI 1, VENKATA RAMI REDDY CHIRRA 2, PALACHARLA RAVI KUMAR 3,
SAI BABU VEESAM 2, SAGAR YERUVA 4, AND LALITHA KUMARI PAPPALA 2
1 Department of CSE–AI & ML, DRK Institute of Science and Technology, Hyderabad 500043, India
2 School of Computer Science and Engineering, VIT-AP University, Amaravati 522241, India
3 Department of AI & ML, R.V.R. & J.C. College of Engineering, Guntur 522529, India
4 Department of CSE–AIML & IoT, VNR VJIET, Hyderabad 500090, India

Corresponding author: Venkata Rami Reddy Chirra ([email protected])

ABSTRACT The healthcare industry faces significant challenges in managing and processing large volumes
of unstructured, real-time medical data. As such, there is a growing need for advanced techniques to handle
complex data in the diagnosis of disorders like Autism Spectrum Disorder (ASD). This study presents
a Big Data and Machine Learning-based Medical Data Classification (BDML-MDCASD) model aimed
at improving the accuracy and efficiency of ASD diagnosis. The proposed model employs an improved
Squirrel Search Algorithm-based Feature Selection (ISSA-FS) to identify the most relevant features from
medical data. Additionally, a hybrid classification approach is introduced, combining Autoencoder (AE) with
the Butterfly Optimization Algorithm (BOA) to enhance detection accuracy. To manage and process large
datasets effectively, the MapReduce tool is used for efficient data handling. The model was evaluated across
multiple ASD datasets, including ASD-Children (292 instances), ASD-Adolescent (104 instances), and
ASD-Adult (704 instances). Simulation results demonstrate that the BDML-MDCASD model outperforms
traditional methods, achieving a classification accuracy of 92%, precision of 90%, and recall of 93%. These
results underscore the potential of the proposed model in providing a robust, automated solution for early
ASD detection, offering a significant advancement over existing diagnostic methods.

INDEX TERMS Auto encoder, autism spectrum disorder, big data, machine learning (ML), Butterfly
Optimization, Internet of Things (IoT), MapReduce.

The associate editor coordinating the review of this manuscript and approving it for publication
was Sotirios Goudos.
2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License.
For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 12, 2024 194911

I. INTRODUCTION
Artificial Intelligence (AI) and recent advancements in big data technologies have significantly
enhanced collaborative and interactive decision-making across various domains. However, in the
medical sector, the integration of AI is still in its early stages [1]. This early-stage integration
is largely due to the fragmented nature of the medical system, which inherently drives up costs and
complexity, leading to a lack of collaboration among stakeholders and misaligned interests.
Furthermore, issues such as poor platform interoperability and time-consuming manual processes
impose unnecessary strain and create a substantial administrative burden. In contrast, other
sectors have progressively mitigated inefficiencies by leveraging AI technology [2]. Incorporating
AI into healthcare could reduce unnecessary costs, increase efficiency, improve clinician
decision-making, and enhance the quality of medical services. Nonetheless, the development of
cost-effective technologies is crucial and directly tied to resource consumption.
In the early stages of exploratory data analysis, there is often uncertainty regarding the
resources required by specific analytical tools to manage complex and disparate data. These tools,
which may be associated with significant costs, often require specialized software to interconnect
various file formats and databases [3]. Advanced computing power, such as graphical processing
units (GPUs), is necessary for handling rapid data processing. Big data techniques
can address these limitations by integrating heterogeneous data from multiple sources, including
sensors, electronic health records (EHRs), and more, thereby enabling faster and more accurate
healthcare data analysis for early disease detection [4]. Big data can also facilitate precision
medicine, an emerging tool that helps transplant surgeons select the most suitable organs for
recipients.
Autism Spectrum Disorder (ASD) is a serious neurodevelopmental disorder affecting 1–3% of the
population [5]. It has a profound and lasting impact, leading to significant societal and personal
costs. Given the substantial impact and associated costs of ASD, identifying high-risk groups and
understanding ASD etiology are critical. Although the precise causes of ASD are not fully
understood, it is believed to be multifactorial, involving genetic, behavioral, and environmental
risk factors [6]. A family history of autism is a known risk factor, suggesting the influence of
genetic factors as well as shared environmental, nutritional, and social risks. Due to the
heterogeneous nature and high prevalence of ASD, many researchers favor machine learning (ML) over
traditional statistical methods for data analysis. ML, a branch of AI, focuses on pattern
recognition and inductive reasoning by extracting common patterns and rules from large datasets to
generate new knowledge [7], [8]. The ability of ML to process data has already had a considerable
impact on fields like customer behavior analytics.
This study presents a big data and machine learning-based medical data classification
(BDML-MDCASD) model for diagnosing ASD. The BDML-MDCASD model includes an improved squirrel search
algorithm-based feature selection (ISSA-FS) technique to identify the optimal subset of features.
Additionally, the butterfly optimization algorithm (BOA) combined with an Autoencoder (AE) model is
used for the detection and classification of ASD. The MapReduce tool is employed to manage big data
throughout the ASD diagnosis process. A series of simulations on benchmark datasets were conducted
to evaluate the performance of the BDML-MDCASD technique.

A. MOTIVATION
The exponential growth of data generated by IoT devices in the healthcare industry poses
significant challenges in terms of data storage, management, and processing, especially for
unstructured and real-time data. In the context of autism spectrum disorder (ASD), which affects
communication and emotional development, there is a pressing need for automated, efficient
diagnostic solutions. Existing healthcare systems lack tools that can effectively handle large
volumes of healthcare data while providing accurate and timely ASD diagnoses. This study is
motivated by the demand for innovative solutions that leverage big data environments and machine
learning models to improve the efficiency of ASD detection and provide a robust, automated approach
to support clinical decisions.

B. CONTRIBUTIONS
• Development of a big data and machine learning-based medical data classification model
  (BDML-MDCASD) for automated ASD diagnosis.
• Introduction of an improved Squirrel Search Algorithm (ISSA-FS) for optimal feature selection,
  enhancing the efficiency of ASD classification.
• Proposal of a novel classification approach using Autoencoder (AE) and Butterfly Optimization
  Algorithm (BOA) to improve the accuracy of ASD diagnosis.
• Utilization of the MapReduce tool to effectively manage big data, ensuring faster and more
  accurate processing in the ASD detection process.
The remainder of this paper is structured as follows: Section II covers related work on the
BDML-MDCASD technique, Section III presents the proposed model, Section IV discusses the results
and analysis, and finally, Section V concludes the paper.

II. RELATED WORK
Several studies have explored the use of machine learning (ML) and deep learning (DL) methods to
diagnose autism spectrum disorder (ASD), focusing on diverse data types and classification
techniques. Kashef [9] employed a deep learning method to identify ASD patients using brain imaging
data from the ABIDE (Autism Brain Imaging Data Exchange) dataset. The study utilized a
convolutional neural network (CNN) framework to investigate functional connectivity patterns among
various brain regions, successfully recognizing specific patterns associated with ASD diagnosis.
In another study, Omar et al. [10] examined various tree-based machine learning techniques for
predicting autism behaviors across different age groups. The research compared distinct tree-based
approaches for developing predictive models and evaluated their performance using two separate
datasets. The study culminated in the development of a novel tree-based method, which integrates
Regression and Classification Trees with Iterative Dichotomiser 3 in a Random Forest (RF)
classification model to enhance prediction accuracy.
Amador et al. [11] explored the application of data mining (DM) and machine learning techniques for
early ASD diagnosis. Their method included several steps: (1) loading, extracting, and transforming
data; (2) selecting and searching relevant data sources; (3) visualizing results; (4) creating a
data mart and warehouse; and (5) applying machine learning algorithms to extract useful patterns
for accurate ASD categorization. This structured approach demonstrated the utility of DM and ML
methods in extracting valuable information for ASD diagnosis.
Eslami et al. [12] proposed a hybrid approach combining conventional ML and DL techniques to
identify ASD biomarkers from MRI datasets. Their model, named Auto-ASD-Network, integrates deep
learning with support vector

machines (SVM) to classify ASD images from neurotypical ones, demonstrating the efficacy of
combining DL and traditional ML methods for image-based ASD diagnosis.
Min [13] developed an architecture designed to detect, record, and label the behavioral patterns of
children with ASD using both static and wearable sensors. Static sensors, such as cameras and
microphones, capture video, sound, and images of the subject, while wearable sensors, such as
accelerometers, detect behavioral patterns. This multimodal approach improves the accuracy and
comprehensiveness of behavioral pattern recognition in ASD patients.
Recent research has also focused on feature selection techniques for ASD classification. Krishnan
et al. [24] and Natarajan et al. [25] proposed deep learning-based feature selection methods to
enhance the classification of ASD. Parikh et al. [26] and Omar et al. [27] introduced machine
learning models for ASD classification, while Reghunathan et al. [30] proposed a machine learning
model tailored for ASD classification across different age groups. Additionally, Punia et al. [28]
and Prasad et al. [29] developed related machine learning models for health diagnosis, contributing
to advancements in automated ASD detection.
Andrade et al. [14] developed a hybrid model based on machine learning to derive insights using
Verbal Decision Analysis, a multi-criteria decision support system (DSS) technique. Their model
effectively applied the ICD-10 protocol, allowing for more agile ASD diagnosis by identifying even
minor symptoms.
Furthermore, Raj and Masood [15] evaluated the feasibility of using various machine learning
classifiers, including Naive Bayes (NB), Support Vector Machines (SVM), Logistic Regression (LR),
K-Nearest Neighbors (KNN), Neural Networks (NN), and CNN, for predicting ASD issues in adults,
children, and adolescents. Their work demonstrated the potential of these models, particularly when
evaluated on three distinct open-source non-clinical ASD datasets.

III. THE PROPOSED MODEL
In this proposed study, a new IoTC-DTLDRC approach has been presented for the recognition and
classification of different DR stages in the IoT-assisted cloud framework. The developed
IoTC-DTLDRC technique encompasses several subprocesses such as BF-based preprocessing, region
growing segmentation, EfficientNet-based feature extraction, ICSA-based hyper-parameter tuning, and
classification (XGBoost and Adaboost). Figure 1 displays the working process of the developed
IoTC-DTLDRC technique. The details related to each module are elaborated in the subsections that
follow.

FIGURE 1. Workflow of the IoTC-DTLDRC approach for DR stage recognition in IoT-Assisted cloud
framework.

A. NOVELTY OF THE WORK
A series of simulations were carried out on benchmark datasets to ensure the BDML-MDCASD
technique's improvement, and the comparison results demonstrated its superiority to other
approaches. The MapReduce tool was used to handle big data ASD, while the BDML-MDCASD technique
comprises major subprocesses such as preprocessing, ISSA-based feature selection, AE-based
classification, and BOA-based parameter optimization. We strongly believe that the BDML-MDCASD
technique will be the most prominent approach for identifying and classifying ASDs.

B. MapReduce TOOL
MapReduce is an essential framework for distributed data processing, mainly for handling
large-scale datasets. In our ASD diagnosis model, it has been effective in handling the extensive
healthcare data associated with ASD diagnosis, where data processing calls for scaling,
parallelization, and fault tolerance.
Since our model involves large and complicated datasets, integrating varied clinical features
across different age groups, traditional approaches may run up against both memory and time
constraints. This can be overcome by applying the MapReduce approach, since large data can be
broken into smaller, manageable chunks that are processed in parallel across a distributed system.

1) DATA PARTITIONING AND PROCESSING
The dataset, which contains behavioral features and the diagnostic labels, is split into small
blocks. The mappers handle these blocks in parallel. Each mapper applies customized logic tailored
to particular features that are relevant to ASD diagnosis, for example age-based data clustering or
feature selection based on relevance to diagnosis. Different mappers work on different blocks, so
that all the age-specific patterns in the ASD dataset are captured.

2) INTERMEDIATE OUTPUT AND DISK WRITING
The mapper generates intermediate outputs, such as selected features and computed statistics, which
are written to the local disk rather than to HDFS. This avoids duplicate storage, since the results
are held only temporarily before being transferred for further processing.

3) DATA SHUFFLING AND COPYING
The outputs of the mappers are shuffled and copied to reducer nodes within the distributed system.
At the reducer stage, the processed outputs from different mappers are merged. For instance, if the
mappers perform feature selection or classification, the reducer combines these outputs in order to
refine the final decision.

4) MERGING, SORTING, AND AGGREGATION
Once the reducer nodes receive the data from the mappers, they sort and merge the results by
relevant diagnostic criteria, for example classification accuracy and feature importance. The
sorted data is then passed to the final aggregation step, where the ASD diagnostic predictions are
refined and validated against the combined inputs of all the age groups involved.

5) FINAL REDUCING AND OUTPUT GENERATION
The reduce phase processes the aggregated data, and the final results produced by the tasks are
stored in HDFS. These results represent the processed, complete view of the ASD diagnosis across
the full dataset, including age-related variations.
MapReduce is used in our approach primarily to handle large volumes of data from multiple sources
efficiently and accurately. Distributing the workload across multiple nodes reduces the computation
per node considerably, which shortens the processing time. Parallel processing also ensures that
the model scales appropriately and can handle real-time data streams from clinical sources without
a detectable degradation of performance.

C. DATA PREPROCESSING
The three main preprocessing steps for patient data were format conversion, handling missing
values, and class labeling. Initially, data in .arff format was converted to a compatible .csv
format. Missing values were filled using the median. Finally, class labeling was applied to map the
data's class labels to ASD.

D. ALGORITHMIC PROCESS IN ISSA-FS TECHNIQUE
After the data preprocessing stage, the optimal feature subsets are selected using the ISSA-FS
technique. SSA updates the location of the squirrels based on the current season, the type of food,
and the presence of predators.

Algorithm 1 Butterfly Optimization Algorithm (BOA)
Input: Dim: Number of dimensions
Input: Max_Iter: Maximum number of iterations
Input: curr_Iter: Current iteration
Input: Objective Function: Function to be optimized
Input: X: Primary population
Input: c: Sensory modality
Input: I: Stimulus intensity
Input: p: Switch probability
Output: g*: Optimal butterfly
Initialize: Create a uniformly distributed solution X = (x1, x2, ..., xn);
    Determine sensory modality c, stimulus intensity I, and switch probability p;
    Compute stimulus intensity Ii at xi using f(xi);
while curr_Iter < Max_Iter do
    for all butterflies in X do
        Compute fragrance using Eq. (17);
    end
    g* = best butterfly;
    for all butterflies in X do
        r = rand();
        if r < p then
            Update butterfly position using Eq. (18);
        else
            Update butterfly position using Eq. (19);
        end
    end
    Update value of a;
    Increment curr_Iter;
end
return g*

Algorithm 1 describes the Butterfly Optimization Algorithm (BOA) [19], a metaheuristic inspired by
the foraging behavior of butterflies. The algorithm begins by initializing a population of
butterflies with random positions across the search space. Each butterfly represents a potential
solution, and its quality is evaluated using the objective function.
During the optimization process, each butterfly's position is updated based on its fragrance, which
is proportional to its fitness. The algorithm utilizes both global and local search strategies to
explore the solution space effectively. In the global search phase, butterflies are attracted
towards the best solution found so far, while in the local search phase, their positions are
updated based on a probabilistic switching mechanism.
The algorithm iterates until a specified number of iterations is reached. At each iteration, the
fragrance of each butterfly is recalculated, and their positions are updated accordingly. The best
solution found throughout the iterations is returned as the optimal solution.
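As a rough illustration of the loop in Algorithm 1, the sketch below minimizes a toy sphere function using the fragrance and position-update equations given later in this paper (Eqs. (17) to (19)). It rests on several assumptions that are not taken from the paper: the objective is non-negative and is used directly as the stimulus intensity I, the parameter values c = 0.01, a = 0.1, and p = 0.8 are illustrative defaults, the exponent a is held fixed rather than updated each iteration, and positions are clamped to the search bounds. The function names `sphere` and `boa` are hypothetical.

```python
import random


def sphere(x):
    """Toy non-negative objective (an assumption): sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


def boa(objective, dim, bounds, n_butterflies=20, max_iter=200,
        c=0.01, a=0.1, p=0.8, seed=0):
    """Minimal Butterfly Optimization Algorithm sketch for minimization."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Uniformly distributed initial population.
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_butterflies)]
    fit = [objective(x) for x in pop]
    best_idx = min(range(n_butterflies), key=fit.__getitem__)
    best, best_fit = pop[best_idx][:], fit[best_idx]

    for _ in range(max_iter):
        # Fragrance pf_i = c * I^a, with the stimulus intensity I taken as the fitness.
        frag = [c * (f ** a) for f in fit]
        for i in range(n_butterflies):
            r = rng.random()
            if rng.random() < p:
                # Global search: move towards the best butterfly g*.
                new = [x + (r * r) * (g - x) * frag[i] for x, g in zip(pop[i], best)]
            else:
                # Local search: step along the difference of two random butterflies.
                j, k = rng.sample(range(n_butterflies), 2)
                new = [x + (r * r) * (xj - xk) * frag[i]
                       for x, xj, xk in zip(pop[i], pop[j], pop[k])]
            new = [min(hi, max(lo, v)) for v in new]  # keep inside the search bounds
            pop[i], fit[i] = new, objective(new)
            if fit[i] < best_fit:
                best, best_fit = new[:], fit[i]
    return best, best_fit


best, best_fit = boa(sphere, dim=5, bounds=(-5.0, 5.0))
```

In the BOA-AE setting the objective would be the classifier error rate for a candidate set of AE parameters rather than the sphere function; that substitution is not shown here.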

This approach leverages the balance between exploration and exploitation to find high-quality
solutions in complex optimization problems.
Consider the total squirrel population as $N$, and let the maximum and minimum limits of the search
area be $FS_u$ and $FS_l$, respectively. $N$ squirrels are randomly initialized using Eq. (1):

\[ FS_i = FS_l + \mathrm{rand}(1, D) \times (FS_u - FS_l) \tag{1} \]

where $FS_i$ represents the $i$-th individual, $i = 1, \ldots, N$; rand is a random number between
0 and 1, and $D$ is the dimension. The squirrels update their positions by sliding towards
hickory/acorn trees. The update process can be represented as:

\[ FS_i^{t+1} = \begin{cases} FS_i^t + d_g \cdot G_c \times (F_h^t - FS_i^t), & \text{if } r > P_{dp} \\ \text{random location}, & \text{otherwise} \end{cases} \tag{2} \]

\[ FS_i^{t+1} = \begin{cases} FS_i^t + d_g \cdot G_c \times (F_a^t - FS_i^t), & \text{if } r > P_{dp} \\ \text{random location}, & \text{otherwise} \end{cases} \tag{3} \]

Here, $r$ is a random number between 0 and 1, $P_{dp}$ represents the predator appearance
probability, and $G_c$ is a constant. The gliding distance $d_g$ is given by Eq. (4):

\[ d_g = h_g \cdot \tan(\theta) \cdot sf \tag{4} \]

where $h_g$ and $sf$ are constants, and the gliding angle $\theta$ is determined by Eq. (5):

\[ \tan(\theta) = \frac{D}{L} \tag{5} \]

with $D$ being the drag force and $L$ the lift force, calculated by Eqs. (6) and (7):

\[ D = \frac{1}{2} \rho V^2 S C_D \tag{6} \]

\[ L = \frac{1}{2} \rho V^2 S C_L \tag{7} \]

$\rho$, $V$, $S$, and $C_D$ are constants. Initially, the SSA requires the entire population to be
in the winter season [17]. Each squirrel gets updated, and the season change is checked using
Eqs. (8) and (9):

\[ S_c^t = \sum_{k=1}^{D} \left( F_{a_i,k}^t - F_{h,k}^t \right)^2, \quad i = 1, 2, \ldots, N_{fs} \tag{8} \]

\[ S_{\min} = \frac{10\mathrm{e}{-6}}{(365)^{\,i/(T/2.5)}} \tag{9} \]

where $T$ represents the maximum number of iterations. If $S_c^t < S_{\min}$, winter ends, and the
season changes to summer; otherwise, it remains the same. When the season changes to summer, every
squirrel that glided to $F_h$ stops at an updated location. The squirrels that glided to $F_a$ but
failed to encounter predators move to the respective location using Eq. (10):

\[ FS_i^{new} = FS_l + \text{Lévy} \times (FS_u - FS_l) \tag{10} \]

Lévy indicates a random walk approach in which the step satisfies the Lévy distribution, as given
in Eq. (11):

\[ \text{Lévy}(x) = 0.01 \times \frac{r_a}{|r_b|^{1/\beta}} \tag{11} \]

where $\beta$ is a constant that can be determined using Eq. (12):

\[ \beta = \frac{\Gamma(1+\beta) \times \sin\left(\frac{\pi\beta}{2}\right)}{\Gamma\left(\frac{1+\beta}{2}\right) \times \beta \times 2^{(\beta-1)/2}} \tag{12} \]

Once a flying squirrel creates novel locations, its natural behavior is influenced by predator
existence, controlled by the predator existence probability $P_{dp}$. In the initial search phase,
the flying squirrel population is usually far from the food source, and their distribution range is
large, meaning they face a higher threat from predators. As the process evolves, the flying
squirrel positions get closer to food sources (i.e., better solutions), reducing the predator
threat. To enhance the exploitation capability of SSA, an adaptive $P_{dp}$, which dynamically
varies as a function of the iteration number, is implemented:

\[ P_{dp} = \left( P_{dp\max} - P_{dp\min} \right) \times \left( 1 - \frac{Iter}{Iter_{\max}} \right)^{10} + P_{dp\min} \tag{13} \]

where $P_{dp\max}$ and $P_{dp\min}$ refer to the maximal and minimal predator occurrence
probabilities, respectively.
During the FS process in the ISSA-FS technique, if the feature vector size is $N$, the number of
possible feature combinations is $2^N$, which represents a large search space. The proposed hybrid
approach dynamically reduces the feature space, selecting the required group of features. Due to
the multi-objective nature of the problem, FS minimizes the subset of features while maximizing
classifier accuracy. The fitness function is defined to balance these two objectives, as given in
Eq. (14):

\[ \text{fitness} = \Delta_R(D) + \alpha |Y| \tag{14} \]

where $\Delta_R(D)$ represents the classification error rate, $|Y|$ refers to the subset size
selected by the technique, and $|T|$ is the total number of features in the dataset. The parameter
$\alpha \in [0, 1]$ adjusts the balance between the classifier's accuracy and feature reduction,
while $\beta = 1 - \alpha$ emphasizes feature reduction. The classifier's accuracy is weighted to
prioritize dimensionality reduction in the optimization process.

E. ALGORITHMIC PROCESS IN BOA-AE TECHNIQUE
At the final stage, the BOA-AE-based classification model assigns the correct classes to the input
data. The core of this model uses deep learning (DL), specifically deep neural networks (DNNs) [31]
or multilayer perceptrons (MLPs) [32], to represent a complex function mapping the input data
$x \in \mathbb{R}^{d_{in}}$ to the output $y \in \mathbb{R}^{d_{out}}$. The classical DNN consists
of input, output, and $L$ hidden layers. Each hidden layer transforms the output of the previous
layer using

two operations: an affine mapping followed by a non-linear activation function, as shown below:

\[ x^{(l)} = \sigma\left( W^{(l)} x^{(l-1)} + b^{(l)} \right), \quad l = 1, \ldots, L \]

The rest of the SSA process follows as described in [17].

FIGURE 2. Workflow of the BOA approach for DR stage recognition in IoT-Assisted cloud framework.

F. ALGORITHMIC PROCESS IN BOA-AE TECHNIQUE
To optimally select the parameters in the AE model, the BOA is applied. The fitness function in BOA
is defined as the minimization of classification error, as described by Eq. (20).
Autoencoders (AEs) [33] are an unsupervised learning approach where the DNN framework is leveraged
for dimensionality reduction or representation learning. Specifically, the goal of an AE is to
optimally copy its input to output using a lower-dimensional representation by establishing a
low-dimension embedded layer. An AE consists of two parts: an encoding function
$h_{enc}(\cdot\,; \mathrm{enc}) : \mathbb{R}^d \to \mathbb{R}^p$ and a decoding function
$h_{dec}(\cdot\,; \mathrm{dec}) : \mathbb{R}^p \to \mathbb{R}^d$. The AE is defined as follows:

\[ x' = h_{dec}(h_{enc}(x)) := h_{dec}(h_{enc}(x; \mathrm{enc}); \mathrm{dec}) \tag{15} \]

\[ := h_{dec}(h_{enc}(x; \mathrm{enc}); \mathrm{dec}) \tag{16} \]

where $p < d$ is the embedded dimensionality, and enc and dec represent the DNN parameters of the
encoding and decoding parts, respectively. $x'$ stands for the output of the AE, which is a
reformulated version of the input $x$. By having the latent dimension $p$ much smaller than the
input dimension $d$, the encoding function $h_{enc}$ is trained to learn compressed representations
of $x$, denoted as the embedded $x' \in \mathbb{R}^p$. The decoding function $h_{dec}$ then
reconstructs the input data by mapping the embedded representation back to the high-dimensional
space. The application of AE is based on manifold theory, which assumes that the high-dimensional
input data, denoted as $E$, lies on a low-dimensional manifold $E'$ that is embedded within the
high-dimensional vector space.

G. FLOWCHART OF BOA
To optimally select the parameters in the AE model, the BOA is applied, which enhances the
classifier's performance. Butterflies use their sense of taste, smell, and sight to locate food or
mating partners. BOA, introduced by Arora and Singh [19], is a nature-inspired optimization
technique based on butterfly foraging behavior. Biologically, butterflies have sensory receptors
distributed throughout their bodies, assumed to act as chemoreceptors, used to sense food or flower
fragrances. In BOA, all butterflies are considered to emit a fragrance with a certain intensity. A
butterfly capable of sensing the fragrance of an optimal butterfly moves towards it. If a butterfly
cannot sense any fragrance, it moves randomly through the search space. In BOA, the fragrance is
computed as a function of physical intensity, as given by Eq. (17). The global search (exploration)
and local search (exploitation) stages, referred to as upgrading butterflies and updating butterfly
positions, are represented by Eqs. (18) and (19), respectively. Figure 2 illustrates the flowchart
of BOA.

\[ pf_i = c I^a \tag{17} \]

\[ x_i^{t+1} = x_i^t + r^2 (g^* - x_i^t)\, pf_i \tag{18} \]

\[ x_i^{t+1} = x_i^t + r^2 (x_j^t - x_k^t)\, pf_i \tag{19} \]

where $pf_i$ represents the perceived fragrance by another butterfly, $c$ refers to the sensory
modality, and $I$ and $a$ signify the stimulus intensity and power exponent, respectively [20].
The BOA-AE technique computes a fitness function, which defines a positive integer representing the
optimal result of the candidate solution. In this context, the fitness function indicates the
minimization of the classification error rate, as defined in Eq. (20). A candidate with a better
solution will have a lower error rate and vice versa.

\[ \text{fitness}(x_i) = \text{ClassifierErrorRate}(x_i) = \frac{\text{Number of misclassified documents}}{\text{Total number of documents}} \times 100 \tag{20} \]

IV. RESULTS AND DISCUSSION
In this section, the experimental results of the BDML-MDCASD technique are analyzed. The results are
evaluated using three datasets, namely ASD-Children [21], ASD-Adolescent [22], and ASD-Adult [23],
with 292, 104, and

704 instances respectively. All datasets contain 21 attributes.


Table 1 and Fig. 3 show the best cost (BC) analysis of the
ISSA-FS technique compared to other existing techniques.
The results demonstrate that both the GWO and PSO
algorithms failed to achieve effective feature selection (FS)
results, with maximum BC values of 0.6523 and 0.7891,
respectively. The QODF algorithm achieved a moderately
reduced BC of 0.3127. However, the proposed ISSA-FS
technique selected 9 features with the lowest BC of 0.2867.

TABLE 1. Selected features of existing methods compared to ISSA-FS method.

FIGURE 3. BC analysis of ISSA-FS technique compared with existing methods.

The confusion matrix derived by the BDML-MDCASD technique on the three ASD datasets is shown in
Fig. 4. For the ASD-Children dataset, the BDML-MDCASD classified 139 instances as ASD-positive and
149 as ASD-negative. Similarly, for the ASD-Adolescent dataset, 185 instances were classified as
ASD-positive and 509 as ASD-negative. For the ASD-Adult dataset, the BDML-MDCASD technique
classified 63 instances as ASD-positive and 40 as ASD-negative.
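Confusion-matrix counts like these determine the usual summary metrics directly. The sketch below is a generic calculation, not the authors' evaluation code, and the function name `binary_metrics` is hypothetical. The example call reads the ASD-Children counts as 139 true positives and 149 true negatives and assumes that the remaining 4 of the 292 instances split into 2 false negatives and 2 false positives; that split is an assumption made for illustration, since the text does not state the off-diagonal cells.

```python
def binary_metrics(tp, fn, fp, tn):
    """Summary metrics for a 2x2 confusion matrix (positive class = ASD)."""
    total = tp + fn + fp + tn
    sensitivity = tp / (tp + fn)            # recall on the positive class
    specificity = tn / (tn + fp)            # recall on the negative class
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / total
    f_score = 2 * precision * sensitivity / (precision + sensitivity)
    # Cohen's kappa: agreement beyond what the marginal totals give by chance.
    expected = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (total * total)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "accuracy": accuracy,
        "f_score": f_score,
        "kappa": kappa,
    }


# Assumed off-diagonal split (2 false negatives, 2 false positives); see the note above.
children = binary_metrics(tp=139, fn=2, fp=2, tn=149)
```

Under that assumed split, the sensitivity, specificity, and accuracy returned by the function round to 98.58%, 98.68%, and 98.63%.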

TABLE 2. Result analysis of BDML-MDCASD method on applied datasets.

Table 2 shows the overall classification results of the
BDML-MDCASD method on the three ASD datasets.
Fig. 5 presents the sensitivity, specificity, F-Score, and
Kappa analysis of the BDML-MDCASD technique. For the
ASD-Children dataset, the BDML-MDCASD demonstrated
effective results with sensitivity, specificity, F-Score, and
Kappa values of 98.58%, 98.68%, 98.58%, and 98.21%,
respectively. Similar performance was observed for the
ASD-Adolescent and ASD-Adult datasets.

FIGURE 4. Confusion Matrix: a) ASD-Children Dataset, b) ASD-Adolescent
Dataset, c) ASD-Adult Dataset.

Fig. 6 showcases the overall accuracy analysis of the
BDML-MDCASD technique on the three ASD datasets. The
BDML-MDCASD technique classified ASD with accuracies
of 98.63%, 98.58%, and 99.04% for the ASD-Children,
ASD-Adolescent, and ASD-Adult datasets, respectively.

TABLE 3. Comparative analysis of BDML-MDCASD method with existing
techniques.

Fig. 3 provides a comparison of the BDML-MDCASD
technique with existing methods. The DT model showed
the least classification performance, while the LR model
performed slightly better. The NN model achieved further
improvement over the LR and DT models, but not as much
as other methods. The FS-DSAN model achieved moderately
competitive results. However, the BDML-MDCASD
technique demonstrated superior classification performance,
particularly with the ASD-Children dataset, where it
obtained sensitivity, specificity, F-Score, and Kappa values
of 0.9858, 0.
The results in Table 3 show that the BDML-MDCASD
technique achieved superior results across all evaluation met-
rics compared to existing methods. Specifically, the technique
demonstrated a sensitivity of 100% on the ASD-Adult dataset
and consistently higher accuracy on all datasets.
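
To put "consistently higher accuracy" in concrete terms, the per-dataset accuracies this paper reports for FS-DSAN and BDML-MDCASD can be subtracted directly. The sketch below is simple arithmetic on those reported figures; the helper name `accuracy_margins` is illustrative.

```python
DATASETS = ("ASD-Children", "ASD-Adolescent", "ASD-Adult")

# Accuracies (%) as reported in this paper's comparative analysis.
fs_dsan = dict(zip(DATASETS, (97.60, 97.87, 97.12)))
bdml_mdcasd = dict(zip(DATASETS, (98.63, 98.58, 99.04)))


def accuracy_margins(proposed, baseline):
    """Return the per-dataset accuracy gain in percentage points."""
    return {name: round(proposed[name] - baseline[name], 2) for name in proposed}


if __name__ == "__main__":
    for name, gain in accuracy_margins(bdml_mdcasd, fs_dsan).items():
        print(f"{name}: +{gain:.2f} percentage points")
```

The gains over FS-DSAN are on the order of one to two percentage points per dataset, so the margin over the strongest baseline is much smaller than the margin over the DT, LR, and NN models.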

VOLUME 12, 2024 194917


D. P. Kavadi et al.: Hybrid Machine Learning Model for Accurate Autism Diagnosis

FIGURE 5. Performance analysis of BDML-MDCASD method.

FIGURE 6. Accuracy analysis of BDML-MDCASD method on three datasets.

Fig. 7 showcases the accuracy analysis of the BDML-MDCASD
technique on the ASD datasets. The BDML-MDCASD
technique achieved accuracy values of 98.63%,
98.58%, and 99.04% for the ASD-Children, ASD-Adolescent,
and ASD-Adult datasets, respectively. These
results indicate the effectiveness of the BDML-MDCASD
method in accurately classifying ASD.

FIGURE 7. Accuracy analysis of BDML-MDCASD on ASD datasets.

A. LOSS ANALYSIS OF BDML-MDCASD
Fig. 8 presents the loss analysis of the BDML-MDCASD
technique. The method achieved minimal validation and
training losses, indicating that the model was able to learn
effectively from the training data and avoid overfitting.

FIGURE 8. Loss analysis of BDML-MDCASD on ASD datasets.

The loss values are significantly lower compared to
other models, suggesting that the BDML-MDCASD method
converges faster and generalizes better across datasets.

B. COMPARATIVE ANALYSIS OF BDML-MDCASD WITH
EXISTING METHODS
In order to further validate the performance of the BDML-MDCASD
technique, a comparative analysis with recent
methods is shown in Fig. 9. The DT model showcased the
least classification performance, achieving an accuracy of
54.70%, while the LR model performed slightly better with
an accuracy of 59.10%. The NN model achieved a moderate
improvement, reaching 62.00% accuracy. The FS-DSAN


model provided competitive results with accuracy values
of 97.60%, 97.87%, and 97.12% for the ASD-Children,
ASD-Adolescent, and ASD-Adult datasets, respectively.
However, the BDML-MDCASD technique outperformed all
the existing models, achieving superior accuracy of 98.63%,
98.58%, and 99.04% on the same datasets.

FIGURE 9. Comparative analysis of BDML-MDCASD with existing
methods.

These results emphasize the effectiveness of the
BDML-MDCASD technique in providing enhanced classification
accuracy for ASD detection, outperforming state-of-the-art
methods across all datasets.

FIGURE 10. Accuracy comparison of the BDML-MDCASD method with
existing methods.

Figure 10 provides a comparison of the BDML-MDCASD
technique with recent approaches in terms of accuracy.
The figure illustrates that the Decision Tree (DT), Logistic
Regression (LR), and Neural Network (NN) models showed
minimum accuracy values of 0.5470, 0.5910, and 0.6200,
respectively. The FS-DSAN model achieved enhanced outcomes,
with accuracy values of 0.9760, 0.9787, and 0.9712 on
the test ASD-Children, ASD-Adolescent, and ASD-Adult
datasets, respectively.
However, the presented BDML-MDCASD technique
demonstrated superior results, with accuracy values of
0.9863, 0.9858, and 0.9904 on the same test datasets.
These results highlight that the BDML-MDCASD technique
has proven to be an effective method for ASD detection
and classification, outperforming existing methods across
multiple datasets.

V. CONCLUSION AND FUTURE DIRECTIONS
In this study, a novel BDML-MDCASD technique has been
presented for the automated and early detection of ASD
in a big data environment. To efficiently handle big data
in ASD diagnosis, the MapReduce tool is employed. The
BDML-MDCASD technique comprises several key subprocesses,
including data pre-processing, Improved Sparrow
Search Algorithm (ISSA) based feature selection, Autoencoder
(AE) based classification, and Butterfly Optimization
Algorithm (BOA) for parameter optimization. The proposed
BDML-MDCASD technique has achieved superior ASD
classification performance due to the efficient utilization of
ISSA for feature selection and BOA for parameter tuning. To
validate the performance of the BDML-MDCASD technique,
a series of simulations were conducted on benchmark
datasets. Comparative results demonstrated the superiority of
the BDML-MDCASD technique over other existing methods.
Therefore, the BDML-MDCASD technique emerges as a
promising approach for ASD detection and classification,
with potential for further development.
While the BDML-MDCASD technique has shown remarkable
results, there are several avenues for future research
to improve the system further. First, incorporating outlier
detection mechanisms can help identify anomalies in the
dataset, which could enhance the robustness of the model.
Additionally, clustering processes could be integrated to
group ASD-related data into distinct categories, enabling
more granular classification and diagnosis. Exploring alternative
deep learning techniques and feature extraction methods
could also help in improving the model’s adaptability across
diverse datasets. Future work can focus on real-time ASD
detection using edge computing and distributed learning,
which could extend the applicability of this model in practical
healthcare environments.

REFERENCES
[1] R. Patan, G. Nagasubharmanian, and B. Balusamy, “Big data and IoT: Trends, issues and applications,” Recent Adv. Comput. Sci. Commun., vol. 13, no. 6, pp. 1251–1252, Jan. 2021.
[2] M. V. Lombardo, M.-C. Lai, and S. Baron-Cohen, “Big data approaches to decomposing heterogeneity across the autism spectrum,” Mol. Psychiatry, vol. 24, no. 10, pp. 1435–1450, Oct. 2019, doi: 10.1038/s41380-018-0321-0.


[3] S. Neelakandan, S. Divyabharathi, S. Rahini, and G. Vijayalakshmi, “Large scale optimization to minimize network traffic using MapReduce in big data applications,” in Proc. Int. Conf. Comput. Power, Energy Inf. Communication (ICCPEIC), Melmaruvathur, India, Apr. 2016, pp. 193–199, doi: 10.1109/ICCPEIC.2016.7557196.
[4] L. Ejlskov, J. N. Wulff, A. Kalkbrenner, C. Ladd-Acosta, M. D. Fallin, E. Agerbo, P. B. Mortensen, B. K. Lee, and D. Schendel, “Prediction of autism risk from family medical history data using machine learning: A national cohort study from Denmark,” Biol. Psychiatry Global Open Sci., vol. 1, no. 2, pp. 156–164, Aug. 2021, doi: 10.1016/j.bpsgos.2021.04.007.
[5] R. Manikandan, R. Patan, A. H. Gandomi, P. Sivanesan, and H. Kalyanaraman, “Hash polynomial two factor decision tree using IoT for smart health care scheduling,” Exp. Syst. Appl., vol. 141, Mar. 2020, Art. no. 112924.
[6] M. S. Mekala, R. Patan, S. H. Islam, D. Samanta, G. A. Mallah, and S. A. Chaudhry, “DAWM: Cost-aware asset claim analysis approach on big data analytic computation model for cloud data centre,” Secur. Commun. Netw., vol. 2021, pp. 1–16, May 2021.
[7] S. Neelakandan and D. Paulraj, “An automated learning model of conventional neural network based sentiment analysis on Twitter data,” J. Comput. Theor. Nanosci., vol. 17, no. 5, pp. 2230–2236, May 2020.
[8] E. Jokiranta, A. S. Brown, M. Heinimaa, K. Cheslack-Postava, A. Suominen, and A. Sourander, “Parental psychiatric disorders and autism spectrum disorders,” Psychiatry Res., vol. 207, no. 3, pp. 203–211, May 30, 2013, doi: 10.1016/j.psychres.2013.01.005.
[9] R. Kashef, “ECNN: Enhanced convolutional neural network for efficient diagnosis of autism spectrum disorder,” Cognit. Syst. Res., vol. 71, pp. 41–49, Jan. 2022, doi: 10.1016/j.cogsys.2021.10.002.
[10] K. S. Omar, M. N. Islam, and N. S. Khan, “Chapter 9—Exploring tree-based machine learning methods to predict autism spectrum disorder,” in Neural Engineering Techniques for Autism Spectrum Disorder, A. S. El-Baz and J. S. Suri, Eds., Cambridge, MA, USA: Academic, 2021, pp. 165–183, doi: 10.1016/B978-0-12-822822-7.00009-0.
[11] S. Amador, A. Polo, S. Rotbei, J. Peral, D. Gil, and J. Medina, “Data mining and machine learning techniques for early detection in autism spectrum disorder,” in Neural Engineering Techniques for Autism Spectrum Disorder. Cambridge, MA, USA: Academic, 2021, pp. 77–125, doi: 10.1016/B978-0-12-822822-7.00006-5.
[12] T. Eslami, J. S. Raiker, and F. Saeed, “Explainable and scalable machine learning algorithms for detection of autism spectrum disorder using fMRI data,” in Neural Engineering Techniques for Autism Spectrum Disorder. Cambridge, MA, USA: Academic, 2021, pp. 39–54.
[13] C.-H. Min, “Automatic detection and labeling of self-stimulatory behavioral patterns in children with autism spectrum disorder,” in Proc. 39th Annu. Int. Conf. IEEE Eng. Med. Biol. Soc. (EMBC), Jeju, Korea, Jul. 2017, pp. 279–282, doi: 10.1109/EMBC.2017.8036816.
[14] E. Andrade, S. Portela, P. R. Pinheiro, L. C. Nunes, M. S. Filho, W. S. Costa, and M. C. D. Pinheiro, “A protocol for the diagnosis of autism spectrum disorder structured in machine learning and verbal decision analysis,” Comput. Math. Methods Med., vol. 2021, pp. 1–14, Mar. 2021, doi: 10.1155/2021/1628959.
[15] S. Raj and S. Masood, “Analysis and detection of autism spectrum disorder using machine learning techniques,” Proc. Comput. Sci., vol. 167, pp. 994–1004, Jan. 2020, doi: 10.1016/j.procs.2020.03.399.
[16] M. Ramachandran, R. Patan, A. Kumar, S. Hosseini, and A. H. Gandomi, “Mutual informative MapReduce and minimum quadrangle classification for brain tumor big data,” IEEE Trans. Eng. Manag., vol. 70, no. 8, pp. 2644–2655, Aug. 2023.
[17] P. Rizwan, K. Suresh, and M. R. Babu, “Real-time smart traffic management system for smart cities by using Internet of Things and big data,” in Proc. Int. Conf. Emerg. Technological Trends (ICETT), Oct. 2016, pp. 1–7.
[18] A. Mujeeb, W. Dai, M. Erdt, and A. Sourin, “One class based feature learning approach for defect detection using deep autoencoders,” Adv. Eng. Informat., vol. 42, Oct. 2019, Art. no. 100933, doi: 10.1016/j.aei.2019.100933.
[19] S. Arora and S. Singh, “Butterfly optimization algorithm: A novel approach for global optimization,” Soft Comput., vol. 23, no. 3, pp. 715–734, Feb. 2019, doi: 10.1007/s00500-018-3102-4.
[20] A. S. Assiri, “On the performance improvement of butterfly optimization approaches for global optimization and feature selection,” PLoS ONE, vol. 16, no. 1, Jan. 2021, Art. no. e0242612, doi: 10.1371/journal.pone.0242612.
[21] UCI Machine Learning Repository: Autistic Spectrum Disorder Screening Data for Children. Accessed: Oct. 15, 2024. [Online]. Available: https://tinyurl.com/34wzjmha
[22] UCI Machine Learning Repository: Autistic Spectrum Disorder Screening Data for Adolescents. Accessed: Oct. 15, 2024. [Online]. Available: https://tinyurl.com/2abbyjbe
[23] UCI Machine Learning Repository: Autism Screening Adult. Accessed: Oct. 15, 2024. [Online]. Available: https://archive.ics.uci.edu/ml/datasets/Autism+Screening+Adult
[24] U. Krishnan and L. Parthiban, “An optimal metaheuristic based feature selection with deep learning model for autism spectrum disorder diagnosis and classification,” Iioabj, vol. 12, no. 1, p. 17, 2021.
[25] V. A. Natarajan, M. S. Kumar, R. Patan, S. Kallam, and M. Y. N. Mohamed, “Segmentation of nuclei in histopathology images using fully convolutional deep neural architecture,” in Proc. Int. Conf. Comput. Inf. Technol. (ICCIT), Sep. 2020, pp. 1–7.
[26] M. N. Parikh, H. Li, and L. He, “Enhancing diagnosis of autism with optimized machine learning models and personal characteristic data,” Frontiers Comput. Neurosci., vol. 13, p. 9, Feb. 2019, doi: 10.3389/fncom.2019.00009.
[27] K. S. Omar, P. Mondal, N. S. Khan, M. R. K. Rizvi, and M. N. Islam, “A machine learning approach to predict autism spectrum disorder,” in Proc. Int. Conf. Electr., Comput. Commun. Eng. (ECCE), Feb. 2019, pp. 1–6.
[28] S. K. Punia, M. Kumar, T. Stephan, G. G. Deverajan, and R. Patan, “Performance analysis of machine learning algorithms for big data classification: ML and AI-based algorithms for big data analysis,” Int. J. E-Health Med. Commun., vol. 12, no. 4, pp. 60–75, Jul. 2021.
[29] D. P. Kavadi, R. Patan, M. Ramachandran, and A. H. Gandomi, “Partial derivative nonlinear global pandemic machine learning prediction of COVID 19,” Chaos, Solitons Fractals, vol. 139, Oct. 2020, Art. no. 110056, doi: 10.1016/j.chaos.2020.110056.
[30] K. R. Resmi, P. N. P. Venkidusamy, G. Raju, B. George, and N. Thomas, “Machine learning-based classification of autism spectrum disorder across age groups,” Eng. Proc., vol. 62, p. 12, Mar. 2024, doi: 10.3390/engproc2024062012.
[31] S. R. G. Reddy, G. P. S. Varma, and R. L. Davuluri, “Deep neural network (DNN) mechanism for identification of diseased and healthy plant leaf images using computer vision,” Ann. Data Sci., vol. 11, no. 1, pp. 243–272, Feb. 2024.
[32] H. Taud and J. F. Mas, “Multilayer perceptron (MLP),” in Geomatic Approaches for Modeling Land Change Scenarios. New York, NY, USA: Springer, 2018, pp. 451–455.
[33] K. Berahmand, F. Daneshfar, E. S. Salehi, Y. Li, and Y. Xu, “Autoencoders and their applications in machine learning: A survey,” Artif. Intell. Rev., vol. 57, no. 2, p. 28, Feb. 2024.

DURGA PRASAD KAVADI received the Ph.D.
degree from Jawaharlal Nehru Technological University,
Kakinada, Andhra Pradesh. He is currently
a Professor with the Department of CSE–AIML
& IOT, DRK Institute of Science and Technology,
Hyderabad, Telangana, India. He has published
25 papers in international journals and international
conferences and book chapters in repute.
He has edited books in the domain of machine
learning and deep learning. He has eight Indian
and foreign patents on his credit. He has contributed to a variety of
high-impact research works and has served as an editor and a reviewer for
international journals. His innovative contributions in AI, ML, DL, and NLP
domains have been recognized by multiple awards and patents. His research
interests include machine learning, NLP, and AI.


VENKATA RAMI REDDY CHIRRA received the
Ph.D. degree from the Department of Computer
Applications, National Institute of Technology,
Tiruchirappalli, Tamil Nadu, India. He is currently
a Senior Assistant Professor with the School
of Computer Science and Engineering, VIT-AP
University, Andhra Pradesh, India. With more than
13 years of teaching experience, he has published
over 33 research articles in reputed international
conferences and journals. His research contributions
focus on advanced algorithms in image processing and machine
learning, making significant impacts in the field of AI. His research interests
include computer vision, digital image processing, and machine learning.

PALACHARLA RAVI KUMAR received the Ph.D.
degree from Andhra University, Visakhapatnam,
Andhra Pradesh, India. He is currently an Associate
Professor with the Department of CSE (AI
& ML), R.V.R. & J.C. College of Engineering,
Guntur, Andhra Pradesh, India. He has published
12 articles in international journals and eight
papers in international conferences. He has edited
books in the fields of machine learning and deep
learning and holds two Indian and foreign patents.
His research emphasizes innovative techniques in AI and network security,
where his contributions have garnered patents and high recognition. His
research interests include machine learning, NLP, AI, and network security.

SAI BABU VEESAM is currently pursuing the
Ph.D. degree with VIT-AP University, Amaravati,
India. He has a robust background in IT and
academia, with over seven years of experience in
software development and four years as an Assistant
Professor. He has contributed significantly to
the field through publications in esteemed international
journals, conferences, and book chapters.
His research interests include computer vision,
machine learning, and artificial intelligence.

SAGAR YERUVA received the Ph.D. degree in
computer science and engineering from JNTU
Hyderabad, in 2017. He is currently an Associate
Professor with the Department of CSE–AIML &
IoT, VNR VJIET, Hyderabad, Telangana, India.
He has 21 years of teaching experience in
engineering education. His teaching areas include
data mining, machine learning, deep learning,
data analytics, and database management systems.
He has published 45 research papers in international
journals and conferences. He also holds four Indian patents. His
research contributions have focused on AI and data analytics, where he
has developed several innovative techniques in machine learning and deep
learning.

LALITHA KUMARI PAPPALA received the
B.Tech. and M.Tech. degrees (Hons.) in computer
science and engineering from JNTU Kakinada,
Andhra Pradesh, India, and the Ph.D. degree from
the National Institute of Technology Warangal,
Telangana, India. She is an Assistant Professor (Sr.
Gd-1) with the School of Computer Science and
Engineering, VIT-AP University, India. She has
about 18 years of teaching experience. She has
published more than 20 research papers in refereed
international journals and conferences. Her research interests include
machine learning, deep learning, and image processing. She has received
several best paper awards for her research contributions in international
conferences. She actively participates in research related to AI and image
processing.
