Abstract
Diabetes mellitus is a widespread global concern, and the most serious microvascular complications that commonly result are those of the eye, namely diabetic retinopathy (DR) and macular edema. Over the past decade, DR has emerged as a leading cause of visual disability and blindness. If diabetes-related eye complications are diagnosed and managed promptly, their consequences can be significantly mitigated and blood glucose can be kept at an adequate level. Nevertheless, DR symptoms are inconsistent and can be subtle, so diagnosis may take clinicians considerable time. The approach to detecting and classifying DR on fundus retina photographs considered in this paper relies on convolutional neural networks (CNNs) and deep learning. The experimental data used in the present study were collected at the Department of Ophthalmology, Xiangya No. 2 Hospital, Changsha, China. The number of cases is modest and the dataset is imbalanced, so a pipeline was built to improve the variety and quality of the training data through normalization and the creation of new samples. Several CNNs, namely ResNet-101, ResNet-50, and VGGNet-16, were then employed to determine the stage of DR. ResNet-101 outperformed the other models, reaching 98.88% training accuracy with a training loss of 0.3499 and a testing accuracy of 98.82%. The model was further evaluated on the HRF, STARE, DIARETDB0, and XHO datasets, which together contain 1,787 images, and achieved an average accuracy of 97%, exceeding existing methods on the same task. The proposed model therefore improves DR detection accuracy relative to ResNet-50 and VGGNet-16, making it promising for DR screening in health services.
1. Introduction
Diabetic retinopathy (DR), a microvascular disease, can arise from both type 1 and type 2 diabetes. It impairs the structure and function of the retina and is among the most common causes of blindness and vision loss worldwide. Epidemiologic reports indicate that close to 33 percent of diabetics are affected by DR, and long-term diabetics are at especially high risk of developing the condition. It has been estimated that the number of people affected by DR may reach 191 million by 2030 [1,2].
Blindness caused by diabetes is largely preventable, and early detection and intervention are beneficial [3,4]. Guidelines recommend that individuals with poorly controlled diabetes undergo DR screening annually, and that those who already present with the condition be screened more frequently. During screening, high-quality photographs of the retina are taken and examined thoroughly by ophthalmologists for any traces of retinal damage.
In the absence of treatment, DR initially develops as a mild non-proliferative stage and tends to progress to proliferative diabetic retinopathy (PDR). This progression underscores the importance of appropriate screening and diagnostic procedures that detect the disease at its most treatable stage (Figure 1).
The global prevalence of diabetes has expanded the number of individuals who need to be screened by an ophthalmologist, which poses a significant challenge to the provision of specialized care. As patient numbers rise, healthcare systems require large amounts of medical personnel and funding, and patients wait a long time to see an eye doctor [5,6,7]. Consequently, improved automated diagnostic tools would clearly be useful, either assisting ophthalmologists or serving as autonomous tools for diagnosing vision disorders. In recent years, deep learning (DL) methods based on artificial neural networks have contributed to the successful detection and classification of diabetic retinopathy (DR). Such models can be as sensitive and precise as a human expert. DL may also assist in detecting other eye diseases that threaten vision, such as diabetic macular edema, glaucoma, and age-related macular degeneration [8,9,10,11].
"The classification is performed with the help of such models by using CNN architecture as
"ResNet"-101, "ResNet- 50, and "VGGNet-16". Both sets of data each model is constructed
on are tested and evaluated against training accuracy, training loss, and testing accuracy. The
method assists clinicians to ascertain patients quickly, saves the clinicians time and brings
about improved management of diabetic retinopathy in cases of the public patients. The paper
recommends a deep learning approach of CNNs to address the issue of diabetic retinopathy
(DR) classification. The widespread use of CNN architectures can be explained by many
successes of image classification overall and within ImageNet Challenge in particular.
Many health experts apply CNNs because of their effectiveness in identifying critical features. Unlike earlier procedures, CNNs eliminate manual feature engineering, which saves time and cost. Above all, they require little preprocessing, depend less on hand-crafted features, and automatically discover valuable patterns in images. In addition, CNNs are computationally efficient and accurate at image recognition tasks, and they have contributed substantially to improved image recognition performance on numerous benchmark datasets.
To the best of our knowledge, this is the first work to classify DR into symptomatic and asymptomatic types using several CNN architectures, i.e., ResNet-101, ResNet-50, and VGGNet-16, on a proprietary dataset collected in association with Xiangya No. 2 Hospital Ophthalmology in Changsha, China, between March and October 2016. Recent advances in deep learning have also enhanced the adaptability of CNNs to huge datasets. We therefore seek a systematic overview of the power of various CNN models to correctly identify DR, and we offer a fully automated solution that minimizes the burden of vision impairment caused by delayed diagnosis.
State-of-the-art performance with CNNs: This paper shows that CNN-based models can provide state-of-the-art performance in DR detection. The networks autonomously learn the image features critical for disease grading, which are then interpreted against expert ophthalmologic knowledge. The CNN architectures are rigorously tested, and the extracted features are examined with a focus on their interpretability and clinical value.
This paper examines the accuracy of three well-known convolutional neural network (CNN) architectures, ResNet-101, ResNet-50, and VGGNet-16, in detecting and identifying the subtle visual characteristics associated with the various stages of diabetic retinopathy (DR). A comparative analysis reveals the potential of each network to identify fine-grained variations in retinal fundus images. The goal is to identify the best architecture for practical deployment in medical image analysis, especially the automated classification of DR stages.
Criteria for model comparison and selection: To compare the three CNN models, an extensive analysis is provided across the major performance indicators: classification performance, training performance, and generalization capability. This comparison helps determine the most effective model for DR classification, balancing predictive performance against computational practicality.
Organization of the paper: The rest of the paper is organized as follows. Section 2 provides a comprehensive review of the available literature on DR classification with CNNs. Section 3 presents the methodology used in this study. Section 4 describes the experiments, including dataset characteristics, preprocessing, model training, and an in-depth discussion of the results obtained by each CNN model. Section 5 concludes the paper and discusses possible future directions.
2. Related Work
Initial attempts at diabetic retinopathy (DR) detection involved mainly decomposing the retinal image into basic anatomical and pathological elements such as blood vessels, microaneurysms, the fovea, exudates, hemorrhages, and the optic disc, which were then examined to evaluate DR severity. To cite one example, in [15] a two-dimensional Gaussian matched filter applied at multiple orientations was suggested to identify the retinal vasculature, and in [16] Sinthanayothin et al. localized the optic disc based on the high local variance of average gray levels. Morphological methods were used by Baudoin et al. [17] to locate microaneurysms in fluorescein angiography images. Although these techniques work well for localizing retinal structures, they have limited applicability to low-quality images and do not classify DR severity directly.
Automatic DR classification has also been explored with conventional machine learning. These methods typically rely on manual feature extraction followed by classification with standard classifiers. A set of descriptors for different types of retinopathy was designed using the scale-invariant feature transform (SIFT) [18]. The Bag-of-Visual-Words (BoVW) paradigm was used in [19,20] to unify low-level descriptors such as Speeded-Up Robust Features (SURF) with mid-level descriptions via semi-soft encoding. This approach proved very efficient, achieving areas under the receiver operating characteristic (ROC) curve of 97.8 and 93.5 percent for exudates and red lesions, respectively [21,22].
To enhance representation, Seoud et al. [23,24] represented lesions as a probability map combining lesion location, size, and probability, and scored 0.393 on the Free-Response Receiver Operating Characteristic (FROC), higher than previous records. SVMs were used to detect DR from features extracted in vascular and exudate regions [25,26,27], yet they are extremely sensitive to image quality and scale poorly to large amounts of data. Other methods based on morphological filtering and the watershed transformation have been applied to detect optic discs [28,29], although they show only reasonable sensitivity and predictive values when tested on small datasets.
"In recent years, the advent of deep learning—particularly convolutional neural networks
(CNNs)—has revolutionized the field of computer vision, including medical image analysis,
CNN-based models have shown superior performance in feature extraction and classification
tasks, including blood vessel segmentation and retinal lesion detection" [30,31]. LeNet-5
[32], one of the earliest CNN architectures, was adapted for vascular segmentation. However,
early CNN applications suffered from limitations, such as reliance on small and low-quality
datasets and the need for expert-defined features.
The next breakthrough in CNN-based image classification was AlexNet [33,34], which won the 2012 ImageNet Large Scale Visual Recognition Challenge (ILSVRC). This success led to more sophisticated architectures such as VGGNet [35,36], GoogLeNet [37], and ResNet [38], which are among the architectures that have most improved classification in many areas. Along these lines, in [39] CNN models were applied to find small differences in DR images, with a classification accuracy of 95.68%.
Other researchers, e.g. [40], used two deep learning models: a CNN512-based model that performed five-class DR classification on the APTOS and DDR datasets, with accuracies of 84.1% and 88.6% respectively, and a YOLOv3-based model used to localize lesions, which achieved a mean average precision (mAP) of 0.216. The authors of [41] proposed an SVM-based method for classifying DR into three classes, namely normal, non-proliferative, and proliferative, based on features extracted from fundus images.
CNN architectures have also been customized for DR grading. For example, [42] employed a four-layer deep CNN to classify DR as normal, mild, or severe. A multiscale lesion detector built on a scale-invariant representational model was shown to be highly generalizable in [43]. A VGG16-based CNN trained on the Kaggle DR dataset [45] achieved 0.95 sensitivity and 0.75 accuracy on a 5,000-image validation set in [44]. Although these techniques provide more targeted classification, they are mostly inefficient at localizing the important pathological areas.
To address this limitation, class activation mapping (CAM) was introduced in [46] to
generate spatial attention maps using weighted activations after global average pooling [47].
This technique was later adapted for regression tasks in [30], using a modified VGG16
architecture without fully connected layers to enhance lesion localization.
Other recent methods have combined deep learning and image processing into fully automatic DR diagnosis pipelines. For example, [48] suggested a mixed detection of glaucoma and DR using artificial neural networks and handcrafted features. In [49], several preprocessing operations, such as contrast-limited adaptive histogram equalization (CLAHE), morphological operations, and Canny edge detection, were used to increase vessel visibility before classification, as illustrated in the sketch below.
As Figure 2 illustrates, this research points out the main areas where further work is needed in CNN-based DR detection. Researchers are addressing these issues by increasing the availability of suitable datasets, devising effective ways to process data, and designing simple but effective deep learning models for real clinical use.
Figure 2. Limitations in retinopathy research based on lesion segmentation with CNNs.
3. Methods
Data augmentation and preprocessing were both applied to the laboratory data to help the deep learning models perform more efficiently and generalize better. These operations were intended to deliver more numerous and diverse training samples and make the model robust to images of varying quality. CNNs were subsequently applied to process the images and screen for diabetic retinopathy (DR), and their classification accuracies were documented.
Because image quality strongly determines classification accuracy, preprocessing was applied to bring the data into a standard form and make training more efficient. Low-quality images can introduce errors that reduce the effectiveness of a CNN-based model. Furthermore, the data contained retinal images captured in varied scenarios, from different people, age groups, and lighting levels, so some pixel-intensity differences were unrelated to the diabetic retinopathy level.
These difficulties were mitigated by preprocessing the images with the extensive set of functions offered by the OpenCV library. Specifically, geometric transformations were performed via cv2.warpAffine and cv2.warpPerspective, and denoising filters were applied to improve image appearance. The preprocessing involved two main steps: cropping and resizing. 1) Each fundus image was cropped to retain only the circular region of the eye, removing unwanted black areas and patient-related marks; this kept the significant portion of the retina intact. 2) All cropped images were then resized to 300 by 300 pixels. Resizing standardized the input dimensions and allowed accuracy to be assessed at a fixed resolution.
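A minimal sketch of this crop-and-resize step using OpenCV is shown below; the threshold value, helper name, and file path are assumptions for the example, not the authors' exact code.

```python
import cv2
import numpy as np

def crop_and_resize(path: str, size: int = 300) -> np.ndarray:
    """Crop the circular fundus region and resize to size x size (sketch)."""
    img = cv2.imread(path)                           # BGR fundus image
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Threshold away the near-black background around the circular retina.
    _, mask = cv2.threshold(gray, 10, 255, cv2.THRESH_BINARY)
    x, y, w, h = cv2.boundingRect(mask)              # tight box around the eye
    cropped = img[y:y + h, x:x + w]
    return cv2.resize(cropped, (size, size), interpolation=cv2.INTER_AREA)

# Example usage (hypothetical file):
# standardized = crop_and_resize("fundus_001.jpg")
```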
Original Fundus Image Dataset and Retinal Assessment Protocols
" "
The investigation was based on data from Xiangya No. 2 Hospital Ophthalmology (XHO), a nationally known center whose experts monitor diabetic retinopathy and its complications. Retinal pictures were captured between March 2016 and October 2016 at a resolution of 1956 × 1934 pixels.
All fundus images were classified according to three factors: (i) presence of diabetic retinopathy, (ii) existence of macular edema, and (iii) gradeability of the image. Cases were identified as symptomatic or asymptomatic using the international diabetic retinopathy severity scales accepted among clinical practitioners. Images were divided into two categories, gradeable and non-gradeable, and only gradeable images were used in the model.
Trained ophthalmologists with more than ten years of experience in grading DR made all the image annotations. Each picture was graded by two ophthalmologists to guarantee quality control. If a diagnosis was under debate, the image was not retained in the set, so that the ground truth labels remained consistent.
Following this rigorous grading process, a total of 1,607 images (607 symptomatic and 1,000
asymptomatic) were allocated for model development and validation. An independent test set
comprising 322 images (200 symptomatic and 122 asymptomatic) was used to evaluate
model performance, as summarized in Table 1.
Training images (shown in Figure 3) were read and preprocessed with the TensorFlow-GPU library. This library fits image-processing tasks exceptionally well, since it offers a very rich set of built-in functions with efficient, accelerated computations, which become even more efficient in GPU-enabled environments.
One of the most important preprocessing steps was reshaping the input data to economize memory and minimize computation. The unprocessed retinal images, at 1956 × 1934 pixels each, require large amounts of memory, which can pressure the RAM and slow execution. To mitigate this limitation, all images were scaled down to 300 × 300 pixels, considerably decreasing the workload without losing the visual characteristics central to classification. Figures 4b and 4c show the results of the preprocessing and data augmentation steps, respectively.
Figure 3. Examples of retinal images. The two frames in the bottom row are from individuals with diabetic retinopathy, whereas the first two frames in the top row are from healthy participants.
Deep learning (DL) models rely on large amounts of data to work effectively. Nevertheless, data scarcity and image noise can restrict learning and model performance. To address these challenges, new samples were created through data augmentation, so that the training set could grow without collecting additional pictures. Consequently, the model is more dependable and less prone to overfitting.
In this study, augmentation was conducted using tools in Keras. Simple augmentation procedures included rotating images (so that the model learns to ignore the effect of image rotation), horizontal flipping, scaling, clipping, and translation, as sketched below. These transformations simulate a variety of eye shapes and imaging conditions, improving how well the model fits different patient cases. Pictures were also cropped so that the model concentrated on areas likely to contain vital information.
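A minimal sketch of such a Keras augmentation pipeline follows; the specific parameter values and directory layout are assumptions, not the authors' reported settings.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,        # random rotations
    horizontal_flip=True,     # horizontal flipping
    zoom_range=0.1,           # scaling in and out
    width_shift_range=0.05,   # small horizontal translations
    height_shift_range=0.05,  # small vertical translations
)

# Streams augmented 300x300 batches during training (hypothetical paths):
# train_iter = augmenter.flow_from_directory(
#     "train/", target_size=(300, 300), batch_size=32, class_mode="binary")
```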
More recently, GANs have been used to supplement data, since they can create realistic images resembling authentic ones. The methodology in [52] demonstrates that GANs can expand the data available in medical imaging when training data are scarce.
This study aims to automatically detect and categorize diabetic retinopathy (DR) in retinal fundus photographs using CNNs. Thanks to their hierarchical feature representations, CNNs readily learn during training to select and extract the valuable information present in medical images.
The principal components of a CNN are convolutional, pooling, and fully connected layers. Convolutional layers detect local patterns through numerous filters optimized during training; they act similarly to conventional image-processing kernels, but are tuned automatically from data. Max pooling and average pooling reduce the data consumed by each layer and lower the probability of overfitting while retaining critical features. Fully connected layers at the output of the network map the significant features onto the target classes.
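To make the roles of these layer types concrete, the toy Keras model below stacks convolution, pooling, and fully connected layers for a binary DR/non-DR output; the layer sizes are illustrative only and do not correspond to any model evaluated in this study.

```python
from tensorflow.keras import layers, models

toy_cnn = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(300, 300, 3)),
    layers.MaxPooling2D((2, 2)),            # downsample, keep salient features
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),    # fully connected feature mixing
    layers.Dense(1, activation="sigmoid"),  # binary DR / non-DR output
])
```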
"VGGNet"-16
Despite its large size, "VGGNet"-16 remains a benchmark model for visual tasks due to its
simple architecture and consistent performance. In this study, "VGGNet"-16 was employed
for DR screening tasks. A modified and compressed version of the original architecture was
also utilized, offering the following advantages:
ResNet-50 and ResNet-101
The core innovation of ResNet lies in its residual learning framework, which introduces skip (shortcut) connections. These connections allow the network to bypass one or more layers, effectively enabling the direct propagation of input features to deeper layers. This mechanism alleviates the vanishing gradient problem, preserves information, and improves training convergence.
In traditional CNNs, increasing depth often leads to a degradation in both training and test performance beyond a certain point. This is not necessarily due to overfitting, but rather a result of optimization difficulties in very deep networks. ResNet addresses this degradation by facilitating the learning of residual functions: the network learns the difference F(x) = H(x) − x rather than the direct mapping H(x). This formulation ensures that identity mappings are easily learned when additional layers do not contribute positively, thereby enhancing model robustness and accuracy.
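The identity-skip idea can be illustrated with a basic residual block in Keras, as sketched below; the layer widths and the placement of activations are simplifications of the actual ResNet building blocks, not the authors' implementation.

```python
from tensorflow.keras import layers

def residual_block(x, filters: int):
    """Basic residual block: output = ReLU(F(x) + x).

    Assumes x already has `filters` channels so the Add is shape-compatible.
    """
    shortcut = x                                        # identity skip connection
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)    # learns F(x) = H(x) - x
    y = layers.Add()([y, shortcut])                     # H(x) = F(x) + x
    return layers.Activation("relu")(y)
```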
In this study, "ResNet"-50 and "ResNet"-101 were selected due to their superior
performance in deep feature representation and classification tasks. These models
significantly outperformed "VGGNet"-16 in diabetic retinopathy detection by achieving
better generalization on the validation set, especially in complex image classification
scenarios. (Figure 5)
Figure 5. Architectures of the evaluated CNNs: (a) ResNet-101, (b) ResNet-V1-50, and (c) VGG-16.
ResNet-50 is a 50-layer deep convolutional neural network developed to address the issues deep networks face as they become very elaborate. An ImageNet-based version of ResNet-50 was trained on more than one million images, so the feature representations the model has learned are robust, varied, and transferable to many image classification tasks, including medical images.
ResNet-101 follows the same ResNet design but encompasses a total of 101 layers. This network can likewise be trained on ImageNet, where it classifies images into 1,000 distinct categories. Owing to its more detailed structure, ResNet-101 excels at categorizing images containing numerous fine-grained components.
In this research, retinal image datasets were evaluated with VGGNet-16, ResNet-50, and ResNet-101 to check whether they showed signs of diabetic retinopathy (DR). These models were selected for their proven utility and extensive application in various classification tasks. Their architectures are depicted in Figures 5a-c: (a) ResNet-101, (b) ResNet-50, and (c) VGGNet-16. On the laboratory dataset, DR image analysis with ResNet-101 proved superior to ResNet-50 and VGGNet-16.
To train and test the proposed DR framework, the TensorFlow framework in Python was used with the models ResNet-101, ResNet-V1-50, and VGG-16. These models were chosen for their favorable performance, good scalability, and ability to handle complex image classification work.
The networks take 300 × 300 pixel input images, the size to which the patients' retinal images were resized before being fed in. The sequence of convolution and pooling layers is followed by a ReLU layer after each group to introduce non-linearity and increase the representational capacity of the network.
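A hedged sketch of how the three backbones could be instantiated in tf.keras for 300 × 300 inputs is given below; the build_classifier helper, the classification head, and the use of ImageNet weights are assumptions, since the paper does not publish its implementation.

```python
import tensorflow as tf

def build_classifier(name: str) -> tf.keras.Model:
    """Attach a simple binary head to a standard backbone (hypothetical helper)."""
    backbones = {
        "resnet101": tf.keras.applications.ResNet101,
        "resnet50": tf.keras.applications.ResNet50,
        "vgg16": tf.keras.applications.VGG16,
    }
    base = backbones[name](include_top=False, weights="imagenet",
                           input_shape=(300, 300, 3))
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    out = tf.keras.layers.Dense(1, activation="sigmoid")(x)  # DR vs. non-DR
    return tf.keras.Model(base.input, out)
```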
Hyperparameter tuning was instrumental in obtaining better models. Unlike model weights, hyperparameters are not learned; they must be chosen through empirical analysis. In this experiment, several hyperparameters were fine-tuned, including the learning rate, batch size, number of epochs, optimizer type, and dropout rate, to maximize classification accuracy while reducing overfitting. Table 2 summarizes the values and settings of the hyperparameters applied in implementing the proposed models.
Table 2. Parameter tuning in the latest CNN models.
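As a hedged illustration of such a configuration in code, the sketch below compiles a model with SGD plus momentum and the cross-entropy loss discussed later in the text; the concrete numbers are placeholders, with Table 2 as the authoritative source, and build_classifier is the hypothetical helper from the previous sketch.

```python
import tensorflow as tf

# build_classifier() is the hypothetical helper defined in the sketch above.
model = build_classifier("resnet101")
model.compile(
    optimizer=tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9),
    loss="binary_crossentropy",   # cross-entropy loss, as used in all runs
    metrics=["accuracy"],
)
# history = model.fit(train_iter, validation_data=val_iter, epochs=30)
```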
The data utilized in this research consisted of 1,607 retinal images collected by a number of hospitals participating in the research in Iraq, e.g., in Baghdad, Basra, Mosul, and Dhi Qar. The healthy group comprised 1,000 images, and 607 images revealed the presence of DR.
The files were split into training and test data to simplify model development and analysis. Of the whole dataset, 800 healthy images and 485 pathological images were used for training, and 200 healthy images and 122 pathological images were reserved for testing. Separating the cases into normal and pathological components allowed the classification results to be compared.
All pictures were processed and converted into tensors suitable for input to the convolutional neural networks. The models were then built and refined through continuous training and hyperparameter tweaking. The learning rate was tuned to start at 0.0005, with the number of iterations per training run set to 4,000.
Table 2 contains a comprehensive overview of the hyperparameters used in training the deep learning models.
4. Experiments and Results
The research database used in this study was retrieved from four medical institutions in Iraq: Al-Firdaws Private Hospital (Baghdad), Ibn Al-Haytham Teaching Eye Hospital (Baghdad), Al-Shami Eye Center (Erbil branch), and Al-Kafeel Specialist Hospital (Karbala). Its one major aim was to differentiate two categories of patients: those with and those without symptoms of diabetic retinopathy. Initially the data were heterogeneous; some images were unorganized and unfiltered, and many were irrelevant to diabetes. The dataset was then cleaned and tabulated into two clear groups: diabetic patients showing symptoms of retinopathy and diabetic patients not showing these symptoms.
The preprocessing pipeline to which all images were subjected aimed at alleviating problems of noise and inadequate illumination. Image resizing and cropping were applied during preprocessing. Every image was cropped to focus on the central circular portion of the fundus, which carries the most diagnostic features. The original images were rather large (1956 × 1934 pixels), which caused serious computational difficulties due to the memory required. To deal with this, the images were scaled down to 300 × 300 pixels, which greatly increased processing speed and reduced resource consumption.
The dataset initially contained 607 symptomatic and 1,000 asymptomatic pictures. Because of the small sample size, it was synthetically oversampled through data augmentation. Image rotation and synthetic noise generation with OpenCV were used as augmentation methods, as presented in Table 3 and shown in Figure 6. Before augmentation, the training set included 800 asymptomatic images and 485 symptomatic ones, and the test set 322 images. After augmentation, the training set grew to 1,982 asymptomatic and 1,204 symptomatic images.
The training process was configured with an initial learning rate of 0.001, a momentum value of 0.9, a batch size of 800, and 3,000 training iterations (Table 3).
Table 3. The number of images before and after the augmentation process.
In the first experiment, the VGG-16 convolutional neural network architecture (containing 16 layers) was employed to classify the input retinal fundus images. The architecture alternates several convolutional layers at different scales with max-pooling layers that decrease spatial size, mitigate overfitting, and speed up training.
As stated in Table 4, the VGG-16 model was able to separate the training and testing classes, but the overall classification accuracy was modest. This limitation is mainly attributed to the vanishing gradient problem, which makes it difficult for the model to learn within deep architectures. Moreover, the comparatively limited depth of the network did not allow retrieval of the complex, hierarchical features required to classify diabetic retinopathy indicators well in high-resolution retinal photos.
Figure 6. Representative examples from the training dataset illustrating the pre-processing steps
employed to augment the input data. (a) Retinal images exhibiting symptoms of diabetic retinopathy.
(b) Retinal images without any observable symptoms. (c) Augmented images generated through left
and right rotational transformations. (d) Augmented images enhanced with synthetic noise to increase
data variability and improve model generalization.
In the second experiment, the ResNet-50 architecture was used; it has 50 convolutional layers organized into residual blocks. The residual links are specifically crafted to overcome the vanishing gradient problem experienced in deep neural networks. Each residual block contains a number of convolutional layers at given scales, as shown in Figure 5, and the input of a block is added to its output. This design improves information flow between layers and strengthens feature extraction. The ResNet-50 model demonstrated better classification performance than VGG-16, which can be explained by its residual learning framework and greater depth.
To further increase classification performance, the final experiment used a deeper architecture, ResNet-101, comprising 101 convolutional layers. The greater depth enables more abstract and complicated aspects of the input images to be learned, giving the model better generalization. Although deeper networks are usually prone to overfitting, the residual design of ResNet-101 lessens this risk. According to Tables 4 and 5, the accuracy on both training and testing data proved that ResNet-101 was the most accurate classifier.
In every experiment, the cross-entropy loss was used to guide optimization and update the model weights during training. Model accuracy on the test set was assessed as the arithmetic mean of the classification results.
The results of the CNN models, VGG-16, ResNet-50, and ResNet-101, are graphically displayed in Figures 7a-9b. The graphs plot the training iterations (x-axis) against the accuracy and loss values (y-axis) of the training process. Among the three models, ResNet-101 showed the most promising performance and stability, with a better capability of distinguishing healthy retinas from those affected by diabetic retinopathy.
Altogether, the findings show that the proposed framework, especially the ResNet-101 model, classifies images from the XHO dataset better than ResNet-50 and VGG-16 in terms of both accuracy and stability.
Figure 7. Training performance of the "ResNet"-101 model, illustrating (a) the accuracy progression
and (b) the loss reduction over successive training iterations.
The classification accuracies shown in Tables 4 and 5 indicate the performance increase due to the data augmentation techniques used. CNN models trained on augmented data perform better than models trained on non-augmented data. This is mainly because data augmentation simulates real-world variation, which significantly improves the robustness of the model and increases its generalization ability at inference time.
Moreover, the XHO dataset exhibited an imbalanced distribution, with a greater number of
normal (non-DR) images compared to images displaying signs of diabetic retinopathy (DR).
To mitigate this imbalance, the dataset was initially categorized into two stages of DR and
subsequently partitioned into training and testing subsets. Preprocessing techniques were then
applied to enhance the quality of the retinal images—an essential step, as poor image quality
can significantly impair the feature extraction capabilities of deep learning models and,
consequently, reduce classification accuracy. Ensuring consistency in image quality and
enhancing salient image features are therefore critical to achieving reliable classification
outcomes.
Figure 8. Training performance of the "ResNet"-50 model, illustrating (a) the progression of
classification accuracy and (b) the reduction in loss over successive training iterations.
"The proposed diabetic retinopathy (DR) detection approach of the ResNet-101, was tested
on various openly available retinal image datasets, besides the in-house XHO data. In
particular, the assessment of the approach involved the High-Resolution Fundus (HRF)
dataset, comprising 30 images with a resolution of 3304 × 2336 pixels [53]; the Structured
Analysis of the Retina (STARE) dataset, containing 20 images of size 700 × 605 pixels [54];
the DIARETDB0 dataset, consisting of 130 images with dimensions of 1500 × 1152 pixels
[55]; and the MESSIDOR dataset, which includes 1200 images at a resolution of 1440 × 960
pixels" [56]. The XHO dataset used in this study comprises 1607 retinal images standardized
to 300 × 300 pixels.
For evaluation, the testing datasets were grouped into images with no DR signs and images confirmed as having DR by expert ophthalmologists. Of the overall 2,987 images labeled as DR, 1,089 were used in the testing phase. The proposed method was also tested on normal retinal images to check whether it can specifically identify cases without DR (Figure 9).
Table 6 summarizes all the datasets used in the study and mentions the number of images,
their image resolutions, and the distribution of the classes.
Figure 9. Accuracy and loss curves for VGGNet-16 during training.
The chosen neural network models were implemented, and their performance was measured against a set of standard performance measures commonly used in medical image classification systems: accuracy (ACC), specificity (SP), sensitivity (SEN), and the area under the receiver operating characteristic curve (AUC). In addition, the positive predictive value (PPV) was computed to evaluate the model's accuracy on true positive cases, along with two further evaluation metrics, the negative predictive value (NPV) and the F1 score (F1). Together these metrics characterize different aspects of model performance. Accuracy (ACC) estimates the fraction of correctly classified images out of the total number of samples, providing an overall estimate of classification success. Specificity (SP) assesses how well the model discriminates non-diseased (non-DR) cases, that is, its ability to keep false positives low. Sensitivity (SEN), or recall, measures the proportion of actual DR cases that the model properly classifies as positive, reflecting its ability to identify positive cases of diabetic retinopathy. The AUC summarizes the diagnostic capacity of the model across all classification thresholds. Meanwhile, PPV and NPV reflect the trustworthiness of positive and negative predictions, respectively. The F1 score, a function of both precision and recall, is especially helpful when the classes are unevenly distributed, since it accounts for the trade-off between false negatives and false positives.
The ROC curve eases understanding of how sensitivity and specificity are balanced during algorithm assessment, and the AUC describes how well the model discriminates. The PPV is the proportion of correctly labeled DR-positive images out of all images predicted as DR. In contrast, the NPV is the proportion of non-DR images correctly identified as non-DR out of all images predicted as non-DR. All of these metrics together help compare the diagnostic results of the ResNet-101 algorithm to other algorithms. They are computed using the following formulas.
$$\mathrm{SP} = \frac{TN}{TN + FP} \qquad (1)$$

$$\mathrm{SEN} = \frac{TP}{TP + FN} \qquad (2)$$

$$\mathrm{ACC} = \frac{TN + TP}{TN + TP + FN + FP} \qquad (3)$$

$$\mathrm{PPV} = \frac{TP}{TP + FP} \qquad (4)$$

$$\mathrm{NPV} = \frac{TN}{TN + FN} \qquad (5)$$

$$\mathrm{F1\ score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}} \qquad (6)$$
Diabetic retinopathy (DR) images that are missed, i.e., not diagnosed as DR, are called false negatives (FN). Conversely, false positives (FP) are errors where healthy images are labeled as having DR. Correctly identified DR and non-DR images are called true positives (TP) and true negatives (TN), respectively.
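For concreteness, the small helper below computes the metrics of Eqs. (1)-(6) from these four confusion counts; it is an illustrative sketch, and the example counts in the usage comment are made up.

```python
def dr_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute the metrics of Eqs. (1)-(6) from confusion-matrix counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)                         # = SEN, Eq. (2)
    return {
        "SP": tn / (tn + fp),                       # Eq. (1)
        "SEN": recall,
        "ACC": (tn + tp) / (tn + tp + fn + fp),     # Eq. (3)
        "PPV": precision,                           # Eq. (4)
        "NPV": tn / (tn + fn),                      # Eq. (5)
        "F1": 2 * precision * recall / (precision + recall),  # Eq. (6)
    }

# Example with made-up counts: dr_metrics(tp=190, tn=115, fp=7, fn=10)
```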
"ResNet"-101 was checked with several measures, among them accuracy (ACC), sensitivity
(SEN), specificity (SP), and the F1 score. Two-thousand nine hundred and eighty-seven
fundus retinal photos taken from various public sources were used to assess the model’s
ability to detect DR. As shown in Table 7, the "ResNet"-101 model’s evaluation led to the
following ACC, SP, SEN, area under the ROC curve, and F1 score results.
For the "HRF, DRIVE, STARE, MESSIDOR, DIARETDB0, and DIARETDB1" datasets,
"
"AUC" values were measured, giving useful information about how well the model works on
various images . The graphs in Figure 10 clearly show how sensitivity and specificity differ
"
Results from the ResNet-101-based system were compared with recent top-performing approaches to assess its effectiveness. As demonstrated in Table 8, ResNet-101 detects DR better than the other two models and is likely a reliable diagnostic method.
Table 7. DR detection results of the ResNet-101 CNN model on the evaluated datasets.
| Dataset | Test Images | Correctly Detected Images | Accuracy (%) | Sensitivity (%) | Specificity (%) | F1 Score (%) | AUC |
|---|---|---|---|---|---|---|---|
| Al-Firdaws Hospital (Baghdad) dataset | 200 | 196 | 98 | 97.14 | 97.65 | 97.36 | 98.55 |
| Ibn Al-Haytham Teaching Eye Hospital (Baghdad) datasets | 30 | 30 | 100 | 99.98 | 99.98 | 99.98 | 99.99 |
| Al-Shami Eye Center - Erbil branch datasets | 20 | 19 | 95 | 94.96 | 95.11 | 95.03 | 95.04 |
| Al-Kafeel Specialist Hospital (Karbala) datasets | 110 | 105 | 95.45 | 95.39 | 99.38 | 95.45 | 95.46 |
| Westeye Eye Hospital (Sulaymaniyah) datasets | 349 | 347 | 99.42 | 99.45 | 99.38 | 99.41 | 99.42 |
| Total | 360 | 349 | 97 | 96.87 | 98.03 | 96.95 | 97.26 |
5. Conclusions
Automatic identification of diabetic retinopathy (DR) in fundus images could go a long way toward helping ophthalmologists reduce diagnosis time and improve early detection. Here, preprocessing and regularization of the retinal datasets received in the laboratory were carried out comprehensively to optimize the performance of the deep learning-based classification system. Initial inquiry found that a small training dataset degrades model performance, underscoring the requirement for sufficient data diversity and volume.
Besides, this study proved that DR classification can be successfully tackled as a binary classification task for population screening at the national level using convolutional neural networks (CNNs). Among the considered architectures, ResNet-101, ResNet-50, and VGG-16, the first had the best efficacy: its testing accuracy was 98.82 percent, with a corresponding training accuracy of 98.88 percent and a training loss of 0.3499. Upon testing on five benchmark datasets (HRF, STARE, DIARETDB0, MESSIDOR, and XHO), ResNet-101 achieved classification accuracies of 98%, 100%, 95%, 95.45%, and 97%, respectively. Such findings indicate that convolutional neural network models can be trained to capture the distinguishing characteristics of DR from fundus images.
The balance of dataset quality and classes is a critical factor in building powerful DR detection systems. Future research ought to concentrate on synthesizing various datasets to create a more capable training set that combats data imbalance. Although the existing CNN models perform very well on binary classification, they cannot yet cope with fine-grained classification of the multiple stages of DR severity. Future advances will therefore extend these architectures with more layers and new architectural designs, together with real-time classification capability, to allow more accurate and scalable DR detection in clinical settings.
References
[27] T. Joachims, Making large-scale SVM learning practical, Tech. Rep., Univ. of
Dortmund, 1998. [Online]. Available: [Link]
[31] Y. Yang, T. Li, W. Li, H. Wu, W. Fan, and W. Zhang, "Lesion detection and
grading of diabetic retinopathy via two-stages deep convolutional neural networks," in
Int. Conf. Med. Image Computing and Computer-Assisted Intervention (MICCAI),
Cham: Springer, 2017, pp. 533–540.
[36] K. Simonyan and A. Zisserman, "Very deep convolutional networks for large-
scale image recognition," arXiv preprint arXiv:1409.1556, 2014.
[37] C. Szegedy et al., "Going deeper with convolutions," in Proc. IEEE Conf.
Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, Jun. 2015, pp.
1–9.
[38] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image
recognition," in Proc. IEEE Conf. Computer Vision and Pattern Recognition (CVPR),
Las Vegas, NV, USA, Jun. 2016, pp. 770–778.
[39] S. Wan, Y. Liang, and Y. Zhang, "Deep convolutional neural networks for
diabetic retinopathy detection by image classification," Comput. Electr. Eng., vol. 72,
pp. 274–282, 2018.
[57] T. Li, Y. Gao, K. Wang, S. Guo, H. Liu, and H. Kang, "Diagnostic assessment of deep learning algorithms for diabetic retinopathy screening," Inf. Sci., vol. 501, pp. 511–522, 2019.
[58] Y. P. Liu, Z. Li, C. Xu, J. Li, and R. Liang, "Referable diabetic retinopathy identification from eye fundus images with weighted path for convolutional neural network," Artif. Intell. Med., vol. 99, p. 101694, 2019.
[59] H. Jiang, K. Yang, M. Gao, D. Zhang, H. Ma, and W. Qian, "An interpretable ensemble deep learning model for diabetic retinopathy disease classification," in Proc. 2019 41st Annu. Int. Conf. IEEE Eng. Medicine and Biology Society (EMBC), Berlin, Germany, Jul. 2019, pp. 2045–2048.