Final Year Project Report-2
Submitted by
of
BACHELOR OF TECHNOLOGY
IN
MAR/APR – 2025
OPTIMIZING DIABETIC RETINOPATHY
DETECTION USING DEEP LEARNING
KIT - KALAIGNARKARUNANIDHI INSTITUTE
OF TECHNOLOGY
(AN AUTONOMOUS INSTITUTION)
Affiliated to Anna University, Chennai
Accredited by NAAC with ‘A’ Grade | Accredited by NBA(AERO,CSE,
ECE, EEE, MECH and MBA)
BONAFIDE CERTIFICATE
Certified that this project report “OPTIMIZING DIABETIC RETINOPATHY
DETECTION USING DEEP LEARNING” is the bonafide work of “AKASH
VISHNU R (711521BAD003), HARI PRASANNA M (711521BAD018),
RAVEE RAGHUL C (711521BAD044)” who carried out the project work
under my supervision.
SIGNATURE SIGNATURE
Ms.R.KAVITHA, Dr.C.DEEPA M.E., Ph.D.,
SUPERVISOR HEAD OF THE DEPARTMENT
Assistant Professor / AI&DS Associate Professor & Head
Department of Artificial Intelligence Department of Artificial Intelligence
and Data Science, and Data Science,
KIT-Kalaignarkarunanidhi Institute of KIT-Kalaignarkarunanidhi Institute of
Technology, Technology,
Coimbatore- 641 402. Coimbatore- 641 402.
INTERNAL EXAMINER EXTERNAL EXAMINER
ACKNOWLEDGEMENT
Owing deeply to the supreme, we extend our sincere thanks to God almighty who
has made all things possible.
We extend our heart full gratitude toward our revered Founder Chairman Thiru
Pongalur N. Palanisamy, Vice chairperson Mrs. P. Indhu Murugesan and
Executive Trustee Mr. A. Suriya all other trust members for having provided us
with necessary infrastructure to undertake this project.
We are grateful to our Chief Executive Officer Dr. N. Mohan Das Gandhi and
our beloved Principal Dr. M. Ramesh and academic Dean Dr. K. Ramasamy for
the facilities provided to complete this project work.
We wish to extend our gratitude to Dr. C. Deepa, Associate Professor and Head
of the Department of Artificial Intelligence and Data Science and our project
guide Ms. R. Kavitha and project coordinator Dr. R.P. Narmadha for their
constant support and encouragement.
We thank our beloved parents for their constant support and blessings. Finally, we
would like to thank the teaching and non-teaching staff members of the
Department of Artificial Intelligence and Data Science for their suggestions and
guidance.
ABSTRACT
serious complication of diabetes that can lead to blindness if not detected and treated
early. Leveraging the power of deep learning, the model utilizes multi-modal retinal
the system captures a more comprehensive view of the retinal structure, allowing for
technology that excels in both local feature extraction and global pattern recognition.
This combination ensures that the model not only identifies fine-grained details,
associated with different stages of DR. In addition to its diagnostic accuracy, the
model is designed for real-time use on edge devices such as smartphones and PCs,
the screening process and improving early detection rates, this project offers a
TABLE OF CONTENTS
7.1 CONCLUSION
7.2 FUTURE ENHANCEMENT
APPENDIX
SOURCE CODE
REFERENCES
LIST OF FIGURES
3.2 Python
LIST OF ABBREVIATIONS
ACC Accuracy
AI Artificial Intelligence
DR Diabetic Retinopathy
ML Machine Learning
P Precision
R-CNN Region-based Convolutional Neural Network
SE Sensitivity
SP Specificity
CHAPTER 1
INTRODUCTION
1.1 INTRODUCTION
This project introduces an innovative deep learning model for the diagnosis
of diabetic retinopathy, a vision-threatening complication of diabetes. By
leveraging the strengths of EfficientNetB0 and ResNet50 architectures, the model
enhances diagnostic accuracy through the integration of both fine-grained local
features and high-level global representations from retinal fundus images.
EfficientNetB0 contributes to the system’s efficiency and scalability, while
ResNet50 offers deep feature extraction capabilities for complex lesion
identification. The model classifies diabetic retinopathy into five distinct stages,
from No DR to Proliferative DR. To ensure transparency, the interpretable AI
technique Grad-CAM is employed, allowing healthcare providers to visualize the
decision-making process and build trust in the automated diagnostic system.
Fig 1.1 Retinal Fundus image
1.1.2 ARTIFICIAL INTELLIGENCE
1.1.4 DEEP LEARNING
vision applications, contributing to advancements in fields like medical imaging,
autonomous vehicles.
1.1.6 GRAD-CAM
1.3 OBJECTIVE OF THE PROJECT
The report is organized as follows: Chapter 1 introduces the project and its
key terms. Chapter 2 presents a literature review, examining current relevant
initiatives. Chapter 3 details the software and hardware requirements for the
project. Chapter 4 outlines the existing system, highlighting its disadvantages, and
proposes the new system. Chapter 5 describes the functional design and block
diagram of the project. Chapter 6 presents the implementation and results.
Chapter 7 concludes the report and discusses potential improvements for the
project.
CHAPTER 2
LITERATURE SURVEY
2.1 INTRODUCTION
detection holds immense promise in improving both the quality and accessibility of
eye care.
2.2 LITERATURE REVIEW
2.2.1 Data Diversity in Convolutional Neural Network Based Ensemble Model
for Diabetic Retinopathy:
Recent studies on CNN-based ensemble models for diabetic retinopathy
detection highlight the critical role of data diversity in enhancing model
performance and improving generalization. Ensemble learning techniques,
including popular methods such as bagging and boosting, are designed to combine
the strengths of multiple models to increase predictive accuracy. The effectiveness
of ensemble learning is significantly influenced by the diversity within the training
datasets, which helps the models learn from variations in image quality, patient
demographics, and the different stages of disease severity. By training CNN
models on diverse datasets, researchers can ensure that the models are better
equipped to generalize across different populations and real-world scenarios,
making the systems more reliable and robust. Incorporating data diversity has
proven to be an effective strategy for addressing issues like overfitting, where a
model becomes too specialized in its training data and performs poorly on unseen
data. Data augmentation techniques, such as rotation, scaling, flipping, and
cropping, are commonly applied to artificially expand the dataset, introducing
variability and ensuring that the CNN models learn more comprehensive features.
However, despite the clear benefits, there are still challenges associated with data
diversity in CNN-based ensemble models for diabetic retinopathy detection.
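The augmentation operations mentioned in this review (flipping, rotation, and cropping) can be illustrated with a minimal NumPy sketch. The function name `augment`, the crop size, and the random stand-in image are assumptions made for illustration; rotation is limited to multiples of 90 degrees to keep the example dependency-free, and this is not necessarily the pipeline used in the reviewed studies.

```python
import numpy as np

def augment(image, rng, crop_size=(200, 200)):
    """Return a randomly flipped, rotated, and cropped copy of an H x W x C image."""
    out = image
    if rng.random() < 0.5:              # horizontal flip
        out = out[:, ::-1, :]
    if rng.random() < 0.5:              # vertical flip
        out = out[::-1, :, :]
    # Rotate by 0, 90, 180, or 270 degrees in the image plane.
    out = np.rot90(out, k=int(rng.integers(0, 4)), axes=(0, 1))
    ch, cw = crop_size
    h, w = out.shape[:2]
    top = int(rng.integers(0, h - ch + 1))
    left = int(rng.integers(0, w - cw + 1))
    return out[top:top + ch, left:left + cw, :].copy()

rng = np.random.default_rng(seed=0)
# Stand-in for a fundus photograph (random pixels, illustrative only).
fundus = rng.integers(0, 256, size=(224, 224, 3), dtype=np.uint8)
batch = np.stack([augment(fundus, rng) for _ in range(8)])
print(batch.shape)  # (8, 200, 200, 3)
```

Each call produces a different view of the same image, which is how augmentation adds variability without collecting new data.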
models combine the strengths of convolutional neural networks (CNNs) with other
machine learning techniques like support vector machines (SVMs), decision trees,
or advanced feature extraction methods to achieve better predictive performance.
CNNs are highly effective in capturing spatial features from retinal images, but
when combined with additional machine learning classifiers or feature selection
algorithms, the overall model gains the ability to harness both low-level and high-
level image details. This multi-layered approach enhances the precision of diabetic
retinopathy classification by allowing different techniques to focus on distinct
aspects of image processing, resulting in more robust and nuanced predictions. For
instance, CNNs can be employed to extract deep spatial features from retinal
fundus images, such as identifying subtle patterns or lesions, while classifiers like
SVMs or decision trees can process these features to make more accurate and
discriminative predictions regarding the stage of DR. Studies have shown that
hybrid models outperform standalone CNNs because they incorporate additional
feature selection mechanisms, which can better capture the complex nature of
diabetic retinopathy. To further enhance the robustness of these models, data
augmentation techniques like image rotation, flipping, zooming, and scaling are
commonly applied, artificially increasing the diversity of the training dataset and
helping the model generalize better to unseen data. Additionally, ensemble
methods, which combine the predictions of multiple CNNs or integrate CNNs with
traditional machine learning models, have demonstrated success in improving the
generalization of DR classification, especially across diverse patient populations
and varying image qualities.
programs prioritize patients based on their risk of developing severe stages of DR.
Traditionally, DR screening programs have treated patients uniformly, with little
distinction in terms of the urgency of their cases, potentially leading to
inefficiencies in managing healthcare resources. By employing deep learning,
particularly convolutional neural networks (CNNs), researchers are now able to
analyze retinal fundus images and stratify patients into different risk categories.
This approach allows for a more targeted screening process, where patients at
higher risk of progressing to severe or proliferative DR are identified early and
given priority for further evaluation and treatment. This risk-based prioritization
not only improves patient outcomes by enabling earlier intervention but also
optimizes resource allocation, ensuring that healthcare providers focus their
attention on those who need it most. Multicenter prospective studies have been
pivotal in demonstrating the efficacy of these deep learning models. By training
CNNs on large and diverse datasets sourced from multiple clinical centers,
researchers can create models that are more generalizable and adaptable to
different populations. These multicenter datasets capture a wide range of variations
in disease presentation, imaging quality, and patient demographics, allowing the
deep learning models to automatically assess disease progression with high
accuracy. Through this automated risk stratification, the models can flag high-risk
patients who require immediate attention, while low-risk patients can be monitored
with less frequent check-ups. This stratified approach enhances the efficiency of
DR screening programs, potentially reducing the burden on healthcare systems and
preventing vision loss in high-risk individuals.
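The risk-based prioritization described above can be sketched in a few lines of Python. The grade names, the scoring rule (probability mass on the two most severe grades), the thresholds, and the patient records are all hypothetical and have no clinical validity; the reviewed studies may define risk differently.

```python
from dataclasses import dataclass

# Assumed grade order, from least to most severe.
GRADES = ["No DR", "Mild", "Moderate", "Severe", "Proliferative DR"]

@dataclass
class Screening:
    patient_id: str
    probabilities: list  # one model probability per grade, summing to 1

def risk_score(s):
    """Probability mass assigned to the two most severe grades."""
    return s.probabilities[3] + s.probabilities[4]

def risk_tier(score, high=0.5, medium=0.2):
    """Map a score to a tier; thresholds are illustrative, not clinical."""
    if score >= high:
        return "high"
    if score >= medium:
        return "medium"
    return "low"

queue = [
    Screening("P001", [0.90, 0.06, 0.02, 0.01, 0.01]),
    Screening("P002", [0.05, 0.10, 0.15, 0.30, 0.40]),
    Screening("P003", [0.20, 0.30, 0.25, 0.15, 0.10]),
]

# Highest-risk patients are placed first in the review queue.
prioritized = sorted(queue, key=risk_score, reverse=True)
for s in prioritized:
    print(s.patient_id, risk_tier(risk_score(s)))
```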
networks, particularly in leveraging the rich details captured by ultra-wide
field (UWF) imaging. UWF imaging represents a significant advancement in retinal
photography, allowing for the visualization of a much larger portion of the retina
compared to traditional fundus images. This capability provides clinicians with
more comprehensive data that can be critical for the early detection and accurate
classification of retinal diseases, as it encompasses more anatomical features and
potential pathologies. Traditional imaging techniques often miss peripheral lesions
and abnormalities that can be crucial indicators of diseases like diabetic
retinopathy, age-related macular degeneration, and retinal vein occlusion. By
utilizing UWF images, researchers can enhance the data available for machine
learning models, ultimately leading to improved diagnostic performance and
detection of diabetic retinopathy. The dual path network approach is
particularly effective in processing UWF images because it enables the
simultaneous analysis of both global and local features. This capability is vital
when examining retinal images where macroscopic features, such as overall retinal
architecture, must be assessed alongside microscopic details, such as small lesions
or vascular changes. By capturing and integrating these varying scales of
information, dual path networks ensure that models do not overlook subtle yet
significant pathological features that might be crucial for an accurate diagnosis.
Complementing this approach, the multi-scale enhanced attention mechanism
dynamically adjusts the model's focus across different regions of the image at
various resolutions. This allows the model to prioritize and highlight critical areas
during classification, thereby enhancing its sensitivity to pathological changes.
retinal fundus images. As a multi-scale transfer learning framework, it effectively
utilizes pre-trained convolutional neural network (CNN) models to extract both low-level
and high-level features from retinal images, streamlining the classification
process. This approach addresses one of the main challenges in developing deep
learning models for medical imaging: the need for extensive, annotated datasets
that can be both costly and time-consuming to compile. By leveraging transfer
learning, the MTRA CNN can build upon existing knowledge captured in pre-
trained models, which have already learned to identify relevant features from vast
amounts of data. This allows for the effective classification of glaucoma without
necessitating the same level of extensive, specific annotation, thus reducing the
barrier to entry for deploying advanced diagnostic tools in clinical settings. A
notable feature of the MTRA CNN framework is its multi-scale capability, which
enhances the model’s performance by capturing features at different resolutions.
This is particularly important for glaucoma classification, where the pathological
signs are often subtle and vary in scale. For example, the optic nerve head and the
retinal nerve fiber layer are critical indicators of glaucoma, and detecting changes
in these structures requires a nuanced understanding of both localized details and
broader patterns. By integrating multiple scales, the MTRA CNN is able to
perform a more thorough analysis of retinal structures, ensuring that it can detect
and classify glaucoma effectively. Recent studies have highlighted that this multi-
scale approach results in superior performance metrics compared to traditional
CNN models, particularly in terms of classification accuracy and sensitivity. This
improvement is crucial, as it can lead to earlier detection and intervention, which
is vital for preserving vision in glaucoma patients.
classification accuracy and computational efficiency. DenseNet, short for Dense
Convolutional Network, is a sophisticated deep learning architecture characterized
by its unique connectivity pattern, where each layer is connected to every
subsequent layer in a feed-forward manner. This innovative structure allows for
efficient feature reuse throughout the network, significantly reducing the number
of parameters needed while still enabling the model to learn and capture complex
features present in retinal fundus images. This is particularly advantageous in the
context of DR screening, where accurately identifying and classifying subtle
retinal abnormalities is essential for effective diagnosis and timely intervention. In
the realm of diabetic retinopathy screening, DenseNet has shown remarkable
performance in detecting critical retinal abnormalities, including microaneurysms,
hemorrhages, and exudates—key indicators that signal the presence and
progression of DR. These features, although often subtle and challenging to
discern, are crucial for determining the various stages of the disease, from mild to
proliferative forms. The architecture's efficient parameter usage not only enables it
to maintain high levels of accuracy but also makes it less demanding in terms of
computational resources when compared to other popular architectures like
ResNet or VGG. This efficiency is particularly important in clinical settings where
computational power may be limited or where rapid analysis of images is
required.
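The dense connectivity pattern described above can be illustrated with a simplified NumPy sketch. Each layer receives the concatenation of the block input and all earlier layer outputs and adds a fixed number of new channels. The function name `dense_block`, the random weights, and the use of a 1x1 convolution followed by ReLU are simplifying assumptions; a real DenseNet layer also uses batch normalization and 3x3 convolutions.

```python
import numpy as np

def dense_block(x, num_layers, growth_rate, rng):
    """Apply DenseNet-style connectivity to a feature map of shape (H, W, C)."""
    features = [x]
    for _ in range(num_layers):
        # Every layer sees all earlier feature maps (feature reuse).
        stacked = np.concatenate(features, axis=-1)
        weights = rng.standard_normal((stacked.shape[-1], growth_rate)) * 0.1
        # A 1x1 convolution is a matrix product over the channel axis.
        new = np.maximum(stacked @ weights, 0.0)
        features.append(new)
    return np.concatenate(features, axis=-1)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8, 16))
y = dense_block(x, num_layers=4, growth_rate=12, rng=rng)
print(y.shape)  # (8, 8, 64): 16 input channels + 4 layers * 12 new channels
```

Because each layer only has to produce `growth_rate` new channels, the number of weights per layer stays small even though every layer has access to all earlier features.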
information. Low-level features such as textures and edges are vital for identifying
subtle abnormalities, while high-level features encapsulate more complex patterns
that may signify the presence of DR-related lesions, including microaneurysms
and hemorrhages. This dual-level feature extraction is crucial, as it enables the
model to recognize diverse manifestations of DR, which can vary significantly
among patients. The parallel structure of the CNNs facilitates simultaneous
processing of features at multiple scales, further enhancing the model's capability
to identify and classify a wide array of DR-related signs. This multi-scale analysis
is particularly beneficial in the context of fundus images, where variations in
lesion size and location can influence diagnosis. Following the feature extraction
phase, the ELM classifier is employed to categorize the images into various
severity levels of DR. ELM is renowned for its rapid learning speed and strong
generalization capabilities, which allow it to effectively utilize the features
extracted by the CNNs. Research has demonstrated that this hybrid model
consistently outperforms standalone CNN classifiers, providing not only improved
accuracy but also significantly faster training times. This efficiency is particularly
important in clinical settings, where timely diagnoses can lead to better patient
outcomes and more effective management of the disease.
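The speed of an Extreme Learning Machine comes from its training rule: the hidden-layer weights are random and fixed, and only the output weights are computed, in closed form, by least squares. The NumPy sketch below shows this under stated assumptions: the class name, the tanh activation, the hidden-layer size, and the synthetic two-dimensional "features" standing in for CNN outputs are all illustrative and not taken from the reviewed work.

```python
import numpy as np

class ExtremeLearningMachine:
    def __init__(self, n_hidden, rng):
        self.n_hidden = n_hidden
        self.rng = rng

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y, n_classes):
        # Hidden weights are drawn once at random and never trained.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        targets = np.eye(n_classes)[y]  # one-hot labels
        # Output weights: least-squares solution via the pseudoinverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ targets
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [4.0, 4.0], [0.0, 4.0]])
y = np.repeat(np.arange(3), 50)
# Synthetic, well-separated clusters standing in for extracted CNN features.
X = centers[y] + 0.3 * rng.standard_normal((150, 2))

elm = ExtremeLearningMachine(n_hidden=40, rng=rng).fit(X, y, n_classes=3)
accuracy = float(np.mean(elm.predict(X) == y))
print(f"training accuracy: {accuracy:.2f}")
```

No iterative gradient descent is involved, which is why training takes a single matrix factorization.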
across various retinal image datasets by learning from multiple related tasks, such
as distinguishing between different stages of DR or identifying other retinal
diseases. By enabling models to adapt quickly to new tasks with minimal data,
meta-learning frameworks offer a solution to the challenges of data scarcity and
heterogeneity in retinal imaging. One of the key advantages of these frameworks
is their emphasis on interpretability, which is vital for building trust in AI systems
within the medical community. In clinical settings, it is not enough for an AI
model to simply make a prediction; it must also provide clear, interpretable
explanations for its decisions, whether through visualizations (such as heatmaps)
or feature-based insights that highlight critical areas in the retinal images.
Techniques like Gradient-weighted Class Activation Mapping (Grad-CAM) are
often employed to make the model’s decision-making process more transparent,
allowing clinicians to understand why certain regions of the retina were flagged as
indicators of DR.
Recent research into the optical identification of diabetic retinopathy (DR) using
hyperspectral imaging (HSI) has highlighted significant breakthroughs in
improving early detection and diagnosis of the disease. Hyperspectral imaging is a
powerful technology that captures a broad range of the electromagnetic spectrum,
gathering detailed information about retinal tissues beyond what is possible with
traditional imaging methods like fundus photography. By capturing and analyzing
this vast array of wavelengths, HSI can detect subtle biochemical and structural
changes in the retina that are associated with the early stages of DR, such as
variations in oxygenation and blood flow. These subtle changes may be invisible
in conventional imaging techniques but are crucial for diagnosing DR at its
earliest stages, where intervention can prevent disease progression. Studies have
shown that HSI can differentiate between healthy and diseased retinal tissues by
identifying
specific spectral signatures that correlate with key retinal abnormalities, such as
microaneurysms, haemorrhages, and lipid exudates—important markers for DR
progression. The integration of hyperspectral imaging with advanced machine
learning techniques, particularly convolutional neural networks (CNNs), has
further enhanced the accuracy and efficiency of DR detection. CNNs are well-
suited for analysing the complex, high-dimensional data generated by HSI,
allowing the model to automatically learn and recognize patterns in spectral
information that correspond to early signs of DR. This combination of HSI and
deep learning has enabled the automated identification of microscopic features
that may precede visible symptoms of DR, such as microaneurysms and exudates,
allowing for earlier and more precise diagnosis. Moreover, research suggests that
hyperspectral data can be used to generate quantitative assessments of retinal
health, potentially providing new metrics for tailoring individualized treatment
plans based on a patient’s unique retinal profile.
CHAPTER 3
3.1 INTRODUCTION
A thorough understanding of the commissioning and operational requirements of
the proposed system is crucial, with a key focus on gaining a comprehensive grasp of
the hardware and software needs for the project.
Interactive Code Execution: Jupyter Notebook allows users to run code in real-
time and view the output immediately, making it ideal for experimenting and
debugging.
Support for Multiple Languages: While primarily used for Python, Jupyter also
supports other programming languages such as R, Julia, and SQL via different kernels.
Rich Text Integration: With support for Markdown, users can include formatted
text, equations (via LaTeX), images, and links alongside code, making it useful for
creating comprehensive, well-documented reports.
Data Visualization: Jupyter integrates with numerous libraries such as Matplotlib,
Seaborn, and Plotly to generate dynamic and interactive visualizations directly within
the notebook.
3.3 PYTHON
programming, making it a flexible choice for a wide range of applications. Python is
extensively used in web development, data science, artificial intelligence, machine
learning, automation, and scientific computing due to its vast collection of libraries and
frameworks such as Django, Flask, Pandas, NumPy, and TensorFlow. Its cross-platform
compatibility and strong community support have contributed to Python's widespread
adoption, making it one of the most popular programming languages in the world today.
Python provides many useful features that make it popular and set it apart from
other programming languages. It supports object-oriented and procedural
programming approaches and provides dynamic memory allocation. A few essential
features are listed below.
Python has a clean, readable syntax that makes it easier for beginners and developers
to write and understand code.
2) Interpreted language:
Python executes code line by line, allowing for immediate testing and debugging
without needing a compilation step.
4) Extensive standard library:
Python comes with a vast standard library that provides modules and functions
for tasks like file I/O, regular expressions, web development, and more.
5) Cross-platform:
6) Dynamic typing:
Python is dynamically typed, meaning you don’t need to declare variable types
explicitly. The interpreter determines the type at runtime.
Python has a rich ecosystem of libraries and frameworks like NumPy, Pandas,
Django, Flask, and TensorFlow, which extend its capabilities.
10) Integration capabilities:
Python can be easily integrated with other languages (like C/C++/Java) and
systems, making it versatile for building different kinds of applications.
3.3.3 Keras
3.3.4 PyTorch
your diabetic retinopathy detection project, you may need to try various model
architectures or modify hyperparameters quickly to achieve optimal performance.
3.3.5 NumPy
NumPy, short for Numerical Python, serves as the foundational library for
numerical computing in Python. Its primary offering is the `ndarray` object, an N-
dimensional array that allows for efficient storage and manipulation of
homogeneous data types. This array structure is designed to handle large datasets
and perform mathematical computations swiftly, making it indispensable for tasks
involving complex numerical calculations, particularly in fields like machine
learning and image processing.
One of the standout features of NumPy is its ability to perform element-wise
operations. This capability enables users to conduct mathematical operations on
entire arrays without needing explicit loops, significantly enhancing code readability
and performance. For example, when processing images for diabetic retinopathy
detection, you can easily manipulate pixel data, applying filters or transformations
uniformly across the dataset. Such operations are not only more intuitive but also
executed at speeds unattainable with Python’s built-in lists.
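As a small illustration of element-wise operations, the sketch below rescales and standardizes a whole batch of images without any explicit loop. The batch shape and the random pixel values are stand-ins chosen for the example, not the project's data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in batch: 4 RGB images of 64 x 64 pixels with 8-bit values.
images = rng.integers(0, 256, size=(4, 64, 64, 3), dtype=np.uint8)

# Scale every pixel to the range [0, 1] in one vectorized expression.
scaled = images.astype(np.float32) / 255.0

# Standardize each colour channel across the batch; broadcasting applies
# the per-channel mean and standard deviation to every pixel.
mean = scaled.mean(axis=(0, 1, 2), keepdims=True)
std = scaled.std(axis=(0, 1, 2), keepdims=True)
standardized = (scaled - mean) / std

print(standardized.shape)  # (4, 64, 64, 3)
```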
3.3.6 Pandas
information. With its intuitive indexing and slicing capabilities, Pandas simplifies
data retrieval and manipulation, making it easy to filter datasets based on specific
criteria or to group data for statistical analysis. For instance, patient metadata
can be merged with the corresponding image data to create a cohesive dataset
that supports multi-modal data fusion, a critical aspect of the proposed system.
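A minimal sketch of such a merge, followed by filtering and grouping, is given here. The column names, patient identifiers, file names, and grades are hypothetical examples and do not describe the project's actual dataset.

```python
import pandas as pd

# Hypothetical patient metadata.
patients = pd.DataFrame({
    "patient_id": ["P001", "P002", "P003"],
    "age": [54, 61, 47],
    "diabetes_years": [8, 15, 3],
})

# Hypothetical image index with one severity grade per image.
images = pd.DataFrame({
    "image_file": ["a.png", "b.png", "c.png", "d.png"],
    "patient_id": ["P001", "P001", "P002", "P003"],
    "grade": [0, 1, 3, 0],
})

# One row per image, enriched with the owning patient's metadata.
# validate= raises an error if a patient_id is duplicated in `patients`.
merged = images.merge(patients, on="patient_id", how="left",
                      validate="many_to_one")

# Filter on a criterion, then group for a summary statistic.
advanced = merged[merged["grade"] >= 2]
mean_age_by_grade = merged.groupby("grade")["age"].mean()

print(merged)
print(mean_age_by_grade)
```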
3.3.7 Matplotlib
Matplotlib is a powerful and versatile plotting library for Python that provides
a comprehensive suite of tools for creating static, animated, and interactive
visualizations in a wide variety of formats. As one of the foundational libraries for
data visualization in the Python ecosystem, it is particularly essential for projects
like diabetic retinopathy detection, where understanding data visually can lead to
better insights and model performance.
At the core of Matplotlib is the ability to generate high-quality plots with
minimal code. The library's primary interface, `pyplot`, allows users to create a
range of plots—line graphs, scatter plots, histograms, and more—using simple
function calls. This simplicity makes it accessible for both beginners and advanced
users, allowing for rapid prototyping and exploration of data. For instance, in this
project, it can be used to visualize the distribution of pixel intensities in retinal
images or to compare model performance metrics across different experiments.
3.3.8 Seaborn
detection project, Seaborn can play a vital role in exploring complex relationships
within the datasets and presenting findings in a clear and engaging manner.
One of Seaborn’s key strengths is its ability to handle data frames directly,
which aligns seamlessly with Pandas. This integration allows users to create
complex visualizations with minimal code, making it easier to explore and analyze
the data. For instance, scatter plots, heatmaps, and box plots can quickly
illustrate the relationships between features in the dataset, such as pixel
intensity distributions and their correlation with diabetic retinopathy severity.
3.3.9 OpenCV
3.3.10 Scikit-Image
analysis. In the context of this diabetic retinopathy detection project, scikit-image
can play a crucial role in enhancing the quality of retinal images and extracting
meaningful features.
One of the standout features of scikit-image is its comprehensive set of image
processing algorithms, which include filtering, morphology, segmentation, feature
extraction, and more. These algorithms are designed to work seamlessly with
NumPy arrays, allowing for efficient processing of image data. For example, you
can apply Gaussian blurring to reduce noise in retinal images, or use edge detection
techniques to highlight the boundaries of features that may be indicative of diabetic
retinopathy.
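To show what Gaussian blurring does to the underlying array, the sketch below implements a separable Gaussian filter directly in NumPy. It is a hand-written stand-in for illustration, not scikit-image's own API; the function names `gaussian_kernel` and `gaussian_blur`, the kernel radius rule, and the synthetic noisy image are assumptions.

```python
import numpy as np

def gaussian_kernel(sigma, radius=None):
    """1-D Gaussian kernel normalized to sum to 1."""
    if radius is None:
        radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(image, sigma=1.0):
    """Blur a 2-D greyscale image with a separable Gaussian filter."""
    k = gaussian_kernel(sigma)
    r = len(k) // 2
    # Reflect-pad so the output keeps the input size.
    padded = np.pad(image.astype(np.float64), r, mode="reflect")
    # Filter along rows, then along columns; 'valid' trims the padding.
    rows = np.apply_along_axis(np.convolve, 1, padded, k, mode="valid")
    return np.apply_along_axis(np.convolve, 0, rows, k, mode="valid")

rng = np.random.default_rng(0)
# Synthetic flat image with additive noise (illustrative only).
noisy = 0.5 + 0.1 * rng.standard_normal((64, 64))
blurred = gaussian_blur(noisy, sigma=1.5)
print(f"std before: {noisy.std():.3f}, after: {blurred.std():.3f}")
```

The drop in standard deviation reflects the noise reduction; a larger `sigma` smooths more strongly but also blurs small lesions, so the value has to be chosen with care.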
CHAPTER 4
PROJECT DESCRIPTION
Convolutional Neural Networks (CNNs) are a powerful tool for analyzing
retinal images, especially when identifying features associated with diabetic
retinopathy, a common complication of diabetes that affects the eyes. CNNs work
by leveraging layers of filters to capture patterns and features from the retinal
images, such as blood vessel structures, exudates, and hemorrhages, which are
indicative of diabetic retinopathy. The multi-layer architecture of CNNs enables
the automatic extraction of low-level and high-level features, making them ideal
for image classification tasks. Given the complexity of retinal images and the
subtlety of diabetic retinopathy features, CNNs provide a robust solution for
automating this diagnostic process.
integrating these perspectives, the system becomes more resilient to overfitting and
variance. Techniques such as bagging or boosting can be used to average or weight
predictions from various CNN architectures, reducing the risk of errors from
individual models. By aggregating predictions, ensemble models offer more stable
and reliable outputs, which are particularly useful in medical image analysis, where
minimizing false positives and negatives is crucial.
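The aggregation step can be sketched as weighted averaging of class probabilities (soft voting). This shows only how predictions are combined; it does not include the bootstrap resampling of bagging or the sequential reweighting of boosting. The function name `soft_vote` and the probability values are illustrative assumptions.

```python
import numpy as np

def soft_vote(prob_list, weights=None):
    """Weighted average of per-model class probabilities.

    prob_list: list of arrays, each of shape (n_samples, n_classes).
    """
    probs = np.stack(prob_list)  # (n_models, n_samples, n_classes)
    if weights is None:
        weights = np.ones(len(prob_list))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Contract the model axis: result has shape (n_samples, n_classes).
    return np.tensordot(weights, probs, axes=1)

# Hypothetical outputs of three CNNs for two images and three classes.
model_a = np.array([[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]])
model_b = np.array([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
model_c = np.array([[0.2, 0.7, 0.1], [0.2, 0.2, 0.6]])

combined = soft_vote([model_a, model_b, model_c])
print(combined.argmax(axis=1))  # [1 2]
```

For the second image, one model prefers class 1 while the other two prefer class 2; the averaged probabilities follow the majority, which is how the ensemble reduces the impact of a single model's error.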
DISADVANTAGES
4.2 PROPOSED SYSTEM
The proposed system for the project aims to advance the current approach of
diagnosing diabetic retinopathy by integrating hybrid models and Explainable AI
(XAI) techniques. This integration helps capture multiple perspectives of the retinal
structure, improving diagnostic accuracy by leveraging the strengths of different
imaging modalities. By fusing these data sources, the system can identify subtle
patterns that might not be visible in a single type of image, leading to more precise
and robust diagnoses.
At the heart of the system is a hybrid model that combines Convolutional
Neural Networks (CNNs) with Transformer architectures. The CNN component
excels at extracting detailed local features from retinal images, such as small
lesions, microaneurysms, and abnormal blood vessel formations. Meanwhile, the
Transformer component is designed to recognize global patterns and relationships
across the entire image, such as large-scale structural changes in the retina. By
combining these two approaches, the hybrid CNN-Transformer model can capture
both fine-grained details and overall patterns, which enhances its ability to detect
diabetic retinopathy at various stages with high accuracy.
To ensure the system’s transparency and trustworthiness, the Explainable AI (XAI)
technique Grad-CAM (Gradient-weighted Class Activation Mapping) is
employed. Grad-CAM helps highlight the regions of the image that the model
focuses on when making predictions, offering visual explanations of why certain
areas are considered indicative of DR.
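The core Grad-CAM computation can be sketched in NumPy once the feature maps of a convolutional layer and the gradients of the target class score with respect to them are available. In practice both arrays would be obtained from the deep learning framework; here they are random stand-ins, and the function name `grad_cam` and the array shapes are assumptions for illustration.

```python
import numpy as np

def grad_cam(activations, gradients):
    """Compute a Grad-CAM heatmap.

    activations: (H, W, K) feature maps of the chosen convolutional layer.
    gradients:   (H, W, K) gradients of the class score w.r.t. those maps.
    Returns an (H, W) heatmap scaled to [0, 1].
    """
    # Channel importance: gradients averaged over the spatial positions.
    weights = gradients.mean(axis=(0, 1))
    # Weighted sum of the feature maps over the channel axis.
    cam = np.tensordot(activations, weights, axes=([2], [0]))
    # ReLU keeps only regions with positive influence on the class.
    cam = np.maximum(cam, 0.0)
    peak = cam.max()
    return cam / peak if peak > 0 else cam

rng = np.random.default_rng(0)
activations = rng.random((7, 7, 32))        # stand-in feature maps
gradients = rng.random((7, 7, 32)) - 0.3    # stand-in gradients
heatmap = grad_cam(activations, gradients)
print(heatmap.shape)  # (7, 7)
```

The low-resolution heatmap is then typically resized to the input image size and overlaid on the retinal image for inspection.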
4.2.1 EfficientNetB0
4.2.2 ResNet50
CHAPTER 5
FUNCTIONAL DESIGNS
5.1 FUNCTIONAL DIAGRAM
In this stage, the system gathers retinal images and other relevant medical data.
The data collected typically includes:
5.1.3 Explainable AI with Grad-CAM
[Figure: block diagram with the stages Input Retinal Images, Image Preprocessing, Image Segmentation, Image Enhancement]
The process begins with the acquisition of retinal images. These images are
typically captured using fundus cameras or other imaging techniques such as Optical
Coherence Tomography (OCT). These images are vital for the detection of diabetic
retinopathy, as they allow the system to analyze the structural and vascular changes
in the retina caused by the disease.
5.2.2 Image processing
This stage involves segmenting the image into different regions of interest.
For diabetic retinopathy detection, the primary focus is on segmenting retinal
structures such as:
makes these features more easily detectable for analysis.
In this step, important parameters and features are extracted from the
enhanced and segmented image. These features may include:
This stage focuses on identifying abnormal features that are directly linked to
diabetic retinopathy. Key abnormalities include:
Microaneurysms: Small bulges in blood vessels that are one of the earliest
signs of DR.
Hemorrhages: Blood leaking into the retina.
Exudates: Fatty deposits from damaged blood vessels.
Neovascularization: Growth of new, fragile blood vessels in later stages of
the disease.
Extracting these features helps in determining the severity and
progression of the disease.
The severity analysis leads to the final classification of the disease into different
stages:
CHAPTER 6
IMPLEMENTATION AND RESULT
6.1 IMPLEMENTATION
This allows the user to select a file from their system. The dialog lists
directories, files, and their respective metadata (name, type, size, and modification
date). Once a file is selected, its path is typically processed by the code for further
actions such as loading or analyzing the file.
The output displays the result of a diabetic retinopathy detection model. An input retinal
image was preprocessed and passed to a trained model (64x3-CNN.keras). The model
predicted the condition as "No DR" (No Diabetic Retinopathy), indicating that it found
no signs of diabetic retinopathy in the provided image. The processed image is also
visualized.
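The final step, turning the model's output probabilities into a readable label such as "No DR", can be sketched as follows. The five stage names assume the commonly used 0 to 4 DR grading scale. The report's own `stage_mapping` is not shown in this section, so the dictionary, the helper name `decode_prediction`, and the sample probabilities below are illustrative only.

```python
import numpy as np

# Assumed label set (0-4 grading scale); not taken from the project code.
STAGE_MAPPING = {
    0: "No DR",
    1: "Mild",
    2: "Moderate",
    3: "Severe",
    4: "Proliferative DR",
}

def decode_prediction(probs, mapping=STAGE_MAPPING):
    """Return (stage label, probability) for one softmax output vector."""
    probs = np.asarray(probs, dtype=float)
    pred_class = int(np.argmax(probs))   # index of the most likely stage
    return mapping[pred_class], float(probs[pred_class])

# Hypothetical softmax output for a single image.
label, confidence = decode_prediction([0.91, 0.04, 0.03, 0.01, 0.01])
print(f"Predicted stage: {label} ({confidence:.0%})")
```

Reporting the probability next to the label lets a reviewer see how strongly the model favours the predicted stage over the others.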
CHAPTER 7
7.1 CONCLUSION
7.2 FUTURE ENHANCEMENT
7.2.1 Integrating OCT scans, genetic information, and blood biomarkers
alongside retinal images offers a more holistic view of the patient’s
condition. This multimodal approach enhances the accuracy of diabetic
retinopathy detection and diagnosis.
7.2.3 Models can be customized for specific patient groups (age, ethnicity, or
diabetic history), leading to more accurate and individualized
predictions. This adaptation allows the AI to provide tailored diagnoses
based on patient-specific data.
7.2.6 Combining edge computing for real-time analysis and cloud resources
for heavy computations improves the system's performance on resource-
constrained devices like smartphones, making it more efficient and
scalable.
APPENDIX
A. READ ME FILE
A.2 DEVELOPER
SOURCE CODE
import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix, accuracy_score
import tensorflow as tf
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.layers import (
    Input, Dense, GlobalAveragePooling2D, Dropout, BatchNormalization
)
from tensorflow.keras.models import Model
from tensorflow.keras.applications import ResNet50, EfficientNetB0
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.callbacks import EarlyStopping, ReduceLROnPlateau, ModelCheckpoint
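import io  # used for io.BytesIO when reading uploaded images
import cv2
from PIL import Image  # used for Image.open in the prediction functions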
from ipywidgets import FileUpload, VBox, Label
from IPython.display import display
val_generator = val_datagen.flow_from_dataframe(
    val_df,
    directory=images_dir,
    x_col='id_code',
    y_col='diagnosis',
    target_size=IMAGE_SIZE,
    batch_size=BATCH_SIZE,
    class_mode='categorical',
    shuffle=False
)
# Combine features
combined = tf.keras.layers.concatenate([resnet_features, efficientnet_features])
# Final output
final_output = Dense(NUM_CLASSES, activation='softmax')(combined)
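How this fusion step behaves can be sketched in plain NumPy. The sketch assumes the pooled backbone outputs are concatenated directly, in which case global average pooling yields 2048 features for ResNet50 and 1280 for EfficientNetB0. The random weights stand in for the trained `Dense` layer, and the class count of 5 and the batch size are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES = 5          # assumed number of DR stages
batch = 4                # hypothetical batch size

# Stand-ins for the pooled backbone outputs (values are random).
resnet_features = rng.normal(size=(batch, 2048))
efficientnet_features = rng.normal(size=(batch, 1280))

# Concatenate along the last (feature) axis, the default axis of
# tf.keras.layers.concatenate.
combined = np.concatenate([resnet_features, efficientnet_features], axis=-1)

# Dense + softmax head with random, untrained weights.
weights = rng.normal(scale=0.01, size=(combined.shape[1], NUM_CLASSES))
bias = np.zeros(NUM_CLASSES)
logits = combined @ weights + bias
logits -= logits.max(axis=1, keepdims=True)   # numerical stability
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

print(combined.shape)   # (4, 3328)
print(probs.shape)      # (4, 5)
```

Each row of `probs` sums to 1, so the ensemble produces a single probability distribution over the stages from the features of both backbones.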
# Set callbacks
early_stop = EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.1, patience=5, min_lr=1e-6)
checkpoint = ModelCheckpoint('ensemble_model.h5', monitor='val_loss', save_best_only=True)
# Calculate steps
steps_per_epoch = train_generator.samples // BATCH_SIZE
validation_steps = val_generator.samples // BATCH_SIZE
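One detail of this step calculation is worth noting: floor division drops the final partial batch whenever the sample count is not a multiple of the batch size. A small sketch of the difference, using hypothetical sample counts and a helper name (`steps_for`) that is not part of the project code:

```python
import math

def steps_for(num_samples, batch_size, drop_remainder=False):
    """Number of batches needed to iterate over num_samples once."""
    if drop_remainder:
        return num_samples // batch_size        # skips the last partial batch
    return math.ceil(num_samples / batch_size)  # covers every sample

# Hypothetical validation set of 733 images with a batch size of 32.
floor_steps = steps_for(733, 32, drop_remainder=True)
ceil_steps = steps_for(733, 32)
print(floor_steps, floor_steps * 32)   # 22 704
print(ceil_steps)                      # 23
```

If predictions are generated with the floor-based step count and then compared with `val_generator.classes`, the two arrays can differ in length. Using the ceiling keeps them aligned.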
# Classification Report
print("Classification Report:")
# Convert the mapping's values to a list so they are passed as a plain sequence of labels
stage_names = list(stage_mapping.values())
print(classification_report(true_classes, pred_classes, target_names=stage_names))
# Confusion Matrix
cm = confusion_matrix(true_classes, pred_classes)
plt.figure(figsize=(8,6))
sns.heatmap(cm, annot=True, fmt='d', xticklabels=list(stage_mapping.values()),
            yticklabels=list(stage_mapping.values()), cmap='Blues')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.title('Confusion Matrix')
plt.show()
# Overall Accuracy
accuracy = accuracy_score(true_classes, pred_classes)
print(f"Validation Accuracy: {accuracy * 100:.2f}%")
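The relation between the confusion matrix and the reported accuracy can be checked by hand. The sketch below uses made-up labels for three classes and NumPy only. It mirrors what `confusion_matrix` and `accuracy_score` compute but is not the project's evaluation code, and the helper name `confusion_counts` is hypothetical.

```python
import numpy as np

def confusion_counts(true_classes, pred_classes, num_classes):
    """Build a confusion matrix: rows are actual classes, columns are predicted."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(true_classes, pred_classes):
        cm[t, p] += 1
    return cm

# Made-up labels for illustration.
true_classes = [0, 0, 1, 1, 2, 2, 2, 0]
pred_classes = [0, 1, 1, 1, 2, 0, 2, 0]

cm = confusion_counts(true_classes, pred_classes, num_classes=3)
accuracy = np.trace(cm) / cm.sum()               # correct predictions lie on the diagonal
recall_per_class = np.diag(cm) / cm.sum(axis=1)  # fraction of each actual class recovered

print(cm)
print(f"Accuracy: {accuracy * 100:.2f}%")
```

Per-class recall is often more informative than overall accuracy for DR grading, because the later stages are usually much rarer than "No DR" in screening datasets.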
if layer_name is None:
    # Find the last convolutional layer (the last layer with a 4-D output)
    for layer in reversed(model.layers):
        if len(layer.output.shape) == 4:
            layer_name = layer.name
            break
if layer_name is None:
    raise ValueError("No convolutional layer found in the model.")
# Model that returns both the conv feature maps and the final predictions
grad_model = Model(
    [model.inputs],
    [model.get_layer(layer_name).output, model.output]
)
conv_outputs *= pooled_grads
heatmap = tf.reduce_sum(conv_outputs, axis=-1)
# Prediction function
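def predict_stage(img_path, model):
    # Load the image as RGB and resize it to the model's input size
    img = Image.open(img_path).convert('RGB')
    img = img.resize(IMAGE_SIZE)
    # Scale pixel values to [0, 1] and add a batch dimension
    img_array = np.array(img) / 255.0
    img_array = np.expand_dims(img_array, axis=0)
    preds = model.predict(img_array)
    pred_class = np.argmax(preds, axis=1)[0]
    pred_prob = preds[0][pred_class]
    stage = stage_mapping[pred_class]
    # Return the predicted stage label and its probability
    return stage, pred_prob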
# Display function
def display_prediction(file):
    if file is not None:
        # Read the image file
        content = file['content']
        img = Image.open(io.BytesIO(content))
        img_rgb = img.convert('RGB')
        img_resized = img_rgb.resize(IMAGE_SIZE)
        img_array = np.array(img_resized) / 255.0
        img_input = np.expand_dims(img_array, axis=0)
        # Predict
        preds = ensemble_model.predict(img_input)
        pred_class = np.argmax(preds, axis=1)[0]
        pred_prob = preds[0][pred_class]
        stage = stage_mapping[pred_class]
        # Compute the Grad-CAM heatmap
        img_bytes = io.BytesIO()
        img_rgb.save(img_bytes, format='PNG')
        img_array_for_gradcam = np.expand_dims(
            np.array(img_rgb.resize(IMAGE_SIZE)), axis=0) / 255.0
        heatmap = get_gradcam(ensemble_model, img_array_for_gradcam)
        # Display Grad-CAM (superimposed_img is built in a step not shown in this excerpt)
        plt.figure(figsize=(6,6))
        plt.imshow(superimposed_img.astype('uint8'))
        plt.axis('off')
        plt.title('Grad-CAM')
        plt.show()
output = Label(value="")
def on_upload_change(change):
    for name, file in upload.value.items():
        display_prediction(file)
upload.observe(on_upload_change, names='value')
display(VBox([Label("Upload an eye image for Diabetic Retinopathy Detection:"), upload]))