Skin Lesion Detection Using Deep Learning
with two image feature extractions, one for dermatoscopic input images and the other for macroscopic input images.

1.2. Partial multimodality classification
The researchers excluded the other two branches from the complete network when only one image modality (macroscopic images or dermatoscopic images) and its metadata were supplied for classifying the images. Before passing it through the embedding network, the researchers generated only one image feature vector and combined it with the feature vector of the metadata.

1.3. Single image classification
When there was only one image type for classification and there was no metadata, the image was sent through the image feature extraction network, and the extracted features were then transmitted through the rest of the network. In the testing phase, it turned out that patient metadata variables such as age, sex and location did not appreciably enhance precision for pigmented skin lesions. As a result, it was concluded that the available models rely substantially on tight image criteria and may be unstable in clinical practice. Furthermore, the selected datasets may contain unintended biases towards specific input patterns.

Using image representations produced by Google's Inception-v3 model, the proposed automated approach intends to detect the kind and cause of cancer directly [3]. The researchers used a feed-forward neural network with two layers and a softmax activation function in the output layer to perform a two-phase classification based on the representation vector. Two separate neural networks with the same representation vector were used to perform the two-phase classification: in phase one, the researchers determined the type of cancer, whether malignant or benign, and in phase two, they determined whether the cancer was caused by melanocytic or non-melanocytic cells. The training dataset includes 2000 JPEG dermoscopic images of skin lesions, together with ground truth values; the validation set had 150 images, whereas the testing set contained 600. The method identifies the images automatically using Google's Inception model and the image representation produced from the dermoscopic images.
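As a rough illustration of this two-phase setup, the sketch below builds two small Keras feed-forward classifiers that share the same Inception-v3 representation vector; the 2048-dimensional input matches Inception-v3's pooled output, while the hidden width and optimizer are illustrative assumptions rather than values reported in [3].

```python
# Minimal sketch (not the authors' exact code): two small feed-forward
# classifiers that operate on the same Inception-v3 representation vector.
# The 2048-dim input matches Inception-v3's pooled output; the hidden
# width (256) and the optimizer are illustrative assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models


def make_phase_classifier(name):
    """Two-layer feed-forward network with a softmax output layer."""
    return models.Sequential([
        tf.keras.Input(shape=(2048,)),          # Inception-v3 feature vector
        layers.Dense(256, activation="relu"),   # hidden layer (assumed width)
        layers.Dense(2, activation="softmax"),  # binary decision for this phase
    ], name=name)


# Phase 1: malignant vs. benign; Phase 2: melanocytic vs. non-melanocytic.
phase1 = make_phase_classifier("malignant_vs_benign")
phase2 = make_phase_classifier("melanocytic_vs_nonmelanocytic")
for model in (phase1, phase2):
    model.compile(optimizer="adam", loss="categorical_crossentropy",
                  metrics=["accuracy"])
```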
The work in [4] had two major contributions: first, the researchers offered a classification model that used a deep convolutional neural network and data augmentation to evaluate the classification of skin lesion images. Second, the researchers showed how data augmentation could be used to overcome data scarcity, and they looked at how varying numbers of augmented data samples affect the performance of different models. The researchers used three methods of data augmentation in melanoma classification.

1.4. Geometric augmentation
The semantic interpretation of the skin lesion is preserved by the position and scale of the lesion mark within the image; therefore, its ultimate classification is unaffected. As a result, input images were randomly cropped, and horizontal and vertical flips were used to produce new samples under the same label as the original.
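A minimal sketch of such label-preserving geometric augmentation, using TensorFlow image operations, is shown below; the crop size is an assumed value, not a parameter taken from [4].

```python
# Sketch of the label-preserving geometric augmentations described above:
# random crops plus horizontal and vertical flips. The crop size is an
# assumed value, not one reported in [4].
import tensorflow as tf


def geometric_augment(image, label, crop_size=(224, 224, 3)):
    """Return a randomly cropped and flipped copy under the same label."""
    image = tf.image.random_crop(image, size=crop_size)
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    return image, label  # the label is unchanged


# Example use in a tf.data pipeline of (image, label) pairs:
# dataset = dataset.map(geometric_augment, num_parallel_calls=tf.data.AUTOTUNE)
```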
1.5. Color augmentation
The images of skin lesions were gathered from various sources and made using various devices. As a result, when using the photographs for training and testing any system, it is critical to scale the colors of the images to increase the classification system's performance.

1.6. Data warping based on the knowledge of specialists
Clinicians diagnose melanoma by examining the patterns that surround the lesion, so affine transformations, including distorting, shearing and scaling the data, can be helpful in classifying the images. As a result, warping is an excellent way to supplement data in order to improve performance and reduce overfitting in melanoma classification.

In [5], three classifiers, namely SVM, Random Forests and Neural Networks, were used to classify the image dataset. The results showed that different augmentations performed differently in this case, and the neural networks performed best for the classification task.

In image recognition nowadays, two basic types of feature sets are routinely used [5]. The traditional kind is based on so-called "hand-crafted features", which are designed by researchers with the goal of capturing visual aspects of a picture, such as texture or color. A newer sort of feature set, motivated by how the brain decodes images and derived from powerful Convolutional Neural Networks, was recently introduced. These new features beat "hand-crafted" features when combined with deep learning and, as a result, are increasingly popular in computer vision. The researchers proposed in this study to use a mix of both sorts of features to classify skin lesions. "RSurf features" were extracted by the researchers for image description. This feature set's concept is to divide the input image into "parallel sequences of intensity values from the upper-left corner to the bottom-right corner". The concept behind such an extraction technique is based on the texture unit model, in which an input image's texture spectrum is defined. A support vector machine with a Gaussian kernel and standardized predictors was used in the first classifier; it estimated the class for a given input image using RSurf features and LBP with R = 1, 3, 5. CNN features were used in the second SVM classifier, which also had a Gaussian kernel and standardized predictors. The researchers used AlexNet to extract these features and chose the label with the greatest absolute score value for each image that was tested. As a result, the final classifier incorporated both approaches: hand-crafted features as well as features acquired from the deep learning method.
It is critical to distinguish malignant skin lesions from benign lesions such as "seborrheic keratosis" or "benign nevi", and accurate computerized classification of skin lesion images can support diagnosis [6]. The researchers offer a completely automated method for classifying skin lesions from dermoscopic pictures in this study, based on a novel ensemble scheme for convolutional neural networks (CNNs). For tasks like object detection and natural picture categorization, deep neural network algorithms, particularly convolutional neural networks, have outperformed alternative methods, and well-established CNN architectures were used to attain high accuracy. Transfer learning had been applied in the medical field for other tasks too. The pipeline of the model includes data pre-processing and fine-tuning of neural networks; the features were then extracted and fed into SVM models, and the outputs of the models were assembled together. To facilitate improved generalization ability when tested on additional datasets, the researchers kept data pre-processing to a minimum in the suggested pipeline. Only one task-specific pre-processing step (related to skin lesion categorization) was included in the technique, while the rest were typical pre-processing stages to prepare the pictures before feeding them to the model. Normalization, resizing, and color standardization were employed. VGG16, which includes 16 weight layers (13 convolutional layers and 3 fully connected layers), was employed. In addition to VGG16, the powerful ResNet-18 and ResNet-101, which have varying depths, were used for extracting the features. To solve the three-class classification problem (malignant melanoma / seborrheic keratosis / benign nevi), the final fully connected layers and the output layer of all pre-trained networks were eliminated and replaced by two new fully connected layers of 64 nodes and 3 nodes. The weights of the new fully connected layers were chosen at random from a normal distribution with a mean of zero and a standard deviation of 0.01. The researchers froze the weight values of the earliest layers of the deep models. By freezing the weights, the issue of overfitting was addressed; freezing the weights is also helpful in decreasing the training time. The researchers froze the early layers up to the 4th and 10th layers for AlexNet and VGG16, respectively, and up to the 4th and 30th residual blocks for ResNet-18 and ResNet-101, respectively.
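A hedged Keras sketch of this fine-tuning step is given below for the VGG16 case: the first layers are frozen, the original classifier is removed, and two new fully connected layers of 64 and 3 nodes are added with weights drawn from a normal distribution (mean 0, standard deviation 0.01). The hidden activation, pooling strategy, input size and optimizer are assumptions not specified in the text.

```python
# Sketch of the described fine-tuning setup for VGG16 (assumptions noted):
# early layers frozen, the original classifier removed, and a new head of
# Dense(64) + Dense(3, softmax) initialized from N(0, 0.01).
import tensorflow as tf
from tensorflow.keras import initializers, layers, models
from tensorflow.keras.applications import VGG16

init = initializers.RandomNormal(mean=0.0, stddev=0.01)

base = VGG16(weights="imagenet", include_top=False, pooling="avg",
             input_shape=(224, 224, 3))
for layer in base.layers[:10]:   # freeze the early layers (up to the 10th)
    layer.trainable = False

model = models.Sequential([
    base,
    layers.Dense(64, activation="relu", kernel_initializer=init),    # new FC, 64 nodes
    layers.Dense(3, activation="softmax", kernel_initializer=init),  # 3 lesion classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```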
To avoid overfitting on the small training dataset, the researchers used data augmentation to boost the training set size artificially. As the key data augmentation approaches, the researchers used rotations of 90, 180 and 270 degrees, and they also employed horizontal flipping. A ternary SVM classifier was trained using the collected deep features and the related labels defining the lesion kinds. The researchers examined a linear kernel as well as radial basis function (RBF) kernels and found that the RBF kernel performed marginally better. In the final models, the researchers used one-vs-all multiclass SVM classifiers with radial basis function kernels. The major contribution of the method is that it proposed a hybrid deep neural network approach for classifying skin lesions that extracted deep features from the images using multiple DNNs and assembled the features in a support vector machine classifier, producing very accurate results without needing exhaustive pre-processing or lesion area segmentation. The results demonstrated that combining information in this way improves discrimination and is complementary to the individual networks.
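The final classification stage could look roughly like the scikit-learn sketch below: an RBF-kernel SVM in a one-vs-all setup trained on standardized deep features. The feature file names and hyperparameters are hypothetical placeholders, not artifacts from the paper.

```python
# Sketch of a one-vs-all RBF-kernel SVM trained on standardized deep CNN
# features; the .npy file names are hypothetical placeholders.
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

train_features = np.load("train_deep_features.npy")   # (n_samples, n_features)
train_labels = np.load("train_labels.npy")            # integer lesion labels

svm = make_pipeline(
    StandardScaler(),                        # "standardized predictors"
    OneVsRestClassifier(SVC(kernel="rbf")),  # one-vs-all RBF-kernel SVMs
)
svm.fit(train_features, train_labels)
predictions = svm.predict(np.load("test_deep_features.npy"))
```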
The "attention residual learning convolutional neural network (ARL-CNN)" model for skin lesion categorization is proposed in [7]. The researchers combined a residual learning framework, used for training a deep convolutional neural network with a small number of images, with an attention learning mechanism that improves the DCNN's representation capacity by allowing it to focus more on "semantically" important regions of dermoscopy images (i.e., lesions). The suggested attention learning mechanism made full use of the innate self-attention capacity of classification-trained DCNNs, and it could work within any deep convolutional neural network framework without appending any additional "attention" layers, which was important for learning problems with small datasets, such as the image classification problem at hand. In terms of implementation, each so-called ARL block might include both "residual learning" and "attention learning". By stacking numerous ARL blocks and training the model end-to-end, an ARL-CNN model of any depth could be created. The researchers tested the suggested ARL-CNN model on the ISIC-skin 2017 dataset, and it outperformed the competition. The research contributed in several aspects. The researchers proposed a novel ARL-CNN model for accurate skin lesion categorization, which incorporates both residual learning and attention learning methods. They created an effective attention framework that took full advantage of DCNNs' inherent "self-attention" ability, i.e., instead of learning the attention mask with extra layers, they used the feature maps acquired by an upper layer as the attention mask of a lower-level layer. Finally, they achieved "state-of-the-art" lesion classification accuracy on the ISIC-skin 2017 dataset using only one model with 50 layers, which is valuable for computer-aided diagnosis (CAD) of skin cancer.
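One possible way to realize this idea is sketched below as a custom Keras layer: the residual branch's own feature maps are averaged over channels and turned into a spatial attention map with a parameter-free softmax, which then re-weights the branch output. This is an illustrative interpretation only, not the exact ARL block from [7], and it assumes the block input already has `filters` channels so the residual addition is valid.

```python
# Illustrative (simplified) attention-residual block: a standard residual
# branch whose own feature maps provide a parameter-free spatial attention
# map. Not the exact formulation of [7]; assumes the input already has
# `filters` channels.
import tensorflow as tf
from tensorflow.keras import layers


class ARLBlockSketch(layers.Layer):
    def __init__(self, filters, **kwargs):
        super().__init__(**kwargs)
        self.conv1 = layers.Conv2D(filters, 3, padding="same", activation="relu")
        self.conv2 = layers.Conv2D(filters, 3, padding="same")

    def call(self, x):
        f = self.conv2(self.conv1(x))                     # residual branch
        att = tf.reduce_mean(f, axis=-1, keepdims=True)   # average over channels
        shape = tf.shape(att)
        att = tf.nn.softmax(tf.reshape(att, (shape[0], -1)), axis=-1)
        att = tf.reshape(att, shape)                      # spatial attention map
        return tf.nn.relu(x + f + f * att)                # residual + attended features
```

Stacking several such blocks and training the network end-to-end mirrors the construction described above.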
The researchers in [1] addressed two problems. The first task entailed classifying skin lesions using dermoscopic pictures; "dermoscopic" images and patient metadata were used for the second task. For the first task, the researchers use a variety of CNNs to classify dermoscopic images. The deep learning models for task 2 are divided into two sections:
The dataset was split into training, validation and test sets as follows:

Train set images   Validation set images   Test set images
9714               100                     201

The training set was augmented with images generated by introducing changes into the original dataset. The images were horizontally flipped, the rotation range was 90 degrees, and the zoom range was kept at 0.2. The images were also rescaled before being fed into the model.
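A minimal sketch of this augmentation configuration with the Keras ImageDataGenerator is shown below; the flip, rotation and zoom settings follow the text, while the rescaling factor, target size and directory layout are assumptions.

```python
# Augmentation configuration as described above (horizontal flip, 90-degree
# rotation range, 0.2 zoom range, rescaling); the rescale factor, target
# size and directory layout are assumptions.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,      # rescale pixel intensities
    horizontal_flip=True,   # random horizontal flips
    rotation_range=90,      # rotations of up to 90 degrees
    zoom_range=0.2,         # random zoom in [0.8, 1.2]
)

train_generator = train_datagen.flow_from_directory(
    "data/train",           # hypothetical path, one sub-folder per lesion class
    target_size=(224, 224),
    batch_size=32,
    class_mode="categorical",
)
```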
3.1. Evaluation Metrics
The following evaluation metrics were used to evaluate the models.
The Receiver Operating Characteristic (ROC) curve is a metric used to evaluate machine learning classification models. It is a probability curve that plots the true positive rate against the false positive rate at many threshold values; it essentially separates the 'signal' from the 'noise'. The formulas for the true positive rate and the false positive rate are as follows:

True positive rate = true positives / (true positives + false negatives)

False positive rate = false positives / (false positives + true negatives)

The Area Under the Curve (AUC) measures the performance of the classifier by evaluating its ability to differentiate between classes. It is used as a summary of the Receiver Operating Characteristic (ROC) curve. A higher AUC value means that the classification model differentiates the negative and positive classes more accurately.
Accuracy is also an evaluation metric used for the evaluation of classification models. The accuracy value represents the fraction of predictions that the model makes correctly:

Accuracy = (true positives + true negatives) / (true positives + true negatives + false positives + false negatives)

Precision, recall and the F1 score were also reported; the F1 score is the harmonic mean of precision and recall:

F1 Score = (2 * precision * recall) / (precision + recall)
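These metrics can be computed with scikit-learn roughly as sketched below, assuming one-hot ground-truth labels and predicted class probabilities; the random arrays are only placeholders for real test-set outputs, and the weighted averaging is an assumption.

```python
# Sketch of computing the reported metrics with scikit-learn. y_true is
# one-hot ground truth, y_prob holds predicted class probabilities; the
# random arrays below are placeholders for real test-set outputs.
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_true = np.eye(7)[np.random.randint(0, 7, size=100)]   # placeholder labels
y_prob = np.random.dirichlet(np.ones(7), size=100)      # placeholder predictions

y_lab = np.argmax(y_true, axis=1)
y_pred = np.argmax(y_prob, axis=1)

accuracy = accuracy_score(y_lab, y_pred)
precision = precision_score(y_lab, y_pred, average="weighted")
recall = recall_score(y_lab, y_pred, average="weighted")
f1 = f1_score(y_lab, y_pred, average="weighted")

# Per-class AUC-ROC (one column of probabilities per class).
per_class_auc = roc_auc_score(y_true, y_prob, average=None)
```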
3.2. L2 Regularization
L2 regularization is applied to models to combat overfitting. Overfitting describes a situation where the training loss decreases while the validation loss increases; in other words, the model fits the training data well but does not predict accurately on the validation data, so it is not able to generalize. This is serious because a model that does not generalize will not produce accurate results when it is deployed in a real-world scenario. Different techniques can be used to control overfitting. Regularization is used to control the complexity of the model: when regularization is added, the model not only minimizes the loss but also minimizes its own complexity. So, the objective of the machine learning model after adding regularization is:

minimize(Loss(Data | Model) + complexity(Model))

The complexity of the models used in this paper was minimized by using L2 regularization. The L2 regularization term is the sum of the squares of all the weights:

L2 regularization term = ||w||² = w₁² + w₂² + … + wₙ²

In the models, two layers with L2 regularization were used before the final softmax layer.
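The classification head could therefore look like the sketch below: two dense layers with L2 kernel regularization followed by the softmax layer. The backbone, layer widths and regularization factor are illustrative assumptions, not values reported in the text.

```python
# Sketch of a classification head with two L2-regularized dense layers
# before the final softmax; the widths and the 1e-4 factor are assumptions.
from tensorflow.keras import layers, models, regularizers
from tensorflow.keras.applications import DenseNet121

base = DenseNet121(weights="imagenet", include_top=False, pooling="avg")

model = models.Sequential([
    base,
    layers.Dense(256, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dense(128, activation="relu",
                 kernel_regularizer=regularizers.l2(1e-4)),
    layers.Dense(7, activation="softmax"),   # seven lesion classes
])
```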
A total of 12 experiments were conducted using different optimizers. The three optimizers Adam, RMSprop and Stochastic Gradient Descent (SGD) were used with DenseNet and Inception V3. Moreover, experiments were conducted with and without augmentations to see whether the augmentations are useful in our case or not. The details of the experiments are given below.
3.2.2. Without Augmentation
These experiments were also conducted without augmentations to see if the models can generalize well without them.

1. DenseNet [RMSprop]
2. DenseNet [Adam]
3. DenseNet [SGD]
4. Inception V3 [RMSprop]
5. Inception V3 [Adam]
6. Inception V3 [SGD]

4. Discussion
Early detection of skin lesions can save many lives, and Artificial Intelligence is helping medical science serve this purpose. Convolutional Neural Networks are useful in medical imaging. Two state-of-the-art convolutional neural network architectures were experimented with in this paper, and both showed good results overall. It turned out that DenseNet performed better than Inception V3 in classifying the images into the different classes. To evaluate model performance, AUC-ROC curves, precision, recall, F1 score and accuracy were employed. The reason for choosing multiple metrics was that the data was highly imbalanced, so the accuracy metric alone might be deceiving; the data imbalance issue itself was addressed by using focal loss. The per-class ROC curves of the DenseNet model are better than those of the Inception V3 model, and the overall accuracy, precision, recall and F1 score figures are also better for the DenseNet model. The models were run for up to 60 epochs and an early stopping criterion was applied. The reason for applying early stopping was to ensure that the model does not overfit: if the model is trained for too many epochs, there is a chance that it will overlearn the pattern, and if it is run for too few epochs, it can underfit, i.e., it will not learn the pattern completely. Since the number of epochs is a hyperparameter, it has to be tuned. Normally, the model is run with a large number of epochs and training is stopped once the model stops learning. Keras provides an early stopping callback, and that was used in the experiments, as sketched below.
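The following is a minimal sketch of that setup: the 60-epoch budget follows the text, while the patience value and the monitored quantity are assumptions.

```python
# Early stopping sketch: train for up to 60 epochs and stop once the
# monitored validation loss stops improving. The patience value is an
# assumption, not a setting reported in the text.
from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(monitor="val_loss", patience=5,
                           restore_best_weights=True)

# history = model.fit(train_generator, validation_data=val_generator,
#                     epochs=60, callbacks=[early_stop])
```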
In the result tables, the termination epoch is also provided. The purpose of reporting the termination epoch was to see at which epoch each optimizer converges, i.e., which optimizer converges relatively fast. In the DenseNet model, Adam converged at the 39th epoch and gave an accuracy of 79%, while stochastic gradient descent converged at the 35th epoch and was 81% accurate. This means that stochastic gradient descent performed better from both perspectives: it gave higher accuracy in fewer epochs. In the experiments where augmentations were not applied, the accuracies were comparatively better than in the experiments with augmentations, but the experiments without augmentations faced an overfitting problem. This is because the data was very limited, so the models learnt the training data but did not generalize well on the testing data. The purpose of applying augmentations in deep learning is to increase the amount of data, because deep learning models require a large amount of data to learn. The training accuracies of the experiments without augmentations were more than 90%, even though L2 regularization was also applied to counter overfitting. In the case of Inception V3, very interesting figures were produced. The Adam optimizer achieved 75% test accuracy in 22 epochs, while stochastic gradient descent produced the same accuracy in 60 epochs; moreover, the RMSprop optimizer produced 76% accuracy in 30 epochs. So, for the given problem, the stochastic gradient descent optimizer with Inception V3 is not a suitable choice. The experiments without augmentations showed that RMSprop is a better choice: it gave 81% accuracy in 38 epochs, while Adam and SGD ran for the same number of epochs and gave 80% and 79% accuracy, respectively. Another interesting observation was the per-class AUC-ROC of the Dermatofibroma class: it was around 60% in the experiments without augmentations and around 70% in the experiments with augmentations. This was not the pattern in the DenseNet experiments, where all the AUC-ROC scores are around 90%. This shows that the Inception V3 architecture did not learn the pattern of the Dermatofibroma class very efficiently.

The loss function used for the experiments was focal loss, which performed well. It was used to overcome the class imbalance issue. In deep learning, it is important to have an equal distribution of the classes: if one class has more data entries than the others, the model will mostly learn the class with more examples, and when the model is deployed, it will tend to predict that every image belongs to that class. The data here was highly imbalanced. There are multiple ways to solve this issue; one method is to use a weighted loss, but recently another loss function, called focal loss, was introduced. It focuses on the classes with few examples more than on the classes with many examples, and it showed good overall performance. In the given problem, the Vascular class had very few examples in the training dataset; focal loss focused on this class, and on the test dataset almost all experiments classified the Vascular class accurately.
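A minimal sketch of a multi-class focal loss in Keras is shown below; it follows the standard formulation FL = -(1 - p_t)^γ · log(p_t), with γ = 2.0 as an assumed (conventional) value rather than one reported in the paper.

```python
# Minimal multi-class focal loss sketch: cross-entropy down-weighted by
# (1 - p_t)^gamma so that easy, well-classified examples contribute less.
# gamma=2.0 is the conventional default and an assumption here.
import tensorflow as tf


def categorical_focal_loss(gamma=2.0):
    def loss(y_true, y_pred):
        y_pred = tf.clip_by_value(y_pred, 1e-7, 1.0 - 1e-7)
        cross_entropy = -y_true * tf.math.log(y_pred)   # per-class cross-entropy
        weight = tf.pow(1.0 - y_pred, gamma)            # down-weight easy examples
        return tf.reduce_sum(weight * cross_entropy, axis=-1)
    return loss


# model.compile(optimizer="adam", loss=categorical_focal_loss(),
#               metrics=["accuracy"])
```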
The accuracies are better for DenseNet than for Inception V3. Moreover, the grad activation maps show that the two models looked at different places to classify the same image: the focus region of Inception V3 is different from that of DenseNet. The Inception V3 model misclassified the Vascular class, as shown in the figure. Although we cannot tell from grad activation maps why the model focuses on a certain region (this part remains a black box), these visualizations can help medical staff understand why the model predicts that a certain image belongs to a certain class. The explainability of machine learning models is important, especially in the sensitive area of medical science: it helps medical staff understand the model's predictions without knowing much about artificial intelligence, machine learning, or convolutional neural networks.
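Such class activation maps can be produced with Grad-CAM, roughly as sketched below; the convolutional layer name is model-specific and therefore an assumption.

```python
# Grad-CAM sketch: weight the last convolutional feature maps by the
# gradient of the predicted class score. The conv layer name depends on
# the backbone and is an assumption here.
import numpy as np
import tensorflow as tf


def grad_cam(model, image, conv_layer_name, class_index=None):
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(conv_layer_name).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        if class_index is None:
            class_index = tf.argmax(preds[0])
        class_score = preds[:, class_index]
    grads = tape.gradient(class_score, conv_out)          # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))       # per-channel importance
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)   # weighted feature maps
    cam = tf.nn.relu(cam)                                 # keep positive evidence
    return (cam / (tf.reduce_max(cam) + 1e-8)).numpy()    # normalized heat map
```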
5. Future Work
In the future, the focus will be on improving the model accuracy by experimenting with other models such as AlexNet and VGG-16. The accuracy of the models will be compared and the most accurate model will be chosen. Also, skin lesions follow a certain hierarchy that can be incorporated in future research. In this paper, the seven classes from the third level of that hierarchy are incorporated; a total of eight classes belong to the third level, but in the skin lesion 2018 dataset, seven classes are given. In the future, the focus will be on considering the complete hierarchy: in the first stage, the first level will be classified; in the second stage, the second level; and in the third stage, all seven classes will be classified by the model.

6. Figures and Tables
The per-class AUC-ROC is highly accurate. The results of the other experiments are as follows.

Tab. 1. DenseNet Comparison Table

Optimizer   Accuracy   Precision   Recall   F1-Score   Termination epoch #
Adam        0.79       0.82        0.79     0.79       39
RMSprop     0.80       0.80        0.80     0.79       35
SGD         0.81       0.82        0.81     0.81       34

Tab. 2. Per class AUC-ROC [DenseNet, RMSprop, focal loss, with Augmentations]

Class            AUC-ROC
Actinic          0.957
Carcinoma        0.98
Dermatofibroma   0.985
Melanoma         0.921
Nevus            0.962
Seborrheic       0.958
Vascular         1.0
Tab. 6. Per class AUC-ROC [DenseNet, Adam, focal loss, without Augmentations]

Class            AUC-ROC
Actinic          0.965
Carcinoma        0.979
Dermatofibroma   0.869
Melanoma         0.924
Nevus            0.94
Seborrheic       0.957
Vascular         1.0

Tab. 7. Per class AUC-ROC [DenseNet, RMSprop, focal loss, without Augmentations]

Class            AUC-ROC
Actinic          0.946
Carcinoma        0.986
Dermatofibroma   0.982
Melanoma         0.905
Nevus            0.96
Seborrheic       0.956
Vascular         1.0

Tab. 9. Inception V3 Comparison Table

Optimizer   Accuracy   Precision   Recall   F1-Score   Termination epoch #
Adam        0.75       0.78        0.75     0.75       22
RMSprop     0.76       0.71        0.76     0.73       30
SGD         0.75       0.74        0.75     0.74       60

Tab. 10. Per class AUC-ROC [Inception V3, Adam, focal loss, with Augmentations]

Class            AUC-ROC
Actinic          0.887
Carcinoma        0.959
Dermatofibroma   0.859
Melanoma         0.791
Nevus            0.92
Seborrheic       0.911
Vascular         0.99
AUTHORS
Rajit Chandra – Computer Science Department,
Purdue Fort Wayne, Fort Wayne, 46805, USA,
E-mail: [email protected].
*Corresponding author