Using Deep Learning to Identify Deepfakes Created Using Generative Adversarial Networks
1. Introduction
In recent years, the rapid advancement of artificial intelligence (AI) and deep learning
(DL) technologies has significantly transformed fields ranging from healthcare to entertainment.
Among these advances, generative adversarial networks (GANs) have captured great atten-
tion due to their remarkable ability to generate realistic images, videos, and audio. While
GANs have shown tremendous potential in several positive instances, such as content
creation and game development, they have also given rise to a concerning phenomenon
known as deepfakes. Deepfake technology is a growing concern because it can create
highly realistic but false media, leading to misinformation, identity fraud, and a loss of
trust in digital content. As deepfakes become easier to produce, the risk of their misuse
increases, making it urgent to develop reliable detection methods to prevent deception
and protect individuals, organisations and society at large. Deepfakes emerged as a conse-
quence of advances in deep learning, particularly GANs, which were first introduced by
Goodfellow et al. [1]. Deepfakes are synthetic media in which a person in an existing image
or video is digitally replaced with someone else, creating highly realistic and potentially
misleading content [2].
GANs consist of two neural networks, the generator and the discriminator, which
engage in a continuous adversarial process. The generator creates fake data that mimic
real data, while the discriminator attempts to distinguish between the real and fake data
and provides feedback to the generator [3]. This adversarial process continues until the
generator produces data that are indistinguishable from real data to the human eye [4].
Deepfakes generated by GANs offer great potential due to their creativity and innovation.
They have been used in the film industry for visual effects, in video games to create lifelike
characters, in education to simulate historical events to make learning more fun and
accessible, and even in art to generate new forms of creative expression [5].
However, the same technology that enables these positive applications can also be
weaponised and used for malicious purposes. The widespread availability of deepfake
generation tools and increasingly sophisticated GANs have made it easier than ever for
anyone to create convincing fake media. Deepfakes have been used to spread misinforma-
tion, create non-consensual explicit content, and commit fraud, raising ethical and security
concerns [6]. Misusing deepfakes created by GANs poses a significant threat to digital
integrity and trust. Deepfake detection in real-world scenarios such as media and politics
presents challenges due to the high quality of synthetic content and the vast spread of
digital media. As deepfakes become increasingly sophisticated, the challenge of detecting
them grows. Traditional detection methods, which often rely on visual inconsistencies
or manual detection, are becoming less effective against advanced GAN-generated deep-
fakes [7]. This requires the development of robust detection systems capable of identifying
deepfakes with high accuracy. Researchers are actively developing powerful algorithms
to detect deepfake images. Sharma et al. [8] experimented on three deep
learning models, VGG16, ResNet50, and a custom CNN model, to attempt to achieve a
powerful classifier model. They used the 140K real and fake face dataset [9], which consists
of images generated by StyleGAN [10]. Developed by NVIDIA, StyleGAN has set a new
standard in the quality of generated images, and it produces faces that are highly realistic.
Perišić and Jovanović [11] proposed another solution to distinguish between real and fake
images by using a pre-trained VGG16 model and a custom VGG-like architecture. They
used the same dataset generated by StyleGAN. They found that a smaller, optimised CNN
could outperform larger pre-trained models in certain scenarios. Their findings highlight
the importance of balancing model complexity and accuracy, which inspired this research
to extend the comparison to multiple architectures, including ResNet50, DenseNet121,
MobileNet, and InceptionV3.
This study focuses on using new technologies and AI to contribute to a safer digital
world. Existing DL image classification techniques such as convolutional neural networks
(CNNs) will be used to detect deepfakes in facial images created using GANs in an attempt
to reduce deepfake crimes. We will use images generated by GANs and experiment with
different pre-trained models such as ResNet50, DenseNet121, MobileNet, and InceptionV3,
in addition to building a custom CNN model. We will apply various techniques such
as face cropping and experiment with different dataset sizes and split ratios to find the
optimal configuration. Additionally, a web interface will be developed, allowing users to
check whether a face image is real or fake.
2. Background Study
2.1. Deepfakes
Deepfakes refer to fake media in which a person is replaced with another in an image
or video, and they are usually created using artificial intelligence technologies. Deepfakes
can be created using several techniques, including GANs and autoencoders.
2.2. Generative Adversarial Networks (GANs)
A GAN is a powerful technique that uses two neural networks, a generator and a
discriminator, to create a deepfake. The generator network creates synthetic data, such as
images and videos, that mimic real data. At the start, the output is random noise, but with
time, the generator learns and creates realistic outputs [3]. The discriminator evaluates the
authenticity of the output of the generator, distinguishing whether the data are real or fake.
The generator then learns and improves its outputs based on the discriminator's feedback.
An example of a GAN is shown in Figure 1.
Figure 2. Fake images generated by StyleGAN from [10].
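To make the adversarial training loop concrete, the following minimal sketch implements a generator and discriminator with the Keras API used later in this study. It is an illustration only: the 64 × 64 output size, layer widths, and losses are assumptions and do not describe StyleGAN or any model used to create the dataset.

```python
# Minimal GAN sketch (illustrative): a generator maps random noise to
# 64x64 RGB images, and a discriminator classifies images as real or fake.
import tensorflow as tf
from tensorflow.keras import layers

def build_generator(latent_dim=100):
    return tf.keras.Sequential([
        layers.Input(shape=(latent_dim,)),
        layers.Dense(8 * 8 * 128),
        layers.Reshape((8, 8, 128)),
        layers.Conv2DTranspose(64, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(32, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2DTranspose(3, 4, strides=2, padding="same", activation="tanh"),
    ])

def build_discriminator():
    return tf.keras.Sequential([
        layers.Input(shape=(64, 64, 3)),
        layers.Conv2D(32, 4, strides=2, padding="same", activation="relu"),
        layers.Conv2D(64, 4, strides=2, padding="same", activation="relu"),
        layers.Flatten(),
        layers.Dense(1),  # logit: real vs. fake
    ])

bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)

@tf.function
def train_step(generator, discriminator, g_opt, d_opt, real_images, latent_dim=100):
    noise = tf.random.normal([tf.shape(real_images)[0], latent_dim])
    with tf.GradientTape() as g_tape, tf.GradientTape() as d_tape:
        fake_images = generator(noise, training=True)
        real_logits = discriminator(real_images, training=True)
        fake_logits = discriminator(fake_images, training=True)
        # Discriminator: label real images 1 and generated images 0.
        d_loss = bce(tf.ones_like(real_logits), real_logits) + \
                 bce(tf.zeros_like(fake_logits), fake_logits)
        # Generator: try to make the discriminator output 1 for fakes.
        g_loss = bce(tf.ones_like(fake_logits), fake_logits)
    d_opt.apply_gradients(zip(d_tape.gradient(d_loss, discriminator.trainable_variables),
                              discriminator.trainable_variables))
    g_opt.apply_gradients(zip(g_tape.gradient(g_loss, generator.trainable_variables),
                              generator.trainable_variables))
    return g_loss, d_loss
```

In each step, the discriminator is penalised for misclassifying real and generated images, while the generator is rewarded for fooling it, which is exactly the feedback loop described above.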
3. Literature Review
In our work, we will be using a subset of the 140K real and fake face dataset [9], which
was generated by StyleGAN [10].
In a study by Raza et al. [2], a solution based on a hybrid CNN (convolutional neural
network) and VGG16 architecture was put forward. The neural network techniques were
built from a dataset containing 1081 real and 960 fake images. The deepfake dataset utilised
is freely accessible on Kaggle [14] from the Yonsei University Department of Computer
Science. After comparing Xception, NAS-Net, MobileNet, and VGG16, they decided to
go forward with VGG16, since it had the highest accuracy of 90%. The suggested model
architecture was created by merging the hybrid layers of VGG16 and the CNN, more
specifically, the pooling, dropout, flattening, and fully connected layers. The proposed
hybrid deepfake predictor achieved an accuracy of 94%.
Kerenalli et al. [15] developed a three-step classification technique to classify misleading
deepfake images. The classifier generality is enhanced in the first step by employing
a new method of data augmentation known as random CutMixUp augmentation. In the
second stage, visual assessments of the shifted window transformer and EfficientNet struc-
ture are merged to create the hybrid model. As a result, a trustworthy classifier that can
differentiate between genuine and fake photos is generated. Finally, GradCAM considers
attention maps and feature maps to offer visual clues for the classifier to decide upon so
that non-AI users can also understand the classifier’s decision. This study suggests that the
deepfake image as a whole has to be studied thoroughly. The Computational Intelligence
and Photography Lab (CIPL) dataset, also known as the Real and Fake Face Detection
dataset from Kaggle [14], was combined with a dataset of 140,000 real and fake images for
training and validation, and the accuracy achieved by the suggested method was 98.45%.
However, it is to be noted that this research concentrates only on artificially generated and
manually created images, excluding the consideration of adversarial images.
In another study, a Vision Transformer and a CNN model were analysed and compared to
determine which deep learning technique works better in generalising deepfake detection
beyond the method in which it was created and trained. The ForgeryNet dataset, which is
available on GitHub [47], was used to train the models, since it is one of the largest deepfake
datasets ever made public, consisting of 2.9 million images and 220 thousand video clips.
For this research, only the images were used to train the model. After testing, it was noted
that Vision Transformer worked better for generalisation, as we could see that, no matter the
process for creating the training dataset, the variance for the Vision Transformer was always
lower than that of the CNN; for example, they had a variance of 0.013 and 0.024, respectively,
making it more suitable for real-world applications. For the CNN, EfficientNetV2 was
chosen, and it was concluded that it was more accurate for specialisation, hence making it
more suitable when one wants to perform specific deepfake detection [16].
In another study, data augmentation was used to enhance model performance during
training by producing sample images [17]. This was to overcome the problem of overfitting
and generalisation. A dataset of deepfake and real images from Kaggle [18] was used in
this study. A total of 140,002 images were utilised for training, 39,428 images were used
for validation, and 10,905 images were used for an evaluation of the model’s abilities. In
addition, deep learning models such as a CNN, InceptionV3, VGG16, and VGG19 were
compared after applying the transfer learning concept. This concept was used to increase
the performance in detecting deepfake images. The VGG16 model achieved the highest
accuracy of 90%.
Hsu et al. [4] aimed at solving two challenges of deepfake detection. Firstly, with
GANs generating images, it is very difficult to obtain all the training samples. Secondly, we
need to retrain our models such that they can effectively detect new fake images generated
by GANs. The dataset used was from CelebA [19], and it is available on Kaggle. It contains
202,599 face images of various celebrities with 10,177 different identities and no names.
It also provides attributes such as the presence of glasses, hair colour, a smile, and many
more. Fake and real images were paired together, and then pairwise learning was used to
train the Common Fake Feature (CFF) network. A classification network that could be used
to detect real and fake images was then derived. The proposed detector had an accuracy
of 90.9%.
In another study, the LRNet method proposed by Sun et al. [20] was chosen to be enhanced
because of its high level of precision. This technique was designed to analyse temporal
changes in videos and detect whether or not they have been altered. The FaceForensics++
dataset introduced by Rössler et al. [21] was a larger version of the FaceForensics dataset,
which focussed only on the alteration in facial expressions. The enhanced version included
1000 real videos from YouTube and 1000 fake videos created with GANs and computer
graphics. Three different levels of video compression were included in the dataset: raw, c23,
and c40. The enhanced model was created to improve the AUC result and “c23” data-level
accuracy. The main differences between the improved model and LRNet were the first
dropout rates, the number of hidden GRU neurons, the number of dropout layers, and the
linear layer and activation function configurations. The enhanced model’s “c23” variant
achieved an accuracy increase from 92.93% to 96.17%, and the AUC increased from 96.80%
to 98.39% [6].
In a study by Suganthi et al. [22], the goal was to implement a technique that was
faster and more accurate in detecting deepfakes. The fisherface local binary pattern
histogram (FF-LBPH), a deep learning approach, was the foundation of the suggested
deepfake recognition and detection model. A Kalman filter was used for the
preprocessing, which targeted resizing, the removal of noise, and the normalisation of
images. To obtain a shorter execution time, the dimension reduction of the images was
performed using the fused FF-LBPH approach. The deepfake detection datasets that were utilised were
Flickr-Faces-HQ (FFHQ) from Karras et al. [23], 100K-Faces from Generated Photos [24],
the Fake Face Dataset (DFFD) from Dang et al. [25], and CASIA-WebFace from Yi et al. [26].
From the testing results, it was concluded that the proposed FF-LBPH model performed
better than SVM, LDA, KNN, and CNN on all datasets. The proposed model achieved the
highest accuracy of 98.82% on the CASIA-WebFace dataset, followed by an accuracy of
97.82% on the DFFD.
Soleimani et al. [27] proposed a method for detecting synthesised images. It was
based on a three-path decision. Firstly, the entire face was fed to deepfake detectors to
check if an image was real or not. Secondly, they created feature vectors for each patch
after they divided the face into patches. By joining every patch together, they could detect
whether the image was real or not. Thirdly, if the number of fake patches exceeded the
number of real patches, the image was considered fake. Each of the three paths thus
determined whether the image was real or not. The final decision was made as follows: if two
approaches determined that an image was real and one approach considered it fake, the
image was considered real. They used the same technique for occluded images, with the
difference that the pixels in the occluded areas were set to zero. The datasets of fake images
generated by StyleGAN from Karras et al. [23] and by StyleGAN2 from Karras et al. [28]
used FFHQ images for training, and those generated by StarGAN from Choi et al. [29] and
PGGAN from Karras et al. [10] used the CelebA dataset from Liu et al. [30] for training.
The CelebA and FFHQ datasets were used for real images. For testing, the first dataset
contained the fake images generated by StyleGAN and StarGAN and real ones from CelebA
and FFHQ. The suggested approach for this dataset achieved 100% accuracy. The second
dataset consisted of fake images produced by StyleGAN and real images from CelebA.
The proposed approach again had an accuracy of 100%. The third dataset contained fake
images generated by StyleGAN2 and real images from FFHQ. The proposed approach had
an accuracy of 99.7%. For all the datasets used, their approach had the highest accuracy in
comparison with previous research.
Chen et al. [31] proposed a Two-Branch Convolutional Network with Similarity and
Classifier (TCNSC) technique that targeted detecting deepfakes in compressed images. The
network had two branches: one for binary classification and the other for similarity learning.
They noticed that there was a high similarity between original images and compressed ones
based on symmetry. The proposed method was improved by concurrently training these
two branches, which had the symmetrical raw image and its compressed image as inputs.
The FaceForensics++ dataset introduced by Rössler et al. [21] was used to experiment with
the model, and it was noted that the proposed model outperformed all existing ones under
the three compression settings of low quality, medium quality, and high quality, with an
accuracy of 91.8%, 93.4%, and 95.3%, respectively.
Abir et al. [32] used several deep learning algorithms (CNN models) to differentiate
between real and fake images, and the results were later compared to determine which one
to use for further research. Deep learning methods are considered black box models, as
we cannot obtain a clear understanding of how deep neural networks come to a decision.
In order to overcome this, Explainable AI (XAI) was introduced, which gave clarifications
through visualisations, analysis, masking, numerical values, and feature weighting. From
XAI, the Local Interpretable Model-Agnostic Explanations (LIME) algorithm was selected,
since it can be applied to any machine learning model, and its explanations are interpretable
and transparent. The dataset used in this research was retrieved from Kaggle; it had
140,000 images, among which 70,000 were fake [9]. InceptionResNetV2 was chosen from
among the CNN models, and with the help of XAI, the accuracy was 99.87%. XAI could give users
a better understanding of why the model made the prediction, making this approach more
flexible and reliable for users.
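To illustrate how LIME produces such explanations for an image classifier, a hedged sketch using the lime library is shown below; `model` and `image` are placeholders, and the wrapper that expands a sigmoid output into two class probabilities is our assumption about how a binary detector would be plugged in.

```python
# Sketch: explaining a single prediction of a binary deepfake detector with
# LIME. `model` is a trained Keras classifier and `image` an HxWx3 array.
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

def predict_proba(batch):
    # LIME expects one probability column per class, so the sigmoid
    # output P(fake) is expanded to [P(real), P(fake)].
    p_fake = model.predict(batch)
    return np.hstack([1.0 - p_fake, p_fake])

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    image, predict_proba, top_labels=2, hide_color=0, num_samples=1000)

# Overlay the superpixels that most support the predicted class.
img, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True, num_features=5)
overlay = mark_boundaries(img / 255.0, mask)
```

The resulting overlay highlights the image regions that drove the decision, which is the kind of visual clue described above.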
Khudeyer and Al-Moosawi [33] aimed at improving image deepfake detection by
modifying a CNN architecture with EfficientNetB0. EfficientNet is quicker, has fewer
parameters, and is more capable of feature extraction than other CNN models. The Flickr-
Faces-HQ (FFHQ) dataset from Karras et al. [28] was used to train the classification model.
Three models were proposed, with each one being an improved version of the previous
model. Model 2 became an improved version of EfficientNetB0 by adding a dense layer
of 256 nodes and dropout techniques. Learning rate scheduling techniques were then added,
improving the performance and reducing the training time of model 3. The proposed
method achieved an accuracy of 99.06%.
Doloriel and Cheung [34] focussed on improving the generalisation capability of
deepfake detectors using masked image modelling. This technique used masking in
supervised settings and focussed on classification loss to differentiate between real and fake
images. During training, the images were subjected to both spatial- and frequency-domain
masking. The dataset used for the training and validation setup was the same as that
described by Wang et al. [35], by using ProGAN with 720,000 samples for training and
4000 samples for validation. For testing, models such as GANs, DeepFake, low-level vision
models, and perceptual loss models from Wang et al. [35] were used. The experiments
also included testing with diffusion models. These diffusion models included Guided
Diffusion; Latent Diffusion (LDM) with varying steps of noise refinements and generation
guidance; Glide, which made use of two stages of noise refinement steps; and, lastly,
DALL-E-mini. They applied the proposed frequency masking for comparison on the state-
of-the-art method of Wang et al. [35], who proposed a way to use detectable fingerprints
from CNN-generated images to differentiate between real and fake images, thus allowing
forensic classifiers to generalise from one model to another without extensive adaptation.
In addition, Gragnaniello et al. [36] analysed the use of different augmentation techniques
and training strategies on a deepfake detector’s generalisation ability. It could be noted
that the combination of their frequency-based masking technique with the method of
Wang et al. [35] resulted in an increase of 2.36% in mean average precision, and there
was an increase of 3.01% in mean average precision when combined with the method of
Gragnaniello et al. [36].
Sun et al. [37] proposed a blending-based detection approach to enhance the general-
isation of deepfakes. They introduced a method of generating synthetic forged training
samples, named reconstructed blended images (RBIs). These images incorporated an in-
visible generator fingerprint and noise pattern, thereby enhancing the range of simulated
artefacts. They introduced a detection model named the multi-scale feature reconstruction
network (MFRN) to capture the variety of altered regions and training artefacts present
in the blended data. This approach combined their deepfake generator and detector
model. The model was trained based on the FF++ dataset, which was introduced by
Rössler et al. [21]. The model performance was tested first through cross-manipulation de-
tection, where the model demonstrated great performance, with areas under the curve (AUCs)
ranging from 98.90% to 100% across various manipulation techniques, such as DeepFake,
Face2Face, FaceSwap, and so on; secondly, it was tested through cross-dataset classification,
where the model produced robust detection results on well-known deepfake detection
datasets such as CDF-v2, DFD, DFDC, and so on, achieving AUCs ranging from 73.31% to
99.12%. The results surpassed or matched those of current state-of-the-art models.
Nethravathi et al. [38] focussed on two deep learning techniques, error-level analysis
(ELA) with a CNN and a pre-trained VGG-16 model. The two models were improved, and
their results were compared. For the ELA-CNN, they used a custom CNN architecture
consisting of various convolutional, pooling, dropout, and fully connected layers. To
prevent overfitting, the CNN used early stopping and dropout regularisation techniques.
The dataset utilised for training and testing the ELA-CNN was CASIA v1.0, which was
introduced by Chen et al. [39]. To improve the model’s efficiency and generalisation, the
training set was augmented using a variety of random transformations. For the pre-trained
VGG-16 model, transfer learning was used by swapping out the final classification layer
with a new layer designed to enhance deepfake detection. The dataset utilised for training
and testing the VGG-16 model was the same as the one used for the ELA-CNN, except
that instead of preprocessing the images using ELA, they downsized and normalised the
images. It was noted that the ELA-CNN model achieved an accuracy of 99.87%, whereas
the VGG-16 model achieved an accuracy of 97.93%.
A method that combined error-level analysis (ELA) and a CNN architecture to detect
deepfakes in images was proposed by Sudiatmika et al. [40]. The CASIA v2.0 dataset [41]
was used to train the model. The dataset comprised 7491 real and 5123 fake images. The
dataset was divided into real and fake images, followed by normalising the images by
processing them to a size of 224 × 224 pixels. ELA was then performed on the recompressed
images. VGG-16 was then chosen as the CNN architecture to be used to train the
model, since it is well suited to training on small datasets. It was noted that the proposed
method had an accuracy of 92.2% in training and 88.46% in validation.
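The core ELA step that both of these studies rely on can be sketched in a few lines; this is a generic reconstruction of the technique rather than the authors' exact pipeline, and the JPEG quality of 90 is an illustrative choice.

```python
# Minimal error-level analysis (ELA) sketch: re-save the image as JPEG at a
# known quality and take the per-pixel difference; tampered regions tend to
# show different compression error levels from their surroundings.
import io
from PIL import Image, ImageChops, ImageEnhance

def ela(path: str, quality: int = 90) -> Image.Image:
    original = Image.open(path).convert("RGB")
    buffer = io.BytesIO()
    original.save(buffer, "JPEG", quality=quality)  # recompress in memory
    recompressed = Image.open(buffer)
    diff = ImageChops.difference(original, recompressed)
    # Amplify the residual so it is visible and usable as CNN input.
    extrema = diff.getextrema()
    max_diff = max(channel_max for _, channel_max in extrema) or 1
    return ImageEnhance.Brightness(diff).enhance(255.0 / max_diff)
```

The amplified residual can then be fed to the CNN in place of the raw image.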
Chen et al. [42] proposed a solution that used images generated by a conditional
diffusion model (CDM) for data augmentation. This method enabled the deepfake detection
model to learn generic and robust representations without leading to overfitting. The
FaceForensics++ dataset introduced by Rössler et al. [21], which contains 1000 real YouTube
videos and corresponding fake videos produced by DeepFake, Face2Face, FaceSwap, and
NeuralTextures, was used. To assess the generalisability of their detector, the Celeb-DFv2
(CDF) and DeepFakeDetection (DFD) datasets were used for a cross-dataset test. CDF has
5639 fake videos generated using an improved synthesis process, and DFD has 363 genuine
videos from YouTube and 3068 fake videos. The detection model was trained with several
baseline models, and the results were compared. Using intra-dataset deepfake detection,
their proposed method outperformed the FF++ and ADM methods, achieving an AUC
of 99.31%. Using cross-dataset deepfake detection, it was noted that their method had a
better performance than the others, showing AUC improvements of 5.5% on CDF and 4.7%
on DFD.
A solution that focussed on the viability of a Vision Transformer (ViT) for detecting
multiclass deepfake images was proposed by Arshed et al. [43]. The deepfake detector
would treat the problem as a multiclass task, dealing with images generated by Stable Diffusion
and StyleGAN2. A ViT was used to extract the global properties of images for better
detection accuracy. The dataset used for real images was accessed on Kaggle [9], and
10K images were considered. GAN-based fake images were obtained from an online
website named thispersondoesnotexist [44]. Another dataset focussed on Stable Diffusion
that was based on text-to-image conversion was used. Lastly, a dataset of StyleGAN2
encodings of Stable Diffusion images, accessed on Kaggle and called Synthetic
Faces High Quality (SFHQ), was used [45]. It contained curated 1024 × 1024 high-quality
face images. After experimenting, it could be noted that their proposed solution that was
based on a multiclass-prepared dataset achieved an accuracy of 99.90%.
Wang et al. [46] proposed a robust identity perceptual watermark framework that
detects deepfake face swapping. A chaotic encryption system was constructed to ensure
the security of the watermark.
4. Methodology
4.1. Dataset
A subset of the 140K real and fake face dataset (accessed on Kaggle [9]) was used in
this study. The fake images in this dataset were part of the 1 Million Fake Face dataset,
which was generated by NVIDIA StyleGAN [23]. The details of this dataset are summarised
in Table 2. Some sample images from this face dataset are shown in Figure 3. Initially, the
dataset [9] had 140K images. Since training on 140K images would require significant resources,
we decided to downsize the dataset to only 20K images. The test folder of the 140K real
and fake face dataset was used in its entirety to build the 20K image dataset. This updated
dataset is named 20k_gan.
Several versions of the 20k_gan dataset with different split ratios and sizes were
created in order to find the optimal one. Another version of the dataset was created by
applying a cropping operation on the images. This was meant to keep only face images and
remove most of the background, so that the model would not have to learn unnecessary
features. The face_recognition library was used to perform the cropping operation. This
is a simple facial recognition library for Python built on top of DLib and OpenCV, and it
has a function named face_locations for locating a face and storing the borders (top, right,
bottom, and left) of the face. The face is then cropped and saved based on the saved borders.
The cropping operation is shown in Figure 4. This operation was performed on PyCharm
to create a newly cropped dataset, which was then uploaded to Google Drive to train the
model with the dataset on Google Colab.
Figure 3. Examples of images of faces in the 140K real and fake face dataset [9].
Figure 4. Cropping operation on an image.
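A minimal sketch of this cropping step is given below. The directory layout is an assumption for illustration; as described above, face_locations returns (top, right, bottom, left) borders for each detected face.

```python
# Sketch of the face-cropping step using the face_recognition library.
# Paths are illustrative; only the first detected face in each image is kept.
import os
import face_recognition
from PIL import Image

def crop_faces(src_dir: str, dst_dir: str) -> None:
    os.makedirs(dst_dir, exist_ok=True)
    for name in os.listdir(src_dir):
        path = os.path.join(src_dir, name)
        image = face_recognition.load_image_file(path)  # RGB numpy array
        locations = face_recognition.face_locations(image)
        if not locations:
            continue  # no face found; skip the image
        top, right, bottom, left = locations[0]
        face = Image.fromarray(image[top:bottom, left:right])
        face.save(os.path.join(dst_dir, name))

crop_faces("20k_gan/real", "20k_gan_crop/real")
crop_faces("20k_gan/fake", "20k_gan_crop/fake")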
4.2. Architecture of the Model
Figure 5 provides the details of the architecture, showcasing the different steps involved
in training the model. The Keras library, which is part of the TensorFlow framework,
was used to define and build the deep learning model. The dataset consisting of real and
fake images was preprocessed. Before resizing and normalising the dataset, it was split into
training, validation, and testing sets. All the sets were then resized to an appropriate size
(in our case, we used 256 × 256 pixels), and they were then normalised. Scaling, cropping,
and normalisation were used to standardise the images to 256 × 256 pixels, focussing only
on facial features by removing background noise, scaling pixel values to a consistent range,
and enhancing the model's ability to learn relevant features efficiently.
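The preprocessing described above can be sketched with the Keras utilities as follows; the folder names and batch size are assumptions for illustration.

```python
# Illustrative preprocessing pipeline: images are loaded from per-split
# directories, resized to 256x256, and rescaled from [0, 255] to [0, 1].
import tensorflow as tf

IMG_SIZE = (256, 256)
BATCH = 32

def load_split(path):
    ds = tf.keras.utils.image_dataset_from_directory(
        path, labels="inferred", label_mode="binary",
        image_size=IMG_SIZE, batch_size=BATCH,
        shuffle=path.endswith("train"))
    # Normalise pixel values to a consistent range.
    return ds.map(lambda x, y: (x / 255.0, y))

train_ds = load_split("20k_gan_8_1_1/train")
val_ds = load_split("20k_gan_8_1_1/valid")
test_ds = load_split("20k_gan_8_1_1/test")
```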
Figure 6. Website interacting with the server.
Figure 7. Web interface of the application showing that a correct prediction has been made.
Figure 8. Flowchart showing the image prediction process for the user.
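A minimal sketch of the server-side endpoint behind the workflow in Figures 6 to 8 is shown below; the saved model filename, route, and response format are assumptions rather than the exact implementation.

```python
# Sketch of a Flask prediction endpoint: the user uploads a face image, the
# saved Keras model scores it, and a real/fake label is returned as JSON.
import numpy as np
import tensorflow as tf
from flask import Flask, request, jsonify
from PIL import Image

app = Flask(__name__)
model = tf.keras.models.load_model("deepfake_detector.h5")  # assumed filename

@app.route("/predict", methods=["POST"])
def predict():
    image = Image.open(request.files["image"].stream).convert("RGB")
    image = image.resize((256, 256))                      # match training size
    x = np.asarray(image, dtype="float32")[None] / 255.0  # normalise to [0, 1]
    score = float(model.predict(x)[0][0])
    return jsonify({"label": "fake" if score >= 0.5 else "real",
                    "score": score})

if __name__ == "__main__":
    app.run()
```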
5. Results
This study brings several improvements to deepfake detection. The preprocessing
step is designed to focus only on facial features. The images are scaled and normalised.
The dataset split is carefully chosen to improve accuracy while keeping computing needs
low. Unlike many studies that use just one type of model, this research compares a custom
CNN with several pre-trained models (ResNet50, MobileNet, DenseNet121, and InceptionV3)
to find the best approach for deepfake detection. MobileNet is selected for its
speed and efficiency, making it useful for real-time detection. Lastly, this study goes beyond
just testing models by building a Flask-based web tool, allowing deepfake detection
to be performed directly by end users.
The 20k_gan dataset consists of 20,000 images: 10,000 real images and 10,000 fake
images. We used different split ratios to find the optimal one. Table 3 shows the different
dataset names along with their split ratios and the number of images in each set.
The validation accuracy ranging between 80 and 85% suggests that the model does
not overfit significantly and has the ability to generalise well.
All datasets with a 10% test ratio have the same test set images, implying that the
models using the 20k_gan_5_4_1, 20k_gan_6_3_1, 20k_gan_7_2_1, and 20k_gan_8_1_1
datasets are evaluated using the same set of images each time. This is done to ensure that
there is no bias when evaluating the model.
The time taken to train the model varies significantly, mostly because Google Colab’s
runtime disconnects as soon as an internet connection is weak, and it waits until the
connection is stable to continue running the cell. For each dataset version, the data are
trained twice, since training does not always give identical results across runs. The average
of the two runs was then considered for further evaluation.
From these experiments, we can note that dataset 20k_gan_8_1_1 achieved the highest
test accuracy. The dataset was well distributed, providing the model with enough data
for training and testing. Therefore, the optimal split ratio for the 20k_gan dataset was an
80% training ratio, 10% validation ratio, and 10% test ratio.
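One way to materialise such a split on disk is sketched below using the split-folders package; this is our illustration, as the paper does not state which tool was used to create the dataset versions.

```python
# Sketch: create an 80/10/10 train/val/test split on disk with the
# split-folders package. Folder names and seed are illustrative.
import splitfolders

splitfolders.ratio(
    "20k_gan",               # input folder with one subfolder per class
    output="20k_gan_8_1_1",  # creates train/, val/, test/ subfolders
    seed=42,
    ratio=(0.8, 0.1, 0.1),
)
```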
5.2. Results for the 2k_gan_8_1_1, 5k_gan_8_1_1, 10k_gan_8_1_1, 15k_gan_8_1_1, and
20k_gan_8_1_1 Datasets
From the experiments, we can see that, as the dataset size increases, the validation and
test accuracy also improve. This is because the model has more data to train on and can
learn more complex features. The validation and test accuracies track each other closely,
indicating good generalisation. Both runs also show very similar patterns, indicating that
the training is accurate and reproducible. However, the training time also increases, since
training on around 20,000 images is resource-intensive. Figure 9 is a line graph showing
the different accuracies for each dataset.
Figure 9. Graph for the analysis of the results of gan_8_1_1.
Table 5. Comparing the normal datasets with the cropped face datasets.

Dataset            | Average Training Accuracy (%) | Average Validation Accuracy (%) | Average Test Accuracy (%) | Average Time Taken to Train
2k_gan_8_1_1       | 98.25 | 68.45 | 65.45 | 10 min
2k_gan_crop_8_1_1  | 96.5  | 71.7  | 70.5  | 14 min
10k_gan_8_1_1      | 85    | 79    | 80    | 2 h
10k_gan_crop_8_1_1 | 98.4  | 80.4  | 81.1  | 53 min
20k_gan_8_1_1      | 98    | 84    | 86.1  | 2 h 30 min
20k_gan_crop_8_1_1 | 97.25 | 82.95 | 85.05 | 2 h 30 min
5.5. Comparing ResNet50, DenseNet121, MobileNet, InceptionV3, and Our Custom CNN Model
We tested our dataset on four pre-trained models, namely, ResNet50, DenseNet121,
MobileNet, and InceptionV3. The aim was to enhance the model’s performance by using a
model that was already trained on a large dataset and had already learned various features
and patterns. We used the pre-trained model’s feature extraction layers and added a custom
classification layer to distinguish between real and fake images.
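The following sketch illustrates this transfer-learning setup with MobileNet as the frozen feature extractor; the head layers, dropout rate, optimiser, and epoch count are assumptions, not the exact configuration used in the study.

```python
# Sketch: transfer learning with a frozen MobileNet base and a small custom
# classification head for binary real/fake prediction.
import tensorflow as tf
from tensorflow.keras import layers

base = tf.keras.applications.MobileNet(
    input_shape=(256, 256, 3), include_top=False, weights="imagenet")
base.trainable = False  # keep the pre-trained feature extraction layers fixed

model = tf.keras.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),
    layers.Dense(1, activation="sigmoid"),  # real (0) vs. fake (1)
])
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)
```

Swapping `MobileNet` for `ResNet50`, `DenseNet121`, or `InceptionV3` in this pattern yields the other pre-trained variants compared here.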
In Table 7, we can see that the MobileNet model currently yields the best test accuracy
of 98.5%, followed by a test accuracy of 98.0% with the InceptionV3 model, 97.3% with
the DenseNet121 model, 96.1% with the ResNet50 model, and, finally, 86.2% with our
custom CNN model. We can also note that MobileNet took less time to train than the other
models since it is more lightweight. Additionally, since the custom CNN model was built
with a simpler architecture of only two convolutional layers, it lacked in accuracy due to
its limited capabilities compared with deeper networks. We used a simple CNN model
as a starting point to compare its performance with the pre-trained models in order to
understand how well it can detect deepfakes. More advanced models like ResNet50 or
InceptionV3 give better results, but they might need a lot more computing power, which
can make them harder to use in real time or on devices with limited resources. The models
were trained, validated, and tested on only 20,000 images due to limited computational
power. Figure 10 shows the test accuracy for each model.
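For reference, a CNN of the simple two-convolutional-layer kind described above can be sketched as follows; the filter counts and dense layer width are assumptions.

```python
# Sketch of a simple custom CNN with two convolutional layers, mirroring the
# architecture described above.
import tensorflow as tf
from tensorflow.keras import layers

custom_cnn = tf.keras.Sequential([
    layers.Input(shape=(256, 256, 3)),
    layers.Conv2D(32, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])
custom_cnn.compile(optimizer="adam", loss="binary_crossentropy",
                   metrics=["accuracy"])
```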
Sharma et al. [8] also experimented on the 140K real and fake face dataset [9]. They
ran tests on three models: VGG16, ResNet50, and a custom CNN model. Table 9 shows the
results obtained.
Our custom CNN model was trained on only 16,000 images from the 140K real and
fake face dataset and achieved an accuracy of 86.2% while the ResNet50 model achieved an
accuracy of 96.1% on the same subset of the 140K real and fake face dataset.
In Table 10, we can see that the ResNet50 model outperformed the work of
Sharma et al. [8] by 2.2%. Additionally, the ResNet50 model was trained, validated, and
tested on a total of 20,000 images only, compared with the model of Sharma et al. [8], who
used all 140,000 images in the dataset. Our custom CNN achieved an accuracy of 86.2% by
training on 16,000 images, while the custom CNNs of Perišić et al. [11] and Sharma et al. [8]
achieved an accuracy of 97.18% and 95.5%, respectively, by training on 100,000 images.
7. Conclusions
Due to rapid advances in artificial intelligence (AI), generative adversarial networks
(GANs) have gained popularity owing to their ability to create realistic images, videos,
and audio. GANs can be used to generate image datasets and cartoon characters and
can help in text-to-image translations. Shortly after these advances, deepfakes became
a growing concern, and the issue remains alarming, since a fake image of someone can
tarnish a potential victim’s reputation. In this study, we implemented a solution to tackle
the deepfake problem by building an effective deep learning (DL) model to distinguish
between real and fake images.
We focussed on GAN-generated images, as these are a type of image that can easily
deceive the human eye. We used a subset of the 140K real and fake face dataset with
images generated by StyleGAN and experimented with five models: four pre-trained models
(ResNet50, DenseNet121, MobileNet, and InceptionV3) and a custom CNN model. Various dataset sizes,
such as 20,000, 15,000, 10,000, and 5000 images, and different split ratios were used. We
also applied techniques such as face cropping to the dataset. The 20k_gan_8_1_1 dataset
achieved the best performance, with a test accuracy of 98.5% when using the MobileNet
model, followed by 98.0% with the InceptionV3 model, 97.3% with the DenseNet121 model,
96.1% with the ResNet50 model, and, finally, 86.2% with our custom CNN model. Moreover,
we developed a web interface for users to detect the authenticity of face images.
The deepfake detection models developed here have significant potential for real-
world applications. This study’s detection model could help limit the spread of fraudulent
content, protecting digital platforms and enhancing user trust. For instance, the model
could automatically analyse and flag fake images before they are posted on social media
platforms, protecting users from deceptive content. Additionally, this model could verify
profile pictures on various websites, including government and official portals, enhancing
security and trust on digital platforms. In the future, the whole 140K dataset or other
datasets such as FaceForensics++ could be used to train the model. In this way, the model
will be able to learn from a wider and more diverse dataset, which is expected to further
increase the accuracy of the model. Additionally, we could refine the custom CNN model by
adding more convolutional layers or even implement some other pre-trained deep learning
models such as VGG16 to increase the model’s accuracy. We also intend to experiment with
more advanced and recent techniques such as Vision Transformers and diffusion models to
further boost detection accuracy, especially for highly realistic deepfakes. Collaboration
with industry actors, such as social media platforms, could support the real-world testing
and refinement of the model for wider deployment.
Author Contributions: Conceptualization, S.P. and J.J.; methodology, S.P.; software, J.J.; validation,
J.J. and S.P.; formal analysis, J.J.; investigation, J.J.; resources, S.P.; data curation, J.J.; writing—original
draft preparation, J.J.; writing—review and editing, S.P.; visualisation, J.J.; supervision, S.P.; project
administration, S.P.; funding acquisition, S.P. All authors have read and agreed to the published
version of the manuscript.
Data Availability Statement: The data are freely available on Kaggle. Please see reference [9].
References
1. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial
Nets. In Proceedings of the International Conference on Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC,
Canada, 8–13 December 2014; pp. 2672–2680.
2. Raza, A.; Munir, K.; Almutairi, M. A Novel Deep Learning Approach for Deepfake Image Detection. Appl. Sci. 2022, 12, 9820.
[CrossRef]
3. Walczyna, T.; Piotrowski, Z. Fast Fake: Easy-to-Train Face Swap Model. Appl. Sci. 2024, 14, 2149. [CrossRef]
4. Hsu, C.-C.; Zhuang, Y.-X.; Lee, C.-Y. Deep Fake Image Detection Based on Pairwise Learning. Appl. Sci. 2020, 10, 370. [CrossRef]
5. Kondrashov, S. Reimagining Digital Creativity: The Impact of Deepfake Technology on Artistic Expression. Medium. 2024.
Available online: https://medium.com/@realstanislavkondrashov/stanislav-kondrashov-explores-the-impact-of-deepfake-
technology-on-artistic-expression-84aec4ca1d49 (accessed on 25 September 2024).
6. Janutėnas, L.; Janutėnaitė-Bogdanienė, J.; Šešok, D. Deep Learning Methods to Detect Image Falsification. Appl. Sci. 2023, 13, 7694.
[CrossRef]
7. Clarke, M. Keeping It Real: How to Spot a Deepfake. CSIRO. 2024. Available online: https://www.csiro.au/en/news/all/
articles/2024/february/detect-deepfakes (accessed on 23 July 2024).
8. Sharma, J.; Sharma, S.; Kumar, V.; Hussein, H.S.; Alshazly, H. Deepfakes Classification of Faces Using Convolutional Neural
Networks. Trait. Signal 2022, 39, 1027–1037. [CrossRef]
9. Xhlulu. 140K Real and Fake Face. 2020. Available online: https://www.kaggle.com/datasets/xhlulu/140k-real-and-fake-faces
(accessed on 16 May 2024).
10. Karras, T.; Aila, T.; Laine, S.; Lehtinen, J. Progressive growing of GANs for improved quality, stability, and variation. arXiv 2017,
arXiv:1710.10196. [CrossRef]
11. Perišić, N.; Jovanović, R. Convolutional Neural Networks for Real and Fake Face Classification. In Sinteza 2022—International
Scientific Conference on Information Technology and Data Related Research 2022; Singidunum University: Belgrade, Serbia, 2022;
pp. 29–35. [CrossRef]
12. Cortuk, D. Generative Adversarial Networks (GANs): A Journey into AI-Generated Art. Medium. 2023. Available online: https:
//medium.com/@derya.cortuk/generative-adversarial-networks-gans-a-journey-into-ai-generated-art-7b7f9e40d4f5 (accessed
on 2 May 2024).
13. Naitali, A.; Ridouani, M.; Salahdine, F.; Kaabouch, N. Deepfake Attacks: Generation, Detection, Datasets, Challenges, and
Research Directions. Computers 2023, 12, 216. [CrossRef]
14. Kaggle. CIPLAB @ Yonsei University. Real and Fake Face Detection. 2019. Available online: https://www.kaggle.com/datasets/
ciplab/real-and-fake-face-detection (accessed on 24 September 2024).
15. Kerenalli, S.; Yendapalli, V.; Chinnaiah, M. Classification of Deepfake Images Using a Novel Explanatory Hybrid Model. CommIT
J. 2023, 17, 151–168. [CrossRef]
16. Coccomini, D.A.; Caldelli, R.; Falchi, F.; Gennaro, C.; Amato, G. Cross-Forgery Analysis of Vision Transformers and CNNs for
Deepfake Image Detection. In Proceedings of the 1st International Workshop on Multimedia AI against Disinformation (MAD
’22), Newark, NJ, USA, 27–30 June 2022; pp. 52–58. [CrossRef]
17. Iqbal, F.; Abbasi, A.; Javed, A.R.; Almadhor, A.; Jalil, Z.; Anwar, S.; Rida, I. Data Augmentation-Based Novel Deep Learning
Method for Deepfaked Images Detection. ACM Trans. Multimed. Comput. Commun. Appl. 2023, 20, 339. [CrossRef]
18. Kaggle. Deepfake and Real Images. 2021. Available online: https://www.kaggle.com/datasets/manjilkarki/deepfake-and-real-
images (accessed on 24 September 2024).
19. Kaggle. CelebFaces Attributes (CelebA) Dataset. 2015. Available online: https://www.kaggle.com/datasets/jessicali9530/celeba-
dataset/data (accessed on 24 February 2024).
Computers 2025, 14, 60 20 of 21
20. Sun, Z.; Han, Y.; Hua, Z.; Ruan, N.; Jia, W. Improving the Efficiency and Robustness of Deepfakes Detection through Precise
Geometric Features. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR),
Nashville, TN, USA, 20–25 June 2021; pp. 3608–3617.
21. Rossler, A.; Cozzolino, D.; Verdoliva, L.; Riess, C.; Thies, J.; Nießner, M. Faceforensics++: Learning to detect manipulated facial
images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2
November 2019; pp. 1–11.
22. Suganthi, S.T.; Ayoobkhan, M.U.A.; Krishna, K.V.; Bacanin, N.; Venkatachalam, K.; Štěpán, H.; Pavel, T. Deep learning model for
deep fake face recognition and detection. PeerJ Comput. Sci. 2022, 8, e881. [CrossRef]
23. Karras, T.; Laine, S.; Aila, T. A Style-Based Generator Architecture for Generative Adversarial Networks. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4401–4410.
[CrossRef]
24. Generated Photos. 100K-Faces Dataset. 2020. Available online: https://generated.photos/datasets (accessed on 27 September 2024).
25. Dang, H.; Liu, F.; Stehouwer, J.; Liu, X.; Jain, A.K. On the Detection of Digital Face Manipulation. In Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2022; pp. 5781–5790. [CrossRef]
26. Yi, D.; Lei, Z.; Liao, S.; Li, S. Learning Face Representation from Scratch. arXiv 2014, arXiv:1411.7923. [CrossRef]
27. Soleimani, M.; Nazari, A.; Moghaddam, M.E. Deepfake Detection of Occluded Images Using a Patch-based Approach. arXiv 2023,
arXiv:2304.04537. [CrossRef]
28. Karras, T.; Laine, S.; Aittala, M.; Hellsten, J.; Lehtinen, J.; Aila, T. Analysing and improving the image quality of StyleGAN. In
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020;
pp. 8107–8116. [CrossRef]
29. Choi, Y.; Choi, M.; Kim, M.; Ha, J.W.; Kim, S.; Choo, J. Stargan: Unified generative adversarial networks for multi-domain
image-to-image translation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City,
UT, USA, 18–23 June 2018; pp. 8789–8797.
30. Liu, Z.; Luo, P.; Wang, X.; Tang, X. Deep learning face attributes in the wild. In Proceedings of the IEEE International Conference
on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 3730–3738. [CrossRef]
31. Chen, P.; Xu, M.; Wang, X. Detecting Compressed Deepfake Images Using Two-Branch Convolutional Networks with Similarity
and Classifier. Symmetry 2022, 14, 2691. [CrossRef]
32. Abir, W.H.; Khanam, F.R.; Alam, K.N.; Hadjouni, M.; Elmannai, H.; Bourouis, S.; Dey, R.; Khan, M.M. Detecting Deepfake Images
Using Deep Learning Techniques and Explainable AI Methods. Intell. Autom. Soft Comput. 2023, 35, 2151–2169. [CrossRef]
33. Khudeyer, R.S.; Almoosawi, N.M. Fake Image Detection Using Deep Learning. Informatica 2023, 47, 115–120. [CrossRef]
34. Doloriel, C.T.; Cheung, N.M. Frequency Masking for Universal Deepfake Detection. arXiv 2024, arXiv:2401.06506. [CrossRef]
35. Wang, S.Y.; Wang, O.; Zhang, R.; Owens, A.; Efros, A.A. CNN-generated images are surprisingly easy to spot... for now. In
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020;
pp. 8695–8704. [CrossRef]
36. Gragnaniello, D.; Cozzolino, D.; Marra, F.; Poggi, G.; Verdoliva, L. Are GAN generated images easy to detect? A critical analysis
of the state-of-the-art. In Proceedings of the 2021 IEEE International Conference on Multimedia and Expo (ICME), Shenzhen,
China, 5–9 July 2021; pp. 1–6. [CrossRef]
37. Sun, Y.; Nguyen, H.H.; Lu, C.S.; Zhang, Z.; Sun, L.; Echizen, I. Generalised Deepfakes Detection with Reconstructed-Blended
Images and Multi-scale Feature Reconstruction Network. arXiv 2023, arXiv:2312.08020. [CrossRef]
38. Nethravathi, N.P.; Austin, B.D.; Reddy, D.S.P.; Kumar, G.V.; Raju, G.K. Image Forgery Detection Using Deep Neural Network. Int.
Res. J. Eng. Technol. (IRJET) 2023, 10, 1095–1100.
39. Chen, X.; Dong, C.; Ji, J.; Cao, J.; Li, X. Image manipulation detection by multi-view multi-scale supervision. In Proceedings of the
IEEE/CVF International Conference on Computer Vision, Virtual, 11–17 October 2021; pp. 14185–14193. [CrossRef]
40. Sudiatmika, I.B.K.; Rahman, F.; Trisno, T.; Suyoto, S. Image forgery detection using error level analysis and deep learning.
Telkomnika 2019, 17, 653–659. [CrossRef]
41. Kaggle. CASIA 2.0 Image Tampering Detection Dataset. 2013. Available online: https://www.kaggle.com/datasets/divg07/
casia-20-image-tampering-detection-dataset/code (accessed on 3 May 2024).
42. Chen, T.; Yang, S.; Hu, S.; Fang, Z.; Fu, Y.; Wu, X.; Wang, X. Masked conditional diffusion model for enhancing deepfake detection.
arXiv 2024, arXiv:2402.00541. [CrossRef]
43. Arshed, M.A.; Mumtaz, S.; Ibrahim, M.; Dewi, C.; Tanveer, M.; Ahmed, S. Multiclass AI-Generated Deepfake Face Detection
Using Patch-Wise Deep Learning Model. Computers 2024, 13, 31. [CrossRef]
44. Thispersondoesnotexist.com. Thispersondoesnotexist. 2019. Available online: https://thispersondoesnotexist.com (accessed on
15 March 2024).
45. Kaggle. Synthetic Faces High Quality (SFHQ) Part 4. 2022. Available online: https://www.kaggle.com/datasets/selfishgene/
synthetic-faces-high-quality-sfhq-part-4 (accessed on 15 March 2024).
Computers 2025, 14, 60 21 of 21
46. Wang, T.; Huang, M.; Cheng, H.; Ma, B.; Wang, Y. Robust Identity Perceptual Watermark Against Deepfake Face Swapping. arXiv
2023, arXiv:2311.01357. [CrossRef]
47. Github. [CVPR 2021 Oral] ForgeryNet: A Versatile Benchmark for Comprehensive Forgery Analysis. 2021. Available online:
https://github.com/yinanhe/ForgeryNet (accessed on 24 February 2024).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.