
Applied Soft Computing 105 (2021) 107256


Detecting handcrafted facial image manipulations and GAN-generated facial images using Shallow-FakeFaceNet

Sangyup Lee a, Shahroz Tariq a, Youjin Shin b, Simon S. Woo c,∗

a Department of Computer Science & Engineering, Sungkyunkwan University, Suwon, South Korea
b Department of Computer Science & Engineering, State University of New York, Incheon, South Korea
c Department of Applied Data Science, College of Computing and Informatics, Sungkyunkwan University, Suwon, South Korea

Article history: Received 30 July 2020; Received in revised form 15 February 2021; Accepted 24 February 2021; Available online 9 March 2021.

Keywords: Soft computing; Fake media content detection; Fake image dataset; Multimedia forensics; GAN-generated image detection; Deepfake detection

Abstract

The rapid progress of sophisticated image editing tools has made it easier to manipulate original face images and create fake media content by putting one person's face onto another. In addition to image editing tools, creating natural-looking fake human faces can be easily achieved with Generative Adversarial Networks (GANs). However, malicious use of these new media generation technologies can lead to severe problems, such as fake pornography, defamation, or fraud. In this paper, we introduce a novel Handcrafted Facial Manipulation (HFM) image dataset and soft computing neural network models (Shallow-FakeFaceNets) with an efficient facial manipulation detection pipeline. Our neural network classifier model, Shallow-FakeFaceNet (SFFN), shows the ability to focus on the manipulated facial landmarks to detect fake images. The detection pipeline relies only on RGB information to detect fake facial images, not leveraging any metadata, which can be easily manipulated. Our results show that our method achieves the best performance of 72.52% in Area Under the Receiver Operating Characteristic (AUROC) on detecting handcrafted fake facial images, gaining 3.99% F1-score and 2.91% AUROC, and 93.99% AUROC on detecting small GAN-generated fake images, gaining 1.98% F1-score and 10.44% AUROC, compared to the best performing state-of-the-art classifier. This study is targeted at developing an automated defense mechanism to combat fake images used in different online services and applications, leveraging our state-of-the-art handcrafted fake facial dataset (HFM) and the neural network classifier Shallow-FakeFaceNet (SFFN). In addition, our work presents various experimental results that can help guide better applied soft computing research in the future to effectively combat and detect human- and GAN-generated fake face images.

© 2021 Elsevier B.V. All rights reserved.

1. Introduction

Digital image editing tools such as Adobe Photoshop [1] have significantly advanced, making it easier to manipulate complex images and turn them into other high-quality images. In addition, Adobe developed an automatic area selection tool that can improve editing quality [2] and restore parts of photographs automatically through a foreground-aware deep learning-based inpainting technique [3]. In particular, these editing tools can be used to manipulate and create sophisticated fake images that are difficult for ordinary people to recognize as forged. Moreover, step-by-step tutorials for quickly creating such fake images are widely available on YouTube, making it easy even for amateurs to learn these image editing skills. Therefore, these technologies can be used to manipulate multimedia contents, create fake news, distort the truth, defame, and impersonate someone. Additionally, such fake and distorted information can be quickly disseminated via social media [4].

In addition to the aforementioned image editing tools, deep learning has made significant breakthroughs in a wide range of areas, including computer vision, image processing, and speech recognition [5,6]. In particular, Generative Adversarial Networks (GANs) [7], in which a discriminator and a generator compete with each other, can be exploited to create entirely new images, videos, and voices that are highly realistic. GANs have been used most often to create realistic new images [8] and to enhance the quality of such images [9]. Similar to image editing tools, deep learning models, including these GANs, can also deceive people with synthetically generated images. A fake face created by GANs can fool not only humans but also machine learning classifiers [10,11]. This can become an enormous problem if such faces are maliciously exploited in user identification and authentication applications, besides fake information generation.

∗ Corresponding author.
E-mail addresses: [Link]@[Link] (S. Lee), shahroz@[Link] (S. Tariq), [Link].1@[Link] (Y. Shin), swoo@[Link] (S.S. Woo).


Another severe example from such image editing tools as well as GANs is the generation of pornographic material by replacing a specific victim's face with a naked actor's face in an adult video [4]. Video clips using various deepfakes methods synthesize the faces of a number of celebrities into pornography, which first appeared on Reddit in 2017. They can not only humiliate and intimidate the victims but also harm their reputation. Anyone can easily produce such fake videos using online tools. Moreover, various deepfakes technologies can target politicians [12], creating fake news, malicious hoaxes [13,14], etc. [15]. Consequently, malicious usage of such machine learning-enabled digital technologies is causing significant problems in the development of fake pornography, hate crimes, and various types of fraud.

To prevent and detect such malicious use of image forgery technologies, various detection techniques have been proposed. First, prior research by Huh et al. [16] detects fake images generated by image editing tools by analyzing metadata information and image compression techniques to determine the authenticity of images. However, it is relatively easy to change and erase metadata information to bypass these detection algorithms. On the other hand, Adobe has developed a deep learning-based model to verify whether certain parts of an image have been manipulated. Nevertheless, it has difficulty detecting images with composite facial landmarks, where some parts of the face are swapped. Furthermore, these techniques are not effective at detecting GAN-generated fake images [8], where entirely new images are created from scratch. Also, it is extremely challenging to develop a classifier that can effectively detect both fake images generated with photo-editing tools by humans and those machine-generated by GANs. To address this issue, we focus on the problem of detecting human-created as well as GAN-generated fake facial images with neural network-based detection models, using only the RGB information in an image and no metadata. In fact, many GAN-generated images (Progressive Growing of GANs (PGGAN) [8]) and real images (CelebA [17]) are available to train a classifier to detect GANs. Hence, the large amount of training data makes it possible to train a deep neural network that can detect such GAN-generated fake images. However, detecting handcrafted fake face images made by humans is particularly challenging because of the limited availability of such training data. While Kim et al. [18] released a manipulated human face dataset, it is relatively small for training a neural network classifier. Therefore, there is a significant need to develop more datasets for effectively detecting face images handcrafted with photo-editing tools. In this work, we collect and generate a novel Handcrafted Facial Manipulation (HFM) dataset using the Adobe Photoshop editing tool, composed of 1527 fake face images. In addition, we propose Shallow-FakeFaceNet (SFFN), a neural network-based detection model, with an efficient facial manipulation detection pipeline to distinguish fake face images created by humans and GANs. Our trained classifier takes an image as input and outputs the probability of that image being fake or modified. Our contributions are summarized as follows:

1. We introduce a new Handcrafted Facial Manipulation (HFM) [19] dataset and publicly release it for further research in the applied soft computing area to assist in detecting various facial forgeries.
2. We propose a novel neural network-based classifier, Shallow-FakeFaceNet (SFFN), with an effective end-to-end fake face detection pipeline that can detect fake face images generated by humans as well as GANs. Our soft computing method can combat facial forgeries in online services and be applied as a backbone detection network in Social Network Services.
3. We perform extensive experiments with different preprocessing methods, compare our approach with other state-of-the-art forgery detectors in various scenarios, and show the best performance.

This paper is organized as follows. We discuss related work on fake image detection in Section 2. A new handcrafted dataset and its generation method are introduced in Section 3. We explain our approaches to detecting facial manipulations in Section 4 and describe our evaluation results in Section 5. Section 6 provides the discussion and limitations of our work. Finally, Section 7 offers our conclusions.

2. Related work

Several researchers have previously proposed various digital image forensic algorithms and methods [20–26]. Many previous methods used the characteristics of image formats and metadata information to assess image authenticity. While various forensic methods and algorithms have been developed and advanced to identify forged and false components in images, detection continues to be a problematic issue, as the malicious use of new imaging and machine-learning techniques also advances. Since new ways of making counterfeit images keep emerging in the image forgery domain, they remain difficult to detect.

Another common case of fake multimedia is fake pornography of celebrities or revenge porn, which was launched on Reddit in 2017 and is commonly known as Deepfake. It has become evident that deepfakes methods can produce fake porn of famous actresses. Deepfakes may also be used to build false news and destructive fabrications in politics, such as the deepfake of Barack Obama [15]. In addition, a deepfake video showing Belgium's prime minister speaking about the Covid-19 situation was posted [27]. Therefore, technology to identify and prevent the spread of such fake images is crucial. In this paper, we develop an algorithm to detect whether face images were created by a Photoshop tool or by machines (GANs). Currently, traditional digital media forensic tools fail to address this issue. Hence, our work focuses on distinguishing handcrafted and GAN-generated fake face images from real people's face images, and on developing fundamental enabling classifier technology that can detect them with high fidelity. In the next sections, we introduce several notable studies directly relevant to fake face image generation and detection.

2.1. GAN-generated images

Generative Adversarial Nets (GANs) were first introduced by Goodfellow et al. [7]; GANs can effectively generate synthetic but highly realistic images. Some of the most prominent research on GANs is by Zhu et al. [28] and Van den Oord et al. [29–31]. Currently, state-of-the-art approaches typically employ autoregressive models [29] and variational autoencoders (VAE) [32]. Karras et al. [8] recently showed that GANs produce new faces by progressively growing both the generator and the discriminator. This method produces incredibly realistic face images, making it difficult for humans to determine whether they depict real people. While this technology has positive applications, GANs can create fake human faces and be maliciously used to harm people. Furthermore, these realistic fake faces can trick facial recognition algorithms, and attackers can create several such fake photos to deceive people and potentially trigger social problems. They can create false social identities, for instance, with the image of a nonexistent person.

2.2. Fake face datasets

Recently, there has been active interest in the research community in detecting a particular type of face forgery, commonly known as Deepfakes. Rossler et al. [33,34] developed the FaceForensics and FaceForensics++ datasets from 1000 YouTube videos. Similarly, Li et al. [35] developed the CelebDF(v2) dataset using videos of 58 different celebrities from YouTube. The purpose of developing these datasets is to help researchers with the detection of deepfakes. Methods such as Face2Face [36], FaceSwap [37], and DeepFake [38] are the most popular methods for producing these deepfakes. However, deepfakes are not the only way to create a fake image. Recently, it has become effortless to use photo-editing tools such as Adobe Photoshop [1] to forge images of exceptional quality. Therefore, it is as important, if not more important, to detect these manipulated images. For this reason, in this paper, we introduce a new dataset that contains photoshopped face images with three different levels of editing quality, to detect both human-manipulated images created using photo-editing tools such as Adobe Photoshop and machine-generated ones.

2.3. Deep learning-based classifiers

Image classification with neural networks is useful for digital forensics in big data workflows such as healthcare, security, and anomaly detection [39]. Besides, neural networks can be used for image forgery detection. For example, image classification and recognition models based on Convolutional Neural Networks (CNNs) can be trained to differentiate between fake and real images. VGG16 and VGG19 [40] enhanced the identification of large-scale images by increasing the depth (23/26 layers) of their CNN model. However, these deeper neural network models require more data. ResNet [41] presents a framework for residual learning, which makes network training much easier. Its layers are reformulated as learning residual functions with reference to the layer inputs via shortcut connections, rather than learning unreferenced functions. This can help with model optimization and improve accuracy for deeper networks. DenseNet [42] is a network built to feed each layer into every other layer, where each layer receives the features of all previous layers as input. To achieve state-of-the-art efficiency, DenseNet requires significantly fewer parameters and less computation. While such image classification models require substantial architectural engineering, NASNet [43] searches for an architectural building block on a small dataset and transfers it to a larger dataset. It finds the best convolutional layer on a given dataset and uses it to build a convolutional architecture, making the design transferable to new search spaces. In addition, Xception [44] shows great performance compared to previous models on the ImageNet dataset by building a deep learning classifier with depth-wise separable convolutions. In this paper, we experimented with all of these deep learning-based classifiers.

2.4. Forgery detection methods

Analyzing images in the frequency domain is one of the most common approaches to exploit the compression history that is recorded when pictures are compressed with JPEG. As many researchers have shown, studying this compression history in the frequency domain can reveal different properties in the forged regions [20,45,46]. However, frequency-domain analysis-based methods do not work well on images with delicate and polished edges. The Discrete Wavelet Transform (DWT) is another popular method for checking image forgery in the frequency domain [21,22], where the 2D-DWT decomposes a grayscale image into four wavelet coefficient domains. Each domain represents each pixel's correlation in a different direction: diagonal, horizontal, and vertical. These correlations can provide evidence of whether an image has been manipulated. However, these methods do not work well against images with sophisticated and smoothed edges, which we consider in this work.

JPEG Ghost [23] was suggested to fix certain shortcomings of frequency-domain detection methods. Usually, when an image is altered, the fabricated component is copied from an image of different JPEG quality. JPEG Ghost re-saves the same image at various JPEG qualities and extracts the differences. The uniform pixel distance between the original image and a re-saved image of different JPEG quality can indicate regions of differing image quality. Nevertheless, this approach is not practical if the copied areas retain the same degree of image quality as the original image, which means the area has the same quality as its neighboring regions.

Furthermore, the ELA [24], another technique that uses the error rate in JPEG files, has been proposed. Modified regions typically have error levels that differ from unmodified regions, which can be detected by re-saving images at various error rates. This method can work well when parts of an image are modified, but it fails when forged regions are carefully smoothed (brushed) into the original image using Photoshop. Moreover, ELA cannot recognize differing error levels in images produced by GANs at all, because no such artifacts exist; the approach is therefore not helpful there. Advanced tools such as FotoForensics [47] and the MMC Image Forensic Tool [48] use a variety of information, including metadata, ELA, and JPEG consistency. Sophisticated attackers can, however, quickly defeat these methods by removing or changing metadata.

Huh et al. [16] proposed a method to detect fake imagery using self-consistency, determining whether a single imaging pipeline produced the content of the image. EXIF metadata plays an essential role during the training phase of this method. We tested our dataset with this self-consistency method and showed that, in the absence of metadata information, high-quality photoshopped images could not be detected. In contrast, our method performs much better.

Tariq et al. [10,11] proposed neural network-based methods to detect GAN-generated images with high accuracy. For image forgery detection, Zhou et al. [25] and Cozzolino et al. [26] proposed including more features in their deep learning architectures. In particular, noise features extracted with a rich-model steganalysis filter are used as input to a CNN-based network to examine inconsistencies in the noise. Schlegl et al. [49] used only normal images to train the GAN-based model f-AnoGAN, separating the reconstruction deviation between normal and abnormal images. Furthermore, numerous studies have been introduced that leverage existing CNN-based models by modifying them and adding modules for fake image detection [50–55]. However, such methods need a significant number of training samples and cannot work well for sophisticatedly created images, as the apparent noise characteristics become challenging to separate.

3. Handcrafted Facial Manipulation (HFM) dataset

One of the core contributions of our work is to produce a manually crafted, high-quality facial manipulation dataset. The dataset contains 1527 forged images developed with multiple levels of editing complexity, and 621 original images that are used to make the fake faces. Several skilled personnel developed this dataset using the Adobe Photoshop tool [1].
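For illustration, the single-level 2D-DWT decomposition described in Section 2.4 can be sketched with a Haar wavelet; the Haar filter is our own assumption here, chosen for brevity, since the cited works [21,22] do not fix a specific wavelet:

```python
import numpy as np

def haar_dwt2(img):
    """One level of a 2D Haar DWT: split a grayscale image (with even
    height and width) into an approximation band (LL) and three detail
    bands (LH, HL, HH), capturing horizontal, vertical, and diagonal
    pixel correlations respectively."""
    a = np.asarray(img, dtype=float)
    # Row pass: average and difference of adjacent column pairs.
    lo = (a[:, 0::2] + a[:, 1::2]) / 2.0
    hi = (a[:, 0::2] - a[:, 1::2]) / 2.0
    # Column pass on the row-filtered outputs.
    ll = (lo[0::2, :] + lo[1::2, :]) / 2.0   # approximation
    lh = (lo[0::2, :] - lo[1::2, :]) / 2.0   # horizontal detail
    hl = (hi[0::2, :] + hi[1::2, :]) / 2.0   # vertical detail
    hh = (hi[0::2, :] - hi[1::2, :]) / 2.0   # diagonal detail
    return ll, lh, hl, hh
```

A forensic method of this family would then look for inconsistent statistics in the three detail bands around suspected splicing boundaries.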

Fig. 1. The range of fake parts, where the left image is the original and the right is fake: (a) a part of the face, (b) two or more parts of the face, (c) half of the face, (d) the whole face, (e) additional parts (e.g., sunglasses and mustache), and (f) multiple faces in an image.

3.1. HFM dataset creation

We believe that attackers can manually create very sophisticated fake photos using photo editing software. However, not many handcrafted fake face image datasets are available. Therefore, we recruited several artists at our university who are skilled with photo editing tools to create a fake face image dataset using Adobe Photoshop CS6; some samples are presented in Fig. 1.

3.1.1. Source image specifications

We collected the 621 original source images using Google image search, including human faces with various characteristics. In particular, we set the image usage rights option to 'Labeled for reuse with modification' in order to be free from any copyright issues. In addition, to make a more diverse dataset, we consider face images of men and women of different age groups and races to prevent bias in the training set. We also include more challenging face samples with heavy make-up, beards, shades, glasses, caps, etc.

3.1.2. Forged image complexity

While several images consist of only one face, as shown in Fig. 1a, we also include images that contain multiple people. In this case, either one or multiple faces are forged, as shown in Fig. 1f. Furthermore, we divided the forged image quality into three difficulty levels based on editing complexity; some samples are presented in Fig. 2. The reason for dividing the levels is that manually created fake images posted on SNSs such as Twitter, Instagram, and Facebook often differ in editing complexity. Some forged images are very easy to spot, while in others the modified parts are very difficult to find. Therefore, to simulate the differing levels of complexity of manually created fake images, we use three difficulty levels to create fake images, as follows:

• Lv.1-image: We crop a specific part of the source image and merely paste it on the same part of the target image without further refinement.
• Lv.2-image: From the Lv.1-image, we smooth the edges of the pasted area to improve the image quality.
• Lv.3-image: From the Lv.2-image, we further adjust the color and light levels to make the image more realistic.

As shown in Fig. 2, the Lv.1-image is relatively easy for ordinary people to recognize, as there are rough edges around the pasted region; however, the Lv.3-image is the hardest to recognize, as all the edges are smooth and the light is adjusted.

3.1.3. Types of modifications

In addition, we performed six different types of modifications to create fake images based on the face regions, as shown in Fig. 1 and described as follows:

• Modification (a): One part of the facial landmarks is modified or swapped with a target image (e.g., eye, nose, or mouth).
• Modification (b): Two or more parts of the facial landmarks are modified or swapped with multiple target images (e.g., eyes, nose, and mouth).
• Modification (c): Half of the face is swapped with a target image.
• Modification (d): The whole face is swapped with a target image.

• Modification (e): Facial accessories or facial hairstyles are attached to the source image (e.g., sunglasses or a mustache).
• Modification (f): Multiple faces in the image are forged using the above five methods.

Fig. 2. Three quality levels of created images: Lv.1 is cropped and pasted; Lv.2 is cropped, pasted, and edge-smoothed; and Lv.3 is cropped, pasted, edge-smoothed, and color- and light-adjusted.

Fig. 3. Frequency shift on two datasets (PGGAN and the Handcrafted Facial Manipulation (HFM) dataset), where the X-axis indicates the Spatial Frequency and the Y-axis shows the 1D Power Spectrum. The red and blue lines indicate the mean value of the shaded regions of the same color. Compared to the 1D power spectrum statistics on the PGGAN dataset (a), our HFM dataset (b) is much more challenging to separate, even in the high-frequency domain.

3.2. Validating the quality and difficulty of classifying the HFM dataset

To evaluate the quality of a generated fake dataset, Salimans et al. [56] used a crowd-sourcing platform to evaluate a large number of GAN-generated images. However, they also tried to remove the subjective human evaluation of images by introducing the Inception Score (IS). The Inception Score is an objective metric for evaluating the quality of synthetically generated images by applying the Inception model [57] to each generated image to obtain the conditional label distribution. We utilize this method to evaluate the quality of our HFM dataset, where IS produces a score that indicates the number of classes expected. For example, if the IS of a two-class dataset is close to 2.0, the dataset is easy to classify. On the other hand, the dataset is difficult to classify if the IS is close to 1.0. Our HFM dataset has an average IS of 1.0045886 and a standard deviation of 0.0007829073. Our HFM's IS confirms and validates that our generated fake images are, on average, difficult to classify according to the objective Inception Score metric.
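As a sketch: assuming the per-image class probabilities (the softmax outputs of the Inception model [57]) have already been collected into an (N, C) matrix, the score is the exponential of the average KL divergence between each conditional distribution p(y|x) and the marginal p(y). The function name and the eps smoothing below are our own illustrative choices:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """probs: (N, C) array of per-image class probabilities.
    Returns exp of the mean KL(p(y|x) || p(y)), where p(y) is the
    marginal distribution averaged over all N images."""
    probs = np.asarray(probs, dtype=float)
    marginal = probs.mean(axis=0)  # p(y)
    # Per-image KL divergence between conditional and marginal distributions.
    kl = (probs * (np.log(probs + eps) - np.log(marginal + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))
```

On a two-class dataset, confidently separated predictions give a score near 2.0, while predictions indistinguishable from the marginal give a score near 1.0, which is why our measured IS of about 1.005 indicates a hard-to-classify dataset.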

Fig. 4. Our end-to-end pipeline to detect facial manipulated images.


Algorithm 1: MTCNN Noise Filtering

Input: Input image x
Output: Filtered & cropped faces D_faces

1: M ← MTCNN(x)  /* M = {m_1, m_2, m_3, ..., m_n} is the set of faces detected by the MTCNN face detector */
2: r_i ← m_i^w + m_i^h  /* for i ∈ {1, 2, 3, ..., n}, where each m_i has width m_i^w and height m_i^h of the cropped face */
3: R ← {r_1, r_2, r_3, ..., r_n}
4: r_max ← max(R)
5: for i ∈ {1, 2, 3, ..., n} do
6:   if r_i ≥ r_max/τ then
7:     Append m_i to D_faces
8:   else
9:     Discard m_i as noise
10:  end
11: end

Fig. 5. An example of MTCNN face detection and noise filtering, where the faces in red rectangles should be ignored by the filtering algorithm, and the faces in green are used as input to the classifier.
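Algorithm 1 can be sketched in Python as follows; here each detected face is represented only by its (width, height) pair, whereas the real pipeline would operate on the bounding boxes returned by the MTCNN detector:

```python
def mtcnn_noise_filter(detections, tau=1.732):
    """Keep only detections whose size r_i = w_i + h_i is at least
    r_max / tau, where r_max is the size of the largest detected face;
    smaller detections are discarded as noise (Algorithm 1).
    `detections` is a list of (width, height) tuples."""
    if not detections:
        return []
    sizes = [w + h for (w, h) in detections]   # r_i for each face m_i
    r_max = max(sizes)                         # size of the largest face
    return [d for d, r in zip(detections, sizes) if r >= r_max / tau]
```

With τ = 1.732, a 50 × 60 crop next to a 100 × 120 face is discarded (110 < 220 / 1.732 ≈ 127), matching the behavior illustrated in Fig. 5.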

Another way to check the generated image quality and diffi- In addition to occluded faces, we also need to consider small
culty of a dataset is to transform an image from a time domain to artifacts (random objects that are not faces) detected by the
a frequency domain. Durall et al. [58] introduced the frequency MTCNN algorithm as false positives. To mitigate false positives in
transformation technique to check the difference between real an image with multiple faces, we further perform a noise filtering
and fake images across all spatial frequencies. We show and strategy. The algorithm for MTCNN Noise Filtering is provided in
compare the frequency shift of our HFM dataset and Progressive Algorithm 1.
Growing of GANs (PGGAN) [8] in Fig. 3 for illustrative purposes. We first take the input image x and extract the faces as shown
Fig. 3(a) shows the 1D Power Spectrum statistics on the PGGAN in Algorithm 1, where M is a set of faces with mi being the ith
dataset, where we can recognize the gap across all spatial fre- extracted face from the input image x using MTCNN algorithm,
quencies. However, the gap between the fake and real images is mw h
i and mi indicates the width and height of the extracted face
quite small with our HFM dataset, as shown in Fig. 3(b). This small mi respectively, and ri is calculated by adding the width and the
gap clearly demonstrates that our HFM dataset is quite difficult to height of the face. Then, we use the maximum value from R,
classify. This means that handcrafted face images in our dataset which is a set of ri to distinguish noises and detected faces. We
are not trivially detectable, and it is a challenging task. discard the faces or objects that are smaller than a certain thresh-
old (τ ) compared to the largest face in the image. In our work,
4. Fake face image detection framework we set τ as 1.732 which is empirically calculated by counting
the false positives from several experiments with training data.
In this section, we describe our face manipulation detection However, if there are cases where false positives are large enough
pipeline, which is used to distinguish real and manipulated fake to pass the filtering step, we determine them as noises in our
facial images. Our detection pipeline includes facial image pre- training dataset.
processing as well as the details of our proposed detection model
Shallow-FakeFaceNet (SFFN). In our model, we do not use metadata 4.2. Image upscaling methods
as it can be forged as well, and we distinguish facial manipulated
images with only RGB channel information from the image. After cropping the face region and filtering the raw input im-
We illustrate our fake face image detection end-to-end pipeline age’s noise, around 8% of the total images from our HFM dataset
in Fig. 4. We divide our detection pipeline into two stages, where the first stage performs facial image preprocessing to (1) crop the face region, (2) filter the cropped faces, (3) apply upscaling methods, and (4) augment the data. The second stage passes the preprocessed faces to the classifier models to train and detect facial manipulations, using our detection model architecture SFFN.

4.1. Cropping face regions

In order to focus on the facial part of the entire image, we utilized the MTCNN [59,60] face detector to crop the face in an image automatically. However, after applying MTCNN to an image, it can detect incorrect areas such as hands, objects, etc., producing false positives. Besides, in the presence of multiple faces, MTCNN can also detect occluded and relatively small faces, as illustrated by the red boxes in Fig. 5. We assume that occluded or very small faces in an image are difficult to forge; therefore, we consider these as noise or false positives from our cropping algorithm.

were smaller than 128 × 128 pixels. For those small images, it is very challenging to distinguish real vs. fake due to the small number of pixels. Therefore, it is important to increase the size of the low-resolution image carefully. For resizing, we compare the following two different upscaling methods: (1) the Nearest Neighbor Upscaling (NNU) method, and (2) the Facial Super-Resolution (FSR) [61] upscaling technique.

NNU replaces every pixel with the nearest pixel while upscaling an image, resulting in multiple pixels with the same color. Unlike the NNU technique, which copies the adjacent pixel values, FSR is an upscaling method that specifically targets the reconstruction of face images. FSR progressively trains a GAN-based neural network model with a Facial Attention Loss to focus on facial landmarks while upscaling. The FSR generator upscales the input image in three steps, with each step doubling the resolution. For example, a 16 × 16 input image is upscaled to 32 × 32 in the first step of the generator, then 32 × 32 to 64 × 64, and finally 64 × 64 to 128 × 128. This network can train on and generate an image that is upscaled to up to 8 times the resolution of the input image.
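The filtering described in Section 4.1 — discarding occluded or too-small detections returned by the face detector — can be sketched as a simple post-filter on detector output. The size and confidence thresholds below are illustrative assumptions, not the paper's exact settings:

```python
# Sketch of the face-filtering step: keep only detections that are large
# and confident enough to plausibly contain a forgery. Threshold values
# are illustrative assumptions, not the paper's exact settings.

def filter_faces(detections, min_side=40, min_confidence=0.95):
    """detections: list of dicts shaped like MTCNN output:
    {'box': [x, y, width, height], 'confidence': float}"""
    kept = []
    for det in detections:
        _, _, w, h = det["box"]
        if min(w, h) >= min_side and det["confidence"] >= min_confidence:
            kept.append(det)
    return kept

detections = [
    {"box": [10, 10, 120, 130], "confidence": 0.99},  # clear frontal face
    {"box": [300, 40, 18, 20], "confidence": 0.97},   # tiny background face
    {"box": [200, 90, 80, 85], "confidence": 0.60},   # low-confidence region (e.g. a hand)
]
kept = filter_faces(detections)
```

In the full pipeline, the boxes that survive this filter would then be cropped and, if smaller than the model input size, upscaled.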
S. Lee, S. Tariq, Y. Shin et al. Applied Soft Computing 105 (2021) 107256

Fig. 6. A side-by-side comparison between the Nearest Neighbor Upscaling (NNU) and Facial Super-Resolution (FSR) methods. The original image has a size of 88 × 88. The top-row images show the faces upscaled to 128 × 128 and 256 × 256 using FSR, while the bottom-row images show the faces upscaled using NNU.

The use of both the NNU and FSR methods is illustrated in Fig. 6, using one of the HFM images. In this example, we upscale the original image from 88 × 88 to 128 × 128 and 256 × 256 using FSR and NNU. The top-row images illustrate the result from FSR, while the bottom-row images show the result from NNU. One of the most significant disadvantages of NNU is the pixelation effect, which introduces square blocks in an image caused by cloning the surrounding pixels while upscaling. We can clearly observe this pixelation effect with NNU, even at the largest image size, as shown in the red box in Fig. 6. However, FSR reduces this phenomenon significantly, resulting in better quality in the upscaled image. Therefore, in this work, we use FSR for upscaling the image size and compare the results with NNU in Section 5.1.5.

4.3. Image dataset augmentation

Generating handcrafted fake images is a time-consuming and meticulous task. We created and utilized a total of 2148 handcrafted images (1527 fake and 621 real). We also included the openly available Real and Fake Face dataset (RFF) [18] (960 fake and 1081 real) in our training set, bringing the total to 4189 images. However, this number is relatively small for training deep learning models. To handle the data paucity issue in detection tasks, Hussain et al. [62] and Eaton-Rosen et al. [63] suggested the use of data augmentation methods in medical imaging domains, which can increase the training dataset size. To address this issue, we also employ image data augmentation, a technique that creates transformed versions of images. Data augmentation can increase the accuracy of classification tasks and further improve the generalizability of the deep learning model by introducing different variations of the images [64]. In particular, we tried the following two data augmentation methods:

• Keras image preprocessing real-time data augmentation pipeline: It performs image shifting, shear, zoom, and flips.
• ImgAug [65]: It converts a collection of input images to a new, much larger set of slightly modified images.

An example of one image under 6 different Keras augmentation techniques is shown in Fig. 7(a), and 64 ImgAug augmentation techniques (geometric transformations, color and contrast adjustments, arithmetic changes, artistic mode, convolutional options, blending, blurs, etc.) are shown in Fig. 7(b). Furthermore, we conduct an ablation study in Section 5.1.6 to identify the minimum number of training data required to achieve the maximum detection efficiency using different data augmentation techniques.

4.4. Shallow-FakeFaceNet architecture

To detect facial manipulations, we first design various CNN-based classifiers. Based on our initial observations and experiments, surprisingly, neural networks with great depth, such as Xception [44] and DenseNet [42], performed very poorly on small image sizes (64 × 64 and 128 × 128). Deeper networks such as Xception and DenseNet can learn richer hierarchical representations of the training dataset, and these deeper networks perform well on complex or high-resolution images as the depth of the network increases. However, as the number of layers in a CNN increases, the number of parameters also increases rapidly. We hypothesized that if the input image resolution is too small, a deep network is inefficient and does not have enough information to learn from the low-resolution images. On the other hand, deep architectures require a large amount of data to achieve high performance. For example, Xception was built to classify 350 million high-resolution images with 17,000 classes. Therefore, we developed our own model with a shallow convolutional neural network (CNN) architecture, which we refer to as Shallow-FakeFaceNet (SFFN). The details of the architecture are presented in Fig. 8.

For each layer in SFFN, we utilize an L2 kernel regularizer with a value of 0.0001. SFFN can detect the subtle difference

Table 1
Handcrafted facial manipulation image detection performance with different CNN-based models on two different image resolutions (128 × 128 and 256 × 256), where bold text indicates the best performer within the same input image size and underlined text indicates the metric improved by using the 256 × 256 input compared to the smaller input (128 × 128).

                     128 × 128                                           256 × 256
Model                Precision (%)  Recall (%)  F1-score (%)  AUC (%)    Precision (%)  Recall (%)  F1-score (%)  AUC (%)
Xception             61.51          61.40       61.31         63.30      63.97          63.90       63.86         68.83
Xception (IN)        66.22          66.20       66.19         69.61      66.27          66.00       65.86         66.74
Inception ResnetV2   63.41          63.40       63.39         66.28      62.80          62.80       62.80         66.04
ResNext              64.68          64.20       63.91         68.23      63.81          63.70       63.63         68.88
MesoNet              25.00          50.00       33.33         50.00      25.00          50.00       33.33         50.00
Adobe                50.99          50.70       46.79         47.47      50.99          50.70       46.79         47.17
SFFNV3 (Ours)        64.71          62.80       61.55         69.47      70.26          70.20       70.18         72.52

Fig. 8. Visual representation of different Shallow-FakeFaceNet architectures, where the ReLU, Batch Normalization and Dropout layers are denoted as 'R', 'BN' and 'D', respectively.

Fig. 7. Example of two different augmentation methods.

between real and fake facial images and provide very high fake detection accuracy, even on tiny images such as 128 × 128, which are difficult to differentiate even by human eyes. We developed three different versions of SFFN with different network settings. Shallow-FakeFaceNetV1 (SFFNV1) is composed of 3 convolutional layers (kernel sizes of 3 × 3, 3 × 3, and 1 × 1), followed by max pooling (3 × 3 with a stride of 2), with this block repeated 6 times. The last layer is composed of two dense layers with sizes of 3933 and 2, as shown in Fig. 8(a). To obtain a strong regularization effect, we add dropouts after the convolutional layers. These dropouts can also prevent overfitting when training with fewer samples, as mentioned by Park et al. [66]. In our SFFNs, all of the dropout rates are set to 0.25. The objective behind constructing the network in this way is to keep the architecture resource-efficient with minimum complexity. Our initial findings show that SFFNV1 in Fig. 8(a) had lower detection performance on fake face images with small sizes. As a deeper neural network cannot learn significantly from small images, we developed, as shown in Fig. 8(b), a shallower architecture than SFFNV1, Shallow-FakeFaceNetV2 (SFFNV2), with just eight convolutional layers. All convolutional layers have a kernel size of 3 × 3, except the last one, which has 1 × 1. We also decreased the sizes of the last dense layers to 1024 and 2. This change in architecture showed a performance boost for small-sized images and maintained the same detection performance

with SFFNV1 for larger images in our preliminary experiments. To further reduce the computational cost and training time of SFFNV2, we developed Shallow-FakeFaceNetV3 (SFFNV3) by introducing max pooling layers back into the SFFNV2 architecture, as shown in Fig. 8(c). We adjusted the kernel size of each convolutional layer (Conv2D) in SFFNV3 (1st Conv2D kernel size: 5 × 5; 2nd, 3rd, 5th, 6th and 8th Conv2D kernel size: 1 × 1; 4th and 7th Conv2D kernel size: 3 × 3).

The detection performance of the SFFNs and a comparison with other state-of-the-art CNN-based classifiers are described in Section 5. We provide the preprocessed facial images separately to each model for training the different classifiers, and each classifier is assigned the same task of calculating the probability of an image being fake. In addition, we utilized the photoshopped face detector model from Adobe [67] and compare its results with the other CNN-based models.

5. Experiments and results

We have performed extensive experiments to evaluate and compare the performance of our detection method in two different cases (handcrafted and GAN-generated fake face detection).

5.1. Handcrafted facial manipulation (HFM) image detection

We evaluate the models' performance in distinguishing fake handcrafted face images from real photos with the following experiments: (1) using different CNN-based classifiers, (2) training with various input image sizes, (3) comparing different augmentation methods, and (4) evaluating the effectiveness of upscaling techniques. Our final handcrafted fake image detection end-to-end pipeline, including Shallow-FakeFaceNet (SFFN), is described in Fig. 4. We first take the original images and crop the faces with the MTCNN [59,60] face detector. After cropping, we filter out the noise, such as occluded faces and artifacts that are not faces but are detected by the MTCNN algorithm. The faces that are smaller than the model input size, as shown in the middle of Fig. 4, are upscaled using the FSR [61] method. For training, we augment the data to generalize the model and train with maximum efficiency for a given number of images. Finally, we use the test set to evaluate our trained model.

5.1.1. Dataset descriptions

To detect handcrafted facial manipulations in images, we utilized the following three datasets:

• Handcrafted Facial Manipulation (HFM) Dataset: This dataset contains 1527 face-manipulated images made by humans using the Adobe Photoshop [1] tool. We collected 621 source images from Google image searches that are used to create the HFM dataset. The manipulated images are generated at three different levels of complexity, with six different modifications, as shown in Figs. 1 and 2.
• Real and Fake Face (RFF) [18] Dataset: These additional RFF fake images are available on Kaggle [18], comprising 960 expert-generated photoshopped face images and the 1081 real images used to generate the fakes. The images are composed of different faces, separated by eyes, nose, mouth, or the whole face. This dataset only contains images with a single face, while our HFM dataset also includes images with multiple faces and more diverse forged parts (facial hair and accessories). These added features further complicate the HFM dataset and make it more challenging for the model to detect fakes.
• CelebFaces Attributes (CelebA) Dataset: For real training face images, we utilized the CelebA dataset, which contains original celebrity images [17], and chose 1177 celebrity images.

We combine the HFM and RFF datasets to construct 2487 (HFM: 1527 and RFF: 960) handcrafted facially manipulated images and 1702 (HFM: 621, RFF: 1081) original input source images as real images. After cropping and filtering a total of 4189 images (2487 fake and 1702 real) in the preprocessing step, the number of cropped faces increased to 4645 (2911 fake and 1734 real) due to the photos with multiple faces. To balance the real and fake image classes (2911 each), we added 1177 real images from the CelebA dataset, making a total of 5822 preprocessed images.

5.1.2. Experimental setup and baseline models

We evaluated the following CNN-based models, implemented with the Keras Python deep learning library [68], for detecting handcrafted fake face images: Xception, Xception with ImageNet pretraining (Xception (IN)), Inception ResNet, ResNext, VGG19, MesoNet, NASNet, and Shallow-FakeFaceNetV3 (SFFNV3). Additionally, we compared the results of the CNN-based methods with the photoshopped face detector model from Adobe [67].

As Xception achieved the highest detection rate on a handcrafted fake face dataset in prior research [11], we additionally retrained Xception with ImageNet pretrained weights. To evaluate the Adobe detection model, we utilized the trained weights that the authors [67] provided and tested them with our dataset. To compare the performance among different models, we first fixed the input image size to 128 × 128 pixels. Images smaller than 128 × 128 are upscaled using the Facial Super-Resolution (FSR) method [61], and we applied Keras' real-time augmentation to augment the training dataset. The models are trained with the ADAM optimizer, minimizing the binary cross-entropy loss function L_BCE in Eq. (1):

L_BCE = -(1/N) Σ_{i=1}^{N} [ y_i · log(p(y_i)) + (1 - y_i) · log(1 - p(y_i)) ],    (1)

where y is the label (0 for real and 1 for fake) and p(y) is the predicted probability of the image being fake, for all N images. We trained the models for up to 200 epochs with a batch size of 32, and the learning rate was set to 0.00005.

5.1.3. Performance on Handcrafted Facial Manipulation (HFM) dataset

For the evaluation, we tested 1000 preprocessed test images (500 fake and 500 real images) and measured Precision, Recall, F1-score, and AUROC [69]. In particular, AUROC measures the practicality of a model, where a larger area indicates a more practical model. The detection results are provided in Table 1, where the performance metrics are Precision (Pre.), Recall (Re.), F1-score (F1), and AUROC (AUC), and bold text indicates the best-performing model within the same input image size. Both the MesoNet and Adobe methods showed weak performance with 50.00% and 47.47% AUROC, respectively, similar to or worse than a random guess. On the other hand, Xception with ImageNet pretrained weights outperformed the other competing models with 69.61% AUROC. The AUROC for Xception, Inception ResNet V2, and ResNext is 63.30%, 66.28%, and 68.23%, respectively. Overall, SFFNV3 is the best performer among all classification methods.

5.1.4. Effect of different input image sizes

Additionally, we hypothesize that different image sizes yield different detection performance, similar to the GAN case from prior research [10,11]. Therefore, to examine and analyze how the training image size affects the model's performance, we compare the results of the same models over different input image sizes. We train the classifier models to distinguish fake and real images with the following two different cropped input image sizes: 128 × 128 and 256 × 256. We present the

Table 2
Performance evaluation with the Facial Super-Resolution (FSR) method. Bold text indicates a performance increase compared to using the Nearest Neighbor Upscaling (NNU) method.

                 NNU                                                 FSR
Model            Precision (%)  Recall (%)  F1-score (%)  AUC (%)    Precision (%)  Recall (%)  F1-score (%)  AUC (%)
Xception (IN)    65.52          65.20       65.02         66.21      66.22          66.20       66.19         69.61
SFFNV3 (Ours)    56.77          56.10       54.99         57.98      70.26          70.20       70.18         72.52

Table 3
Detection model performance comparison with different augmentation frameworks, where bold text indicates improved results compared to the model with no augmentation.

                 Without augmentation (%)               Keras framework (%)                    ImgAug library (%)
Model            Precision  Recall  F1-score  AUC       Precision  Recall  F1-score  AUC       Precision  Recall  F1-score  AUC
Xception (IN)    63.90      63.20   62.73     58.85     66.22      66.20   66.19     69.61     60.49      60.30   60.12     57.36
SFFNV3 (Ours)    54.99      54.70   54.03     54.84     70.26      70.20   70.18     72.52     55.87      55.10   53.58     55.67

results in Table 1, where the underlined text shows improved performance compared to the same model tested on smaller images. Most models did not show a performance improvement. However, Xception and SFFNV3 were the only two models that improved on both the F1-score and AUROC metrics. Xception improved by 2.55% in F1-score and 5.53% in AUROC. For SFFNV3, the performance improved by 8.63% in F1-score and 3.05% in AUROC, as shown in Table 1. We also provide the confusion matrices for Table 1 in Fig. 9.
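AUROC, the headline metric in these comparisons, can be computed directly from the models' fake-probability scores. Below is a minimal rank-based (Mann–Whitney) sketch, assuming no tied scores for brevity; this is illustrative, not the evaluation code used in the paper:

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney U statistic.

    labels: 0 = real, 1 = fake; scores: predicted probability of being fake.
    Assumes no tied scores, for brevity.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    rank = {idx: r + 1 for r, idx in enumerate(order)}  # 1-based ranks by ascending score
    positives = [i for i, y in enumerate(labels) if y == 1]
    n_pos, n_neg = len(positives), len(labels) - len(positives)
    u = sum(rank[i] for i in positives) - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

The result is the probability that a randomly chosen fake image receives a higher score than a randomly chosen real one, which is why 50% corresponds to random guessing (as seen for MesoNet in Table 1).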

5.1.5. Effect of different super-resolution methods

After cropping and filtering out the noise from our HFM dataset, 8% of the cropped faces are smaller than 128 × 128 pixels. Images smaller than the input size of the model need to be upscaled to fit the model. To this end, we hypothesized that an upscaling technique optimized for faces could provide more sufficient detail of the input image to the model.

We compare the results of two different upscaling methods, (1) the Nearest Neighbor Upscaling (NNU) method and (2) Facial Super-Resolution (FSR), in Table 2. Upscaling a facial image through NNU is simple: it increases the resolution by copying the nearest pixels. However, this means multiple neighboring pixels will have the same color, resulting in a pixelated image, as shown in Fig. 6, so the resulting visual quality may also be degraded. On the contrary, the FSR method trains a model that learns face reconstruction through Generative Adversarial Networks (GANs), which preserves facial attributes during upscaling. We utilized the two best models from Table 1, Xception with pretrained ImageNet weights (128 × 128) and SFFNV3 (256 × 256), to compare and demonstrate the effectiveness of using FSR images. Xception's AUROC score increased slightly, by 3.4%, when training with FSR images. For SFFNV3, we observed a drastic increase to 72.52% AUROC, compared to the NNU method (57.98%), as shown in Table 2. We observed that if the input image has to be upscaled, utilizing the facially optimized upscaling technique (FSR) can provide more information to the model than the conventional image upscaling method (NNU).
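FSR itself is a trained GAN model and cannot be reproduced in a few lines, but the NNU behavior described above is easy to make concrete: a nearest-neighbor upscale only repeats existing pixels in factor × factor blocks and never synthesizes new values. A pure-Python sketch on a toy grayscale grid:

```python
def nn_upscale(img, factor):
    """Nearest-neighbor upscale of a 2-D grid: every output pixel copies
    its nearest source pixel, creating factor x factor blocks."""
    h, w = len(img), len(img[0])
    return [[img[y // factor][x // factor] for x in range(w * factor)]
            for y in range(h * factor)]

tiny = [[0, 255],
        [255, 0]]
big = nn_upscale(tiny, 2)

# Every value in the output already existed in the input: no new detail is added,
# which is the source of the blocky "pixelation" visible in Fig. 6.
flat_in = {v for row in tiny for v in row}
flat_out = {v for row in big for v in row}
```

A learned super-resolution model, by contrast, predicts genuinely new pixel values conditioned on facial structure, which is why FSR supplies more usable information to the classifier.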

5.1.6. Effect of data augmentation

For training the models, we used the preprocessed (cropped, filtered, and upscaled) 4822 facial images, comprising 2411 real and 2411 fake images. However, this number of images is relatively small for training a CNN-based classifier model. To achieve the best performance in distinguishing handcrafted fake face images with a small dataset, we explored the following data augmentation frameworks:

1. Keras real-time image augmentation framework with the following parameter settings:

Fig. 9. Handcrafted facial manipulation image detection confusion matrices with different CNN-based models on two different image resolutions (128 × 128 and 256 × 256), where the X-axis is the predicted label and the Y-axis indicates the true label.

Table 4
Performance comparison of different CNN-based models and their comparison with the Shallow-FakeFaceNets.

                          64 × 64                                            128 × 128   256 × 256   1024 × 1024
Model                     Precision (%)  Recall (%)  F1-score (%)  AUC (%)   AUC (%)     AUC (%)     AUC (%)
VGG19                     49.00          49.50       49.25         56.69     55.13       57.13       60.13
Xception                  79.00          63.00       70.10         79.32     79.03       82.03       85.03
NASNet                    81.50          70.00       75.31         83.55     90.55       92.55       96.55
DenseNet                  –              –           –             –         –           83.88       99.20
SFFNV1                    83.50          82.50       82.90         84.94     98.12       99.82       99.99
SFFNV2                    72.00          71.50       71.75         79.82     99.98       99.99       99.99
SFFNV3                    80.50          72.50       76.29         90.85     99.99       99.99       99.99
Ensemble (SFFNV1+SFFNV3)  81.50          73.50       77.29         93.99     99.99       99.99       99.99
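The best row in Table 4 comes from ensembling SFFNV1 and SFFNV3. The paper does not state the exact combination rule, so the probability averaging below is an illustrative assumption rather than the authors' method:

```python
# Illustrative ensemble: average the two models' predicted fake-probabilities,
# then threshold. The averaging rule is an assumption, not the paper's stated method.

def ensemble_predict(probs_v1, probs_v3, threshold=0.5):
    """Combine per-image fake-probabilities from two models into hard labels."""
    avg = [(p1 + p3) / 2 for p1, p3 in zip(probs_v1, probs_v3)]
    return [int(p >= threshold) for p in avg], avg

labels, avg = ensemble_predict([0.9, 0.2, 0.55], [0.7, 0.1, 0.35])
```

Averaging tends to help when the two members make partly uncorrelated errors, which is plausible here given the two SFFN variants' different depths and kernel configurations.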

Fig. 10. Datasets used for the GAN-generated image detection. Sample images from the datasets are shown at 64 × 64, 128 × 128, 256 × 256, and 1024 × 1024 resolutions.

(a) Random width or height shifting within 20% of the image.
(b) Random shearing or zooming within 20% of the image.
(c) Horizontal flipping with 50% probability.

2. The ImgAug [65] framework provides the full set of image augmentation options, including meta shuffle, arithmetic changes, artistic mode, blending, blurs, collections, color adjustments, contrast changes, convolutional options, flips, geometric adjustments, and resizing.

To compare each augmentation framework's performance, we selected the two best-performing models in the same way as we did for the super-resolution experiment. We utilized the FSR images during training with the different augmentation frameworks, since FSR performed much better than NNU. We find that both the Xception (IN) and SFFNV3 models performed better with Keras' real-time image augmentation framework, as shown in Table 3. Xception with ImageNet pretrained weights (Xception (IN)) increased its AUC performance by 10.76% with the Keras image augmentation framework, while SFFNV3 showed an even larger AUC improvement of 17.68%.

In the case of ImgAug, surprisingly, the overall F1-score dropped compared to the model trained without augmentation, due to ImgAug's large number of augmentations (Xception (IN): from 62.73% to 60.12%; SFFNV3: from 54.03% to 53.58%). Therefore, we conclude that Keras' real-time image augmentation (6 types of augmentation) performed much better for detecting human-generated fake images, while ImgAug (64 types of augmentation) degraded the overall performance.

5.2. Additional experiment: GAN-generated image detection

Similarly, the GAN-generated fake face image detection process follows the end-to-end classification pipeline described in Fig. 4. The datasets we used for detecting GAN-generated fake images originally contain high-resolution images with a size of 1024 × 1024. For this reason, unlike the handcrafted fake face detection case, we did not apply any image upscaling method. Instead, we downscaled the images to smaller sizes such as 64 × 64 and 128 × 128, according to the experimental settings. In the case of detecting the HFM dataset, we utilized ImageNet pretrained weights to initialize the model and retrained the whole network. There are a few benefits of using ImageNet pretrained weights

when training a classifier. First, initializing the model with the


pretrained weights can boost the performance in the early stage of the training phase. Secondly, less labeled data is required when training, compared to models without pretrained weights. However, in the GAN-generated image detection case, we utilized 200 K images for training the classifiers, which is significantly larger than the HFM dataset, and trained until convergence. Therefore, we compared the networks without pretrained weights with our SFFNs in Table 4.
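Since this experiment resizes downward rather than upward, the resizing step can be pictured as repeated 2× average pooling. The paper does not specify its exact downscaling kernel, so the block averaging below is an illustrative assumption, sketched in pure Python:

```python
def downscale_2x(img):
    """Halve a 2-D grid by averaging each non-overlapping 2 x 2 block."""
    h, w = len(img), len(img[0])
    return [[(img[2 * y][2 * x] + img[2 * y][2 * x + 1] +
              img[2 * y + 1][2 * x] + img[2 * y + 1][2 * x + 1]) / 4
             for x in range(w // 2)]
            for y in range(h // 2)]

small = downscale_2x([[0, 0, 255, 255],
                      [0, 0, 255, 255],
                      [10, 10, 20, 20],
                      [10, 10, 20, 20]])
```

Applying this four times takes a 1024 × 1024 image down to 64 × 64 (1024 → 512 → 256 → 128 → 64), the smallest setting evaluated in Table 4.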

5.2.1. Dataset descriptions


To detect GAN-generated fake face images, we utilized the following datasets:

• CelebFaces Attributes (CelebA) Dataset: The CelebA dataset contains more than 200 K original celebrity images [17].
• Progressive Growing GANs (PGGAN) Dataset: The PGGAN dataset consists of 100 K fake celebrity images (1024 × 1024 resolution) that were generated by Karras et al. [8].

Examples of real (CelebA) and GAN-generated (PGGAN) images at different sizes are presented in Fig. 10(a) and 10(b). For training, we label the images in the PGGAN and CelebA datasets as 'fake' and 'real', respectively. They are then passed through deep neural networks to create a classification model, as shown in the end-to-end training pipeline in Fig. 4.

5.2.2. Performance on GAN-generated facial images

We trained different deep learning models to distinguish GAN-generated fake images from real celebrity images and evaluated their detection performance. We utilized 200 K images for training (with 20% of the training dataset used for validation) and 18 K for testing, for both classes. We conducted the experiments with the following different training input image sizes: 64 × 64, 128 × 128, 256 × 256, and 1024 × 1024, as shown in Table 4. For measuring the performance of the different models, we used Precision, Recall, F1-score, and AUROC, similar to the handcrafted fake face detection experiment, as shown in Table 4.

The performance of Shallow-FakeFaceNet (SFFN) surpassed the other neural network models even at the smallest input size (64 × 64), as presented in Table 4, which is shown to be the most challenging case to detect. We achieved the best performance using an ensemble model of SFFNV1 and SFFNV3, with 93.99% (64 × 64) to 99.99% (128 × 128, 256 × 256, and 1024 × 1024) AUROC, as shown in Table 4. Our results clearly demonstrate that detecting the difference between real and GAN-generated images at lower resolutions with state-of-the-art deep neural network architectures such as Xception is quite challenging, as shown in Table 4. In particular, Xception and NASNet are deeper networks and are quite large for low-resolution inputs. Hence, we observe that the SFFN models, with relatively shallow structures, have better detection performance at all input image sizes than the rest of the models with deep structures. Besides, deeper networks such as VGG19, Xception, and NASNet show a continuous decrease in performance as the resolution of the image becomes smaller.

5.3. Analysis of our detection model with HFM dataset

In this section, we analyze our proposed facial manipulation detection method with the Class Activation Map (CAM) to determine the facial region our model focuses on. Moreover, we compare our approach with image splice detection in the presence of image metadata and describe the advantages of our method.

Fig. 11. A side-by-side comparison of Class Activation Map (CAM) on two handcrafted images using the detection models trained with Xception (IN) and Shallow-FakeFaceNetV3. Left: handcrafted faces. Middle: CAM with Xception (IN). Right: CAM with Shallow-FakeFaceNetV3.

5.3.1. Visualizing class activation map (CAM) of detection model

To observe and analyze the fake face activations from our model, we utilize Grad-CAM [70] to generate the implicit attention of our CNN-based models on an image. We generate the Class Activation Map (CAM) with the last convolutional layer of the two best-performing models, Xception with ImageNet pretraining (128 × 128 input size) and SFFNV3 (256 × 256 input size), as shown in Fig. 11.

The left column in Fig. 11 shows the handcrafted faces from the HFM dataset, where the nose in the top-left image and the mouth in the bottom-left image are the manipulated regions. By visualizing the activations of the last convolutional layer of both models, we observed that the two models focus on the images differently. We noticed that the detection model using Xception (IN) seems to focus more on the middle part of the face, while SFFNV3 is mainly activated on the facial landmarks, which have the highest probability of being forged. As shown in Fig. 11, we believe that the ability to focus more on facial landmarks led to the higher performance of SFFN on the HFM dataset.

5.3.2. Comparison with image splice detection via learned self-consistency with and without metadata

We also compare our approach with Huh et al. [16], the state-of-the-art image splice detection using learned self-consistency. After testing some of the handcrafted fake facial images that we produced in HFM, Huh et al.'s [16] splice detection method was unable to perform without the metadata. On the other hand, as shown in Tables 1 to 3, our method is able to obtain detection performance above 70%. In the presence of metadata, the approach from Huh et al. [16] has been successful in detecting forged areas in fake face images, as shown in Fig. 12. The red regions are the modified areas. This method captured the right half of the forged face in the left image of Fig. 12 and the glasses in the right image. However, there were many cases where even fake images with metadata could not be detected with the image splice detection method, as shown in Fig. 13.

We illustrate two examples in Fig. 13, where the left image is the original image without any handcrafted features. The middle and right images are forged images with modification complexity Lv.2 and Lv.3, respectively. The second row indicates the result of using splice detection. For all handcrafted images (both Lv.2 and Lv.3), we were not able to observe much difference from the

Fig. 12. The example of splice detection success using learned self-consistency,
where the red regions are modified areas [16].

original image and missed the detection. Therefore, this indicates that image splice detection's performance in the presence of metadata may still not be effective in detecting handcrafted fake images, and our approach performs better on the HFM dataset.

Fig. 13. The example of splice detection failure using learned self-consistency.

6. Discussion and limitations

Our neural network-based classifier Shallow-FakeFaceNet (SFFN) outperformed the state-of-the-art CNN-based classifiers in detecting facial forgeries. With a small amount of data and low-resolution images, we surprisingly found that deeper networks performed worse than a shallower network in our experiments. In this section, we discuss our findings and the challenges learned through various experiments. Furthermore, we discuss the limitations of our current work, especially regarding the diversity and quantity of the HFM dataset.

6.1. Shallow vs. deep architecture.

Contrary to the oversimplified notion that "a deeper network is always better," our experimental results demonstrate that a shallower network can perform better at classifying small-resolution fake images than a deep architecture. Indeed, this result is consistent with the premise of He et al.'s [41] research on ResNet, where shallow networks performed better than deeper networks unless residual or shortcut connections are introduced into the architecture of the deeper networks. Moreover, it is difficult to train a very deep neural network such as InceptionResNetV2 (depth 572) or Xception (depth 126) with a limited amount of small-resolution images, because the input becomes too small due to the pooling operations. It therefore becomes difficult to extract meaningful features as the input passes through the deep layers of the network, whereas a shallower network such as Shallow-FakeFaceNet does not exhibit such behavior and can capture the essential features in small-resolution images, providing better performance in this scenario. Despite the small amount of handcrafted fake images in the training dataset, we found that our end-to-end detection pipeline, including various super-resolution and data augmentation preprocessing techniques, was very efficient in detecting handcrafted facial images.

6.2. Challenges of detecting the novel HFM dataset.

One of the significant challenges in detecting human-created fake images is the lack of large datasets. In order to address this issue, we create a Handcrafted Facial Manipulation (HFM) [19] dataset comprising 1527 handcrafted facial images and 621 real images, and we also make this dataset publicly available for research purposes.1 We first release a few samples from our dataset and plan to release the entire dataset once this paper is accepted.

6.3. Limitations.

While we have demonstrated promising performance using different techniques for detecting handcrafted facial manipulations, our approach has the following limitations. First, we have not taken into account people with facial medical conditions in our HFM dataset. These include facial burns, Bell's palsy, vitiligo, and rosacea, which can be detected as false positives. Therefore, future research can examine the impacts of algorithmic fake-detection bias for people with facial medical conditions. Besides, tiny faces in an image with multiple people are generally discarded while performing face detection and cropping.

1 [Link]


Our filtering algorithm used a pre-defined threshold based on empirical measurement. Therefore, when a small face in an image with multiple people is manipulated and compressed to a low resolution, our method may not work correctly and cause false negatives. To mitigate false positives while cropping the faces and to prevent small faces from being ignored, we plan to combine multiple facial detectors utilizing facial key points. A final decision technique, such as ensembling or majority voting, can then be applied across the detectors. We expect that employing a better face detector can eventually yield higher fake face detection performance.

7. Conclusions and future work

Fake photography, such as hand-crafted images made with photo editing tools and machine-generated fake images, can cause social problems, including fake identification and defamation. In this paper, we tackle this problem by proposing a neural network-based classifier, Shallow-FakeFaceNet (SFFN), together with a novel Handcrafted Facial Manipulation (HFM) dataset. The main contributions of this paper are as follows:

(1) At present, no hand-crafted fake face dataset with a significant number of images exists. Therefore, in this work, we create and contribute a fake face dataset, Handcrafted Facial Manipulation (HFM) (see footnote 1), with multiple levels of editing complexity and various modifications to the face.

(2) We propose Shallow-FakeFaceNet (SFFN), along with an effective end-to-end fake face detection pipeline. With super-resolution and data augmentation, our SFFN detection model shows promising results in detecting human-created fake images, capturing the subtle differences between fake and real handcrafted images with 72.52% AUROC while using fewer than 2,500 fake images for training. We also evaluate our approach on the additional problem of GAN-generated face detection, achieving 93.99% accuracy on challenging low-resolution images.

We demonstrate that our proposed soft computing architecture can assist in detecting and combating fake media content. In particular, a possible application of Shallow-FakeFaceNet is a fake face pre-checking system for images uploaded to social network services such as Twitter, Facebook, and Instagram. Moreover, media outlets can utilize our method to check the authenticity of images automatically. Furthermore, we hope that our HFM dataset can contribute to fostering further research in this area.

For future work, we plan to increase the quantity and quality of our HFM dataset by including diverse facial conditions to reduce bias against particular groups and by making more diverse handcrafted fake facial images using other software tools such as Gimp, Pixlr, and Photoscape. Another avenue for future research would be to apply GANs to synthetically augment handcrafted fake facial images for training [71], or to utilize transfer learning, a domain adaptation technique, to improve detection performance with a small dataset. By storing in our model the knowledge gained while solving similar facial fake detection problems, such as detecting GAN-generated fake faces, DeepFakes [38], and FaceSwap [37], we can transfer this knowledge to detecting handcrafted facial manipulations.

CRediT authorship contribution statement

Sangyup Lee: Conceptualization, Visualization, Methodology, Software, Writing - original draft preparation, Formal analysis, Data curation, Investigation. Shahroz Tariq: Conceptualization, Methodology, Software, Writing - original draft preparation, Validation. Youjin Shin: Data curation, Resources. Simon S. Woo: Writing - reviewing & editing, Supervision, Project administration, Funding acquisition.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

We thank the reviewers for their insightful comments and Siho Han for carefully reviewing our initial manuscript. This work was supported by an Institute of Information & communications Technology Planning & Evaluation (IITP) grant funded by the Korea government, Ministry of Science and ICT (MSIT) (No. 2019-0-01343, Regional strategic industry convergence security core talent training business) and by the Basic Science Research Program through the National Research Foundation of Korea grant funded by the Korea government (MSIT) (No. 2020R1C1C1006004). Additionally, this research was partly supported by an IITP grant funded by the Korea government (MSIT) (No. 2021-0-00017, Original Technology Development of Artificial Intelligence Industry) and by the Korea government (MSIT) under the High-Potential Individuals Global Training Program (2019-0-01579), supervised by the IITP.

References

[1] Adobe, Adobe photoshop | best photo, image, and design editing software, 2020, [Online; accessed 22-April-2020], [Link]
[2] S. Caplin, Photoshop 2020 new features, 2019, [Online; accessed 31-December-2020], [Link]features.
[3] W. Xiong, J. Yu, Z. Lin, J. Yang, X. Lu, C. Barnes, J. Luo, Foreground-aware image inpainting, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 5840–5848.
[4] S. Cole, We Are Truly Fucked: Everyone Is Making AI-Generated Fake Porn Now, Motherboard, 2018, pp. 1–15, [Online; accessed 31-December-2020], [Link]daisy-ridley.
[5] K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: Surpassing human-level performance on imagenet classification, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 1026–1034.
[6] S. Ioffe, C. Szegedy, Batch normalization: Accelerating deep network training by reducing internal covariate shift, 2015, arXiv preprint arXiv:1502.03167.
[7] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, Y. Bengio, Generative adversarial nets, in: Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[8] T. Karras, T. Aila, S. Laine, J. Lehtinen, Progressive growing of gans for improved quality, stability, and variation, 2017, arXiv preprint arXiv:1710.10196.
[9] T.-C. Wang, M.-Y. Liu, J.-Y. Zhu, A. Tao, J. Kautz, B. Catanzaro, High-resolution image synthesis and semantic manipulation with conditional gans, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8798–8807.
[10] S. Tariq, S. Lee, H. Kim, Y. Shin, S.S. Woo, Detecting both machine and human created fake face images in the wild, in: Proceedings of the 2nd International Workshop on Multimedia Privacy and Security, in: MPS '18, Association for Computing Machinery, New York, NY, USA, ISBN: 9781450359887, 2018, pp. 81–87, [Link]3267357.3267367.
[11] S. Tariq, S. Lee, H. Kim, Y. Shin, S.S. Woo, GAN Is a friend or foe? A framework to detect various fake face images, in: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing, in: SAC '19, Association for Computing Machinery, New York, NY, USA, ISBN: 9781450359337, 2019, pp. 1296–1303, [Link]
[12] BBC News, The fake video where johnson and corbyn endorse each other, 2019, [Online; accessed 31-December-2020], https://[Link]/news/av/technology-50381728/the-fake-video-where-johnson-and-corbyn-endorse-each-other.
[13] K. Roose, Here come the fake videos, too, The New York Times, 2018, [Online; accessed 31-December-2020], https://[Link]/2018/03/04/technology/[Link].
[14] J. Christian, Experts fear face swapping tech could start an international showdown, The Outline, 2018, [Online; accessed 31-December-2020], https://[Link]/post/3179/deepfake-videos-are-freaking-experts-out.


[15] A. Romano, Jordan peele's simulated obama psa is a double-edged warning against fake news, Vox, 2018, pp. 2018–2020, [Online; accessed 31-December-2020], [Link]jordan-peele-obama-deepfake-buzzfeed.
[16] M. Huh, A. Liu, A. Owens, A.A. Efros, Fighting fake news: Image splice detection via learned self-consistency, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 101–117.
[17] Z. Liu, P. Luo, X. Wang, X. Tang, Deep learning face attributes in the wild, in: Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3730–3738.
[18] Computational Intelligence and Photography Lab, Yonsei University, Real and fake face detection, 2019, [Online; accessed 31-December-2020], https://[Link]/ciplab/real-and-fake-face-detection.
[19] S. Lee, S. Tariq, Y. Shin, Hand-crafted facial manipulation (HFM) dataset, 2020, Mendeley Data, V1, doi:10.17632/h4ymvy9g8j.1.
[20] J. Yang, G. Zhu, J. Huang, X. Zhao, Estimating JPEG compression history of bitmaps based on factor histogram, Digit. Signal Process. 41 (2015) 90–97.
[21] A. Kashyap, B. Suresh, M. Agrawal, H. Gupta, S.D. Joshi, Detection of splicing forgery using wavelet decomposition, in: International Conference on Computing, Communication & Automation, IEEE, 2015, pp. 843–848.
[22] M.F. Hashmi, A.R. Hambarde, A.G. Keskar, Copy move forgery detection using DWT and SIFT features, in: 2013 13th International Conference on Intelligent Systems Design and Applications, IEEE, 2013, pp. 188–193.
[23] H. Farid, Exposing digital forgeries from JPEG ghosts, IEEE Trans. Inf. Forensics Secur. 4 (1) (2009) 154–160.
[24] N. Krawetz, H.F. Solutions, A picture's worth, Hacker Factor Solut. 6 (2) (2007) 2.
[25] P. Zhou, X. Han, V.I. Morariu, L.S. Davis, Learning rich features for image manipulation detection, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 1053–1061.
[26] D. Cozzolino, G. Poggi, L. Verdoliva, Recasting residual-based local descriptors as convolutional neural networks: an application to image forgery detection, in: Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security, 2017, pp. 159–164.
[27] G. Galindo, XR Belgium Posts Deepfake of Belgian Premier Linking Covid-19 with Climate Crisis, Brussels Times, 2020, [Online; accessed 31-December-2020], [Link]news/politics/106320/xr-belgium-posts-deepfake-of-belgian-premier-linking-covid-19-with-climate-crisis.
[28] J.-Y. Zhu, T. Park, P. Isola, A.A. Efros, Unpaired image-to-image translation using cycle-consistent adversarial networks, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232.
[29] A. Van den Oord, N. Kalchbrenner, L. Espeholt, O. Vinyals, A. Graves, et al., Conditional image generation with pixelcnn decoders, in: Advances in Neural Information Processing Systems, 2016, pp. 4790–4798.
[30] A.v.d. Oord, N. Kalchbrenner, K. Kavukcuoglu, Pixel recurrent neural networks, 2016, arXiv preprint arXiv:1601.06759.
[31] A.v.d. Oord, S. Dieleman, H. Zen, K. Simonyan, O. Vinyals, A. Graves, N. Kalchbrenner, A. Senior, K. Kavukcuoglu, Wavenet: A generative model for raw audio, 2016, arXiv preprint arXiv:1609.03499.
[32] D.P. Kingma, M. Welling, Auto-encoding variational bayes, 2013, arXiv preprint arXiv:1312.6114.
[33] A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, M. Nießner, Faceforensics: A large-scale video dataset for forgery detection in human faces, 2018, arXiv preprint arXiv:1803.09179.
[34] A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, M. Nießner, Faceforensics++: Learning to detect manipulated facial images, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 1–11.
[35] Y. Li, X. Yang, P. Sun, H. Qi, S. Lyu, Celeb-df: A new dataset for deepfake forensics, 2019, arXiv preprint arXiv:1909.12962.
[36] J. Thies, M. Zollhofer, M. Stamminger, C. Theobalt, M. Nießner, Face2face: Real-time face capture and reenactment of rgb videos, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2387–2395.
[37] M. Kowalski, Faceswap - github repository, 2016, [Online; accessed 31-December-2020], [Link]
[38] FaceSwapDevs, Deepfakes_faceswap - GitHub Repository, 2020, [Online; accessed 31-December-2020], [Link]
[39] S. Zerdoumi, A.Q.M. Sabri, A. Kamsin, I.A.T. Hashem, A. Gani, S. Hakak, M.A. Al-Garadi, V. Chang, Image pattern recognition in big data: taxonomy and open challenges: survey, Multimedia Tools Appl. 77 (8) (2018) 10091–10121.
[40] K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recognition, 2014, pp. 1–14, arXiv preprint arXiv:1409.1556.
[41] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[42] G. Huang, Z. Liu, L. Van Der Maaten, K.Q. Weinberger, Densely connected convolutional networks, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 4700–4708.
[43] B. Zoph, V. Vasudevan, J. Shlens, Q.V. Le, Learning transferable architectures for scalable image recognition, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 8697–8710.
[44] F. Chollet, Xception: Deep learning with depthwise separable convolutions, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1251–1258.
[45] S. Murali, G.B. Chittapur, B.S. Anami, et al., Comparision and analysis of photo image forgery detection techniques, 2013, arXiv preprint arXiv:1302.3119.
[46] Z. Lin, J. He, X. Tang, C.-K. Tang, Fast, automatic and fine-grained tampered JPEG image detection via DCT coefficient analysis, Pattern Recognit. 42 (11) (2009) 2492–2501.
[47] N. Krawetz, FotoForensics, Neal Krawetz, 2020, [Online; accessed 31-December-2020], [Link]
[48] Multimedia Computing Laboratory, MMC Image forensic tool - image watermarking tool, 2015, [Online; accessed 31-December-2020], http://[Link].
[49] T. Schlegl, P. Seeböck, S.M. Waldstein, G. Langs, U. Schmidt-Erfurth, F-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks, Med. Image Anal. 54 (2019) 30–44.
[50] H. Khalid, S.S. Woo, OC-FakeDect: Classifying deepfakes using one-class variational autoencoder, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, 2020, pp. 656–657.
[51] H. Jeon, Y. Bang, S.S. Woo, Faketalkerdetect: Effective and practical realistic neural talking head detection with a highly unbalanced dataset, in: Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, 2019.
[52] H. Jeon, Y. Bang, S.S. Woo, Fdftnet: Facing off fake images using fake detection fine-tuning network, in: IFIP International Conference on ICT Systems Security and Privacy Protection, Springer, 2020, pp. 416–430.
[53] J. Kim, S. Han, S.S. Woo, Classifying genuine face images from disguised face images, in: 2019 IEEE International Conference on Big Data (Big Data), IEEE, 2019, pp. 6248–6250.
[54] H. Jeon, Y. Bang, J. Kim, S.S. Woo, T-GD: Transferable GAN-generated images detection framework, 2020, arXiv preprint arXiv:2008.04115.
[55] S. Tariq, S. Lee, S.S. Woo, A convolutional LSTM based residual network for deepfake video detection, 2020, arXiv preprint arXiv:2009.07480.
[56] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, X. Chen, Improved techniques for training gans, in: Advances in Neural Information Processing Systems, 2016, pp. 2234–2242.
[57] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, Z. Wojna, Rethinking the inception architecture for computer vision, in: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2818–2826.
[58] R. Durall, M. Keuper, F.-J. Pfreundt, J. Keuper, Unmasking deepfakes with simple features, 2019.
[59] I. de Paz Centeno, MTCNN Face detection implementation for tensorflow, 2016, [Online; accessed 31-December-2020], [Link]ipazc/mtcnn.
[60] K. Zhang, Z. Zhang, Z. Li, Y. Qiao, Joint face detection and alignment using multitask cascaded convolutional networks, IEEE Signal Process. Lett. 23 (10) (2016) 1499–1503.
[61] D. Kim, M. Kim, G. Kwon, D.-S. Kim, Progressive face super-resolution via attention to facial landmark, 2019, arXiv preprint arXiv:1908.08239.
[62] Z. Hussain, F. Gimenez, D. Yi, D. Rubin, Differential data augmentation techniques for medical imaging classification tasks, in: AMIA Annual Symposium Proceedings, 2017, American Medical Informatics Association, 2017, p. 979.
[63] Z. Eaton-Rosen, F. Bragman, S. Ourselin, M.J. Cardoso, Improving data augmentation for medical image segmentation, Open Rev. Med. Imag. Deep Learn. (2018).
[64] L. Perez, J. Wang, The effectiveness of data augmentation in image classification using deep learning, 2017, arXiv preprint arXiv:1712.04621.
[65] A.B. Jung, K. Wada, J. Crall, S. Tanaka, J. Graving, C. Reinders, S. Yadav, J. Banerjee, G. Vecsei, A. Kraft, Z. Rui, J. Borovec, C. Vallentin, S. Zhydenko, K. Pfeiffer, B. Cook, I. Fernández, F.-M. De Rainville, C.-H. Weng, A. Ayala-Acevedo, R. Meudec, M. Laporte, et al., Imgaug, 2020, [Online; accessed 31-December-2020], [Link]
[66] S. Park, N. Kwak, Analysis on the dropout effect in convolutional neural networks, in: Asian Conference on Computer Vision, Springer, 2016, pp. 189–204.
[67] S.-Y. Wang, O. Wang, A. Owens, R. Zhang, A.A. Efros, Detecting photoshopped faces by scripting photoshop, in: Proceedings of the IEEE International Conference on Computer Vision, 2019, pp. 10072–10081.
[68] F. Chollet, et al., Keras, 2015, [Online; accessed 31-December-2020], https://[Link].
[69] H. Chu, Github: AUROC (Area Under the Receiver Operating Characteristic), 2017, [Online; accessed 31-December-2020], [Link]hyoungseokchu/AUROC.
[70] R.R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-cam: Visual explanations from deep networks via gradient-based localization, in: Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 618–626.


[71] C. Han, K. Murao, T. Noguchi, Y. Kawata, F. Uchiyama, L. Rundo, H. Nakayama, S. Satoh, Learning more with less: conditional PGGAN-based data augmentation for brain metastases detection using highly-rough annotation on MR images, in: Proceedings of the 28th ACM International Conference on Information and Knowledge Management, 2019, pp. 119–127.

Sangyup Lee received his B.S. in Computer Science from Kwangwon University, Seoul, South Korea. He was a member of the IT Development department for one and a half years at Ssangyong Information and Communication Corp., Seoul, Korea. He was a Ph.D. research assistant at Stony Brook University and SUNY Korea (2017–2019). He is currently a Ph.D. student at Sungkyunkwan University.

Shahroz Tariq received his B.S. in Computer Science with high distinction from the National University of Computer & Emerging Sciences (FAST-NUCES), Islamabad, Pakistan, and his M.S. in Computer Science with high distinction from Sangmyung University, Cheonan, South Korea. He worked as a Software Engineer at Bentley Systems (2014–2015). He was a Ph.D. research assistant at Stony Brook University and SUNY Korea (2017–2019). He is currently a Ph.D. candidate at Sungkyunkwan University (Natural Sciences campus), Suwon, South Korea.

Youjin Shin received the B.S. degree in computer science from Ewha Womans University, [Link], in 2010 and the M.B.A. degree in IT and media management from Korea Advanced Institute of Science and Technology (KAIST), South Korea, in 2015. She received her M.S. degree and is currently pursuing the Ph.D. degree in computer science from the State University of New York, [Link], and Stony Brook University, USA. She worked for the Chosun Daily, [Link], as a staff member from 2010 to 2013. Her research interests include factorization algorithms, machine learning algorithms, and their application fields, such as anomaly detection for real data.

Simon S. Woo received his M.S. and Ph.D. in Computer Science from the University of Southern California (USC), Los Angeles, M.S. in Electrical and Computer Engineering from the University of California, San Diego (UCSD), and B.S. in Electrical Engineering from the University of Washington (UW), Seattle. He was a member of technical staff (technologist) for 9 years at NASA's Jet Propulsion Laboratory (JPL), Pasadena, CA, conducting research in satellite communications, networking, and cybersecurity. He also worked at Intel Corp. and Verisign Research Lab. Since 2017, he was a tenure-track Assistant Professor at SUNY Korea and a Research Assistant Professor at Stony Brook University. He is now a tenure-track Assistant Professor at the SKKU Institute for Convergence and the Department of Software at Sungkyunkwan University, Suwon, Korea.

