International Research Journal on Advanced Engineering and Management
e ISSN: 2584-2854
Volume: 03, Issue: 03, March 2025
Page No: 585-590
[Link]
[Link]
Deep Fake Video Detection Using Transfer Learning Resnet50
S. Praveena 1, [Link] 2 , [Link] Farhana 3 [Link] 4
1 Professor, Department of AIML (Artificial Intelligence and Machine Learning), Manakula Vinayagar
Institute of Technology, Puducherry, India.
2,3,4 Undergraduate Student, Manakula Vinayagar Institute of Technology, Puducherry, India.
Email ID:
[email protected],
[email protected],
[email protected],
[email protected]
Abstract
The rapid development of deep learning technologies has enabled the creation of highly realistic deepfake
videos, raising concerns in areas such as media integrity, privacy, and security. Detecting these deepfakes has
become a significant challenge, as conventional methods struggle to keep pace with increasingly sophisticated
techniques. This paper explores the application of transfer learning using ResNet50, a pre-trained
convolutional neural network, for deepfake video detection. We present an overview of deepfake creation, the
role of ResNet50 in transfer learning, the implementation process, and the results of using this approach to
detect deepfakes in video content.
Keywords: Deepfake Detection, Transfer Learning, ResNet50, Convolutional Neural Network, Media
Integrity.
1. Introduction
In recent years, advancements in artificial intelligence and machine learning have enabled the
creation of highly sophisticated fake media, particularly deepfake videos. Deepfakes utilize deep
learning algorithms, specifically Generative Adversarial Networks (GANs) and Autoencoders, to
generate hyper-realistic video content by swapping faces, manipulating facial expressions, or
altering speech to create convincing yet fabricated scenarios. This technology has raised
significant ethical concerns, as it can be exploited for malicious purposes, such as spreading
misinformation, creating defamatory content, or manipulating public opinion. The implications of
deepfake technology are profound, impacting areas such as journalism, politics, and social media.
For instance, the potential to create realistic yet fabricated videos of public figures can
undermine trust in media and influence public perceptions. In the realm of personal privacy,
deepfakes can lead to identity theft or defamation, as individuals can be portrayed in compromising
situations without their consent. The challenge of identifying these manipulated videos is becoming
increasingly pressing as the techniques for creating deepfakes evolve, leading to an urgent need for
effective detection methods. Traditional detection techniques often rely on analyzing digital
artifacts or inconsistencies in video frames. However, as deepfake generation techniques become more
advanced, these methods are frequently insufficient. There is a critical demand for robust detection
systems that can identify deepfakes even when they appear nearly indistinguishable from authentic
videos. Deep Learning, particularly through the use of Convolutional Neural Networks (CNNs), has
emerged as a promising approach to tackle the challenges posed by deepfake detection. CNNs excel in
visual tasks by automatically extracting features from images, which is essential for recognizing
the subtle differences between real and manipulated video frames. Among various deep learning
architectures, ResNet50 has gained prominence due to its ability to train very deep networks while
maintaining high performance, thanks to its innovative residual connections. In this study, we
leverage transfer learning to adapt the pre-trained ResNet50 model for deepfake detection. Transfer
learning involves utilizing a model that has been
previously trained on a large dataset (such as ImageNet) and fine-tuning it on a smaller,
task-specific dataset. This approach allows for efficient training and better performance,
particularly when data is limited. By employing transfer learning with ResNet50, we aim to harness
its ability to extract intricate features and patterns that characterize deepfake content,
ultimately leading to improved detection accuracy. The objective of this paper is to explore the
methodology of using ResNet50 for detecting deepfake videos, analyse the results of our experiments,
and discuss the effectiveness and limitations of this approach. Through this research, we aim to
contribute to the ongoing efforts to combat the proliferation of deepfake technology and safeguard
the integrity of digital media. [1-3]
2. Methodology
The methodology for detecting deepfake videos using transfer learning with the ResNet50 model
involves a systematic approach combining data preprocessing, model selection, training, and
evaluation. First, a diverse dataset containing both real and deepfake videos is gathered from
reliable sources such as the Deepfake Detection Challenge (DFDC), FaceForensics++, or Celeb-DF.
These datasets typically contain thousands of videos that are either genuine or synthetically
generated using advanced deepfake techniques. The videos are then pre-processed by extracting
individual frames, as deepfake detection typically focuses on identifying subtle visual
inconsistencies within each frame. Each frame is resized and normalized to meet the input
requirements of ResNet50, which typically accepts images of size 224x224 pixels. Additional
preprocessing steps may include face detection, cropping to focus on facial regions, and data
augmentation techniques such as rotation, flipping, and zooming to enhance model generalization.
Next, the pre-trained ResNet50 model is used for transfer learning. ResNet50, having been trained on
the extensive ImageNet dataset, already possesses robust feature extraction capabilities from
images. In this approach, the convolutional layers of ResNet50 are retained to leverage its feature
extraction power, while the fully connected layers are either fine-tuned or replaced with new dense
layers that are specifically tailored for deepfake classification. Fine-tuning involves adjusting
the weights of the network to adapt to the deepfake detection task, improving the model's
sensitivity to the subtle artifacts often found in manipulated videos. Once the model architecture
is established, the training process begins. The model is trained on the pre-processed frames
extracted from both real and deepfake videos, allowing it to learn to detect the unique patterns,
distortions, and inconsistencies associated with deepfakes. Since the ResNet50 architecture is deep,
comprising 50 layers with residual connections, it can effectively capture hierarchical features at
different levels of abstraction, from basic edges to complex textures, which are crucial in
identifying deepfakes. After training, the model is evaluated using a test set comprising unseen
real and deepfake videos. The performance of the model is assessed through various metrics such as
accuracy, precision, recall, F1-score, and the confusion matrix, which provides insight into the
number of true positives, false positives, true negatives, and false negatives. To ensure
robustness, cross-validation techniques may be used, where the dataset is split into multiple
subsets to test the model's consistency across different data partitions. Once the model achieves
satisfactory performance, it is deployed for real-world deepfake detection. Given a new input video,
the trained model processes the video frame by frame, predicting whether each frame is authentic or
a deepfake. The final decision for the entire video is based on an aggregation of frame-level
predictions, with majority voting or probability-based techniques employed to classify the video as
real or fake. Additionally, post-processing techniques such as temporal consistency checks across
frames can further enhance the reliability of predictions by addressing any potential frame-wise
inconsistencies. This deepfake detection system, combining transfer learning with ResNet50, enables
efficient and accurate detection of manipulated videos, making it a valuable tool for combating the
spread of misinformation and protecting the integrity of digital media. With further fine-tuning,
the system can continue to improve its performance on evolving deepfake generation techniques,
helping it remain effective against future deepfake advancements [4-6] (Figure 1 Deepfake Detection
System).
Figure 1 Deepfake Detection System
3. Literature Survey
The detection of deepfake videos has emerged as a critical area of research, driven by the rapid
advancements in artificial intelligence and deep learning technologies. Deepfakes, which manipulate
video and audio content using sophisticated algorithms, have posed significant threats to the
authenticity of digital media. Various researchers have explored different methodologies, datasets,
and models for detecting these manipulations, with particular focus on transfer learning and
convolutional neural networks (CNNs) such as ResNet50. In early deepfake detection methods,
researchers relied on handcrafted features and traditional machine learning techniques. Li et al.
(2018) explored the use of visual artifacts, such as unnatural blinking patterns, to detect
deepfakes. These methods, however, faced limitations due to the increasing sophistication of
deepfake generation algorithms, which reduced the presence of easily detectable visual cues. As
deepfakes became more advanced, researchers turned to deep learning-based methods, which
automatically learn hierarchical features from video data. One significant contribution to deepfake
detection is the use of convolutional neural networks (CNNs), which have shown remarkable success in
image classification tasks. A study by Afchar et al. (2018) introduced a CNN-based approach called
MesoNet, which was specifically designed for deepfake detection. While MesoNet demonstrated success
in detecting manipulated faces, it lacked the generalization capability required for more complex,
high-quality deepfakes. To address this, recent studies have leveraged more advanced architectures
like ResNet, which can capture finer details in video frames. Transfer learning has emerged as a
powerful technique in deepfake detection, allowing models pre-trained on large datasets, such as
ImageNet, to be fine-tuned for deepfake classification. In 2020, Nataraj et al. demonstrated the use
of ResNet-based architectures for detecting deepfakes, where the pre-trained ResNet50 model was
fine-tuned on deepfake datasets such as FaceForensics++. This approach significantly improved
detection accuracy compared to traditional methods, as the model could learn subtle features
specific to deepfake manipulations, such as pixel-level inconsistencies and texture distortions.
Several datasets have been crucial in advancing deepfake detection research. The FaceForensics++
dataset, introduced by Rössler et al. (2019), contains a diverse collection of manipulated and real
videos and has become a standard benchmark for evaluating deepfake detection models. Additionally,
the Deepfake Detection Challenge (DFDC) dataset, released by Facebook in collaboration with several
academic and industry partners, provided an even larger and more varied dataset, allowing
researchers to test their models on a wide range of deepfake techniques. These datasets have been
instrumental in training deep learning models, improving their generalization across different types
of manipulations and compression artifacts. While ResNet50 has proven effective for deepfake
detection, other models, such as EfficientNet and Xception, have also been explored for this task. A
study by Nguyen et al. (2020) compared several CNN architectures, including ResNet50, Xception, and
EfficientNet, for their
performance in deepfake detection. The results indicated that while ResNet50 performed well,
Xception provided slightly better accuracy due to its ability to capture even finer details in
manipulated video frames. Nonetheless, ResNet50 remains a popular choice due to its balance between
performance and computational efficiency. Another notable research trend is the application of
recurrent neural networks (RNNs), particularly long short-term memory (LSTM) networks, to capture
temporal information across video frames. Sabir et al. (2019) proposed a method that combines CNNs
with LSTMs to leverage both spatial and temporal features in video sequences. This approach helps in
detecting temporal inconsistencies in deepfake videos, which often arise due to poor frame
transitions in generated content. The research also highlights the importance of generalization in
deepfake detection models. A common challenge is ensuring that models trained on one dataset can
generalize well to unseen data. A study by Tolosana et al. (2020) addressed this issue by evaluating
the cross-dataset performance of different deepfake detection models, emphasizing the need for
diverse and robust training datasets to ensure that detection systems remain effective across
different types of deepfakes. [7-10]
4. Proposed System
The proposed system for deepfake video detection utilizes an advanced approach centered around
transfer learning with the ResNet50 architecture to accurately identify manipulated videos. The
system begins with the acquisition of a large dataset of both real and deepfake videos, sourced from
well-established datasets such as the Deepfake Detection Challenge (DFDC), FaceForensics++, and
Celeb-DF. The preprocessing phase involves extracting frames from each video, as deepfake detection
largely relies on the analysis of individual frames to detect subtle visual inconsistencies. These
frames are resized to the input size required by the ResNet50 model, typically 224x224 pixels, and
normalized to standardize pixel values. The preprocessing pipeline may also include the detection
and cropping of facial regions within the frames, as these areas are often the primary focus of
deepfake manipulations. Data augmentation techniques like flipping, rotation, and zooming are
applied to diversify the training data and prevent overfitting, which helps the model generalize
better to unseen videos. [11-14] (Figure 2 Deepfake Manipulations)
Figure 2 Deepfake Manipulations
The core of the system is the ResNet50 model, which is known for its deep residual network
architecture. ResNet50 has already been pre-trained on the ImageNet dataset, which provides the
model with a robust foundation for recognizing patterns and features in images. In this system, the
convolutional layers of ResNet50 are retained to exploit their strong feature extraction
capabilities, while the fully connected layers are modified to suit the deepfake detection task. The
transfer learning approach allows the model to adapt to this specific task by fine-tuning the
weights in a way that focuses on detecting unique deepfake artifacts, such as pixel-level
distortions, unnatural facial expressions, and inconsistencies in lighting or shading that are often
overlooked by human observers. This fine-tuning process involves retraining the model on the
deepfake dataset, where minimizing the loss function, typically binary cross-entropy, improves the
model's ability to differentiate between real and manipulated frames. Once the model is trained, it
is capable of processing new videos frame by frame, classifying each frame as either real or fake. A
majority voting system is used to aggregate these frame-level predictions, where the most frequent
class (real or fake) across the video frames determines the final classification of the entire
video. To enhance the accuracy and robustness of the system, an optional component involving a Long
Short-Term Memory (LSTM) network can be
integrated. This addition allows the system to not only analyze individual frames but also capture
the temporal relationships between consecutive frames. By doing so, the model can identify
inconsistencies in the motion or transitions between frames, another key characteristic of deepfake
videos. This hybrid CNN-LSTM approach ensures that both spatial features (from ResNet50) and
temporal dynamics (from LSTM) are taken into account, improving the system's performance on more
complex and high-quality deepfakes. The evaluation phase of the system involves testing it on a
separate dataset of real and deepfake videos that were not seen during training. The performance is
measured using standard metrics such as accuracy, precision, recall, and F1-score, which provide a
comprehensive view of the model’s strengths and weaknesses in detecting deepfakes. Additionally, a
confusion matrix is used to further analyze the model’s classification performance by showing the
number of true positives, false positives, true negatives, and false negatives. Cross-validation
techniques can also be employed to assess the model's generalization across different datasets and
deepfake generation techniques. This helps ensure that the system is not overfitted to a specific
dataset and remains effective against evolving deepfake methods. The system is designed to be
adaptable for real-time deepfake detection applications, where videos can be processed as they are
streamed. To enable this, optimization techniques such as model quantization or pruning can be
applied to reduce the computational load with little loss of accuracy. This makes the system
scalable and suitable for integration into platforms that require real-time deepfake monitoring,
such as social media platforms, video-sharing websites, or media outlets that aim to combat
misinformation. In summary, the proposed deepfake detection system is a highly efficient, accurate,
and adaptable solution that leverages the strengths of ResNet50 and transfer learning, with optional
LSTM integration to provide robust detection across a wide range of deepfake video manipulations.
The system's ability to generalize across datasets and adapt to future advancements in deepfake
technology makes it a valuable tool in the ongoing fight against synthetic media and its potential
misuse [15-16].
Conclusion
The proposed system for deepfake detection, utilizing transfer learning with ResNet50, offers a
highly effective solution for identifying manipulated videos. By leveraging the pre-trained model's
powerful feature extraction capabilities and combining it with techniques like fine-tuning and
optional LSTM integration, the system is able to detect subtle inconsistencies characteristic of
deepfakes. Through rigorous preprocessing, data augmentation, and advanced classification
techniques, the model ensures high accuracy and robustness across various datasets and evolving
deepfake generation methods. The system’s adaptability for real-time detection further enhances its
practical applicability, making it suitable for deployment in platforms requiring efficient deepfake
monitoring. Overall, this approach provides a scalable, reliable, and efficient method to combat the
growing challenge posed by deepfake videos, offering a valuable tool for maintaining media integrity
in the digital age.
References
[1]. Karras, T., Laine, S., & Aila, T. (2019). A Style-Based Generator Architecture for Generative
Adversarial Networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 43(6),
1959-1971. [Link]
[2]. Rössler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., & Nießner, M. (2019).
FaceForensics++: Learning to Detect Manipulated Facial Images. Proceedings of the IEEE/CVF
International Conference on Computer Vision (ICCV), 1-11. [Link]
[3]. Chollet, F. (2017). Xception: Deep Learning with Depthwise Separable Convolutions. Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 1251-1258. [Link]
[4]. He, K., Zhang, X., Ren, S., & Sun, J. (2016). Deep Residual Learning for Image Recognition.
Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition (CVPR), 770-778. [Link]
[5]. Nguyen, H. H., Yamagishi, J., & Echizen, I. (2019). Use of a Capsule Network to Detect Fake
Images and Videos. arXiv preprint arXiv:1910.12467. [Link]
[6]. Afchar, D., Nozick, V., Yamagishi, J., & Echizen, I. (2018). MesoNet: A Compact Facial Video
Forgery Detection Network. IEEE International Workshop on Information Forensics and Security (WIFS),
1-7. [Link]
[7]. Korshunov, P., & Marcel, S. (2019). DeepFakes: A New Threat to Face Recognition? Assessment and
Detection. arXiv preprint arXiv:1812.08685. [Link]
[8]. Wang, S., Wang, H., Zhang, L., & Zhang, J. (2020). Exploiting Temporal and Spatial Constraints
in Deepfake Video Detection. ACM Transactions on Multimedia Computing, Communications, and
Applications (TOMM), 16(1), 1-19. [Link]
[9]. Rössler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., & Nießner, M. (2020).
FaceForensics++: Benchmarking Deepfake Detection Tools. IEEE Transactions on Biometrics, Behavior,
and Identity Science (TBIOM), 3(4), 487-497. [Link]
[10]. Tariq, S., Lee, J., Shahbaz, M. S., Huh, J. H., & Park, K. R. (2021). DeepFake Video
Detection: A Survey. IEEE Access, 9, 154583-154614. [Link]
[11]. Dang, H., Liu, F., Stehouwer, J., Liu, X., & Jain, A. K. (2020). On the Detection of Digital
Face Manipulation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
(CVPR), 5781-5790. [Link]
[12]. Li, Y., Chang, M.-C., & Lyu, S. (2018). In Ictu Oculi: Exposing AI Created Fake Videos by
Detecting Eye Blinking. Proceedings of the IEEE/CVF International Conference on Computer Vision
(ICCV) Workshops, 1-9. [Link]
[13]. Zhang, H., Goodfellow, I., Metaxas, D., & Odena, A. (2019). Self-Attention Generative
Adversarial Networks. arXiv preprint arXiv:1805.08318. [Link]
[14]. Dolhansky, B., Howes, R., Pflaum, B., Baram, N., & Ferrer, C. C. (2020). The Deepfake
Detection Challenge Dataset. arXiv preprint arXiv:2006.07397. [Link]
[15]. Kietzmann, J., Lee, L. W., McCarthy, I. P., & Kietzmann, T. C. (2020). Deepfakes: Trick or
Treat? Business Horizons, 63(2), 135-146. [Link]
66. Kingma, D. P., & Ba, J. (2015). Adam: A Method for Stochastic Optimization. arXiv preprint
arXiv:1412.6980. [Link]
[16]. Goodfellow, I., Pouget-Abadie, J., Mirza, M., Xu, B., Warde-Farley, D., Ozair, S., ... &
Bengio, Y. (2014). Generative Adversarial Networks. arXiv preprint arXiv:1406.2661. [Link]
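As an illustration of the video-level decision step described in Sections 2 and 4, the following
minimal Python sketch aggregates frame-level scores into one label using either majority voting or a
probability-based rule (averaging). It is a sketch under stated assumptions, not the authors'
implementation: the function names, the 0.5 threshold, the convention that a score is the predicted
probability of "fake", and the tie-breaking policy are all hypothetical choices made for this
example.

```python
from collections import Counter
from statistics import mean


def majority_vote(frame_probs, threshold=0.5):
    """Return 1 (fake) or 0 (real) by voting over thresholded frame scores."""
    if not frame_probs:
        raise ValueError("at least one frame-level score is required")
    # Turn each fake-probability into a hard per-frame label, then count labels.
    votes = Counter(1 if p >= threshold else 0 for p in frame_probs)
    # Assumed tie-break policy: an even split is reported as fake.
    return 1 if votes[1] >= votes[0] else 0


def mean_probability(frame_probs, threshold=0.5):
    """Return 1 (fake) or 0 (real) by thresholding the average frame score."""
    if not frame_probs:
        raise ValueError("at least one frame-level score is required")
    return 1 if mean(frame_probs) >= threshold else 0
```

For the hypothetical scores [0.95, 0.95, 0.4, 0.4, 0.4], majority_vote returns 0 because three of
the five frames fall below the threshold, while mean_probability returns 1 because the average score
is 0.62. The two rules can therefore disagree when a minority of frames is flagged with high
confidence.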
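The evaluation metrics named in Sections 2 and 4 (accuracy, precision, recall, and F1-score) all
follow from the four confusion-matrix counts. The sketch below shows one way to compute them in
plain Python. It assumes binary labels with 1 for fake and 0 for real, treats "fake" as the positive
class, and returns 0.0 for any metric whose denominator is zero. The function names and these
conventions are assumptions for illustration only.

```python
def confusion_counts(y_true, y_pred):
    """Count (tp, fp, tn, fn) for binary labels, where 1 = fake and 0 = real."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1  # fake video correctly flagged
        elif truth == 0 and pred == 1:
            fp += 1  # real video wrongly flagged
        elif truth == 0 and pred == 0:
            tn += 1  # real video correctly passed
        else:
            fn += 1  # fake video missed
    return tp, fp, tn, fn


def classification_metrics(tp, fp, tn, fn):
    """Derive accuracy, precision, recall and F1 from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

With the hypothetical labels y_true = [1, 1, 1, 0, 0, 0, 1, 0] and y_pred = [1, 0, 1, 0, 1, 0, 1, 0],
the counts are 3 true positives, 1 false positive, 3 true negatives, and 1 false negative, so all
four metrics evaluate to 0.75.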