Fake Speech Detection
Arnav Kshetri, Animesh Chaturvedi, Sangam, Deepak
under the guidance of
Prof. Mangesh Hajare
Army Institute of Technology
Savitribai Phule Pune University
November 12, 2024
Contents
1 Introduction
2 Motivation
3 Literature Review
4 Gaps in Present Study
5 Aim and Objectives
6 Proposed Methodologies
7 Implementation & Results
8 Conclusion
Fake Speech Detection
Fake speech detection, also known as spoofing detection or synthetic speech detection, is a technology designed to identify speech that has been artificially generated or manipulated.
We use NLP and machine learning techniques to process speech data and verify the authenticity of speakers.
The goal of fake speech detection systems is to distinguish genuine human speech from artificially generated speech created using methods such as text-to-speech (TTS), voice conversion (VC), or deep learning-based models such as deepfakes.
We employ a CNN variant built from Conv2D, pooling, dropout, and flattening layers.
The project aims to prove useful not just in detecting basic synthetic speech, but also in staying ahead of increasingly sophisticated methods capable of producing realistic-sounding fake audio.
Motivation
The ability to mimic human speech convincingly poses serious risks in areas such as biometric authentication, legal proceedings, and fraud detection, where voice recordings are increasingly used as critical evidence. Fake speech detection systems serve as a security mechanism against this threat.
Traditional methods such as Gaussian Mixture Models (GMM) and Support Vector Machines (SVM) are becoming less effective against advanced spoofing techniques. By leveraging CNNs, it is possible to detect subtle differences between real and fake speech, enhancing the accuracy and reliability of fake speech detection in modern applications while remaining cost-effective and computationally inexpensive.
Literature Review
Learning Efficient Representations for Fake Speech Detection
Deep4SNet: deep learning for fake speech classification
One-Class Fake Speech Detection Based on Improved Support Vector
Data Description
Deepfake Audio Detection via MFCC Features Using Machine
Learning
A Self-Distillation Method For Fake Speech Detection
Learning Efficient Representations for Fake Speech
Detection
Authors: Nishant Subramani, Delip Rao
Info: This paper introduces a method for learning efficient
representations specifically for fake speech detection, utilizing
advanced neural network architectures to enhance the accuracy and
speed of identification.
Scope: Future work may focus on refining representation techniques
and integrating them with real-time detection systems to improve
deployment in practical applications.
Deep4SNet: deep learning for fake speech classification
Authors: Dora M. Ballesteros, Yohanna Rodriguez-Ortega, Diego
Renza, Gonzalo Arce
Info: The paper presents Deep4SNet, a CNN-based model for
detecting fake speech generated by Imitation and Deep Voice
methods. It uses image augmentation and dropout to improve
accuracy. The model achieved high precision (P = 0.997) and recall
(R = 0.997) for Imitation-based fakes, with an overall accuracy of
98.5%, proving effective in distinguishing fake from real speech.
Scope: Future work could explore adapting the model for more
advanced voice synthesis methods, handling more diverse datasets,
and optimizing the CNN architecture for better efficiency.
One-Class Fake Speech Detection Based on Improved
Support Vector Data Description
Authors: Jinghong Zhang, Xiaowei Yi, Xianfeng Zhao
Info: The paper presents a novel approach to fake speech detection
using a one-class classification framework based on Improved Support
Vector Data Description (ISVDD). The proposed method addresses
the challenge of detecting counterfeit audio by utilizing a single-class
learning paradigm. The authors demonstrate that the ISVDD model
effectively captures the underlying characteristics of genuine speech,
enabling it to distinguish between real and fake audio samples.
Scope: Future research could focus on enhancing the model’s
robustness by incorporating multimodal data sources, such as visual
and contextual information. Additionally, exploring the integration of
deep learning techniques may further improve detection accuracy and
generalization in diverse environments.
Deepfake Audio Detection via MFCC Features Using
Machine Learning
Authors: Ameer Hamza, Abdul Rehman Javed, Farkhund Iqbal,
Natalia Kryvinska, Ahmad S. Almadhor, Zunera Jalil, Rouba Borghol
Info: This paper investigates the effectiveness of Mel-frequency
cepstral coefficients (MFCC) as features for detecting deepfake audio.
The authors employ various machine learning algorithms, including
Support Vector Machines (SVM) and Random Forest, to classify
audio samples as either real or fake. The study demonstrates that
MFCC features significantly enhance detection performance, achieving
high accuracy rates in distinguishing authentic audio from
manipulated content.
Scope: Future work could explore the combination of MFCC features
with deep learning approaches to improve detection capabilities.
Additionally, investigating the impact of different audio qualities and
environments on detection accuracy would provide valuable insights
for real-world applications.
A Self-Distillation Method For Fake Speech Detection
Authors: Jun Xue, Cunhang Fan, Jiangyan Yi, Chenglong Wang,
Zhengqi Wen, Dan Zhang, Zhao Lv
Info: This paper presents a self-distillation method for fake speech
detection, where a teacher model enhances a student model’s learning
through knowledge transfer, improving robustness against counterfeit
audio.
Scope: Future work may optimize self-distillation techniques and
incorporate adversarial training to further enhance detection
capabilities.
Gaps in Present Study
Limited Data Diversity
Generalization Issues
Real-Time Processing
Lack of Interpretability
Integration with Multi-Modal Systems
Adversarial Robustness
Ethical Considerations
Limited Data Diversity
Shortcoming: Most studies rely on specific datasets that may not
represent the full spectrum of real-world scenarios, including various
accents, dialects, and background noises. This can result in models
that excel in controlled environments but struggle with practical,
diverse audio inputs.
Solution: To address this, researchers should create and utilize more
comprehensive datasets that encompass a wider range of linguistic
and environmental variations. Collaborating with linguists and audio
engineers can help develop datasets that reflect the complexities of
natural speech, improving model robustness.
Generalization Issues
Shortcoming: Many detection methods are trained on specific types
of deepfake techniques, leading to models that may not perform well
against emerging or novel manipulation methods. This restricts the
effectiveness of these systems in real-world applications.
Solution: Developing more generalized models capable of detecting
various audio forgeries is essential. Researchers could employ transfer
learning techniques, where models trained on one dataset are
fine-tuned on another, to enhance their adaptability to new types of
deepfakes.
Real-Time Processing
Shortcoming: While some models demonstrate high accuracy, they
often require substantial computational resources, making real-time
detection challenging. This is particularly important for applications
like live broadcasting and security.
Solution: Research should focus on developing lightweight algorithms
and optimized architectures that can provide fast inference times
without sacrificing accuracy. Techniques such as model pruning and
quantization can help reduce the computational load.
Lack of Interpretability
Shortcoming: Many advanced machine learning models function as
black boxes, offering limited insight into their decision-making
processes. This lack of transparency can hinder trust and
understanding among users and stakeholders.
Solution: Implementing interpretable machine learning techniques
can help elucidate model behavior. Approaches like SHAP (SHapley
Additive exPlanations) or LIME (Local Interpretable Model-agnostic
Explanations) can provide insights into feature importance, aiding in
the interpretation of detection results.
Integration with Multi-Modal Systems
Shortcoming: Current approaches often focus solely on audio
detection, neglecting the potential benefits of analyzing multiple
modalities. Combining audio analysis with video or textual content
can enhance detection accuracy and contextual understanding.
Solution: Future research should explore multi-modal detection
systems that integrate audio, video, and text inputs. Developing
models that can analyze and correlate information from these diverse
sources will likely improve the overall effectiveness of fake speech
detection.
Adversarial Robustness
Shortcoming: Many detection models are susceptible to adversarial
attacks, where subtle alterations to the input data can lead to
misclassification. This vulnerability poses a significant risk in
applications where security is paramount.
Solution: Researchers should investigate techniques to enhance
model robustness against adversarial attacks, such as adversarial
training or using ensemble methods. Regularly updating models to
recognize new attack vectors can also improve resilience.
Ethical Considerations
Shortcoming: The ethical implications of fake speech detection
technologies, including privacy concerns and potential misuse, are
often overlooked. These issues are critical for responsible deployment
in society.
Solution: It is essential to engage in discussions regarding the ethical
dimensions of fake speech detection technologies. Establishing
guidelines and frameworks for responsible usage, as well as conducting
impact assessments, can help mitigate potential risks associated with
these technologies.
Aim
The primary aim is to conduct a comprehensive analysis of fake speech
detection technologies by evaluating various methodologies, including
machine learning algorithms, deep learning architectures, and feature
extraction techniques such as Mel-frequency cepstral coefficients (MFCC)
and spectrogram analysis. Additionally, it aims to propose actionable
recommendations for enhancing the accuracy, efficiency, and ethical
deployment of these systems in real-world applications such as digital
forensics, media verification, and cybersecurity.
Objective 1
Identification of utterance: The primary objective of the Fake Speech Detection project is to distinguish fake speech utterances from bonafide (authentic) ones. The project should prove viable in detecting logical attacks such as TTS and VC; physical attacks such as replay attacks are outside the scope of this project.
Objective 2
Extension of the ASVSpoof 2019 dataset: We intend to increase the number of training examples five- to ten-fold by applying audio signal processing and speech augmentation techniques to the existing dataset, such as time shifting, time stretching, pitch scaling, and noise addition. This will make the model more robust and improve its generalization capabilities.
Objective 3
Performance assessment: Once the model is built, we will assess its performance against established benchmarks in fake speech detection, focusing on metrics such as precision, recall, and F1-score. The model will be evaluated separately on the original dataset, the augmented dataset, and both combined, and compared against studies involving similar models and datasets.
Dataset Overview
ASV stands for Automatic Speaker Verification.
Created for the ASVSpoof 2019 challenge.
Has two tracks, Logical Access (LA) and Physical Access (PA); only the LA portion is used. The LA portion contains fake training examples from TTS and VC systems.
The dataset has binary labels marking bonafide and fake (spoofed) training examples. Bonafide training examples are actual voice recordings of speakers from the VCTK corpus, a multi-speaker corpus of real human voices. Fake training examples are synthetic or converted speech generated by various state-of-the-art TTS and VC systems.
Over 20 distinct speakers, with a ratio of roughly 40% male to 60% female.
Audio sample lengths: minimum = 650 ms, median = 3300 ms, maximum = 12,000 ms.
Dataset Overview
All audio files in this dataset are mono waveforms sampled at 16 kHz.
Training Set: Contains both genuine and spoofed audio samples for system development and training. About 2,580 genuine and 22,800 spoofed samples.
Development Set: Used for tuning and validating systems during development. Performance on this set typically indicates how well a system will perform on unseen data. About 2,548 genuine and 22,296 spoofed samples.
Evaluation Set: Includes unseen spoofing attacks to measure system performance during the challenge evaluation. About 7,355 genuine and 63,882 spoofed samples.
Class Diagram
Object Diagram
Sequence Diagram
Natural Audio
The human ear is receptive to frequencies in the range of 20 Hz to 20 kHz. Humans are far better at perceiving differences between lower frequencies than between higher ones, because we perceive frequency logarithmically, not linearly. Since frequency is a linear quantity, using it as-is would hinder our application: we require a scale that caters to the human ear, i.e., one that provides logarithmic scaling. This is why we use the mel scale.
Mel Scale
Logarithmic scale: equal distances on the scale correspond to equal perceptual distances.
Reference point: 1000 Hz = 1000 mel.
Conversion formula: Mel(f) = 2595 · log10(1 + f/700), with f in Hz.
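As a quick check, the conversion and its inverse can be computed directly (a minimal sketch; the function names are our own):

```python
import numpy as np

def hz_to_mel(f):
    """Convert frequency in Hz to mel: m = 2595 * log10(1 + f/700)."""
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

print(hz_to_mel(1000.0))  # ~1000 mel: the scale's reference point
print(hz_to_mel(4000.0))  # ~2146 mel: high frequencies are compressed
```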
Spectrogram Generation
Mel-Spectrogram vs MFCC
MFCCs are derived from mel-spectrograms: to get MFCCs, compute the Discrete Cosine Transform (DCT) of the log mel-spectrogram.
Highly correlated features carry redundant information. MFCCs are more decorrelated, which can be beneficial in linear models like a Gaussian Mixture Model (GMM). However, with lots of data and a strong classifier such as a CNN, mel-spectrograms often perform better.
MFCCs are a compressed representation, often using only the first 12-13 coefficients instead of the 32-64 mel bands of a mel-spectrogram.
Mel-spectrograms are considerably easier to interpret when plotted, as they are a time-frequency representation that maps well to the observed sounds. MFCCs are trickier to understand, as they require a good grasp of the cepstrum as well as the spectrum.
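A minimal sketch of both representations using librosa (the file name and parameter values are illustrative, not our final configuration):

```python
import librosa

y, sr = librosa.load("sample.flac", sr=16000)  # illustrative file name

# Mel-spectrogram: a time-frequency representation on the mel scale
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024, hop_length=160, n_mels=64)
log_mel = librosa.power_to_db(mel)

# MFCCs: DCT of the log mel-spectrogram, keeping only the first 13 coefficients
mfcc = librosa.feature.mfcc(S=log_mel, n_mfcc=13)

print(log_mel.shape, mfcc.shape)  # (64, frames) vs (13, frames)
```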
Methodologies
1. Data Collection and Augmentation: A comprehensive dataset comprising both real and fake speech samples is gathered; we employ the ASVSpoof 2019 dataset. The dataset is then extended five- to ten-fold using data augmentation techniques such as time stretching, pitch shifting, and volume scaling, as sketched below.
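A minimal augmentation sketch with librosa and NumPy; the stretch rates, pitch steps, noise level, and shift amount are illustrative assumptions:

```python
import librosa
import numpy as np

def augment(y, sr, rng=np.random.default_rng(0)):
    """Return several augmented variants of waveform y."""
    variants = []
    # Time stretching: change speed without changing pitch
    variants.append(librosa.effects.time_stretch(y, rate=1.1))
    variants.append(librosa.effects.time_stretch(y, rate=0.9))
    # Pitch shifting: move pitch by +/- 2 semitones without changing speed
    variants.append(librosa.effects.pitch_shift(y, sr=sr, n_steps=2))
    variants.append(librosa.effects.pitch_shift(y, sr=sr, n_steps=-2))
    # Volume scaling and additive Gaussian noise
    variants.append(0.5 * y)
    variants.append(y + 0.005 * rng.standard_normal(len(y)))
    # Time shifting: circularly shift the waveform by 0.5 s
    variants.append(np.roll(y, int(0.5 * sr)))
    return variants

y, sr = librosa.load("sample.flac", sr=16000)  # illustrative file name
print(len(augment(y, sr)), "augmented copies per original sample")
```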
Methodologies
2. Data Preprocessing and Feature Extraction: The collected audio files undergo preprocessing steps, including normalization, noise reduction, and segmentation into manageable frames. This ensures consistent quality and facilitates better feature extraction. Audio samples longer than 4 seconds are truncated to 4 seconds; shorter samples are looped until they reach 4 seconds. This length normalization ensures all samples in a batch have the same length; the 4-second limit is simply the ceiling of the median of all audio lengths in the training set. For spectrogram generation, the sampling rate is 16 kHz, 1728 Fast Fourier Transform coefficients are used, and a 108 ms Hamming window is applied with a 10 ms frame shift. Log-spectrogram values are then Z-normalized to form the input representation, as sketched below.
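A minimal sketch of this pipeline; the STFT parameters follow the text (at 16 kHz, a 108 ms window is exactly 1728 samples), while the helper names and file path are our own:

```python
import librosa
import numpy as np

SR = 16000
TARGET_LEN = 4 * SR        # 4-second length normalization
N_FFT = 1728               # FFT size from the text
WIN = int(0.108 * SR)      # 108 ms Hamming window -> 1728 samples
HOP = int(0.010 * SR)      # 10 ms frame shift -> 160 samples

def fix_length(y):
    """Truncate to 4 s, or loop shorter clips until they reach 4 s."""
    if len(y) >= TARGET_LEN:
        return y[:TARGET_LEN]
    reps = int(np.ceil(TARGET_LEN / len(y)))
    return np.tile(y, reps)[:TARGET_LEN]

def log_spectrogram(y):
    """Z-normalized log-magnitude spectrogram, as described above."""
    S = librosa.stft(y, n_fft=N_FFT, win_length=WIN, hop_length=HOP, window="hamming")
    log_S = np.log(np.abs(S) + 1e-8)                        # eps avoids log(0)
    return (log_S - log_S.mean()) / (log_S.std() + 1e-8)    # Z-normalization

y, _ = librosa.load("sample.flac", sr=SR)  # illustrative file name
x = log_spectrogram(fix_length(y))
print(x.shape)  # (865, 401) for a 4 s clip
```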
Methodologies
3. Model Design: We propose a model architecture consisting of:
Input Processing Block: 2D convolution (5x5), ReLU, Batch Normalization, Max-Pooling.
Convolution Blocks (4 in total): 1x1 and 3x3 convolutions, ReLU, Batch Normalization, Max-Pooling.
Classification Block: Linear layers with ReLU and Dropout, followed by Softmax for predictions.
A sketch of this architecture follows.
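A minimal Keras sketch of this architecture; the channel counts, dense width, and input shape are illustrative assumptions, and the actual model is considerably smaller (see Results):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(input_shape=(865, 401, 1), n_classes=2):
    inputs = layers.Input(shape=input_shape)
    # Input processing block: 5x5 conv -> ReLU -> BatchNorm -> MaxPool
    x = layers.Conv2D(16, 5, padding="same", activation="relu")(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.MaxPooling2D(2)(x)
    # Four convolution blocks: 1x1 then 3x3 conv -> ReLU -> BatchNorm -> MaxPool
    for filters in (16, 32, 32, 64):
        x = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.BatchNormalization()(x)
        x = layers.MaxPooling2D(2)(x)
    # Classification block: flatten -> dense/ReLU -> dropout -> 2-way softmax
    x = layers.Flatten()(x)
    x = layers.Dense(32, activation="relu")(x)
    x = layers.Dropout(0.5)(x)
    outputs = layers.Dense(n_classes, activation="softmax")(x)
    return models.Model(inputs, outputs)

model = build_model()
model.summary()
```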
Model Architecture
Methodologies
4. Training and Evaluation: The model is trained on the preprocessed audio features using labeled data (bonafide vs. fake). Binary cross-entropy is chosen as the loss function and the Adam optimizer is employed; learning-rate scheduling may be used to enhance convergence. After training, the model's hyperparameters are tuned on the development set. The tuned model is then evaluated using performance metrics such as accuracy, precision, recall, F1-score, and ROC-AUC. A confusion matrix can also be generated to visualize the model's performance in distinguishing between real and fake speech, providing insight into misclassifications and areas for improvement. A minimal sketch follows.
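A minimal training-and-evaluation sketch, continuing the model sketch above; the data arrays, epoch count, and batch size are assumptions (with a 2-way softmax and integer labels, sparse categorical cross-entropy is equivalent to binary cross-entropy):

```python
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score

# X_train, y_train, X_dev, y_dev are assumed to be spectrogram arrays and
# 0/1 labels (1 = bonafide) produced by the preprocessing step.
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(X_train, y_train, validation_data=(X_dev, y_dev),
          epochs=20, batch_size=32)

probs = model.predict(X_dev)[:, 1]          # probability of the bonafide class
preds = (probs >= 0.5).astype(int)
print(classification_report(y_dev, preds))  # precision, recall, F1 per class
print(confusion_matrix(y_dev, preds))
print("ROC-AUC:", roc_auc_score(y_dev, probs))
```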
Methodologies
5. Deployment: Finally, we plan to develop a user-friendly web interface or API that lets users submit speech and receive an analysis result, as sketched below. Future goals involve ensuring the system is scalable and generalizes well across users.
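A minimal sketch of such an endpoint using FastAPI; the route, model path, and reuse of the earlier preprocessing helpers (fix_length, log_spectrogram) are illustrative assumptions:

```python
import io
import numpy as np
import librosa
import tensorflow as tf
from fastapi import FastAPI, UploadFile

app = FastAPI()
model = tf.keras.models.load_model("fake_speech_cnn.keras")  # illustrative path

@app.post("/detect")
async def detect(file: UploadFile):
    # Decode the uploaded audio at 16 kHz and reuse the preprocessing sketch
    y, _ = librosa.load(io.BytesIO(await file.read()), sr=16000)
    x = log_spectrogram(fix_length(y))
    probs = model.predict(x[np.newaxis, ..., np.newaxis])[0]
    return {"bonafide_probability": float(probs[1]),
            "verdict": "bonafide" if probs[1] >= 0.5 else "fake"}
```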
Component Diagram
Activity Diagram
Deployment Diagram
Implementation and Results
Our best performing model, a single-task CNN variant, achieves a macro F1 score of 97.61 on the validation set. The model can further be applied to augmented data to enhance generalization, which will prove helpful in deployment. These evaluation scores are well above the human detection level of about 85%.
Thanks to the efficient CNN architecture, the processing time from input to result is very low. These models need fewer than 50,000 parameters and have a memory footprint of around 100 KB, which is remarkably small in the field of audio signal processing.
Conclusion
This project developed a CNN-based Fake Speech Detection system
that effectively distinguishes between genuine and manipulated audio
samples. By employing data augmentation techniques, we enhanced
the model’s robustness, enabling better generalization to various
audio manipulations. As the prevalence of fake speech increases, this
research underscores the need for advanced methodologies to combat
misinformation in audio communications. Future work will focus on
refining model architectures and incorporating larger, more diverse
datasets to further enhance detection capabilities, contributing to
audio forensics and security efforts.
Acknowledgement
It gives us great pleasure to present the preliminary project report on 'Fake Speech Detection'. We would like to take this opportunity to thank our internal guide Prof. Mangesh Hajare, along with Prof. Anup Kadam. We would also like to express our gratitude to our HOD, Prof. (Dr.) Sunil Dhore.
References I
Dora M. Ballesteros, Yohanna Rodriguez-Ortega, Diego Renza, and Gonzalo Arce.
Deep4SNet: deep learning for fake speech classification.
Expert Systems with Applications, 184:115465, 2021.

Jinghong Zhang, Xiaowei Yi, and Xianfeng Zhao.
One-class fake speech detection based on improved support vector data description.
Security and Communication Networks, 2023(1):8830894, 2023.

Ameer Hamza, Abdul Rehman Javed, Farkhund Iqbal, Natalia Kryvinska, Ahmad S. Almadhor, Zunera Jalil, and Rouba Borghol.
Deepfake audio detection via MFCC features using machine learning.
IEEE Access, 10:134018–134028, 2022.
References II
Jun Xue, Cunhang Fan, Jiangyan Yi, Chenglong Wang, Zhengqi Wen, Dan Zhang, and Zhao Lv.
Learning from yourself: a self-distillation method for fake speech detection.
In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023.

Nishant Subramani and Delip Rao.
Learning efficient representations for fake speech detection.
In Proceedings of the AAAI Conference on Artificial Intelligence, volume 34, pages 5859–5866, 2020.