International Research Journal on Advanced Engineering Hub (IRJAEH)
e ISSN: 2584-2137
Vol. 02 Issue: 12 December 2024
Page No: 2682- 2686
https://irjaeh.com
https://doi.org/10.47392/IRJAEH.2024.0370
Fake Feedback Detection Using Machine Learning
Ms. M. Amshavalli¹, Aditya Baibhav², Deepak Kumar³, MD Arif Reza⁴
¹Assistant Professor, Department of CSE, Erode Sengunthar Engineering College, Perundurai, Erode, Tamil Nadu, India.
²,³,⁴Student, Department of Computer Science and Engineering, Erode Sengunthar Engineering College, Perundurai, Erode, Tamil Nadu, India.
Emails: [email protected], thisisadityabaibhav@gmail.com, [email protected], [email protected]
Abstract
The growing reliance on online media for information and interaction has increased the prevalence of false feedback, which is a major challenge for businesses, consumers and reputation management. This project presents an approach to detecting fake feedback using machine learning techniques. We propose a multi-modal model that uses natural language processing (NLP) and supervised learning algorithms to analyze textual feedback data. Our methodology includes sentiment analysis and the extraction of linguistic features and behavioral patterns to distinguish between genuine and fake feedback. We evaluate our model using a comprehensive data set that includes real and synthetic feedback samples. Our analysis suggests that the approach, which we plan to implement in future work, can achieve high accuracy and robustness and can significantly outperform traditional detection methods. In addition, we discuss the implications of our findings for increasing trust in online reviews and the potential for feedback monitoring. This initiative will contribute to the growing digital ecosystem and provide a scalable solution for stakeholders seeking to reduce the impact of fake feedback in various domains.
Keywords: Reputation Management, Sentiment Analysis, Scalable Solution, Robustness, Multi-modal Model
1. Introduction
The detection of fraudulent feedback through machine learning (ML) has become an essential focus in the contemporary digital environment, where online reviews and feedback significantly impact businesses, consumers, and reputation management systems. The swift growth of e-commerce, social media, and various digital platforms has facilitated the sharing of opinions and reviews by both legitimate customers and malicious actors, often influencing public perception and the success of businesses. Nevertheless, the rising occurrence of false or misleading feedback, including counterfeit reviews, deceptive ratings, and altered testimonials, poses considerable challenges in differentiating between genuine responses and misleading content. Conventional techniques for identifying fraudulent feedback frequently prove inadequate because of the continuously changing strategies employed by individuals who create deceptive responses, which increasingly resemble authentic user interactions. To combat this escalating issue, machine learning offers a robust, data-centric solution by utilizing sophisticated algorithms and models capable of automatically scrutinizing extensive amounts of feedback, thereby uncovering concealed patterns and nuanced inconsistencies. Methods including natural language processing (NLP), sentiment analysis, and behavioral pattern recognition have been incorporated into machine learning models to evaluate the credibility of feedback by analyzing linguistic characteristics, sentiment patterns, and user interaction behaviors. Additionally, supervised learning techniques allow for the training of systems on labeled datasets that include both genuine and artificial feedback, thereby improving their capacity to accurately classify feedback and reduce the occurrence of false positives. The implementation of multi-modal models that integrate diverse machine learning techniques enhances the reliability and accuracy of detection systems. This advancement
provides businesses and platforms with scalable, automated solutions that contribute to user trust, safeguard brand reputation, and maintain the integrity of online feedback ecosystems. As the dependence on digital feedback increases across various sectors, machine learning will be crucial in addressing the challenges posed by fraudulent feedback and promoting transparency within online communities.
2. Literature Survey
Yash Khivasara, Yash Khare, and Tejas Bhadane, of the School of Computer Engineering and Technology at MIT World Peace University in Pune, India, present "Fake News Detection System Using Web-Extension" [10]. The methodology outlined in this paper introduces a web-based tool designed to detect fake news through the application of several machine learning models, notably Long Short-Term Memory (LSTM) networks in conjunction with GloVe word embeddings and GPT-2. During the data preprocessing phase, the titles and bodies of news articles are cleaned via natural language processing (NLP) methods, which involve the removal of URLs, special characters, and stop words. The text is then vectorized using GloVe embeddings, which transform it into numerical vectors. The LSTM model, a form of recurrent neural network (RNN), is used to capture long-term dependencies within the text, enabling the differentiation between genuine and false news by examining semantic patterns. When paired with GloVe embeddings, the LSTM gains a better representation of word context. Furthermore, a GPT-2 Output Detector is incorporated to recognize AI-generated content, thereby aiding in the identification of fabricated news articles. The system is implemented as a web extension, which empowers users to report suspicious content. This content is subsequently analyzed by both models (LSTM + GloVe and GPT-2), yielding a probability score that reflects the authenticity of the news. Additionally, the extension flags URLs associated with fake news for future reference. Trained on a data set comprising both real and fake news articles, the system achieved 98.6% accuracy with the LSTM + GloVe model, surpassing traditional models such as Naïve Bayes, which recorded an accuracy of 88.91%.
Ahmed M. Elmogy, Usman Tariq, Atef Ibrahim, and Ammar Mohammed present "Fake Reviews Detection using Supervised Machine Learning" [1]. The proposed approach for identifying fraudulent reviews employs a blend of supervised machine learning methods and feature engineering, concentrating on the extraction of both textual and behavioral attributes from reviews and their authors to improve detection precision. During the data pre-processing stage, the Yelp data set is cleaned and prepared through techniques such as tokenization, removal of stop words, and lemmatization to ensure text uniformity. Feature extraction encompasses the collection of textual attributes, including sentiment analysis, TF-IDF, and cosine similarity, as well as behavioral characteristics such as the count of capital letters, punctuation frequency, presence of emojis, and patterns in the timing of review submissions. A variety of classifiers, such as K-Nearest Neighbors (KNN), Naive Bayes, Support Vector Machines (SVM), Logistic Regression, and Random Forest, are utilized. The models are assessed using bi-gram and tri-gram language models in conjunction with TF-IDF for textual feature extraction. The system's effectiveness is evaluated based on accuracy, precision, recall, and F1-score, utilizing a dataset of over 5,000 reviews from Yelp. The findings indicate that KNN (K=7) surpasses the other classifiers, achieving an F1-score of 82.40%, which increases to 86.20% with the inclusion of reviewer behavioral features, underscoring the importance of behavioral data in the detection of fake reviews.
M.N. Istiaq Ahsan, Tamzid Nahian, Abdullah All Kafi, Md. Ismail Hossain, and Faisal Muhammad Shah, from the Department of Computer Science & Engineering, Ahsanullah University of Science & Technology, Dhaka, Bangladesh, present "An Ensemble Approach to Detect Review Spam Using Hybrid Machine Learning Technique" [6]. This methodology integrates supervised learning with active learning techniques to identify fraudulent reviews. In the initial phase, the system detects and eliminates duplicate reviews by employing Kullback-Leibler divergence (KLD) and Jensen-Shannon divergence (JSD) to assess text similarity. The second phase is dedicated to creating a hybrid dataset that
combines authentic and fabricated reviews, utilizing active learning to label ambiguous instances for enhanced training efficacy. During the third phase, this dataset is utilized to train various classifiers, including Naive Bayes (NB), Support Vector Machine (SVM), Decision Tree (DT), and Maximum Entropy (Maxent), incorporating features such as unigram, bigram, and trigram word sets to improve accuracy. The model, tested on a mixture of genuine and synthetic reviews, performed well, with Naive Bayes utilizing bigram features achieving 95% precision and 88% accuracy, thereby confirming its capability in detecting review spam.
Elshrif Elmurngi and Abdelouahed Gherbi, affiliated with the Department of Software and IT Engineering at École de Technologie Supérieure in Montreal, Canada, present "An Empirical Study on Detecting Fake Reviews Using Machine Learning Techniques" [3]. The research examines the identification of fraudulent reviews using Sentiment Analysis (SA) techniques, employing a data set comprising 2,000 movie reviews, evenly split between 1,000 positive and 1,000 negative reviews for the purpose of classification. The approach begins with the gathering of movie reviews, followed by a data pre-processing phase that includes the removal of stop words (such as "a," "the," and "of") to discard terms that do not significantly aid the classification task. This process is executed using the StringToWordVector filter available in the Weka software. Following this, four feature selection techniques are implemented to improve classification accuracy by eliminating irrelevant features. For the sentiment classification task, four machine learning algorithms are utilized: Naïve Bayes (NB), a probabilistic classifier grounded in Bayes' theorem; Support Vector Machine (SVM), a supervised learning model that discerns patterns for classification; K-Nearest Neighbor (KNN), a non-parametric approach reliant on distance metrics; and Decision Tree (DT-J48), which employs tree structures for classification purposes. The identification of fake reviews is supported by analyzing the outcomes through a confusion matrix, which evaluates true positives, false positives, true negatives, and false negatives. Ultimately, a comparative analysis of the results based on accuracy and precision indicates that SVM surpasses the other algorithms, achieving the highest accuracy in both scenarios, with and without stop word removal.
Gowri Ramachandran, Daniel Nemeth, David Neville, Dimitrii Zhelezov, Ahmet Yalçin, Oliver Fohrmann, and Bhaskar Krishnamachari, associated with the Viterbi School of Engineering at the University of Southern California and the Helix Foundation located in Berlin, propose "WhistleBlower: Towards A Decentralized and Open Platform for Spotting Fake News" [8]. The proposed methodology presents WhistleBlower, a decentralized platform aimed at identifying fake news through the utilization of blockchain and Distributed Ledger Technology (DLT). It incorporates Artificial Intelligence (AI) and Machine Learning (ML) algorithms to enhance the accuracy of fake news detection. At the core of this platform are the detection algorithms, which assess the credibility of news articles by analyzing both their content and sources. Furthermore, the platform includes a verifiable computation framework that allows community members to run these detection algorithms on their own nodes, thereby ensuring the integrity of the results through honest computation validation. The architecture also features a Token-Curated Registry (TCR) that permits community members to contest and improve the detection algorithms. This TCR maintains a curated list of algorithms, enabling users to raise challenges if they believe an algorithm's assessment is flawed. Community engagement is encouraged, as participants earn tokens for their contributions to the curation process. The system functions in a decentralized fashion, distributing computations across public nodes, which allows the community to evaluate the effectiveness of the algorithms. This design not only enhances transparency but also reduces the risks linked to centralized governance, thereby fostering a collaborative and reliable environment for the detection of fake news.
Claudio Marche, Ilaria Cabiddu, Christian Giovanni Castangia, Luigi Serreli, and Michele Nitti are associated with the Department of Electrical and Electronic Engineering (DIEE) at the University of
Cagliari, as well as the National Telecommunication Inter University Consortium located in Cagliari, Italy. In "Implementation of a Multi-Approach Fake News Detector and of a Trust Management Model for News Sources" [2], they present a detailed two-part system aimed at identifying fake news and assessing the reliability of news sources [7]. The initial segment, known as the Fake News Detector, scrutinizes the textual content of news articles through various machine learning methodologies. It categorizes news as either authentic or fraudulent by examining several elements: writing style (using linguistic characteristics such as text length, informality, complexity, and variety), fact-checking by comparing claims in the news against verified statements from a pre-trained network like FEVER, and sentiment analysis to evaluate the alignment between the headline and the article's content while ensuring objectivity [8]. This detector has been trained and validated using a dataset from Kaggle that includes over 20,000 news articles. The second segment, the Trust Management Model, assesses the credibility of news sources based on multiple factors: expertise, which gauges the quantity and quality of news a source produces on particular subjects; relevance, which examines the frequency of requests for the source's news; and goodwill and coherence, which evaluates the historical dependability and consistency of the source over time, taking into account evolving behaviors such as misleading or deceptive news tactics [9]. This model also employs a prebunking strategy, designed to pinpoint unreliable sources before misinformation can proliferate. Both components utilize machine learning techniques and are capable of simulating real-time evaluations of news. Furthermore, the document investigates the potential use of blockchain technology for the secure storage and management of news assessments, highlighting its benefits in comparison to conventional databases [10].
3. Discussion
The initiative named "Fake Feedback Detection using Machine Learning" tackles a significant issue in the contemporary digital environment: the widespread occurrence of fraudulent reviews and misleading feedback across numerous platforms. This problem affects consumers, businesses, and online marketplaces alike, as it erodes trust and misrepresents the actual worth of products and services. Employing machine learning (ML) methodologies to identify fake feedback presents a viable solution to alleviate this concern, thereby improving consumer protection and preserving the authenticity of online reviews. In our research, we examined various machine learning algorithms, including Naïve Bayes, Support Vector Machines (SVM), Decision Trees, and ensemble techniques like Random Forest. The findings revealed that although all models were effective in differentiating between authentic and fraudulent reviews, the Random Forest model demonstrated superior accuracy and resilience. This observation is consistent with existing research indicating that ensemble methods frequently surpass individual algorithms by combining predictions and mitigating overfitting. Additionally, the application of feature extraction methods, such as sentiment analysis and the identification of linguistic patterns, significantly improved the performance of the models. For example, reviews characterized by an abundance of emotional language or the use of unconventional phrases were more likely to be flagged as fraudulent.
4. Conclusion
The project named "Fake Feedback Detection using Machine Learning" signifies a notable progression in the ongoing efforts to combat fraudulent practices related to online reviews, which play a crucial role in influencing consumer choices within the rapidly expanding e-commerce landscape. By utilizing advanced machine learning methodologies, specifically the Random Forest algorithm, we have established a comprehensive framework that can effectively differentiate between genuine and misleading feedback. This distinction is made possible through an in-depth examination of various attributes, such as sentiment and linguistic features, which uncover subtle indicators of fraudulent activity. Although our findings indicate the model's considerable promise for automating the detection of fake feedback, several challenges persist that must be addressed to ensure its sustained efficacy. A
significant concern is the prevalence of imbalanced datasets, where authentic reviews significantly outnumber fraudulent ones, which poses a considerable risk of skewed results that could undermine the model's precision and dependability. Additionally, as deceptive strategies continue to evolve and grow more intricate, there is an urgent necessity for continuous adaptation and retraining of the model to preserve its relevance and effectiveness in practical applications. Ethical considerations are also paramount in the implementation of such systems; the potential for false positives could result in unfair repercussions for legitimate users, thereby diminishing trust in the detection system itself.
References
[1]. Ahmed M. Elmogy, Usman Tariq, Atef Ibrahim, and Ammar Mohammed. "Fake Reviews Detection using Supervised Machine Learning." International Journal of Advanced Computer Science and Applications (IJACSA), vol. 12, no. 1, 2021.
[2]. Claudio Marche, Ilaria Cabiddu, Christian Giovanni Castangia, Luigi Serreli, and Michele Nitti. "Implementation of a Multi-Approach Fake News Detector and of a Trust Management Model for News Sources." IEEE Transactions on Services Computing, vol. 16, no. 6, pp. 4288-4300, Nov./Dec. 2023. DOI: 10.1109/TSC.2023.3311629.
[3]. Elshrif Elmurngi and Abdelouahed Gherbi. "An Empirical Study on Detecting Fake Reviews Using Machine Learning Techniques." Proceedings of the Seventh International Conference on Innovative Computing Technology (INTECH 2017), IEEE, pp. 107–114, 2017.
[4]. Faiza Masood, Ghana Ammad, Ahmad Almogren, Assad Abbas, Hasan Ali Khattak, Ikram Ud Din, Mohsen Guizani, and Mansour Zuair. "Spammer Detection and Fake User Identification on Social Networks." IEEE Access, vol. 7, pp. 68140–68150, June 2019. DOI: 10.1109/ACCESS.2019.2918196.
[5]. Mohammed Ennaouri and Ahmed Zellou. "Machine Learning Approaches for Fake Reviews Detection: A Systematic Literature Review." Journal of Web Engineering, vol. 22, no. 5, pp. 821–848, Dec. 2023. DOI: 10.13052/jwe1540-9589.2254.
[6]. M.N. Istiaq Ahsan, Tamzid Nahian, Abdullah All Kafi, Md. Ismail Hossain, and Faisal Muhammad Shah. "An Ensemble Approach to Detect Review Spam Using Hybrid Machine Learning Technique." 19th International Conference on Computer and Information Technology, North South University, Dhaka, Bangladesh, pp. 388–394, Dec. 2016. DOI: 10.1109/CIT.2016.31.
[7]. Nidhi A. Patel and Rakesh Patel. "A Survey on Fake Review Detection using Machine Learning Techniques." 2018 4th International Conference on Computing Communication and Automation (ICCCA), 2018.
[8]. Gowri Ramachandran, Daniel Nemeth, David Neville, Dimitrii Zhelezov, Ahmet Yalçin, Oliver Fohrmann, and Bhaskar Krishnamachari. "WhistleBlower: Towards A Decentralized and Open Platform for Spotting Fake News." 2020 IEEE International Conference on Blockchain, pp. 154-161, 2020. DOI: 10.1109/Blockchain50366.2020.00026.
[9]. Rami Mohawesh, Shuxiang Xu, Yaser Jararweh, and Sumbal Maqsood. "Fake Reviews Detection: A Survey." IEEE, 6 May 2021.
[10]. Yash Khivasara, Yash Khare, and Tejas Bhadane. "Fake News Detection System Using Web-Extension." 2020 IEEE Pune Section International Conference (PuneCon), Vishwakarma Institute of Technology, Pune, India, pp. 119–123, Dec. 2020. DOI: 10.1109/PuneCon50868.2020.9362384.
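Supplementary sketch. The studies surveyed above report accuracy, precision, recall, and F1-score, all of which can be read off a confusion matrix of true positives, false positives, true negatives, and false negatives. The following is a minimal illustration of that calculation, not the implementation used in this paper or in any cited work. The function names, the choice of "fake" as the positive class, and the example labels are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    """Counts for a binary classifier; "fake" is assumed to be the positive class."""
    tp: int = 0  # fake reviews correctly flagged as fake
    fp: int = 0  # genuine reviews wrongly flagged as fake
    tn: int = 0  # genuine reviews correctly left alone
    fn: int = 0  # fake reviews that were missed


def build_confusion_matrix(y_true, y_pred, positive="fake"):
    """Tally the four outcome counts from parallel label sequences."""
    cm = ConfusionMatrix()
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            cm.tp += 1
        elif pred == positive:
            cm.fp += 1
        elif truth == positive:
            cm.fn += 1
        else:
            cm.tn += 1
    return cm


def safe_div(numerator, denominator):
    """Return 0.0 instead of raising when a denominator is zero."""
    return numerator / denominator if denominator else 0.0


def summarize(cm):
    """Derive accuracy, precision, recall, and F1-score from the counts."""
    precision = safe_div(cm.tp, cm.tp + cm.fp)
    recall = safe_div(cm.tp, cm.tp + cm.fn)
    return {
        "accuracy": safe_div(cm.tp + cm.tn, cm.tp + cm.fp + cm.tn + cm.fn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical labels, invented purely for illustration (not data from any study).
y_true = ["fake", "fake", "fake", "genuine", "genuine", "genuine", "genuine", "genuine"]
y_pred = ["fake", "fake", "genuine", "fake", "genuine", "genuine", "genuine", "genuine"]

cm = build_confusion_matrix(y_true, y_pred)
metrics = summarize(cm)
print(cm)
print(metrics)
```

On an imbalanced data set of the kind noted in the conclusion, accuracy can stay high even when many fake reviews are missed, so precision and recall for the fake class are usually worth reporting alongside it.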