Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
The recent explosion of false claims in social media and on the Web in general has given rise to a lot of manual fact-checking initiatives. Unfortunately, the number of claims that need to be fact-checked is several orders of magnitude larger than what humans can handle manually. Thus, there has been a lot of research aiming at automating the process. Interestingly, previous work has largely ignored the growing number of claims about images. This is despite the fact that visual imagery is more influential than text and naturally appears alongside fake news. Here we aim at bridging this gap. In particular, we create a new dataset for this problem, and we explore a variety of features modeling the claim, the image, and the relationship between the claim and the image. The evaluation results show sizable improvements over the baseline. We release our dataset, hoping to enable further research on fact-checking claims about images.
Neural Computing and Applications, 2021
Social media are the main contributors to the spread of fake images. Fake images are manipulated images altered through software or by other means to change the information they convey. Fake images propagated over microblogging platforms create misrepresentation and fuel polarization among people, so detecting fake images shared on social platforms is critical to mitigating their spread. Since fake images are often accompanied by textual data, multi-modal frameworks combining visual and textual feature learning are commonly employed. However, the few multi-modal frameworks proposed so far depend on additional tasks to learn the correlation between modalities. In this paper, an efficient multi-modal approach is proposed that detects fake images on microblogging platforms without requiring any additional subcomponents. The proposed framework uses the convolutional neural network EfficientNetB0 for images and a sentence transformer for text analysis. The feature embeddings from the visual and textual branches are passed through dense layers and then fused to predict fake images. To validate its effectiveness, the proposed model is tested on two publicly available microblogging datasets, MediaEval (Twitter) and Weibo, where it achieves accuracies of 85.3% and 81.2%, respectively. The model is also verified on a newly created Twitter dataset containing images related to significant events in India in 2020. The experimental results show that the proposed model outperforms other state-of-the-art multi-modal frameworks.
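The fusion step described above (per-modality embeddings, separate dense layers, concatenation, binary prediction) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the weight initialization, layer sizes, and parameter names are assumptions, with 1280-d image features (EfficientNetB0's pooled output) and 384-d text embeddings (a common sentence-transformer size) used as illustrative dimensions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    """A single fully connected layer with ReLU activation."""
    return np.maximum(0.0, x @ w + b)

def late_fusion_predict(img_emb, txt_emb, params):
    """Project each modality through its own dense layer, concatenate,
    and map the fused vector to a fake/real probability."""
    h_img = dense(img_emb, params["w_img"], params["b_img"])
    h_txt = dense(txt_emb, params["w_txt"], params["b_txt"])
    fused = np.concatenate([h_img, h_txt], axis=-1)
    logit = fused @ params["w_out"] + params["b_out"]
    return 1.0 / (1.0 + np.exp(-logit))  # sigmoid -> P(fake)

# Illustrative sizes: EfficientNetB0 pools to 1280-d image features;
# many sentence transformers emit 384-d text embeddings.
params = {
    "w_img": rng.standard_normal((1280, 128)) * 0.01, "b_img": np.zeros(128),
    "w_txt": rng.standard_normal((384, 128)) * 0.01,  "b_txt": np.zeros(128),
    "w_out": rng.standard_normal(256) * 0.01,         "b_out": 0.0,
}
p_fake = late_fusion_predict(rng.standard_normal(1280),
                             rng.standard_normal(384), params)
```

In practice the dense layers and the fusion head would be trained end-to-end on labeled posts; the sketch only shows the forward pass.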
2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Figure 1. To evaluate the veracity of image-caption pairings, we leverage visual and textual evidence gathered by querying the Web. We propose a novel framework to detect the consistency of the claim-evidence pairs (text-text and image-image), in addition to the image-caption pairing. Highlighted evidence represents the model's highest attention, showing a difference in location compared to the query caption. (The illustrated query pairs an image with the caption "David Cameron addressing Jamaica's parliament"; retrieved evidence includes "Jamaican election: Labour Party wins narrow victory" and "Jamaica accuses David Cameron of misrepresenting prisoner deal"; ground truth: falsified.)
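The three consistency checks named in the caption (text-text, image-image, image-caption) can be sketched as similarity tests over embeddings of the query and of Web-retrieved evidence. This is a hedged illustration, not the paper's model: the threshold, the min-aggregation, and the assumption that caption and image share one embedding space (CLIP-style) are all choices made here for brevity.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def consistency_verdict(query_txt, query_img, ev_txts, ev_imgs, threshold=0.5):
    """Compare the query caption/image against evidence retrieved from the
    Web; low agreement on any axis is treated as a sign of falsification."""
    tt = max(cosine(query_txt, e) for e in ev_txts)  # text-text consistency
    ii = max(cosine(query_img, e) for e in ev_imgs)  # image-image consistency
    it = cosine(query_img, query_txt)                # image-caption consistency
    score = min(tt, ii, it)
    return ("pristine" if score >= threshold else "falsified"), score

rng = np.random.default_rng(1)
q_txt, q_img = rng.standard_normal(64), rng.standard_normal(64)
verdict, score = consistency_verdict(
    q_txt, q_img,
    ev_txts=[q_txt + 0.1 * rng.standard_normal(64)],  # near-duplicate evidence
    ev_imgs=[q_img + 0.1 * rng.standard_normal(64)],
)
```

The actual framework learns these consistency functions with attention rather than fixing them as cosine thresholds; the sketch only conveys the three-way structure of the decision.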
Multimedia Systems
The growth in the use of social media platforms such as Facebook and Twitter over the past decade has significantly facilitated and improved the way people communicate with each other. However, the information that is available and shared online is not always credible. These platforms provide a fertile ground for the rapid propagation of breaking news along with other misleading information. The enormous amount of fake news present online has the potential to trigger serious problems at an individual level and in society at large. Detecting whether given information is fake is a challenging problem, and the traits of social media make the task even more complicated, as they ease the generation and spread of content to the masses, leading to an enormous volume of content to analyze. The multimedia nature of fake news on online platforms has not been explored fully. This survey presents a comprehensive overview of state-of-the-art techniques for combating fake news on online media, with a prime focus on deep learning (DL) techniques and multimodality. Apart from this, various DL frameworks, pre-trained models, and transfer learning approaches are also outlined. Since only a limited number of multimodal datasets are available for this task to date, the paper highlights various data collection strategies that can be used, along with a comparative analysis of available multimodal fake news datasets. The paper also highlights and discusses various open areas and challenges in this direction.
ArXiv, 2021
This paper describes our participating system for the multi-modal fact verification (Factify) challenge at AAAI 2022. Despite recent advances in text-based verification techniques and large pre-trained multimodal models across vision and language, very limited work has been done on applying multimodal techniques to automate the fact-checking process, particularly considering the increasing prevalence of claims and fake news about images and videos on social media. In our work, the challenge is treated as a multimodal entailment task and framed as multi-class classification. Two baseline approaches are proposed and explored: an ensemble model combining two uni-modal models, and a multimodal attention network modeling the interaction between the image and text pair from the claim and the evidence document. We conduct several experiments investigating and benchmarking different SoTA pre-trained transformers and vision models in this work. Our best model is ranked first on the leaderboard, which o...
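A multimodal attention network of the kind mentioned above typically lets text features attend over image features so each claim token gathers the visual evidence most relevant to it. The sketch below shows scaled dot-product cross-attention under stated assumptions: the projection weights are randomly initialized for illustration, and the 768-d token / 512-d region sizes are hypothetical (e.g., BERT tokens and a 7x7 CNN feature grid), not the system's actual configuration.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_modal_attention(text_tokens, image_regions, d_k=64):
    """Text tokens attend over image regions: queries come from the claim
    text, keys/values from visual features."""
    rng = np.random.default_rng(2)
    w_q = rng.standard_normal((text_tokens.shape[-1], d_k)) * 0.05
    w_k = rng.standard_normal((image_regions.shape[-1], d_k)) * 0.05
    w_v = rng.standard_normal((image_regions.shape[-1], d_k)) * 0.05
    q, k, v = text_tokens @ w_q, image_regions @ w_k, image_regions @ w_v
    attn = softmax(q @ k.T / np.sqrt(d_k))  # (tokens, regions) weights
    return attn @ v, attn

rng = np.random.default_rng(3)
tokens = rng.standard_normal((8, 768))    # e.g. BERT token features
regions = rng.standard_normal((49, 512))  # e.g. 7x7 CNN feature grid
attended, attn = cross_modal_attention(tokens, regions)
```

In a trained system the projections are learned jointly with the entailment classifier, and attention usually runs in both directions (text-to-image and image-to-text).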
Journal of Intelligent Information Systems
Digital mass media has become the new paradigm of communication, revolving around online social networks. The increase in the use of online social networks (OSNs) as a primary source of information, and the growing number of online social platforms providing such news, have widened the scope for spreading fake news. People spread fake news in multimedia formats such as images, audio, and video. Visual news is prone to have a psychological impact on users and is often misleading; therefore, multimodal frameworks for detecting fake posts have gained demand in recent times. This paper proposes a framework that flags fake posts containing visual data embedded with text. The proposed framework works on data derived from the Fakeddit dataset, with over 1 million samples containing text, images, metadata, and comments gathered from a wide range of sources, and tries to exploit the distinguishing features of fake and legitimate images. The proposed framework has separate architectures to learn visual and linguistic models from each post individually. Image polarity datasets derived from Flickr are also considered for analysis, and the features extracted from the visual and text-based data help in flagging news. The proposed fusion model achieves an overall accuracy of 91.94%, precision of 93.43%, recall of 93.07%, and F1-score of 93%. The experimental results show that the proposed multimodal model with image and text achieves better results than other state-of-the-art models working on a similar dataset.
Lecture Notes in Computer Science, 2020
We present an overview of the third edition of the CheckThat! Lab at CLEF 2020. The lab featured five tasks in two different languages: English and Arabic. The first four tasks compose the full pipeline of claim verification in social media: Task 1 on check-worthiness estimation, Task 2 on retrieving previously fact-checked claims, Task 3 on evidence retrieval, and Task 4 on claim verification. The lab is completed with Task 5 on check-worthiness estimation in political debates and speeches. A total of 67 teams registered to participate in the lab (up from 47 at CLEF 2019), and 23 of them actually submitted runs (compared to 14 at CLEF 2019). Most teams used deep neural networks based on BERT, LSTMs, or CNNs, and achieved sizable improvements over the baselines on all tasks. Here we describe the setup of the tasks, the evaluation results, and a summary of the approaches used by the participants, and we discuss some lessons learned. Last but not least, we release to the research community all datasets from the lab as well as the evaluation scripts, which should enable further research in the important tasks of check-worthiness estimation and automatic claim verification.
2018 IEEE International Conference on Big Data (Big Data), 2018
With the increasing popularity of online social media (e.g., Facebook, Twitter, Reddit), the detection of misleading content on social media has become a critical undertaking. This paper focuses on an important but largely unsolved problem: detecting fauxtography (i.e., social media posts with misleading images). We found that the existing literature falls short in solving this problem. In particular, current solutions focus on detecting either fake images or misinformed text in a social media post. However, they cannot solve our problem because the detection of fauxtography depends not only on the truthfulness of the images and the text but also on the information they deliver together in the post. In this paper, we develop FauxBuster, an end-to-end supervised learning scheme that can effectively track down fauxtography by exploring the valuable clues in users' comments on a social media post. FauxBuster is content-free in that it does not rely on analyzing the actual content of the images, and hence is robust against malicious uploaders who intentionally modify the presentation and description of the images. We evaluate FauxBuster on real-world data collected from two mainstream social media platforms, Reddit and Twitter. Results show that our scheme is both effective and efficient in addressing the fauxtography problem.
Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2020
Large-scale dissemination of disinformation online intended to mislead or deceive the general population is a major societal problem. Rapid progress in image, video, and natural language generative models has only exacerbated this situation and intensified our need for an effective defense mechanism. While existing approaches have been proposed to defend against neural fake news, they are generally constrained to the very limited setting where articles contain only text and metadata such as the title and authors. In this paper, we introduce the more realistic and challenging task of defending against machine-generated news that also includes images and captions. To identify the possible weaknesses that adversaries can exploit, we create the NeuralNews dataset, composed of 4 different types of generated articles, and conduct a series of human user-study experiments based on this dataset. In addition to the valuable insights gleaned from our user-study experiments, we provide a relatively effective approach based on detecting visual-semantic inconsistencies, which will serve as an effective first line of defense and a useful reference for future work in defending against machine-generated disinformation.
arXiv (Cornell University), 2023
Combating disinformation is one of the burning societal crises: about 67% of the American population believes that disinformation produces a lot of uncertainty, and 10% of them knowingly propagate disinformation. Evidence shows that disinformation can manipulate democratic processes and public opinion, causing disruption in the share market, panic and anxiety in society, and even death during crises. Therefore, disinformation should be identified promptly and, if possible, mitigated. With approximately 3.2 billion images and 720,000 hours of video shared online daily on social media platforms, scalable detection of multimodal disinformation requires efficient fact verification. Despite progress in automatic text-based fact verification (e.g., FEVER, LIAR), the research community lacks substantial effort in multimodal fact verification. To address this gap, we introduce FACTIFY 3M, a dataset of 3 million samples that pushes the boundaries of the domain of fact verification via a multimodal fake news dataset, in addition to offering explainability through the concept of 5W question-answering. († Work does not relate to position at Amazon.) Salient features of the dataset include: (i) textual claims, (ii) ChatGPT-generated paraphrased claims, (iii) associated images, (iv) stable diffusion-generated additional images (i.e., visual paraphrases), (v) pixel-level image heatmaps to foster image-text explainability of the claim, (vi) 5W QA pairs, and (vii) adversarial fake news stories.
1 FACTIFY 3M: an illustration
We introduce FACTIFY 3M (3 million), the largest dataset and benchmark for multimodal fact verification. Consider the example in fig. 1. A widely distributed image of the sports legend Magic Johnson with an IV line in his arm was accompanied by the claim that he was donating blood, implicitly during the COVID-19 pandemic. If true, this would be troubling, because Magic Johnson is a well-known victim of AIDS and is prohibited from donating blood. The picture predated the COVID-19 pandemic by a decade and is related ...
Lecture Notes in Computer Science
We describe the fourth edition of the CheckThat! Lab, part of the 2021 Conference and Labs of the Evaluation Forum (CLEF). The lab evaluates technology supporting tasks related to factuality, and covers Arabic, Bulgarian, English, Spanish, and Turkish. Task 1 asks to predict which posts in a Twitter stream are worth fact-checking, focusing on COVID-19 and politics (in all five languages). Task 2 asks to determine whether a claim in a tweet can be verified using a set of previously fact-checked claims (in Arabic and English). Task 3 asks to predict the veracity of a news article and its topical domain (in English). The evaluation is based on mean average precision or precision at rank k for the ranking tasks, and macro-F1 for the classification tasks. This was the most popular CLEF-2021 lab in terms of team registrations: 132 teams. Nearly one-third of them participated: 15, 5, and 25 teams submitted official runs for tasks 1, 2, and 3, respectively.
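The evaluation measures named above (precision at rank k for the ranking tasks, macro-F1 for the classification tasks) are standard and easy to state precisely. The sketch below implements both in plain Python; the document IDs and labels in the usage example are invented purely for illustration, not taken from the lab data.

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked items that are relevant."""
    top = ranked_ids[:k]
    return sum(1 for i in top if i in relevant_ids) / k

def macro_f1(gold, pred):
    """Unweighted mean of per-class F1, as used for classification tasks."""
    scores = []
    for c in set(gold):
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(scores) / len(scores)

# Hypothetical run: two relevant documents, one ranked list, four labels.
p5 = precision_at_k(["d3", "d1", "d9", "d2", "d7"], {"d1", "d2"}, k=5)
f1 = macro_f1(["true", "false", "true", "false"],
              ["true", "false", "false", "false"])
```

Macro-averaging weights every class equally regardless of frequency, which is why it is preferred over accuracy when the veracity classes are imbalanced.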