2012
Research in high-stakes deception has been held back by the scarcity of ground-truth verification for data collected from real-world sources. We describe a set of guidelines for acquiring and developing corpora that will enable researchers to build and test models of deceptive narrative while avoiding the problem of sanctioned lying that is typically required in a controlled experiment. Our proposals are drawn from our experience in obtaining data from court cases and other testimony, and in uncovering the background information that enabled us to annotate claims made in the narratives as true or false.
Deception detection research in Natural Language Processing and Machine Learning involves creating and developing methods for automatically discerning deceptive messages from truthful ones. Mistaking intentionally deceptive pieces of information for authentic ones (true to the writer's beliefs) can have negative consequences, since our everyday decision-making, actions, and mood are often affected by the information we encounter. Such research is vital today as it aims to develop tools for the automated recognition of deceptive, disingenuous or fake information (the kind intended to create false beliefs or conclusions in the reader's mind). The ultimate goal is to support truthfulness ratings that signal the trustworthiness of the retrieved information, or alert information seekers to potential deception. To proceed with this agenda, we require elicitation techniques for obtaining samples of both deceptive and truthful messages from study participants in various subject areas. A data collection, or corpus of truths and lies, should meet certain basic criteria to allow for meaningful analysis and comparison of socio-linguistic behaviors. In this paper we propose solutions and weigh the pros and cons of various experimental set-ups in the art of corpus building. The outcomes of three experiments demonstrate certain limitations of using online crowdsourcing for data collection of this type. Incorporating motivation into the task descriptions and accounting for the role of visual context in creating deceptive narratives are further factors that should be addressed in future efforts to build a quality dataset.
2008
Our goal is to use natural language processing to identify deceptive and nondeceptive passages in transcribed narratives. We begin by motivating an analysis of language-based deception that relies on specific linguistic indicators to discover deceptive statements.
Proceedings of the EACL 2012 Workshop on Computational Approaches to Deception Detection, 2012
In this study, we explore several popular techniques for obtaining corpora for deception research. Through a survey of traditional as well as non-gold standard creation approaches, we identify advantages and limitations of these techniques for web-based deception detection and offer crowdsourcing as a novel avenue toward achieving a gold standard corpus. Through an in-depth case study of online hotel reviews, we demonstrate the implementation of this crowdsourcing technique and illustrate its applicability to a broad array of online reviews.
In this study, we use the computational textual analysis tool, the Gramulator, to identify and examine the distinctive linguistic features of deceptive and truthful discourse. The theme of the study is abortion rights and the deceptive texts are derived from a Devil's Advocate approach, conducted to suppress personal beliefs and values. Our study takes the form of a contrastive corpus analysis, and produces systematic differences between truthful and deceptive personal accounts. Results suggest that deceivers employ a distancing strategy that is often associated with deceptive linguistic behavior. Ultimately, these deceivers struggle to adopt a truth perspective. Perhaps of most importance, our results indicate issues of concern with current deception detection theory and methodology. From a theoretical standpoint, our results question whether deceivers are deceiving at all or whether they are merely poorly expressing a rhetorical position, caused by being forced to speculate on a perceived prototypical position. From a methodological standpoint, our results cause us to question the validity of deception corpora. Consequently, we propose new rigorous standards so as to better understand the subject matter of the deception field. Finally, we question the prevailing approach of abstract data measurement and call for future assessment to consider contextual lexical features. We conclude by suggesting a prudent approach to future research for fear that our eagerness to analyze and theorize may cause us to misidentify deception. After all, successful deception, which is the kind we seek to detect, is likely to be an elusive and fickle prey.
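As a rough illustration of what a contrastive corpus analysis involves, the sketch below ranks words by a smoothed log-odds score for favouring a deceptive over a truthful sub-corpus. It is a simplified stand-in for the sort of comparison described above, not the Gramulator's actual differential measures, and the variables in the usage comment are hypothetical.

```python
# Illustrative sketch only: contrast word frequencies in deceptive vs. truthful texts
# by smoothed log-odds. Not the Gramulator; just the general idea of a contrastive
# corpus comparison.
import math
import re
from collections import Counter

def word_counts(texts):
    """Lower-cased word counts pooled over a list of texts."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts

def contrastive_ranking(deceptive_texts, truthful_texts, top_n=10):
    """Rank words by how strongly they favour the deceptive sub-corpus."""
    d, t = word_counts(deceptive_texts), word_counts(truthful_texts)
    d_total, t_total = sum(d.values()), sum(t.values())
    vocab = set(d) | set(t)
    scores = {
        w: math.log((d[w] + 1) / (d_total + len(vocab)))
        - math.log((t[w] + 1) / (t_total + len(vocab)))
        for w in vocab
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_n]

# Hypothetical usage with two small lists of personal accounts:
# print(contrastive_ranking(deceptive_accounts, truthful_accounts))
```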
Linguistic Evidence in Security, Law & Intelligence, 1(1) (2013)
Written witness statements are a unique source for the study of high-stakes textual deception. To date, however, there is no distinction in the way that they and other forms of verbal deception have been analysed, with written statements treated as extensions of transcribed versions of oral reports. Given the highly context-dependent nature of cues, it makes sense to take the characteristics of the medium into account when analysing for deceptive language. This study examines the characteristic features of witness narratives and proposes a new approach to search for deception cues. Narratives are treated as a progression of episodes over time, and deception as a progression of acts over time. This allows for the profiling of linguistic bundles in sequence, revealing the statements' internal gradient, and deceivers' choice of deceptive linguistic strategy. Study results suggest that, at least in the context of written witness statements, the weighting of individual features as deception cues is not static but depends on their interaction with other cues, and that detecting deceivers' use of linguistic strategy is an effective vehicle for identifying deception.
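The episode-by-episode profiling described above can be pictured with a short sketch: a statement, already split into episodes, is profiled with a small, hypothetical cue lexicon so that the progression of cue densities across the narrative can be inspected. This is illustrative only and does not reproduce the study's feature set or segmentation method.

```python
# Illustrative sketch only: per-episode cue densities for a narrative, exposing the
# statement's internal gradient. Cue lexicon and episode boundaries are assumptions.
import re
from collections import Counter

CUE_LEXICON = {
    "hedges": {"maybe", "perhaps", "about", "roughly", "somehow"},
    "negations": {"not", "never", "no", "nothing"},
    "self_reference": {"i", "me", "my", "mine"},
}

def episode_profile(episode_text: str) -> dict:
    """Normalised count of each cue class within one episode."""
    tokens = re.findall(r"[a-z']+", episode_text.lower())
    counts = Counter()
    for token in tokens:
        for cue_class, words in CUE_LEXICON.items():
            if token in words:
                counts[cue_class] += 1
    total = max(len(tokens), 1)
    return {cue: counts[cue] / total for cue in CUE_LEXICON}

def narrative_gradient(episodes: list) -> list:
    """Profile each episode in order, giving the cue progression over the story."""
    return [episode_profile(episode) for episode in episodes]

if __name__ == "__main__":
    statement = [
        "I left the office at about five, maybe a bit later.",
        "I did not see anyone near the car park.",
        "Perhaps someone else was there, I never checked.",
    ]
    for i, profile in enumerate(narrative_gradient(statement), 1):
        print(f"episode {i}: {profile}")
```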
Deception detection remains novel, challenging, and important in natural language processing, machine learning, and the broader LIS community. Computational tools capable of alerting users to potentially deceptive content in computer-mediated messages are invaluable for supporting undisrupted, computer-mediated communication and information practices.
Natural Language Engineering
Current automatic deception detection approaches tend to rely on cues that are based either on specific lexical items or on linguistically abstract features that are not necessarily motivated by the psychology of deception. Notably, while approaches relying on such features can do well when the content domain is similar for training and testing, they suffer when content changes occur. We investigate new linguistically defined features that aim to capture specific details, a psychologically motivated aspect of truthful versus deceptive language that may be diagnostic across content domains. To ascertain the potential utility of these features, we evaluate them on data sets representing a broad sample of deceptive language, including hotel reviews, opinions about emotionally charged topics, and answers to job interview questions. We additionally evaluate these features as part of a deception detection classifier. We find that these linguistically defined specific detail features are m...
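A minimal sketch of the cross-domain setting described above, assuming two labelled datasets: train on one content domain, test on another. Plain word counts and a logistic-regression classifier stand in for the paper's specific-detail features, and the variable names in the usage comment are hypothetical.

```python
# Illustrative sketch only: train a deception classifier on one domain (e.g. hotel
# reviews) and evaluate on another (e.g. opinion essays). Features here are plain
# word counts, a stand-in for the specific-detail features discussed above.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

def cross_domain_accuracy(train_texts, train_labels, test_texts, test_labels):
    """Fit on the training domain and report accuracy on the held-out domain."""
    vectorizer = CountVectorizer()
    classifier = LogisticRegression(max_iter=1000)
    classifier.fit(vectorizer.fit_transform(train_texts), train_labels)
    predictions = classifier.predict(vectorizer.transform(test_texts))
    return accuracy_score(test_labels, predictions)

# Hypothetical usage, labels 1 = deceptive, 0 = truthful:
# acc = cross_domain_accuracy(hotel_texts, hotel_labels, opinion_texts, opinion_labels)
```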
Journal of Personality and Social Psychology, 1983
Twenty-Second …, 2009
We evaluate conversational transcripts of deceptive speech using a sophisticated natural language processing tool called Coh-Metrix. Coh-Metrix is unique in that it tracks linguistic features based on social and cognitive factors. The results from Coh-Metrix are compared to linguistic features reported in previous independent deception research, which used a natural language processing tool called LIWC. The comparison provides converging validity for several linguistic features, and establishes new insights on deceptive language.
Proceedings of IAFL 10th Biennial Conference, 2012
Traditionally, approaches to deception detection have treated deceptive communication as if it were a one-time event, analysing the overall features and tone of the individual message. However useful this methodology may be, it fails to take account of the fact that deception is a progression of phases rather than a single occurrence. This study looked at deception in written witness statements as just that. Although statements are written at a single point in time, their storylines are sequences of episodes over an extended timeline. Employing marked sentence structures to code discourse segmentation markers in written narratives, the progression of deception was mapped as it unfolded through the course of the story using the interaction of linguistic cues. In addition to identifying two main deceptive linguistic strategies which deceivers resorted to in writing their statements, results suggest that what may be important is not so much the individual cues as the way they are used.
2013
Little research has been undertaken into high-stakes deception, and even less into high-stakes deception in written text. This study addresses that gap. In this thesis, I present a new approach to detecting deception in written narratives based on the definition of deception as a progression, focusing on identifying deceptive linguistic strategy rather than individual cues. I propose a new approach for subdividing whole narratives into their constituent episodes, each of which is linguistically profiled, and their progression mapped to identify authors' deceptive strategies based on cue interaction. I conduct a double-blind study using qualitative and quantitative analysis in which linguistic strategy (cue interaction and progression) and overall cue presence are used to predict deception in witness statements. This results in linguistic strategy analysis correctly predicting 85% of deceptive statements (92% overall) compared to 54% (64% overall) with cues identified on a whole st...
Corpus-linguistic applications, 2019
Experimental laboratory studies, often conducted with college student subjects, have proposed several linguistic phenomena as indicative of speaker deception. We have identified a subset of these phenomena that can be formalized as a linguistic model. The model incorporates three classes of language-based deception cues: (1) linguistic devices used to avoid making a direct statement of fact, for example, hedges; (2) preference for negative expressions in word choice, syntactic structure and semantics; (3) inconsistencies with respect to verb and noun forms, for example, verb tense changes. The question our research addresses is whether the cues we have adapted from laboratory studies will recognize deception in 'real world' statements by suspects and witnesses. The issue addressed here is how to test the accuracy of these linguistic cues with respect to identifying deception. To perform the test, we assembled a corpus of criminal statements, police interrogations, and civil testimony that we annotated in two distinct ways, first for language-based deception cues and second for verification of the claims made in the narrative data. The paper discusses the possible methods for building a corpus to test the deception cue hypothesis, the linguistic phenomena associated with deception, and the issues involved in assembling a forensic corpus.
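The three cue classes can be pictured with a toy sentence-level tagger like the one below, which flags hedges, negative expressions, and a crude past/present tense-mixing check. The word lists and the tense heuristic are assumptions made for illustration; they are not the annotation scheme actually applied to the corpus.

```python
# Illustrative sketch only: flag the three classes of language-based deception cues
# named above (hedges, negative expressions, verb-form inconsistencies) with toy
# lexicons and a crude regex tense check.
import re

HEDGES = {"maybe", "perhaps", "possibly", "somewhat", "guess", "think"}
NEGATIVE_FORMS = {"not", "never", "no", "nothing", "nobody", "without"}
PAST_TENSE = re.compile(r"\b\w+ed\b")
PRESENT_TENSE = re.compile(r"\b(is|are|am|says|goes|does)\b")

def flag_cues(sentence: str) -> dict:
    """Return which cue classes a single sentence appears to contain."""
    lowered = sentence.lower()
    tokens = set(re.findall(r"[a-z']+", lowered))
    return {
        "hedge": bool(tokens & HEDGES),
        "negative_expression": bool(tokens & NEGATIVE_FORMS),
        # Class 3 approximated as past- and present-tense verbs in the same sentence.
        "tense_inconsistency": bool(PAST_TENSE.search(lowered))
        and bool(PRESENT_TENSE.search(lowered)),
    }

if __name__ == "__main__":
    print(flag_cues("I think I walked in and he says nothing to me."))
```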
Legal and Criminological Psychology, 2010
In this paper, we provide our view of the current understanding of high-stakes lies often occurring in forensic contexts. We underscore the importance of avoiding widespread pitfalls of deception detection and challenging prevailing assumptions concerning strategies for catching liars. The promise and limitations of each of the non-verbal/body language, facial, verbal/linguistic, and physiological channels in detecting deception are discussed. In observing the absence of a single cue or behavioural channel that consistently reveals deception, a holistic approach with concurrent attention to multiple channels of a target's behaviour (ideally videotaped for review) and changes from baseline behaviour is recommended whenever possible. The best-validated cues to be considered together include illustrators, blink and pause rate, speech rate, vague descriptions, repeated details, contextual embedding, reproduction of conversations, and emotional 'leakage' in the face. While advocating a reliance on empirical evidence, we observe that few studies of high-stakes deception have yet been conducted. Further, some manifestations of lying are highly idiosyncratic and difficult to address in quantitative research, pointing to the need for keen observation skills and psychological insight. A recurring theme is the need for the field to devise innovative approaches for studying high-stakes lies to promote ecological validity. Ultimately, such work will provide a strong foundation for the responsible application of deception research in forensic and security settings.
Recent improvements in effectiveness and accuracy of the emerging field of automated deception detection and the associated potential of language technologies have triggered increased interest in mass media and the general public. Computational tools capable of alerting users to potentially deceptive content in computer-mediated messages are invaluable for supporting undisrupted, computer-mediated communication and information practices, credibility assessment and decision-making. The goal of this ongoing research is to inform creation of such automated capabilities. In this study we elicit a sample of 90 computer-mediated personal stories with varying levels of deception. Each story has 10 associated human deception level judgments, confidence scores, and explanations. In total, 990 unique respondents participated in the study. Three approaches are taken to the data analysis of the sample: human judges, linguistic detection cues, and machine learning. Comparable to previous research results, human judgments achieve 50-63 percent success rates, depending on what is considered deceptive. Actual deception levels correlate negatively with judges' confidence in rating a story as deceptive (r = -0.35, df = 88, p = 0.008). The highest-performing machine learning algorithms reach 65 percent accuracy. Linguistic cues are extracted, calculated, and modeled with logistic regression, but are found not to be significant predictors of deception level, confidence score, or an author's ability to fool a reader. We address the associated challenges with error analysis. The respondents' stories and explanations are manually content-analyzed and result in a faceted deception classification (theme, centrality, realism, essence, self-distancing) and a stated perceived cue typology. Deception detection remains novel, challenging, and important in natural language processing, machine learning, and the broader library and information science and technology community.
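For orientation, the sketch below shows the general shape of two of the analyses mentioned: a Pearson correlation between deception level and judges' confidence, and a logistic regression of linguistic cue counts on deception labels. The arrays are randomly generated placeholders, not the study's data, so the printed numbers will not match the reported r = -0.35 or p = 0.008.

```python
# Illustrative sketch only: Pearson correlation and logistic regression over
# placeholder data shaped like the study's sample (90 stories).
import numpy as np
from scipy import stats
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Placeholder data: per-story deception level, mean judge confidence, and cue counts.
deception_level = rng.integers(0, 4, size=90)
judge_confidence = rng.random(90)
cue_counts = rng.poisson(3.0, size=(90, 5))
deceptive_label = (deception_level > 0).astype(int)

# Pearson correlation with its p-value.
r, p = stats.pearsonr(deception_level, judge_confidence)
print(f"r = {r:.2f}, p = {p:.3f}")

# Logistic regression: do the cue counts predict the deceptive/truthful label?
model = LogisticRegression(max_iter=1000).fit(cue_counts, deceptive_label)
print("cue coefficients:", model.coef_.round(2))
```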
Telling lies often requires creating a story about an experience or attitude that does not exist. As a result, false stories may be qualitatively different from true stories. The current project investigated the features of linguistic style that distinguish between true and false stories. In an analysis of five independent samples, a computer-based text analysis program correctly classified liars and truth-tellers at a rate of 67% when the topic was constant and a rate of 61% overall. Compared to truth-tellers, liars showed lower cognitive complexity, used fewer self-references and other-references, and used more negative emotion words.
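A minimal sketch of the kind of word-category profiling such a text-analysis program performs, using tiny stand-in word lists for self-references, other-references, negative emotion words, and exclusive words (a rough proxy for cognitive complexity). The lists are assumptions, not the actual LIWC dictionaries.

```python
# Illustrative sketch only: normalised counts of a few LIWC-style word categories of
# the kind reported above as discriminating true from false stories.
import re

CATEGORIES = {
    "self_reference": {"i", "me", "my", "mine", "myself"},
    "other_reference": {"he", "she", "they", "them", "his", "her", "their"},
    "negative_emotion": {"hate", "worthless", "sad", "angry", "afraid"},
    "exclusive_words": {"but", "except", "without", "although"},
}

def style_features(text: str) -> dict:
    """Proportion of tokens falling in each category."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = max(len(tokens), 1)
    return {
        name: sum(token in words for token in tokens) / total
        for name, words in CATEGORIES.items()
    }

if __name__ == "__main__":
    print(style_features("I was afraid, but they never told me why she left."))
```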
Researching accuracy in deception detection methods is fraught with complications. Probably the most difficult task is designing a research protocol that involves subjects who are real Liars or Truth-tellers. Especially in emulating high-stakes situations, it is not realistic to use naïve college students who simulate lies as subjects. Real high-stakes Liars share few characteristics with volunteer college students in a Psychology section. Real high-stakes Liars are savvy and trained terrorists, spies, and criminals. A working deception detection method would be able to identify high-stakes Liars before they cause great damage; the 9/11 hijackers, Harold James Nicholson, and Bernie Madoff are prime examples. There is little value in identifying college students simulating lies. The demand for a real working deception detection method is massive. To this end, security agencies, private and government, have spent billions of dollars on deception detection efforts in just the last decade. A second essential requirement for testing high-stakes deception detection techniques is the actual need for real-time results. Most real-world deception detection is attempted in brief person-to-person interactions, not in a video room with unlimited time to play back footage for minute examination. A realistic deception detection research protocol should provide subjects who can interact with the method being tested in real time. This paper lays out a research protocol that provides subjects who emulate real-life, real-time, high-stakes Liars or Truth-tellers, and outlines a head-to-head protocol to evaluate competing methods, using the same Liars and Truth-tellers, on a level playing field. Using this research protocol for a side-by-side comparison of deception detection methods could provide solid and reliable evidence and data about the usefulness and accuracy of competing approaches. This evidence and data could drive research and applications towards more useful deception detection solutions.
1994
In recent years, the need for enhanced methods of credibility assessment in criminal cases has become increasingly apparent. Especially in cases of sexual assault, the words of the accused and complainant are often the sole evidence available to police. Consequently, researchers and practitioners have been searching for ways of differentiating truthful and deceptive accounts, focussing mainly on witnesses and victims.
… of the eighth International Conference on …, 2012
In criminal proceedings, it is sometimes not easy to evaluate the sincerity of oral testimony. DECOUR (DEception in COURt corpus) has been built with the aim of training models suitable for discriminating, from a stylometric point of view, between sincere and deceptive statements. DECOUR is a collection of hearings held in four Italian Courts, in which the speakers lie in front of the judge. These hearings become the object of a specific criminal proceeding for calumny or false testimony, in which the deceptiveness of the statements of the defendant is ascertained. Thanks to the final Court judgment, which points out which lies were told, each utterance of the corpus has been annotated as true, uncertain or false, according to its degree of truthfulness. Since the judgment of deceptiveness follows a judicial inquiry, the annotation has been carried out with a greater degree of confidence than previously possible. Moreover, this is the first Italian corpus of deceptive texts that does not rely on 'mock' lies created under laboratory conditions, but has instead been collected in a natural environment.
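A minimal sketch, under assumed field names, of how DECOUR-style utterance-level annotation could be represented: each utterance carries one of the three truthfulness labels described above. The schema is hypothetical and only illustrates the three-way annotation, not the corpus's actual format.

```python
# Illustrative sketch only: a toy record type for utterances annotated as true,
# uncertain, or false. Field names are assumptions, not DECOUR's real schema.
from dataclasses import dataclass
from typing import Literal

@dataclass
class Utterance:
    hearing_id: str
    speaker: str
    text: str
    label: Literal["true", "uncertain", "false"]

corpus = [
    Utterance("hearing_01", "defendant", "I was at home that evening.", "false"),
    Utterance("hearing_01", "defendant", "I do not remember the exact time.", "uncertain"),
    Utterance("hearing_02", "witness", "I saw him leave before midnight.", "true"),
]

# Simple corpus statistic: share of utterances annotated as false.
false_share = sum(u.label == "false" for u in corpus) / len(corpus)
print(f"share of false utterances: {false_share:.2f}")
```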
Frontiers in Psychiatry
2016
We present an approach to creating corpora for use in detecting deception in text, including a discussion of the challenges peculiar to this task. Our approach is based on soliciting several types of reviews from writers and was implemented using Amazon Mechanical Turk. We describe the multi-dimensional corpus of reviews built using this approach, available free of charge from the LDC as the Boulder Lies and Truth Corpus (BLT-C). Challenges for both corpus creation and deception detection include the fact that human performance on the task is typically at chance, that the signal is faint, that paid writers such as turkers are sometimes deceptive, and that deception is a complex human behavior; manifestations of deception depend on details of the domain, intrinsic properties of the deceiver (such as education, linguistic competence, and the nature of the intention), and specifics of the deceptive act (e.g., lying vs. fabricating). To overcome the inherent lack of ground truth, we have de...