Papers by Eileen Fitzpatrick
Synthesis lectures on human language technologies, 2008
Reviewed by Farah BENAMARA ZITOUNE, Université Paul Sabatier / IRIT. This book offers a synthesis of the main work carried out over the past ten years in natural language processing on detecting imposture or deception from text. After a detailed description of the main theories developed in psychology to analyze the behavior of individuals engaged in deception, the book concentrates on verbal deception, focusing on the study corpora that have been developed and the computational methods employed. The book closes with an assessment that surveys current methods and proposes a set of directions for future work.
Automatic Detection of Verbal Deception, 2008

Corpus-linguistic applications, 2010
Experimental laboratory results, often performed with college student subjects, have proposed several linguistic phenomena as indicative of speaker deception. We have identified a subset of these phenomena that can be formalized as a linguistic model. The model incorporates three classes of language-based deception cues: (1) linguistic devices used to avoid making a direct statement of fact, for example, hedges; (2) preference for negative expressions in word choice, syntactic structure, and semantics; (3) inconsistencies with respect to verb and noun forms, for example, verb tense changes. The question our research addresses is whether the cues we have adapted from laboratory studies will recognize deception in real-world statements by suspects and witnesses. The issue addressed here is how to test the accuracy of these linguistic cues with respect to identifying deception. To perform the test, we assembled a corpus of criminal statements, police interrogations, and civil testimony that we annotated in two distinct ways, first for language-based deception cues and second for verification of the claims made in the narrative data. The paper discusses the possible methods for building a corpus to test the deception cue hypothesis, the linguistic phenomena associated with deception, and the issues involved in assembling a forensic corpus.
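As a rough illustration of the first two cue classes named in the abstract above (hedges and negative expressions), the following minimal sketch counts surface matches in a statement. The word lists and the function name `cue_counts` are hypothetical and chosen only for this example; they are not the authors' model, and the third cue class (verb tense changes) is not covered because it would need a tagger.

```python
import re

# Hypothetical, deliberately tiny word lists; a real cue inventory would be
# far larger and would also cover syntactic and semantic negation.
HEDGES = {"maybe", "perhaps", "possibly", "probably", "guess"}
NEGATIVES = {"no", "not", "never", "nothing", "nobody"}

def cue_counts(statement):
    """Count hedge and negative-word tokens in one statement."""
    tokens = re.findall(r"[a-z']+", statement.lower())
    return {
        "hedges": sum(t in HEDGES for t in tokens),
        # Contracted negation ("didn't") is counted alongside the word list.
        "negatives": sum(t in NEGATIVES or t.endswith("n't") for t in tokens),
        "tokens": len(tokens),
    }

print(cue_counts("I guess I never saw him, and maybe I didn't call."))
```

Counts like these only flag candidate cues; deciding whether a flagged statement is actually deceptive still depends on the separate verification annotation the abstract describes.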

Applied Corpus Linguistics, 2004
The Montclair Electronic Language Database (MELD) is an expanding collection of essays written by students of English as a second language. This paper describes the content and structure of the database and gives examples of database applications. The essays in MELD consist of the timed and untimed writing of undergraduate ESL students, dated so that progress can be tracked over time. Demographic data is also collected for each student, including age, sex, L1 background, and prior experience with English. The essays are continuously being tagged for errors in grammar and academic writing as determined by a group of annotators. The database currently consists of 44,477 words of tagged text and another 53,826 words of text ready to be tagged. The database allows various analyses of student writing, from assessment of progress over time to relation of error type and L1 background.
IEEE Global Telecommunications Conference GLOBECOM '91: Countdown to the New Millennium. Conference Record
A text-to-speech parser is described that applies to texts generated by users of telecommunications devices for the deaf (TDD). The parser's main tasks are to perform lexical regularization of abbreviations and some nonstandard forms in TDD texts and to identify prosodic phrase boundaries for the synthesized speech. Rules for prosodic phrasing are based on the description of discourse-neutral phrasing presented in J. Bachenko and E. Fitzpatrick (see Comput. Linguist., vol. 16, no. 3, pp. 155-70, 1990). They use a mix of syntactic and phonological factors to identify prosodic phrase boundaries, but, unlike the original rules, build no hierarchical structure. As a component of the text-to-speech system, the parser was used for three months in a successful field trial.

Recently, the idea of "domain tuning" or customizing lexicons to improve results in machine translation and summarization tasks has driven the need for better testing and training corpora. Traditional methods of automated document identification rely on word-based methods to find the genre, domain, or authorship of a document. However, the ability to select good training corpora, especially when it comes to machine translation systems, requires automated document selection methods that do not rely on the traditional lexically-based techniques. Because syntactic structures and syntactic feature densities can heavily affect machine translation quality, syntactic feature-based methods of document selection should be used in choosing training and testing corpora. This paper provides evidence that document genres can be distinguished on the basis of syntactic-tag densities alone, supporting the idea that automated document identification is possible using alternative methods. Such methods would be ideal for creating syntactically as well as lexically balanced corpora for both genre and subject matter.
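To make the notion of syntactic-tag density concrete, here is a minimal sketch that assumes each document has already been reduced to a sequence of part-of-speech tags. The function names, the Penn-style tag labels, and the choice of an L1 distance are assumptions made for illustration; the abstract does not say which tagset or comparison measure the paper used.

```python
from collections import Counter

def tag_densities(tags):
    """Return each tag's share of all tags in one document."""
    counts = Counter(tags)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()} if total else {}

def density_distance(a, b):
    """Sum of absolute density differences over the union of tags (L1 distance)."""
    return sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in set(a) | set(b))

# Hypothetical tag sequences standing in for two documents of different genres.
news = tag_densities(["DT", "NN", "VBD", "DT", "NN", "IN", "NN"])
chat = tag_densities(["PRP", "VBP", "PRP", "UH", "VBP", "RB"])
print(density_distance(news, chat))
```

Because the profiles use no word forms at all, two documents on the same topic can still come out far apart, and two documents on different topics can come out close, which is the property the abstract argues for.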
Computing and Information Technologies, 2001
... We are doing this by expanding a corpus of error-annotated written English that we have built as a feasibility study [2]. The goal is to make the resulting corpus publicly available for applications in second language pedagogy, research in second language acquisition, and the ...
Literary and Linguistic Computing, 2007
This paper describes an ongoing project in which we are collecting a learner corpus of Arabic, developing a tagset for error annotation and performing Computer-aided Error Analysis (CEA) on the data. We adapted the French Interlanguage Database FRIDA tagset (Granger, 2003a) to the data. We chose FRIDA in order to follow a known standard and to see whether the changes needed to move from a French to an Arabic tagset would give us a measure of the distance between the two languages with respect to learner difficulty. The current collection of texts, which is constantly growing, contains intermediate and advanced-level student writings. We describe the need for such corpora, the learner data we have collected and the tagset we have developed. We also describe the error frequency distribution of both proficiency levels and the ongoing work.

The NYU Linguistic String Parser (LSP) is a working system for the syntactic analysis of English scientific texts. It consists of a parsing program, a large-coverage English grammar, and a lexicon. The grammar's effectiveness in parsing texts is due in large part to a substantial body of detailed well-formedness restrictions which eliminate most incorrect syntactic parses which would be allowed by a weaker grammar. The restrictions mainly test for compatible combinations of word subclasses. This paper defines the 109 adjective, noun and verb subclasses. These subclasses, as well as others not presented here, are defined in such a way that they can be used as a guide for classifying new entries to the LSP lexicon and as a linguistic reference tool. Each definition includes a statement of the intent of the subclass, a diagnostic frame, sentence examples and a word list drawn from the present dictionary. The subclasses are defined to reflect precisely the grammatical properties tested for by the restrictions of the grammar. Where necessary for clarifying the intent of the subclass, three additional criteria are employed: excision, implicit and coreference, and paraphrase. The subclasses have been defined so as to be consistent with a subsequent stage of transformational analysis which is currently being implemented. An illustration of the treatment of a subclass is: AASP: an adjective is in AASP if it occurs only with the non-sentential (non-SN) right adjunct to V OBJ (SN = an embedded, or contained, sentence) (DSNG, 7): John is able to walk. John is able for Bill to walk. *John is able that Bill walks. *John is able whether Bill walks.

Proceedings of the 24th Annual Meeting of the Association for Computational Linguistics, 1986
While various aspects of syntactic structure have been shown to bear on the determination of phrase-level prosody, the text-to-speech field has lacked a robust working system to test the possible relations between syntax and prosody. We describe an implemented system which uses the deterministic parser Fidditch to create the input for a set of prosody rules. The prosody rules generate a prosody tree that specifies the location and relative strength of prosodic phrase boundaries. These specifications are converted to annotations for the Bell Labs text-to-speech system that dictate modulations in pitch and duration for the input sentence. We discuss the results of an experiment to determine the performance of our system. We are encouraged by an initial 5 percent error rate and we see the design of the parser and the modularity of the system allowing changes that will upgrade this rate.

Proceedings of the Third Conference on Applied Natural Language Processing, 1992
In this paper, we concern ourselves with an application of text-to-speech for speech-impaired, deaf, and hard of hearing people. The application is unusual because it requires real-time synthesis of unedited, spontaneously generated conversational texts transmitted via a Telecommunications Device for the Deaf (TDD). We describe a parser that we have implemented as a front end for a version of the Bell Laboratories text-to-speech synthesizer (Olive and Liberman 1985). The parser prepares TDD texts for synthesis by (a) performing lexical regularization of abbreviations and some non-standard forms, and (b) identifying prosodic phrase boundaries. Rules for identifying phrase boundaries are derived from the prosodic phrase grammar described in Bachenko and Fitzpatrick (1990). Following the parent analysis, these rules use a mix of syntactic and phonological factors to identify phrase boundaries but, unlike the parent system, they forgo building any hierarchical structure in order to bypass the need for a stacking mechanism; this permits the system to operate in near real time. As a component of the text-to-speech system, the parser has undergone rigorous testing during a successful three-month field trial at an AT&T telecommunications center in California. In addition, laboratory evaluations indicate that the parser's performance compares favorably with human judgments about phrasing.
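Step (a) in the abstract above, lexical regularization, can be pictured as a table lookup applied word by word before synthesis. The sketch below is only an assumption about the general shape of such a step: the table entries and the function name `regularize` are hypothetical, and the abstract does not describe the parser's actual lexicon or rules.

```python
# Hypothetical abbreviation table; "GA" (go ahead) and "SK" (stop keying) are
# common TDD turn-taking conventions, the other entries are illustrative.
ABBREVIATIONS = {
    "ga": "go ahead",
    "sk": "stop keying",
    "pls": "please",
    "u": "you",
}

def regularize(text):
    """Expand known abbreviations word by word, leaving other words unchanged."""
    return " ".join(ABBREVIATIONS.get(word.lower(), word) for word in text.split())

print(regularize("PLS call me GA"))  # please call me go ahead
```

A flat, single-pass lookup like this keeps no stack or tree, which is in the same spirit as the near-real-time constraint the abstract mentions for the phrasing rules.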
[1989] Proceedings. The Annual AI Systems in Government Conference
We describe an experimental text-to-speech system that uses a syntactic parser and prosody rules to determine prosodic phrasing for synthesized speech. Our results indicate that many aspects of sentence analysis that are required for other parsing applications, e.g. machine translation and question answering, become unnecessary in parsing for text-to-speech. It is possible to generate natural-sounding prosodic phrasing by relying on information about syntactic category type, partial constituency, and length; information about clausal and verb phrase constituency, predicate-argument relations, and prepositional phrase attachment can be bypassed.
Comput. Linguistics, 1990
We describe an experimental text-to-speech system that uses information about syntactic constituency, adjacency to a verb, and constituent length to determine prosodic phrasing for synthetic speech. A central goal of our work has been to characterize "discourse neutral" phrasing, i.e. sentence-level phrasing patterns that are independent of discourse semantics. Our account builds on Bachenko et al. (1986), but differs in its treatment of clausal structure and predicate-argument relations. Results so far indicate that the current system performs well when measured against a corpus of judgments of prosodic phrasing.
Research in high stakes deception has been held back by the sparsity of ground truth verification for data collected from real world sources. We describe a set of guidelines for acquiring and developing corpora that will enable researchers to build and test models of deceptive narrative while avoiding the problem of sanctioned lying that is typically required in a controlled experiment. Our proposals are drawn from our experience in obtaining data from court cases and other testimony, and uncovering the background information that enabled us to annotate claims made in the narratives as true or false.
Proceedings of the Workshop on Computational Approaches to Deception Detection, Apr 23, 2012
... Friedman, Sarah Gilman, Cynthia Girand, Martin Graciarena, Andreas Kathol, Laura Michaelis, Bryan L. Pellom, Elizabeth Shriberg, Andreas Stolcke. ... Livermore Chevron Station Appendix B. Distribution of T and F Propositions in Collection Case Words Trues Falses Johnston ...