Lorraine Goeuriot

Dublin City University, School of Computing, Post-Doc

Papers by Lorraine Goeuriot

Sentiment lexicons for health-related opinion mining

Proceedings of the 2nd …, Jan 1, 2012

Abstract: Opinion mining consists in extracting from a text opinions expressed by its author and t...

Découverte et caractérisation des corpus comparables

Découverte et caractérisation des corpus comparables spécialisés

Découverte et caractérisation des corpus comparables spécialisés. Thesis submitted for the degree of D...

Analyse de la modalité dans un corpus spécialisé multilingue

Reconnaissance du type de discours dans des corpus comparables spécialisés

Citeseer

Our goal is to automate the construction of specialized comparable corpora from the Web. Comparability rests on three levels: domain, topic, and type of discourse. Domain and topic can be filtered through the keywords used in the search. In this article we present the automatic recognition of discourse type in specialized French and Japanese documents, which requires in-depth linguistic analysis. A contrastive analysis of the documents lets us determine which information appears discriminative. Drawing on classic work in information retrieval, we create a robust, linguistically motivated typology based on three levels of analysis: structural, modal, and lexical. This typology lets us learn classification models that give good results, which demonstrates its effectiveness.

Reconnaissance de critères de comparabilité dans un corpus multilingue spécialisé

Caractérisation des discours scientifique et vulgarisé en français, japonais et russe

Poster at TALN, Jan 1, 2007

The main objective of our work is to study the notion of corpus comparability; we address this question in a monolingual context by seeking to distinguish scientific documents from popular science documents. We work separately on corpora composed of documents from the medical domain in three linguistically distant languages (French, Japanese, and Russian). In our approach, documents are characterized in each language by their topic and by a discursive typology situated at three levels of document analysis: structural, modal, and lexical. Document typing is implemented with two learning algorithms (SVMlight and C4.5). Evaluation of the results shows that the proposed discursive typology is portable from one language to another, since it does make it possible to distinguish the two discourses. We nevertheless observe widely varying performance depending on the language, the algorithm, and the type of discursive feature.

Compilation of specialized comparable corpus in French and Japanese

... HAL: hal-00411258, version 1. ... ACL-IJCNLP workshop Building and Using Comparable Corpora...

Identifying criteria to automatically distinguish between scientific and popular science registers

Textual and informational characteristics of health-related social media content: A study of drug review forums

There is a proliferation of health-related social media sites where people post information about their diseases and treatments. These sites can be mined for information about users' experience with these diseases and treatments. This paper reports the results of an initial study of the informational content and linguistic characteristics of postings on drug review sites, with an emphasis on the opinions and sentiments expressed. This paper reports our initial analysis of the informational and linguistic characteristics of user postings on drug-review discussion forums. We investigate the knowledge they contain and the information that can be extracted from them. We harvested postings from three websites carrying different kinds of user-generated reviews. We analyzed the corpus to identify the most-reviewed drugs, the vocabulary used (focusing on opinion words), and textual characteristics such as length of postings, sentence length, and the proportion of the various parts of speech. We performed semantic tagging with concepts from the UMLS Metathesaurus and analyzed the distribution of medical concepts in the corpus. Our results indicate that the corpus covers a large variety of drugs. Drugs related to depression, anxiety, weight loss, and pain relief are most frequently reviewed. Although the linguistic quality of the text is lower than in scientific writing, the medical content is very rich. Opinion mining can be performed on the corpus since it contains many opinion terms.

Textual and Informational Characteristics of Drug-Related Content on Three Kinds of Websites: Drug Review Website, Discussion Board and Hospital Information …

… of Organizational and …, Jan 1, 2011

Compilation of specialized comparable corpora in French and Japanese

Proceedings of the 2nd Workshop on …, Jan 1, 2009

Characterization of scientific and popular science discourse in French, Japanese and Russian

Proceedings of LREC, Jan 1, 2008

We aim to characterize the comparability of corpora; we address this issue in the trilingual context through the distinction of expert and non-expert documents. We work separately with corpora composed of documents from the medical domain in three languages (French, Japanese and Russian) that are linguistically distant from one another. In our approach, documents are characterized in each language by their topic and by a discursive typology positioned at three levels of document analysis: structural, modal and lexical. The document typology is implemented with two learning algorithms (SVMlight and C4.5). Evaluation of the results shows that the proposed discursive typology can be transposed from one language to another, as it makes it possible to distinguish the two target discourses (science and popular science). However, we observe that performance varies considerably across languages, algorithms and types of discursive characteristics.

Multilingual modalities for specialized languages

Terminology, Jan 1, 2010

With the growth of textual data, techniques are necessary for their selection and organization. T...
