Delphine Battistelli

Université Paris Ouest Nanterre La Défense, Sciences du langage, Faculty Member

Papers by Delphine Battistelli

Modalités d’Action et Inférences [Modalities of Action and Inferences]

BRILL eBooks, 2002

Prédiction de recommandations d'âge pour l'accès à des enfants à des textes [Predicting age recommendations for children's access to texts]

HAL (Le Centre pour la Communication Scientifique Directe), Aug 15, 2019

How well a person understands a text depends on how well the characteristics of that text match the person's abilities and knowledge. In the case of a child, it is therefore worth determining how age influences comprehension of a text. Psycholinguistic studies have examined this problem closely in order to establish to what extent a text is, or is not, suitable for a child. In parallel, advances in natural language processing offer new possibilities for studying the information contained in texts. This report therefore presents a way of using these techniques to determine an age recommendation for a text intended for children.

TREMoLo-Tweets corpus : guide d'annotation pour un corpus annoté en registres de langue pour le français [TREMoLo-Tweets corpus: annotation guide for a French corpus annotated with language registers]

HAL (Le Centre pour la Communication Scientifique Directe), Sep 16, 2021

Sharing of the subject across several successive verbs. No subject/verb inversion in a sentence...

Joint building of a corpus and a classifier for language registers in French

HAL (Le Centre pour la Communication Scientifique Directe), May 15, 2018

Language registers are a salient stylistic feature in how a text or a speech is perceived. However, they are still little studied in natural language processing. In this article, we present a semi-supervised approach for jointly building a corpus of texts labelled with registers and an associated classifier. The approach relies on a small initial set of expert-annotated data. Using a massive automatic collection of web pages, it proceeds iteratively, alternating between training an intermediate classifier and annotating new texts to enlarge the labelled corpus. We apply this approach to the informal (familier), standard (courant) and formal (soutenu) registers. At the end of the construction process, the labelled corpus contains 800,000 texts and the classifier, a neural network, achieves a classification accuracy of 87%.

L’émotion à un niveau textuel : la fonction structurante des émotions observée à partir d’annotations [Emotion at the textual level: the structuring function of emotions observed through annotations]

Discours. Revue de linguistique, psycholinguistique et informatique. A journal of linguistics, psycholinguistics and computational linguistics, Sep 9, 2022

A description of the project is available here: https://texttokids.irisa.fr.

Detección de destacados eventos en un corpus grande combinando técnicas para PLN y minería de datos [Detecting salient events in a large corpus by combining NLP and data mining techniques]

Computación Y Sistemas, Jun 1, 2013

In this paper, we present a framework and a system that extracts "salient" events relevant to a query from a large collection of documents, and which also enables events to be placed along a timeline. Each event is represented by a sentence extracted from the collection. We have conducted some experiments showing the interest of the method for this issue. Our method is based on a combination of linguistic modeling (concerning temporal adverbial meanings), symbolic natural language processing techniques (using cascades of morpho-lexical transducers) and data mining techniques (namely, sequential pattern mining under constraints). The system was applied to a corpus of newswires in French provided by the Agence France Presse (AFP). Evaluation was performed in partnership with French newswire agency journalists.

Analyzing modal and enunciative discursive heterogeneity: how to combine semantic resources and a syntactic parser analysis

This paper introduces our methodology for annotating variations in enunciative and modal commitment in a text. We first present the theoretical background of the study, which puts the emphasis on the close interaction between time, aspect, modality and evidentiality (TAME) categories (and also markers). We then present our semantic resources, which encompass not only lexical items but also morphological inflections and syntactic constructions. We finally describe the first step of our global natural language processing (NLP) workflow, which uses a syntactic analysis parser.

Vers un outil de visualisation de la dynamique textuelle : l'exemple des phénomènes citationnels et modaux [Towards a tool for visualizing textual dynamics: the example of quotation and modal phenomena]

HAL (Le Centre pour la Communication Scientifique Directe), Apr 6, 2008

Référentiels et ordonnancements temporels dans les textes [Temporal frames of reference and orderings in texts]

BRILL eBooks, 2007

Detecting salient events in large corpora by a combination of NLP and data mining techniques (poster)

HAL (Le Centre pour la Communication Scientifique Directe), Mar 24, 2013

Prise en charge et phénomènes de portée : retour d’expériences dans un corpus de dépêches de presse [Commitment and scope phenomena: lessons learned from a corpus of newswire dispatches]

SHS web of conferences, 2014

1. Arabi a exprimé_cue1 [le souhait_cue2 [d'aider la Syrie à surmonter cette phase]_scope2]_scope1 (Arabi expressed a desire to help Syria overcome this phase.)

2. Paul veut_cue1 sûrement_cue2 que [Mary vienne.]_scope (Paul surely wants Mary to come.)

After a brief presentation of our theoretical grounding and our methodological principles, we present our automatic annotation system, which identifies how a text is organised into textual segments according to their enunciative and modal characteristics. Then, in a third step, we present two evaluation approaches that allow a reflective look back at the development of the annotation system. Finally, we show how the annotations produced...

Information Retrieval: Ranking Results According to Calendar Criteria

Springer eBooks, 2012

Our work deals with calendar information as it is expressed in natural language (NL), that is to say through textual units such as prepositional phrases or noun phrases (e.g. in the 90s, at the beginning of the XVth century). We call these textual units Calendar Expressions (CEs). Our work aims at showing how Information Retrieval systems can benefit from dealing with CEs. In this paper we describe our overall approach, which consists in a formal analysis of CEs that leads to a semantic representation. We then detail an algorithm that uses this representation to filter and rank CEs embedded in texts, according to a query containing a CE. The algorithm is integrated in an experimental search engine (called CaSE). Our representation of calendar information as it is expressed in NL, together with the function that computes the proximity between two CEs, one in the text and the other in the query, provides a means to process a query without any overlapping.

Semantics of Calendar Adverbials for Information Retrieval

Lecture Notes in Computer Science, 2011

Unlike most approaches in the field of temporal expressions annotation, we consider that temporal adverbials could be relevant units from the point of view of Information Retrieval. We present here the main principles of our semantic modeling approach to temporal adverbial units. It comprises two steps: functional modeling (using a small number of basic operators) and referential modeling (using calendar...

Representing and visualizing calendar expressions in texts

Temporal expressions that refer to a part of a calendar area in terms of common calendar divisions are studied. Our claim is that such a "calendar expression" (CE) can be described by a succession of operators operating on a calendar base (CB). These operators are categorized: a pointing operator that transforms a CB into a CE; a focalizing/shifting operator that reduces or shifts the CE into another CE; and finally a zoning operator that provides the wanted CE from this last CE. Relying on these operators, a set of annotations is presented which is used to automatically annotate biographic texts. A software application, plugged into the NaviTexte platform, is described that builds a calendar view of a biographic text.

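Une chaîne de traitements pour prédire et appréhender la complexité des textes pour enfants d'un point de vue linguistique et psycho-linguistique [A processing pipeline for predicting and understanding the complexity of children's texts from a linguistic and psycholinguistic point of view]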

HAL (Le Centre pour la Communication Scientifique Directe), Jun 27, 2022

Our work addresses the question of measuring the complexity of a text with respect to a target readership, children of reading age, by setting up a processing pipeline. This pipeline aims to extract linguistic descriptors, mainly drawn from work in psycholinguistics and on readability, that can be used to assess the complexity of a text. Applied to a corpus of fiction texts, it makes it possible to study correlations between certain linguistic descriptors and the age ranges that publishers assign to the texts. The analysis of these correlations tends to validate the relevance of the publishers' age categorisation. It thus justifies using such a corpus to train, from the publishers' age ranges, a model that predicts the target age of a text.

Enunciative and modal variations in newswire texts in French: From guideline to automatic annotation

Linguistic Annotation Workshop, Aug 7, 2013

In this paper we present the development of a corpus of French newswire texts annotated with enunciative and modal commitment information. The annotation scheme we propose is based on the detection of predicative cues (referring to an enunciative and/or modal variation) and their scope at a sentence level. We describe how we have improved our annotation guideline by using the evaluation (in terms of precision, recall and F-measure) of a first round of annotation produced by two expert annotators and by our automatic annotation system.

1. M. Arabi a exprimé_cue1 [le souhait_cue2 [d'aider la Syrie à surmonter cette phase]_scope2]_scope1 // Mr. Arabi expressed_cue1 [a desire_cue2 [to help Syria overcome this phase.]_scope2]_scope1

Représentation algébrique des expressions calendaires et vue calendaire d'un texte [Algebraic representation of calendar expressions and calendar view of a text]

This article studies temporal expressions that refer directly to units of time tied to the common divisions of calendars, which we call calendar expressions (CEs). We propose a model of these expressions by defining an algebra of operators linked to the classes of linguistic markers that appear in CEs. Based on this model, a calendar view is built in the NaviTexte text visualisation and navigation platform, with the aim of supporting the reading of texts. Finally, we conclude with the perspectives opened up by the development of a first temporal navigation application.

Weakly-Supervised Symptom Recognition for Rare Diseases in Biomedical Text

Lecture Notes in Computer Science, 2016

In this paper, we tackle the issue of symptom recognition for rare diseases in biomedical texts. Symptoms typically have a more complex and ambiguous structure than other biomedical named entities. Furthermore, existing resources are scarce and incomplete. Therefore, we propose a weakly-supervised framework based on a combination of two approaches: sequential pattern mining under constraints and sequence labeling. We use unannotated biomedical paper abstracts with dictionaries of rare diseases and symptoms to create our training data. Our experiments show that both approaches outperform simple projection of the dictionaries on text, and that their combination is beneficial. We also introduce a novel pattern mining constraint based on semantic similarity between words inside patterns.

TREMoLo corpus : guide d'annotation pour un corpus annoté en registres de langue pour le français [TREMoLo corpus: annotation guide for a French corpus annotated with language registers]

HAL (Le Centre pour la Communication Scientifique Directe), May 5, 2021

Sharing of the subject across several successive verbs. No subject/verb inversion in a sentence...

Angry or Sad ? Emotion Annotation for Extremist Content Characterization

HAL (Le Centre pour la Communication Scientifique Directe), Jun 20, 2022

This paper examines the role of emotion annotations in characterizing extremist content released on social platforms. The analysis of extremist content is important to identify user emotions towards some extremist ideas and to highlight the root cause of where emotions and extremist attitudes merge together. To address these issues, our methodology combines knowledge from sociological and linguistic annotations to explore French extremist content collected online. For emotion linguistic analysis, the solution presented in this paper relies on a complex linguistic annotation scheme. The scheme was used to annotate extremist text corpora in French. Data sets were collected online by following semi-automatic procedures for content selection and validation. The paper describes the integrated annotation scheme, the annotation protocol that was set up for French corpora annotation, and the results, e.g. agreement measures and remarks on annotation disagreements. The aim of this work is twofold: first, to provide a characterization of extremist content; second, to validate the annotation scheme and to test its capacity to capture and describe various aspects of emotions.
