Papers by Georgiana Marsic

The ability to capture the temporal dimension of a natural language text is essential to many natural language processing applications, such as Question Answering, Automatic Summarisation, and Information Retrieval. Temporal processing is a field of Computational Linguistics which aims to access this dimension and derive a precise temporal representation of a natural language text by extracting time expressions, events and temporal relations, and then representing them according to a chosen knowledge framework. This thesis focuses on the investigation and understanding of the different ways time is expressed in natural language, on the implementation of a temporal processing system in accordance with the results of this investigation, on the evaluation of the system, and on the extensive analysis of the errors and challenges that appear during system development. The ultimate goal of this research is to develop the ability to automatically annotate temporal expressions, verbal events and temporal relations in a natural language text. Temporal expression annotation involves two stages: temporal expression identification, concerned with determining the textual extent of a temporal expression, and temporal expression normalisation, which finds the value that the temporal expression designates and represents it using an annotation standard. The research presented in this thesis approaches these tasks with a knowledge-based methodology that tackles temporal expressions according to their semantic classification. Experiments with several knowledge sources and normalisation models allow an analysis of their impact on system performance.
The annotation of events expressed using either finite or non-finite verbs is addressed with a method that overcomes a drawback of existing methods, which associate an event with the class that is most frequently assigned to it in a corpus and are limited in coverage by the small number of events present in the corpus. This limitation is overcome in this research by annotating each WordNet verb with the event class that best characterises that verb. This thesis also describes an original methodology for the identification of temporal relations that hold among events and temporal expressions. The method relies on sentence-level syntactic trees and a propagation of temporal relations between syntactic constituents, by analysing syntactic and lexical properties of the constituents and of the relations between them. The detailed evaluation and error analysis of the methods proposed for solving different temporal processing tasks form an important part of this research. Various corpora widely used by researchers studying different temporal phenomena are employed in the evaluation, thus enabling comparison with the state of the art in the field. The detailed error analysis targeting each temporal processing task helps identify not only problems of the implemented methods, but also reliability problems of the annotated resources, and encourages potential reexaminations of some temporal processing tasks.
The completion of my doctoral studies has been the most significant academic challenge I have ever faced. It has been a long journey whose course was sometimes disrupted by life getting in the way, but to whose successful completion many people have contributed directly or indirectly, and I would like to take this opportunity to thank them. First of all, I would like to thank my supervisory team, Ruslan Mitkov, John Prager and Constantin Orăsan, for their trust, encouragement, patience and guidance.
I am thankful to Ruslan Mitkov, my director of studies, for making this thesis possible by providing the necessary infrastructure and resources to accomplish my research work, and for acting as my supervisor despite his many other academic and professional commitments. I would like to thank John Prager for finding the time to read my thesis and to provide insightful and creative comments. I am extremely indebted to Constantin Orăsan, my supervisor, colleague and friend, for always showing a sincere interest in my work, for his constructive criticism, for the extensive discussions concerning my work, and for all the help he has given me throughout my years in Wolverhampton. I would like to express my special gratitude and appreciation to my former research advisor Dan Cristea, who introduced me to the world of Natural Language Processing. I still think fondly of the time I spent working with him as a postgraduate student. I am privileged for having had Verginica Barbu Mititelu, Iustin Dornescu,
Barbecued Opakapaka: Using Semantic Preferences for Ontology Population
Recent Advances in Natural Language Processing, Sep 1, 2015

The annotation of temporal relations remains a challenge: it is very difficult for humans, let alone machines, to annotate temporal relations in natural language texts reliably and consistently. This paper advocates a change in the definition of the problem itself, by proposing a staged divide-and-conquer approach guided by syntax that offers a more principled way of selecting the temporal entities involved in a temporal relation. The decomposition of the problem into smaller syntactically motivated tasks, and the identification of accurate and linguistically grounded solutions to solve them, promote a sound understanding of the phenomena involved in establishing temporal relations. We illustrate the potential of linguistically informed solutions in the area of temporal relation identification by proposing and evaluating an initial set of syntactically motivated tasks.
Patient note scoring methods, systems, and apparatus
pages.cs.brandeis.edu
The adverb "then" is among the most frequent English temporal adverbs, being also capable of filling a variety of semantic roles. The identification of anaphoric usages of "then" is important for temporal expression resolution, while the temporal relationship usage is important for ...
Proceedings of the International Conference on Recent …, 2007
Temporal information plays an important role in many NLP applications. The identification of temporal relations between temporal entities (events and temporal expressions) is indispensable in obtaining the temporal interpretation of a given text. This paper ...
Proc. of the International …, 2003
Dan Cristea, „Al.I.Cuza“ University of Iasi, Faculty of Computer Science, and Romanian Academy, Institute of Theoretical Computer Science – the Iasi branch ... Oana Postolache, Georgiana Puşcaşu, Laurenţiu Ghetu, „Al.I.Cuza“ University of Iasi, Faculty of ...
Proceedings of the 7th Annual Colloquium for the UK …, 2004
This paper addresses the clause splitting problem and proposes a multilingual method for detecting clause boundaries in unrestricted texts. The method combines language-independent machine learning techniques with language-specific rules in order to take the first step in building ...
A framework for temporal resolution
Proceedings of the 4th Conference on Language …, 2004
Access to the temporal information conveyed in a text can improve the performance of many NLP applications. This paper discusses the automatic annotation of temporal expressions (TEs) in newswire texts. This recent research is being pursued with the aim of ...
Knowledge Engineering: Principles and …
QALL-ME (Question Answering Learning technologies in a multiLingual and Multimodal Environment) is an EU-funded project that aims to develop a shared infrastructure for multilingual and multimodal question answering in the domain of tourism. The purpose of this system ...

clg.wlv.ac.uk
This paper investigates the use of semantic preferences for ontology population. It draws on a new resource, the Pattern Dictionary of English Verbs, which lists semantic categories expected in each syntactic slot of a verb pattern. Knowledge of semantic preferences is used to drive and control bootstrapped pattern extraction techniques on the EnClueWeb09 corpus with the aim of identifying common nouns belonging to twelve semantic types. Evaluation reveals that syntactic patterns perform better than lexical and surface patterns, at the same time raising issues about assessing ontology population candidates out of context.
Question-Answering Systems for Romanian
Lecture Notes in Computer Science, 2007
This paper describes the development of a Question Answering (QA) system and its evaluation results in the Romanian-English cross-lingual track organized as part of the CLEF 2006 campaign. The development stages of the cross-lingual Question Answering system are described incrementally throughout the paper, at the same time pinpointing the problems that occurred and the way they were addressed. The system adheres to the classical architecture for QA systems, beginning with question processing followed, after term translation, by information retrieval and answer extraction. Besides the common QA difficulties, the track posed some specific problems, such as the lack of a reliable translation engine from Romanian into English, and the need to evaluate each module individually for a better insight into the system's failures.

Lecture Notes in Computer Science, 2007
This paper presents the participation of the University of Alicante in the WiQA pilot task organized as part of the CLEF 2006 campaign. For a given set of topics, this task presupposes the discovery of important novel information distributed across different Wikipedia entries. The approach we adopted for solving this task uses Information Retrieval, query expansion by feedback, relevance and novelty re-ranking, as well as temporal ordering. Our system participated in both the Spanish and English monolingual tasks. In both cases the results are promising because, by employing a language-independent approach, we obtain scores above the average. Moreover, in the case of Spanish, our result is very close to the best achieved score. Apart from introducing our system, the present paper also provides an in-depth result analysis, and proposes future lines of research, as well as follow-up experiments.
Lecture Notes in Computer Science
This paper reports on the participation of the University of Wolverhampton in the Multiple Language Question Answering (QA@CLEF) track of the CLEF 2007 campaign. We approached the Romanian to English cross-lingual task with a Question Answering (QA) system that processes a question in the source language (i.e. Romanian), translates the identified keywords into the target language (i.e. English), and finally searches for answers in the English document collection. We submitted one run of our system, which achieved an overall accuracy of 14% and a precision over non-NIL answers of 33.73%. Error analysis revealed that this low performance is mainly due to the lack of a reliable translation methodology from the source into the target language.

Lecture Notes in Computer Science, 2007
This paper presents Wolverhampton University's participation in the WiQA competition. The method chosen for this task combines a high-precision but low-recall information retrieval approach with a greedy sentence ranking algorithm. The high-precision retrieval is ensured by querying the search engine with the exact topic, in this way obtaining only sentences which contain the topic. In one of the runs, the set of retrieved sentences is expanded using coreferential relations between sentences. The greedy algorithm used for ranking selects one sentence at a time, always the one which adds most information to the set of sentences without repeating the existing information too much. The evaluation revealed that it achieves a performance similar to other systems participating in the competition and that the run which uses coreference obtains the highest MRR score among all the participants.
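The greedy ranking step described in this abstract can be illustrated with a minimal sketch. The abstract does not say how "information" or repetition is measured, so the word-overlap score below, the `redundancy_penalty` parameter and the function name `greedy_rank` are all assumptions made for illustration, not the authors' implementation.

```python
def greedy_rank(sentences, redundancy_penalty=0.5):
    """Order sentences so that each pick adds the most unseen words.

    Assumed scoring (not taken from the paper): the gain of a sentence is
    the number of words not yet covered, minus a penalty for each word
    that repeats already selected material.
    """
    # Pair each sentence index with its lower-cased bag of words.
    remaining = [(i, set(s.lower().split())) for i, s in enumerate(sentences)]
    seen = set()   # words covered by the sentences selected so far
    order = []     # indices of the sentences in selection order

    while remaining:
        def gain(item):
            _, words = item
            new = len(words - seen)
            repeated = len(words & seen)
            return new - redundancy_penalty * repeated

        # Greedy step: take the single best sentence given what is covered.
        best = max(remaining, key=gain)
        remaining.remove(best)
        seen |= best[1]
        order.append(best[0])

    return [sentences[i] for i in order]
```

With this toy score, a sentence that mostly repeats earlier picks is pushed towards the end of the ranking, which mirrors the behaviour the abstract describes; a real system would likely use a richer notion of novelty than raw word overlap.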
Proceedings of the 4th International Workshop on Semantic Evaluations - SemEval '07, 2007
This paper reports on the participation of the University of Wolverhampton and the University of Alicante in the SemEval-2007 TempEval evaluation exercise. TempEval consisted of three tasks involving the identification of event-time and event-event temporal relations. We participated in all three tasks with TICTAC (Syntactico-Semantic Temporal Annotation Cluster), a system comprising both knowledge-based and statistical techniques. Our system achieved the highest strict and relaxed scores for tasks A and B, and the highest relaxed score for task C.

Lecture Notes in Computer Science, 2006
This paper describes a machine learning approach to the identification of temporal clauses by disambiguating the subordinating conjunctions used to introduce them. Temporal clauses are regularly marked by subordinators, many of which are ambiguous, being able to introduce clauses of different semantic roles. The paper also describes our work on generating an annotated corpus of sentences embedding clauses introduced by ambiguous subordinators that might have temporal value. Each such clause is annotated as temporal or non-temporal by testing whether it answers the questions when, how often or how long with respect to the action of its superordinate clause. Using this corpus, we then train and evaluate personalised classifiers for each ambiguous subordinator, in order to set apart temporal usages. Several classifiers are evaluated, and the best performing ones achieve an average accuracy of 89.23% across the set of ambiguous connectives. Currently on research leave from University of Wolverhampton, United Kingdom.

This article presents the participation of the University of Wolverhampton in the Romanian to English Question Answering task at CLEF-2008. This year we employed a modular framework which allows different modules to be easily plugged in and customised. The main components of our system deal with the three standard stages used in question answering: question processing, paragraph retrieval and answer extraction, and the system's cross-linguality is ensured by a term translator. The question processor analyses Romanian questions and produces a detailed representation of each question including the terms it contains. English translations are then generated for all question terms by exploiting information included in the Romanian and English WordNets, as well as aligned Wikipedia pages. They form the query that Lucene uses to extract English paragraphs which constitute the input for an answer extractor largely based on the one distributed with the OpenEphyra framework. The results indicate a small improvement in comparison with last year's performance.