Papers by Caterina Caracciolo
In this paper we present an original approach to natural language query interpretation which has been implemented within the FuLL (Fuzzy Logic and Language) Italian project of BC S.r.l. In particular, we discuss here the creation of linguistic and ontological resources, together with the exploitation of existing ones, for natural language-driven database access and retrieval. Both the database and the queries we experiment with are in Italian, but the methodology we propose extends naturally to other languages.
... ISBN: 978–90–5776–176–8 ... To the places of life and to the bicycles that pass through them. ... Acknowledgments ... A special thanks to Arjen de Vries for generously reading early versions of different chapters of my thesis and commenting on them. ...
... We thank Joost Kircz and the referees for helpful comments and suggestions. Institute for Logic, Language and Computation, University of Amsterdam, Nieuwe Achtergracht 166, 1018WV Amsterdam, The Netherlands. {caterina,mdr}@science ...

When presented with a retrieved document, users of a search engine are usually left with the task of pinning down the relevant information inside the document. Often this is done by a time-consuming combination of skimming, scrolling and Ctrl+F. In the setting of a digital library for scientific literature the issue is especially urgent when dealing with reference works, such as surveys and handbooks, as these typically contain long documents. Our aim is to develop methods for providing a “go-read-here” type of retrieval functionality, which points the user to a segment where she can best start reading to find out about her topic of interest. We examine multiple query-independent ways of segmenting texts into coherent chunks that can be returned in response to a query. Most (experienced) authors use paragraph breaks to indicate topic shifts, thus providing us with one way of segmenting documents. We compare this structural method with semantic text segmentation methods, both with respect to topical focus and relevancy. Our experimental evidence is based on manually segmented scientific documents and a set of queries against this corpus. Structural segmentation based on contiguous blocks of relevant paragraphs is shown to be a viable solution for our intended application of providing “go-read-here” functionality.
NeOn-project.org. NeOn: Lifecycle Support for Networked Ontologies. Integrated Project (IST-2005-027595). Priority: IST-2004-2.4.7 Semantic-based knowledge and content systems. D1.1.3 NeOn Formalisms for Modularization: Syntax, Semantics, Algebra ...
Ontology matching consists of finding correspondences between ontology entities. OAEI campaigns aim at comparing ontology matching systems on precisely defined test sets. Test sets can use ontologies of different nature (from expressive OWL ontologies to simple directories) and use different modalities, e.g., blind evaluation, open evaluation, consensus. OAEI-2008 builds on previous campaigns by having 4 tracks with 8 test sets followed by 13 participants. Following the trend of previous years, more participants reach the forefront. The official results of the campaign are those published on the OAEI web site.
We address the issue of providing topic-driven access to full-text documents. The methodology we propose is a combination of topic segmentation and information retrieval techniques. By segmenting the text into topic-driven segments, we obtain small and coherent documents that can be used in two ways: as a basis for automatically generating hypertext links, and as a visualization aid for the reader who is presented with a small set of focused and restricted text snippets. In the presence of a concept hierarchy, or ontology, information retrieval techniques can be used to connect the segments obtained to concepts in the ontology. In this paper we concentrate on the text segmentation phase: we describe our approach to segmentation, discuss issues related to evaluation, and report on preliminary results.
... al. (Eds.) VWF Berlin, 2002. Towards Scientific Information Disclosure Through Concept Hierarchies. Caterina Caracciolo (1), Maarten de Rijke (1), and Joost Kircz (2). (1) ILLC, University of Amsterdam, {caterina, mdr}@science.uva.nl. (2) KRA Publishing, office@kra.nl. Abstract. ...
In this paper we report on ongoing work concerning the creation of a network of ontologies based on metadata for time series relative to the domain of fisheries, and hint at the possibility of exploiting the network for web service applications. The results obtained so far show that the reengineering of classification systems stored as relational databases is possible, although some technical problems are still to be addressed.
... Soonho Kim, Marta Iglesias Sucasas, Caterina Caracciolo, Andrew Bagdanov (FAO); ... Óscar Muñoz-García, María-del-Carmen Suárez-Figueroa, Asunción Gómez-Pérez (UPM); ... Hall Milton Keynes, MK7 6AA United Kingdom Contact person: Martin Dzbor, Enrico Motta E-mail ...
International organizations like FAO are intrinsically multilingual. FAO is currently experimenting with semantic-oriented technologies based on ontologies, with the purpose of integrating data across various information systems and providing better services to end users. However, in order for these technologies to be used in real-life scenarios, models and tools for accommodating and managing multilingual data are needed. This paper analyzes the requirements for the treatment of multilinguality as resulting from the experience we gained at FAO.

Computing Research Repository, 2008
Knowledge organization systems (KOS), like thesauri and other controlled vocabularies, are used to provide subject access to information systems across the web. Due to the heterogeneity of these systems, mapping between vocabularies becomes crucial for retrieving relevant information. However, mapping thesauri is a laborious task, and thus considerable effort is being made to automate the mapping process. This paper examines two mapping approaches involving the agricultural thesaurus AGROVOC, one machine-created and one human-created. We address the basic question "What are the pros and cons of human and automatic mapping and how can they complement each other?" By pointing out the difficulties in specific cases or groups of cases and grouping the sample into simple and difficult types of mappings, we show the limitations of current automatic methods and offer some basic recommendations on which approach to use when.
Talks by Caterina Caracciolo
On Open Data: Intellectual property and Data license
Role of standard vocabularies to search for and describe Open Data, especially in the context of soil data infrastructures
agINFRA work on lifting the local values used in ISIS to published, linked vocabularies
agINFRA Soil Terms vs Agrovoc & NALT
Toward a real interoperability of soil data: SOIL.WRB