Papers by Alessio Palmero Aprosio

Proceedings of the 1st Workshop on NLP & DBpedia (ISWC), 2013
DBpedia is a Semantic Web project that aims to extract structured data from Wikipedia articles. Due to the increasing number of resources linked to it, DBpedia plays a central role in the Linked Open Data community. Currently, the information contained in DBpedia is mainly collected from Wikipedia infoboxes, sets of subject-attribute-value triples that summarize the Wikipedia page. Infoboxes are compiled manually by Wikipedia contributors, and more than 50% of Wikipedia articles lack one. In this article, we use the distant supervision paradigm to extract the missing information directly from the Wikipedia article text, using a Relation Extraction tool trained on the information already present in DBpedia. We evaluate our system on a data set covering seven DBpedia properties, demonstrating the suitability of the approach for extending DBpedia coverage.
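As a rough illustration of the distant-supervision step described above, the sketch below uses known DBpedia triples to silver-label sentences from the corresponding Wikipedia article and trains a classifier on those noisy labels. This is a minimal sketch, not the authors' pipeline: the triples, sentences, crude surface-form matching, and the scikit-learn model are all illustrative stand-ins.

```python
# Minimal distant-supervision sketch: known (subject, property, object)
# triples label sentences of the subject's article; a classifier is then
# trained on those noisy labels. All data here is toy data.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

seed_triples = [
    ("Alan_Turing", "birthPlace", "London"),
    ("Alan_Turing", "almaMater", "Princeton University"),
]

articles = {
    "Alan_Turing": [
        "Turing was born in London while his father was on leave.",
        "Turing obtained his PhD from Princeton University in 1938.",
        "Turing is widely considered the father of computer science.",
    ],
}

def label_sentences(triples, articles):
    """A sentence mentioning both the subject's surface form and the object
    value becomes a (noisy) positive example for the property."""
    X, y = [], []
    for subj, prop, obj in triples:
        surface = subj.replace("_", " ").split()[-1]  # crude surface form
        for sent in articles.get(subj, []):
            if surface in sent and obj in sent:
                X.append(sent)
                y.append(prop)
    return X, y

X, y = label_sentences(seed_triples, articles)
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(X, y)

# The model can now suggest property values for articles lacking an infobox.
print(model.predict(["She was born in Paris in 1920."]))  # e.g. ['birthPlace']
```

In practice the surface-form matching and the feature set would be far richer, but this silver-labeling trick is the core of the distant-supervision idea.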

Proceedings of the 12th International Semantic Web Conference, 2013
DBpedia is a large-scale knowledge base that exploits Wikipedia as its primary data source. The extraction procedure requires Wikipedia infoboxes to be manually mapped onto the DBpedia ontology. Thanks to crowdsourcing, a large number of infoboxes have been mapped in the English DBpedia, and the same procedure has been applied to other languages to create localized versions of DBpedia. However, the number of completed mappings is still small and limited to the most frequent infoboxes; moreover, mappings need maintenance because Wikipedia articles change constantly and quickly. In this paper, we focus on the problem of automatically mapping infobox attributes to properties of the DBpedia ontology, both to extend the coverage of the existing localized versions and to build versions from scratch for languages not yet covered. The evaluation has been performed on the Italian mappings: we compared our results with the current mappings on a random sample re-annotated by the authors. We report results comparable to those obtained by a human annotator in terms of precision, while our approach leads to a significant improvement in recall and speed. Specifically, we mapped 45,978 Wikipedia infobox attributes to DBpedia properties in 14 languages for which mappings were not yet available. The resource is made available in an open format.
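To make the attribute-mapping idea concrete, here is one possible heuristic, sketched under the assumption that an infobox attribute can be aligned to the DBpedia property whose known values it most often agrees with. This value-overlap strategy, the toy facts, and the threshold are illustrative; they are not necessarily the method used in the paper.

```python
# Map a raw infobox attribute to the DBpedia property sharing the largest
# fraction of (entity, value) pairs with it. All facts are invented.
from collections import Counter

infobox_facts = [            # (entity, attribute, value) from raw infoboxes
    ("Rome", "luogo", "Italy"),
    ("Milan", "luogo", "Italy"),
    ("Turin", "luogo", "Italy"),
]
dbpedia_facts = [            # (entity, property, value) already in DBpedia
    ("Rome", "country", "Italy"),
    ("Milan", "country", "Italy"),
    ("Turin", "region", "Piedmont"),
]

def map_attribute(attr, infobox_facts, dbpedia_facts, threshold=0.5):
    """Return the property whose values overlap most with `attr`,
    or None when the overlap stays below `threshold`."""
    db_index = {(e, v): p for e, p, v in dbpedia_facts}
    votes, total = Counter(), 0
    for entity, attribute, value in infobox_facts:
        if attribute != attr:
            continue
        total += 1
        prop = db_index.get((entity, value))
        if prop:
            votes[prop] += 1
    if not votes:
        return None
    best, count = votes.most_common(1)[0]
    return best if count / total >= threshold else None

print(map_attribute("luogo", infobox_facts, dbpedia_facts))  # -> country
```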

Proceedings of the 13th International Conference on Knowledge Management and Knowledge Technologies, 2013
DBpedia is a Semantic Web resource that aims to represent Wikipedia in RDF triples. Due to the large and growing number of resources linked to it, DBpedia has become central to the Semantic Web community. The English version currently covers around 1.7M Wikipedia pages, yet the English Wikipedia contains almost 4M pages, so there is a substantial coverage problem (even bigger in other languages). Coverage increases slowly thanks to the manual effort of various local communities, who map Wikipedia templates onto DBpedia ontology classes and then run the open-source software provided by the DBpedia community to extract the triples. In this paper, we present an approach to map templates automatically, and we release the resulting resource in 25 languages. We describe the algorithm, which starts from the existing mappings in other languages and extends them using the cross-lingual information available in Wikipedia. We evaluate our system on the mappings of a set of languages already included in DBpedia (but not used during the training phase), demonstrating that our approach replicates the human mappings with high precision and recall and produces an additional set of mappings not included in the original DBpedia.
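The cross-lingual extension can be pictured as a simple voting scheme: a template not yet mapped inherits the ontology class chosen by its cross-language counterparts that are already mapped. The sketch below illustrates this under assumed toy mappings and links; the algorithm in the paper is more elaborate.

```python
# Propagate template-to-class mappings across languages through
# Wikipedia cross-language links; mappings and links are toy data.
from collections import Counter

known_mappings = {               # (language, template) -> DBpedia class
    ("en", "Infobox_person"): "Person",
    ("de", "Infobox_Person"): "Person",
}
cross_links = {                  # interlanguage links between templates
    ("it", "Bio"): [("en", "Infobox_person"), ("de", "Infobox_Person")],
}

def infer_class(lang, template, min_votes=2):
    """Assign the class preferred by the mapped cross-lingual counterparts;
    return None when the evidence is insufficient."""
    votes = Counter(
        known_mappings[link]
        for link in cross_links.get((lang, template), [])
        if link in known_mappings
    )
    if not votes:
        return None
    cls, count = votes.most_common(1)[0]
    return cls if count >= min_votes else None

print(infer_class("it", "Bio"))  # -> Person
```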

Proceedings of the 10th Extended Semantic Web Conference, 2013
DBpedia is a project that aims to represent Wikipedia content in RDF triples. It plays a central role in the Semantic Web, due to the large and growing number of resources linked to it. Nowadays, the English version covers around 1.7M Wikipedia pages, although the English Wikipedia contains almost 4M pages, showing a clear coverage problem. In other languages (such as French and Spanish) the coverage is even lower. The objective of this paper is therefore to define a methodology to increase the coverage of DBpedia in different languages. The main problems to solve are the high number of classes in the DBpedia ontology and the lack of coverage for some classes in certain languages. To deal with these problems, we first extend the population of the classes for the different languages by connecting the corresponding Wikipedia pages through cross-language links. Then, we train a supervised classifier using this extended set as training data. We evaluated our system on a manually annotated test set, demonstrating that our approach can expand DBpedia with high precision (90%) and a recall of 50%. The resulting resource will be made available through a SPARQL endpoint and a downloadable package.
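The classification step can be sketched as follows: pages already typed in DBpedia (with the training set enlarged through cross-language links) train a text classifier that assigns an ontology class to uncovered pages. The features and model below are assumptions for illustration, not the configuration reported in the paper.

```python
# Train on pages whose DBpedia class is known, then type uncovered pages.
# Texts, classes, and the bag-of-words/Naive Bayes setup are illustrative.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "Italian painter of the Renaissance period",
    "French novelist and playwright",
    "River in northern Italy flowing into the Adriatic",
    "Longest river in France",
]
train_classes = ["Person", "Person", "River", "River"]

classifier = make_pipeline(CountVectorizer(), MultinomialNB())
classifier.fit(train_texts, train_classes)

# Predict a DBpedia class for a page not yet covered.
print(classifier.predict(["Spanish poet of the Generation of '27"]))  # ['Person']
```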

Intelligenza Artificiale, 2012
The Semantic Web is an extension of the classical web: the data and schemas it adds coexist with the documents that were already linked and available. This not only allows interoperability, reusability, and potentially unforeseen applications of open data, but also creates a unique situation in which huge collections of the same information are available on the web simultaneously as text and as structured data. An interesting example is the Wikipedia-DBpedia pair: exploiting these interlinked structured and unstructured data sources in parallel offers great potential for both Natural Language Processing and Semantic Web applications. Starting from these observations, this paper addresses the problem of enhancing interactions between non-expert users and data available on the Web. In particular, we present QAKiS, a system for open-domain Question Answering over linked data, which addresses the problem of question interpretation as a relation-based match: fragments of the question are matched to binary relations of the triple store, using automatically collected relational textual patterns. In the current version, the relational patterns are automatically extracted from Wikipedia, while DBpedia is the data set queried through a natural language interface.
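The relation-based match at the core of QAKiS can be illustrated with a toy pattern repository: the fixed parts of a pattern are located in the question, and the slots recover the entity. The patterns, slot syntax, and regex matching below are simplified assumptions, not the system's actual machinery.

```python
# Match a question against textual patterns for DBpedia relations.
# Patterns are invented; [X]/[Y] mark subject and object slots.
import re

patterns = {
    "birthPlace": ["where was [X] born", "[X] was born in [Y]"],
    "spouse": ["who is [X] married to", "[X] married [Y]"],
}

def match_relation(question):
    """Return (relation, subject) for the first pattern whose fixed parts
    all occur in the question, or None if nothing matches."""
    q = question.lower().rstrip("?")
    for relation, plist in patterns.items():
        for pattern in plist:
            regex = re.escape(pattern.lower())
            regex = regex.replace(re.escape("[x]"), "(?P<subj>.+)")
            regex = regex.replace(re.escape("[y]"), ".+")
            m = re.search(regex, q)
            if m:
                return relation, m.group("subj").strip()
    return None

print(match_relation("Where was Alan Turing born?"))
# -> ('birthPlace', 'alan turing')
```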
Proceedings of the ISWC 2012 Posters & Demonstrations Track, 2012
We present QAKiS, a system for open-domain Question Answering over linked data. It addresses the problem of question interpretation as a relation-based match, where fragments of the question are matched to binary relations of the triple store using automatically collected relational textual patterns. For the demo, the relational patterns are automatically extracted from Wikipedia, while DBpedia is the RDF data set queried through a natural language interface.
Proceedings of the ESWC 2012 workshop Interacting with Linked Data, 2012
We present QAKiS, a system for Question Answering over linked data (in particular, DBpedia). The problem of question interpretation is addressed as the automatic identification of the set of relevant relations between entities in the natural-language input question, matched against a repository of automatically collected relational patterns (the WikiFramework repository). Such patterns represent possible lexicalizations of ontological relations and are associated with a SPARQL query derived from the linked-data relational patterns. Wikipedia is used as the source of free text for the automatic extraction of the relational patterns, and DBpedia as the linked-data resource that provides the relational patterns and is queried through a natural language interface.
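Once a relation and a subject have been identified, the final step is to generate the SPARQL query sent to the endpoint. The sketch below shows that step, with a naive label-to-URI conversion standing in for real entity linking; the query template is a simplified assumption, not the one derived by the system.

```python
# Build a SPARQL query for a matched (relation, subject) pair.
# The crude label-to-URI step is a placeholder for entity linking.
def build_sparql(relation, subject_label):
    uri = ("http://dbpedia.org/resource/"
           + subject_label.title().replace(" ", "_"))
    return (
        "SELECT ?answer WHERE { "
        f"<{uri}> <http://dbpedia.org/ontology/{relation}> ?answer . }}"
    )

print(build_sparql("birthPlace", "alan turing"))
```

A library such as SPARQLWrapper could then submit the generated query to the public DBpedia endpoint and read the answer bindings back.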
We all read it as children. Perhaps it was told to us, and we later reread it as adults, as fathers or mothers, uncles or grandparents. We remember the Walt Disney version of it, and we have probably seen it again in some television adaptation. It is Pinocchio, the masterpiece par excellence of children's literature, the only nineteenth-century Italian text to be known and established throughout the world (not even I promessi sposi has achieved such fame). And yet probably none of us had ever thought ...
Before the advent of the internet, the science of sending messages safe from prying eyes was confined to purely military purposes; indeed, the discoverers of new secure communication systems were very often obliged to keep them secret, with the inevitable consequence that their discoveries went uncredited. A similar fate befell Alan Turing, the founder of modern computer science, who during the Second World War managed to decipher the messages of the German forces, but about whom nothing was known until the 1970s.