2008, Proceedings of the 17th ACM conference on Information and knowledge management
Most geographic information retrieval systems depend on the detection and disambiguation of place names in documents, assuming that the documents with a specific geographic scope contain explicit place names in the text that are strongly related to the document scopes. However, some non-geographic names such as companies, monuments or sport events, may also provide indirect relevant evidence that can significantly contribute to the assignment of geographic scopes to documents. In this paper, we analyze the amount of implicit and explicit geographic evidence in newspaper documents, and measure its impact on geographic information retrieval by evaluating the performance of a retrieval system using the GeoCLEF evaluation data.
2009
Abstract. For the 2008 participation at GeoCLEF, we focused on improving the extraction of geographic signatures from documents and optimising their use for GIR. The results show that the detection of explicit geographic named entities for including their terms in a tuned weighted index field significantly improves retrieval performance when compared to classic text retrieval.
Foundations and Trends® in Information Retrieval, 2018
Significant amounts of information available today contain references to places on earth. Traditionally such information has been held as structured data and was the concern of Geographic Information Systems (GIS). However, increasing amounts of data in the form of unstructured text are available for indexing and retrieval that also contain spatial references. This monograph describes the field of Geographic Information Retrieval (GIR), which seeks to develop spatially-aware search systems and support users' geographical information needs. Important concepts with respect to storing, querying and analysing geographical information in computers are introduced, before user needs and interaction in the context of GIR are explored. The task of associating documents with coordinates, prior to their indexing and ranking, forms the core of any GIR system, and different approaches and their implications are discussed. Evaluating the resulting systems and their components, and different paradigms for doing so, continue to be an important area of research in GIR and are illustrated through a number of examples. The article concludes by setting out a range of future challenges for research in this field.
2007
We briefly present requirements and a methodology of semantic annotation for automatic indexing and geo-referencing of text documents. The first evaluation results show that combining a spatial approach with a classical (statistical) IR one significantly improves retrieval accuracy, particularly in the case of "realistic" queries.
Geographical Information Retrieval (GIR) is an important process for finding data that answers queries about geographic information. Processing ordinary text is less demanding, and data can be retrieved efficiently; many algorithms are available for ordinary text retrieval. Retrieving geospatial data, by contrast, is highly complex and requires additional operations, since geospatial information contains more complex details than general information, for example location and direction. To handle geographic queries, we propose the Mass Probability on Document Correlation (MPDC) approach. This approach first sorts the geographic features from text that satisfy the given queries. Existing text classification methods are unsuitable for geospatial text classification because of the relatedness of geographic features. Depending on the MPDC result, we predict the overlap of the feature set for a document. Based on overlap and document correlation, the documents are ranked. Highly relevant documents are extracted according to the score obtained through ranking; documents with high scores are considered the most relevant. The experimental results show that our proposed method efficiently retrieves the list of relevant documents.
My thesis aims to augment the Geographic Information Retrieval process with information extracted from world knowledge. This aim is approached from three directions: classifying world knowledge, disambiguating placenames and modelling users. Geographic information is becoming ubiquitous across the Internet, with a significant proportion of web documents and web searches containing geographic entities, and the proliferation of Internet enabled mobile devices. Traditional information retrieval treats these geographic entities in the same way as any other textual data. In this thesis I augment the retrieval process with geographic information, and show how methods built upon world knowledge outperform methods based on heuristic rules. The source of world knowledge used is Wikipedia. Wikipedia has become a phenomenon of the Internet age and needs little introduction. As a linked corpus of semi-structured data, it is unsurpassed. Two approaches to mining information from Wikipedia are rigorously explored: initially I classify Wikipedia articles into broad categories; this is followed by much finer classification where Wikipedia articles are disambiguated as specific locations. The thesis concludes with the proposal of the Steinberg hypothesis: By analysing a range of wikipedias in different languages I demonstrate that a localised view of the world is ubiquitous and inherently part of human nature. All people perceive closer places as larger and more important than distant ones. The core contributions of my thesis are in the areas of extracting information from Wikipedia, supervised placename disambiguation, and providing a quantitative model for how people view the world. The findings clearly have a direct impact for applications such as geographically aware search engines, but in a broader context documents can be automatically annotated with machine readable meta-data and dialogue enhanced with a model of how people view the world. This will reduce ambiguity and confusion in dialogue between people or computers.
International Journal of Geographical Information Science, 2007
Metonymically used location names (toponyms) refer to other, related entities and thus possess a meaning different from their literal, geographic sense. Metonymic uses are to be treated differently to improve the performance of geographic information retrieval (GIR). Statistics on toponym senses show that 75.06% of all location names are used in their literal sense, 17.05% are used metonymically, and 7.89% have a mixed sense. This article presents a method for disambiguating location names in texts between literal and metonymic senses, based on shallow features. The evaluation of this method is two-fold. First, we use a memory-based learner (TiMBL) to train a classifier and determine standard evaluation measures such as F-score and accuracy. The classifier achieved an F-score of 0.842 and an accuracy of 0.846 for identifying toponym senses in a subset of the CoNLL (Conference on Natural Language Learning) data. Second, we perform retrieval experiments based on the GeoCLEF data (newspaper article corpus and queries) from 2005 and 2006. We compare searching location names in a database index containing both their literal and metonymic senses with searching in an index containing their literal senses only. Evaluation results indicate that removing metonymic senses from the index yields a higher mean average precision (MAP) for GIR. In total, we observed a significant gain in MAP: an increase from 0.0704 to 0.0715 MAP for the GeoCLEF 2005 data, and an increase from 0.1944 to 0.2100 MAP for the GeoCLEF 2006 data.
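For readers unfamiliar with the retrieval measure reported above, the following is a minimal sketch of how mean average precision (MAP) is commonly computed over a set of queries. It is not the evaluation code used in the article; the function names and the toy document identifiers are assumptions made purely for illustration.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranking: Sequence[str], relevant: Set[str]) -> float:
    """Average precision of one ranked result list against its relevant set."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id in relevant:
            hits += 1
            # Precision at the rank of each relevant document retrieved.
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Mean of the per-query average precision values."""
    scores = [average_precision(ranking, relevant) for ranking, relevant in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: two queries with made-up document identifiers.
toy_runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6"], {"d6"}),
]
toy_map = mean_average_precision(toy_runs)
```

Comparing two index variants, as the article does, would amount to computing this value once per variant over the same queries and relevance judgments.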
Geographic location is a key component for information retrieval on the Web, recommendation systems in mobile computing and social networks, and place-based integration on the Linked Data cloud. Previous work has addressed how to estimate locations by named entity recognition, from images, and via structured data. In this paper, we estimate geographic regions from unstructured, non geo-referenced text by computing a probability distribution over the Earth's surface. Our methodology combines natural language processing, geostatistics, and a data-driven bottom-up semantics. We illustrate its potential for mapping geographic regions from non geo-referenced text.
Encyclopedia of Database Systems, 2009
A simple definition is that gazetteers are dictionaries of placenames. The digital gazetteer as a component of georeferenced information systems, however, is more formally modeled. A gazetteer is defined as a collection of gazetteer entries, each of which contains, at a minimum, the tuple ⟨N, F, T⟩, where N is a place name, F is a formal expression of geographic location (a footprint), and T is a place type expressed with a term (or code) from a typing scheme. Applications often require, in addition, relationships between gazetteer entries, documentation of time frames, and additional information (as described below). The gazetteer model is a type of knowledge organization system (KOS), or ontology, which can be modified to represent other classes of spatial-temporal information, such as named time periods and named events [3]. Gazetteers support bidirectional translation between informal georeferencing using names (e.g., Las Vegas) and formal georeferencing using coordinates (e.g.,
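The entry model described in this abstract (name N, footprint F, type T) and the bidirectional translation between names and coordinates can be sketched roughly as follows. This is only an illustration under stated assumptions: footprints are simplified to a single latitude/longitude point, the class and method names are hypothetical, and the coordinates are approximate values chosen for the example rather than taken from any gazetteer.

```python
from dataclasses import dataclass
from math import asin, cos, radians, sin, sqrt
from typing import Dict, Iterable, List, Optional, Tuple

Point = Tuple[float, float]  # (latitude, longitude) in decimal degrees


@dataclass(frozen=True)
class GazetteerEntry:
    name: str         # N: the place name
    footprint: Point  # F: simplified here to one point (an assumption)
    place_type: str   # T: a term from some typing scheme


def haversine_km(a: Point, b: Point) -> float:
    """Great-circle distance between two points, in kilometres."""
    lat1, lon1, lat2, lon2 = map(radians, (*a, *b))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))


class Gazetteer:
    def __init__(self, entries: Iterable[GazetteerEntry]) -> None:
        self._entries: List[GazetteerEntry] = list(entries)
        self._by_name: Dict[str, List[GazetteerEntry]] = {}
        for entry in self._entries:
            # One name may map to several places, so keep a list per name.
            self._by_name.setdefault(entry.name.lower(), []).append(entry)

    def lookup_name(self, name: str) -> List[GazetteerEntry]:
        """Name -> entries (informal to formal georeferencing)."""
        return list(self._by_name.get(name.lower(), []))

    def nearest(self, lat: float, lon: float) -> Optional[GazetteerEntry]:
        """Coordinates -> closest entry (formal to informal georeferencing)."""
        if not self._entries:
            return None
        return min(self._entries, key=lambda e: haversine_km((lat, lon), e.footprint))


# Approximate, illustrative coordinates only.
toy_gazetteer = Gazetteer([
    GazetteerEntry("Las Vegas", (36.17, -115.14), "populated place"),
    GazetteerEntry("Reno", (39.53, -119.81), "populated place"),
])
```

A fuller model would also carry the relationships between entries and the time frames that the abstract mentions; they are left out here to keep the sketch short.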
2008
In this paper we present an evaluation resource for geographic information retrieval developed within the Cross Language Evaluation Forum (CLEF). The GeoCLEF track is dedicated to the evaluation of geographic information retrieval systems. The resource encompasses more than 600,000 documents, 75 topics so far, and more than 100,000 relevance judgments for these topics. Geographic information retrieval requires an evaluation resource which represents realistic information needs and which is geographically challenging. Some experimental results and analysis are reported.
2004
The Geographic Information Task (GeoTask) is one of the newly proposed tasks at the NTCIR-4 WEB. Geographic information is close to our daily lives, and is one of the practical ways to access Web information. Research and development in this area have been increasing recently; however, comparative evaluations of such techniques have not been carried out so far. GeoTask focuses on technology by which a system extracts geographic information from Web documents relevant to a given viewpoint. The aim of this workshop is to expedite and advance research and development of Geographic IR technologies for the Web; we are therefore going to build a reusable test collection for evaluating various methods of Geographic IR for Web documents. In this paper, challenges for searching geographic information are described and discussed through an explanation of our research on developing geographic IR systems.
ACM SIGIR Forum, 2004
Geographic Information Retrieval is fast emerging as an interdisciplinary hot topic, both in an academic and commercial sense. Retrieving data based not only on conceptual key words, but also on some notion of the locational relevance of the information, requires research into a range of techniques, for example:
• the extraction of geographic terms from structured and, more challengingly, unstructured data;
• the identification and removal of ambiguities in such extraction procedures;
• methodologies for efficiently storing information about locations and their relationships;
• development of search engines and algorithms to take advantage of such geographic information;
• the combination of geographic and contextual relevance to give a meaningful combined relevance to documents; and
• techniques to allow the user to interact with and explore the results of queries to a geographically-aware IR system.
2010
Abstract. Geographic Information Retrieval (GIR) systems rely on the identification and disambiguation of place names in documents to determine the region to which they are relevant. The place names are mapped into geographic concepts and used to assign an encompassing concept (a scope) to each document. However, sometimes a single scope is too restrictive and insufficient for capturing the geographic semantics of a document.
Computers & Geosciences, 2016
Recognizing references to places in texts is needed in many applications, such as search engines, location-based social media and document classification. In this paper we present a survey of methods and techniques for the recognition and identification of places referenced in texts. We discuss concepts and terminology, and propose a classification of the solutions given in the literature. We introduce a definition of the Geographic Scope Resolution (GSR) problem, dividing it into three steps: geoparsing, reference resolution, and grounding references. Solutions to the first two steps are organized according to the method used, and solutions to the third step are organized according to the type of output produced. We found that it is difficult to compare existing solutions directly to one another, because they often create their own benchmarking data, targeted to their own problem.
Abstract. This paper reports the University of Pittsburgh's participation in GeoCLEF 2008. As first-time participants, we only worked on the monolingual GeoCLEF task and submitted four runs under two different methods. Our GCEC method aims to test the effectiveness of our online geographic coordinate extraction and clustering algorithm, and our WIKIGEO method aims to examine the usefulness of the geo-coordinate information in Wikipedia for identifying geo-locations. Our experimental results show that: 1) our online geographic ...
2006
Abstract. The processing required for geographic information retrieval includes many steps that are common to all forms of information retrieval, e.g., stopword filtering, stemming, vocabulary enrichment, understanding Booleans, and fluff removal. Only a few steps, in particular the detection of geographic entities and the assignment of bounding boxes to these, are specific to geographic IR.
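The geographic-specific step named in this abstract, assigning bounding boxes to detected entities, can be illustrated with a small sketch. It is not the authors' implementation: the `BBox` class and its methods are hypothetical, coordinates are plain decimal degrees, and boxes that cross the antimeridian are deliberately ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box in decimal degrees (no antimeridian handling)."""
    west: float
    south: float
    east: float
    north: float

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap when their extents overlap on both axes.
        return (
            self.west <= other.east
            and other.west <= self.east
            and self.south <= other.north
            and other.south <= self.north
        )

    def union(self, other: "BBox") -> "BBox":
        # Smallest box covering both, e.g. to summarise several entities.
        return BBox(
            min(self.west, other.west),
            min(self.south, other.south),
            max(self.east, other.east),
            max(self.north, other.north),
        )


# Hypothetical boxes for two detected entities and one query region.
entity_a = BBox(0.0, 0.0, 10.0, 10.0)
entity_b = BBox(5.0, 5.0, 15.0, 15.0)
query_box = BBox(20.0, 20.0, 30.0, 30.0)
document_box = entity_a.union(entity_b)
```

Under this simplification, a document whose combined box does not intersect the query box could be filtered out before the ordinary text ranking is applied.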
2007
Location indicators are text segments from which a geographic scope can be inferred; for example, adjectives, demonyms (names for inhabitants of a place), geographic codes, orthographic variants, and abbreviations can be mapped to location names in one or more inferential steps. In this paper, the normalization of location indicators and the treatment of their morphology for geographic information retrieval (GIR) within the system GIRSA (Geographic Information Retrieval by Semantic Annotation) are explored.
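The idea of mapping an indicator to a location name "in one or more inferential steps" can be sketched as a chain of lookups. The table below is a hypothetical toy example, not GIRSA's resources or algorithm, and `normalize_indicator` is an assumed name used only for illustration.

```python
from typing import Dict, List

# Hypothetical rewrite table: each key maps to the next, more canonical form.
INDICATOR_STEPS: Dict[str, str] = {
    "u.k.": "uk",               # orthographic variant -> abbreviation
    "uk": "united kingdom",     # abbreviation -> location name
    "british": "united kingdom",  # adjective -> location name
    "briton": "united kingdom",   # demonym -> location name
    "german": "germany",
    "germans": "germany",
}


def normalize_indicator(
    token: str,
    steps: Dict[str, str] = INDICATOR_STEPS,
    max_steps: int = 5,
) -> List[str]:
    """Follow rewrite steps from a token and return the full inference path.

    The last element is the most normalized form reached. A path of length 1
    means no rewrite applied (the token is unknown or already canonical).
    """
    current = token.strip().lower()
    path = [current]
    for _ in range(max_steps):
        nxt = steps.get(current)
        if nxt is None or nxt in path:  # stop at a fixed point or a cycle
            break
        path.append(nxt)
        current = nxt
    return path
```

Returning the whole path rather than only the final name keeps the number of inferential steps visible, which a system could use to weight indirect evidence lower than an explicit place name.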
2006
This paper presents the MIRACLE team's 2005 approach to Cross-Language Geographical Retrieval (GeoCLEF). The main goal of the MIRACLE team's GeoCLEF participation was to test the effect that geographical information retrieval techniques have on information retrieval. The baseline approach is based on the development of named entity recognition and geospatial information retrieval tools and on their combination with linguistic techniques to perform indexing and retrieval tasks.
International Journal on Digital Libraries, 2010
Search engines for Digital Libraries allow users to retrieve documents according to their contents. They process documents without differentiating the manifold aspects of information. Spatial and temporal dimensions are particularly neglected. These dimensions are, however, of great interest for users of search engines targeting either the Web or specialized Digital Libraries. Recent studies reported that nearly 20% of queries convey spatial and temporal information in addition to topical information. These three dimensions were referred to as parts of 'geographic information.' In the literature, search engines handling those dimensions are called 'Geographic Information Retrieval (GIR) systems.' Although several initiatives for evaluating GIR systems were undertaken, none was concerned with evaluating these three dimensions altogether. In this article, we address this issue by designing an evaluation framework, whose usefulness is highlighted through a case study involving a test collection and a GIR system. This framework allowed the comparison of our GIR system to state-of-the-art topical approaches. We also performed experiments for measuring performance improvement stemming from each dimension or their combination. We show that combining the three dimensions yields improvement in effectiveness (+73.9%) over a common topical baseline. Moreover, rather than conveying redundancy, the three dimensions complement each other.