1999
Database management systems are becoming available for semistructured data; however, these tools cannot be used on many real-world data sources (e.g., most web sites) in their native form. Often, wrappers are needed to extract information and organize it into a graph structure that makes explicit the concepts users want to query and update. This paper presents a new approach to wrapper generation that exploits linguistic knowledge. The approach produces a more fine-grained parse of sources containing natural language text than previous efforts. The resulting graph-structured databases answer queries that could not be formulated in databases produced by previously generated wrappers. In addition, our approach may be more robust in the face of slight variations in word choice and order. We discuss a prototype implementation, lessons learned to date, evaluation issues, and future research directions.
Proceedings of the 11th international conference on Extending database technology Advances in database technology - EDBT '08, 2008
In the classical database world, information access has been based on a paradigm that involves structured, schema-aware, queries and tabular answers. In the current environment, however, where information prevails in most activities of society, serving people, applications, and devices in dramatically increasing numbers, this paradigm has proved to be very limited. On the query side, much work has been done on moving towards keyword queries over structured data. In our previous work, we have touched the other side as well, and have proposed a paradigm that generates entire databases in response to keyword queries. In this paper, we continue in the same direction and propose synthesizing textual answers in response to queries of any kind over structured data. In particular, we study the transformation of a dynamically-generated logical database subset into a narrative through a customizable, extensible, and template-based process. In doing so, we exploit the structured nature of database schemas and describe three generic translation modules for different formations in the schema, called unary, split, and join modules. We have implemented the proposed translation procedure into our own database front end and have performed several experiments evaluating the textual answers generated as several features and parameters of the system are varied. We have also conducted a set of experiments measuring the effectiveness of such answers on users. The overall results are very encouraging and indicate the promise that our approach has for several applications.
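The module names (unary, join) below follow the abstract's terminology, but the templates, schema, and data are invented illustrations, not the authors' actual rules. A minimal sketch of rendering tuples from a logical database subset as narrative text:

```python
# Hypothetical template-based translation: one module renders a single
# relation's tuple, another renders two tuples connected by a join.

def unary_template(table, row):
    # Render one tuple as a sentence listing its attributes.
    attrs = ", ".join(f"{k} {v}" for k, v in row.items())
    return f"There is a {table} with {attrs}."

def join_template(left, right, link):
    # Render two joined tuples as a connected sentence; the linking
    # phrase would come from a customizable template library.
    return f"{left['name']} {link} {right['name']}."

movie = {"name": "Casablanca", "year": 1942}
director = {"name": "Michael Curtiz"}
print(unary_template("movie", movie))
print(join_template(movie, director, "was directed by"))
```

A real system would select among many such templates per schema formation; this sketch only shows the fill-in step.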
CoopIS, 1997
To simplify the task of obtaining information from the vast number of information sources that are available on the World Wide Web (WWW), we are building tools for constructing information mediators that extract and integrate data from multiple Web sources. In a mediator-based approach, wrappers are built around individual information sources to provide translation between the mediator query language and the individual source. We present an approach for semi-automatically generating wrappers for structured internet sources. The key idea is to exploit formatting information in Web pages from the source to hypothesize the underlying structure of a page. From this structure the system generates a wrapper that facilitates querying of a source and possibly integrating it with other sources. We demonstrate the ease with which we are able to build wrappers for a number of Web sources using our implemented wrapper generation toolkit.
2008
In order for agents to act on behalf of users, they will have to retrieve and integrate vast amounts of textual data on the World Wide Web. However, much of the useful data on the Web is neither grammatical nor formally structured, making querying difficult. Examples of these types of data sources are online classifieds like Craigslist and auction item listings like eBay.
ACM SIGMOD Record, 1997
With the current explosion of information on the World Wide Web (WWW), a wealth of information on many different subjects has become available on-line. Numerous sources contain information that can be classified as semistructured. At present, however, the only way to access the information is by browsing individual pages. We cannot query web documents in a database-like fashion based on their underlying structure. However, we can provide database-like querying for semi-structured WWW sources by building wrappers around these sources. We present an approach for semi-automatically generating such wrappers. The key idea is to exploit the formatting information in pages from the source to hypothesize the underlying structure of a page. From this structure the system generates a wrapper that facilitates querying of a source and possibly integrating it with other sources. We demonstrate the ease with which we are able to build wrappers for a number of internet sources in different domains using our implemented wrapper generation toolkit.
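In the spirit of such a wrapper, the sketch below uses formatting tags as landmarks to recover a page's implied record structure. The sample page, field names, and pattern are hypothetical stand-ins, not output of the actual toolkit:

```python
import re

# An invented listing page whose formatting implies a record structure:
# each record is a bold name followed by an italicized price.
PAGE = """
<b>Inn at Lambertville</b> <i>$95</i><br>
<b>Hotel du Nord</b> <i>$120</i><br>
"""

def wrap(page):
    # Hypothesized structure, expressed as a landmark pattern; the
    # result is a set of database-like tuples.
    pattern = re.compile(r"<b>(.*?)</b>\s*<i>\$(\d+)</i>")
    return [{"name": n, "price": int(p)} for n, p in pattern.findall(page)]

rows = wrap(PAGE)
# The source can now be queried like a table:
cheap = [r["name"] for r in rows if r["price"] < 100]
```

The semi-automatic part of the described approach lies in inferring such patterns from the page, which this sketch hand-codes.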
Conference on Innovative Data Systems Research, 2007
The Web contains a huge amount of text that is currently beyond the reach of structured access tools. This unstructured data often contains a substantial amount of implicit structure, much of which can be captured using information extraction (IE) algorithms. By combining an IE system with an appropriate data model and query language, we could enable structured access to all of the Web's unstructured data. We propose a general-purpose query system called the extraction database, or ExDB, which supports SQL-like structured queries over Web text. We also describe the technical challenges involved, motivated in part by our experiences with an early 90M-page prototype.
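Under one simplifying assumption — that an IE system has already produced extractions as (subject, predicate, object) tuples — the ExDB idea of SQL-like queries over Web text can be sketched as follows. The tuples themselves are invented examples:

```python
import sqlite3

# Store invented extractions in an in-memory relational table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE extractions (subj TEXT, pred TEXT, obj TEXT)")
conn.executemany(
    "INSERT INTO extractions VALUES (?, ?, ?)",
    [
        ("Edison", "invented", "phonograph"),
        ("Bell", "invented", "telephone"),
        ("Edison", "born-in", "Ohio"),
    ],
)

# A SQL-like structured query over text-derived data: who invented what?
inventors = conn.execute(
    "SELECT subj, obj FROM extractions WHERE pred = 'invented' ORDER BY subj"
).fetchall()
```

The hard problems the paper describes — extraction quality, scale, and query semantics over uncertain data — are exactly what this sketch leaves out.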
1997
A truly natural language interface to databases also needs to be practical for actual implementation. We developed a new feasible approach to solve the problem and tested it successfully in a laboratory environment. The new result is based on metadata search, where the metadata grow in a largely linear manner and the search is linguistics-free (allowing for grammatically incorrect and incomplete input). A new class of reference dictionary integrates four types of enterprise metadata: enterprise information models, database values, user-words, and query cases. The layered and scalable information models allow user-words to stay in the original forms as users articulated them, as opposed to relying on permutations of individual words contained in the original query. A graphical representation method turns the dictionary into searchable graphs representing all possible interpretations of the input. A branch-and-bound algorithm then identifies optimal interpretations, which lead to SQL implementation of the original queries. Query cases enhance both the metadata and the search of metadata, as well as providing case-based reasoning to directly answer the queries. This design assures feasible solutions at the termination of the search, even when the search is incomplete (i.e., the results contain the correct answer to the original query). The necessary condition is that the text input contains at least one entry in the reference dictionary. The sufficient condition is that the text input contains a set of entries corresponding to a complete, correct single SQL query. Laboratory testing shows that the system obtained accurate results for most cases that satisfied only the necessary condition.
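A toy illustration of the linguistics-free dictionary lookup described above. The dictionary entries and schema are invented, and the real system's branch-and-bound ranking of competing interpretations is omitted:

```python
# Hypothetical reference dictionary mapping user-words to metadata:
# tables, columns, and database values.
DICTIONARY = {
    "customer": ("table", "customers"),
    "client": ("table", "customers"),       # synonym kept in user form
    "city": ("column", "customers.city"),
    "chicago": ("value", "customers.city = 'Chicago'"),
}

def interpret(text):
    # Linguistics-free: tokenize and look up, tolerating ungrammatical
    # or incomplete input. Necessary condition: at least one hit.
    hits = [DICTIONARY[w] for w in text.lower().split() if w in DICTIONARY]
    table = next((v for k, v in hits if k == "table"), None)
    conds = [v for k, v in hits if k == "value"]
    if table is None:
        return None  # necessary condition not met
    sql = f"SELECT * FROM {table}"
    if conds:
        sql += " WHERE " + " AND ".join(conds)
    return sql

print(interpret("client in chicago"))
```

Note that word order and grammar play no role here, consistent with the abstract's claim, but competing interpretations would require the search step this sketch skips.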
2012
The automated extraction of information from text and its transformation into a formal description is an important goal in both Semantic Web research and computational linguistics. The extracted information can be used for a variety of tasks such as ontology generation, question answering and information retrieval.
International Journal of Mathematical Sciences and Computing, 2015
A natural language query builder interface retrieves the required data in structured form from a database when a query is entered in natural language. The user does not need technical knowledge of Structured Query Language statements, so non-technical users can also use the proposed model. In natural language parsing, obtaining a highly accurate syntactic analysis is a crucial step. Parsing of natural languages is the process of mapping an input string, or a natural language sentence, to its syntactic representation. Because the constituency parsing approach takes more time, the natural language query builder interface is developed using a dependency parsing approach instead. Dependency parsing is widespread in the natural language domain because of its state-of-the-art accuracy and efficiency. This paper also proposes a buffering scheme for natural language statements, so that a sentence that has already been processed is not loaded again. The system additionally provides generalized access to all tables in the database.
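To illustrate why dependency structure helps NL-to-SQL translation, the sketch below maps hand-coded dependency arcs to a query. The arcs are what a dependency parser might produce for "show employees in Paris"; the parser output, schema, and mapping rules are all hypothetical:

```python
# Hand-coded dependency arcs: (head, relation, dependent).
ARCS = [
    ("show", "obj", "employees"),
    ("employees", "nmod", "Paris"),
]

# Invented mapping from content words to schema elements.
SCHEMA = {"employees": "employees", "Paris": "city = 'Paris'"}

def to_sql(arcs):
    table = cond = None
    for head, rel, dep in arcs:
        if rel == "obj":
            table = SCHEMA[dep]   # object of the verb -> target table
        elif rel == "nmod":
            cond = SCHEMA[dep]    # nominal modifier -> filter condition
    return f"SELECT * FROM {table}" + (f" WHERE {cond}" if cond else "")

print(to_sql(ARCS))
```

The relations between words, rather than their linear order, drive the mapping, which is the ambiguity-resolving property the dependency approach is chosen for.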
Autonomous Agents and Multi-Agent …, 2001
With the tremendous amount of information that becomes available on the Web on a daily basis, the ability to quickly develop information agents has become a crucial problem. A vital component of any Web-based information agent is a set of wrappers that can extract the relevant data from semistructured information sources. Our novel approach to wrapper induction is based on the idea of hierarchical information extraction, which turns the hard problem of extracting data from an arbitrarily complex document into a series of simpler extraction tasks. We introduce an inductive algorithm, stalker, that generates high accuracy extraction rules based on user-labeled training examples. Labeling the training data represents the major bottleneck in using wrapper induction techniques, and our experimental results show that stalker requires up to two orders of magnitude fewer examples than other algorithms. Furthermore, stalker can wrap information sources that could not be wrapped by existing inductive techniques.
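A sketch of the kind of landmark-based extraction rule that stalker induces, using a SkipTo-style operator. The sample document, landmarks, and field names are invented; the actual system learns such rules from labeled examples rather than taking them hand-written:

```python
def skip_to(doc, pos, landmark):
    # Advance past the next occurrence of landmark, or fail with -1.
    i = doc.find(landmark, pos)
    return -1 if i < 0 else i + len(landmark)

def extract(doc, start_rule, end_landmark):
    # Apply a start rule (a sequence of SkipTo landmarks), then read
    # up to the end landmark.
    pos = 0
    for lm in start_rule:
        pos = skip_to(doc, pos, lm)
        if pos < 0:
            return None
    end = doc.find(end_landmark, pos)
    return doc[pos:end].strip()

DOC = "Name: <b>Joe's Pizza</b> Phone: <b>555-1234</b>"
name = extract(DOC, ["Name:", "<b>"], "</b>")
phone = extract(DOC, ["Phone:", "<b>"], "</b>")
```

The hierarchical part of the approach, where a document is first split into sub-documents that are then wrapped by simpler rules like these, is not shown.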
2003
A new method is developed to query a relational database in natural language (NL). The method, based on a semantic approach, interprets grammatical and lexical units of a natural language into concepts of the subject domain, which are given in a conceptual scheme. The conceptual scheme is mapped formally onto the logical scheme. We applied the method to query the FlyEx database in natural language. FlyEx contains information on the expression of segmentation genes in Drosophila melanogaster. The method allows formulation of queries in various natural languages simultaneously, and is adaptive to changes in the knowledge domain and users' views. It provides optimal transformation of queries from natural language to SQL, as well as visualization of information as a hyperscheme. The method requires neither specification of all possible language constructions nor standard grammatical accuracy in the formulation of NL queries.
International Journal of Computer Applications, 2014
A natural language query builder interface retrieves the required data from a database when a query is given in natural language. To retrieve the correct data from a database directly, the user would need sufficient technical knowledge of Structured Query Language (SQL) statements; a Natural Language Query Builder Interface (NLQBI) solves this problem. In natural language parsing, obtaining a highly accurate syntactic analysis is a crucial step. Parsing of natural languages can be seen as the process of mapping an input string or a sentence to its syntactic representation. One such parsing technique is dependency parsing, which focuses on relations between words and thereby resolves ambiguity. Most of the recent efficient algorithms for dependency parsing work by factoring the dependency trees. Graph-based dependency parsing models are prevalent because of their state-of-the-art accuracy and efficiency. This paper covers some recent developments in NLQBI systems and surveys dependency parsing techniques.
2008
In this paper, we propose an approach to extract information from HTML pages and to add semantic (XML) tags to them. Wrapping is an essential technique used to automatically extract information from Web sources. This paper describes both, a general approach based on rules, which can be used to automatically generate wrappers, and an assistant generator wrapper called WebMantic. We also provide some experimental results to show that both the rule generation process and the preprocessing task are fast and reliable.
The VLDB Journal, 2007
Précis queries represent a novel way of accessing data, which combines ideas and techniques from the fields of databases and information retrieval. They are free-form, keyword-based queries on top of relational databases that generate entire multi-relation databases, which are logical subsets of the original ones. A logical subset contains not only items directly related to the given query keywords but also items implicitly related to them in various ways, with the purpose of providing to the user much greater insight into the original data. In this paper, we lay the foundations for the concept of logical database subsets that are generated from précis queries under a generalized perspective that removes several restrictions of previous work. In particular, we extend the semantics of précis queries considering that they may contain multiple terms combined through the AND, OR, and NOT operators. On the basis of these extended semantics, we define the concept of a logical database subset, we identify the one that is most relevant to a given query, and we provide algorithms for its generation. Finally, we present an extensive set of experimental results that demonstrate the efficiency and benefits of our approach.
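The core idea — a keyword match expanded into implicitly related tuples — can be sketched for a single foreign key. The toy schema and data are invented, and the real system's handling of AND/OR/NOT terms and relevance ranking is omitted:

```python
# Invented two-relation database with a foreign key movies -> directors.
MOVIES = [{"id": 1, "title": "Vertigo", "director_id": 10}]
DIRECTORS = [{"id": 10, "name": "Alfred Hitchcock"}]

def logical_subset(keyword):
    # Items directly matching the keyword...
    hits = [m for m in MOVIES if keyword.lower() in m["title"].lower()]
    # ...plus items implicitly related to them through the foreign key.
    related = [d for d in DIRECTORS
               if any(m["director_id"] == d["id"] for m in hits)]
    return {"movies": hits, "directors": related}

subset = logical_subset("vertigo")
```

The answer is itself a small multi-relation database rather than a flat result list, which is what distinguishes the précis approach from ordinary keyword search.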
svn.aksw.org
The search for information on the Web of Data is becoming increasingly difficult due to its dramatic growth. Especially novice users need to acquire both knowledge about the underlying ontology structure and proficiency in formulating formal queries (e.g., SPARQL queries) to retrieve information from Linked Data sources. So as to simplify and automate the querying and retrieval of information from such sources, we present in this paper a novel approach for constructing SPARQL queries based on user-supplied keywords. Our approach utilizes a set of predefined basic graph pattern templates for generating adequate interpretations of user queries. This is achieved by obtaining ranked lists of candidate resource identifiers for the supplied keywords and then injecting these identifiers into suitable positions in the graph pattern templates. The main advantages of our approach are that it is completely agnostic of the underlying knowledge base and ontology schema, that it scales to large knowledge bases, and that it is simple to use. We evaluate 17 possible valid graph pattern templates by measuring their precision and recall on 53 queries against DBpedia. Our results show that 8 of these basic graph pattern templates return results with a precision above 70%. Our approach is implemented as a Web search interface and performs sufficiently fast to return instant answers to the user even with large knowledge bases.
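The injection step can be sketched with string templates. The two basic graph patterns and the keyword-to-identifier bindings below are simplified, invented stand-ins for the paper's 17 templates and its candidate-ranking step:

```python
# Two hypothetical basic graph pattern templates with slots for a
# subject (s), predicate (p), and object (o).
TEMPLATES = [
    "SELECT ?x WHERE {{ ?x {p} {o} . }}",
    "SELECT ?x WHERE {{ {s} {p} ?x . }}",
]

def build_queries(bindings):
    # Inject candidate resource identifiers into each template slot,
    # producing one interpretation of the keywords per template.
    return [t.format(**bindings) for t in TEMPLATES]

# e.g. keywords "films by Kubrick" mapped to candidate identifiers:
qs = build_queries({
    "s": "dbr:Stanley_Kubrick",
    "p": "dbo:director",
    "o": "dbr:Stanley_Kubrick",
})
```

Each generated query is one candidate interpretation; the described system then ranks these and executes the most promising ones against the knowledge base.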
2006
The Web contains a vast amount of text that can only be queried using simple keywords-in, documents-out search queries. But Web text often contains structured elements, such as hotel location and price pairs embedded in a set of hotel reviews. Queries that process these structural text elements would be much more powerful than our current document-centric queries. Of course, text does not contain metadata or a schema, making it unclear what a structured text query means precisely. In this paper we describe three possible models for structured queries over text, each of which implies different query semantics and user interaction.
Schema.org creates, supports, and maintains schemas for structured data on web pages. For a non-technical author, it is difficult to publish content in a structured format. This work presents an automated way of inducing Schema.org markup from the natural language content of web pages by applying knowledge base creation techniques. Web Data Commons was used as the dataset, and the scope of the experimental part was limited to RDFa. The approach was implemented using knowledge graph building techniques from Knowledge Vault and KnowMore.
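Only the final emission step of such a pipeline is easy to sketch: given fields already extracted from a page's text, render them as Schema.org RDFa markup. The extraction itself (the Knowledge Vault / KnowMore part) is not shown, and the item below is an invented example:

```python
def to_rdfa(item_type, props):
    # Emit a minimal RDFa block using Schema.org's vocab/typeof/property
    # attributes for one extracted entity.
    lines = [f'<div vocab="https://schema.org/" typeof="{item_type}">']
    for name, value in props.items():
        lines.append(f'  <span property="{name}">{value}</span>')
    lines.append("</div>")
    return "\n".join(lines)

markup = to_rdfa("Book", {"name": "The Pragmatic Programmer",
                          "author": "Andrew Hunt"})
print(markup)
```

In practice the hard problem is deciding the type and property values from free text; this sketch assumes that mapping has already been made.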