A fast index for semistructured data

Neal Sample

2001

Abstract

Queries navigate semistructured data via path expressions, and can be accelerated using an index. Our solution encodes paths as strings, and inserts those strings into a special index that is highly optimized for long and complex keys. We describe the Index Fabric, an indexing structure that provides the efficiency and flexibility we need. We discuss how "raw paths" are used to optimize ad hoc queries over semistructured data, and how "refined paths" optimize specific access paths. Although we can use knowledge about the queries and structure of the data to create refined paths, no such knowledge is needed for raw paths. A performance study shows that our techniques, when implemented on top of a commercial relational database system, outperform the more traditional approach of using the commercial system's indexing mechanisms to query the XML.
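The idea of encoding paths as strings can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the key format (tags joined by "/" with "=" before the value), the function names `raw_path_keys` and `prefix_lookup`, and the sample invoice document are all assumptions made for the example, and a sorted Python list stands in for the Index Fabric itself.

```python
import bisect
import xml.etree.ElementTree as ET


def raw_path_keys(xml_text):
    """Return a sorted list with one string key per root-to-text path.

    Key format (an assumption for this sketch): "/tag/tag/...=value".
    """
    root = ET.fromstring(xml_text)

    def walk(node, prefix):
        path = prefix + "/" + node.tag
        text = (node.text or "").strip()
        if text:
            # The element carries a value: emit the full path plus value.
            yield path + "=" + text
        for child in node:
            yield from walk(child, path)

    return sorted(walk(root, ""))


def prefix_lookup(keys, prefix):
    """Return every key that starts with `prefix`, using binary search.

    A sorted list is only a stand-in for a real string index.
    """
    start = bisect.bisect_left(keys, prefix)
    matches = []
    for key in keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches


# Hypothetical sample document, invented for illustration.
DOC = (
    "<invoice>"
    "<buyer><name>ABC Corp</name></buyer>"
    "<seller><name>Acme Inc</name></seller>"
    "</invoice>"
)
KEYS = raw_path_keys(DOC)
BUYERS = prefix_lookup(KEYS, "/invoice/buyer/name=")
```

Because every path becomes an ordinary string, a path query reduces to a prefix search over the keys, and no schema knowledge is needed to build them.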

Key takeaways

• In the Index Fabric, we have constructed both refined and raw paths, while the relational index utilized an edge mapping as well as a schema extracted by the STORED [12] system.
• We examine a simple encoding of the raw paths in a semistructured document, and discuss how to answer complex path queries over data with irregular structure using raw paths.
• In this section, we discuss encoding XML paths as keys for insertion into the fabric, and how to use path lookups to evaluate queries.
• The Index Fabric contained both raw paths and refined paths for the DBLP documents.
• Our structure provides a single index for all queries, and one lookup to evaluate the query using a refined path.
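The "one lookup" claim for refined paths can be pictured as follows. This is a hedged sketch under stated assumptions, not the authors' code: the query pattern ("invoices sold by one company to another"), the "Z" key prefix, the "|" separator, the record layout, and the names `build_refined_index` and `invoices_between` are all hypothetical, and a plain dictionary stands in for the Index Fabric.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration.
INVOICES = [
    {"id": 1, "seller": "Acme Inc", "buyer": "ABC Corp"},
    {"id": 2, "seller": "Acme Inc", "buyer": "XYZ Corp"},
    {"id": 3, "seller": "Acme Inc", "buyer": "ABC Corp"},
]


def refined_key(seller, buyer):
    """Build the key for one specific access pattern.

    "Z" is an arbitrary marker (an assumption) that names the pattern
    "invoices sold by <seller> to <buyer>".
    """
    return "Z|" + seller + "|" + buyer


def build_refined_index(invoices):
    """Precompute one key per invoice for the chosen access pattern."""
    index = defaultdict(list)
    for invoice in invoices:
        index[refined_key(invoice["seller"], invoice["buyer"])].append(invoice["id"])
    return index


def invoices_between(index, seller, buyer):
    """Answer the query with a single key lookup."""
    return index.get(refined_key(seller, buyer), [])


INDEX = build_refined_index(INVOICES)
RESULT = invoices_between(INDEX, "Acme Inc", "ABC Corp")
```

The trade-off is the one the abstract describes: building such keys requires knowing the query pattern in advance, whereas raw paths need no such knowledge.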

Michael Hannaford, B. Alom

XML, a semistructured format, is the most promising and dominant format for representing and processing data on the Internet. XML data has no fixed schema; it evolves and is self-describing, which makes it harder to manage than, for example, relational data. XML queries differ from relational queries in that they are expressed as path expressions, so the efficient handling of structural relationships has become a key factor in XML query processing. Designing query processing techniques and storage methods that manage semistructured data efficiently is therefore a major challenge for the database community. The main contribution of this paper is a method for querying semistructured data that uses a bitmap to represent path-value relationships and compresses the bitmap to save space. The presented bitmap indexing and querying scheme, termed BIQS, stores the element path, word token, attribute, and document number in a dynamically created matrix structure. We use word, attribute, and path dictionaries to construct the bitmap structure. This paper describes an algorithm that queries semistructured data more time-efficiently than other relational and semistructured query processing techniques. The BIQS structure improves storage and query performance through compression of the semistructured data.
