A fast index for semistructured data

2001

Abstract

Queries navigate semistructured data via path expressions, and can be accelerated using an index. Our solution encodes paths as strings, and inserts those strings into a special index that is highly optimized for long and complex keys. We describe the Index Fabric, an indexing structure that provides the efficiency and flexibility we need. We discuss how "raw paths" are used to optimize ad hoc queries over semistructured data, and how "refined paths" optimize specific access paths. Although we can use knowledge about the queries and structure of the data to create refined paths, no such knowledge is needed for raw paths. A performance study shows that our techniques, when implemented on top of a commercial relational database system, outperform the more traditional approach of using the commercial system's indexing mechanisms to query the XML.

Key takeaways

  • In the Index Fabric, we have constructed both refined and raw paths, while the relational index utilized an edge mapping as well as a schema extracted by the STORED [12] system.
  • We examine a simple encoding of the raw paths in a semistructured document, and discuss how to answer complex path queries over data with irregular structure using raw paths.
  • In this section, we discuss encoding XML paths as keys for insertion into the fabric, and how to use path lookups to evaluate queries.
  • The Index Fabric contained both raw paths and refined paths for the DBLP documents.
  • Our structure provides a single index for all queries, and one lookup to evaluate the query using a refined path.
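
To make the idea of encoding paths as strings concrete, here is a minimal sketch in Python. It is not the Index Fabric itself: a sorted list with binary search stands in for the index, and the key format (tag names joined by "/", then "=" and the text value) is an assumption made for illustration, as are the names `raw_path_keys`, `prefix_lookup`, and the sample document. It only shows how every root-to-leaf path can become a string key without knowing the schema, and how a path query can then be answered by a key lookup.

```python
import bisect
import xml.etree.ElementTree as ET

SEP = "/"    # hypothetical separator between tag names in a key
VALUE = "="  # hypothetical separator between the path and its text value


def raw_path_keys(xml_text):
    """Return a sorted list with one string key per element that holds text."""
    root = ET.fromstring(xml_text)
    keys = []

    def walk(elem, prefix):
        path = prefix + SEP + elem.tag
        text = (elem.text or "").strip()
        if text:
            # Encode the full root-to-leaf path plus the value as one string.
            keys.append(path + VALUE + text)
        for child in elem:
            walk(child, path)

    walk(root, "")
    return sorted(keys)


def prefix_lookup(sorted_keys, prefix):
    """Return all keys that start with prefix, using binary search to find the first."""
    start = bisect.bisect_left(sorted_keys, prefix)
    matches = []
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        matches.append(key)
    return matches


# Hypothetical sample document; no schema knowledge is used to index it.
DOC = """
<invoice>
  <buyer><name>ABC Corp</name></buyer>
  <seller><name>Acme Inc</name></seller>
  <item><count>4</count></item>
</invoice>
"""

KEYS = raw_path_keys(DOC)

# Exact path-and-value query: invoices whose buyer is named "ABC Corp".
exact = prefix_lookup(KEYS, "/invoice/buyer/name=ABC Corp")

# Path-only query: everything stored under the seller element.
under_seller = prefix_lookup(KEYS, "/invoice/seller/")
```

Because the keys are built by walking whatever structure the document happens to have, this corresponds to the raw-path case described above. A refined path would instead be a key designed ahead of time for one specific, known query.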