Semistructured Data Research Papers

Data on the Web: from relations to semistructured data and XML

2000

Sponsoring Editor: Diane Cerra; Director of Production and Manufacturing: Yonie Overton; Production Editor: Heather Collins; Editorial Coordinator: Belinda Breyer; Cover Design: Martin Heirakuji; Text Design: Mark Ong; Composition and Illustration... more

Views for Semistructured Data

by Vasilis Vassalos

1997, Proceedings of the …

Storing Semistructured Data with STORED

by Alin Deutsch

1999, SIGMOD Record

A fast index for semistructured data

by Neal Sample

2001

Queries navigate semistructured data via path expressions, and can be accelerated using an index. Our solution encodes paths as strings, and inserts those strings into a special index that is highly optimized for long and complex keys. We... more
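
The idea of encoding paths as strings and looking them up in an index can be sketched as follows. This is only an illustration under stated assumptions: the `/` separator, the `PathIndex` class, and the sorted-list storage are hypothetical stand-ins, not the specialized index the paper describes.

```python
from bisect import bisect_left, insort

SEP = "/"  # assumed separator between edge labels


def encode_path(labels):
    """Encode a label path such as ["db", "book", "title"] as one string key."""
    # The trailing separator keeps "db/book/" from matching "db/bookstore/".
    return SEP.join(labels) + SEP


class PathIndex:
    """Toy index: a sorted list of (encoded path, node id) pairs."""

    def __init__(self):
        self._entries = []

    def insert(self, labels, node_id):
        insort(self._entries, (encode_path(labels), node_id))

    def lookup_prefix(self, labels):
        """Return ids of all nodes whose path starts with the given labels."""
        prefix = encode_path(labels)
        # Keys sharing a prefix are contiguous in sorted order.
        start = bisect_left(self._entries, (prefix,))
        result = []
        for key, node_id in self._entries[start:]:
            if not key.startswith(prefix):
                break
            result.append(node_id)
        return result
```

A path expression that fixes the leading labels then becomes a single range scan over the sorted keys rather than a traversal of the data graph.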

Extracting Semistructured Information from the Web

by Hèctor Garcia

1997

We describe a configurable tool for extracting semistructured data from a set of HTML pages and for converting the extracted information into database objects. The input to the extractor is a declarative specification that states where... more

Semantic integration of semistructured and structured data sources

by Sonia Bergamaschi

1999, ACM SIGMOD Record

Providing an integrated access to multiple heterogeneous sources is a challenging issue in global information systems for cooperation and interoperability. In this context, two fundamental problems arise. First, how to determine if the sources contain semantically related information, that is, information related to the same or similar real-world concept(s). Second, how to handle semantic heterogeneity to support integration and uniform query interfaces. Complicating factors with respect to conventional view integration techniques are related to the fact that the sources to be integrated already exist and that semantic heterogeneity occurs on the large-scale, involving terminology, structure, and context of the involved sources, with respect to geographical, organizational, and functional aspects related to information use. Moreover, to meet the requirements of global, Internet-based information systems, it is important that tools developed for supporting these activities are semi-automatic and scalable as much as possible.

The goal of this paper is to describe the MOMIS [4, 5] (Mediator envirOnment for Multiple Information Sources) approach to the integration and query of multiple, heterogeneous information sources, containing structured and semistructured data. MOMIS has been conceived as a joint collaboration between University of Milano and Modena in the framework of the INTERDATA national research project, aiming at providing methods and tools for data management in Internet-based information systems.

Like other integration projects [1, 10, 14], MOMIS follows a "semantic approach" to information integration based on the conceptual schema, or metadata, of the information sources, and on the following architectural elements: i) a common object-oriented data model, defined according to the ODL_I3 language, to describe source schemas for integration purposes. The data model and ODL_I3 have been defined in MOMIS as subset of the ODMG-93 ones, following the proposal for a standard mediator language developed by the I3/POB working group [7]. In addition, ODL_I3 introduces new constructors to support the semantic integration process [4, 5]; ii) one or more wrappers, to translate schema descriptions into the common ODL_I3 representation; iii) a mediator and a query-processing component, based on two pre-existing tools, namely ARTEMIS [8] and ODB-Tools [3] (available on Internet at http://sparc20.dsi.unimo.it/), to provide an I3 architecture for integration and query optimization.

In this paper, we focus on capturing and reasoning about semantic aspects of schema descriptions of heterogeneous information sources for supporting integration and query optimization. Both semistructured and structured data sources are taken into account [5]. A Common Thesaurus is constructed, which has the role of a shared ontology for the information sources. The Common Thesaurus is built by analyzing ODL_I3 descriptions of the sources, by exploiting the Description Logics OLCD (Object Language with Complements allowing Descriptive cycles) [2, 6], derived from KL-ONE family [17]. The knowledge in the Common Thesaurus is then exploited for the identification of semantically related information in ODL_I3 descriptions of different sources and for their integration at the global level. Mapping rules and integrity constraints are defined at the global level to express the relationships holding between the integrated description and the sources descriptions. ODB-Tools, supporting OLCD and description logic inference techniques, allows the analysis of sources descriptions for generating a consistent Common Thesaurus and provides support for semantic optimization of queries at the global level, based on defined mapping rules and integrity constraints.

UnQL: A Query Language and Algebra for Semistructured Data Based on Structural Recursion

by Mary Fernandez

2000, The VLDB Journal

Abstract. This paper presents structural recursion as the basis of the syntax and semantics of query languages for semistructured data and XML. We describe a simple and powerful query language based on pattern matching and show that it... more

Extracting schema from semistructured data

by Svetlozar Nestorov

1998, Proceedings of the 1998 ACM SIGMOD international conference on Management of data - SIGMOD '98

Semistructured data is characterized by the lack of any fixed and rigid schema, although typically the data has some implicit structure.

Ontology-based extraction and structuring of information from data-rich unstructured documents

by Sakthipriya Selvaraj

1998, Proceedings of the …

Rewriting of Regular Expressions and Regular Path Queries

by Diego Calvanese

1999, Proceedings of the …

Recent work on semi-structured data has revitalized the interest in path queries, i.e., queries that ask for all pairs of objects in the database that are connected by a path conforming to a certain specification, in particular to a... more
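
A path query of this kind can be evaluated, in a minimal sketch, by searching the product of the data graph and an automaton for the path specification. The sketch below assumes the specification is a regular expression already given as a hand-written DFA; the function name and input layout are hypothetical, and this shows plain evaluation only, not the rewriting technique the paper studies.

```python
from collections import deque


def eval_path_query(edges, dfa, start_state, accepting):
    """Return all (x, y) such that some path from x to y spells a word the DFA accepts.

    edges: iterable of (source, label, target) triples
    dfa: dict mapping (state, label) -> next state
    """
    out = {}
    nodes = set()
    for src, label, dst in edges:
        out.setdefault(src, []).append((label, dst))
        nodes.update((src, dst))

    answers = set()
    for x in nodes:
        # Breadth-first search over (graph node, automaton state) pairs.
        seen = {(x, start_state)}
        queue = deque(seen)
        while queue:
            node, state = queue.popleft()
            if state in accepting:
                answers.add((x, node))
            for label, nxt in out.get(node, []):
                nstate = dfa.get((state, label))
                if nstate is not None and (nxt, nstate) not in seen:
                    seen.add((nxt, nstate))
                    queue.append((nxt, nstate))
    return answers
```

Tracking visited (node, state) pairs keeps the search finite even when the graph contains cycles.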

Haystack: A General-Purpose Information Management Tool for End Users Based on Semistructured Data

by Vineet Sinha

2005

We posit that a semistructured data model offers the right balance of rich structure and flexible (or lack of) schema allowing naive end users to record information in whatever form makes it easy for them to manage. We describe our... more

Query containment for conjunctive queries with regular expressions

by Dan Suciu

1998, Proceedings of the seventeenth ACM SIGACT-SIGMOD-SIGART symposium on Principles of database systems - PODS '98

All query languages proposed for semistructured data share as a common characteristic the ability to traverse arbitrarily long paths in the data in the form of regular path expressions. The expressive power of these languages lies in between... more

Incremental Maintenance for Materialized Views Over Semistructured Data

by Vasilis Vassalos

1998, Proceedings of the …

Semistructured Data: The TSIMMIS Experience

by Joachim Hammer

1997, Proceedings of the First East- …

In this paper we discuss the management of semi-structured data, i.e., data that has irregular or dynamically changing structure. We describe components of the Stanford TSIMMIS Project that help extract semi-structured data from Web pages,... more

Fast detection of XML structural similarity

by Giuseppe Manco

2005, IEEE Transactions on …

Because of the widespread diffusion of semistructured data in XML format, much research effort is currently devoted to support the storage and retrieval of large collections of such documents. XML documents can be compared as to their... more

ASTERIX: towards a scalable, semistructured data platform for evolving-world models

by Raman Grover

Distributed and Parallel …

TopX: efficient and versatile top-k query processing for semistructured data

by D. Majumdar

2007, The VLDB Journal

Recent IR extensions to XML query languages such as XPath 1.0 Full-Text or the NEXI query language of the INEX benchmark series reflect the emerging interest in IR-style ranked retrieval over semistructured data. TopX is a top-k retrieval... more

Unauthorized inferences in semistructured databases

by Alexander Brodsky

2006, Information Sciences

Exploring XML web collections with DescribeX

by Alejandro Vaisman

2010, ACM Transactions on the Web

As web applications mature and evolve, the nature of the semistructured data that drives these applications also changes. An important trend is the need for increased flexibility in the structure of web documents. Hence, applications... more

Improving graph-walk-based similarity with reranking

by Einat Minkov

2010, ACM Transactions on Information Systems

Relational or semi-structured data is naturally represented by a graph, where nodes denote entities and directed typed edges represent the relations between them. Such graphs are heterogeneous in the sense that they describe different... more

Querying RDF Descriptions for Community Web Portals

by Gregory Karvounarakis

2001, Journées Bases de Données Avancées

Community Web Portals (e.g., digital libraries, vertical aggregators, infomediaries) have become quite popular nowadays in supporting specific communities of interest on corporate intranets or the Web. Portal Catalogs, organize and... more

Annotation and navigation in semantic wikis

by Siegfried Handschuh

2006, SemWiki (ESWC)

QURSED: Querying and Reporting Semistructured Data

by Vasilis Vassalos

2002, Proceedings of the 2002 …

Capturing and querying multiple aspects of semistructured data

by Michael Böhlen

1999, Proceedings of the International Conference on Very Large Data Bases

Motivated to a large extent by the substantial and growing prominence of the World-Wide Web and the potential benefits that may be obtained by applying database concepts and techniques to web data management, new data models and query... more

Merging multimedia presentations and semistructured temporal data: a graph-based model and its application to clinical information

by Barbara Oliboni

2005, Artificial intelligence in medicine

A Framework for Management of Semistructured Probabilistic Data

by Judy Goldsmith

2005, Journal of Intelligent Information Systems

This paper describes the theoretical framework and implementation of a database management system for storing and manipulating diverse probability distributions of discrete random variables with finite domains, and associated information.... more

XML Queries and Algebra In the Enosys Integration Platform

by Vasilis Vassalos

2003, Data & Knowledge …

Toward Semantic XML Clustering

by Sergio Greco

2006, Proceedings of the 2006 SIAM International Conference on Data Mining

The increasing availability of heterogeneous XML informative sources has raised a number of issues concerning how to represent and manage semistructured data. Although XML sources can exhibit proper structures and contents, differently... more

Queries with incomplete answers over semistructured data

by Werner Nutt

1999

Semistructured data occur in situations where information lacks a homogeneous structure and is incomplete. Yet, up to now the incompleteness of information has not been reflected by special features of query languages for semistructured... more

A Data Quality Methodology for Heterogeneous Data

by Federico Cabitza

2011, International Journal of …

We present a Heterogeneous Data Quality Methodology (HDQM) for Data Quality (DQ) assessment and improvement that considers all types of data managed in an organization, namely structured data represented in databases, semistructured data... more

On the power of tree-walking automata

by Frank Neven

2003, Information and Computation

Tree-walking automata (TWAs) recently received new attention in the fields of formal languages and databases. To achieve a better understanding of their expressiveness, we characterize them in terms of transitive closure logic formulas in... more

Linking Semistructured Data on the Web

by Oktie Hassanzadeh

2011

Many Web data sources and APIs make their data available in XML, JSON, or a domain-specific semi-structured format, with the goal of making the data easily accessible and usable by Web application developers. Although such data formats... more

Storing Semistructured Data in Relations

by Alin Deutsch

1999

Existing systems for managing and querying semistructured-data sources store the schema with the data. Lorel [QRS+95] and Tsimmis [PGMW95] store their data as graphs. The schema is stored as attributes labeling the graph's edges. Strudel... more

Path constraints in semistructured data

by Sophie Tison

2007, Theoretical Computer Science

We consider semistructured data as multi-rooted edge-labeled directed graphs, and path inclusion constraints on these graphs. A path inclusion constraint p ⊆ q is satisfied by a semistructured data if any node reached by the regular query p... more

Semistructured probabilistic databases

by Judy Goldsmith

2001

This work describes a new theoretical framework for uniform storage and management of diverse probabilistic information.

Designing Semistructured Databases: A Conceptual Approach

by Leonid Kalinichenko

2001, Lecture Notes in Computer Science

Semi-structured data has become prevalent with the growth of the Internet. The data is usually stored in a traditional database system or in a specialized repository. While many information providers have presented their databases on the... more

Weighted path queries on semistructured databases

by Sergio Flesca

2006, Information and Computation/Information and Control

Path queries have been extensively used to query semistructured data, such as the Web and XML documents. In this paper we introduce weighted path queries, an extension of path queries enabling several classes of optimization problems... more

An adaptive path index for XML data using the query workload

by Chin-Wan Chung

2005, Information Systems

Due to its flexibility, XML is becoming the de facto standard for exchanging and querying documents over the Web. Many XML query languages such as XQuery and XPath use label paths to traverse the irregularly structured XML data. Without a... more

An algebra for semantic interoperation of semistructured data

by Prasenjit Mitra

1999

The diversity and availability of information sources on the World Wide Web has set the stage for integration and reuse at an unparalleled scale. There remain significant hurdles to exploiting the extent of the Web's resources in a... more

AxPRE Summaries: Exploring the (Semi-)Structure of XML Web Collections

by Alejandro Vaisman

2008, 2008 IEEE 24th International Conference on Data Engineering

The nature of semistructured data in web collections is evolving. Increasingly, XML web documents (or documents exchanged via web services) are valid with regard to a schema, yet the actual structure of such documents exhibits significant... more

Flexible Querying of Lifelong Learner Metadata

by Petra Selmer

2012, IEEE Transactions on Learning Technologies

We propose combining query approximation and query relaxation techniques in order to support flexible querying of heterogeneous data arising from lifelong learners' educational and work experiences. A key aim of such querying facilities... more

Conceptual modeling for semistructured data

by Antonio Badia

Rapper: a wrapper generator with linguistic knowledge

by David Mattox

1999

Database management systems are becoming available for semistructured data, however, these tools cannot be used on many real-world data sources (e.g., most web sites) in their native form. Often, wrappers are needed to extract information... more

Expressiveness of a Spatial Logic for Trees

by Sophie Tison

2005, 20th Annual IEEE Symposium on Logic in Computer Science (LICS' 05)

In this paper we investigate the quantifier-free fragment of the TQL logic proposed by Cardelli and Ghelli. The TQL logic, inspired from the ambient logic, is the core of a query language for semistructured data represented as unranked... more

Extensional Knowledge for Semantic Query Optimization in a Mediator Based System

by Domenico Beneventano

2001

Query processing in global information systems integrating multiple heterogeneous sources is a challenging issue in relation to the effective extraction of information available on-line. In this paper we propose intelligent,... more

Integration of Weakly Heterogeneous Semistructured Data

by Karel Richta

2009, Information Systems …

While most business applications typically operate on structured data that can be effectively managed using relational databases, some applications use more complex semistructured data that lacks a stable schema. XML techniques are... more

Representing Time-dependent Information in Multidimensional XML

by Theodoros Mitakos

2001, Journal of Computing and Information Technology

Multidimensional XML (MXML) is an extension of XML that incorporates dimensions in order to represent in an elegant and concise way context-dependent data, that is, data which can exhibit different variations in value or structure (e.g.... more

Modeling Semistructured Data by Using Graph-Based Constraints

by Barbara Oliboni

2003, On The Move to …
