2007
12 pages
Interoperability plays an important role in a variety of applications. One such application is Peer Data Management Systems (PDMS), where autonomous data sources (peers) interact with each other based on semantic mappings between their schemas. The building blocks that enable interoperability, and thus the main challenges in such systems, are mapping representation, query rewriting, and efficient query processing. While most approaches treat these aspects separately, this paper presents a comprehensive study of the interactions between these building blocks. Our considerations aim to provide a holistic view of semantic interoperability in distributed environments such as PDMS. We discuss techniques for distributed query processing and rewriting that consider high-level query operators such as top-N and skyline. Furthermore, we discuss how to increase efficiency by applying routing indexes and relaxing result completeness/correctness.
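As an illustration of the two high-level operators named above, the following is a minimal sketch and not the paper's algorithm. It assumes tuples are numeric vectors where smaller values are better; the names `dominates`, `skyline`, `top_n`, and the hotel data are hypothetical. The last lines show one property that is useful in a distributed setting: the skyline of a union of peers' data equals the skyline of the union of their local skylines, so each peer can ship only its local skyline.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every dimension and strictly
    better in at least one (assumption: smaller is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def skyline(points: List[Point]) -> List[Point]:
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def top_n(points: List[Point], n: int, score: Callable[[Point], float]) -> List[Point]:
    """Return the n points with the lowest score."""
    return sorted(points, key=score)[:n]


# Hypothetical data held by two peers: (price, distance to beach).
peer_a = [(50, 8), (60, 3), (70, 5)]
peer_b = [(80, 1), (90, 2)]

# Each peer computes a local skyline; the querying peer merges them.
merged = skyline(skyline(peer_a) + skyline(peer_b))
best_two = top_n(peer_a + peer_b, 2, score=sum)
```

Here `merged` contains `(50, 8)`, `(60, 3)`, and `(80, 1)`; the other two hotels are each dominated by a cheaper and closer one.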
2007
Peer data management systems (PDMS) are a highly dynamic, decentralized infrastructure for large-scale data integration. They consist of a dynamic set of autonomous peers interconnected by a network of schema mappings. Queries submitted at a peer are answered with local data and with data reached along paths of mappings. Due to redundancies in the mapping network, query answering in PDMS can be very inefficient if the complete query result is to be computed. System P, a fully functional PDMS, compromises the completeness of the query result and reduces cost by pruning the query plan at mappings that are estimated to yield only a few result tuples. The demo illustrates the following main components of System P: (1) adaptive estimation of result cardinalities of intermediate queries using histograms, (2) completeness-driven query planning under limited resources using specialized heuristics, and (3) the automatic generation of heterogeneous PDMS test instances, controlled by a rich set of parameters.
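The pruning idea described above can be sketched as a graph traversal that skips mappings whose estimated cardinality falls below a threshold. This is a simplified illustration under stated assumptions, not System P's planner: the peer names, the fixed estimates (System P derives them adaptively from histograms), and the function `plan_peers` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical mapping network: peer -> [(neighbor, estimated result tuples)].
# The estimates are hard-coded here purely for illustration.
MAPPINGS: Dict[str, List[Tuple[str, int]]] = {
    "P1": [("P2", 120), ("P3", 4)],
    "P2": [("P4", 60), ("P1", 120)],  # the mapping back to P1 forms a cycle
    "P3": [("P5", 500)],
    "P4": [],
    "P5": [],
}


def plan_peers(start: str,
               mappings: Dict[str, List[Tuple[str, int]]],
               min_tuples: int) -> List[str]:
    """Return the peers a query would visit, pruning every mapping whose
    estimated cardinality is below min_tuples and never revisiting a peer."""
    visited = {start}
    order = [start]
    stack = [start]
    while stack:
        peer = stack.pop()
        for neighbor, estimate in mappings.get(peer, []):
            if neighbor in visited:
                continue  # redundant path or cycle in the mapping network
            if estimate < min_tuples:
                continue  # pruned: too few expected result tuples
            visited.add(neighbor)
            order.append(neighbor)
            stack.append(neighbor)
    return order


pruned = plan_peers("P1", MAPPINGS, min_tuples=10)
complete = plan_peers("P1", MAPPINGS, min_tuples=0)
```

With the threshold of 10, the mapping from P1 to P3 is pruned, so P5 is never reached even though its own mapping looks productive. That is the completeness-for-cost trade-off in miniature.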
Information Systems, 2008
Peer Data Management Systems (Pdms) are a novel, useful, but challenging paradigm for distributed data management and query processing. Conventional integrated information systems have a hierarchical structure with an integration component that manages a global schema and distributes queries against this schema to the underlying data sources. Pdms are a natural extension to this architecture by allowing each participating system (peer) to act both as a data source and as an integrator. Peers are interconnected by schema mappings, which guide the rewriting of queries between the heterogeneous schemas, and thus form a P2P (peer-to-peer)-like network. Despite several years of research, the development of efficient Pdms still holds many challenges. In this article we first survey the state of the art on peer data management: We classify Pdms by characteristics concerning their system model, their semantics, their query planning schemes, and their maintenance. Then we systematically examine open research directions in each of those areas. In particular, we observe that research results from both the domain of P2P systems and of conventional distributed data management can have an impact on the development of Pdms.
2004
Recently, the issue of integration and cooperation between information nodes in a networked environment has been studied in different contexts, such as data integration [9], the Semantic Web [7], Peer-to-Peer [1, 6], Grid, and service-oriented computing [13, 8]. Put abstractly, these systems are characterized by an architecture consisting of various autonomous nodes (called sites, sources, agents, or, as we do here, peers) which hold information, and which are linked to other nodes by means of mappings.
2010
Peer data sharing systems use either schema-level or data-level mappings to resolve schema as well as data heterogeneity among data sources (peers). Schema-level mappings create structural relationships among different schemas. Data-level mappings, on the other hand, associate data values in two different sources. These two kinds of mappings are complementary. However, existing peer database systems have been based solely on one or the other. We believe that addressing both kinds of mappings simultaneously in a single framework enhances data sharing and overcomes the limitations of the non-combined approaches. In this paper, we present a model of a peer database management system which allows a bi-level mapping that combines schema-level and data-level mappings into a single relational framework. We present the syntax and semantics of this new kind of mapping. Furthermore, we present an algorithm for query translation that uses the bi-level mappings. Our algorithm relies on tableaux for expressing both queries and mappings.
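To make the distinction between the two mapping levels concrete, here is a toy sketch that translates a simple selection query from one peer's vocabulary into another's. It is not the tableau-based algorithm the abstract refers to. It assumes a query is a dictionary of attribute/value equality conditions, and every attribute name, value, and the function `translate_query` are hypothetical.

```python
from typing import Dict, Optional

# Schema-level mapping (hypothetical): source attribute -> target attribute.
SCHEMA_MAP: Dict[str, str] = {
    "gene_name": "symbol",
    "organism": "species",
}

# Data-level mapping (hypothetical): per source attribute, source value -> target value.
DATA_MAP: Dict[str, Dict[str, str]] = {
    "organism": {"human": "Homo sapiens", "mouse": "Mus musculus"},
}


def translate_query(query: Dict[str, str],
                    schema_map: Dict[str, str],
                    data_map: Dict[str, Dict[str, str]]) -> Optional[Dict[str, str]]:
    """Translate equality conditions from the source peer to the target peer.

    Returns None if some attribute has no schema-level mapping. Values with
    no data-level mapping are passed through unchanged (an assumption of this
    sketch, not something the source specifies).
    """
    translated: Dict[str, str] = {}
    for attr, value in query.items():
        if attr not in schema_map:
            return None  # the condition cannot be expressed at the target peer
        value_table = data_map.get(attr, {})
        translated[schema_map[attr]] = value_table.get(value, value)
    return translated


result = translate_query({"gene_name": "BRCA1", "organism": "human"},
                         SCHEMA_MAP, DATA_MAP)
```

In this example the schema-level mapping renames both attributes, while the data-level mapping rewrites only the value `human`. Neither level alone would produce a query the target peer can answer.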
Thirteenth International Database Engineering & Applications Symposium, IDEAS 09, 2009
In a Peer Data Management System (PDMS), two fundamental problems for data fusion arise: (a) how to build a semantic reconciliation between the data source schemas managed by peers, and (b) how to locate relevant peers for a given query. Our proposal applies the multi-data source fusion approach [15] in the PDMS context. Multi-data source schemas, which are distributed, shared, and maintained by peers, are the basis of a semantic overlay network. The design of the Peer Multi-Data source Management System (PMDMS), presented in [16], is an extension of the MDS-Manager system [14][15] in which data sources are distributed among peers. In this paper, we focus on the MatchMaker component of PMDMS, which is responsible for semantic reconciliation (i.e., mapping) between concrete data source schemas (i.e., schemas describing data to share with other peers), also known as expertises. Our approach to semantic reconciliation is based on ontologies and XML technologies. The peer schema (i.e., an ontology expressed in OWL/RDF) is annotated with a set of synonyms to guide the later search for semantic equivalences between expertises. The mapping results are stored in an XML document called the Conflicts data source (a part of the multi-data source) as semantic links between concepts, such as equivalence, synonym, homonym, or disjoint concepts.
Proceedings 19th International Conference on Data Engineering (Cat. No.03CH37405), 2003
Intuitively, data management and data integration tools should be well-suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backwards compatibility. As a result, many small-scale data sharing tasks are more easily facilitated by non-database-oriented tools that have little support for semantics.
2005
Peer data management systems (PDMS) are the natural extension of integrated information systems. Conventionally, a single integrating system manages an integrated schema, distributes queries to appropriate sources, and integrates incoming data to a common result. In contrast, a PDMS consists of a set of peers, each of which can play the role of an integrating component. A peer knows about its neighboring peers by mappings, which help to translate queries and transform data. Queries submitted to one peer are answered by data residing at that peer and by data that is reached along paths of mappings through the network of peers. The only restriction for PDMS to cover unbounded data is the need to formulate at least one mapping from some known peer to a new data source. We propose a Semantic Web based method that overcomes this restriction, albeit at a price. As sources are dynamically and automatically included in a PDMS, three factors diminish quality: The new source itself might stor...
6th Workshop of Thesis and …, 2007
Admission Year in the Ph.D. Degree Program: 2005. Conclusion Expected by: February 2009. Concluded Stages: credits in disciplines; survey of the state of the art on data management in P2P systems; delimitation of the thesis research scope; definition of a Peer Data Management System (PDMS) architecture; writing and submission of scientific papers about the proposed PDMS; writing and presentation of the qualifying exam and thesis proposal.
ACM SIGMOD Record, 2003
A major problem in today's information-driven world is that sharing heterogeneous, semantically rich data is incredibly difficult. Piazza is a peer data management system that enables sharing heterogeneous data in a distributed and scalable way. Piazza assumes that participants are interested in sharing data and willing to define pairwise mappings between their schemas. Users then formulate queries over their preferred schema, and a query answering system recursively expands any mappings relevant to the query, retrieving data from other peers. In this paper, we provide a brief overview of the Piazza project, including our work on developing mapping languages and query reformulation algorithms, assisting users in defining mappings, indexing, and enforcing access control over shared data.
The VLDB Journal, 2005
Intuitively, data management and data integration tools should be well suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a common and comprehensive schema design before they can be used to store or share information, and they are difficult to extend because schema evolution is heavyweight and may break backward compatibility. As a result, many large-scale data sharing tasks are more easily facilitated by non-database-oriented tools that have little support for semantics.
Proceedings of the 2008 EDBT workshop on Database technologies for handling XML information on the web - DataX '08, 2008
Proceedings of the ACM 13th international workshop on Data warehousing and OLAP - DOLAP '10, 2010
IEEE Transactions on Knowledge and Data Engineering, 2004
Proceedings of the 2008 international symposium on Database engineering & applications - IDEAS '08, 2008
IEEE Transactions on knowledge and Data …, 1998
Ontologies for Agents: Theory and Experiences, 2005
Information Systems, 2012
Data Science Journal, 2009
Web Semantics: Science …, 2010
Journal of Web Semantics, 2012
Lecture Notes in Computer Science, 2004
Lecture Notes in Computer Science, 2004