Probabilistic Data Integration

2009

Abstract

Data integration is a critical issue in database management, especially in contexts like the Semantic Web. This paper introduces a probabilistic approach to data integration that not only aims to reduce the uncertainty involved in merging local data sources but also includes the uncertainty in the resulting integrated schema. The method presents a compact representation of uncertain mappings between data sources, allowing for automated integration while preserving varying degrees of confidence in the data correctness. Key contributions include the enhancement of automated data integration processes that align closely with the decision-making behavior of human users.

Key takeaways

  • Data is not uncertain, and mappings between schema objects are well known.
  • Each pair of schema objects is compared by all matchers, producing a probabilistic uncertain relationship for each matcher and pair.
  • After the uncertain schema integration process, when we query A we also want to be able to retrieve ext(A), i.e., the instances of the objects matching A.
  • In this experimental analysis of our approach we have shown that 1) managing uncertainty does not significantly affect the time complexity of our method, and 2) discarding uncertain information may discard correct relationships, which would then be lost during the schema integration process.
  • In this paper we have presented a complete method of schema integration with a new perspective on the management of uncertainty.
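
The matcher step in the takeaways above (every pair of schema objects compared by every matcher, with one uncertain relationship per matcher and pair) can be illustrated with a minimal Python sketch. The matchers, the function names, and the reading of each score as a confidence in [0, 1] are assumptions made for illustration only; the paper's actual matchers and its compact representation of uncertain mappings are not reproduced here.

```python
from difflib import SequenceMatcher
from itertools import product


def name_matcher(a: str, b: str) -> float:
    """Hypothetical matcher: case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def exact_matcher(a: str, b: str) -> float:
    """Hypothetical matcher: 1.0 if the names are equal ignoring case, else 0.0."""
    return 1.0 if a.lower() == b.lower() else 0.0


# Assumed set of matchers; a real system would plug in its own.
MATCHERS = {"name": name_matcher, "exact": exact_matcher}


def compare_schemas(schema_a, schema_b, matchers=MATCHERS):
    """Compare every pair of schema objects with every matcher.

    Returns a dict keyed by (object_a, object_b, matcher_label) whose value
    is that matcher's score for the pair, kept as an uncertain relationship
    rather than being thresholded away.
    """
    relationships = {}
    for a, b in product(schema_a, schema_b):
        for label, matcher in matchers.items():
            relationships[(a, b, label)] = matcher(a, b)
    return relationships


if __name__ == "__main__":
    result = compare_schemas(["Author"], ["author", "Title"])
    for key, score in sorted(result.items()):
        print(key, round(score, 2))
```

Keeping one entry per matcher and pair, instead of keeping only the best score, reflects the point made in the takeaways: a relationship that looks weak may still be correct, so it is retained with its confidence rather than dropped.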