
The most probable database problem

2014

Abstract

This paper proposes a novel inference task for probabilistic databases: the most probable database (MPD) problem. The MPD is the most probable deterministic database where a given query or constraint is true. We highlight two distinctive applications, in database repair of key and dependency constraints, and in finding most probable explanations in statistical relational learning. The MPD problem raises new theoretical questions, such as the possibility of a dichotomy theorem for MPD, classifying queries as being either PTIME or NP-hard. We show that such a dichotomy would diverge from dichotomies for other inference tasks. We then prove a dichotomy for queries that represent unary functional dependency constraints. Finally, we discuss symmetric probabilities and the opportunities for lifted inference.
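To make the definition concrete, here is a minimal brute-force sketch. It assumes a tuple-independent probabilistic database, where each tuple t is present independently with probability p(t), so a world W has probability ∏_{t ∈ W} p(t) · ∏_{t ∉ W} (1 - p(t)). The constraint is an arbitrary Python predicate over a set of tuples. The function names and the dictionary encoding are hypothetical. Enumeration is exponential in the number of tuples, so this only illustrates what the MPD is, not how one would compute it efficiently.

```python
from itertools import combinations
from math import prod


def world_probability(world, tuples):
    """Probability of one possible world under tuple independence.

    `tuples` maps each tuple to its marginal probability; `world` is the
    set of tuples that are present in this world."""
    return prod(p if t in world else 1.0 - p for t, p in tuples.items())


def most_probable_database(tuples, constraint):
    """Enumerate all 2^n worlds and return the most probable one on which
    `constraint` (a predicate over a set of tuples) is true."""
    best_world, best_p = None, -1.0
    candidates = list(tuples)
    for size in range(len(candidates) + 1):
        for subset in combinations(candidates, size):
            world = frozenset(subset)
            if not constraint(world):
                continue  # only worlds where the query/constraint holds count
            p = world_probability(world, tuples)
            if p > best_p:
                best_world, best_p = world, p
    return best_world, best_p


def key_constraint(world):
    """Hypothetical example constraint: the first attribute is a key."""
    keys = [t[0] for t in world]
    return len(keys) == len(set(keys))


if __name__ == "__main__":
    db = {("a", 1): 0.9, ("a", 2): 0.6, ("b", 1): 0.3}
    print(most_probable_database(db, key_constraint))
```

In this example the key constraint rules out keeping both `("a", 1)` and `("a", 2)`, and the most probable admissible world keeps only `("a", 1)`, with probability 0.9 · 0.4 · 0.7 ≈ 0.252.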

Key takeaways

  • We show that data repair and cleaning problems [5,16] on probabilistic databases are natural instances of the general MPD task.
  • Asymmetric MPD: this is the most general setting, in which the probabilistic database may contain arbitrary tuple probabilities.
  • We now highlight two applications of MPD, one in probabilistic databases, and one in statistical relational learning.
  • Given a probabilistic database that expresses our confidence in the correctness of each tuple, we can compute the most probable database that satisfies the data constraints Q. MPD thus provides a principled probabilistic framework for database repair.
  • We will show that computing the MPD for the query Q_match is tractable for any probabilistic database.
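For the simplest repair case, a single key constraint on one relation, the MPD can be found without enumeration. The sketch below is our own derivation under the assumption of tuple independence, not the paper's dichotomy algorithm for unary functional dependencies. Tuples with different keys never interact, so each key group is solved on its own. Keeping no tuple from a group has probability ∏(1 - p_i); keeping exactly one tuple t multiplies that by the odds p(t) / (1 - p(t)). The best choice is therefore the tuple with the largest probability, kept only if p(t) > 0.5. The function name and the `(key, value)` encoding are hypothetical.

```python
from collections import defaultdict


def repair_key_constraint(tuples):
    """Most probable repair of one key constraint under tuple independence.

    `tuples` maps (key, value) pairs to marginal probabilities in (0, 1).
    Each key group is independent: keep at most one tuple, the one with the
    largest odds p / (1 - p) (equivalently the largest p), and only if its
    odds exceed 1, i.e. p > 0.5."""
    groups = defaultdict(list)
    for (key, value), p in tuples.items():
        groups[key].append(((key, value), p))

    repaired = set()
    for members in groups.values():
        # Odds are monotone in p, so the max-probability tuple has max odds.
        best_tuple, best_p = max(members, key=lambda m: m[1])
        if best_p > 0.5:  # including it beats leaving the group empty
            repaired.add(best_tuple)
    return repaired


if __name__ == "__main__":
    db = {("a", 1): 0.9, ("a", 2): 0.6, ("b", 1): 0.3}
    print(repair_key_constraint(db))
```

On the same three-tuple example as a brute-force search, this keeps only `("a", 1)`: the conflicting `("a", 2)` is dropped, and `("b", 1)` is dropped because its probability is below 0.5. The per-group argument runs in linear time, but it relies on the groups being disjoint; richer constraints that couple tuples across groups do not decompose this way.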