Papers by Dominique Laurent
HAL (Le Centre pour la Communication Scientifique Directe), Jun 1, 2007

arXiv (Cornell University), Feb 13, 2023
Efficient consistency maintenance of incomplete and dynamic real-life databases is a prerequisite for sound data analysis. In prior work, we tackled the generic problem of database updating in the presence of tuple-generating constraints from a theoretical viewpoint. The current paper considers the usability of our approach by (a) introducing incremental update routines (instead of the previous from-scratch versions) and (b) removing the restriction that the contents of the database must fit in main memory. In doing so, this paper offers new algorithms and proposes queries and data models that invite discussion on the representation of incompleteness in databases. We also propose implementations under a graph database model and the traditional relational database model. Our experiments show that overall computation times are similar, but they point to discrepancies in some steps.
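As an illustration of the incremental idea, here is a minimal sketch assuming a toy representation of tuple-generating constraints (all names and the triggering discipline are hypothetical, not the paper's actual routines): after an insertion, only the constraints whose body matches the new tuple are chased, rather than re-chasing the whole database.

```python
# Minimal sketch of incremental maintenance for tuple-generating
# constraints of the form body -> head (illustrative names only).
from typing import Callable

class TGD:
    """If a tuple of `body_pred` exists, `derive` gives a tuple that
    must exist in `head_pred`."""
    def __init__(self, body_pred: str,
                 derive: Callable[[tuple], tuple],
                 head_pred: str):
        self.body_pred = body_pred
        self.derive = derive
        self.head_pred = head_pred

def insert(db: dict, pred: str, tup: tuple, tgds: list) -> None:
    """Insert `tup` and chase only the constraints it can trigger."""
    frontier = [(pred, tup)]
    while frontier:
        p, t = frontier.pop()
        if t in db.setdefault(p, set()):
            continue                      # already present: nothing to do
        db[p].add(t)
        for tgd in tgds:
            if tgd.body_pred == p:        # only rules triggered by the delta
                frontier.append((tgd.head_pred, tgd.derive(t)))

# Example: every Employee(e, d) requires Dept(d).
tgds = [TGD("Employee", lambda t: (t[1],), "Dept")]
db = {}
insert(db, "Employee", ("alice", "cs"), tgds)
print(db)   # {'Employee': {('alice', 'cs')}, 'Dept': {('cs',)}}
```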

arXiv (Cornell University), Jan 9, 2023
In this paper, we study consistent query answering in tables with nulls and functional dependencies. Given such a table T, we consider the set of all tuples that can be built up from constants appearing in T, and we use set-theoretic semantics for tuples and functional dependencies to characterize the tuples of this set in two orthogonal ways: first as true or false tuples, and then as consistent or inconsistent tuples. Queries are issued against T but evaluated over this larger set. In this setting, we consider a query Q: select X from T where Condition, and define its consistent answer to be the set of tuples x such that (a) x is a true and consistent tuple with schema X and (b) there exists a true super-tuple t of x satisfying the condition. We show that, depending on the 'status' that the super-tuple t has, there are different types of consistent answer to Q. The main contributions of the paper are: (a) a novel approach to consistent query answering not using table repairs; (b) polynomial algorithms for computing the sets of true/false tuples and the sets of consistent/inconsistent tuples; (c) polynomial algorithms in the size of T for computing the different types of consistent answer for both conjunctive and disjunctive queries; and (d) a detailed discussion of the differences between our approach and the approaches using table repairs.
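A minimal sketch of the repair-free idea, under strong simplifications (complete rows, a single functional dependency, none of the paper's true/false machinery; all names are illustrative): rows violating an FD are marked inconsistent, and a consistent answer only keeps projections of consistent rows that extend to a row satisfying the query condition.

```python
# Toy repair-free consistent answering (illustrative only).
rows = [
    {"Name": "john", "Dept": "cs", "City": "paris"},
    {"Name": "john", "Dept": "cs", "City": "lyon"},   # violates Name -> City
    {"Name": "mary", "Dept": "math", "City": "nice"},
]
fds = [(("Name",), ("City",))]   # functional dependency Name -> City

def violates(r1, r2, fd):
    lhs, rhs = fd
    return (all(r1[a] == r2[a] for a in lhs)
            and any(r1[a] != r2[a] for a in rhs))

# a row is inconsistent if it FD-conflicts with some other row
inconsistent = {i for i, r1 in enumerate(rows) for j, r2 in enumerate(rows)
                for fd in fds if i != j and violates(r1, r2, fd)}

def consistent_answer(select, condition):
    """Projections of consistent rows that satisfy `condition`."""
    return {tuple(r[a] for a in select)
            for i, r in enumerate(rows)
            if i not in inconsistent and condition(r)}

print(consistent_answer(("Name",), lambda r: r["Dept"] == "cs"))    # set()
print(consistent_answer(("Name",), lambda r: r["Dept"] == "math"))  # {('mary',)}
```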

Providing services by integrating information available in web resources is one of the main goals of a mediation architecture. In this paper, we consider the standard wrapper-mediator architecture under the following hypotheses: (i) the information exchanged between the wrappers and the mediator consists of XML documents, (ii) the wrappers have limited resources, and (iii) to answer queries even when sources are not available, materialized XML views are stored at the mediator level. In this setting, we focus on the problem of maintaining materialized XML views when the sources change. In our context, wrappers send the updated document without providing any information about the type and the location of the update in the document. The problems we address are thus, first, identifying the updates and, second, updating the view in such a way that accesses to the sources are restricted. Our approach is based on the XAlgebra, which allows XQuery requests on XML documents to be treated as relational tables. Moreover, our solution uses identifier annotations for the XAlgebra and a diff function.
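A rough sketch of the first problem, identifying the update when a wrapper resends a whole document (hypothetical code, not the paper's XAlgebra-based machinery): walk the old and new XML trees in parallel and report the first diverging subtree.

```python
# Locate the first difference between two versions of an XML document.
import xml.etree.ElementTree as ET

def first_diff(old, new, path=""):
    """Return the path of the first differing node, or None."""
    here = f"{path}/{old.tag}"
    if old.tag != new.tag or (old.text or "").strip() != (new.text or "").strip():
        return here
    for o, n in zip(old, new):
        d = first_diff(o, n, here)
        if d:
            return d
    if len(old) != len(new):
        return here          # a subtree was inserted or deleted
    return None

old = ET.fromstring("<cat><book><price>10</price></book></cat>")
new = ET.fromstring("<cat><book><price>12</price></book></cat>")
print(first_diff(old, new))   # /cat/book/price
```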

IGI Global eBooks, Jan 18, 2011
In the context of multidimensional data, OLAP tools are appropriate for navigating the data, with the aim of discovering pertinent and abstract knowledge. However, due to the size of the data set, a systematic and exhaustive exploration is not feasible. Therefore, the problem is to design automatic tools that ease the navigation in the data and their visualization. In this paper, we present a novel approach for automatically building blocks of similar values in a given data cube that are meant to summarize the content of the cube. Our method is based on a levelwise algorithm (à la Apriori) whose complexity, measured in the number of scans of the data cube, is shown to be polynomial. The experiments reported in the paper show that our approach is scalable, in particular when the measure values present in the data cube are discretized using crisp or fuzzy partitions.
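A toy levelwise sketch of the block-building idea, assuming a small two-dimensional cube with crisp values (the paper handles general cubes and fuzzy partitions; the growth discipline here is illustrative): blocks are grown one row or column at a time, and a block is extended only while it stays homogeneous.

```python
# Levelwise (Apriori-style) growth of homogeneous rectangular blocks.
cube = [
    [1, 1, 2],
    [1, 1, 2],
    [3, 3, 2],
]
N = 3   # cube is N x N

def homogeneous(r0, r1, c0, c1):
    """True if all cells in rows r0..r1 and columns c0..c1 share one value."""
    vals = {cube[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)}
    return len(vals) == 1

# Level 1: all single cells; level k+1: homogeneous blocks grown by one
# row or one column.  Blocks that cannot grow are kept as results.
level = {(r, r, c, c) for r in range(N) for c in range(N)}
blocks = set()
while level:
    nxt = set()
    for (r0, r1, c0, c1) in level:
        grown = False
        for cand in ((r0, r1 + 1, c0, c1), (r0, r1, c0, c1 + 1)):
            if cand[1] < N and cand[3] < N and homogeneous(*cand):
                nxt.add(cand)
                grown = True
        if not grown:
            blocks.add((r0, r1, c0, c1))
    level = nxt

# Note: `blocks` is only locally maximal; a subsumption post-filter
# would remove blocks contained in larger ones.
print(sorted(blocks))
```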

Journal of Logic and Computation
In this paper, we investigate rule semantics for deductive databases in the context of four-valued logic. In our approach, a database is a pair $\varDelta =(E,R)$, where $E$ is a set of pairs, each pair associating a ground fact with a truth value (thus allowing true, false or inconsistent facts to be stored) and $R$ is a set of rules generalizing standard Datalog rules in the following sense: (i) the head of a rule can be a positive or a negative atom and (ii) the body can involve any of the connectors of four-valued logic. We define the database semantics as the least fixed point of a monotonic operator and we compare this semantics with that of the k-existential programs defined by Fitting and the paraconsistent extended logic programs defined by Arieli. Our main contribution is to show that, if we consider rules as implications (i.e. if we view the database as a set of formulas), then the semantics of the database is the unique minimal model of the set of database formulas. Here, minimal...
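A minimal fixpoint sketch under strong simplifications (propositional facts and a single join-based operator; the paper's connectors and operator are richer): each fact carries a pair (true evidence, false evidence), a rule with a positive head adds true evidence, one with a negative head adds false evidence, and the operator is iterated to its least fixed point.

```python
# Toy least-fixpoint computation with four-valued facts.
facts = {"p(a)": (True, False)}                  # p(a) is true
rules = [
    # (head atom, head is positive?, body = list of (atom, positive?))
    ("q(a)", True,  [("p(a)", True)]),           # q(a) <- p(a)
    ("q(a)", False, [("p(a)", True)]),           # -q(a) <- p(a): conflict
]

def holds(atom, positive, kb):
    t, f = kb.get(atom, (False, False))
    return t if positive else f

changed = True
while changed:                                   # iterate to the fixpoint
    changed = False
    for head, pos, body in rules:
        if all(holds(a, p, facts) for a, p in body):
            t, f = facts.get(head, (False, False))
            new = (t or pos, f or not pos)       # join of the evidence
            if new != facts.get(head, (False, False)):
                facts[head] = new
                changed = True

print(facts["q(a)"])   # (True, True): q(a) gets value b (inconsistent)
```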

In this paper we address the problem of handling inconsistencies in tables with missing values (also called nulls) and functional dependencies. Although the traditional view is that table instances must respect all functional dependencies imposed on them, it is nevertheless relevant to develop theories about how to handle instances that violate some dependencies. Regarding missing values, we make no assumptions on their existence: a missing value exists only if it is inferred from the functional dependencies of the table. We propose a formal framework in which each tuple of a table is associated with a truth value among the following: true, false, inconsistent or unknown; and we show that our framework can be used to study important problems such as consistent query answering, table merging, and data quality measures, to mention just a few. In this paper, however, we focus mainly on consistent query answering, a problem that has received considerable attention during the last decades...
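A toy classification sketch, assuming one functional dependency and ignoring the false case (the paper's set-theoretic semantics covers all four values; names are illustrative): a tuple is inconsistent when the FD yields two distinct values for its key, true when it is derivable, and unknown otherwise.

```python
# Illustrative truth-value classification of candidate tuples.
NULL = None
rows = [("john", "cs"), ("john", "math"), ("mary", NULL)]   # (Name, Dept)
# Functional dependency: Name -> Dept

def classify(tup):
    name, dept = tup
    # Dept values the FD associates with `name` (nulls carry no value)
    depts = {d for n, d in rows if n == name and d is not NULL}
    if len(depts) > 1:
        return "inconsistent"   # the FD yields two Dept values for `name`
    if dept in depts:
        return "true"
    return "unknown"            # neither stated nor derivable here

print(classify(("john", "cs")))   # inconsistent ("cs" vs "math" for john)
print(classify(("mary", "cs")))   # unknown: mary's Dept is a null
```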

XML constraints are either schema constraints, representing rules about document structure (e.g. a DTD, an XML Schema definition or a specification in Relax-NG), or integrity constraints, which are rules about the values contained in documents (e.g. primary keys, foreign keys, etc.). We address the problem of incrementally verifying these constraints when documents are modified by updates. The structure of an XML document is a tree whose nodes are element (or attribute) names and whose leaves are associated with values contained in the document. The updates considered are the insertion, deletion or replacement of any subtree in the XML tree. Schema constraints are represented by tree automata and tree grammars. Key and foreign key constraints are represented by attribute grammars that add semantic rules to the schema grammars in order to carry key and foreign key values and verify their properties. Our incremental validation tests both schema and integrity constraints while treating the sequence of updates...
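For intuition, a tiny bottom-up acceptance sketch with a hypothetical tree automaton (real schema automata use regular expressions over child sequences, and the paper's contribution is the incremental revalidation, not shown here): each node is assigned a state from its tag and its children's states.

```python
# Bottom-up tree-automaton check of an XML tree against a toy schema.
import xml.etree.ElementTree as ET

# transition: (element tag, tuple of child states) -> state
delta = {
    ("title", ()): "qTitle",
    ("price", ()): "qPrice",
    ("book", ("qTitle", "qPrice")): "qBook",
    ("catalog", ("qBook",)): "qCat",
}
accepting = {"qCat"}

def state(node):
    children = tuple(state(c) for c in node)
    return delta.get((node.tag, children))   # None = rejected

doc = ET.fromstring(
    "<catalog><book><title>t</title><price>9</price></book></catalog>")
print(state(doc) in accepting)   # True
```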

ArXiv, 2021
In this paper we address the problem of handling inconsistencies in tables with missing values (also called nulls) and functional dependencies. Although the traditional view is that table instances must respect all functional dependencies imposed on them, it is nevertheless relevant to develop theories about how to handle instances that violate some dependencies. Regarding missing values, we make no assumptions on their existence: a missing value exists only if it is inferred from the functional dependencies of the table. We propose a formal framework in which each tuple of a table is associated with a truth value among the following: true, false, inconsistent or unknown; and we show that our framework can be used to study important problems such as consistent query answering, table merging, and data quality measures, to mention just a few. In this paper, however, we focus mainly on consistent query answering, a problem that has received considerable attention during the last decades...

Studies in Computational Intelligence, 2014
In Multi-Relational Data Mining (MRDM), data are represented in relational form, where the individuals of the target table are potentially related to several records in secondary tables through one-to-many relationships. Variable pre-processing (including discretization and feature selection) in this multiple-table setting differs from the attribute-value case: besides the target variable information, one should take into account the relational structure of the database. In this paper, we focus on numerical variables located in a non-target table. We propose a criterion that evaluates a given discretization of such variables. The idea is to summarize, for each individual, the information contained in the secondary variable by a tuple of features (one feature per interval of the considered discretization). Each feature represents the number of values of the secondary variable ranging in the corresponding interval. These count features are jointly partitioned by means of data grid models in order to obtain the best separation of the class values. We describe a simple optimization algorithm to find the best equal-frequency discretization with respect to the proposed criterion. Experiments on real and artificial data sets reveal that the discretization approach helps one to discover relevant secondary variables.
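A minimal sketch of the count-feature construction on made-up data (the paper additionally partitions these counts with data grid models to separate the class values): secondary values are discretized with equal-frequency cut points, and each individual is summarized by the number of its values falling in each interval.

```python
# Equal-frequency discretization and per-individual count features.
import bisect

# individual -> values of the secondary numeric variable (one-to-many)
secondary = {"i1": [0.2, 0.3, 5.1], "i2": [4.8, 5.0], "i3": [0.1, 9.7, 9.9]}

def equal_frequency_bounds(values, k):
    """Cut points splitting the sorted values into k equal-count intervals."""
    v = sorted(values)
    return [v[len(v) * i // k] for i in range(1, k)]

all_values = [x for vs in secondary.values() for x in vs]
bounds = equal_frequency_bounds(all_values, k=3)

def count_features(values):
    """Number of values of one individual falling in each interval."""
    counts = [0] * (len(bounds) + 1)
    for x in values:
        counts[bisect.bisect_right(bounds, x)] += 1
    return counts

for ind, vs in secondary.items():
    print(ind, count_features(vs))   # e.g. i2 -> [0, 2, 0]
```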

Communications in Computer and Information Science, 2013
We present an information system supporting the diagnosis and treatment of rice diseases. The heart of the system is a software mediator that allows farmers to formulate remote queries regarding a disease and responds to farmer queries by recommending a treatment of the disease if one exists. The disease diagnosis and the recommended treatment are based on expert knowledge stored in the mediator database. The processing of farmer queries may involve direct access to the mediator by farmers, online communication between the mediator and the experts, or direct dialog between the farmer and an expert. Our information model is generic in the sense that it can be used to support the needs of farmers in a variety of similar environments with minor changes. The system presented in this paper is currently under development as part of a Franco-Thai project and aims to assist farmers in the quick diagnosis of rice diseases and their treatment.

In this paper we address the problem of handling inconsistencies in tables with missing values (or nulls) and functional dependencies. Although the traditional view is that table instances must respect all functional dependencies imposed on them, it is nevertheless relevant to develop theories about how to handle instances that violate some dependencies. The usual approach to alleviating the impact of inconsistent data on the answers to a query is to introduce the notion of repair: a repair is a minimally different consistent instance, and an answer is consistent if it is present in every repair. Our approach is fundamentally different: we use set-theoretic semantics for tuples and functional dependencies that allow us to associate each tuple with a truth value among the following: true, false, inconsistent or unknown. The users of the table can then query the set of true tuples as usual. Regarding missing values, we make no assumptions on their existence: a missing value exists only if it is inferred from the functional dependencies of the table...
Communications in Computer and Information Science
In this paper, we introduce a novel approach for dealing with databases containing inconsistent information. Considering four-valued logics in the context of the OWA (Open World Assumption), a database Δ is a pair (E, R) where E is the extension and R the set of rules. In our formalism, the set E is a set of pairs of the form ⟨ϕ, v⟩ where ϕ is a fact and v is either t, b or f (meaning respectively true, inconsistent or false), given that unknown facts are not stored. Moreover, the rules extend Datalog^neg rules by allowing their heads to be negative atoms. We then define the notion of model of such a database, we show how to compute one particular model called the semantics, and we investigate properties of this model. We also show how our approach applies to data integration and we review examples from the literature.

Abstract. The handling of missing values is an important issue in the field of data warehouses. Several solutions have been proposed for the prediction of missing values, with the following characteristics: (i) the prediction handles either continuous values or discrete values, and (ii) the prediction is approximate (either it is associated with a probability or it concerns a set of values). Recently, a prediction method allowing the continuous and discrete cases to be treated independently was proposed, based on association rules. This method predicts, with a confidence always equal to 1, either a set of values in the discrete case or an interval of values in the continuous case. In this article, we build on this approach based on the extraction of association rules and we show how to generate prediction rules bearing on a single value and whose confidence is always...
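A toy sketch of the confidence-1 idea on made-up categorical data (illustrative only; the paper's rule extraction is more general and also covers the continuous case): a rule is kept only if, over the complete rows, its left-hand side determines a single target value.

```python
# Confidence-1 association rules used to fill a missing value.
rows = [
    {"city": "paris",  "country": "france", "zone": "eu"},
    {"city": "lyon",   "country": "france", "zone": "eu"},
    {"city": "boston", "country": "usa",    "zone": None},   # missing
]

def predict(row, target):
    complete = [r for r in rows if r[target] is not None]
    for attr in row:
        if attr == target or row[attr] is None:
            continue
        # values of `target` co-occurring with row[attr] in complete rows
        vals = {r[target] for r in complete if r[attr] == row[attr]}
        if len(vals) == 1:               # rule attr=v -> target=w, conf = 1
            return vals.pop()
    return None                          # no confidence-1 rule applies

print(predict(rows[2], "zone"))   # None: no rule covers country=usa yet
rows.append({"city": "nyc", "country": "usa", "zone": "na"})
print(predict(rows[2], "zone"))   # na (rule country=usa -> zone=na)
```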
Advances in Databases and Information Systems
We address the issue of database updating in the presence of constraints, using the first-order logic formalism. Based on a chase technique, we propose a deterministic update strategy. While generalizing key-foreign key constraints, our approach satisfies the consistency and minimal-change requirements, with a polynomial time complexity. Our custom version of the chase allows improvements in updating non-null RDF/S databases with constraints.
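A minimal sketch of the null-inventing chase step behind such a strategy, assuming a key-foreign-key-like constraint and hypothetical relation names (the paper's custom chase is deterministic and polynomial; this only shows the basic step): when a referencing tuple has no referenced counterpart, a tuple with a fresh labeled null is added.

```python
# One chase step for a foreign-key-like constraint:
# Employee(e, d) requires some Dept(d, m).
from itertools import count

fresh = (f"N{i}" for i in count())        # generator of labeled nulls

employee = {("alice", "cs")}
dept = set()                              # Dept(dept_name, manager)

def chase_fk():
    for _, d in sorted(employee):
        if not any(dn == d for dn, _ in dept):
            dept.add((d, next(fresh)))    # satisfy the constraint minimally

chase_fk()
print(dept)   # {('cs', 'N0')}
```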
The invention relates to an on-off valve, particularly for the isolation and by-pass of exhaust gas expansion turbines of catalytic cracking plants, of the type comprising a hollow valve body, an opening for the inlet of the gas into the valve, an opening for the outlet of the gas from the valve, at least one shutter and at least one corresponding seat. The valve is characterized in that it comprises a conduit coupled to the inlet opening and a conduit coupled to the outlet opening, the at least one seat being coupled to the inlet conduit or to the outlet conduit, and the at least one shutter having a cover shape and sealingly engaging with the seat by a rotary motion transmitted to the shutter by an actuator via a drive shaft and a drive arm.
In this paper, we introduce a novel approach for dealing with databases containing inconsistent information. Considering four-valued logics in the context of the OWA (Open World Assumption), a database \(\varDelta \) is a pair (E, R) where E is the extension and R the set of rules. In our formalism, the set E is a set of pairs of the form \(\langle \varphi , \mathtt{v}\rangle \) where \(\varphi \) is a fact and v is either t, b or f (meaning respectively true, inconsistent or false), given that unknown facts are not stored. Moreover, the rules extend Datalog\(^{neg}\) rules by allowing their heads to be negative atoms.

In this paper, we introduce a novel approach to deductive databases meant to take into account the needs of current applications in the area of data integration. To this end, we extend the formalism of standard deductive databases to the context of four-valued logic so as to account for unknown, inconsistent, true or false information under the open world assumption. In our approach, a database is a pair (E, R) where E is the extension and R the set of rules. The extension is a set of pairs of the form 〈φ, v〉 where φ is a fact and v is a value that can be true, inconsistent or false but not unknown (that is, unknown facts are not stored in the database). The rules follow the form of standard Datalog rules but, contrary to standard rules, their head may be a negative atom. Our main contributions are as follows: (i) we give an expression of first-degree entailment in terms of other connectors and exhibit a functionally complete set of basic connectors not involving first-degree entailment...
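For intuition, a small sketch of basic four-valued connectors, encoding each truth value as a pair of evidence bits (a standard presentation of Belnap-style logics, not necessarily the paper's chosen basic set): negation swaps the components, while conjunction and disjunction act componentwise.

```python
# Four-valued connectors: t=(1,0), f=(0,1), b=(1,1) inconsistent,
# n=(0,0) unknown; first bit = true evidence, second = false evidence.
t, f, b, n = (1, 0), (0, 1), (1, 1), (0, 0)
name = {t: "t", f: "f", b: "b", n: "n"}

def neg(x):     return (x[1], x[0])              # swap evidence
def conj(x, y): return (x[0] & y[0], x[1] | y[1])
def disj(x, y): return (x[0] | y[0], x[1] & y[1])

# Print the conjunction truth table (e.g. b AND n = f in this logic).
for x in (t, b, n, f):
    for y in (t, b, n, f):
        print(name[x], "AND", name[y], "=", name[conj(x, y)])
```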

Social networks have gained increasing attention over the last years, due to their importance in users' modern life. Moreover, social networks have become a great source of information, and several applications have been proposed to extract information from them, such as recommender systems. In this paper we present two recommendation algorithms: Semantic Social Breadth First Search (SSBFS) and Semantic Social Depth First Search (SSDFS). These algorithms are designed to recommend items to users who are connected via a social network. Our algorithms are based on three main features: a social network analysis measure, graph search algorithms, and a semantic relevance measure. We apply these algorithms to a real dataset (the Amazon dataset) and we compare them with item-based collaborative filtering and hybrid recommendation algorithms. Our results show good precision as well as good performance in terms of runtime. Furthermore, our results show that SSBFS and SSDFS search a small part of the...
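A skeleton of the breadth-first variant, with hypothetical data and a placeholder weighting (the paper combines a social network analysis measure with a semantic relevance measure; neither is reproduced here): the user's neighbourhood is explored level by level and items liked by nearby users are scored by proximity.

```python
# BFS-style social recommendation skeleton (illustrative only).
from collections import deque

friends = {"u1": ["u2", "u3"], "u2": ["u4"], "u3": [], "u4": []}
likes   = {"u2": ["book_a"], "u3": ["book_b"], "u4": ["book_c"]}

def recommend_bfs(user, max_depth=2):
    seen, scores = {user}, {}
    queue = deque([(user, 0)])
    while queue:
        u, depth = queue.popleft()
        if depth == max_depth:
            continue
        for v in friends.get(u, []):
            if v in seen:
                continue
            seen.add(v)
            for item in likes.get(v, []):
                # closer neighbours weigh more (stand-in for the
                # social/semantic measures of the paper)
                scores[item] = scores.get(item, 0) + 1 / (depth + 1)
            queue.append((v, depth + 1))
    return sorted(scores, key=scores.get, reverse=True)

print(recommend_bfs("u1"))   # ['book_a', 'book_b', 'book_c']
```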