2015
6 pages
LogicBlox is a database product designed for enterprise software development, combining transactions and analytics. The underlying data model is a relational database, and the query language, LogiQL, is an extension of Datalog [13]. As such, LogiQL features a simple and unified syntax for traditional relational manipulation as well as deeper analytics. Moreover, its declarative nature allows for substantial static analysis for optimizing evaluation schemes, parallelization, and incremental maintenance, and it allows for sophisticated transactional management [11]. In this paper, we describe various extensions of Datalog for supporting prescriptive and predictive analytics. These extensions come in the form of mathematical optimization (mixed integer programming), machine-learning capabilities, statistical relational models, and probabilistic programming. Some of these extensions are currently implemented in LogiQL, while others are in either the development or the planning phase.
Datalog is a deductive language tailored for easy database access. We introduce an algebraic modeling language in Datalog for mixed-integer linear optimization models. With this language, data can be queried from a database by means of Datalog and combined with models to produce problem instances that are readily available to solvers. This is an advantage over conventional optimization modeling languages, which rely on reading data via plug-in tools or importing data from external sources via standard files. The declarative nature of Datalog permits the underlying syntax to be kept intuitive and simple. In this paper, we present the algebraic and syntactical ideas behind our proposed platform. Two specific case studies, a production-transportation model and a traveling salesman model, are presented. A comparison between a commercially available algebraic modeling system and Datalog's embedded algebraic modeling system is also conducted. Datalog's main advantages are clearly exposed when solving real-life decision-making problems that embed flexible business rules.
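The idea of combining stored relations with a model to produce a problem instance can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the paper's modeling language: the relation names (`supply`, `demand`, `unit_cost`), the data values, and the brute-force search standing in for a real mixed-integer solver are all hypothetical.

```python
from itertools import product

# Hypothetical base relations, as they might be stored in a database.
supply = {"plant_a": 3, "plant_b": 2}    # units available per plant
demand = {"store_x": 2, "store_y": 2}    # units required per store
unit_cost = {                            # shipping cost per unit on each route
    ("plant_a", "store_x"): 4, ("plant_a", "store_y"): 6,
    ("plant_b", "store_x"): 5, ("plant_b", "store_y"): 3,
}

def build_instance(supply, demand, unit_cost):
    """Join the relations into variables, objective coefficients, and constraint rows."""
    variables = sorted(unit_cost)        # one integer variable per route
    objective = {route: unit_cost[route] for route in variables}
    supply_rows = {p: [r for r in variables if r[0] == p] for p in supply}
    demand_rows = {s: [r for r in variables if r[1] == s] for s in demand}
    return variables, objective, supply_rows, demand_rows

def solve_by_enumeration(supply, demand, unit_cost):
    """Stand-in for a MIP solver: try every integer plan on this tiny instance."""
    variables, objective, supply_rows, demand_rows = build_instance(
        supply, demand, unit_cost)
    upper = max(supply.values())
    best_cost, best_plan = None, None
    for values in product(range(upper + 1), repeat=len(variables)):
        plan = dict(zip(variables, values))
        # Supply constraints: a plant cannot ship more than it has.
        if any(sum(plan[r] for r in rows) > supply[p]
               for p, rows in supply_rows.items()):
            continue
        # Demand constraints: every store must receive at least its demand.
        if any(sum(plan[r] for r in rows) < demand[s]
               for s, rows in demand_rows.items()):
            continue
        cost = sum(objective[r] * plan[r] for r in variables)
        if best_cost is None or cost < best_cost:
            best_cost, best_plan = cost, plan
    return best_cost, best_plan

best_cost, best_plan = solve_by_enumeration(supply, demand, unit_cost)
```

Enumeration is viable only because the instance has four variables; a realistic instance would hand the generated rows to a dedicated solver instead.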
Proceedings of the 2011 international conference on Management of data - SIGMOD '11, 2011
We are witnessing an exciting revival of interest in recursive Datalog queries in a variety of emerging application domains such as data integration, information extraction, networking, program analysis, security, and cloud computing. This tutorial briefly reviews the Datalog language and recursive query processing and optimization techniques, then discusses applications of Datalog in three application domains: data integration, declarative networking, and program analysis. Throughout the tutorial, we use LogicBlox, a commercial Datalog engine for enterprise software systems, to allow the audience to walk through code examples presented in the tutorial.
IEEE Transactions on Knowledge and Data Engineering, 1989
Datalog is a database query language based on the logic programming paradigm; it has been designed and intensively studied over the last five years. We present the syntax and semantics of Datalog and its use for querying a relational database. Then, we classify optimization methods for achieving efficient evaluations of Datalog queries, and present the most relevant methods. Finally, we discuss various enhancements of Datalog, currently under study, and indicate what is still needed in order to extend Datalog's applicability to the solution of real-life problems. The aim of this paper is to provide a survey of research performed on Datalog, also addressed to those members of the database community who are not too familiar with logic programming concepts.
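As a concrete illustration of bottom-up evaluation, the following Python sketch computes the classic recursive `path` query with semi-naive iteration, where each round joins only the newly derived facts against `edge`. The function name and the sample `edges` relation are assumptions made for this example; the sketch is not taken from the survey.

```python
def transitive_closure(edges):
    """Semi-naive bottom-up evaluation of the Datalog program

         path(x, y) :- edge(x, y).
         path(x, z) :- path(x, y), edge(y, z).
    """
    path = set(edges)    # all facts derived so far
    delta = set(edges)   # facts that were new in the previous round
    while delta:
        # Join only the new facts with edge, rather than all of path.
        derived = {(x, z) for (x, y) in delta for (y2, z) in edges if y == y2}
        delta = derived - path   # keep only facts not seen before
        path |= delta
    return path

edges = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(edges)
```

The loop stops at a fixpoint, that is, when a round derives no new facts. Naive evaluation would reach the same result but would recompute every earlier derivation in each round.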
Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, 2015
The LogicBlox system aims to reduce the complexity of software development for modern applications which enhance and automate decision-making and enable their users to evolve their capabilities via a "self-service" model. Our perspective in this area is informed by over twenty years of experience building dozens of mission-critical enterprise applications that are in use by hundreds of large enterprises across industries such as retail, telecommunications, banking, and government. We designed and built LogicBlox to be the system we wished we had when developing those applications. In this paper, we discuss the design considerations behind the LogicBlox system and give an overview of its implementation, highlighting innovative aspects. These include: LogiQL, a unified and declarative language based on Datalog; the use of purely functional data structures; novel join processing strategies; advanced incremental maintenance and live programming facilities; a novel concurrency control scheme; and built-in support for prescriptive and predictive analytics.
Lecture Notes in Computer Science, 2010
Optimization of modern businesses is becoming increasingly dependent on business intelligence and rule-based software to perform predictive analytics over massive data sets and enforce complex business rules. This has led to a resurgence of interest in datalog, because of its powerful capability for processing complex rules, especially those involving recursion, and the exploitation of novel data structures that provide performance advantages over relational database systems. ORM 2 is a conceptual approach for fact-oriented modeling that provides a high-level graphical and textual syntax to facilitate validation of data models and complex rules with nontechnical domain experts. Datalog LB is an extended form of typed datalog that exploits fact-oriented data structures to provide deep and highly performant support for complex rules with guaranteed decidability. This paper provides an overview of recent research and development efforts to extend the Natural ORM Architect (NORMA) software tool to map ORM models to Datalog LB.
[…] business domain experts, and then automatically transforming these high-level constructs into equivalent, lower-level constructs (e.g. datalog code) for implementation. This paper provides an overview of our recent research and development efforts to support such a model-driven engineering approach to business analytics. For the conceptual level, we use second generation Object-Role Modeling (ORM 2) [10]. Unlike attribute-based approaches such as Entity-Relationship (ER) modeling [5] and class diagramming within the Unified Modeling Language (UML) [23], ORM is fact-oriented, where all facts, constraints, and derivation rules may be verbalized naturally in sentences easily understood and validated by nontechnical business users using concrete examples.
ORM's graphical notation for data modeling is far more expressive than that of industrial ER diagrams or UML class diagrams, and its attribute-free nature makes it more stable and adaptable to changing business requirements. Brief introductions to ORM may be found in [12, 15], a detailed introduction in [16], a thorough treatment in [18], and a comparison with UML in [14]. An overview of fact-oriented modeling approaches, including ORM and others such as RIDL [22], NIAM [24], and PSM [20], as well as history and research directions, may be found in [13]. For the datalog platform, we use Datalog LB, a vastly extended version of datalog developed by LogicBlox. Datalog LB is a typed datalog [24] that employs fact-oriented data structures with performance benefits similar to those of column stores when processing very complex rules over vast data sets. For detailed coverage of traditional datalog, see [1, 6, 9]. Datalog LB extends basic datalog with stratified negation, types, functions (including aggregate functions), transactions, modules (called "blocks"), constraints, default values, ordered predicates, metalevel support, and other features. Early tool support for ORM introduced two textual languages. Formal ORM Language (FORML) was supported as an output verbalization language in InfoModeler and the ORM solution within Microsoft Visio for Enterprise Architects. Conceptual Query Language (ConQuer) enabled ORM models to be queried, and was implemented in the InfoAssistant and ActiveQuery tools [3, 4]. However, the ConQuer language was used only for formulating non-recursive ORM queries, not constraints or derivation rules, and tool support for it is no longer available. Recently, ORM was extended to ORM 2, with tool support provided by Natural ORM Architect (NORMA) [8], including improved constraint verbalization in FORML 2 [17, 19] as well as further rule options such as semiderived types, deontic rules, and deep support for conceptual outer joins [18].
More recently, we developed a role calculus to formally capture derivation rules in ORM [7], and the VisualBlox team at LogicBlox has extended the NORMA tool to capture derivation rules and has also developed a VisualBlox tool to map ORM models to Datalog LB. Extensions to the NORMA tool allow ORM 2 derivation rules to be entered by clicking options in a Model Browser; the rules are then stored in a structure based on the role calculus, which offers a high level of semantic stability [7]. Compared to the ActiveQuery tool for ConQuer, NORMA's derivation support covers a wider range of rules (including recursion), has much better rule verbalization, and generates datalog code for implementation instead of SQL. While it is planned to add SQL generation for derivation rules at a later stage, our current efforts are focused on completing the datalog generation. NORMA's derivation language is designed to be relationally complete, and at the time of writing, about 90% of its constructs have been automatically transformed into Datalog LB.
In proc. ICDT 2016
Probabilistic programming languages are used for developing statistical models, and they typically consist of two components: a specification of a stochastic process (the prior), and a specification of observations that restrict the probability space to a conditional subspace (the posterior). Use cases of such formalisms include the development of algorithms in machine learning and artificial intelligence. We propose and investigate an extension of Datalog for specifying statistical models, and establish a declarative probabilistic-programming paradigm over databases. Our proposed extension provides convenient mechanisms to include common numerical probability functions; in particular, conclusions of rules may contain values drawn from such functions. The semantics of a program is a probability distribution over the possible outcomes of the input database with respect to the program. Observations are naturally incorporated by means of integrity constraints over the extensional and intensional relations. The resulting semantics is robust under different chases and invariant to rewritings that preserve logical equivalence.
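The prior/posterior split described above can be illustrated on a finite, discrete toy program. The sketch below is an assumption-laden illustration rather than the paper's chase-based semantics: the rule syntax in the docstring, the `Flip` notation, the `houses` relation, and the probabilities are all hypothetical. It enumerates every possible outcome with its exact prior probability, then applies an observation as a constraint and renormalizes.

```python
from fractions import Fraction
from itertools import product

houses = ["h1", "h2"]                      # hypothetical extensional relation
P_BURGLARY = Fraction(1, 10)               # assumed parameter of the first rule
P_ALARM_GIVEN_BURGLARY = Fraction(9, 10)   # assumed parameter of the second rule

def possible_outcomes():
    """Yield (outcome, prior probability) pairs for the sketched rules

         burglary(h, Flip[0.1]) :- house(h).
         alarm(h, Flip[0.9])    :- burglary(h, 1).

    An outcome maps each house to a (burglary, alarm) pair of 0/1 values.
    """
    per_house = [
        ((0, 0), 1 - P_BURGLARY),
        ((1, 0), P_BURGLARY * (1 - P_ALARM_GIVEN_BURGLARY)),
        ((1, 1), P_BURGLARY * P_ALARM_GIVEN_BURGLARY),
    ]
    for combo in product(per_house, repeat=len(houses)):
        outcome = {h: bits for h, (bits, _) in zip(houses, combo)}
        prob = Fraction(1)
        for _, p in combo:
            prob *= p          # houses are sampled independently
        yield outcome, prob

def posterior(observation):
    """Keep the outcomes that satisfy the observation and renormalize."""
    kept = [(o, p) for o, p in possible_outcomes() if observation(o)]
    total = sum(p for _, p in kept)
    return [(o, p / total) for o, p in kept]

# Observation, written as a constraint: no alarm sounded in h1.
conditioned = posterior(lambda o: o["h1"][1] == 0)
p_burglary_h1 = sum(p for o, p in conditioned if o["h1"][0] == 1)
```

Exact enumeration works here only because the program is finite and discrete; programs that draw from continuous distributions require the measure-theoretic treatment that the paper develops.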
2021
Today, advanced data analysts make use of both predictive models and optimization problem solving to build data-driven decision-making applications, a combination of technologies recently termed Prescriptive Analytics (PA). Current PA applications typically have multiple layers of poorly integrated components: a relational DBMS for data storage/management, ML tools for prediction, and specialized software packages for problem modeling and optimization problem solving. This complex stack leads to inefficient, labor-intensive, and error-prone PA workflows, blocking wider adoption of PA. In this paper, we present SolveDB+, an RDBMS for PA applications which supports all PA steps with modeling, predictive, and optimization functionalities, and integrates these in a common SQL-based framework. Major SolveDB+ novelties are 1) a powerful SQL-based approach for PA problem specification and solving, 2) an extensible in-DBMS infrastructure for prediction and optimization solvers, and 3) in-DBMS modeling and management of PA models. SolveDB+ significantly improves both PA developer productivity and performance.
Foundations and Trends® in Databases, 2012
In recent years, we have witnessed a revival of the use of recursive queries in a variety of emerging application domains such as data integration and exchange, information extraction, networking, and program analysis. A popular language used for expressing these queries is Datalog. This paper surveys for a general audience the Datalog language, recursive query processing, and optimization techniques. This survey differs from prior surveys written in the eighties and nineties in its comprehensiveness of topics, its coverage of recent developments and applications, and its emphasis on features and techniques beyond "classical" Datalog which are vital for practical applications. Specifically, the topics covered include the core Datalog language and various extensions, semantics, query optimizations, magic-sets optimizations, incremental view maintenance, aggregates, negation, and types. We conclude the paper with a survey of recent systems and applications that use Datalog and recursive queries.
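Of the extensions listed above, negation is the one whose evaluation order matters most. A minimal Python sketch of stratified evaluation follows, assuming hypothetical `node` and `edge` relations and a start node `"a"`: the recursive `reachable` predicate is computed to its fixpoint first, and only then is the rule that negates it evaluated.

```python
def reachable_from(start, edges):
    """Stratum 1: compute the recursive predicate to its fixpoint.

         reachable(start).
         reachable(z) :- reachable(y), edge(y, z).
    """
    reached = {start}
    frontier = {start}
    while frontier:
        # Follow edges out of the newest nodes and drop anything already known.
        frontier = {z for (y, z) in edges if y in frontier} - reached
        reached |= frontier
    return reached

def unreachable(nodes, start, edges):
    """Stratum 2: negation is applied only after stratum 1 is complete.

         unreachable(n) :- node(n), !reachable(n).
    """
    return nodes - reachable_from(start, edges)

nodes = {"a", "b", "c", "d", "e"}
edges = {("a", "b"), ("b", "c"), ("d", "e")}
isolated = unreachable(nodes, "a", edges)
```

Evaluating the negated rule before `reachable` is complete could wrongly label a node as unreachable, which is why a stratified program orders its rules so that no predicate is negated until it has been fully computed.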