SPARQL is a standard query language that facilitates access to structured data in RDF (Resource Description Framework) format. Researchers have introduced several storage layouts to store RDF data efficiently, along with algorithms to retrieve information from RDF repositories. Although these storage layouts are specialized for the repository, they largely originate from the conventional join algorithms used for SQL (Structured Query Language), such as merge join, hash join, and nested-loop join, which are inefficient in both operation and memory usage. Query optimization techniques have been developed to reduce this cost, but they are not cost-effective for sequential join processing. To overcome this issue, we propose a set of specialized algorithms for executing cycle-free SPARQL queries, together with a matching storage layout. Our experimental results on LUBM show that the proposed approach is considerably more efficient for cycle-free queries.
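To make the baseline concrete, the following is a minimal sketch (not the paper's specialized algorithms) of two of the conventional join strategies the abstract contrasts, applied to solution mappings produced by two SPARQL triple patterns; the toy bindings are invented for illustration.

```python
# Minimal sketch: nested-loop join vs. hash join over solution mappings
# produced by two triple patterns. The bindings below are toy data.

def nested_loop_join(left, right, var):
    """O(|left| * |right|): compare every pair of bindings on the shared variable."""
    return [{**l, **r} for l in left for r in right if l[var] == r[var]]

def hash_join(left, right, var):
    """Build a hash table on one side, then probe it with the other side."""
    table = {}
    for l in left:
        table.setdefault(l[var], []).append(l)
    return [{**l, **r} for r in right for l in table.get(r[var], [])]

# bindings for  ?x :advisor ?y  and  ?y :worksFor ?dept  (toy data)
adv = [{"x": "s1", "y": "p1"}, {"x": "s2", "y": "p2"}]
emp = [{"y": "p1", "dept": "d7"}, {"y": "p3", "dept": "d9"}]
assert nested_loop_join(adv, emp, "y") == hash_join(adv, emp, "y")
```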
The Resource Description Framework (RDF) is a standard language model for representing semantic data. As the concept of the Semantic Web becomes more viable, the ability to retrieve and exchange semantic data will become increasingly important. Efficient management of RDF data is one of the key research issues in the Semantic Web; consequently, many RDF management systems have been proposed, with data storage architectures and query processing algorithms for data retrieval. However, most of the proposed approaches require many join operations, which result in unnecessary processing of intermediate results for SPARQL queries. This additional processing becomes substantial as the RDF data volume increases. In this paper, we propose an efficient structural index and a query optimizer that process queries without join operations. Empirical experimental results show that our proposed system outperforms conventional query processing approaches, such as Jena, by up to 79% in terms of query processing time by reducing the volume of unnecessary intermediate results.
cgi.di.uoa.gr
We study the problem of SPARQL query optimization on top of distributed hash tables. Existing works on SPARQL query processing in such environments have never been implemented in a real system, or do not utilize any optimization techniques and thus exhibit poor performance. Our goal in this paper is to propose efficient and scalable algorithms for optimizing SPARQL basic graph pattern queries. We augment a known distributed query processing algorithm with query optimization strategies that improve performance in terms of query response time and bandwidth usage. We implement our techniques in the system Atlas and study their performance experimentally in a local cluster.
2012
Query optimization in RDF stores is a challenging problem, as SPARQL queries typically contain many more joins than equivalent relational plans and hence lead to a large join-order search space. In such cases, cost-based query optimization often is not possible. One practical reason for this is that statistics are typically missing in web-scale settings such as the Linked Open Data (LOD) cloud.
Advances in Data and Web …, 2007
2007
Relational technology has proven to be very useful for scalable Semantic Web data management. Numerous researchers have proposed to use RDBMSs to store and query voluminous RDF data using SQL and RDF query languages. In this article, we study how RDF queries with the so-called well-designed graph patterns and nested optional patterns can be efficiently evaluated in an RDBMS. We propose to extend relational databases with a novel relational operator, the nested optional join (NOJ), which is more efficient than the left outer join in processing nested optional patterns of well-designed graph patterns. We design three efficient algorithms to implement the new operator in relational databases: (1) the nested-loops NOJ algorithm (NL-NOJ); (2) the sort-merge NOJ algorithm (SM-NOJ); and (3) the simple hash NOJ algorithm (SH-NOJ). Based on a real-life RDF dataset, we demonstrate the efficiency of our algorithms by comparing them with the corresponding left outer join implementations, and we explore the effect of join selectivity on the performance of our algorithms.
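As a rough illustration of what a nested optional join must compute, here is a minimal hash-based OPTIONAL (left outer) join in the spirit of SH-NOJ; this is a sketch under simplified assumptions, not the authors' implementation.

```python
# Minimal sketch: an OPTIONAL (left outer) join realized with a hash table,
# which is the behaviour a nested optional join must preserve.

def optional_hash_join(required, optional, var):
    """Keep every required binding; extend it with optional bindings when they match."""
    table = {}
    for o in optional:
        table.setdefault(o[var], []).append(o)
    out = []
    for r in required:
        matches = table.get(r[var], [])
        if matches:
            out.extend({**r, **o} for o in matches)
        else:
            out.append(dict(r))  # unmatched: keep the binding, leave optional vars unbound
    return out

people = [{"p": "alice"}, {"p": "bob"}]
emails = [{"p": "alice", "mbox": "alice@example.org"}]
print(optional_hash_join(people, emails, "p"))
# [{'p': 'alice', 'mbox': 'alice@example.org'}, {'p': 'bob'}]
```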
Proceedings of the 2013 international conference on Management of data - SIGMOD '13, 2013
Efficient storage and querying of RDF data is of increasing importance, due to the increased popularity and widespread acceptance of RDF on the web and in the enterprise. In this paper, we describe a novel storage and query mechanism for RDF which works on top of existing relational representations. Reliance on relational representations of RDF means that one can take advantage of 35+ years of research on efficient storage and querying, industrial-strength transaction support, locking, security, etc. However, there are significant challenges in storing RDF in relational form, which include data sparsity and schema variability. We describe novel mechanisms to shred RDF into relational tables, and novel query translation techniques to maximize the advantages of this shredded representation. We show that these mechanisms result in consistently good performance across multiple RDF benchmarks, even when compared with current state-of-the-art stores. This work provides the basis for RDF support in DB2 v10.1.
International journal of engineering research and technology, 2015
In this paper we give an algorithm for querying RDF data using SQL without converting RDF instances. The algorithm translates an SQL query into an equivalent SPARQL query that is executed directly on the RDF data, allowing SQL users to query the RDF data efficiently and easily. The SQL queries are formulated against the converted relational database schema that the algorithm builds from the RDF schema. The algorithm considers not only simple SQL queries but also complex ones, such as those with UNION, INTERSECT, or EXCEPT expressions. Keywords: RDB, RDF, SQL, SPARQL, query translation
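A hypothetical sketch of the translation idea follows; the table, column, and prefix names are invented, and only the simple single-table SELECT case is shown, not the UNION/INTERSECT/EXCEPT handling described in the paper.

```python
# Hypothetical sketch: rewrite a single-table SQL SELECT into a SPARQL query
# over the class and properties that a converted schema maps that table to.

def sql_select_to_sparql(table, columns, filter_expr=None):
    """Translate SELECT <columns> FROM <table> [WHERE <filter>] into a SPARQL query."""
    vars_ = " ".join(f"?{c}" for c in columns)
    props = " ; ".join(f":{c} ?{c}" for c in columns)
    body = f"?row a :{table} ; {props} ."
    if filter_expr:
        body += f" FILTER({filter_expr})"
    return f"SELECT {vars_} WHERE {{ {body} }}"

# SELECT name, age FROM Person WHERE age > 30
print(sql_select_to_sparql("Person", ["name", "age"], "?age > 30"))
# SELECT ?name ?age WHERE { ?row a :Person ; :name ?name ; :age ?age . FILTER(?age > 30) }
```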
ACM SIGMOD Record, 2010
The Resource Description Framework (RDF) is a flexible model for representing information about resources in the web. With the increasing amount of RDF data which is becoming available, efficient and scalable management of RDF data has become a fundamental challenge to achieve the Semantic Web vision. The RDF model has attracted the attention of the database community and many researchers have proposed different solutions to store and query RDF data efficiently. This survey focuses on using relational query processors to store and query RDF data. We provide an overview of the different approaches and classify them according to their storage and query evaluation strategies.
Proceedings of the 12th International Conference on Web Information Systems and Technologies, 2016
In this paper, we extend SPARQL triple patterns to include two operators (negation and wild-card). We define the syntax and the semantics of these operators, in particular when they are used in the predicate position of SPARQL triple patterns. The use of these negation and wild-card operators, and thus their semantics, differs from the literature. We then show that these two operators can be used to enhance the evaluation performance of some SPARQL queries and to add extra expressiveness.
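The snippet below is an illustrative approximation of how a wild-card or negated predicate could be matched against triples; the exact semantics defined in the paper may differ.

```python
# Illustrative sketch only: evaluate a triple pattern whose predicate position
# may hold a wild-card '*' (any predicate) or a negation '!p' (any predicate
# except p), against an in-memory list of triples.

def match_pattern(triples, s_var, predicate, o_var):
    """Return bindings for triples satisfying the extended predicate."""
    out = []
    for s, p, o in triples:
        if predicate == "*":
            ok = True
        elif predicate.startswith("!"):
            ok = p != predicate[1:]
        else:
            ok = p == predicate
        if ok:
            out.append({s_var: s, "p": p, o_var: o})
    return out

data = [("alice", "knows", "bob"), ("alice", "worksFor", "acme")]
print(match_pattern(data, "x", "!knows", "y"))   # only the worksFor triple
print(match_pattern(data, "x", "*", "y"))        # both triples
```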
2013 IEEE International Conference on Big Data, 2013
The proliferation of data in RDF format calls for efficient and scalable solutions for their management. While scalability in the era of big data is a hard requirement, modern systems fail to adapt based on the complexity of the query. Current approaches do not scale well when faced with substantially complex, non-selective joins, resulting in exponential growth of execution times. In this work we present H2RDF+, an RDF store that efficiently performs distributed merge and sort-merge joins over a multiple-index scheme. H2RDF+ is highly scalable, utilizing distributed MapReduce processing and HBase indexes. Utilizing aggressive byte-level compression and result grouping over fast scans, it can process both complex and selective join queries in a highly efficient manner. Furthermore, it adaptively chooses between single- and multi-machine execution based on join complexity estimated through index statistics. Our extensive evaluation demonstrates that H2RDF+ answers non-selective joins an order of magnitude faster than both current state-of-the-art distributed and centralized stores, while being only tenths of a second slower in simple queries, scaling linearly with the amount of available resources.
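For intuition, here is a single-machine sketch of the core sort-merge step; H2RDF+ performs this kind of join distributed over sorted HBase index scans, which is not reproduced here.

```python
# Single-machine sketch of a sort-merge join over two binding lists.

def sort_merge_join(left, right, var):
    """Merge-join two binding lists on the join variable (sorting them first)."""
    left = sorted(left, key=lambda b: b[var])
    right = sorted(right, key=lambda b: b[var])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lv, rv = left[i][var], right[j][var]
        if lv < rv:
            i += 1
        elif lv > rv:
            j += 1
        else:
            # cross the current left binding with the block of equal keys on the right
            j2 = j
            while j2 < len(right) and right[j2][var] == lv:
                out.append({**left[i], **right[j2]})
                j2 += 1
            i += 1
    return out

a = [{"x": 1, "y": "a"}, {"x": 2, "y": "b"}]
b = [{"x": 2, "z": "c"}, {"x": 2, "z": "d"}, {"x": 3, "z": "e"}]
print(sort_merge_join(a, b, "x"))   # two results, both with x == 2
```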
Lecture Notes in Computer Science, 2010
In SPARQL, conjunctive queries are expressed by using shared variables across sets of triple patterns, also called basic graph patterns. Based on this characterization, the basic graph pattern of a SPARQL query can be partitioned into groups of acyclic patterns that share exactly one variable, or star-shaped groups. We observe that the number of triples in a group is proportional to the number of individuals that play the role of the subject or the object; however, depending on the degree of participation of the subject individuals in the properties, a group may be not much larger than a "class" or type to which the subject or object belongs. Thus, it may be significantly more efficient to independently evaluate each of the groups and then merge the resulting sets than to linearly join all triples in a basic graph pattern. Based on this observation, we have developed query optimization and evaluation techniques for star-shaped groups. We have conducted an empirical analysis of the benefits of the optimization and evaluation techniques in several SPARQL query engines. We observe that our proposed techniques are able to speed up query evaluation time for join queries with star-shaped patterns by at least one order of magnitude. (Footnote: we quote "class" to indicate that we are not talking about classes in the sense of OWL or RDFS, but rather groups of individuals characterized by common properties.)
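A small sketch of the grouping step, assuming triple patterns are plain tuples with '?'-prefixed variables: patterns are partitioned into star-shaped groups by their shared subject variable, so each group can be evaluated independently and the results merged.

```python
# Sketch of star-shaped grouping: partition a basic graph pattern into groups
# of triple patterns sharing the same subject variable.

from collections import defaultdict

def star_groups(bgp):
    """bgp: list of (subject, predicate, object) triple patterns, variables prefixed '?'."""
    groups = defaultdict(list)
    for s, p, o in bgp:
        groups[s].append((s, p, o))
    return dict(groups)

bgp = [("?x", "advisor", "?y"), ("?x", "takesCourse", "?c"),
       ("?y", "worksFor", "?d"), ("?y", "name", "?n")]
for subject, patterns in star_groups(bgp).items():
    print(subject, "->", patterns)
# ?x -> its two patterns, ?y -> its two patterns
```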
… Workshop on Usage …, 2011
2014
Efficient management of RDF data plays an important role in understanding data and querying it quickly. Although current approaches to indexing RDF triples, such as property tables and vertical partitioning, have solved many issues, they still suffer performance problems with complex self-join queries and with inserting data into the same table. As an improvement, in this paper we propose an alternative solution that adds flexibility and efficiency to such queries and aims to reduce the number of self-joins as much as possible. The solution is based on the idea of "Recursive Mapping of Twin Tables" (RMTT): the main RDF triple table is divided into two tables that have the same structure, and the RDF data is inserted into them recursively. Our experiments compare the performance of join queries under the vertically partitioned approach and under RMTT using very large RDF data, such as the DBLP and DBpedia datasets. The results for a number of complex queries show that our approach is highly scalable compared with RDF-3X and that RMTT reduces the number of self-joins, especially in complex queries, by a factor of 3-4 compared with RDF-3X.
International Journal of Data Warehousing and Mining
The goal of query optimization in query federation over linked data is to minimize the response time and the completion time. Communication time has the highest impact on both. Static query optimization can end up with inefficient execution plans due to unpredictable data arrival rates and missing statistics. This study is an extension of the adaptive join operator, which always begins with a symmetric hash join to minimize the response time and can change the join method to bind join to minimize the completion time. The authors extend the adaptive join operator with bind-bloom join to further reduce the communication time and, consequently, to minimize the completion time. They compare the new operator with symmetric hash join, bind join, bind-bloom join, and the adaptive join operator with respect to the response time and the completion time. Performance evaluation shows that the extended operator provides optimal response time and further reduces the completion time. Moreover, it has the ...
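The following is a simplified, single-process sketch of the bind-bloom idea, with no real endpoints or network: a Bloom filter summarizing the left-hand bindings stands in for the filter that would be shipped to the remote source in place of the bindings themselves.

```python
# Simplified sketch of a bind-bloom join: a Bloom filter over local bindings is
# "sent" to the remote side, which returns only triples whose join value may match.

import hashlib

class BloomFilter:
    def __init__(self, size=1024, hashes=3):
        self.size, self.hashes, self.bits = size, hashes, bytearray(size)

    def _positions(self, value):
        for i in range(self.hashes):
            h = hashlib.sha256(f"{i}:{value}".encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos] = 1

    def may_contain(self, value):
        return all(self.bits[pos] for pos in self._positions(value))

local_bindings = ["http://ex.org/p1", "http://ex.org/p2"]
bloom = BloomFilter()
for v in local_bindings:
    bloom.add(v)

remote_triples = [("http://ex.org/p1", "name", "Ada"), ("http://ex.org/p9", "name", "Bob")]
filtered = [t for t in remote_triples if bloom.may_contain(t[0])]
print(filtered)   # only p1 survives (false positives possible, false negatives not)
```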
Lecture Notes in Computer Science, 2015
This paper addresses the problem of failing RDF queries. Query relaxation is one of the cooperative techniques that provide users with alternative answers instead of an empty result. While previous works on query relaxation over RDF data have focused on defining new relaxation operators, in this paper we investigate techniques to find the parts of an RDF query that are responsible for its failure. Finding such subqueries, named Minimal Failing Subqueries (MFSs), is of great interest for efficiently performing the relaxation process. We propose two algorithmic approaches for computing MFSs. The first approach (LBA) intelligently leverages the subquery lattice of the initial RDF query, while the second approach (MBA) is based on a particular matrix that improves the performance of LBA. Our approaches also compute a particular kind of relaxed RDF queries, called Maximal Succeeding Subqueries (XSSs). XSSs are subqueries with a maximal number of triple patterns of the initial query. To validate our approaches, a set of thorough experiments is conducted on the LUBM benchmark and a comparative study with other approaches is performed.
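As a toy illustration (not the paper's LBA or MBA algorithms), the brute-force lattice walk below returns the minimal failing subqueries of a query, given a black-box predicate that says whether a subquery fails.

```python
# Toy illustration: bottom-up walk of the subquery lattice to find Minimal
# Failing Subqueries (MFSs), given a black-box 'fails' predicate.

from itertools import combinations

def minimal_failing_subqueries(triple_patterns, fails):
    mfs = []
    for size in range(1, len(triple_patterns) + 1):
        for sub in combinations(triple_patterns, size):
            sub = frozenset(sub)
            if fails(sub) and not any(m <= sub for m in mfs):
                mfs.append(sub)  # fails, and contains no smaller failing subquery
    return mfs

# Hypothetical query with 3 triple patterns; t3 alone is unsatisfiable,
# and t1 and t2 conflict only when combined.
def fails(sub):
    return "t3" in sub or {"t1", "t2"} <= sub

print(minimal_failing_subqueries(["t1", "t2", "t3"], fails))
# two MFSs: {t3} and {t1, t2}
```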
Proceedings of the Workshop on Advancing …, 2008
In recent years, the query language SPARQL has evolved into the widely accepted standard for querying RDF. Since many Semantic Web applications make use of data whose storage and management is distributed, distributed SPARQL query processing becomes necessary. In the relational and object-oriented database communities, the efficiency gains of cost-based, adaptive optimizers for distributed querying are well established, yet such optimizers are not available for SPARQL. In this paper we therefore describe a cost model that is meant to act as a subcomponent of a query optimizer for distributed SPARQL query processing and to serve as a cost indicator for other subcomponents of the optimizer, e.g., query decomposition, query rewriting, and the choice of join algorithms and their order. The cost model is tailored for a heterogeneous grid of SPARQL processors and represents query plans as SPARQL Query Graph Models (SQGM). Costs are assigned in a System-R-like fashion relying on recursive cost and cardinality functions. The evaluation complexities of basic operations in SPARQL queries are therefore derived from the complexities of best-practice algorithms for the algebraically equivalent basic operations in relational query languages.
International Journal of Engineering & Technology
Background/Objectives: Mapping RDB to RDF has become important for populating Linked Data more efficiently. This paper shows how to implement a SPARQL endpoint in an RDB using a conceptual-level mapping approach. Methods/Statistical analysis: Many diverse approaches and related languages for mapping RDB to RDF have been proposed. The prominent achievements in mapping RDB to RDF are two standard drafts, Direct Mapping and R2RML, proposed by the W3C RDB2RDF Working Group. This paper analyzes these conventional mapping approaches and proposes a new approach based on schema mapping. The paper also presents SPARQL query processing in RDB. Findings: There are distinct differences between instance-level mapping and conceptual-level mapping for RDB2RDF. Data redundancy of instance-level mapping causes many inevitable problems during the mapping procedure. Conceptual-level mapping can provide a straightforward and efficient way. The ER model in RDB and the RDF model in Linked Data have obvious similarity. The ...
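A rough sketch in the spirit of the W3C Direct Mapping follows; the URI scheme is simplified and the table and column names are invented.

```python
# Rough sketch of direct-mapping a relational row to RDF triples: one subject URI
# per row, one rdf:type triple, and one triple per non-key column.

def direct_map_row(base, table, pk_col, row):
    subject = f"<{base}/{table}/{pk_col}={row[pk_col]}>"
    triples = [(subject, "rdf:type", f"<{base}/{table}>")]
    for col, value in row.items():
        if col != pk_col:
            triples.append((subject, f"<{base}/{table}#{col}>", f'"{value}"'))
    return triples

row = {"id": 7, "name": "Ada", "dept": "CS"}
for t in direct_map_row("http://example.org", "Employee", "id", row):
    print(*t)
# <http://example.org/Employee/id=7> rdf:type <http://example.org/Employee>
# ... plus one triple each for name and dept
```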
Relational Database to RDF (RDB2RDF) systems execute SPARQL queries on relational data. Past studies have shown that RDB2RDF systems do not perform well; in other words, the execution time of a SPARQL query on an RDB2RDF system is much slower than that of its semantically equivalent SQL query. Therefore, we ask ourselves: what optimizations are needed to support effective SPARQL execution on relationally stored data? We experimented with Microsoft SQL Server, using the Barton and Berlin SPARQL benchmarks, and Ultrawrap, an automatic RDB2RDF wrapping system that has been architected to leverage the SQL optimizer. Our initial results identify two important optimizations for effective SPARQL execution using Ultrawrap: detection of unsatisfiable conditions and self-join elimination.
Computing Research Repository, 2008
We study fundamental aspects related to the efficient processing of the SPARQL query language for RDF, proposed by the W3C to encode machine-readable information in the Semantic Web. Our key contributions are (i) a complete complexity analysis for all operator fragments of the SPARQL query language, which, as a central result, shows that the SPARQL operator OPTIONAL alone is responsible for the PSPACE-completeness of the evaluation problem, (ii) a study of equivalences over SPARQL algebra, including both rewriting rules like filter and projection pushing that are well-known from relational algebra optimization as well as SPARQL-specific rewriting schemes, and (iii) an approach to the semantic optimization of SPARQL queries, built on top of the classical chase algorithm. While studied in the context of a theoretically motivated set semantics, almost all results carry over to the official, bag-based semantics and are therefore of immediate practical relevance.
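As a small illustration of one rewriting rule from the family the paper studies, the sketch below pushes a filter below a join on an ad-hoc algebra representation; variable-scoping details are simplified.

```python
# Sketch of filter pushing: FILTER(var, JOIN(l, r)) becomes JOIN(FILTER(var, l), r)
# when var occurs only on the left (and dually for the right).

def vars_of(expr):
    if expr[0] == "BGP":
        return set(expr[1])
    return vars_of(expr[1]) | vars_of(expr[2]) if expr[0] == "JOIN" else vars_of(expr[2])

def push_filter(expr):
    op = expr[0]
    if op == "FILTER":
        _, var, child = expr
        if child[0] == "JOIN":
            _, left, right = child
            if var in vars_of(left) and var not in vars_of(right):
                return ("JOIN", ("FILTER", var, left), right)
            if var in vars_of(right) and var not in vars_of(left):
                return ("JOIN", left, ("FILTER", var, right))
    return expr

q = ("FILTER", "?age", ("JOIN", ("BGP", ["?p", "?age"]), ("BGP", ["?p", "?mbox"])))
print(push_filter(q))
# ('JOIN', ('FILTER', '?age', ('BGP', ['?p', '?age'])), ('BGP', ['?p', '?mbox']))
```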