The goal of query optimization in query federation over linked data is to minimize the response time and the completion time. Communication time has the highest impact on them both. Static query optimization can end up with inefficient... more
Traditional static query optimization is not adequate for query federation over linked data endpoints due to unpredictable data arrival rates and missing statistics. In this paper, we propose an adaptive join operator for federated query... more
A Chip Multi-Processor (CMP) allows multiple threads to execute simultaneously. Because threads share various CMP resources, such as the L2 cache, a CMP system is inherently different from a multiprocessor system, and CMP is also different from... more
Deng YD, Jing N, Xiong W. Hash join query optimization based on shared-cache chip multi-processor.
Distributed database systems store and manipulate data on multiple machines. In these systems, the processing cost of query operations is mainly impacted by the data access latency between machines over the network. With recent technology... more
We investigate the effect that caches, particularly caches for remote accesses, have on the performance of hash join algorithms. The join is a computationally intensive operation of relational databases and is used in many important... more
In-memory join is an essential operator in any database engine. It has been extensively investigated in the database literature. In this paper, we study whether exploiting the CDF-based learned models to boost the join performance is... more
This paper introduces a method for performing Hash Merge and Sort Merge Join with improved performance. Hash merge join is a non-blocking join algorithm that deals with data items from remote sources via unpredictable, slow, and bursty... more
The performance of parallel distributed data management systems becomes increasingly important with the rise of Big Data. Parallel joins have been widely studied both in the parallel processing and the database communities. Nevertheless,... more
Enhancing the performance of large database systems depends heavily on the cost of performing join operations. When two very large tables are joined, optimizing such an operation is considered an interesting research topic for many... more
Adaptive join algorithms have recently attracted a lot of attention in emerging applications where data is provided by autonomous data sources through heterogeneous network environments. Their main advantage over traditional join... more
Operating a MariaDB database requires an application in the form of a localhost server whose response time for running a query is efficient. This study measures query performance in... more
Constraint Processing and Database techniques overlap significantly. We discuss here the application of a constraint satisfaction technique, called dynamic bundling, to databases. We model the join query computation as a Constraint... more
Large relational databases often rely on fast join implementations for good performance. Recent paradigm shifts in processor architectures have reinvigorated research into how the join operation can be implemented. The FPGA community has... more
When numerical and machine learning (ML) computations are expressed relationally, classical query execution strategies (hash-based joins and aggregations) can do a poor job distributing the computation. In this paper, we propose a... more
A number of popular systems, most notably Google's TensorFlow, have been implemented from the ground up to support machine learning tasks. We consider how to make a very small set of changes to a modern relational database management... more
Query optimization is a solution to the problem of complex queries that we write to produce data under certain conditions. Query optimization provides a problem-solving model by combining techniques that include... more
The increasing amount of RDF data on the Web drives the need for its efficient and effective management. In this light, numerous researchers have proposed using RDBMSs to store and query RDF annotations using the SQL and SPARQL query... more
The Super Database Computer (SDC) is a high-performance relational database server for a join-intensive environment under development at the University of Tokyo. SDC is designed to execute a join in a highly parallel way. Compared to other... more
We examine how to apply the hash-join paradigm to spatial joins, and define a new framework for spatial hash-joins. Our spatial partition functions have two components: a set of bucket extents and an assignment function, which may map a... more
Compression has historically been used to reduce the cost of storage, I/Os from that storage, and buffer pool utilization, at the expense of the CPU required to decompress data every time it is queried. However, significant additional CPU... more
In recent investigations into reducing the complexity of the relational join operation, several hash-based partitioned-join strategies have been introduced. All of these strategies depend upon the costly operation of data space partitioning... more
We explore join optimizations in the presence of both time-based constraints (sliding windows) and value-based constraints (punctuations). We present the first join solution, named PWJoin, that exploits such combined constraints to shrink... more
Hash joins combine massive relations in data warehouses, decision support systems, and scientific data stores. Faster hash join performance significantly improves query throughput, response time, and overall system performance. In this... more
The largest queries in data warehouses and decision support systems use hybrid hash join to relate information in multiple tables. Hybrid hash join functions independently of the data distributions of the join relations. Real-world data... more
Time of creation is one of the predominant (often implicit) clustering strategies found not only in Data Warehouse systems: line items are created together with their corresponding order, objects are created together with their subparts... more
The join of two relations is an important operation in database systems. It occurs frequently in relational queries, and join performance is a significant factor in overall system performance. Cost models for join algorithms are used by... more
This paper describes PBSM (Partition Based Spatial-Merge), a new algorithm for performing the spatial join operation. This algorithm is especially effective when neither of the inputs to the join has an index on the joining attribute. Such a... more
This paper addresses the distributed stream processing of window-based multi-way join queries considering the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites need to be shipped to... more
The widening performance gap between CPU and disk is significant for hash join performance. Most current hash join methods try to reduce the volume of data transferred between memory and disk. In this paper, we try to reduce hash-join... more
The pipelined execution of multijoin queries in a multiprocessor-based database system is explored in this paper. Using hash-based joins, multiple joins can be pipelined so that the early results from a join, before the whole join is... more
The join operation is one of the most expensive and critical issues in nested relational query processing. Many natural queries cannot be expressed by extended join operators proposed for the nested relational model so far without... more
We propose algorithms for performing multiway joins using a new type of coarse-grain reconfigurable hardware accelerator, "Plasticine", that, compared with other accelerators, emphasizes high compute capability and high... more
In many applications data values are inherently uncertain. This includes moving-objects, sensors and biological databases. There has been recent interest in the development of database management systems that can handle uncertain data.... more
As the enormous growth of information challenges the existing string analysis techniques for processing huge volumes of data, there always seems to be hope for newer inventions. Moreover, the problems encountered with the traditional... more
The efficient processing of multidimensional similarity joins is important for a large class of applications. The dimensionality of the data for these applications ranges from low to high. Most existing methods have focused on the... more
This work proposes V-SMART-Join, a scalable MapReduce-based framework for discovering all pairs of similar entities. The V-SMART-Join framework is applicable to sets, multisets, and vectors. V-SMART-Join is motivated by the observed skew... more
Big Data is a collection of large and complex data. It consists of structured, semi-structured, and unstructured types of data. Data is generated from various sources and from different fields. In today's era... more
We consider the problem of efficiently finding the top-k answers for join queries over web-accessible databases. Classical algorithms for finding top-k answers use branch-and-bound techniques to avoid computing scores of all candidates in... more