Hash Join Research Papers

Extended Adaptive Join Operator with Bind-Bloom Join for Federated SPARQL Queries

2024, International Journal of Data Warehousing and Mining

The goal of query optimization in query federation over linked data is to minimize the response time and the completion time. Communication time has the highest impact on them both. Static query optimization can end up with inefficient... more

Adaptive Join Operator for Federated Queries over Linked Data Endpoints

by Damla Oguz

2024, Lecture Notes in Computer Science

Traditional static query optimization is not adequate for query federation over linked data endpoints due to unpredictable data arrival rates and missing statistics. In this paper, we propose an adaptive join operator for federated query... more

Hash Join Optimization Based on Shared Cache Chip Multi-processor

by Yuefan Deng

2024, Lecture Notes in Computer Science

Chip Multi-Processor (CMP) allows multiple threads to execute simultaneously. Because threads share various resources of the CMP, such as the L2 cache, a CMP system is inherently different from a multiprocessor system, and CMP is also different from... more

Hash Join Query Optimization Based on Shared-Cache Chip Multi-Processor

by Yuefan Deng

2024, Journal of Software

Deng YD, Jing N, Xiong W. Hash join query optimization based on shared-cache chip multi-processor.

Evaluation of Hash Join Operations Performance Executing on SDN Switches: A Cost Model Approach

by Luiz Albini

2024, Journal of Information and Data Management

Distributed database systems store and manipulate data on multiple machines. In these systems, the processing cost of query operations is mainly impacted by the data access latency between machines over the network. With recent technology... more

Hash join algorithms on smps clusters: Effects of netcaches on its scalability and performance

by Edward David Moreno

2024

We investigate the effect that caches, particularly caches for remote accesses, have on the performance of hash join algorithms. The join is a computationally intensive operation of relational databases and is used in many important... more

The Case for Learned In-Memory Joins

by Ibrahim Sabek

2024, Proceedings of the VLDB Endowment

In-memory join is an essential operator in any database engine. It has been extensively investigated in the database literature. In this paper, we study whether exploiting the CDF-based learned models to boost the join performance is... more

Optimizing Query Performance using Hash and Sort Merge Join

by Jignesh Vania

2024, International Journal for Scientific Research and Development

This paper introduces a method for producing Hash Merge and Sort Merge Join with improved performance. Hash merge join is a non-blocking join algorithm that deals with data items from remote sources via unpredictable, slow, and bursty... more

QbDJ: A Novel Framework for Handling Skew in Parallel Join Processing on Distributed Memory

by Georgios Theodoropoulos

2024, 2013 IEEE 10th International Conference on High Performance Computing and Communications & 2013 IEEE International Conference on Embedded and Ubiquitous Computing

The performance of parallel distributed data management systems becomes increasingly important with the rise of Big Data. Parallel joins have been widely studied both in the parallel processing and the database communities. Nevertheless,... more

Hash Join Optimization Based on Shared Cache Chip Multi-processor

by Yuefan Deng

2024, Springer eBooks

Chip Multi-Processor (CMP) allows multiple threads to execute simultaneously. Because threads share various resources of the CMP, such as the L2 cache, a CMP system is inherently different from a multiprocessor system, and CMP is also different from... more

A Join Algorithm for Large Databases: A Quadtrees Structure Approach

by Hatim Aboalsamh

2024, Journal of King Saud University - Computer and Information Sciences

Enhancing the performance of large database systems depends heavily on the cost of performing join operations. When two very large tables are joined, optimizing such operation is considered one of the interesting research topics to many... more

Double Index NEsted-Loop Reactive Join for Result Rate Optimization

by M. Bornea

2024, 2009 IEEE 25th International Conference on Data Engineering

Adaptive join algorithms have recently attracted a lot of attention in emerging applications where data is provided by autonomous data sources through heterogeneous network environments. Their main advantage over traditional join... more

Adaptive Join Operators for Result Rate Optimization on Streaming Inputs

by M. Bornea

2024, IEEE Transactions on Knowledge and Data Engineering

Adaptive join algorithms have recently attracted a lot of attention in emerging applications where data is provided by autonomous data sources through heterogeneous network environments. Their main advantage over traditional join... more

Tuning Database Pada Sistem Penerimaan Mahasiswa Baru Menggunakan Optimasi Query dan Indexing [Database Tuning in a New Student Admission System Using Query Optimization and Indexing]

by Ipal Akbar

2024, Techno.COM Jurnal

Operating a MariaDB database requires a localhost server application whose response time for running a query is efficient. This study measures query performance in... more

Constraint Processing Techniques for Improving Join Computation: A Proof of Concept

by Berthe Y Choueiry

2024, Constraint Databases

Constraint Processing and Database techniques overlap significantly. We discuss here the application of a constraint satisfaction technique, called dynamic bundling, to databases. We model the join query computation as a Constraint... more

FPGA-based Multithreading for In-Memory Hash Joins

by Walid Najjar

2023, Conference on Innovative Data Systems Research

Large relational databases often rely on fast join implementations for good performance. Recent paradigm shifts in processor architectures have reinvigorated research into how the join operation can be implemented. The FPGA community has... more

Distributed numerical and machine learning computations via two-phase execution of aggregated join trees

by Dimitrije Jankov

2023, Proceedings of the VLDB Endowment

When numerical and machine learning (ML) computations are expressed relationally, classical query execution strategies (hash-based joins and aggregations) can do a poor job distributing the computation. In this paper, we propose a... more

Declarative recursive computation on an RDBMS: or, why you should use a database for distributed machine learning

by Dimitrije Jankov

2023, arXiv (Cornell University)

A number of popular systems, most notably Google's TensorFlow, have been implemented from the ground up to support machine learning tasks. We consider how to make a very small set of changes to a modern relational database management... more

Reducing cache misses in hash join probing phase by pre-sorting strategy (abstract only)

by Gihwan Oh

2023, Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data

Optimasi Query Hash Join Dan Inner Join Pada Sistem Pencarian Data Tracer Study [Hash Join and Inner Join Query Optimization in a Tracer Study Data Search System]

by Ikhwan Baidlowi Sumafta

2023, Journal of Smart System

Query optimization is a solution to the problem of complex queries written to produce data under certain conditions. Query optimization provides a problem-solving model by combining techniques that include... more

Relational Nested Optional Join for Efficient Semantic Web Query Processing

by Mustafa Enis Atay

2023, Lecture Notes in Computer Science

The increasing amount of RDF data on the Web drives the need for its efficient and effective management. In this light, numerous researchers have proposed to use RDBMSs to store and query RDF annotations using the SQL and SPARQL query... more

Bucket Spreading Parallel Hash: a New, Robust, Parallel Hash Join Method for Data Skew In the Super Database Computer (SDC)

by Yasushi Ogawa

2023, … of the sixteenth international conference on …

The Super Database Computer (SDC) is a high-performance relational database server for a join-intensive environment under development at the University of Tokyo. SDC is designed to execute a join in a highly parallel way. Compared to other... more

Hash join algorithms on smps clusters: Effects of netcaches on its scalability and performance

by Edward David

2023

We investigate the effect that caches, particularly caches for remote accesses, have on the performance of hash join algorithms. The join is a computationally intensive operation of relational databases and is used in many important... more

Spatial hash-joins

by Chinya Ravishankar

2023, ACM SIGMOD Record

We examine how to apply the hash-join paradigm to spatial joins, and define a new framework for spatial hash-joins. Our spatial partition functions have two components: a set of bucket extents and an assignment function, which may map a... more

Joins on encoded and partitioned data

by Lin Qiao

2023, Proceedings of the VLDB Endowment

Compression has historically been used to reduce the cost of storage, I/Os from that storage, and buffer pool utilization, at the expense of the CPU required to decompress data every time it is queried. However, significant additional CPU... more

Join strategies using data space partitioning

by Cem Bozsahin

2023, New Generation Computing

In recent investigations into reducing the complexity of the relational join operation, several hash-based partitioned-join strategies have been introduced. All of these strategies depend upon the costly operation of data space partitioning... more

Evaluating window joins over punctuated streams

by Luping Ding

2023, Proceedings of the thirteenth ACM international conference on Information and knowledge management

We explore join optimizations in the presence of both time-based constraints (sliding windows) and value-based constraints (punctuations). We present the first join solution named PWJoin that exploits such combined constraints to shrink... more

Exploiting join cardinality for faster hash joins

by Ramon Lawrence

2023, Proceedings of the 2009 ACM symposium on Applied Computing

Hash joins combine massive relations in data warehouses, decision support systems, and scientific data stores. Faster hash join performance significantly improves query throughput, response time, and overall system performance. In this... more

Improving join performance for skewed databases

by Ramon Lawrence

2023, 2008 Canadian Conference on Electrical and Computer Engineering

The largest queries in data warehouses and decision support systems use hybrid hash join to relate information in multiple tables. Hybrid hash join functions independently of the data distributions of the join relations. Real-world data... more

Diag-Join: An opportunistic join algorithm for 1: N relationships

by Guido Moerkotte

2023, Proceedings of the International …

Time of creation is one of the predominant (often implicit) clustering strategies found not only in Data Warehouse systems: line items are created together with their corresponding order, objects are created together with their subparts... more

MJoin

by Luping Ding

2023, Proceedings of the 2nd international workshop on Distributed event-based systems

Join algorithms must be redesigned when processing stream data instead of persistently stored data. Data streams are potentially infinite and the query result is expected to be generated incrementally instead of once only. Data arrival... more

Accurate modeling of the hybrid hash join algorithm

by Jignesh Patel

2023, Proceedings of the 1994 ACM SIGMETRICS conference on Measurement and modeling of computer systems

The join of two relations is an important operation in database systems. It occurs frequently in relational queries, and join performance is a significant factor in overall system performance. Cost models for join algorithms are used by... more

Partition based spatial-merge join

by Jignesh Patel

2023, Proceedings of the 1996 ACM SIGMOD international conference on Management of data - SIGMOD '96

This paper describes PBSM (Partition Based Spatial-Merge), a new algorithm for performing the spatial join operation. This algorithm is especially effective when neither of the inputs to the join has an index on the joining attribute. Such a situation could arise if both inputs to the join are intermediate results in a complex query, or in a parallel environment where the inputs must be dynamically redistributed. The PBSM algorithm partitions the inputs into manageable chunks, and joins them using a computational geometry based plane-sweeping technique. This paper also presents a performance study comparing the traditional indexed nested loops join algorithm, a spatial join algorithm based on joining spatial indices, and the PBSM algorithm. These comparisons are based on complete implementations of these algorithms in Paradise, a database system for handling GIS applications. Using real data sets, the performance study examines the behavior of these spatial join algorithms in a variety of situations, including the cases when both, one, or none of the inputs to the join have a suitable index. The study also examines the effect of clustering the join inputs on the performance of these join algorithms. The performance comparisons demonstrate the feasibility and applicability of the PBSM join algorithm.

1 Introduction

With the increasing popularity of automated processes in fields like Earth Sciences, Cartography, Remote Sensing, Land Information Systems etc., and the rapid increase in the availability of data from a wide variety of sources like satellite images, mapping agencies, simulation outputs etc., the last decade has witnessed an increase in the demands for systems that can store, manage, and manipulate spatial data. Increasingly, a database system has been employed to meet these requirements. Examples of commercial database systems that have been used for these applications are ARC/INFO [Arc95], Intergraph's MGE [Cor95], and Illustra [Ube94].

Data stored in these spatial database systems includes simple geometric types like points, lines, polygons, and surfaces,
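
The two steps named in the PBSM abstract above (partition the inputs into manageable chunks, then join each chunk with a plane sweep) can be illustrated with a small in-memory sketch. This is not the paper's implementation: the `Rect` type, the uniform grid with a `tile_size` parameter, the function names, and the use of a Python set to drop duplicate pairs are all assumptions made for illustration, and the sketch ignores the disk-resident partitions that the original algorithm targets.

```python
from collections import defaultdict
from typing import NamedTuple


class Rect(NamedTuple):
    """Axis-aligned bounding rectangle with an object id (hypothetical type)."""
    oid: int
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def overlaps(a, b):
    """True if the two rectangles intersect (touching edges count)."""
    return (a.xmin <= b.xmax and b.xmin <= a.xmax and
            a.ymin <= b.ymax and b.ymin <= a.ymax)


def partition(rects, tile_size):
    """Assign each rectangle to every grid tile it touches."""
    tiles = defaultdict(list)
    for r in rects:
        for tx in range(int(r.xmin // tile_size), int(r.xmax // tile_size) + 1):
            for ty in range(int(r.ymin // tile_size), int(r.ymax // tile_size) + 1):
                tiles[(tx, ty)].append(r)
    return tiles


def plane_sweep(rs, ss):
    """Yield overlapping (r, s) pairs by sweeping both lists along the x axis."""
    rs = sorted(rs, key=lambda r: r.xmin)
    ss = sorted(ss, key=lambda s: s.xmin)
    i = j = 0
    while i < len(rs) and j < len(ss):
        if rs[i].xmin <= ss[j].xmin:
            # rs[i] starts first: scan the S rectangles that start before it ends.
            r, k = rs[i], j
            while k < len(ss) and ss[k].xmin <= r.xmax:
                if overlaps(r, ss[k]):
                    yield r, ss[k]
                k += 1
            i += 1
        else:
            # ss[j] starts first: scan the R rectangles that start before it ends.
            s, k = ss[j], i
            while k < len(rs) and rs[k].xmin <= s.xmax:
                if overlaps(rs[k], s):
                    yield rs[k], s
                k += 1
            j += 1


def pbsm_join(r_input, s_input, tile_size=10.0):
    """Return the set of (r.oid, s.oid) pairs whose rectangles overlap."""
    r_tiles = partition(r_input, tile_size)
    s_tiles = partition(s_input, tile_size)
    result = set()  # a rectangle spanning several tiles can produce repeats
    for tile, rs in r_tiles.items():
        ss = s_tiles.get(tile)
        if ss:
            for r, s in plane_sweep(rs, ss):
                result.add((r.oid, s.oid))
    return result
```

Calling `pbsm_join(R, S)` on two lists of `Rect` values returns the id pairs of overlapping rectangles. Because a rectangle that crosses a tile boundary is copied into each tile it touches, the same pair can be found more than once, which is why the sketch collects results in a set.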

Distributed stream join query processing with semijoins

by Tri Tran

2023, Distributed and Parallel Databases

This paper addresses the distributed stream processing of window-based multi-way join queries considering the semijoin as a key join operator. In distributed stream processing, data streams arriving at remote sites need to be shipped to... more

Towards Eliminating Random I/O in Hash Joins

by ML Lo

2023

The widening performance gap between CPU and disk is significant for hash join performance. Most current hash join methods try to reduce the volume of data transferred between memory and disk. In this paper, we try to reduce hash-join... more

Towards eliminating random I/O in hash joins

by ML Lo

2023, Proceedings of the Twelfth International Conference on Data Engineering

The widening performance gap between CPU and disk is significant for hash join performance. Most current hash join methods try to reduce the volume of data transferred between memory and disk. In this paper, we try to reduce hash-join... more

Spatial hash-joins

by ML Lo

2023, ACM SIGMOD Record

We examine how to apply the hash-join paradigm to spatial joins, and define a new framework for spatial hash-joins. Our spatial partition functions have two components: a set of bucket extents and an assignment function, which may map a... more

Applying segmented right-deep trees to pipelining multiple hash joins

by ML Lo

2023, IEEE Transactions on Knowledge and Data Engineering

The pipelined execution of multijoin queries in a multiprocessor-based database system is explored in this paper. Using hash-based joins, multiple joins can be pipelined so that the early results from a join, before the whole join is... more

An efficient join for nested relational databases

by Hong-Cheu Liu

2023, Lecture Notes in Computer Science

The join operation is one of the most expensive and critical issues in nested relational query processing. Many natural queries cannot be expressed by extended join operators proposed for the nested relational model so far without... more

Efficient Multiway Hash Join on Reconfigurable Hardware

by Rekha Singhal

2023

We propose algorithms for performing multiway joins using a new type of coarse-grained reconfigurable hardware accelerator, "Plasticine", that, compared with other accelerators, emphasizes high compute capability and high... more

Efficient join processing over uncertain data

by Rahul Shah

2023, Proceedings of the 15th ACM international conference on Information and knowledge management - CIKM '06

In many applications data values are inherently uncertain. This includes moving-objects, sensors and biological databases. There has been recent interest in the development of database management systems that can handle uncertain data.... more

A Novel SSPS Framework for String Similarity Join

by Florence Tushabe

2023, International Journal of Computer Applications

As the enormous growth of information challenges existing string analysis techniques for processing huge volumes of data, there always seems to be hope for newer inventions. Moreover, the problems encountered with the traditional... more

by Sunil Prabhakar

2022, Information Systems

The efficient processing of multidimensional similarity joins is important for a large class of applications. The dimensionality of the data for these applications ranges from low to high. Most existing methods have focused on the... more

Joins on encoded and partitioned data

by Oliver Draese

2022, Proceedings of the VLDB Endowment

Compression has historically been used to reduce the cost of storage, I/Os from that storage, and buffer pool utilization, at the expense of the CPU required to decompress data every time it is queried. However, significant additional CPU... more

V-SMART-join

by Ahmed Metwally

2022, Proceedings of the VLDB Endowment

This work proposes V-SMART-Join, a scalable MapReduce-based framework for discovering all pairs of similar entities. The V-SMART-Join framework is applicable to sets, multisets, and vectors. V-SMART-Join is motivated by the observed skew... more

Review on Effective Data Mining Technique use with Structured and Unstructured data of Big Data

by DNYANDEO KHEMNAR

2022, International Journal of Advance Research and Innovative Ideas in Education

With the invention of Big Data. Big Data is a collection of large and complex data. It consists of structured, semi-structured, and unstructured types of data. Data is generated from various sources and from different fields. In today's era... more

Processing top-k join queries

by Laure Berti-Equille

2022, Proceedings of the VLDB Endowment

We consider the problem of efficiently finding the top-k answers for join queries over web-accessible databases. Classical algorithms for finding top-k answers use branch-and-bound techniques to avoid computing scores of all candidates in... more
