2019
Hardware-based acceleration technologies provide significant potential to enhance database processing speeds using various devices like GPUs, FPGAs, and TPUs. An exploration of optimizing database operations such as query planning and execution on these accelerators led to the proposal of a public repository for implementations, benchmarking differences, and outlining future research directions. Challenges in achieving efficient implementations across evolving hardware and the need for standardized benchmarking were key discussion points.
2013
General purpose computing platforms have generally been favored over customized computational setups, due to the simplified usability and significant reduction of development time. These general purpose machines make use of the von Neumann architectural model, which suffers from the sequential aspect of computing and heavy reliance on memory offloading. This dissertation proposes the use of hardware accelerators such as Field Programmable Gate Arrays (FPGAs) and Graphics Processing Units (GPUs) as a substitute or co-processor to general purpose CPUs, with a focus on database applications. Here, large amounts of data are queried in a time-critical manner. This dissertation shows that using hardware platforms allows processing data in a streaming (single pass) and massively parallel manner, hence speeding up computation by several orders of magnitude when compared to general purpose CPUs. The complexity of programming these parallel platforms is abstracted from the developers, as hardware constructs are automatically generated from high-level application languages and/or specifications. This dissertation explores the hardware acceleration of XML path and twig filtering, using novel dynamic programming algorithms. Publish-subscribe systems represent the state of the art in information dissemination to multiple users. Current XML-based publish-subscribe systems provide users with considerable flexibility, allowing the formulation of complex queries on the content as well as the (tree) structure of the streaming messages. Messages that contain one or more matches for a given user profile (query) are forwarded to the user. This dissertation further studies FPGA-based architectures for processing expressive motion patterns on continuous spatio-temporal streams. Complex motion patterns are described as substantially flexible variable-enhanced regular expressions over a spatial alphabet that can be implicitly or explicitly anchored to the time domain.
Using FPGAs, thousands of queries are matched in parallel. The challenges in handling several constructs of the assumed query language are explored, with a study on the tradeoffs between expressiveness, scalability and matching accuracy (eliminating false positives). Finally, the first parallel Golomb-Rice (GR) integer decompression FPGA-based architecture is detailed, allowing the decoding of unmodified GR streams at the deterministic rate of several bytes (multiple integers) per hardware cycle. Integer decompression is a first step in the querying of inverted indexes.
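For readers unfamiliar with Golomb-Rice coding, the following is a minimal sequential software sketch of the format. It is not the parallel FPGA architecture described in the dissertation. It assumes one common convention (the quotient written in unary as a run of 1s closed by a 0, followed by a k-bit binary remainder), and the function names `rice_encode` and `rice_decode` are hypothetical.

```python
def rice_encode(values, k):
    """Encode non-negative integers as a string of '0'/'1' characters.

    Assumed convention: quotient (v >> k) in unary as 1s terminated by a 0,
    then the k low-order bits of v as the remainder.
    """
    bits = []
    for v in values:
        q = v >> k
        r = v & ((1 << k) - 1)
        bits.append("1" * q + "0")
        if k:
            bits.append(format(r, "0{}b".format(k)))
    return "".join(bits)


def rice_decode(bitstring, k):
    """Decode a bit string produced by rice_encode, one integer at a time."""
    values = []
    i = 0
    n = len(bitstring)
    while i < n:
        # Count the unary run of 1s to recover the quotient.
        q = 0
        while bitstring[i] == "1":
            q += 1
            i += 1
        i += 1  # skip the terminating 0
        # Read the k-bit remainder (empty when k == 0).
        r = int(bitstring[i:i + k], 2) if k else 0
        i += k
        values.append((q << k) | r)
    return values
```

Because each code word has a variable length, a sequential decoder like this one only finds where the next integer starts after finishing the current one. That dependency is what makes decoding several integers per cycle in hardware non-trivial.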
IEEE Micro, 2000
Database management systems currently serve very important business applications. Many of these systems use a relational model that provides simple and powerful features. The problem is that this relational model insufficiently supports important emerging applications such as CAD, office automation, and knowledge-based systems when new types of data are needed. (See box on The Relational Model on the next page.)
IEEE Micro, 2000
comparisons of hashing buckets having the same index [16]. Therefore, the desirable number of processors is limited to the number of tuples in the hashing bucket of the smaller relation, multiplied by the number of processing elements used for each tuple (usually one PE per tuple).
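The bucket-wise comparison mentioned in this excerpt can be sketched in software as a hash-partitioned join. This is an illustrative sketch only, not the article's hardware design: both relations are split into buckets by a hash of the join key, and tuples are compared only against the bucket with the same index. The names `partition` and `bucket_join`, and the default of four buckets, are assumptions made for the example.

```python
from collections import defaultdict


def partition(relation, key_index, num_buckets):
    """Split a relation (a list of tuples) into hash buckets on one column."""
    buckets = defaultdict(list)
    for tup in relation:
        buckets[hash(tup[key_index]) % num_buckets].append(tup)
    return buckets


def bucket_join(r, s, r_key=0, s_key=0, num_buckets=4):
    """Equi-join r and s by comparing only buckets that share an index."""
    r_buckets = partition(r, r_key, num_buckets)
    s_buckets = partition(s, s_key, num_buckets)
    result = []
    # Each bucket pair is independent of the others, so in a parallel
    # setting each pair could be handed to separate processing elements.
    for idx in r_buckets.keys() & s_buckets.keys():
        for rt in r_buckets[idx]:
            for st in s_buckets[idx]:
                if rt[r_key] == st[s_key]:
                    result.append((rt, st))
    return result
```

Within one bucket pair, the comparisons for each tuple of the smaller side are independent, which is consistent with the excerpt's bound on the useful number of processors.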
Proceedings of the VLDB Endowment
The data revolution is fueled by advances in machine learning, databases, and hardware design. Programmable accelerators are making their way into each of these areas independently. As such, there is a void of solutions that enables hardware acceleration at the intersection of these disjoint fields. This paper sets out to be the initial step towards a unifying solution for in-Database Acceleration of Advanced Analytics (DAnA). Deploying specialized hardware, such as FPGAs, for in-database analytics currently requires hand-designing the hardware and manually routing the data. Instead, DAnA automatically maps a high-level specification of advanced analytics queries to an FPGA accelerator. The accelerator implementation is generated for a User Defined Function (UDF), expressed as a part of an SQL query using a Python-embedded Domain-Specific Language (DSL). To realize an efficient in-database integration, DAnA accelerators contain a novel hardware structure, Striders, that directl...
Existing work on accelerating analytic DB query processing with (discrete) GPUs fails to fully realize their potential for speedup through parallelism: published results do not achieve significant speedup over more performant CPU-only DBMSes when processing complete queries. This paper presents a successful effort to better meet this challenge, in the form of a proof-of-concept query processing framework. The framework constitutes a graft onto an existing DBMS, altering some parts of it and replacing its execution engine entirely. It intensively refactors query execution plans, making them better parallelizable, before executing them on either a CPU or a GPU. This results in a significant speedup even on a CPU, and a further speedup when using a GPU, over the chosen host DBMS (MonetDB), which itself already bests most published results utilizing a GPU for query processing. Finally, we outline some concrete future improvements on our results which can cut processing time by half and possibly much more.
The VLDB Journal, 2019
While FPGAs have seen prior use in database systems, in recent years interest in using FPGAs to accelerate databases has declined in both industry and academia for the following three reasons. First, specifically for in-memory databases, FPGAs integrated with conventional I/O provide insufficient bandwidth, limiting performance. Second, GPUs, which can also provide high throughput and are easier to program, have emerged as a strong accelerator alternative. Third, programming FPGAs required developers to have full-stack skills, from high-level algorithm design to low-level circuit implementations. The good news is that these challenges are being addressed. New interface technologies connect FPGAs into the system at main-memory bandwidth and the latest FPGAs provide local memory competitive in capacity and bandwidth with GPUs. Ease of programming is improving through support of shared coherent virtual memory between the host and the accelerator, support for higher-level languages, and...
2017
The key objective of database systems is to efficiently manage an ever-increasing amount of data. Thereby, a high query throughput and a low query latency are core requirements. To satisfy these requirements, database engines are highly adapted to the given hardware by using all features of modern processors. Apart from this software optimization, even tailor-made processing circuits running on FPGAs are built to run mostly stateless query plans with a high throughput. A similar approach, which was already investigated three decades ago, is to build customized hardware like a database processor. Tailor-made hardware makes it possible to achieve performance numbers that cannot be reached with software running on general-purpose CPUs, while at the same time addressing the dark silicon problem. The main disadvantage of custom hardware is the high development cost that comes with designing and verifying a new processor, as well as building respective drivers and the software stack. However, there...
2016
Data processing on a continuously growing amount of information and the increasing power restrictions have become a ubiquitous challenge in our world today. Besides parallel computing, a promising approach to improve the energy efficiency of current systems is to integrate specialized hardware. This paper presents a Tensilica RISC processor extended with an instruction set to accelerate basic database operators frequently used in modern database systems. The core was taped out in a 28 nm SLP CMOS technology and allows energy-efficient query processing as well as query optimization by applying selectivity estimation techniques. Our chip measurements show a 1000x energy improvement on selected database operators compared to state-of-the-art systems.
Proceedings of the 2016 International Conference on Management of Data - SIGMOD '16, 2016
In this paper, we show how we use Nvidia GPUs and host CPU cores for faster query processing in a DB2 database using BLU Acceleration (DB2's column store technology). Moreover, we show the benefits and problems of using hardware accelerators (more specifically GPUs) in a real commercial Relational Database Management System (RDBMS). We investigate the effect of off-loading specific database operations to a GPU, and show how doing so results in a significant performance improvement. We then demonstrate that for some queries, using just CPU to perform the entire operation is more beneficial. While we use some of Nvidia's fast kernels for operations like sort, we have also developed our own high performance kernels for operations such as group by and aggregation. Finally, we show how we use a dynamic design that can make use of optimizer metadata to intelligently choose a GPU kernel to run. For the first time in the literature, we use benchmarks representative of customer environments to gauge the performance of our prototype, the results of which show that we can get a speed increase upwards of 2x, using a realistic set of queries.
Computer Communications and Networks, 2015
The rapid growth of "big-data" has intensified the problem of data movement in data analytics: large amounts of data need to move through the memory up to the CPU before any computation takes place. To tackle this costly problem, Processing-in-Memory (PIM) inverts the traditional data processing by pushing computation to memory, with an impact on performance and energy efficiency. In this paper, we present an experimental study on processing database SIMD operators in PIM compared to a current x86 processor (i.e., using AVX512 instructions). We discuss the execution time gap between those architectures. This is the first experimental study in the database community to discuss the trade-offs of execution time and energy consumption between PIM and x86 in the main query execution systems: materialized, vectorized, and pipelined. We also discuss the results of a hybrid query scheduling when interleaving the execution of the SIMD operators between PIM and x86 processing hardware. In our results, the hybrid query plan reduced the execution time by 45%. It also drastically reduced energy consumption by more than 2× compared to hardware-specific query plans.
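The idea of a hybrid plan can be sketched as a per-operator device choice. This is only a toy illustration under strong assumptions, not the scheduler evaluated in the paper: the function names are hypothetical, the cost table holds invented placeholder values rather than measurements, and the cost of moving data between devices is ignored.

```python
def hybrid_plan(operators, cost):
    """Assign each operator to the device with the lower estimated cost.

    cost maps (operator, device) to an estimated time in arbitrary units.
    """
    plan = []
    for op in operators:
        device = min(("pim", "x86"), key=lambda d: cost[(op, d)])
        plan.append((op, device))
    return plan


def plan_time(plan, cost):
    """Total estimated time of a plan, ignoring inter-device transfers."""
    return sum(cost[(op, device)] for op, device in plan)


# Placeholder estimates, invented for illustration only (not measurements).
cost = {
    ("selection", "pim"): 2.0, ("selection", "x86"): 5.0,
    ("projection", "pim"): 3.0, ("projection", "x86"): 4.0,
    ("aggregation", "pim"): 6.0, ("aggregation", "x86"): 3.0,
}
operators = ["selection", "projection", "aggregation"]

plan = hybrid_plan(operators, cost)
pim_only = [(op, "pim") for op in operators]
x86_only = [(op, "x86") for op in operators]
```

With these placeholder numbers the mixed assignment is cheaper than either single-device plan. A realistic scheduler would also need to account for energy and for the data movement that switching devices introduces.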
Distributed and Parallel Databases
2017 IEEE 33rd International Conference on Data Engineering (ICDE), 2017
Lecture Notes in Computer Science, 2017
Datenbank-Spektrum, 2021
Proceedings of the May 16-19, 1983, national computer conference on - AFIPS '83, 1983
Proceedings of the 13th International Workshop on Data Management on New Hardware
Datenbank-Spektrum, 2018
it - Information Technology, 2017
International Symposium on Parallel Architectures, Algorithms and Networks, 1997
ACM SIGMOD Record, 1995
IEEE Transactions on Parallel and Distributed Systems, 2011
Procedia Computer Science, 2010