In this paper we describe a special caching technique, called UB-Cache, which is tailored to work... more In this paper we describe a special caching technique, called UB-Cache, which is tailored to work with data organized as a UB-Tree [6], [7], a novel multidimensional datastructure. The UB-Cache makes it possible to read data from disk in arbitrary sort order according to those attributes that are used in the UB-Tree. This property can be used to speed up all operations of relational algebra substantially. We assume that the reader is familiar with the UB-Tree as described in [6] or [7].
Proceedings International Database Engineering and Applications Symposium
Advanced data warehouses and web databases have set the demand for processing large sets of time ... more Advanced data warehouses and web databases have set the demand for processing large sets of time ranges, quality classes, fuzzy data, personalized data and extended objects. Since, all of these data types can be mapped to intervals, interval indexing can dramatically speed up or even be an enabling technology for these new applications. We introduce a method for managing intervals by indexing the dual space with the UB-Tree. We show that our method is an effective and efficient solution, benefitting from all good characteristics of the UB-Tree, i.e., concurrency control, worst case guarantees for insertion, deletion and update as well as efficient query processing. Our technique can easily be integrated into an RDBMS engine providing the UB-Tree as access method. We also show that our technique is superior and more flexible to previously suggested techniques.
The UB-Tree: Performance of Multidimensional Range Queries
We investigate the usability and performance of the UB-Tree (universal B-Tree) for multidimension... more We investigate the usability and performance of the UB-Tree (universal B-Tree) for multidimensional data, as they arise in all relational databases and in particular in data-warehousing and data-mining applications. The UB-Tree is balanced and has all the guaranteed performance characteristics of B-Trees, i.e., it requires linear space for storage and logarithmic time for the basic operations of insertion, retrieval and deletion. Therefore it can efficiently support OLTP. In addition the UB-Tree preserves clustering of objects with respect to Cartesian distance. Therefore, it shows its main strengths for multidimensional data. It has very high potential for parallel processing. A single UB-Tree can replace a large number of secondary indexes and join indexes including foreign column join indexes (FCJ). For updates this means that only one UB-Tree must be managed instead of several secondary indexes. This reduces runtime and storage requirements substantially. For retrieval the UB-Tr...
The problem of providing operational integrity of data bases as opposed to operating systems is d... more The problem of providing operational integrity of data bases as opposed to operating systems is discussed. Techniques of resource locking, mainly individual object locking and predicate locking, are surveyed, improved, and unified. An efficient on-line transitive closure algorithm for deadlock discovery is presented and analyzed. Several strategies for preventing indefinite delay of transactions are proposed. Phantoms and the need for predicate locking are surveyed and reconsidered. Several strategies for handling phantoms are proposed: one without predicate locking and two in ~aich predicate locking is needed for writing transactions only, and in which individual object locking sufficies for pure readers.
Database and Expert Systems Applications. 8th International Conference, DEXA '97. Proceedings, 1997
With the success of the WorldWide Web new solutions for very fast document access have emerged. E... more With the success of the WorldWide Web new solutions for very fast document access have emerged. ELEKTRA is an article delivery system that supports full-text and relational information retrieval as well as electronic article ordering and delivery. Full-length articles can be both viewed by standard-web browsers and printed by PostScript printers in a high resolution quality. ELEKTRA has been realized by using the general multimedia digital library tool Omnis 4.0.
The use of multiprocessor architectures requires the parallelization of sorting algorithms. A par... more The use of multiprocessor architectures requires the parallelization of sorting algorithms. A parallel sorting algorithm based on horizontal parallelization is presented. This algorithm is suited for large data volumes (external sorting) and does not suffer from processing skew in presence of data skew. The core of the parallel sorting algorithm is a new adaptive partitioning method. The effect of data skew is remedied by taking samples representing the distribution of the input data. The parallel algorithm has been implemented on top of a shared disk multiprocessor architecture. The performance evaluation of the algorithm shows that it has linear speedup. Furthermore, the optimal degree of CPU parallelism is derived if I/O limitations are taken into account.
Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337), 1999
Most operations of the relational algebra or SQL require a sorted stream of tuples for efficient ... more Most operations of the relational algebra or SQL require a sorted stream of tuples for efficient processing. Therefore, processing complex relational queries relies on efficient access to a table in some sort order. In principle, indexes could be used, but they are superior to a full table scan only, if the result set is sufficiently restricted in the index attribute. In this paper we present the Tetris algorithm, which utilizes restrictions to process a table in sort order of any attribute without the need of external sorting. The algorithm relies on the space partitioning of a multidimensional access method. A sweep line technique is used to read data in sort order of any attribute, while accessing each disk page of a table only once. Results are produced earlier than with traditional sorting techniques, allowing better response times for interactive applications and pipelined processing of the result set. We describe a prototype implementation of the Tetris algorithm using UB-Trees on top of Oracle 8, define a cost model and present performance measurements for some queries of the TPC-D benchmark.
In modern universal database management systems (DBMSs) user- defined data types along with exten... more In modern universal database management systems (DBMSs) user- defined data types along with extensible indexing structures try to bridge the gap between standard data-independent DBMS implementations and the requirement of specialized access methods for efficient domain-specific data retrieval, maintenance, and storage. However, these approaches often suffer from restricting the degree of freedom in the implementation and limiting the availability of crucial database features. Due to their design concepts, these extensible indexing frameworks are not intended to be suitable for rapid development and evaluation of research prototypes, as they lack essential generalization, completeness, and depth of their integration into the host DBMS. We discuss the advantages and drawbacks of available extensible indexing techniques and present several methods that can be easily combined into a powerful and flexible framework for storing, indexing, and manipulating domain specific data from any data source. We demonstrate that this framework comprises all properties truly extensible indexing should have. A prototype implementation of this framework was integrated into the relational DBMS Transbase®.
A Performance Model for Preplanned Disk Sorting. The idea of preplanning strings on disks which a... more A Performance Model for Preplanned Disk Sorting. The idea of preplanning strings on disks which are merged together is investigated from a performance point of view. Schemes of internal buffer allocation, initial string creation by an internal sort, and string distribution on disks are evaluated. An algorithm is given for the construction of suboptimal merge trees called plannable merge trees. A cost model is presented for accurate preplanning which consists of detailed assumptions on disk allocation for k input disks and r-way merge planning. Timing considerations for sort and merge including hardware characteristics of moveable head disks show a significant gain of time compared to widely used sort/merge applications. Leistungsanalyse beim vorgeplanten Sortieren auf Magnctplatten. Der Ansatz, Strings, die zusammengemischt werden, auf Magnetplatten vorzuplanen, wird unter Leistungsgesichtspunkten untersucht. Konzepte far die interne Pufferzuordnung, fiir die Erzeugung der anf'finglichen Strings dutch ein internes Sortierveffahren, und ffir die Stringverteilung auf Magnetplatten werden ansgewertet. Ein Algorithmns besehreibt die Konstruktion yon suboptimalen Mischb/iumen, die planbare Mischbgiume genannt werden. Ein Kostenmodell, das auf detaillierte Annahmen der Zuordnung yon k Eingabeplatten und der Planung eines r-Wege-Misehens beruht, wird far das exakte Vorplanen aufgestellt. Zeitbetrachtungen for Sortieren und Mischen, die Hardware-Eigenschaften yon Magnetplatten einschlieBen, zeigen signifikante Zeitgewinne verglichen mit weitverbreiteten Sortier-and Mischverfabren.
The paper describes a scheme for symbolic manipulation of index expressions which arise as a by-p... more The paper describes a scheme for symbolic manipulation of index expressions which arise as a by-product of the symbolic manipulation of expressions in the matrix calculi described by the authors in a previous paper. This scheme attempts program optimization by transforming the original algorithm rather than the machine code. The goal is to automatically generate code for handling the tedious address calculations necessitated by complicated data structures. The paper is therefore preoccupied with “indexing by position.” The relationship of “indexing by name” and “indexing by position” is discussed.
A dump technique for programs written in ALGOL 60 is described. This technique provides an intell... more A dump technique for programs written in ALGOL 60 is described. This technique provides an intelligible analysis of an unsuccessful computation process in terms of the original source program.
Only few multidimensional access methods have made their way into commercial relational DBMS. Eve... more Only few multidimensional access methods have made their way into commercial relational DBMS. Even if a RDBMS ships with a multidimensional index, the multidimensional index usually is an add-on like Oracle SDO, which is not integrated into the SQL interpreter, query processor and query optimizer of the DBMS kernel. Our demonstration shows TransBase HyperCube, a commercial RDBMS, whose kernel fully integrates the UB-Tree, a multidimensional extension of the B-Tree. This integration was performed in an ESPRIT project funded by the European Commission. We put the main emphasis of our demonstration on the application of UB-Tree indexes in realworld databases for OLAP. However, we also address general issues of UB-Trees like creation, spacerequirements, or comparison to other indexing methods.
Online Analytical Processing (OLAP) requires query response times within the range of a few secon... more Online Analytical Processing (OLAP) requires query response times within the range of a few seconds in order to allow for interactive drilling, slicing, or dicing through an OLAP cube. While small OLAP applications use multidimensional database systems, large OLAP applications like the SAP BW rely on relational (ROLAP) databases for efficient data storage and retrieval. ROLAP databases use specialized data models like star or snowflake schemata for data storage and create a large set of indexes or materialized views in order to answer queries efficiently. In our case study, we show the performance benefits of TransBase HyperCube, a commercial RDBMS, whose kernel fully integrates the UB-Tree, a multi-dimensional extension of the B-Tree. With this newly developed access structure, TransBase HyperCube enables interactive OLAP without the need of storing a large set of materialized views or creating a large set of indexes. We compare not only the query performance, but also consider index size and maintenance costs. For the case study we use a 42 million record ROLAP database of GfK, the largest German market research company.
Analysis-oriented database applications, such as data warehousing or customer relationship manage... more Analysis-oriented database applications, such as data warehousing or customer relationship management, play a crucial role in the database area. In general, the multidimensional data model is used in these applications, realized as star or snow-flake schemata in the relational world. The so-called star queries are the prevalent type of queries on such schemata. All database vendors have extended their products to support star queries efficiently. However, mostly reporting queries benefit from the optimizations, like pre-aggregation, while ad-hoc queries usually lack efficient support. We present the DBMS Transbase® in this paper, which provides a new physical organization of the data based on hierarchical clustering and multidimensional clustering combined with multidimensional indexing. In combination with new query optimizations (e.g., hierarchical pre-grouping) significant performance improvements are achieved. The paper describes how the new technology is implemented in the Tran...
Multidimensional access methods have shown high potential for significant performance improvement... more Multidimensional access methods have shown high potential for significant performance improvements in various application domains. However, only few approaches have made their way into commercial products. In commercial database management systems (DBMSs) the B-Tree is still the prevalent indexing technique. Integrating new indexing methods into existing database kernels is in general a very complex and costly task. Exceptions exist, as our experience of integrating the UB-Tree into TransBase, a commercial DBMS, shows. The UB-Tree is a very promising multidimensional index, which has shown its superiority over traditional access methods in different scenarios, especially in OLAP applications. In this paper we discuss the major issues of a UB-Tree integration. As we will show, the complexity and cost of this task is reduced significantly due to the fact that the UB-Tree relies on the classical B-Tree. Even though commercial DBMSs provide interfaces for index extensions, we favor the kernel integration because of the tight coupling with the query optimizer, which allows for optimal usage of the UB-Tree in execution plans. Measurements on a real-world data warehouse show that the kernel integration leads to an additional performance improvement compared to our prototype implementation and competing index methods.
In this paper we describe a special caching technique, called UB-Cache, which is tailored to work... more In this paper we describe a special caching technique, called UB-Cache, which is tailored to work with data organized as a UB-Tree [6], [7], a novel multidimensional datastructure. The UB-Cache makes it possible to read data from disk in arbitrary sort order according to those attributes that are used in the UB-Tree. This property can be used to speed up all operations of relational algebra substantially. We assume that the reader is familiar with the UB-Tree as described in [6] or [7].
Proceedings International Database Engineering and Applications Symposium
Advanced data warehouses and web databases have set the demand for processing large sets of time ... more Advanced data warehouses and web databases have set the demand for processing large sets of time ranges, quality classes, fuzzy data, personalized data and extended objects. Since, all of these data types can be mapped to intervals, interval indexing can dramatically speed up or even be an enabling technology for these new applications. We introduce a method for managing intervals by indexing the dual space with the UB-Tree. We show that our method is an effective and efficient solution, benefitting from all good characteristics of the UB-Tree, i.e., concurrency control, worst case guarantees for insertion, deletion and update as well as efficient query processing. Our technique can easily be integrated into an RDBMS engine providing the UB-Tree as access method. We also show that our technique is superior and more flexible to previously suggested techniques.
The UB-Tree: Performance of Multidimensional Range Queries
We investigate the usability and performance of the UB-Tree (universal B-Tree) for multidimension... more We investigate the usability and performance of the UB-Tree (universal B-Tree) for multidimensional data, as they arise in all relational databases and in particular in data-warehousing and data-mining applications. The UB-Tree is balanced and has all the guaranteed performance characteristics of B-Trees, i.e., it requires linear space for storage and logarithmic time for the basic operations of insertion, retrieval and deletion. Therefore it can efficiently support OLTP. In addition the UB-Tree preserves clustering of objects with respect to Cartesian distance. Therefore, it shows its main strengths for multidimensional data. It has very high potential for parallel processing. A single UB-Tree can replace a large number of secondary indexes and join indexes including foreign column join indexes (FCJ). For updates this means that only one UB-Tree must be managed instead of several secondary indexes. This reduces runtime and storage requirements substantially. For retrieval the UB-Tr...
The problem of providing operational integrity of data bases as opposed to operating systems is d... more The problem of providing operational integrity of data bases as opposed to operating systems is discussed. Techniques of resource locking, mainly individual object locking and predicate locking, are surveyed, improved, and unified. An efficient on-line transitive closure algorithm for deadlock discovery is presented and analyzed. Several strategies for preventing indefinite delay of transactions are proposed. Phantoms and the need for predicate locking are surveyed and reconsidered. Several strategies for handling phantoms are proposed: one without predicate locking and two in ~aich predicate locking is needed for writing transactions only, and in which individual object locking sufficies for pure readers.
Database and Expert Systems Applications. 8th International Conference, DEXA '97. Proceedings, 1997
With the success of the WorldWide Web new solutions for very fast document access have emerged. E... more With the success of the WorldWide Web new solutions for very fast document access have emerged. ELEKTRA is an article delivery system that supports full-text and relational information retrieval as well as electronic article ordering and delivery. Full-length articles can be both viewed by standard-web browsers and printed by PostScript printers in a high resolution quality. ELEKTRA has been realized by using the general multimedia digital library tool Omnis 4.0.
The use of multiprocessor architectures requires the parallelization of sorting algorithms. A par... more The use of multiprocessor architectures requires the parallelization of sorting algorithms. A parallel sorting algorithm based on horizontal parallelization is presented. This algorithm is suited for large data volumes (external sorting) and does not suffer from processing skew in presence of data skew. The core of the parallel sorting algorithm is a new adaptive partitioning method. The effect of data skew is remedied by taking samples representing the distribution of the input data. The parallel algorithm has been implemented on top of a shared disk multiprocessor architecture. The performance evaluation of the algorithm shows that it has linear speedup. Furthermore, the optimal degree of CPU parallelism is derived if I/O limitations are taken into account.
Proceedings 15th International Conference on Data Engineering (Cat. No.99CB36337), 1999
Most operations of the relational algebra or SQL require a sorted stream of tuples for efficient ... more Most operations of the relational algebra or SQL require a sorted stream of tuples for efficient processing. Therefore, processing complex relational queries relies on efficient access to a table in some sort order. In principle, indexes could be used, but they are superior to a full table scan only, if the result set is sufficiently restricted in the index attribute. In this paper we present the Tetris algorithm, which utilizes restrictions to process a table in sort order of any attribute without the need of external sorting. The algorithm relies on the space partitioning of a multidimensional access method. A sweep line technique is used to read data in sort order of any attribute, while accessing each disk page of a table only once. Results are produced earlier than with traditional sorting techniques, allowing better response times for interactive applications and pipelined processing of the result set. We describe a prototype implementation of the Tetris algorithm using UB-Trees on top of Oracle 8, define a cost model and present performance measurements for some queries of the TPC-D benchmark.
In modern universal database management systems (DBMSs) user- defined data types along with exten... more In modern universal database management systems (DBMSs) user- defined data types along with extensible indexing structures try to bridge the gap between standard data-independent DBMS implementations and the requirement of specialized access methods for efficient domain-specific data retrieval, maintenance, and storage. However, these approaches often suffer from restricting the degree of freedom in the implementation and limiting the availability of crucial database features. Due to their design concepts, these extensible indexing frameworks are not intended to be suitable for rapid development and evaluation of research prototypes, as they lack essential generalization, completeness, and depth of their integration into the host DBMS. We discuss the advantages and drawbacks of available extensible indexing techniques and present several methods that can be easily combined into a powerful and flexible framework for storing, indexing, and manipulating domain specific data from any data source. We demonstrate that this framework comprises all properties truly extensible indexing should have. A prototype implementation of this framework was integrated into the relational DBMS Transbase®.
A Performance Model for Preplanned Disk Sorting. The idea of preplanning strings on disks which a... more A Performance Model for Preplanned Disk Sorting. The idea of preplanning strings on disks which are merged together is investigated from a performance point of view. Schemes of internal buffer allocation, initial string creation by an internal sort, and string distribution on disks are evaluated. An algorithm is given for the construction of suboptimal merge trees called plannable merge trees. A cost model is presented for accurate preplanning which consists of detailed assumptions on disk allocation for k input disks and r-way merge planning. Timing considerations for sort and merge including hardware characteristics of moveable head disks show a significant gain of time compared to widely used sort/merge applications. Leistungsanalyse beim vorgeplanten Sortieren auf Magnctplatten. Der Ansatz, Strings, die zusammengemischt werden, auf Magnetplatten vorzuplanen, wird unter Leistungsgesichtspunkten untersucht. Konzepte far die interne Pufferzuordnung, fiir die Erzeugung der anf'finglichen Strings dutch ein internes Sortierveffahren, und ffir die Stringverteilung auf Magnetplatten werden ansgewertet. Ein Algorithmns besehreibt die Konstruktion yon suboptimalen Mischb/iumen, die planbare Mischbgiume genannt werden. Ein Kostenmodell, das auf detaillierte Annahmen der Zuordnung yon k Eingabeplatten und der Planung eines r-Wege-Misehens beruht, wird far das exakte Vorplanen aufgestellt. Zeitbetrachtungen for Sortieren und Mischen, die Hardware-Eigenschaften yon Magnetplatten einschlieBen, zeigen signifikante Zeitgewinne verglichen mit weitverbreiteten Sortier-and Mischverfabren.
The paper describes a scheme for symbolic manipulation of index expressions which arise as a by-p... more The paper describes a scheme for symbolic manipulation of index expressions which arise as a by-product of the symbolic manipulation of expressions in the matrix calculi described by the authors in a previous paper. This scheme attempts program optimization by transforming the original algorithm rather than the machine code. The goal is to automatically generate code for handling the tedious address calculations necessitated by complicated data structures. The paper is therefore preoccupied with “indexing by position.” The relationship of “indexing by name” and “indexing by position” is discussed.
A dump technique for programs written in ALGOL 60 is described. This technique provides an intell... more A dump technique for programs written in ALGOL 60 is described. This technique provides an intelligible analysis of an unsuccessful computation process in terms of the original source program.
Only few multidimensional access methods have made their way into commercial relational DBMS. Eve... more Only few multidimensional access methods have made their way into commercial relational DBMS. Even if a RDBMS ships with a multidimensional index, the multidimensional index usually is an add-on like Oracle SDO, which is not integrated into the SQL interpreter, query processor and query optimizer of the DBMS kernel. Our demonstration shows TransBase HyperCube, a commercial RDBMS, whose kernel fully integrates the UB-Tree, a multidimensional extension of the B-Tree. This integration was performed in an ESPRIT project funded by the European Commission. We put the main emphasis of our demonstration on the application of UB-Tree indexes in realworld databases for OLAP. However, we also address general issues of UB-Trees like creation, spacerequirements, or comparison to other indexing methods.
Online Analytical Processing (OLAP) requires query response times within the range of a few secon... more Online Analytical Processing (OLAP) requires query response times within the range of a few seconds in order to allow for interactive drilling, slicing, or dicing through an OLAP cube. While small OLAP applications use multidimensional database systems, large OLAP applications like the SAP BW rely on relational (ROLAP) databases for efficient data storage and retrieval. ROLAP databases use specialized data models like star or snowflake schemata for data storage and create a large set of indexes or materialized views in order to answer queries efficiently. In our case study, we show the performance benefits of TransBase HyperCube, a commercial RDBMS, whose kernel fully integrates the UB-Tree, a multi-dimensional extension of the B-Tree. With this newly developed access structure, TransBase HyperCube enables interactive OLAP without the need of storing a large set of materialized views or creating a large set of indexes. We compare not only the query performance, but also consider index size and maintenance costs. For the case study we use a 42 million record ROLAP database of GfK, the largest German market research company.
Analysis-oriented database applications, such as data warehousing or customer relationship manage... more Analysis-oriented database applications, such as data warehousing or customer relationship management, play a crucial role in the database area. In general, the multidimensional data model is used in these applications, realized as star or snow-flake schemata in the relational world. The so-called star queries are the prevalent type of queries on such schemata. All database vendors have extended their products to support star queries efficiently. However, mostly reporting queries benefit from the optimizations, like pre-aggregation, while ad-hoc queries usually lack efficient support. We present the DBMS Transbase® in this paper, which provides a new physical organization of the data based on hierarchical clustering and multidimensional clustering combined with multidimensional indexing. In combination with new query optimizations (e.g., hierarchical pre-grouping) significant performance improvements are achieved. The paper describes how the new technology is implemented in the Tran...
Multidimensional access methods have shown high potential for significant performance improvement... more Multidimensional access methods have shown high potential for significant performance improvements in various application domains. However, only few approaches have made their way into commercial products. In commercial database management systems (DBMSs) the B-Tree is still the prevalent indexing technique. Integrating new indexing methods into existing database kernels is in general a very complex and costly task. Exceptions exist, as our experience of integrating the UB-Tree into TransBase, a commercial DBMS, shows. The UB-Tree is a very promising multidimensional index, which has shown its superiority over traditional access methods in different scenarios, especially in OLAP applications. In this paper we discuss the major issues of a UB-Tree integration. As we will show, the complexity and cost of this task is reduced significantly due to the fact that the UB-Tree relies on the classical B-Tree. Even though commercial DBMSs provide interfaces for index extensions, we favor the kernel integration because of the tight coupling with the query optimizer, which allows for optimal usage of the UB-Tree in execution plans. Measurements on a real-world data warehouse show that the kernel integration leads to an additional performance improvement compared to our prototype implementation and competing index methods.
Uploads
Papers by Rudolf Bayer