2003
Relational index structures, as for instance the Relational Interval Tree, the Relational R-Tree, or the Linear Quadtree, support efficient processing of queries on top of existing object-relational database systems. Furthermore, there exist effective and efficient models to estimate the selectivity and the I/O cost in order to guide the cost-based optimizer whether and how to include these index structures into the execution plan. By design, the models immediately fit to common extensible indexing/optimization frameworks, and their implementations exploit the built-in statistics facilities of the database server. In this paper, we show how these statistics can also be used for accelerating the access methods themselves by reducing the number of generated join partners. The different join partners are grouped together according to a cost-based grouping algorithm. Our first experiments on an Oracle9i database yield a speed-up of up to 1,000% for the Relational Interval Tree, the Relational R-Tree and for the Linear Quadtree.
2003
Relational index structures, as for instance the Relational Interval Tree, the Relational R-Tree, or the Linear Quadtree, support efficient processing of queries on top of existing object-relational database systems. Furthermore, there exist effective and efficient models to estimate the selectivity and the I/O cost in order to guide the cost-based optimizer whether and how to include these index structures into the execution plan. By design, the models immediately fit to common extensible indexing/optimization frameworks, and their implementations exploit the built-in statistics facilities of the database server. In this paper, we show how these statistics can also be used for accelerating geo-spatial queries using the relational quadtree by reducing the number of generated join partners, which results in fewer logical reads and consequently improves the overall runtime. We cut down on the number of join partners by grouping different join partners together according to a statistics-driven grouping algorithm. Our experiments on an Oracle9i database yield an average speed-up between 30% and 300% for spatial selection queries on the Relational Quadtree.
2004
Relational index structures, as for instance the Relational Interval Tree or the Linear Quadtree, support efficient processing of queries on top of existing object-relational database systems. Furthermore, there exist effective and efficient models to estimate the selectivity and the I/O cost in order to guide the cost-based optimizer whether and how to include these index structures into the execution plan. By design, the models immediately fit to common extensible indexing/optimization frameworks, and their implementations exploit the built-in statistics facilities of the database server. In this paper, we show how these statistics can also be used for accelerating the access methods themselves by reducing the number of generated join partners, which results in fewer logical reads and consequently improves the overall runtime. We cut down on the number of join partners by grouping different join partners together according to a statistics-driven grouping algorithm. Our experiments on an Oracle9i database yield an average speed-up between 20% and 10,000% for spatial collision queries on the Relational Interval Tree and on the Relational Quadtree.

2.1 Statistics Related to the Relational Access Method

We start with a definition common to all relational access methods:

Definition 1 (Relational Access Method) [6]. An access method is called a relational access method iff any index-related data is exclusively stored in and retrieved from relational tables. An instance of a relational access method is called a relational index. The following tables comprise the persistent data of a relational index:
(i) User table: a single table, storing the original user data being indexed.
(ii) Index tables: n tables, n ≥ 0, storing index data derived from the user table.
(iii) Meta table: a single table for each database and each relational access method, storing O(1) rows for each instance of an index.
The stored data is called user data, index data, and meta data, respectively.
2004
In contrast to space-partitioning index structures, data-partitioning index structures naturally adapt to the actual data distribution, which results in very good query response behavior. Besides efficient query processing, modern database applications including computer-aided design, medical imaging, or molecular biology require fully-fledged database management systems in order to guarantee industrial strength. In this paper, we show how we can achieve efficient query processing on data-partitioning index structures within general-purpose database systems. We reduce the navigational index traversal cost by using "extended index range scans". If a directory node is "largely" covered by the actual query, the recursive tree traversal for this node can beneficially be replaced by a scan on the leaf level of the index instead of navigating through the directory any longer. On the other hand, for highly selective queries, the index is used as usual. We demonstrate the benefits of this idea for spatial collision queries on the Relational R-tree. Our experiments with an Oracle9i database system show that our new approach considerably outperforms common index structures and the sequential scan.
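The "extended index range scan" idea can be illustrated with a small in-memory sketch. It is not the authors' relational implementation: the `Node` class, the `threshold` parameter, and the area-based definition of "largely covered" are assumptions made here for illustration, and the leaf-level scan is modelled by simply enumerating all leaf entries below a node.

```python
from dataclasses import dataclass, field

# Rectangles are (xmin, ymin, xmax, ymax) tuples.

def intersects(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def area(r):
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])

def covered_fraction(mbr, query):
    """Fraction of the node's bounding rectangle that lies inside the query."""
    ix = (max(mbr[0], query[0]), max(mbr[1], query[1]),
          min(mbr[2], query[2]), min(mbr[3], query[3]))
    if ix[0] > ix[2] or ix[1] > ix[3]:
        return 0.0
    total = area(mbr)
    return 1.0 if total == 0 else area(ix) / total

@dataclass
class Node:  # hypothetical node type, not taken from the paper
    mbr: tuple
    children: list = field(default_factory=list)  # directory nodes
    entries: list = field(default_factory=list)   # leaf entries: (rect, id)

def leaf_entries(node):
    """Stand-in for a sequential scan over the leaf level below `node`."""
    if not node.children:
        yield from node.entries
    else:
        for child in node.children:
            yield from leaf_entries(child)

def query(node, q, threshold=0.8):
    """Collect ids of entries intersecting q.

    If the node is covered by q to at least `threshold` (an assumed tuning
    knob), skip the directory navigation and filter a leaf-level scan instead.
    """
    if not intersects(node.mbr, q):
        return []
    if not node.children or covered_fraction(node.mbr, q) >= threshold:
        return [i for rect, i in leaf_entries(node) if intersects(rect, q)]
    hits = []
    for child in node.children:
        hits += query(child, q, threshold)
    return hits
```

With a high threshold the function behaves like an ordinary recursive traversal; lowering it trades directory navigation for scanning, which mirrors the trade-off the abstract describes for queries of low selectivity.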
Lecture Notes in Computer Science, 2004
The efficient management of interval sequences represents a core requirement for many temporal and spatial database applications. With the Relational Interval Tree (RI-tree), an efficient access method has been proposed to process intersection queries of spatial objects encoded by interval sequences on top of existing object-relational database systems. This paper complements that approach by effective and efficient models to estimate the selectivity and the I/O cost of interval sequence intersection queries in order to guide the cost-based optimizer whether and how to include the RI-tree into the execution plan. By design, the models immediately fit to common extensible indexing/optimization frameworks, and their implementations exploit the built-in statistics facilities of the database server. According to our experimental evaluation on an Oracle database, the average relative error of the estimated query results and costs lies in the range of 0% to 32%, depending on the size and the structural complexity of the query objects.
Proc. 26th Int. Conf. on Very Large …, 2000
Modern database applications show a growing demand for efficient and dynamic management of intervals, particularly for temporal and spatial data or for constraint handling. Common approaches require the augmentation of index structures which, however, is not supported by existing relational database systems. By design, the new Relational Interval Tree (RI-tree) employs built-in indexes on an as-they-are basis and is easy to implement. Whereas the functionality and efficiency of the RI-tree is supported by any off-the-shelf relational DBMS, it is perfectly encapsulated by the object-relational data model. The RI-tree requires O(n/b) disk blocks of size b to store n intervals, O(log_b n) I/O operations for insertion or deletion, and O(h · log_b n + r/b) I/Os for an intersection query producing r results. The height h of the virtual backbone tree corresponds to the current expansion and granularity of the data space but does not depend on n. As demonstrated by our experimental evaluation on an Oracle8i server, competing dynamic interval access methods are outperformed by factors of up to 42 for disk accesses and 4.9 for query response time.
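The role of the virtual backbone tree can be sketched in memory. The following is an illustrative simplification, not the SQL-based implementation from the paper: it assumes a fixed backbone over the integer range 1 to 2^h - 1, registers each interval at the first backbone node it contains (its "fork node"), and uses a Python dictionary in place of the built-in relational indexes. The class and method names are hypothetical.

```python
from collections import defaultdict


class RITreeSketch:
    """In-memory sketch of fork-node registration and intersection queries."""

    def __init__(self, height):
        self.root = 2 ** (height - 1)       # root of the virtual backbone
        self.max_value = 2 ** height - 1    # data space is [1, max_value]
        self.by_node = defaultdict(list)    # fork node -> [(lower, upper, id)]

    def _fork_node(self, lower, upper):
        # Descend from the root until a backbone node lies inside the interval.
        node, step = self.root, self.root // 2
        while step >= 1 and not (lower <= node <= upper):
            node += -step if upper < node else step
            step //= 2
        return node

    def insert(self, lower, upper, ident):
        if not (1 <= lower <= upper <= self.max_value):
            raise ValueError("interval outside the backbone's data space")
        self.by_node[self._fork_node(lower, upper)].append((lower, upper, ident))

    def _path(self, target):
        # Backbone nodes visited on the way from the root to `target`.
        node, step = self.root, self.root // 2
        path = [node]
        while node != target and step >= 1:
            node += step if node < target else -step
            step //= 2
            path.append(node)
        return path

    def intersect(self, ql, qu):
        hits = []
        # Nodes left of the query: their intervals match iff upper >= ql.
        for node in self._path(ql):
            if node < ql:
                hits += [i for (_, u, i) in self.by_node.get(node, []) if u >= ql]
        # Nodes right of the query: their intervals match iff lower <= qu.
        for node in self._path(qu):
            if node > qu:
                hits += [i for (l, _, i) in self.by_node.get(node, []) if l <= qu]
        # Nodes inside the query range: every registered interval matches.
        for node, rows in self.by_node.items():
            if ql <= node <= qu:
                hits += [i for (_, _, i) in rows]
        return hits
```

Only the two root-to-bound paths plus one range of node values are inspected, so the number of probed backbone nodes grows with the height h rather than with n, in line with the h-dependent term of the query bound quoted above.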
Information Processing Letters, 1984
Rcc, 2001
The advent of the telecommunication era and the constant development of hardware and network structures have encouraged the decentralization of data while increasing the need to access information from different sites. Query optimization strategies aim to minimize the cost of transferring data across networks. Many techniques and algorithms have been proposed to optimize queries. Perhaps one of the more important algorithms is the AHY algorithm using semi-joins, introduced by Apers, Hevner and Yao in [1]. More recently, a technique called PERF (Partially Encoded Record Filters) seems to bring some improvement over semi-joins [12]. PERF joins are two-way semi-joins using a bit vector as their backward phase. Our research encompasses applying PERF joins to two well-known algorithms, AHY and W, which both deal with query optimization. Programs were designed to implement both the original and the enhanced algorithms. Several experiments were conducted, and the results showed a very considerable enhancement obtained by applying the PERF concept. This major improvement led us to further observations and studies.
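A minimal sketch of a two-way semi-join whose backward phase is a bit vector, as described above, might look as follows. It runs both "sites" in one process, and the function name, the dictionary row format, and the choice to ship distinct sorted join values are assumptions for illustration rather than details taken from [12].

```python
def perf_reduce(r_rows, s_rows, key):
    """Reduce R and S before a distributed join (illustrative sketch).

    Forward phase: site R ships its distinct join values in a fixed order.
    Backward phase: site S answers with one bit per shipped value instead of
    shipping the matching values back.
    """
    # Forward phase (R -> S): the order of this list is the shared contract.
    forward = sorted({row[key] for row in r_rows})

    # At site S: reduce S with the received values and build the bit vector.
    forward_set = set(forward)
    s_keys = {row[key] for row in s_rows}
    s_reduced = [row for row in s_rows if row[key] in forward_set]
    bits = [value in s_keys for value in forward]

    # Backward phase (S -> R): only `bits` travels; R decodes it positionally.
    matching = {value for value, bit in zip(forward, bits) if bit}
    r_reduced = [row for row in r_rows if row[key] in matching]
    return r_reduced, s_reduced, bits
```

The backward message costs one bit per forwarded value regardless of how wide the join attribute is, which is where the saving over a conventional two-way semi-join comes from.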
Proceedings of ICECCS '96: 2nd IEEE International Conference on Engineering of Complex Computer Systems (held jointly with 6th CSESAW and 4th IEEE RTAW), 1996
Although various types of path indexes (indexes on path expressions) have been proposed for efficient processing of object-oriented queries, conventional join algorithms do not effectively utilize them. We propose a new join algorithm, called the OID join algorithm, that effectively utilizes (multiple) path indexes in object-oriented databases. When (multiple) path indexes are available for a query, the OID join algorithm may reduce the query evaluation cost significantly by taking full advantage of the path indexes. We present a cost analysis for the OID join algorithm and compare it with those of conventional ones.
2005
With the increasing occurrence of temporal and spatial data in present-day database applications, the interval data type is adopted by more and more database systems. For an efficient support of queries that contain selections on interval attributes as well as simple-valued attributes (e. g. numbers, strings) at the same time, special index structures are required supporting both types of predicates in combination. Based on the Relational Interval Tree, we present various indexing schemes that support such combined queries and can be integrated in relational database systems with minimum effort. Experiments on different query types show superior performance for the new techniques in comparison to competing access methods.
IRJET, 2021
Databases and database management systems have been the backbone of the computing world for many years. The enterprise, web and cloud computing market is growing and is expected to continue to gain prominence in the coming years. With the standardization and consolidation of information technology systems in most enterprises, the demand for highly scalable, reliable and faster relational database systems is on the rise. Databases are crucial for enterprise operations, and database performance is crucial to ensuring that those operations run smoothly. High database performance can be achieved by adopting good database optimization strategies. Indexing is one of the most important strategies for ensuring the optimal performance of relational databases, and indexing strategies are essential for fixing poor database performance. An index is a data structure based on one or more columns of a database table. With faster data retrieval and minimal disk accesses for each query, indexing strategies emerge as a powerful technique for performance optimization of relational databases.
Information Systems, 1984
Indexes are a commonly used structure that provides fast access to the data. Their use implies storage and maintenance costs. This paper presents a technique to reduce index size, based on the elimination of tuple offsets in the classical B+-tree structure. It is shown that this technique gives advantages both in tuple access and in index maintenance.
Lecture Notes in Computer Science, 2001
Intervals represent a fundamental data type for temporal, scientific, and spatial databases where time stamps and point data are extended to time spans and range data, respectively. For OLTP and OLAP applications on large amounts of data, not only intersection queries have to be processed efficiently but also general interval relationships including before, meets, overlaps, starts, finishes, contains, equals, during, startedBy, finishedBy, overlappedBy, metBy, and after. Our new algorithms use the Relational Interval Tree, a purely SQL-based and object-relationally wrapped index structure. The technique therefore preserves the industrial strength of the underlying RDBMS including stability, transactions, and performance. The efficiency of our approach is demonstrated by an experimental evaluation on a real weblog data set containing one million sessions.
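The thirteen interval relationships named above can be made concrete with a small classifier. This is only a sketch of the predicate definitions over pairs of endpoints, assuming proper intervals with start < end; it does not reproduce the index-based algorithms of the paper, and the function name is chosen here for illustration.

```python
def allen_relation(a, b):
    """Return the name of the relationship that interval a has to interval b.

    Intervals are (start, end) pairs with start < end (an assumption of this
    sketch). Exactly one of the thirteen relationships holds for any pair.
    """
    (a_s, a_e), (b_s, b_e) = a, b
    if a_s >= a_e or b_s >= b_e:
        raise ValueError("intervals must satisfy start < end")
    if a_e < b_s:
        return "before"
    if b_e < a_s:
        return "after"
    if a_e == b_s:
        return "meets"
    if b_e == a_s:
        return "metBy"
    if a_s == b_s and a_e == b_e:
        return "equals"
    if a_s == b_s:
        return "starts" if a_e < b_e else "startedBy"
    if a_e == b_e:
        return "finishes" if a_s > b_s else "finishedBy"
    if b_s < a_s and a_e < b_e:
        return "during"
    if a_s < b_s and b_e < a_e:
        return "contains"
    # Only partial overlaps remain; the earlier start decides the direction.
    return "overlaps" if a_s < b_s else "overlappedBy"
```

For example, `allen_relation((1, 4), (3, 6))` returns `"overlaps"`, and swapping the arguments returns `"overlappedBy"`.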
In multimedia databases, spatial index structures based on trees (like the R-tree and the M-tree) have proved to be efficient and scalable for low-dimensional data retrieval. However, if the data dimensionality is too high, the hierarchy of nested regions (represented by the tree nodes) becomes spatially indistinct. Hence, query processing deteriorates to inefficient index traversal (in terms of random-access I/O costs), and in such cases tree-based indexes are less efficient than sequential search. This is mainly due to repeated access to many nodes at the top levels of the tree. In this paper we propose a modified storage layout of tree-based indexes, such that nodes belonging to the same tree level are stored together. Such level-ordered storage allows prefetching several top levels of the tree into the buffer pool with only a few or even a single contiguous I/O operation (i.e. a one-seek read). The experimental results show that our approach can speed up tree-based search significantly.
Keywords: Indexing; R* tree indexing; P+ tree indexing.
1998
We investigate the usability and performance of the UB-Tree (universal B-Tree) for multidimensional data, as they arise in all relational databases and in particular in data-warehousing and data-mining applications. The UB-Tree is balanced and has all the guaranteed performance characteristics of B-Trees, i.e., it requires linear space for storage and logarithmic time for the basic operations of insertion, retrieval
Journal of King Saud University - Computer and Information Sciences, 2010
Enhancing the performance of large database systems depends heavily on the cost of performing join operations. Optimizing the join of two very large tables is an active research topic, especially when both tables are too large to fit in main memory. In such cases, the join is usually performed by a method other than hash join algorithms. In this paper, a novel join algorithm based on the use of quadtrees is introduced. Applying the proposed algorithm to two very large tables that are too large to fit in main memory is shown to be fast and efficient. In the proposed algorithm, both tables are represented by a storage-efficient quadtree that is designed to handle one-dimensional arrays (1-D arrays). The algorithm works on the two 1-D arrays of the two tables to perform join operations. The time and space complexities of the new algorithm are studied. Experimental studies show the efficiency and superiority of this algorithm. The proposed join algorithm requires a minimum number of I/O operations and operates in main memory with O(n log (n/k)) time complexity, where k is the number of key groups with the same first letter, and (n/k) is much smaller than n.
EUROMICRO 97. Proceedings of the 23rd EUROMICRO Conference: New Frontiers of Information Technology (Cat. No.97TB100167), 2000
The paper focuses on indexing on non-primitive (complex) values of attributes in an object management system. A new index structure for indexing on set (multivalued) attributes is proposed. This structure is based on a partial order imposed on the values of the indexed attribute, which are subsets of a set of primitive values. It is shown that the proposed index allows the system to efficiently perform typical set operators that are postulated to be applied in object query languages (is-equal, is-subset, is-superset), without performing any costly operations on lists of object identifiers that would be necessary in traditional index structures. The new index structure, called partial-order tree, is described and algorithms performing the set operators are outlined.
The query optimizer is the component in a relational database system that identifies efficient execution plans for input queries. Modern optimizers generally explore many alternative query plans in a cost-based manner. Specifically, the resource consumption and associated cost of each candidate plan is estimated, and the plan with the least expected cost is chosen for execution. The cost estimation for a plan depends on several factors, including resource availability during execution, the specific operators that compose the plan, and the size of intermediate results that would be generated during the plan execution. Among these factors, the intermediate-result size (or cardinality) estimation is the main source of inaccuracies during optimization: cardinality estimation typically relies on several simplifying assumptions that often do not hold in practice. Optimizers then sometimes base their decisions on inaccurate information and produce low-quality execution plans. To address this limitation, in this thesis we introduce the concept of SITs, which are statistics built on query expressions. SITs directly and accurately model intermediate results in a query execution plan, and therefore avoid error-prone simplifying assumptions during cardinality estimation. If optimizers have appropriate SITs available during optimization, the resulting query plans can be dramatically better than otherwise.
Proceedings 14th International Conference on Data Engineering, 1998
The join query is one of the fundamental operations in Data Base Management Systems (DBMSs). Modern DBMSs should be able to support non-traditional data, including spatial objects, in an efficient manner. Towards this goal, spatial data structures can be adopted in order to support the execution of join queries on sets of multidimensional data. This paper introduces analytical models that estimate the cost (in terms of node or disk accesses) of join queries involving two multidimensional indexed data sets using R-tree-based structures. In addition, experimental results are presented, which show the accuracy of the analytical estimations when compared to actual runs on both synthetic and real data sets. It turns out that the relative error rarely exceeds 15% for all combinations, a fact that makes the proposed cost models useful tools for efficient spatial query optimization.