2005, Distributed and Parallel Databases
Multidimensional indexing is concerned with the indexing of multi-attributed records, where queries can be applied on some or all of the attributes. Indexing multi-attributed records is referred to as multidimensional indexing because each record is viewed as a point in a multidimensional space whose number of dimensions equals the number of attributes. The values of the point coordinates along each dimension are equivalent to the values of the corresponding attributes. In this paper, the PN-tree, a new index structure for multidimensional spaces, is presented. This index structure is an efficient structure for indexing multidimensional points and is parallel by nature. Moreover, the proposed index structure does not lose its efficiency if it is processed serially or with a small number of processors. The PN-tree can take advantage of as many processors as the dimensionality of the space. The PN-tree makes use of B+-trees, which have been developed and tested over the years in many DBMSs. The PN-tree is compared to the Hybrid tree, which is known for its superiority among various index structures. Experimental results show that parallel processing of the PN-tree significantly reduces the number of disk accesses involved in the search operation. Even in its serial case, the PN-tree outperforms the Hybrid tree for large database sizes.
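The point view described above can be illustrated with a minimal sketch: each record is a tuple of attribute values, and a range query constrains some or all of the dimensions. The sketch keeps one sorted list per dimension as a stand-in for a B+-tree and intersects the per-dimension candidates. The function names (`build_indexes`, `range_query`) and the intersection strategy are assumptions made for illustration; this is not the PN-tree algorithm from the paper.

```python
from bisect import bisect_left, bisect_right


def build_indexes(points):
    """Build one sorted (value, record_id) list per dimension.

    Each list is only a stand-in for a B+-tree over that attribute.
    """
    if not points:
        return []
    dims = len(points[0])
    return [sorted((p[d], rid) for rid, p in enumerate(points)) for d in range(dims)]


def range_query(indexes, ranges):
    """Return sorted record ids matching every constrained dimension.

    `ranges` maps dimension -> (low, high), both inclusive.
    Dimensions that are not listed are left unconstrained.
    """
    result = None
    for d, (low, high) in ranges.items():
        index = indexes[d]
        # Record ids are >= 0, so (low, -1) sorts before every entry equal to low.
        start = bisect_left(index, (low, -1))
        # float("inf") sorts after every record id, so entries equal to high are kept.
        end = bisect_right(index, (high, float("inf")))
        ids = {rid for _, rid in index[start:end]}
        result = ids if result is None else result & ids
    if result is None:
        # No constraint at all: every record matches.
        result = {rid for _, rid in indexes[0]} if indexes else set()
    return sorted(result)
```

In this sketch each per-dimension lookup is independent of the others, so the lookups could in principle be handed to separate workers before the final intersection.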
2001
Feature-vector-based similarity search is becoming increasingly common in database systems. Consequently, many multidimensional index techniques have been introduced to the database research community. These index techniques are categorized into two main classes: SP (space partitioning)/KD-tree-based and DP (data partitioning)/R-tree-based. Recently, a hybrid index structure has been proposed. It combines both SP/KD-tree-based and DP/R-tree-based techniques to form a new, more efficient index structure. However, weaknesses still exist in the techniques above. In this paper, we introduce a novel and flexible index structure for multidimensional data, the SH-tree (Super Hybrid tree). Theoretical analyses show that the SH-tree is a good combination of both techniques with respect to both presentation and search algorithms. It overcomes their shortcomings and makes use of their positive aspects to facilitate efficient similarity searches.
Proceedings of the Sixth Euromicro Workshop on Parallel and Distributed Processing - PDP '98 -, 1998
A wide class of multidimensional indexes employs a recursive partitioning of the data space, as the kd-tree does. In this paper we present the m-Q-tree as a multidimensional data structure that can achieve the maximum degree of 2^m children in every node (where m is the number of index attributes) and a maximum of only one underflow page per node. We describe the m-Q-tree, and give searching and inserting algorithms. In order to develop a solution for building the m-Q-tree, we define and use a conceptual tool, called the prefix graph, which permits us to manage the regions associated with all sons of every node. The proposed algorithm is of order Ofin). Finally, we present the results of a series of tests which indicate that the structure performs well. The m-Q-tree gives a general technique for declustering data in a parallel database. We propose the m-Q-tree as a new general access method which permits the exploitation of the potential parallelism of all relational operations, in addition to favouring the execution of complex queries, including different kinds of conditions over several attributes for one or more relations.
2001
Only a few multidimensional access methods have made their way into commercial relational DBMSs. Even if an RDBMS ships with a multidimensional index, that index is usually an add-on like Oracle SDO, which is not integrated into the SQL interpreter, query processor and query optimizer of the DBMS kernel. Our demonstration shows TransBase HyperCube, a commercial RDBMS whose kernel fully integrates the UB-Tree, a multidimensional extension of the B-Tree. This integration was performed in an ESPRIT project funded by the European Commission. We put the main emphasis of our demonstration on the application of UB-Tree indexes in real-world databases for OLAP. However, we also address general issues of UB-Trees such as creation, space requirements, and comparison to other indexing methods.
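The UB-Tree extends the B-Tree to several dimensions by mapping each point to a one-dimensional Z-order (bit-interleaved) address and indexing that address. The following minimal sketch shows only the bit interleaving; the function name `z_address`, the fixed bit width, and the choice of which dimension contributes the leading bit are assumptions for illustration, not TransBase's implementation.

```python
def z_address(coords, bits):
    """Interleave the low `bits` bits of each non-negative integer coordinate.

    Bits are taken from the most significant position down; within one
    position, dimensions contribute in the order given by `coords`.
    """
    address = 0
    for bit in range(bits - 1, -1, -1):
        for c in coords:
            address = (address << 1) | ((c >> bit) & 1)
    return address
```

Sorting points by `z_address` yields a single key order that an ordinary one-dimensional B-Tree could store.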
1997
We propose a new multi-attribute index. Our approach combines the hB-tree, a multi-attribute index, and the Π-tree, an abstract index which offers efficient concurrency and recovery methods. We call the resulting method the hBΠ-tree. We describe several versions of the hBΠ-tree, each using a different node-splitting and index-term-posting algorithm. We also describe a new node deletion algorithm. We have implemented all the versions of the hBΠ-tree. Our performance results show that even the version that offers no performance guarantees actually performs very well in terms of storage utilization, index size (fan-out), exact-match and range searching, under various data types and distributions. We have also shown that our index is fairly insensitive to increases in dimension. Thus, it is suitable for indexing high-dimensional applications. This property and the fact that all our versions of the hBΠ-tree can use the Π-tree concurrency and recovery algorithms make the hBΠ-tree a promising candidate for inclusion in a general-purpose DBMS.
1998
We investigate the usability and performance of the UB-Tree (universal B-Tree) for multidimensional data, as they arise in all relational databases and in particular in data-warehousing and data-mining applications. The UB-Tree is balanced and has all the guaranteed performance characteristics of B-Trees, i.e., it requires linear space for storage and logarithmic time for the basic operations of insertion, retrieval
2003
Similarity searches in multidimensional Nonordered Discrete Data Spaces (NDDS) are becoming increasingly important for application areas such as genome sequence databases. Existing indexing methods developed for multidimensional (ordered) Continuous Data Spaces (CDS) such as R-tree cannot be directly applied to an NDDS. This is because some essential geometric concepts/properties such as the minimum bounding region and the area of a region in a CDS are no longer valid in an NDDS. On the other hand, indexing methods based on metric spaces such as M-tree are too general to effectively utilize the data distribution characteristics in an NDDS. Therefore, their retrieval performance is not optimized. To support efficient similarity searches in an NDDS, we propose a new dynamic indexing technique, called the ND-tree. The key idea is to extend the relevant geometric concepts as well as some indexing strategies used in CDSs to NDDSs. Efficient algorithms for ND-tree construction are presented. Our experimental results on synthetic and genomic sequence data demonstrate that the performance of the ND-tree is significantly better than that of the linear scan and M-tree in high dimensional NDDSs.
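One way to picture how a bounding region could carry over to a non-ordered discrete space is to bound a group of vectors by the set of letters that occurs in each dimension, and to measure the region by the product of those set sizes. The sketch below follows that reading with Hamming distance as the similarity measure. The function names and these exact definitions are assumptions for illustration and may differ from the paper's formal definitions.

```python
def discrete_mbr(vectors):
    """Per-dimension set of letters used by a group of equal-length vectors."""
    return [set(column) for column in zip(*vectors)]


def area(mbr):
    """Size of the discrete region: product of the per-dimension set sizes."""
    size = 1
    for letters in mbr:
        size *= len(letters)
    return size


def min_hamming_distance(query, mbr):
    """Lower bound on the Hamming distance from `query` to any vector in the region.

    A dimension contributes 1 only if the query's letter is absent from that
    dimension's set, so a subtree can be pruned when this bound exceeds the
    search radius.
    """
    return sum(1 for q, letters in zip(query, mbr) if q not in letters)
```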
Proceedings of SPDP '96: 8th IEEE Symposium on Parallel and Distributed Processing, 1996
Indexing multidimensional data is inherently complex, leading to slow query processing. This behavior becomes more pronounced with the increase in database size and/or number of dimensions. In this paper, we address this issue by processing an index structure in parallel. First, we study different ways of partitioning an index structure. We then propose efficient algorithms for processing each query in parallel on the index structure. Using these strategies, we parallelized two multidimensional index structures, R* and LIB, and evaluated the performance gains for the Gazetteer and the Catalog data of the Alexandria Digital Library on the Meiko CS-2.
2015
Information retrieval from Big Data requires more efficient techniques for data indexing. In this paper, a Dynamic Order Multi-field Index (DOMI) structure is introduced and implemented. The proposed DOMI indexing structure allows dynamic rather than sequential ordering of fields and, in addition, compacts values with common prefixes. Hence, DOMI allows efficient indexing of huge data sets and answers queries that involve multiple fields as well as queries that involve a single field. A comparative study has been carried out by building a composite-field index with the proposed DOMI and with other popular indexing data structures such as B and B+ trees. The comparison results show that DOMI composite indexing outperforms other composite indexing structures when answering a single-field query that addresses a non-leading field. The performance of DOMI is slightly better than that of B+ when answering a composite-field query. Therefore, DOMI...
We propose the PATRICIA-hypercube-tree, or PH-tree, a multi-dimensional data storage and indexing structure. It is based on binary PATRICIA-tries combined with hypercubes for efficient data access. Space efficiency is achieved by combining prefix sharing with a space-optimised implementation. This leads to storage space requirements that are comparable to or below those of storing the same data in non-index structures such as arrays of objects. The storage structure also serves as a multi-dimensional index on all dimensions of the stored data. This enables efficient access to stored data via point and range queries. We explain the concept of the PH-tree, demonstrate the performance of a sample implementation on various datasets, and compare it to other spatial indices such as the kD-tree. The experiments show that for larger datasets beyond 10^7 entries, the PH-tree increasingly and consistently outperforms other structures in terms of space efficiency, query performance and update performance. For some highly skewed datasets, it even shows super-constant performance, becoming faster for larger datasets.
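The combination of prefix sharing and hypercubes can be sketched as follows: keys that agree on their leading bits in every dimension share a prefix, and at the first differing bit position each key selects a slot in a 2^k-entry hypercube by taking one bit from each of its k dimensions. The function names and the fixed-width non-negative integer keys are assumptions for illustration; this is not the authors' implementation.

```python
def hypercube_address(key, bit_pos):
    """Slot of `key` in a 2^k hypercube: one bit per dimension at `bit_pos`."""
    addr = 0
    for value in key:
        addr = (addr << 1) | ((value >> bit_pos) & 1)
    return addr


def shared_prefix_bits(a, b, width):
    """Number of leading bit positions on which keys `a` and `b` agree in all dimensions."""
    count = 0
    for bit in range(width - 1, -1, -1):
        # Stop at the first position where any dimension differs.
        if any(((x ^ y) >> bit) & 1 for x, y in zip(a, b)):
            break
        count += 1
    return count
```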
Information Systems, 1982
A new method for multiple attribute indexing, the Multidimensional B-Tree (MDBT), is developed. This method is well suited for dynamic databases, since it handles several types of associative queries efficiently and requires low-cost maintenance. Algorithms and search strategies for exact match, partial match, and range queries are presented, and statistical procedures are given to estimate the average and worst-case retrieval times. The applicability of our organization to practical databases is discussed, and analytical tradeoffs with regard to index organizations based on k-d trees are established.
In multimedia databases, tree-based spatial index structures (like the R-tree and M-tree) have proven to be efficient and scalable for low-dimensional data retrieval. However, if the data dimensionality is too high, the hierarchy of nested regions (represented by the tree nodes) becomes spatially indistinct. Hence, query processing deteriorates to inefficient index traversal (in terms of random-access I/O costs), and in such cases the tree-based indexes are less efficient than sequential search. This is mainly due to repeated access to many nodes at the top levels of the tree. In this paper we propose a modified storage layout of tree-based indexes, such that nodes belonging to the same tree level are stored together. Such level-ordered storage allows prefetching several top levels of the tree into the buffer pool with only a few, or even a single, contiguous I/O operation (i.e. a one-seek read). The experimental results show that our approach can speed up tree-based search significantly.
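The level-ordered layout can be sketched as a breadth-first page assignment: every node of one level receives consecutive page numbers, so the top levels occupy a contiguous prefix of the file. The function name `level_order_layout`, the dictionary-based tree, and the one-node-per-page model are assumptions made for illustration, not the paper's storage format.

```python
from collections import deque


def level_order_layout(root, children):
    """Assign page numbers level by level (breadth-first).

    `children` maps a node to the list of its child nodes.
    Returns (pages, level_ends), where pages[node] is the node's page number
    and level_ends[i] is the number of pages used by levels 0..i.
    """
    pages = {}
    level_ends = []
    queue = deque([root])
    while queue:
        # Process exactly the nodes of the current level.
        for _ in range(len(queue)):
            node = queue.popleft()
            pages[node] = len(pages)
            queue.extend(children.get(node, []))
        level_ends.append(len(pages))
    return pages, level_ends
```

Under this model, prefetching the top k levels amounts to one contiguous read of pages 0 through `level_ends[k - 1] - 1`.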
2005
In this paper, we propose an efficient access method, named MK-tree, to dynamically index large data sets in high-dimensional spaces. It is an extension of the M-tree with a key dimension to improve the efficiency of space partitioning and reduce the response time of similarity search for high-dimensional data. The main idea behind the key dimension is to make the fanout of the tree larger by partitioning a subspace further into two subspaces, called a twin-node, according to the key dimension. To achieve high space utilization, we dynamically reallocate data within a twin-node, thereby further improving the performance of the MK-tree. Our experimental results show that a higher filtering efficiency can be obtained by using the concept of key dimension for both R-neighbor search and K-nearest neighbor search.
Sovremennye Informacionnye Tehnologii i IT-obrazovanie, 2018
We present a new dynamic index structure for multidimensional data. The index structure is based on an extended grid file concept. Strengths and weaknesses of grid files were analyzed. Based on that analysis, we propose to strengthen the concept of grid files by considering their stripes as linear hash tables, introducing the concept of a chunk, and representing the grid file structure as a graph. As a result, we significantly reduced the number of disk operations. Efficient algorithms for storage and access of the index directory are proposed in order to minimize memory usage and the complexity of lookup operations. Complexity estimates for these algorithms are presented. A comparison of our approach to supporting an effective grid file structure with other known approaches is presented. This comparison shows the effectiveness of the suggested metadata storage environment. An estimation of directory size is presented. A prototype supporting our grid file concept has been created and...
As a member of the R-tree family, the R*-tree is widely used in multimedia databases and spatial databases, in which NN (Nearest Neighbor) search is very popular. According to our investigations, (1) the degree of object clustering in the leaf nodes is a very important factor in the performance of NN search; (2) normally, in the R*-tree, objects are not well clustered in their leaf nodes. This paper proposes a new index structure, called the Clustering-Based R*-tree (denoted CBR*-tree), for static databases by introducing clustering technology to the R*-tree. Although some packing algorithms for R-trees have been proposed, all of them try to pack the same (or roughly the same) number of objects in each leaf node, with the result that the distribution of objects in leaf nodes often cannot reflect their actual distribution. The experimental results show that the CBR*-tree has better NN search performance than the R*-tree and packed R-trees.
2001
Emerging database applications require the use of new indexing structures beyond B-trees and R-trees. Examples are the k-D tree, the trie, the quadtree, and their variants. They are often proposed as supporting structures in data mining, GIS, and CAD/CAM applications. A common feature of all these indexes is that they recursively divide the space into partitions. A new extensible index structure, termed SP-GiST, is presented that supports this class of data structures, mainly the class of space partitioning unbalanced trees. Simple method implementations are provided that demonstrate how SP-GiST can behave as a k-D tree, a trie, a quadtree, or any of their variants. Issues related to clustering tree nodes into pages as well as concurrency control for SP-GiST are addressed. A dynamic minimum-height clustering technique is applied to minimize disk accesses and to make using such trees in database systems possible and efficient. A prototype implementation of SP-GiST is presented as well as performance studies of the various SP-GiST's tuning parameters.
Proc. of the 8th Int'l Conf. on Database Systems …, 2003
2015
Nowadays most scientific and business applications require very large datasets for storage and manipulation, and high dimensionality is needed to achieve high accuracy. The high dimensionality and enormous size of such datasets pose very challenging problems in management, analysis, and retrieval. Dataset sizes can exceed petabytes, and dimensionality ranges from ten to several thousand. Most of the existing indexing structures are inadequate for accessing very large datasets in high-dimensionality applications. This motivates the design of a new tool to access very large, high-dimensional datasets effectively and efficiently. The main aim of this paper is to develop a new dynamic indexing structure to support very large datasets and high dimensionality. This new structure is tree based, to facilitate efficient access. It is highly adaptable to any type of application. The newly developed structure is based on neare...
GeoInformatica, 2012
This work introduces decentralized query processing techniques based on MIDAS, a novel distributed multidimensional index. In particular, MIDAS implements a distributed k-d tree, where leaves correspond to peers, and internal nodes dictate message routing. MIDAS requires that peers maintain little network information, and features mechanisms that support fault tolerance and load balancing. The proposed algorithms process point and range queries over the multidimensional indexed space in only O(log n) hops in expectation, where n is the network size. For nearest neighbor queries, two processing alternatives are discussed. The first, termed eager processing, has low latency (expected value of O(log n) hops) but may involve a large number of peers. The second, termed iterative processing, has higher latency (expected value of O(log^2 n) hops) but involves far fewer peers. A detailed experimental evaluation demonstrates that our query processing techniques outperform existing methods for settings involving real spatial data as well as in the case of high-dimensional synthetic data.
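The idea of a k-d tree whose leaves are peers can be pictured with a small, centralised sketch: internal nodes hold a split dimension and value, leaves name the peer responsible for that region, and a point query descends to the responsible peer. The class names, the peer labels, and the root-to-leaf descent are assumptions for illustration. MIDAS itself is decentralised and routes between peers, which this sketch does not model.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    peer: str  # hypothetical identifier of the peer owning this region


@dataclass
class Split:
    dim: int        # dimension tested at this internal node
    value: float    # points with coordinate < value go left
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def responsible_peer(node, point):
    """Descend from `node` to the leaf whose region contains `point`.

    Returns (peer, depth), where depth counts the internal nodes visited.
    """
    depth = 0
    while isinstance(node, Split):
        node = node.left if point[node.dim] < node.value else node.right
        depth += 1
    return node.peer, depth
```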