As a member of the R-tree family, the R*-tree is widely used in multimedia and spatial databases, where NN (Nearest Neighbor) search is very common. According to our investigations, (1) the degree to which objects are clustered in the leaf nodes is a very important factor in NN search performance; and (2) in an R*-tree, objects are normally not well clustered in the leaf nodes. This paper proposes a new index structure for static databases, the Clustering-Based R*-tree (denoted CBR*-tree), which introduces clustering technology into the R*-tree. Although some packing algorithms for R-trees have been proposed, all of them try to pack the same (or roughly the same) number of objects into each leaf node, so the distribution of objects across leaf nodes often fails to reflect their actual distribution. The experimental results show that the CBR*-tree has better NN search performance than the R*-tree and packed R-trees.
Keywords: Indexing; R* tree indexing; P+ tree indexing.
In multimedia databases, tree-based spatial index structures (such as the R-tree and M-tree) have proved efficient and scalable for low-dimensional data retrieval. However, if the data dimensionality is too high, the hierarchy of nested regions (represented by the tree nodes) becomes spatially indistinct. Query processing then deteriorates into inefficient index traversal (in terms of random-access I/O costs), and in such cases tree-based indexes are less efficient than a sequential search. This is mainly due to repeated access to many nodes at the top levels of the tree. In this paper we propose a modified storage layout for tree-based indexes, in which nodes belonging to the same tree level are stored together. Such level-ordered storage allows several top levels of the tree to be prefetched into the buffer pool with only a few contiguous I/O operations, or even a single one (i.e. a one-seek read). The experimental results show that our approach can speed up tree-based search significantly.
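The level-ordered layout described above can be illustrated with a minimal in-memory sketch. It is not the paper's implementation: the `Node` class, the function names, and the idea of treating a Python list as the "disk file" are assumptions made for illustration only. The point it shows is that once nodes are written in breadth-first order, the top k levels form a contiguous prefix that one sequential read can fetch.

```python
class Node:
    """Hypothetical tree node; a real index node would hold keys or regions."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def level_order_layout(root):
    """Return (pages, level_offsets).

    pages holds the nodes in breadth-first order, so each tree level is
    stored contiguously; level_offsets[i] is the index of the first node
    of level i within pages.
    """
    pages, level_offsets = [], []
    frontier = [root]
    while frontier:
        level_offsets.append(len(pages))
        pages.extend(frontier)
        frontier = [child for node in frontier for child in node.children]
    return pages, level_offsets


def prefetch_top_levels(pages, level_offsets, k):
    """Fetch the top k levels as one contiguous slice (a single 'read')."""
    end = level_offsets[k] if k < len(level_offsets) else len(pages)
    return pages[:end]


# Example: a three-level tree with seven nodes.
root = Node("r", [Node("a", [Node("a1"), Node("a2")]),
                  Node("b", [Node("b1"), Node("b2")])])
pages, offsets = level_order_layout(root)
top_two = [n.name for n in prefetch_top_levels(pages, offsets, 2)]
```

In an on-disk setting the slice would correspond to one contiguous byte range, which is what makes the one-seek prefetch possible.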
Spatial databases are optimized for managing data stored according to its geometric space. Researchers have proposed several highly scalable spatial indexing structures to this end. Among these indexing structures is the X-tree. The existing X-tree and its variants are designed for dynamic environments and can handle insertions and deletions. However, the retrieval performance of the X-tree degrades as dimensionality increases, and its worst-case performance becomes poorer than that of a sequential scan. We propose a new X-tree packing technique for static spatial databases that achieves better space utilization through cautious packing. The improved structure yields two basic advantages: it reduces the space overhead of the index, and it produces a better response time, because the aX-tree has a higher fan-out and so the tree always ends up shorter. A new model for super-node construction and an effective method for optimal packing using an improved STR bulk-loading technique are proposed. The study reveals that the proposed system performs better than many existing spatial indexing structures.
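For readers unfamiliar with STR (Sort-Tile-Recursive) bulk loading, the sketch below shows the standard leaf-packing step for 2-D points. It is the textbook STR procedure, not the improved variant the abstract proposes, and the function name and point representation are assumptions made for illustration.

```python
import math


def str_pack_leaves(points, capacity):
    """Pack 2-D points into leaves of at most `capacity` entries using STR.

    Steps: sort by x, cut into vertical slices, sort each slice by y,
    then fill leaves with consecutive runs of `capacity` points.
    """
    n = len(points)
    if n == 0:
        return []
    leaf_count = math.ceil(n / capacity)
    slice_count = math.ceil(math.sqrt(leaf_count))  # number of vertical slices
    slice_size = slice_count * capacity             # points per slice
    by_x = sorted(points, key=lambda p: p[0])
    leaves = []
    for i in range(0, n, slice_size):
        strip = sorted(by_x[i:i + slice_size], key=lambda p: p[1])
        for j in range(0, len(strip), capacity):
            leaves.append(strip[j:j + capacity])
    return leaves


# Example: a 4 x 4 grid of points packed into leaves of four.
grid = [(x, y) for x in range(4) for y in range(4)]
leaves = str_pack_leaves(grid, capacity=4)
```

Because every leaf except possibly the last in each slice is filled to capacity, the resulting tree has a high fan-out, which is the space-utilization benefit that packing aims for.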
2005
In this paper, we propose an efficient access method, named the MK-tree, to dynamically index large data sets in high-dimensional spaces. It is an extension of the M-tree with a key dimension, which improves the efficiency of space partitioning and reduces the response time of similarity search for high-dimensional data. The main idea behind the key dimension is to make the fanout of the tree larger by further partitioning a subspace into two subspaces, called a twin-node, according to the key dimension. To achieve high space utilization, we dynamically reallocate data within a twin-node, thereby further improving the performance of the MK-tree. Our experimental results show that a higher filtering efficiency can be obtained by using the concept of the key dimension for both R-neighbor search and K-nearest neighbor search.
Proc. of the 8th Int'l Conf. on Database Systems …, 2003
International Journal of Future Computer and Communication, 2013
We propose the PATRICIA-hypercube-tree, or PH-tree, a multi-dimensional data storage and indexing structure. It is based on binary PATRICIA-tries combined with hypercubes for efficient data access. Space efficiency is achieved by combining prefix sharing with a space-optimised implementation. This leads to storage space requirements that are comparable to or below those of storing the same data in non-index structures such as arrays of objects. The storage structure also serves as a multi-dimensional index on all dimensions of the stored data. This enables efficient access to stored data via point and range queries. We explain the concept of the PH-tree, demonstrate the performance of a sample implementation on various datasets, and compare it to other spatial indices such as the kD-tree. The experiments show that for larger datasets beyond 10^7 entries, the PH-tree increasingly and consistently outperforms other structures in terms of space efficiency, query performance and update performance. For some highly skewed datasets, it even shows super-constant performance, becoming faster for larger datasets.
2015
Nowadays most scientific and business applications require very large datasets for storage and manipulation, and high dimensionality is needed to achieve high accuracy. The high dimensionality and enormous size of such datasets pose very challenging problems in their management, analysis, and retrieval. These datasets can exceed petabytes in size, and their dimensionality ranges from ten to several thousand. Most existing indexing structures are inadequate for accessing very large, high-dimensional datasets. This motivates the design of a new tool that accesses very large, high-dimensional datasets effectively and efficiently. The main aim of this paper is to develop a new dynamic indexing structure to support very large datasets and high dimensionality. The new structure is tree-based to facilitate efficient access. It is highly adaptable to any type of application. The newly developed structure is based on neare...
2005
We present the interpolation search tree (ISB-tree), a new cache-aware indexing scheme that supports update operations (insertions and deletions) in O(1) worst-case (w.c.) block transfers and search operations in O(log_B log n) expected block transfers, where B represents the disk block size and n denotes the number of stored elements. The expected search bound holds with high probability for a large class of (unknown) input distributions. The w.c. search bound of our indexing scheme is O(log_B n) block transfers. Our update and expected search bounds constitute a considerable improvement over the O(log_B n) w.c. block transfer bounds for search and update operations achieved by the B-tree and its numerous variants. This is also suggested by a set of preliminary experiments we have carried out. Our indexing scheme is based on an externalization of a main memory data structure based on interpolation search.
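As background for the abstract above, the following is a minimal sketch of classic in-memory interpolation search over a sorted list of integers. It is not the ISB-tree itself, which externalizes a more elaborate structure; the function name is an assumption made for illustration. Instead of probing the midpoint as binary search does, it estimates the probe position from the key's value, which is what yields the doubly logarithmic expected cost on uniformly distributed keys.

```python
def interpolation_search(keys, target):
    """Return the index of `target` in the sorted integer list `keys`, or -1."""
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[hi] == keys[lo]:
            # All remaining keys are equal; the loop condition implies a match.
            return lo if keys[lo] == target else -1
        # Estimate the position by linear interpolation between the endpoints.
        pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


# Example: evenly spaced keys 0, 5, 10, ..., 95.
keys = list(range(0, 100, 5))
found = interpolation_search(keys, 35)
missing = interpolation_search(keys, 36)
```

On evenly spaced keys such as these, the first probe already lands on the target position.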
Distributed and Parallel Databases, 2005
Multidimensional indexing is concerned with the indexing of multi-attributed records, where queries can be applied on some or all of the attributes. Indexing multi-attributed records is referred to by the term multidimensional indexing because each record is viewed as a point in a multidimensional space with a number of dimensions that is equal to the number of attributes. The values of the point coordinates along each dimension are equivalent to the values of the corresponding attributes. In this paper, the PN-tree, a new index structure for multidimensional spaces, is presented. This index structure is an efficient structure for indexing multidimensional points and is parallel by nature. Moreover, the proposed index structure does not lose its efficiency if it is serially processed or if it is processed using a small number of processors. The PN-tree can take advantage of as many processors as the dimensionality of the space. The PN-tree makes use of B+-trees that have been developed and tested over years in many DBMSs. The PN-tree is compared to the Hybrid tree that is known for its superiority among various index structures. Experimental results show that parallel processing of the PN-tree reduces significantly the number of disk accesses involved in the search operation. Even in its serial case, the PN-tree outperforms the Hybrid tree for large database sizes.
2007
Over the last two decades, much research effort has been spent on nearest neighbor search in high-dimensional data sets. Most of the approaches published thus far have, however, only been tested on rather small collections. When large collections have been considered, high-performance environments have been used, in particular systems with a large main memory. Accessing data on disk has largely been avoided because disk operations are considered to be too slow. It has been shown, however, that using large amounts of memory is generally not an economic choice. Therefore, we propose the NV-tree, which is a very efficient disk-based data structure that can give good approximate answers to nearest neighbor queries with a single disk operation, even for very large collections of high-dimensional data. Using a single NV-tree, the returned results have high recall but contain a number of false positives. By combining two or three NV-trees, most of those false positives can be avoided while retaining the high recall. Finally, we compare the NV-tree to Locality Sensitive Hashing, a popular method for ε-distance search. We show that they return results of similar quality, but the NV-tree uses many fewer disk reads.
This paper discusses indexing techniques for spatial databases. Several methods of indexing moving objects are presented, taking into account the strengths and weaknesses of each, and the two main types of applications that manage moving objects are discussed. The "2-level index" is a recent indexing variant based on the R-tree. We present this variant and situate it among these types of applications, explain its main performance characteristics, show its main defect, which is the redundancy of nodes after each update, and finally present our contribution to optimizing it.
2001
Nowadays, feature-vector-based similarity search is increasingly emerging in database systems. Consequently, many multidimensional data index techniques have been introduced to the database research community. These index techniques fall into two main classes: SP (space partitioning)/KD-tree-based and DP (data partitioning)/R-tree-based. Recently, a hybrid index structure has been proposed. It combines both SP/KD-tree-based and DP/R-tree-based techniques to form a new, more efficient index structure. However, weaknesses still exist in the techniques above. In this paper, we introduce a novel and flexible index structure for multidimensional data, the SH-tree (Super Hybrid tree). Theoretical analyses show that the SH-tree is a good combination of both techniques with respect to both presentation and search algorithms. It overcomes their shortcomings and makes use of their positive aspects to facilitate efficient similarity searches.
International Journal of Database Management Systems
Tracking moving objects has become essential in daily life and has many uses, such as GPS guidance, traffic monitoring, and location-based services. Tracking the changing positions of objects has therefore become an important issue. The moving entities send their positions to a server, and a large amount of data is generated by these objects with highly frequent updates, so an index structure is needed to retrieve information as fast as possible. The index structure should be adaptive and dynamic so that it can monitor object locations, and quick to answer queries efficiently. The best-known kinds of queries in moving-object databases are range, point, and K-nearest-neighbour queries. This study uses the R-tree to obtain detailed range query results efficiently. However, using the R-tree alone generates much overlap and coverage between MBRs, so the R-tree is combined with a grid-partition index, because a grid index can reduce the overlap and coverage between MBRs. Query performance is more efficient with these methods. We perform an extensive experimental study to compare the two approaches on modern hardware.
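The grid-partition side of such a design can be sketched in a few lines. The class below is a hypothetical uniform grid for moving points, not the combined R-tree and grid structure the abstract evaluates; the class name, method names, and cell size are assumptions made for illustration. It shows why a grid suits frequent position updates: an update touches at most two cells, and cells never overlap.

```python
from collections import defaultdict


class GridIndex:
    """Uniform grid over 2-D points, keyed by object id."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(dict)  # (cx, cy) -> {obj_id: (x, y)}
        self.where = {}                 # obj_id -> cell currently holding it

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, obj_id, x, y):
        """Insert an object or move it to a new position."""
        old = self.where.get(obj_id)
        new = self._cell(x, y)
        if old is not None and old != new:
            del self.cells[old][obj_id]
        self.cells[new][obj_id] = (x, y)
        self.where[obj_id] = new

    def range_query(self, xmin, ymin, xmax, ymax):
        """Return sorted ids of objects inside the closed query rectangle."""
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for obj_id, (x, y) in self.cells.get((cx, cy), {}).items():
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(obj_id)
        return sorted(hits)


# Example: three objects, one of which then moves out of the query window.
index = GridIndex(cell_size=10)
index.update("a", 1, 1)
index.update("b", 12, 3)
index.update("c", 25, 25)
before = index.range_query(0, 0, 15, 5)
index.update("a", 30, 30)
after = index.range_query(0, 0, 15, 5)
```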
The main aim of this paper is to develop a new dynamic indexing structure to support very large datasets and high dimensionality. The new structure is tree-based to facilitate efficient access and is highly adaptable to any type of application. It is based on the nearest-neighbors method, without linearly scanning the very large datasets. The NewTree minimizes the adverse effect of the curse of dimensionality, that is, the rapid degradation of most existing indexing techniques as dimensionality grows. The major drawback here is the retrieval of subsets from the huge storage system. The NewTree structure handles the addition of new data efficiently and effectively: when new data are added, the shape of the structure does not change. The performance of the new structure is evaluated against the SR-tree, an existing indexing structure. The results clearly show that the new structure is superior to the SR-tree in both time complexity and memory complexity.
2010
Similarity search in high-dimensional metric spaces is a key operation in many applications, such as multimedia databases, image retrieval, object recognition, and others. The high dimensionality of the data requires special index structures to facilitate the search. A problem regarding the creation of suitable index structures for high-dimensional data is the relationship between the geometry of the data and the organization of an index structure. In this paper, we study the performance of a new index structure, called the Divisive-Agglomerative Hierarchical Clustering tree (DAHC-tree), which reduces the effects imposed by the above liability. The DAHC-tree is constructed by dividing and grouping the data set into compact clusters. We perform a rigorous experimental design and analyze the trade-offs involved in building such an index structure. Additionally, we present extensive experiments comparing our method against state-of-the-art exact and approximate solutions. The conducted analysis and the reported comparative test results demonstrate that our technique significantly improves the performance of similarity queries.
In order to handle spatial data efficiently, as required in computer aided design and geo-data applications, a database system needs an index mechanism that will help it retrieve data items quickly according to their spatial locations. However, traditional indexing methods are not well suited to data objects of non-zero size located in multidimensional spaces. In this paper we describe a dynamic index structure called an R-tree which meets this need, and give algorithms for searching and updating it. We present the results of a series of tests which indicate that the structure performs well, and conclude that it is useful for current database systems in spatial applications.
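The search side of an R-tree can be sketched as a recursive descent over nested bounding rectangles. The code below is a minimal illustration under stated assumptions: the `RNode` class, the tuple layout `(xmin, ymin, xmax, ymax)`, and the hand-built example tree are all hypothetical, and insertion, node splitting, and deletion are omitted entirely.

```python
def intersects(a, b):
    """True if rectangles a and b, given as (xmin, ymin, xmax, ymax), overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


class RNode:
    """Hypothetical R-tree node: a leaf has entries, an inner node has children."""
    def __init__(self, mbr, children=None, entries=None):
        self.mbr = mbr            # minimum bounding rectangle of the subtree
        self.children = children  # list of RNode, or None for a leaf
        self.entries = entries    # list of (rect, obj), or None for an inner node


def search(node, query):
    """Return every object whose rectangle intersects `query`."""
    if not intersects(node.mbr, query):
        return []                 # prune the whole subtree
    if node.entries is not None:  # leaf: test each stored rectangle
        return [obj for rect, obj in node.entries if intersects(rect, query)]
    results = []
    for child in node.children:   # inner node: descend into each child
        results.extend(search(child, query))
    return results


# Example: a hand-built two-level tree with three objects.
leaf1 = RNode((0, 0, 3, 3), entries=[((0, 0, 1, 1), "a"), ((2, 2, 3, 3), "b")])
leaf2 = RNode((10, 10, 11, 11), entries=[((10, 10, 11, 11), "c")])
root = RNode((0, 0, 11, 11), children=[leaf1, leaf2])
near_origin = search(root, (0.5, 0.5, 2.5, 2.5))
```

Unlike a B-tree lookup, the search may have to follow several children at one level, because sibling rectangles are allowed to overlap.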
2001
Emerging database applications require the use of new indexing structures beyond B-trees and R-trees. Examples are the k-D tree, the trie, the quadtree, and their variants. They are often proposed as supporting structures in data mining, GIS, and CAD/CAM applications. A common feature of all these indexes is that they recursively divide the space into partitions. A new extensible index structure, termed SP-GiST, is presented that supports this class of data structures, mainly the class of space partitioning unbalanced trees. Simple method implementations are provided that demonstrate how SP-GiST can behave as a k-D tree, a trie, a quadtree, or any of their variants. Issues related to clustering tree nodes into pages as well as concurrency control for SP-GiST are addressed. A dynamic minimum-height clustering technique is applied to minimize disk accesses and to make using such trees in database systems possible and efficient. A prototype implementation of SP-GiST is presented as well as performance studies of the various SP-GiST's tuning parameters.
2009
Mobile query processing is currently a very active research field. Range and nearest neighbor queries are commonly used in spatiotemporal databases and location-based services (LBS). In this paper, we focus on finding the nearest neighbors of a query point within a certain distance range. We propose a new indexing structure, the CN-tree (Compact N-tree), based on a recent indexing technique called the N-tree. The CN-tree combines the efficiency of the N-tree's data partitioning scheme with the approximation of pertinent objects by the minimal bounding rectangles of R-trees, which are reported to perform best for range search. We show how we use this approximation in constructing the CN-tree and how the index can then support range queries efficiently by minimizing distance computations and avoiding overlap between minimal bounding rectangles. The experimental results, through comparison with the well-known R*-tree, show that the proposed CN-tree widely outperforms the R*-tree as an in-memory index and offers competitive performance when used as an on-disk index.