2009
Metric space searching is an emerging technique to address the problem of efficient similarity searching in many applications, including multimedia databases and other repositories handling complex objects. Although promising, the metric space approach is still immature in several aspects that are well established in traditional databases. In particular, most indexing schemes are not dynamic, that is, few of them tolerate insertion of elements at reasonable cost over an existing index and only a few work efficiently in secondary memory.
2010
Metric space searching is an emerging technique to address the problem of efficient similarity searching in many applications, including multimedia databases and other repositories handling complex objects. Although promising, the metric space approach is still immature in several aspects that are well established in traditional databases. In particular, most indexing schemes are not dynamic. Of the few dynamic indexes, even fewer work well in secondary memory. That is, most of them need the index in main memory in order to operate efficiently. In this paper we introduce a secondary-memory variant of the Dynamic Spatial Approximation Tree with Clusters (DSACL-tree), which has been shown to be competitive in main memory. The resulting index handles the secondary-memory scenario well and is competitive with the state of the art. It is a much more practical data structure that can be useful in a wide range of database applications.
2010
The metric space model allows many similarity search problems to be treated abstractly. Similarity search has multiple applications, especially in the area of multimedia databases. The idea is to index the database so as to accelerate similarity queries. Although there are several promising indices, few of them are dynamic, i.e., once created, very few allow insertions and deletions of elements at a reasonable cost.
2003
Hybrid dynamic spatial approximation trees are recently proposed data structures for searching in metric spaces, based on combining the concepts of spatial approximation and pivot-based algorithms. These data structures are hybrid schemes, with the full features of dynamic spatial approximation trees and able to use the available memory to improve the query time. It has been shown that they compare favorably against alternative data structures in spaces of medium difficulty. In this paper we complete and improve hybrid dynamic spatial approximation trees by presenting a new search alternative, an algorithm to remove objects from the tree, and an improved way of managing the available memory. The result is a fully dynamic and optimized data structure for similarity searching in metric spaces.
Journal of Computer Science Technology, 2014
Metric space searching is an emerging technique to address the problem of similarity searching in many applications. In order to efficiently answer similarity queries, the database must be indexed. In some interesting real applications, dynamism is an indispensable property of the index. There are very few truly dynamic indexes that support not only searches, but also insertions and deletions of elements. The dynamic spatial approximation tree (DSAT) is a data structure specially designed for searching in metric spaces, which compares favorably against other data structures in high-dimensional spaces or for queries with low selectivity. Insertions are efficient and easily supported in the DSAT, but deletions degrade the structure over time. Several methods have been proposed to handle deletions in the DSAT. One of them has been shown to be superior to the others, in the sense that it permits controlling the expected deletion cost as a proportion of the insertion cost, and searches do not overly degrade after several deletions. In this paper we propose and study a new alternative deletion method, based on the best existing strategy. The outcome is a fully dynamic data structure that can be managed through insertions and deletions over arbitrarily long periods of time without any significant reorganization.
2010
Similarity search in high-dimensional metric spaces is a key operation in many applications, such as multimedia databases, image retrieval, object recognition, and others. The high dimensionality of the data requires special index structures to facilitate the search. A problem regarding the creation of suitable index structures for high-dimensional data is the relationship between the geometry of the data and the organization of an index structure. In this paper, we study the performance of a new index structure, called Divisive-Agglomerative Hierarchical Clustering tree (DAHC-tree), which reduces the effects imposed by the above liability. The DAHC-tree is constructed by dividing and grouping the data set into compact clusters. We perform a rigorous experimental design and analyze the trade-offs involved in building such an index structure. Additionally, we present extensive experiments comparing our method against state-of-the-art exact and approximate solutions. The conducted analysis and the reported comparative test results demonstrate that our technique significantly improves the performance of similarity queries.
IEEE Transactions on Knowledge and Data Engineering, 2017
Spatial queries, including similarity search and similarity joins, are useful in many areas, such as multimedia retrieval, data integration, and so on. However, they are not supported well by commercial DBMSs. This may be due to the complex data types involved and the need for flexible similarity criteria seen in real applications. In this paper, we propose a versatile and efficient disk-based index for metric data, the Space-filling curve and Pivot-based B+-tree (SPB-tree). This index leverages the B+-tree, and uses a space-filling curve to cluster data into compact regions, thus achieving storage efficiency. It utilizes a small set of so-called pivots to significantly reduce the number of distance computations when using the index. Further, it makes use of a separate random access file to support a broad range of data. By design, it is easy to integrate the SPB-tree into an existing DBMS. We present efficient algorithms for processing similarity search and similarity joins, as well as corresponding cost models based on SPB-trees. Extensive experiments using both real and synthetic data show that, compared with state-of-the-art competitors, the SPB-tree has much lower construction cost, smaller storage size, and supports more efficient similarity search and similarity joins with high-accuracy cost models.
2003
Dynamic spatial approximation trees (dsa-trees) are efficient data structures for searching metric spaces. However, using enough storage, pivoting schemes beat dsa-trees in any metric space. In this paper we combine both concepts in a data structure that enjoys the features of dsa-trees and that improves query time by making the best use of the available memory. We show experimentally that our data structure is competitive for searching metric spaces.
String Processing and …, 2003
Dynamic spatial approximation trees (dsa-trees) are efficient data structures for searching metric spaces. However, using enough storage, pivoting schemes beat dsa-trees in any metric space. In this paper we combine both concepts in a data structure that enjoys the features of dsa-trees and that improves query time by making the best use of the available memory. We show experimentally that our data structure is competitive for searching metric spaces.
2007
Over the last two decades, much research effort has been spent on nearest neighbor search in high-dimensional data sets. Most of the approaches published thus far have, however, only been tested on rather small collections. When large collections have been considered, high-performance environments have been used, in particular systems with a large main memory. Accessing data on disk has largely been avoided because disk operations are considered to be too slow. It has been shown, however, that using large amounts of memory is generally not an economic choice. Therefore, we propose the NV-tree, which is a very efficient disk-based data structure that can give good approximate answers to nearest neighbor queries with a single disk operation, even for very large collections of high-dimensional data. Using a single NV-tree, the returned results have high recall but contain a number of false positives. By combining two or three NV-trees, most of those false positives can be avoided while retaining the high recall. Finally, we compare the NV-tree to Locality Sensitive Hashing, a popular method for ε-distance search. We show that they return results of similar quality, but the NV-tree uses many fewer disk reads.
String Processing and Information Retrieval, 2002
The Spatial Approximation Tree (sa-tree) is a recently proposed data structure for searching in metric spaces. It has been shown that it compares favorably against alternative data structures in spaces of high dimension or queries with low selectivity. Its main drawbacks are: costly construction time, poor performance in low dimensional spaces or queries with high selectivity, and the fact of being a static data structure, that is, once built, one cannot add or delete elements. These facts rule it out for many interesting applications. In this paper we overcome these weaknesses. We present a dynamic version of the sa-tree that handles insertions and deletions, showing experimentally that the price of adding dynamism is rather low. This is remarkable by itself since very few data structures for metric spaces are fully dynamic. In addition, we show how to obtain large improvements in construction and search time for low dimensional spaces or highly selective queries. The outcome is a much more practical data structure that can be useful in a wide range of applications.
2001
The spatial approximation tree (sa-tree) is a recently proposed data structure for searching in metric spaces. It has been shown to compare favorably against alternative data structures in spaces of high dimension or queries with low selectivity. The main drawback of the sa-tree is that it is a static data structure, that is, once built, it is difficult to add new elements to it. This rules it out for many interesting applications. In this paper we overcome this weakness. We propose and study several methods to handle insertions in the sa-tree. Some are classical solutions well known in the data structures community, while the most promising ones have been specifically developed considering the particular properties of the sa-tree, and involve new algorithmic insights into the behavior of this data structure. As a result, we show that it is viable to modify the sa-tree so as to permit fast insertions while keeping its good search efficiency.
1997
A new access method, called M-tree, is proposed to organize and search large data sets from a generic "metric space", i.e. where object proximity is only defined by a distance function satisfying the positivity, symmetry, and triangle inequality postulates. We detail algorithms for insertion of objects and split management, which keep the M-tree always balanced; several heuristic split alternatives are considered and experimentally evaluated. Algorithms for similarity (range and k-nearest neighbors) queries are also described.
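The three postulates named above can be checked on a finite sample of objects. The following is a minimal sketch, not part of the M-tree paper: the helper `is_metric`, the tolerance `tol`, and the two example distance functions are illustrative assumptions. It shows that the Manhattan distance passes the check, while the squared Euclidean distance fails the triangle inequality.

```python
from itertools import product


def is_metric(points, dist, tol=1e-12):
    """Check positivity, symmetry and the triangle inequality on a sample.

    `points` must contain pairwise distinct objects. Passing the check on a
    sample does not prove the postulates in general; failing it disproves them.
    """
    for x, y in product(points, repeat=2):
        d = dist(x, y)
        # Positivity: d >= 0, and d == 0 exactly when x == y.
        if d < -tol:
            return False
        if (x == y) != (abs(d) <= tol):
            return False
        # Symmetry: d(x, y) == d(y, x).
        if abs(d - dist(y, x)) > tol:
            return False
    for x, y, z in product(points, repeat=3):
        # Triangle inequality: d(x, z) <= d(x, y) + d(y, z).
        if dist(x, z) > dist(x, y) + dist(y, z) + tol:
            return False
    return True


def manhattan(a, b):
    """L1 distance between two equal-length tuples."""
    return sum(abs(p - q) for p, q in zip(a, b))


def squared_euclidean(a, b):
    """Squared L2 distance; symmetric and positive, but not a metric."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


sample = [(0, 0), (1, 0), (2, 0), (0, 3)]
print(is_metric(sample, manhattan))
print(is_metric(sample, squared_euclidean))
```

The squared Euclidean case fails because the distance from (0, 0) to (2, 0) is 4, which exceeds the sum 1 + 1 obtained by going through (1, 0).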
Information Systems, 2016
Metric indices support efficient similarity searches in metric spaces. This problem is central to many applications, including multimedia databases and repositories handling complex objects. Most metric indices are designed for main memory, and most of them are also static, that is, they do not support insertions and deletions of objects. In this paper we introduce new metric indices for secondary memory that support updates, that is, they are dynamic. First, we show how the dynamic and memory-based Dynamic Spatial Approximation Tree (DSAT) can be extended to operate on secondary memory. Second, we design a dynamic and secondary-memory-based version of the static List of Clusters (LC), which performs well on high-dimensional spaces. The new structure is called Dynamic LC (DLC). Finally, we combine the DLC with the in-memory version of DSAT to create a third structure, Dynamic Set of Clusters (DSC), which improves upon the other two in various cases. We compare the new structures with the state of the art, showing that they are competitive and stand out in several scenarios, especially on spaces of medium and high dimensionality.
Multimedia Tools and Applications, 2003
In order to speed up retrieval in large collections of data, index structures partition the data into subsets so that query requests can be evaluated without examining the entire collection. As the complexity of modern data types grows, metric spaces have become a popular paradigm for similarity retrieval. We propose a new index structure, called D-Index, that combines a novel clustering technique and the pivot-based distance searching strategy to speed up execution of similarity range and nearest neighbor queries for large files with objects stored in disk memories. We have qualitatively analyzed D-Index and verified its properties on an actual implementation. We have also compared D-Index with other index structures and demonstrated its superiority on several real-life data sets. Contrary to tree organizations, the D-Index structure is suitable for dynamic environments with a high rate of delete/insert operations.
2002
The Spatial Approximation Tree (sa-tree) is a recently proposed data structure for searching in metric spaces. It has been shown that it compares favorably against alternative data structures in spaces of high dimension or queries with low selectivity. The main drawback of the ...
ACM Journal of Experimental Algorithmics, 2009
Proximity searching consists of retrieving from a database those elements that are similar to a query object. The usual model for proximity searching is a metric space where the distance, which models the proximity, is expensive to compute. An index uses precomputed distances to speed up query processing. Among all the known indices, the baseline for performance for about 20 years has been AESA. This index uses an iterative procedure, where at each iteration it first chooses the next promising element (“pivot”) to compare to the query, and then it discards database elements that can be proved not relevant to the query using the pivot. The next pivot in AESA is chosen as the one minimizing the sum of lower bounds to the distance to the query proved by previous pivots. In this article, we introduce the new index iAESA, which establishes a new performance baseline for metric space searching. The difference with AESA is the method to select the next pivot. In iAESA, each candidate sorts...
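The AESA procedure described above can be sketched for range queries as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `aesa_range_search`, the tie-breaking by index, and the one-dimensional example data are hypothetical, and the iAESA pivot selection is not shown because the abstract is truncated. The sketch assumes a full matrix `D` of precomputed pairwise distances and uses the triangle-inequality lower bound |d(q, p) - d(p, u)| both to discard candidates and to pick the next pivot by smallest accumulated sum.

```python
def aesa_range_search(objects, D, dist, q, r):
    """Return (indices of objects within distance r of q, distance evaluations).

    D[i][j] holds the precomputed distance between objects[i] and objects[j].
    """
    n = len(objects)
    candidates = set(range(n))
    sum_lb = [0.0] * n  # accumulated lower bounds on d(q, u) from past pivots
    result = []
    evaluations = 0
    while candidates:
        # Next pivot: candidate with the smallest sum of lower bounds
        # (ties broken by index, an arbitrary choice for this sketch).
        p = min(candidates, key=lambda u: (sum_lb[u], u))
        candidates.remove(p)
        dqp = dist(q, objects[p])  # the only expensive operation
        evaluations += 1
        if dqp <= r:
            result.append(p)
        survivors = set()
        for u in candidates:
            # Triangle inequality: d(q, u) >= |d(q, p) - d(p, u)|.
            lb = abs(dqp - D[p][u])
            if lb <= r:  # otherwise u is provably outside the query ball
                sum_lb[u] += lb
                survivors.add(u)
        candidates = survivors
    return sorted(result), evaluations


def absolute_difference(a, b):
    """Toy metric on numbers, standing in for an expensive distance."""
    return abs(a - b)


objects = [0.0, 1.0, 2.0, 10.0, 11.0, 20.0]
D = [[absolute_difference(a, b) for b in objects] for a in objects]
found, cost = aesa_range_search(objects, D, absolute_difference, q=1.2, r=1.0)
print(found, cost)
```

In this toy run the first pivot already rules out the three distant objects, so fewer distance evaluations are needed than a linear scan would use. The quadratic space for `D` is the known price of this family of indices.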
1997
M-tree is a dynamic access method suitable to index generic "metric spaces", where the function used to compute the distance between any two objects satisfies the positivity, symmetry, and triangle inequality postulates. The M-tree design fulfills typical requirements of multimedia applications, where objects are indexed using complex features, and similarity queries can require application of time-consuming distance functions. In this paper we describe the basic search and management algorithms of M-tree, introduce several heuristic split policies, and experimentally evaluate them, considering both I/O and CPU costs. Results also show that M-tree performs better than R*-tree on high-dimensional vector spaces. This work has been partially supported by ESPRIT LTR project no. 9141, HERMES (Foundations of High Performance Multimedia Information Management Systems). P. Zezula has also been supported by Grants GACR No. 102/96/0986 and KONTAKT No. PM96 S028.
Information Systems, 2011
Searching in a dataset for elements that are similar to a given query element is a core problem in applications that manage complex data, and has been aided by metric access methods (MAMs). A growing number of applications require indices that must be built faster and repeatedly, also providing faster response for similarity queries. The increase in main memory capacity and its falling cost also motivate using memory-based MAMs. In this paper, we propose the Onion-tree, a new and robust dynamic memory-based MAM that slices the metric space into disjoint subspaces to provide quick indexing of complex data. It introduces three major characteristics: (i) a partitioning method that controls the number of disjoint subspaces generated at each node; (ii) a replacement technique that can change the leaf node pivots in insertion operations; and (iii) range and k-NN extended query algorithms to support the new partitioning method, including a new visit order of the subspaces in k-NN queries. Performance tests with both real-world and synthetic datasets showed that the Onion-tree is very compact. Comparisons of the Onion-tree with the MM-tree and a memory-based version of the Slim-tree showed that the Onion-tree was always faster to build. The experiments also showed that the Onion-tree significantly improved range and k-NN query processing performance and was the most efficient MAM, followed by the MM-tree, which in turn outperformed the Slim-tree in almost all the tests.
ADBIS (Local Proceedings), 2004
In this paper we introduce the Pivoting M-tree (PM-tree), a metric access method combining M-tree with the pivot-based approach. While in M-tree a metric region is represented by a hyper-sphere, in PM-tree the shape of a metric region is determined by intersection of the hyper-sphere and a set of hyper-rings. The set of hyper-rings for each metric region is related to a fixed set of pivot objects. As a consequence, the shape of a metric region bounds the indexed objects more tightly which, in turn, significantly improves the overall efficiency of similarity search. We present basic algorithms on PM-tree and two cost models for range query processing. Finally, the PM-tree efficiency is experimentally evaluated on large synthetic as well as real-world datasets.
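The tighter region shape described above can be illustrated with a small pruning test. This is a sketch under assumptions, not the PM-tree algorithms from the paper: the `Region` class, the `may_intersect` function, and the use of Euclidean distance via `math.dist` are hypothetical choices for illustration. A region is a hyper-sphere plus one (lo, hi) distance interval per global pivot, and a range query can skip the region if it misses the sphere or any one of the rings.

```python
from dataclasses import dataclass, field
from math import dist  # Euclidean distance, available since Python 3.8


@dataclass
class Region:
    center: tuple
    radius: float
    # One (lo, hi) interval per pivot: every indexed object o in the region
    # satisfies lo <= d(o, pivot) <= hi.
    rings: list = field(default_factory=list)


def may_intersect(region, pivots, q, r):
    """Return False only if the query ball (q, r) provably misses the region."""
    # Hyper-sphere test, as in a plain M-tree style region.
    if dist(q, region.center) > region.radius + r:
        return False
    # Hyper-ring tests: the query ball spans [d(q, p) - r, d(q, p) + r]
    # in distance from each pivot p and must overlap the ring [lo, hi].
    for pivot, (lo, hi) in zip(pivots, region.rings):
        dqp = dist(q, pivot)
        if dqp + r < lo or dqp - r > hi:
            return False
    return True


pivots = [(0.0, 0.0)]
sphere_only = Region(center=(10.0, 0.0), radius=5.0)
with_ring = Region(center=(10.0, 0.0), radius=5.0, rings=[(9.0, 11.0)])

query, query_radius = (6.5, 0.0), 1.0
print(may_intersect(sphere_only, pivots, query, query_radius))
print(may_intersect(with_ring, pivots, query, query_radius))
```

Here the query ball overlaps the sphere, so a sphere-only region must be visited, but it lies entirely closer to the pivot than the ring allows, so the ring-augmented region can be skipped.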