2018, Proceedings of the 7th International Conference on Data Science, Technology and Applications
The main objective of this work is to propose a decentralized data structure for storing a large amount of data, under the assumption that it is not possible or convenient to host all of the data on a single workstation. The index is distributed over a computer network, and the performance of its search, insert, and delete operations is close to that of traditional indices hosted on a single workstation. It is based on k-d trees and is distributed across a network of "peers", each of which hosts a part of the tree and communicates with the other peers by message passing. In particular, we propose a novel version of the k-nearest neighbour algorithm that starts the query in a randomly chosen peer and terminates it as soon as possible. Preliminary experiments have demonstrated that in about 65% of cases a query started in a random peer does not involve the peer containing the root of the tree, and that in 98% of cases the query terminates in a peer that does not contain the root of the tree.

2 RESEARCH IDEAS AND RESULTS

This section introduces the problem description and our proposal to cope with it.
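For orientation, the k-nearest neighbour search on a k-d tree that the abstract above builds on can be sketched on a single machine as follows. This is only the classical centralized baseline, not the distributed algorithm proposed in the paper. The names `Node`, `build`, `sq_dist`, and `knn` are hypothetical and chosen for illustration.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


@dataclass
class Node:
    """One k-d tree node (hypothetical layout, for illustration only)."""
    point: Point
    axis: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(points: Sequence[Point], depth: int = 0) -> Optional[Node]:
    """Build a k-d tree by splitting on the median, cycling through the axes."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return Node(pts[mid], axis,
                build(pts[:mid], depth + 1),
                build(pts[mid + 1:], depth + 1))


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn(root: Optional[Node], query: Point, k: int) -> List[Point]:
    """Return the k points closest to `query`, nearest first."""
    heap: list = []  # max-heap emulated with negated squared distances

    def visit(node: Optional[Node]) -> None:
        if node is None:
            return
        d = sq_dist(node.point, query)
        if len(heap) < k:
            heapq.heappush(heap, (-d, node.point))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, node.point))
        diff = query[node.axis] - node.point[node.axis]
        near, far = (node.left, node.right) if diff < 0 else (node.right, node.left)
        visit(near)
        # Descend into the far side only if the splitting plane is closer
        # than the current k-th best candidate (or fewer than k were found).
        if len(heap) < k or diff * diff < -heap[0][0]:
            visit(far)

    visit(root)
    return [p for _, p in sorted(heap, key=lambda e: -e[0])]


points = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
tree = build(points)
print(knn(tree, (9.0, 2.0), 2))
```

The pruning test in `visit` is what lets a search finish without touching every subtree. In a distributed setting, the same test would decide whether a query has to be forwarded to the peer hosting the other subtree.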
2009
As one of the most important technologies for implementing large-scale distributed systems, peer-to-peer (P2P) computing has attracted much attention in both research and industrial communities, for its advantages such as high availability, high performance, and high flexibility to the dynamics of networks.
GeoInformatica, 2012
This work introduces decentralized query processing techniques based on MIDAS, a novel distributed multidimensional index. In particular, MIDAS implements a distributed k-d tree, where leaves correspond to peers, and internal nodes dictate message routing. MIDAS requires that peers maintain little network information, and features mechanisms that support fault tolerance and load balancing. The proposed algorithms process point and range queries over the multidimensional indexed space in only O(log n) hops in expectation, where n is the network size. For nearest neighbor queries, two processing alternatives are discussed. The first, termed eager processing, has low latency (expected value of O(log n) hops) but may involve a large number of peers. The second, termed iterative processing, has higher latency (expected value of O(log² n) hops) but involves far fewer peers. A detailed experimental evaluation demonstrates that our query processing techniques outperform existing methods for settings involving real spatial data as well as in the case of high dimensional synthetic data.
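The idea of a k-d tree whose leaves are peers and whose internal nodes steer routing can be illustrated with a small in-memory simulation. This is a simplified sketch, not the MIDAS protocol: the whole tree lives in one process, a "hop" is counted per tree level rather than per network message, and the names `Leaf`, `Split`, `route`, and `peers_in_range` are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple, Union


@dataclass
class Leaf:
    """A leaf stands for one peer responsible for a region of the space."""
    peer_id: str


@dataclass
class Split:
    """An internal node: routes by comparing one coordinate with a threshold."""
    axis: int
    value: float
    low: Union["Split", Leaf]   # region where coordinate < value
    high: Union["Split", Leaf]  # region where coordinate >= value


def route(node: Union[Split, Leaf], point: Sequence[float]) -> Tuple[str, int]:
    """Point query: return the responsible peer and the number of levels crossed."""
    hops = 0
    while isinstance(node, Split):
        node = node.low if point[node.axis] < node.value else node.high
        hops += 1
    return node.peer_id, hops


def peers_in_range(node: Union[Split, Leaf],
                   lo: Sequence[float], hi: Sequence[float]) -> List[str]:
    """Range query: peers whose region may overlap the box [lo, hi]."""
    if isinstance(node, Leaf):
        return [node.peer_id]
    found: List[str] = []
    if lo[node.axis] < node.value:      # box reaches into the low side
        found += peers_in_range(node.low, lo, hi)
    if hi[node.axis] >= node.value:     # box reaches into the high side
        found += peers_in_range(node.high, lo, hi)
    return found


# Unit square split at x = 0.5; the left half is split again at y = 0.5.
tree = Split(0, 0.5, Split(1, 0.5, Leaf("A"), Leaf("B")), Leaf("C"))
print(route(tree, (0.2, 0.7)))
print(peers_in_range(tree, (0.4, 0.6), (0.6, 0.8)))
```

In this toy layout the point (0.2, 0.7) is routed to peer B after two levels, and the query box that straddles x = 0.5 touches peers B and C only.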
Indexing of high-dimensional data is essential for building applications such as multimedia retrieval, data mining, and spatial databases. Traditional index structures rely on centralized processing. This approach does not scale with the rapidly increasing amount of application data available on massively distributed systems like the Internet.
2007
Efficient multi-dimensional data search has received much attention in centralized systems. However, its implementation in large-scale distributed systems is not trivial and remains a challenge. In this paper, SDI, a new succinct multi-dimensional balanced tree structure based on peer-to-peer technology, is presented. With the SDI structure, the query efficiency can be bounded by O(log N). Compared with previous tree-based methods, SDI has extremely low maintenance cost.
2009 29th IEEE International Conference on Distributed Computing Systems, 2009
In this paper, we study the problem of indexing multidimensional data in the P2P networks based on distributed hash tables (DHTs). We identify several design issues and propose a novel over-DHT indexing scheme called m-LIGHT. To preserve data locality, m-LIGHT employs a clever naming mechanism that gracefully maps the index tree into the underlying DHT so that it achieves efficient index maintenance and query processing. Moreover, m-LIGHT leverages a new data-aware index splitting strategy to achieve optimal load balance among peer nodes. We conduct an extensive performance evaluation for m-LIGHT. Compared to the state-of-the-art indexing schemes, m-LIGHT substantially saves the index maintenance overhead, achieves a more balanced load distribution, and improves the range query performance in both bandwidth consumption and response latency.
1996
In this paper we present a generalization of the k-d tree data structure suitable for efficient management and querying in a distributed framework. We present optimal search algorithms for exact, partial, and range search queries. Optimality is in the sense that (1) only servers that could have k-dimensional points related to a query reply to it and that (2) the client issuing the query can deterministically know when the search is complete.
2010
This paper presents a new balanced, distributed data structure for storing data with multidimensional keys in a peer-to-peer network. It supports range queries as well as single point queries, which are routed in O(log n) hops. Our structure, called SkipTree, is fully decentralized, with each node being connected to O(log n) other nodes.
2005
This paper presents the SkipTree, a new balanced, distributed data structure for storing data with multidimensional keys in a peer-to-peer network. The SkipTree supports range queries as well as single point queries, which are routed in O(log n) hops. SkipTree is fully decentralized, with each node being connected to O(log n) other nodes. The memory usage for maintaining the links at each node is O(log n log log n) on average and O(log² n) in the worst case. Load balance is also guaranteed to be within a constant factor.
1998
In this paper we present a data structure for searching in multi-dimensional point sets in distributed environments and discuss its experimental evaluation, also through a comparison with previous proposals. The data structure is based on an extension of k-d trees. The technological reference context is a distributed environment where multicast (i.e., restricted broadcast) is allowed, but it is also shown how to avoid using it.
International Journal of Future Computer and Communication, 2013
… of the 10th ACM workshop on Web …, 2008
In this paper we describe HiPPIS, a system that enables efficient storage and on-line querying of multidimensional data organized into concept hierarchies and dispersed over a network. Our scheme utilizes an adaptive algorithm that automatically adjusts the level of indexing according to the granularity of the incoming queries, without assuming any prior knowledge of the workload. Efficient roll-up and drill-down operations take place in order to maximize the performance by minimizing query flooding. Extensive experimental evaluations show that, on top of the advantages that a distributed storage offers, our method answers the large majority of incoming queries, both point and aggregate ones, without flooding the network. At the same time, it manages to preserve the hierarchical nature of data. These characteristics are maintained even after sudden shifts in the workload.
Lecture Notes in Computer Science, 2019
Despite the prospect of a vast Web of interlinked data, the Semantic Web today mostly fails to meet its potential. One of the main problems it faces is rooted in its current architecture, which totally relies on the availability of the servers providing access to the data. These servers are subject to failures, which often results in situations where some data is unavailable. Recent advances have proposed decentralized peer-to-peer based architectures to alleviate this problem. However, for query processing these approaches mostly rely on flooding, a standard technique for peer-to-peer systems, which can easily result in very high network traffic and hence cause high query response times. To still enable efficient query processing in such networks, this paper proposes two indexing schemes, which in a decentralized fashion aim at efficiently finding nodes with relevant data for a given query: Locational Indexes and Prefix-Partitioned Bloom Filters. Our experiments show that such indexing schemes are able to considerably speed up query processing times compared to existing approaches.
1997
In this paper we discuss some design issues concerning a semi-dynamic data structure for searching in multidimensional point sets in distributed environments. The data structure is based on an extension of k-d trees and supports exact, partial, and range search queries. We assume multicast is available in our distributed environment, but discuss how to use it only when needed and investigate, through a cost model, the best strategy to deal with range queries.
2022
Data structures known as $k$-d trees have numerous applications in scientific computing, particularly in areas of modern statistics and data science such as range search in decision trees, clustering, nearest neighbors search, local regression, and so forth. In this article we present a scalable mechanism to construct $k$-d trees for distributed data, based on approximating medians for each recursive subdivision of the data. We provide theoretical guarantees of the quality of approximation using this approach, along with a simulation study quantifying the accuracy and scalability of our proposed approach in practice.
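One simple way to build a k-d tree from approximate medians is to estimate each split from a random sample instead of sorting all values. The sketch below assumes that sampling strategy purely for illustration; the abstract does not say which approximation the authors use, and the names `approx_median`, `build`, and `leaf_sizes`, as well as the default parameters, are hypothetical.

```python
import random
import statistics
from typing import Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def approx_median(values: List[float], sample_size: int, rng: random.Random) -> float:
    """Estimate the median from a random sample (assumed strategy, not the paper's)."""
    if len(values) <= sample_size:
        return statistics.median(values)
    return statistics.median(rng.sample(values, sample_size))


def build(points: Sequence[Point], depth: int = 0, leaf_size: int = 8,
          sample_size: int = 32, rng: Optional[random.Random] = None) -> Dict:
    """Recursively split the points at an approximate median, cycling the axes."""
    if rng is None:
        rng = random.Random(0)
    if len(points) <= leaf_size:
        return {"leaf": list(points)}
    axis = depth % len(points[0])
    split = approx_median([p[axis] for p in points], sample_size, rng)
    left = [p for p in points if p[axis] < split]
    right = [p for p in points if p[axis] >= split]
    if not left or not right:
        # Degenerate split (e.g. many equal coordinates): stop subdividing.
        return {"leaf": list(points)}
    return {
        "axis": axis,
        "split": split,
        "left": build(left, depth + 1, leaf_size, sample_size, rng),
        "right": build(right, depth + 1, leaf_size, sample_size, rng),
    }


def leaf_sizes(tree: Dict) -> List[int]:
    """Sizes of all leaves, left to right; useful to inspect the balance."""
    if "leaf" in tree:
        return [len(tree["leaf"])]
    return leaf_sizes(tree["left"]) + leaf_sizes(tree["right"])


data_rng = random.Random(42)
data = [(data_rng.random(), data_rng.random()) for _ in range(500)]
tree = build(data, rng=random.Random(1))
print(len(leaf_sizes(tree)), "leaves holding", sum(leaf_sizes(tree)), "points")
```

Because each split is only an estimate, the resulting leaves are not perfectly balanced. The spread of `leaf_sizes` gives a rough view of how much accuracy is traded for avoiding a full sort at every level.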
International Journal of Computer Applications, 2013
Nowadays, DHT-based P2P technology is used as a basis for many widespread applications because of its scalability, robustness, and load balance. Many applications, including file sharing, communication, and live video streaming, run in large distributed network environments. For efficient and effective search in large data repositories, complex query processing becomes a major issue for DHTs. Towards the goal of supporting complex queries in DHT-based P2P systems, this paper focuses on using a k-dimensional tree to build a tree-based index. The proposed index is built without modifying the structure of the overlay network. Load balancing among peers is also considered in relation to the use of the kd-tree. The performance of the kd-tree is therefore studied to show how it can affect the proposed index over a P2P network. The PlanetSim simulator is used to implement the proposed index and to evaluate its performance using various metrics.
2005
Similarity search in metric spaces represents an important paradigm for content-based retrieval in many applications. Existing centralized search structures can speed up retrieval, but they do not scale up to large volumes of data because the response time increases linearly with the size of the searched file. In this article, we study the problem of executing nearest neighbor(s) queries in a distributed metric structure, which is based on the P2P communication paradigm and generalized hyperplane partitioning.
2006 IEEE International Conference on Systems, Man and Cybernetics, 2006
Broadcasting data with an index is an effective way to disseminate public information to a large number of clients. For a server, using multiple channels to provide services (e.g., location-based services) makes the broadcast cycle shorter than using one channel. Among location-based services, the k nearest neighbors (kNN) search is an important one; it finds the k closest objects to a query point in a multi-dimensional space. This paper considers k nearest neighbors search on a broadcast R-tree in a multi-channel environment. We assume that a mobile client can only tune into one specified channel at any time instant. We study how a server generates the broadcast schedules on multiple channels and explore how a client executes the kNN search on the broadcast. Different combinations of broadcast schedule and client-side kNN search processing yield different kNN search protocols. The objectives of the protocols are to minimize the latency (i.e., the time elapsed between issuing and termination of the query), the tuning time (i.e., the amount of time spent listening to the channel), and the memory usage for kNN search processing. Finally, we present our experiments, whose results validate that our mechanisms achieve these objectives.
2005
In this paper, we propose an efficient access method, named MK-tree, to dynamically index large data sets in high dimensional spaces. It is an extension of the M-tree with a key dimension that improves the efficiency of space partitioning and reduces the response time of similarity search for high dimensional data. The main idea behind the key dimension is to make the fanout of the tree larger by partitioning a subspace further into two subspaces, called a twin-node, according to the key dimension. To achieve high space utilization, we dynamically reallocate data within a twin-node, which further improves the performance of the MK-tree. Our experimental results show that a higher filtering efficiency can be obtained by using the concept of key dimension for both R-neighbor search and K-nearest neighbor search.
International Journal of Software Engineering and Knowledge Engineering, 2005
In this paper we present a multi-key index model that enables us to search for a record by more than one attribute value in distributed database systems. Indices provide fast and efficient access to data and have therefore become a major aspect of centralized database systems. Most centralized database systems use a B+ tree or other types of index structures such as bit vectors, graph structures, grid files, etc. But for distributed database systems, no index model is found in the literature. Efficient access is therefore a major problem in distributed databases. Our proposed index model avoids the query-flooding problem of existing systems and thus optimizes network bandwidth.
2004
Peer-to-peer distributed hash table (DHT) systems make it simple to discover specific data when their complete identifiers (or keys) are known in advance. In practice, however, users looking up resources stored in peer-to-peer systems often have only partial information for identifying these resources. In this paper, we describe techniques for indexing data stored in peer-to-peer DHT networks, and discovering the resources that match a given user query. Our system creates multiple indexes, organized hierarchically, which permit users to locate data even using scarce information, although at the price of a higher lookup cost. The data itself is stored on only one (or few) of the nodes. Experimental evaluation demonstrates the effectiveness of our indexing techniques on a distributed peer-to-peer bibliographic database with realistic user query workloads.
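The trade-off described above, locating data from partial information at the price of extra lookups, can be illustrated with a toy model. The sketch assumes a single in-process dictionary standing in for the DHT and a two-level index (author or title pointing to the full author-plus-title key). The class `ToyDHT`, the functions `publish` and `lookup`, and the query string format are all hypothetical and not taken from the paper.

```python
import hashlib
from typing import Dict, List, Set, Tuple


class ToyDHT:
    """In-memory stand-in for a DHT: values are stored under the hash of a key string."""

    def __init__(self) -> None:
        self._store: Dict[str, Set[Tuple[str, str]]] = {}

    @staticmethod
    def _key(text: str) -> str:
        return hashlib.sha1(text.encode("utf-8")).hexdigest()

    def put(self, text: str, value: Tuple[str, str]) -> None:
        self._store.setdefault(self._key(text), set()).add(value)

    def get(self, text: str) -> Set[Tuple[str, str]]:
        return self._store.get(self._key(text), set())


def publish(dht: ToyDHT, author: str, title: str, data: str) -> None:
    """Store the data once under its full key, plus index entries for partial queries."""
    full = f"author={author};title={title}"
    dht.put(full, ("data", data))
    dht.put(f"author={author}", ("query", full))  # index: author -> full key
    dht.put(f"title={title}", ("query", full))    # index: title  -> full key


def lookup(dht: ToyDHT, query: str) -> Tuple[Set[str], int]:
    """Follow index entries until data is reached; also count the DHT lookups used."""
    results: Set[str] = set()
    seen: Set[str] = set()
    pending: List[str] = [query]
    lookups = 0
    while pending:
        q = pending.pop()
        if q in seen:
            continue
        seen.add(q)
        lookups += 1
        for kind, value in dht.get(q):
            if kind == "data":
                results.add(value)
            else:
                pending.append(value)  # a more specific query to resolve next
    return results, lookups


dht = ToyDHT()
publish(dht, "Smith", "Indexing", "doc-1")
publish(dht, "Smith", "Routing", "doc-2")
print(lookup(dht, "author=Smith;title=Indexing"))
print(lookup(dht, "author=Smith"))
```

With the full key, one lookup suffices. With only the author, the query first hits the index entry and then resolves each matching full key, three lookups in this example.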