Academia.edu

Distributed Databases

Distributed databases are databases that store data across multiple physical locations, which can be on different servers or networks. They enable data to be accessed and managed concurrently by multiple users, ensuring data consistency and availability while allowing for scalability and fault tolerance in data management.
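As a rough illustration of storing data across multiple locations with some fault tolerance, the sketch below hashes each key onto a list of nodes and places copies on consecutive nodes. The node names, the `replica_nodes` helper, and the replication factor of 2 are assumptions made for this example; real systems typically use more elaborate placement schemes.

```python
import hashlib


def replica_nodes(key: str, nodes: list[str], replicas: int = 2) -> list[str]:
    """Pick up to `replicas` distinct nodes for `key` (hypothetical helper)."""
    # A stable hash keeps the placement identical across processes and restarts.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(nodes)
    count = min(replicas, len(nodes))
    # Consecutive positions in the node list hold the copies of the record.
    return [nodes[(start + i) % len(nodes)] for i in range(count)]


# Illustrative node names and keys, not taken from any listed paper.
nodes = ["node-a", "node-b", "node-c"]
placement = {key: replica_nodes(key, nodes) for key in ["user:1", "user:2", "order:17"]}
```

Because each key lives on two different nodes, a read can still be served if one of them fails, at the cost of keeping the copies consistent.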
The Hadoop Distributed File System (HDFS) is designed to store very large data sets reliably, and to stream those data sets at high bandwidth to user applications. In a large cluster, thousands of servers both host directly attached... more
Peer-to-peer file-sharing networks are currently receiving much attention as a means of sharing and distributing information. However, as recent experience shows, the anonymous, open nature of these networks offers an almost ideal... more
In this paper we survey, consolidate, and present the state of the art in distributed database concurrency control. The heart of our analysis is a decomposition of the concurrency control problem into two major subproblems: read-write and... more
Finding information in a peer-to-peer system currently requires either a costly and vulnerable central index, or flooding the network with queries. In this paper we introduce the concept of Routing Indices (RIs), which allow nodes to... more
Big data refers to data volumes in the range of exabytes (10^18) and beyond. Such volumes exceed the capacity of current on-line storage systems and processing systems. Data, information, and knowledge are being created and collected at... more
With the existence of many large transaction databases, the huge amounts of data, the high scalability of distributed systems, and the easy partition and distribution of a centralized database, it is important to investigate efficient... more
The proliferation of file systems, navigational database systems (hierarchical and network), and relational database systems during the past three decades has created difficult problems arising from the need to access heterogeneous files... more
Intuitively, data management and data integration tools should be well-suited for exchanging information in a semantically meaningful way. Unfortunately, they suffer from two significant problems: they typically require a comprehensive... more
Replication is an area of interest to both distributed systems and databases. The solutions developed from these two perspectives are conceptually similar but differ in many aspects: model, assumptions, mechanisms, guarantees provided,... more
We introduce PlanetP, a content addressable publish/subscribe service for unstructured peer-to-peer (P2P) communities. PlanetP supports content addressing by providing: (1) a gossiping layer used to globally replicate a membership... more
A data warehouse stores materialized views over data from one or more sources in order to provide fast access to the integrated data, regardless of the availability of the data sources. Warehouse views need to be maintained in response to... more
Programme 1: Parallel architectures, databases, networks, and distributed systems. Rodin Project. Research report no. ...
To simplify the task of obtaining information from the vast number of information sources that are available on the World Wide Web (WWW), we are building tools to build information mediators for extracting and integrating data from... more
The limitations of traditional databases, in particular the relational model, in covering the requirements of current applications have led to the development of new database technologies. Among them, the Graph Databases are calling the... more
Pervasive computing applications are increasingly leveraging contextual information from several sources to provide users with behavior appropriate to the environment in which they reside. If these sources of contextual information are... more
This paper is concerned with discovering positive and negative association rules, a problem which has been addressed by various authors from different angles, but for which no fully satisfactory solution has yet been proposed. We... more
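To make the terms concrete, the sketch below computes the standard support and confidence measures for a positive rule (A implies B) and for the corresponding negative rule (A implies not B). It illustrates the textbook definitions only, not the method proposed in the paper; the transactions and function names are made up for the example.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Confidence of the positive rule antecedent => consequent."""
    base = support(transactions, antecedent)
    if base == 0:
        raise ValueError("antecedent never occurs, confidence is undefined")
    return support(transactions, set(antecedent) | set(consequent)) / base


def negative_confidence(transactions, antecedent, consequent):
    """Confidence of the negative rule antecedent => NOT consequent."""
    return 1.0 - confidence(transactions, antecedent, consequent)


# Illustrative market-basket data (an assumption, not from the paper).
transactions = [
    {"tea", "milk"},
    {"tea"},
    {"coffee", "milk"},
    {"tea", "milk", "sugar"},
]
pos = confidence(transactions, {"tea"}, {"milk"})           # 2 of the 3 tea baskets
neg = negative_confidence(transactions, {"tea"}, {"milk"})  # the remaining 1 of 3
```

A negative rule is interesting when its confidence is high, that is, when the presence of the antecedent makes the consequent unusually unlikely.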
by Do Son
Consider a database that represents the location of moving objects, such as taxi-cabs (typical query: “retrieve the cabs that are currently within 1 mile of 33 Michigan Ave., Chicago”), or objects in a battle-field. Existing database... more
The ability to efficiently discover information using partial knowledge (for example keywords, attributes or ranges) is important in large, decentralized, resource sharing distributed environments such as computational Grids and... more
The recent advance in cloud computing and distributed web applications has created the need to store large amounts of data in distributed databases that provide high availability and scalability. In recent years, a growing number of... more
Traditional methods for frequent itemset mining typically assume that data is centralized and static. Such methods impose excessive communication overhead when data is distributed, and they waste computational resources when data is... more
There are two major challenges for a high-performance remote-sensing database. First, it must provide low-latency retrieval of very large volumes of spatio-temporal data. This requires effective declustering and placement of a... more
We propose a self-organizing archival Intermemory. That is, a noncommercial subscriber-provided distributed information storage service built on the existing Internet. Given an assumption of continued growth in the memory's total size, a... more
Several systems possess the flexibility to serve requests in more than one way. For instance, a distributed storage system storing multiple replicas of the data can serve a request from any of the multiple servers that store the requested... more
Hadoop Distributed File System (HDFS) acts as the primary storage of Hadoop and has been adopted by reputed organizations (Facebook, Yahoo! etc.) due to its portability and fault-tolerance. The existing implementation of HDFS uses... more
We present alternative designs for efficiently supporting multicast for mobile hosts on the Internet. Methods for separately supporting multicasting and mobility along with their possible interactions are briefly described, and then... more
In this paper, we present a performance comparison of database replication techniques based on total order broadcast. While the performance of total order broadcast-based replication techniques has been studied in previous papers, this... more
Distributed computer systems have been the subject of a vast amount of research. Many prototype distributed computer systems have been built at university, industrial, commercial, and government research laboratories, and production... more
Skyline query processing has received considerable attention in the recent past. Mainly, the skyline query is used to find a set of non-dominated data points in a multidimensional dataset. While most previous work has assumed a... more
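For readers unfamiliar with the term, a point is in the skyline if no other point is at least as good in every dimension and strictly better in at least one. The sketch below is a naive quadratic, centralized computation under the assumption that smaller values are better in every dimension; the hotel data are invented, and this is not the algorithm of the paper.

```python
def dominates(p, q):
    """True if p is no worse than q everywhere and strictly better somewhere."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))


def skyline(points):
    """Return the points that no other point dominates (naive O(n^2) scan)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (price, distance to beach) pairs; lower is better for both.
hotels = [(50, 8), (60, 3), (80, 2), (90, 5), (55, 9)]
result = skyline(hotels)
# (90, 5) is dominated by (60, 3) and (55, 9) by (50, 8), so three hotels remain.
```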
by Bei Yu
Most existing Peer-to-Peer (P2P) systems support only title-based searches and are limited in functionality when compared to today's search engines. In this paper, we present the design of a distributed P2P information sharing system that... more
by Walid Saad and 1 more
Vehicle-to-roadside (V2R) communications enable vehicular networks to support a wide range of applications for enhancing the efficiency of road transportation. While existing work focused on non-cooperative techniques for V2R... more
Distributed storage systems need to store data redundantly in order to provide some fault-tolerance and guarantee system reliability. Different coding techniques have been proposed to provide the required redundancy more efficiently than... more
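One of the simplest coding schemes of this kind is a single XOR parity block, sketched below as an illustration of the general idea only; it is not taken from the paper. It assumes equally sized blocks and tolerates the loss of any one block; the block contents are made up.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks stored on three nodes, plus one parity block on a fourth.
data = [b"abcd", b"efgh", b"ijkl"]
parity = xor_blocks(data)

# Suppose the node holding data[1] fails: XOR the survivors with the parity.
recovered = xor_blocks([data[0], data[2], parity])
```

Here four stored blocks protect three data blocks against a single failure, whereas keeping a full second copy of each block would need six.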
Recently, wireless sensor networks (WSNs) have become mature enough to go beyond being simple fine-grained continuous monitoring platforms and become one of the enabling technologies for disaster early-warning systems. Event detection... more
languages. Our approach is to allow a user to compose Boolean queries in one rich front-end language. For each user query and target source, we transform the user query into a subsuming query that can be supported by the source but that... more
This paper is concerned with accurate and efficient indexing of fingerprint images. We present a model-based approach, which efficiently retrieves correct hypotheses using novel features of triangles formed by the triplets of minutiae as... more
Power utilities globally are increasingly upgrading to Smart Grids that use bi-directional communication with the consumer to enable an information-driven approach to distributed energy management. Clouds offer features well suited for... more
Web applications are the legacy software of the future. Developed under tight schedules, with high employee turnover, and in a rapidly evolving environment, these systems are often poorly structured and poorly documented. Maintaining... more
In recent years, Massively Parallel Processors (MPPs) have gained ground enabling vast amounts of data processing. In such environments, data is partitioned across multiple compute nodes, which results in dramatic performance improvements... more
We describe the design of Mariposa, an experimental distributed data management system that provides high performance in an environment of high data mobility and heterogeneous host capabilities. The Mariposa design unifies the approaches... more
The popularity of distributed file systems continues to grow. Reasons they are preferred over traditional centralized file systems include fault tolerance, availability, scalability and performance. In addition, Peer-to-Peer (P2P) system... more
A crucial element of large web companies is their ability to collect and analyze massive amounts of data. Tuple store databases are a key enabling technology employed by many of these companies (e.g., Google Bigtable and Amazon Dynamo).... more
In sensor networks, the large amount of data generated by sensors greatly influences the lifetime of the network. In order to manage... more
We extend the problem of association rule mining, a key data mining problem, to systems in which the database is partitioned among a very large number of computers that are dispersed over a wide area. Such computing systems include GRID... more