2014, Proceedings of the 18th International Database Engineering & Applications Symposium on - IDEAS '14
One of the most demanding needs in cloud computing is that of having scalable and highly available databases. One way to address these needs is to leverage the scalable replication techniques developed over the last decade. These techniques increase both the availability and the scalability of databases. Many replication protocols have been proposed during the last decade. The main research challenge has been how to scale under the eager replication model, the one that provides consistency across replicas. In this paper, we examine three eager database replication systems available today (Middle-R, C-JDBC and MySQL Cluster) using the TPC-W benchmark. We analyze their architectures and replication protocols, and compare their performance both in the absence of failures and when failures occur.
Proceedings 19th IEEE Symposium on Reliable Distributed Systems SRDS-2000, 2000
Data replication is an increasingly important topic as databases are more and more deployed over clusters of workstations. One of the challenges in database replication is to introduce replication without severely affecting performance. Because of this difficulty, current database products use lazy replication, which is very efficient but can compromise consistency. As an alternative, eager replication guarantees consistency but most existing protocols have a prohibitive cost. In order to clarify the current state of the art and open up new avenues for research, this paper analyses existing eager techniques using three key parameters. In our analysis, we distinguish eight classes of eager replication protocols and, for each category, discuss its requirements, capabilities, and cost. The contribution lies in showing when eager replication is feasible and in spelling out the different aspects a database replication protocol must account for.
2000
In this paper, we explore data replication protocols that provide both fault tolerance and good performance without compromising consistency. We do this by combining transactional concurrency control with group communication primitives. In our approach, transactions are executed at only one site, so that not all nodes incur the overhead of producing results. To further reduce latency, we use an optimistic multicast technique that overlaps transaction execution with total-order message delivery. The protocols presented in the paper provide correct executions while minimizing overhead and offering higher scalability.
arXiv (Cornell University), 2020
Database replication is an important component of reliable, disaster-tolerant and highly available distributed systems. However, data replication also causes communication and processing overhead. Quantifying these overheads is crucial when choosing a suitable DBMS from several available options and for capacity planning. In this paper, we present results from a comparative empirical analysis of the replication activities of three commonly used DBMSs (MySQL, PostgreSQL and Cassandra) under text as well as image traffic. In our experiments, the total traffic with two replicas (which is the norm) was as much as 300% higher than the total traffic with no replica. Furthermore, activating the compression option for replication traffic, built into MySQL, reduced the total network traffic by as much as 20%. We also found that average CPU utilization and memory utilization were not impacted by the number of replicas or the dataset.
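The traffic figures above can be approximated with a back-of-the-envelope model (all numbers and the per-replica overhead factor are hypothetical, chosen only to match the reported ballpark):

```python
def total_traffic(base_bytes, replicas, overhead=1.0, compression=0.0):
    """Rough estimate of total network traffic for a replicated DBMS.

    base_bytes  -- traffic generated with no replica at all
    replicas    -- number of additional copies receiving the change stream
    overhead    -- per-replica replication traffic as a fraction of base
    compression -- fractional reduction applied to the replication stream
    """
    replication = base_bytes * replicas * overhead * (1.0 - compression)
    return base_bytes + replication
```

With two replicas and a per-replica overhead of 1.5x the base, the total is four times the baseline, i.e. 300% higher; a 20% compression of the replication stream trims only the replicated portion of the total.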
13th Pacific Rim International Symposium on Dependable Computing (PRDC 2007), 2007
Databases have become a crucial component in modern information systems. At the same time, they have become the main bottleneck in most systems. Database replication protocols have been proposed to solve the scalability problem by scaling out in a cluster of sites. Current techniques have attained some degree of scalability; however, there are two main limitations to existing approaches. Firstly, most solutions adopt a full replication model where all sites store a full copy of the database. The coordination overhead imposed by keeping all replicas consistent allows such approaches to achieve only medium scalability. Secondly, most replication protocols rely on the traditional consistency criterion, 1-copy-serializability, which limits concurrency, and thus the scalability of the system. In this paper, we first analytically analyze the performance gains that can be achieved by various partial replication configurations, i.e., configurations where not all sites store all data. From there, we derive a partial replication protocol that provides 1-copy-snapshot isolation as its correctness criterion. We have evaluated the protocol with TPC-W and the results show better scalability than full replication.
High availability (HA) of the database is critical for the high availability of cloud-based applications and services. Master-slave replication has traditionally been used as a solution for this. Since master-slave replication is either asynchronous or semi-synchronous, the technique suffers from a severe data-inconsistency problem when the master crashes during a transaction. Modern cluster-based solutions address this through multi-master synchronous replication. These two HA database solutions have been investigated and compared both qualitatively and quantitatively. They are evaluated for availability and performance through an implementation using the most recent version of MariaDB server, which supports both traditional master-slave replication and cluster-based replication via Galera Cluster. The evaluation framework and methodology used in this paper are useful for comparing and analyzing the performance of different high-availability database systems and solutions, which in turn helps in picking an appropriate HA database solution for a given application.
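The inconsistency window described above can be illustrated with a toy model (class and function names are invented for illustration; real MariaDB semantics are far richer):

```python
class Node:
    """A database node holding an append-only transaction log."""
    def __init__(self):
        self.log = []

def commit_async(master, replicas, txn, crash_before_ship=False):
    """Asynchronous master-slave: commit locally, ship to slaves later."""
    master.log.append(txn)            # client sees the commit succeed here
    if crash_before_ship:
        return                        # master crashed: slaves never get txn
    for r in replicas:
        r.log.append(txn)

def commit_sync(nodes, txn):
    """Multi-master synchronous: commit only once every node has the txn."""
    for n in nodes:
        n.log.append(txn)
```

If the master crashes inside the asynchronous window, a promoted slave is missing a transaction the client believes was committed; the synchronous variant closes that window at the cost of commit latency.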
Proceedings of the 2009 EDBT/ICDT Workshops on - EDBT/ICDT '09, 2009
In distributed systems, replication is used to ensure availability and increase performance. However, the heavy workload of distributed systems such as Web 2.0 applications or Global Distribution Systems limits the benefit of replication if its degree (i.e., the number of replicas) is not controlled. Since every replica must eventually perform all updates, there is a point beyond which adding more replicas does not increase throughput, because every replica is saturated by applying updates. Moreover, if the replication degree exceeds the optimal threshold, the useless replicas generate overhead due to extra communication messages. In this paper, we propose a replication management solution that reduces the number of useless replicas. To this end, we define two mathematical models that approximate the number of replicas needed to achieve a given level of performance. Moreover, we demonstrate the feasibility of our replication management model through simulation. The results show the effectiveness of our models and their accuracy.
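The saturation argument can be sketched with a simple capacity model (a stand-in for the paper's actual models, which are not reproduced here; names and numbers are hypothetical):

```python
import math

def read_throughput(n, capacity, write_rate):
    """Total read throughput of n replicas when every replica applies all writes."""
    return n * max(capacity - write_rate, 0)

def min_replicas(read_demand, capacity, write_rate):
    """Smallest replica count sustaining read_demand, or None if writes saturate."""
    spare = capacity - write_rate
    if spare <= 0:
        return None                    # each replica saturated by updates alone
    return math.ceil(read_demand / spare)
```

With 300 ops/s of capacity per replica and 100 writes/s that every copy must apply, each replica has 200 ops/s to spare for reads; once the write rate reaches capacity, no number of replicas helps, which is exactly the saturation point the abstract describes.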
Proceedings 22nd International Conference on Distributed Computing Systems, 2002
The increasingly pervasive use of clusters makes replication a central element of modern information systems. Replication, however, must nowadays play a dual functionality: it must increase both the availability and the processing capacity of the application. Most existing data replication protocols cannot do this as they improve availability at the cost of scalability. In this paper we present a protocol that achieves this dual goal. The contribution is to demonstrate that data replication does not need to severely affect the overall scalability and that it can be efficiently implemented in a middleware layer. A key feature of this layer is that it only requires from the databases underneath, two commonly implemented services.
International Conference on Enterprise Information Systems, 2004
Abstract: Providing fault-tolerant services is a key question among many service manufacturers. Thus, enterprises usually acquire complex and expensive replication engines. This paper offers an interesting choice to organizations which cannot afford such costs. RJDBC stands for a simple, easy-to-install middleware, placed between the application and the database management system, intercepting all database operations and forwarding them among all the ...
2013
Eventual consistency improves the scalability of large datasets in cloud systems. We propose a novel technique for managing different levels of replica consistency in a replicated relational DBMS. To this end, data is partitioned and managed by a partial replication protocol that is able to define a hierarchy of nodes with a lazy update propagation. Nodes in different layers of the hierarchy may maintain different versions of their assigned partitions. Transactions are tagged with an allowance parameter k that specifies the maximum degree of data outdatedness tolerated by them. As a result, different degrees of transaction criticality can be set and non-critical transactions may be completed without blocking nor compromising the critical ones.
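One way to picture the allowance parameter k is as a freshness filter at routing time (a simplified sketch; identifiers are invented and the real protocol's version bookkeeping is per partition):

```python
def can_serve(node_version, latest_version, k):
    """True if the node's copy is at most k versions behind the latest."""
    return latest_version - node_version <= k

def route(k, nodes, latest_version):
    """Pick the first node (e.g. ordered by proximity) fresh enough for k."""
    for name, version in nodes:
        if can_serve(version, latest_version, k):
            return name
    return None   # no copy is fresh enough; fall back to the root layer
```

A critical transaction (k = 0) is forced up the hierarchy to an up-to-date node, while a non-critical one (k = 2) completes on a lagging lower-layer node without blocking the critical ones.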
IEEE Transactions on Knowledge and Data Engineering, 2020
With the advent of the Internet and Internet-connected devices, modern business applications can experience rapid increases as well as variability in transactional workloads. Database replication has been employed to scale performance and improve availability of relational databases but past approaches have suffered from various issues including limited scalability, performance versus consistency tradeoffs, and requirements for database or application modifications. This paper presents Hihooi, a replication-based middleware system that is able to achieve workload scalability, strong consistency guarantees, and elasticity for existing transactional databases at a low cost. A novel replication algorithm enables Hihooi to propagate database modifications asynchronously to all replicas at high speeds, while ensuring that all replicas are consistent. At the same time, a fine-grained routing algorithm is used to load balance incoming transactions to available replicas in a consistent way. Our thorough experimental evaluation with several well-established benchmarks shows how Hihooi is able to achieve almost linear workload scalability for transactional databases.
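The consistent-routing idea can be sketched as version-aware replica selection (a hypothetical simplification, not Hihooi's actual algorithm):

```python
def pick_replica(replicas, needed_version):
    """Return a replica whose applied version covers the client's last write.

    replicas       -- mapping of replica name to last applied version
    needed_version -- version the incoming transaction must observe
    """
    fresh = [name for name, applied in replicas.items()
             if applied >= needed_version]
    return min(fresh) if fresh else None   # None: route to the primary instead
```

Because writes propagate asynchronously, replicas lag by different amounts; routing each read to a replica that has already applied the client's writes preserves consistency while still spreading load.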
2012 41st International Conference on Parallel Processing, 2012
To avoid failure and achieve higher availability, replication schemes are now widely used in distributed cloud storage systems. However, most of them only statically replicate data on some randomly chosen nodes a fixed number of times, which is clearly not enough for reasonable resource allocation. Moreover, the query load of Web applications is highly irregular. This throws us into a dilemma: always maintain the maximum number of replicas in case of an explosive query-load outburst, or save resources with fewer replicas at the expense of performance. In this paper, we present a Resilient, Fault-tolerant and High-efficient global replication algorithm (RFH) for distributed cloud storage systems. RFH is especially efficient when facing the 'flash crowd' problem. Each data partition is represented by a virtual node. Each virtual node itself decides whether to replicate, migrate or suicide by weighing up the pros and cons. The decision is based on an evaluation of the traffic load of all nodes, and the node selects among the physical nodes with the most traffic (traffic hubs) to replicate or migrate onto. After that, it takes blocking probability into account to achieve quicker response and better load balance. Extensive simulations have been conducted, and the results demonstrate that the proposed scheme RFH outperforms the main existing algorithms (the request-oriented algorithms [16] [5], the owner-oriented algorithms [7] [11] [12] [13] and the random algorithms [4] [21] [22]) in terms of high replica utilization rate, high query efficiency and reasonable path length at a low cost while maintaining high availability.
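The per-virtual-node decision can be caricatured as threshold logic (thresholds and the three-way outcome are illustrative only; RFH's real cost-benefit evaluation also weighs blocking probability and traffic-hub selection):

```python
def decide(traffic, capacity, high=0.8, low=0.2):
    """Each virtual node weighs its own load and picks an action."""
    load = traffic / capacity
    if load > high:
        return "replicate"   # spawn a copy on a high-traffic physical node
    if load < low:
        return "suicide"     # surplus replica removes itself to save resources
    return "stay"
```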
Springer eBooks, 2007
Middleware database replication is a way to increase performance and tolerate failures of enterprise applications. Middleware architectures distinguish themselves by their performance, scalability and their application interface, on one hand, and the degree to which they guarantee replication consistency, on the other. Both groups of features may conflict since the latter comes with an overhead that bears on the former. In this paper, we review different techniques proposed to achieve and measure improvements of the performance, scalability and overhead introduced by different degrees of data consistency. We do so with a particular emphasis on the requirements of enterprise applications.
The Cloud is an increasingly popular platform for e-commerce applications that can be scaled on-demand in a very cost effective way. Dynamic provisioning is used to autonomously add capacity in multi-tier cloud-based applications that see workload increases. While many solutions exist to provision tiers with little or no state in applications, the database tier remains problematic for dynamic provisioning due to the need to replicate its large disk state.
As we delve deeper into the ‘Digital Age’, we witness explosive growth in the volume, velocity, and variety of the data available on the Internet. For example, in 2012 about 2.5 quintillion bytes of data were created on a daily basis, originating from a myriad of sources and applications including mobile devices, sensors, individual archives, social networks, the Internet of Things, enterprises, cameras, software logs, etc. Such a ‘data explosion’ has led to one of the most challenging research issues of the current Information and Communication Technology era: how to optimally manage (e.g., store, replicate, filter, and the like) such large amounts of data and identify new ways to analyze them for unlocking information. It is clear that such large data streams cannot be managed by setting up on-premises enterprise database systems, as this leads to a large up-front cost in buying and administering the hardware and software. Therefore, next-generation data management systems must be deployed in the cloud. The cloud computing paradigm provides scalable and elastic resources, such as data and services accessible over the Internet. Every cloud service provider must ensure that data is efficiently processed and distributed in a way that does not compromise end-users’ Quality of Service (QoS) in terms of data availability, data search delay, data analysis delay, and the like. From this perspective, data replication is used in the cloud to improve the performance (e.g., read and write delay) of applications that access data. Through replication, a data-intensive application or system can achieve high availability, better fault tolerance, and data recovery. In this paper, we survey data management and replication approaches (from 2007 to 2011) developed by both industrial and research communities.
The focus of the survey is to discuss and characterize the existing approaches to data replication and management that tackle resource usage and QoS provisioning with different levels of efficiency. Moreover, the breakdown of both key concepts (data replication and management) into the QoS attributes they provide is deliberated. Furthermore, the performance advantages and disadvantages of data replication and management approaches in cloud computing environments are analyzed. Open issues and future challenges related to data consistency, scalability, load balancing, processing and placement are also reported.
Data availability is one of the key requirements for a cloud storage system. Data replication has been widely used as a means of increasing availability in traditional distributed databases, peer-to-peer (P2P) systems, and grid systems. These strategies were all developed for specific platforms and application types and have been tailored to the characteristics of the underlying system architectures and application requirements. Cloud systems differ from these previous frameworks in that they are designed to support large numbers of customer-oriented applications, each with different quality of service (QoS) requirements and resource consumption characteristics. Aiming to provide high data availability and improve the performance and load balancing of cloud storage, an efficient replication management scheme is proposed. The system determines the optimum replica number as well as weighting and balancing among the storage server nodes, and experimental results show that the proposed system can improve data availability depending on the expected availability and failure probability of each node in a PC cluster.
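The relationship between per-node failure probability and the replica count needed for a target availability follows directly from independence (a standard calculation, not the paper's specific scheme):

```python
import math

def availability(p_fail, n):
    """Probability that at least one of n independent replicas is up."""
    return 1.0 - p_fail ** n

def replicas_needed(p_fail, target):
    """Smallest n with availability(p_fail, n) >= target."""
    return math.ceil(math.log(1.0 - target) / math.log(p_fail))
```

With nodes that are each down half the time, four replicas already give 93.75% availability; more reliable nodes need far fewer copies, which is why such a scheme weighs each node's failure probability when choosing the replica number.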
2007
Abstract Replication is attractive for scaling databases up, as it does not require costly equipment and it enables fault tolerance. However, as the latency gap between local and remote accesses continues to widen, maintaining consistency between replicas remains a performance and complexity bottleneck. Optimistic replication (OR) addresses these problems.
International Journal of Information Systems and Social Change, 2017
This paper presents a survey of data replication strategies in cloud systems. Based on the survey and reviews of existing classifications, we propose another classification of replication strategies along the following five dimensions: (i) static vs. dynamic, (ii) reactive vs. proactive workload balancing, (iii) provider- vs. customer-centric, (iv) optimal number vs. dynamic adjustment of the replica factor and (v) objective function. Ideally, a good replication strategy must simultaneously consider multiple criteria: (i) the reduction of access time, (ii) the reduction of bandwidth consumption, (iii) storage resource availability, (iv) a balanced workload between replicas and (v) a strategic placement algorithm including an adjusted number of replicas. Therefore, selecting a data replication strategy is a classic multiple-criteria decision-making problem. The taxonomy we present can be a useful guideline for IT managers to select a data replication strategy ...
Journal of Information Technology Research
Cloud computing has risen as a new computing paradigm providing computing, networking, and storage resources as a service across the network. Data replication, which brings available and reliable data (e.g., databases) nearer to the consumers (e.g., cloud applications) to overcome bottlenecks, is becoming a suitable solution. In this paper, the authors study the performance characteristics of a replicated database in cloud computing data centres, which improves QoS by reducing communication delays. They formulate a theoretical queueing model of the replicated system, modelling the arrival process as a Poisson distribution for both types of client request, read and write. They solve the proposed model with the help of a recursive method, and the relevant performance metrics are derived. The results from both the mathematical model and extensive simulations help to unveil the performance and guide the clou...
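A minimal M/M/1-style sketch of read latency in such a system (an illustration of the queueing viewpoint, not the paper's recursive solution; reads are assumed evenly split across replicas while every replica applies all writes):

```python
def read_response_time(read_rate, write_rate, service_rate, n):
    """Mean response time at one of n replicas (Poisson arrivals, exp. service)."""
    arrival = read_rate / n + write_rate   # this replica's share of the load
    if arrival >= service_rate:
        return float("inf")                # replica is saturated
    return 1.0 / (service_rate - arrival)
```

Adding replicas shrinks only the read share of the arrival rate, so latency improves until the write stream alone approaches the service rate.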
Abstract Recently, third party solutions for database replication have been enjoying an increasing popularity. Such proposals address a diversity of user requirements, namely preventing conflicting updates without the overhead of synchronous replication; clustering for scalability and availability; and heterogeneous replicas for specialized queries.