Lecture 6
Database Replication
Abdulmottaleb Elabour
[email protected]
1
What is Replication?
• Replication is a set of technologies for copying and distributing data and database
objects from one database to another and then synchronizing between databases
to maintain consistency.
• Use replication to distribute data to different locations and to remote or mobile
users over local and wide area networks wireless connections, and the Internet.
2
3
4
terminology used in SQL Server Replication.
Article
•An article is an object that can be replicated. These include tables, views, stored procedures, and
user defined functions.
Publication
•A publication is a collection of articles that exist on a publisher.
Publisher
•A publisher is a SQL Server instance that has publications which contain articles to be published to
subscribers.
Subscriber
•A subscriber subscribes to one or more publications, it will receive each article in the publication.
Distributor
•The distributor collects articles from the publisher and delivers them to the subscribers.
5
Types of SQL Server Replication
• SQL Server Snapshot Replication
• SQL Server Transactional Replication
• SQL Server Merge Replication
• Peer-to-Peer - Transactional Replication
6
SQL Server Snapshot Replication
• It does this by running something called the Snapshot agent, this can be on a schedule or it can be run
on demand.
• The snapshot agent locks the table and takes a snapshot which it stores in a snapshot folder, then it
updates the distributor with details of the snapshot.
• The distributor delivers the snapshot to the subscribers. This is useful for tables that don’t update very
often so you could schedule the snapshot to be taken daily, weekly, monthly, etc. For example, a
quarterly bonus table, in this scenario it might be worthwhile using snapshot replication to pull over
the infrequent changes to the subscribers.
Pros
•Simple to set up, low maintenance, doesn’t need a primary key on the table.
Cons
•High impact when the snapshot agent runs, high latency, modifications on the subscriber would be lost when the
new snapshot is delivered. 7
SQL Server Snapshot Replication
8
SQL Server Transactional Replication
• Transactional replication, again as the name suggests, delivers transactions.
• Unlike the snapshot agent which takes a snapshot of the whole article (table, view, etc), in
transactional replication we have a log reader agent, it reads the logs and looks for transactions
which are marked for publication, it delivers these transactions to the distributor.
• The distributor then delivers the transactions to each subscriber, in transactional replication,
there are 2 types of subscription for a subscriber, a push subscription or a pull subscription.
• In a push subscription, the agent job runs on the distributor to deliver transactions to the
subscriber.
• In a pull subscription, the agent job lives on the subscriber and pulls from the distributor.
• This is useful when the data needs to be as close to the publisher as possible on the subscribers
as transactions can be delivered within a few seconds, this way the subscribers can serve
multiple purposes without anyone needing to access the publisher.
9
SQL Server Transactional Replication
10
SQL Server Transactional Replication
Pros
•Great for highly transactional data, low latency, make modifications on subscribers that don’t get sent to the
publisher and won’t be overwritten e.g. you could add a column.
Cons
•Slightly more work to set up, more difficult to maintain, easily broken e.g. if someone inserted a row on the
subscriber to create a PK violation.
11
SQL Server Merge Replication
Merge replication is similar to transaction replication; however, it allows updates at the subscribers to be merged
with the publisher.
To track the rows, it adds a unique identifier column to your table. Subscribers can be offline and the changes are
tracked, once the subscriber is online it can send the changes to the publisher. If multiple subscribers are updating
the same rows this can cause conflicts which need to be managed. The typical scenario where this type of
replication is useful is sales agents using tablets in the field that can take sales whilst offline and merge the
changes when they get back online.
Pros
•Allows subscribers to update records, lets offline subscribers make data changes.
Cons
•More complicated to set up and maintain, creates a unique identifier column on your table, creates conflicts which
need to be managed.
12
Peer-to-Peer - Transactional Replication
• Peer-to-Peer replication is built on transactional replication. It maintains transactional replication between servers
and allows publishers and subscribers to send data to each other. Thus participating servers are updated near real-
time based on the scheduled frequency.
• Since we know the functionality of transactional replication and the role of the publisher, distributor and subscriber,
In the peer-to-peer transactional replication, each node acts as publisher and subscriber to one another.
• When data are inserted/Updated/Deleted in any of the servers (node) the other one is updated through its publisher
and subscriber.
13
Peer-to-Peer - Transactional Replication
14
Peer-to-Peer - Transactional Replication
Pros
• Read performance can be improved significantly through distributing load among the servers since the peer-to-peer
replication topology helps to have the same data set in multiple locations.
• Conflict detection is available with SQL Server 2008, so most of the issues can be resolved with minimal effort.
• All the participating servers in the P2P setup can be used for all types of activities since the servers have the same
data. If one server is down, the other nodes continue to serve the user requests. So a single point of failure issue can
be overcome.
• The peer-to-peer setup can be made as a high availability solution.
• There are administration enhancements with SQL server 2008 like nodes can be added or removed without
disturbing the existing setup.
• Schema changes can be done while database is online in SQL Server 2008.
• Replication Monitor options are enhanced and Replication Agent related issues are addressed with more specific
error messages in SQL Server 2008.
15
Peer-to-Peer - Transactional Replication
Cons:
• In the p2p setup, the write operation needs to be maintained at only one node otherwise it may lead to data
conflicts.
• Performance may be an issue when there more participating servers.
• Row and Column filter options are not available.
• Participating databases must have identical schema and data.
• Identity column usage may need manual intervention.
16
Peer-to-Peer and Ring Topology
The peer-to-peer transactional replication setup can form a ring topology. The below image shows three different
database locations with peer-to-peer transactional replication. The locations are replicating data respectively from
Location1 <- -> Location 2, Location2 <- -> Location3 and Location3 <- -> Location1. If any one of the locations is
down, the other locations can still stay synchronized, because each node acts as a publisher and a subscriber.
17
Some Considerations When Using Replication
Using SQL Server replication involves careful planning to ensure smooth operation, maintain data consistency, and
meet performance expectations. Here are some key considerations:
• Replication Type:
1. Snapshot Replication: Best for relatively static data or small datasets. It periodically copies entire datasets.
2. Transactional Replication: Ideal for real-time, high-frequency updates and systems requiring data consistency.
3. Merge Replication: Suitable for environments with occasional disconnections or conflicts (e.g., mobile
applications).
4. Peer-to-Peer Replication: Suitable when multiple geographically dispersed nodes need to handle read and write
operations simultaneously, it Helps reduce latency by routing user requests to the nearest node, also it is Useful
for systems where there is no central database, and each site needs autonomy while staying synchronized.
18
Some Considerations When Using Replication
• Network and Bandwidth:
1. Ensure sufficient network capacity to handle the data changes and replication traffic.
2. High latency or unreliable networks may impact the performance of transactional and merge replication.
• Data Volume and Schema:
1. Large datasets may impact performance during initialization.
2. Be mindful of schema changes; some alterations may require reinitialization
• Performance Optimization:
1. Monitor performance impact on the publisher and subscribers.
2. Use filtered publications to replicate only necessary data.
3. Index replicated tables appropriately.
19
Some Considerations When Using Replication
• Database design
• Each table to be replicated must have a primary key because SQL Server uses it to uniquely identify rows.
• Without a primary key, replication cannot track changes or enforce constraints.
• Indexes on views can improve replication performance, especially for filtered publications.
• Triggers can cause unexpected side effects during replication, such as duplicate data or additional load on the
server.
• If triggers are essential, ensure they are carefully tested in a replicated environment.
• Schema changes (e.g., adding or altering columns) can force reinitialization of replication, which is resource-
intensive.
• Auto-increment (IDENTITY) columns may lead to conflicts in merge replication.
• Use manual key management or GUIDs for unique identifiers in environments with multiple data writers.
• Ensure the same collation is used across the publisher, distributor, and subscribers to avoid inconsistencies during
data replication.
20
Some Considerations When Using Replication
• Using GUIDs (Globally Unique Identifiers) as primary keys in all tables during replication can be
beneficial in certain scenarios, but it also has drawbacks.
Advantages of Using GUIDs in Replication
• Uniqueness Across Databases:
GUIDs ensure globally unique values, making them ideal for scenarios like merge replication or peer-to-peer
replication, where multiple nodes can independently generate IDs.
• Conflict Avoidance:
With GUIDs, there is no risk of primary key conflicts when data is merged from different sources, even if multiple
nodes insert records simultaneously.
• Simplified Key Management:
Unlike IDENTITY columns, GUIDs don’t require centralized management or seeding to avoid duplication across
nodes.
• Supports Distributed Systems:
GUIDs are well-suited for distributed systems where new records may be created on multiple nodes or
subscribers. 21
Some Considerations When Using Replication
Disadvantages of Using GUIDs
• Storage Overhead:
GUIDs require 16 bytes (128 bits), compared to 4 bytes for INT or 8 bytes for BIGINT. This increases the size of the
database and indexes.
• Index Fragmentation:
• GUIDs are not sequential by nature, causing index fragmentation, slower query performance, and larger index sizes.
• Sequential GUIDs (e.g., NEWSEQUENTIALID() in SQL Server) can mitigate this issue but still don't guarantee strict
sequential ordering across distributed systems.
• Reduced Read/Write Performance:
Due to their size and randomness, GUIDs can slow down database operations compared to compact, sequential
numeric keys.
22
Some Considerations When Using Replication
Disadvantages of Using GUIDs
• Human Readability:
GUIDs are not human-readable, which can make debugging, logging, or manual record tracking harder compared to
simple numeric IDs.
23