
Commit 164738b

feat: enable dry-run mode for HBase to Cloud Bigtable replication (#3532)
* Initial draft for dry-run
* Adding integ tests for Dry-run mode.
* Updating readme.md to enable/disable dry run mode.
* Tuning logging levels and updating readme.md
* Add custom user agent to identify HBase replication writes.
* Incorporating PR feedback
1 parent 7dd7653 commit 164738b

16 files changed, +974 -143 lines changed


hbase-migration-tools/bigtable-hbase-replication/README.md

Lines changed: 54 additions & 19 deletions
@@ -177,25 +177,7 @@ to configure HBase to Cloud Bigtable replication:
 in HBase shell. The metrics for CBT replication will be under the “peer_id”
 used in the previous step.
 
-## Error handling
-
-HBase has push based replication. Each region server reads the WAL entries and
-passes them to each replication endpoint. If the replication endpoint fails to
-apply WAL logs, the WAL will accumulate on HBase regions servers.
-
-If a Bigtable cluster is temporarily unavailable, the WAL logs will accumulate
-on region servers. Once the cluster becomes available again, replication can
-continue.
-
-For any non-retryable error, like non-existent column-family, replication will
-pause and WAL logs will build-up. Users should monitor & alert on replication
-progress
-via [HBase replication monitoring](https://hbase.apache.org/book.html#_monitoring_replication_status)
-. The replication library can not skip a replication entry as a single WAL entry
-represents an atomic transaction. Skipping a message will result in divergence
-between source and target tables.
-
-### Incompatible Mutations
+## Incompatible Mutations
 
 Certain HBase delete APIs
 are [not supported on CBT](https://cloud.google.com/bigtable/docs/hbase-differences#mutations_and_deletions)
@@ -232,6 +214,58 @@ refer
 to [IncompatibleMutationAdapter](bigtable-hbase-replication-core/src/main/java/com/google/cloud/bigtable/hbase/replication/adapters/IncompatibleMutationAdapter.java)
 javadocs for more details.
 
+### Dry run mode
+
+It may be hard to determine if an application issues incompatible mutations,
+especially if the HBase cluster and application are owned by different teams.
+The replication library provides a dry-run mode to detect incompatible
+mutations. In dry run mode, replication library checks the mutations for
+incompatibility and never sends them to Cloud Bigtable. All the incompatible
+mutations are logged. If you are not sure about incompatible mutations, enable
+replication in the dry run mode and observe the incompatible mutation metrics (
+discussed below).
+
+You should make sure that all the [prerequisites](#prerequisites) are fulfilled
+before enabling the dry run mode. Dry run mode can be enabled by setting the
+property `google.bigtable.replication.enable_dry_run` to true. It can be set
+in `hbase-site.xml` but we recommend setting it during peer creation.
+Enabling/disabling dry run mode during peer creation can avoid restarting the
+HBase cluster to pickup changes to `hbase-site.xml` file. Enable dry run mode by
+running the following command to add Cloud Bigtable replication peer (please
+change the endpoint class for HBase 1.x):
+
+```
+add_peer 'peer_id',
+ENDPOINT_CLASSNAME=>'com.google.cloud.bigtable.hbase2_x.replication.HbaseToCloudBigtableReplicationEndpoint',
+CONFIG=>{'google.bigtable.replication.enable_dry_run' => 'true' }
+```
+
+When you are ready to enable replication to Cloud Bigtable, delete this peer and
+create a new peer in normal mode (**do not** try to update the "dry-run" peer):
+
+```
+remove_peer 'peer_id'
+add_peer 'new_peer_id', ENDPOINT_CLASSNAME=>'com.google.cloud.bigtable.hbase2_x.replication.HbaseToCloudBigtableReplicationEndpoint'
+```
+
+## Error handling
+
+HBase has push based replication. Each region server reads the WAL entries and
+passes them to each replication endpoint. If the replication endpoint fails to
+apply WAL logs, the WAL will accumulate on HBase regions servers.
+
+If a Bigtable cluster is temporarily unavailable, the WAL logs will accumulate
+on region servers. Once the cluster becomes available again, replication can
+continue.
+
+For any non-retryable error, like non-existent column-family, replication will
+pause and WAL logs will build-up. Users should monitor & alert on replication
+progress
+via [HBase replication monitoring](https://hbase.apache.org/book.html#_monitoring_replication_status)
+. The replication library can not skip a replication entry as a single WAL entry
+represents an atomic transaction. Skipping a message will result in divergence
+between source and target tables.
+
 ## Monitoring
 
 The replication library will emit the metrics into HBase metric ecosystem. There
@@ -249,6 +283,7 @@ are 3 kinds of metrics that the replication library will publish:
 
 Please refer to javadocs for class HBaseToCloudBigtableReplicationMetrics for
 list of available metrics.
+
 ## Troubleshooting
 
 ### Replication stalling
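
Reviewer note on the README change: the new "Dry run mode" section says the library checks each mutation for incompatibility, logs the incompatible ones, and never sends anything to Cloud Bigtable. The sketch below is one way to picture that gating, under stated assumptions. `DryRunGate`, its string-typed mutations, and the `unsupported-delete` label are hypothetical stand-ins for illustration; they are not the library's classes, and the sketch does not model how the real adapter handles incompatible mutations in normal mode.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

/** Hypothetical model of dry-run gating; none of these names come from the library. */
final class DryRunGate {
  private final boolean dryRun;
  private final Predicate<String> incompatible; // stand-in for the real compatibility check
  private final Consumer<String> writer; // stand-in for the write to Cloud Bigtable
  private int incompatibleCount;

  DryRunGate(boolean dryRun, Predicate<String> incompatible, Consumer<String> writer) {
    this.dryRun = dryRun;
    this.incompatible = incompatible;
    this.writer = writer;
  }

  /** Checks every mutation; forwards compatible ones only when dry run is off. */
  void replicate(List<String> mutations) {
    for (String mutation : mutations) {
      if (incompatible.test(mutation)) {
        incompatibleCount++;
        System.out.println("incompatible mutation: " + mutation);
        continue; // assumption: the sketch simply skips these in both modes
      }
      if (!dryRun) {
        writer.accept(mutation);
      }
    }
  }

  int incompatibleCount() {
    return incompatibleCount;
  }

  public static void main(String[] args) {
    List<String> written = new ArrayList<>();
    DryRunGate gate =
        new DryRunGate(true, m -> m.startsWith("unsupported-delete"), written::add);
    gate.replicate(Arrays.asList("put:row1", "unsupported-delete:row2"));
    System.out.println(written.size() + " written, " + gate.incompatibleCount() + " incompatible");
  }
}
```

The point of the shape is that the compatibility check runs regardless of mode, so the counts observed during a dry run should be representative of what normal mode would encounter.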

hbase-migration-tools/bigtable-hbase-replication/bigtable-hbase-1.x-replication/src/main/java/com/google/cloud/bigtable/hbase1_x/replication/HbaseToCloudBigtableReplicationEndpoint.java

Lines changed: 0 additions & 5 deletions
@@ -27,16 +27,11 @@
 import java.util.UUID;
 import org.apache.hadoop.hbase.replication.BaseReplicationEndpoint;
 import org.apache.hadoop.hbase.wal.WAL;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
 
 /** Basic endpoint that listens to CDC from HBase 1.x and replicates to Cloud Bigtable. */
 @InternalExtensionOnly
 public class HbaseToCloudBigtableReplicationEndpoint extends BaseReplicationEndpoint {
 
-  private static final Logger LOG =
-      LoggerFactory.getLogger(HbaseToCloudBigtableReplicationEndpoint.class);
-
   private final CloudBigtableReplicator cloudBigtableReplicator;
   private final HBaseMetricsExporter metricsExporter;
 
@@ -0,0 +1,142 @@
+/*
+ * Copyright 2022 Google LLC
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package com.google.cloud.bigtable.hbase1_x.replication;
+
+import static org.junit.Assert.assertFalse;
+
+import com.google.cloud.bigtable.emulator.v2.BigtableEmulatorRule;
+import com.google.cloud.bigtable.hbase.BigtableConfiguration;
+import com.google.cloud.bigtable.hbase.replication.configuration.HBaseToCloudBigtableReplicationConfiguration;
+import com.google.cloud.bigtable.hbase.replication.utils.TestUtils;
+import java.io.IOException;
+import java.util.UUID;
+import org.apache.hadoop.conf.Configuration;
+import org.apache.hadoop.hbase.HBaseTestingUtility;
+import org.apache.hadoop.hbase.HColumnDescriptor;
+import org.apache.hadoop.hbase.HConstants;
+import org.apache.hadoop.hbase.HTableDescriptor;
+import org.apache.hadoop.hbase.TableName;
+import org.apache.hadoop.hbase.client.Connection;
+import org.apache.hadoop.hbase.client.Put;
+import org.apache.hadoop.hbase.client.ResultScanner;
+import org.apache.hadoop.hbase.client.Scan;
+import org.apache.hadoop.hbase.client.Table;
+import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;
+import org.apache.hadoop.hbase.replication.ReplicationException;
+import org.apache.hadoop.hbase.replication.ReplicationPeerConfig;
+import org.junit.AfterClass;
+import org.junit.Before;
+import org.junit.BeforeClass;
+import org.junit.ClassRule;
+import org.junit.Test;
+import org.junit.runner.RunWith;
+import org.junit.runners.JUnit4;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+@RunWith(JUnit4.class)
+public class HbaseToCloudBigtableReplicationEndpointDryRunTest {
+
+  private static final Logger LOG =
+      LoggerFactory.getLogger(HbaseToCloudBigtableReplicationEndpointDryRunTest.class);
+
+  private static HBaseTestingUtility hbaseTestingUtil = new HBaseTestingUtility();
+  private static ReplicationAdmin replicationAdmin;
+
+  @ClassRule
+  public static final BigtableEmulatorRule bigtableEmulator = BigtableEmulatorRule.create();
+
+  private static Connection cbtConnection;
+  private static Connection hbaseConnection;
+
+  private Table hbaseTable;
+  private Table cbtTable;
+
+  @BeforeClass
+  public static void setUpCluster() throws Exception {
+    // Prepare HBase mini cluster configuration
+    Configuration conf = hbaseTestingUtil.getConfiguration();
+
+    // Set CBT related configs.
+    conf.set("google.bigtable.instance.id", "test-instance");
+    conf.set("google.bigtable.project.id", "test-project");
+    // This config will connect Replication endpoint to the emulator and not the prod CBT.
+    conf.set("google.bigtable.emulator.endpoint.host", "localhost:" + bigtableEmulator.getPort());
+    conf.setBoolean(HBaseToCloudBigtableReplicationConfiguration.ENABLE_DRY_RUN_MODE_KEY, true);
+
+    hbaseTestingUtil.startMiniCluster(2);
+    replicationAdmin = new ReplicationAdmin(hbaseTestingUtil.getConfiguration());
+
+    cbtConnection = BigtableConfiguration.connect(conf);
+    hbaseConnection = hbaseTestingUtil.getConnection();
+
+    // Setup Replication in HBase mini cluster
+    ReplicationPeerConfig peerConfig = new ReplicationPeerConfig();
+    peerConfig.setReplicationEndpointImpl(
+        HbaseToCloudBigtableReplicationEndpoint.class.getTypeName());
+    // Cluster key is required, we don't really have a clusterKey for CBT.
+    peerConfig.setClusterKey(hbaseTestingUtil.getClusterKey());
+    replicationAdmin.addPeer("cbt", peerConfig);
+
+    LOG.info("#################### SETUP COMPLETE ##############################");
+  }
+
+  @AfterClass
+  public static void tearDown() throws Exception {
+    cbtConnection.close();
+    hbaseConnection.close();
+    replicationAdmin.close();
+    hbaseTestingUtil.shutdownMiniCluster();
+  }
+
+  @Before
+  public void setupTestCase() throws IOException {
+    // Create and set the empty tables
+    TableName table1 = TableName.valueOf(UUID.randomUUID().toString());
+    createTables(table1);
+
+    cbtTable = cbtConnection.getTable(table1);
+    hbaseTable = hbaseConnection.getTable(table1);
+  }
+
+  private void createTables(TableName tableName) throws IOException {
+    // Create table in HBase
+    HTableDescriptor htd = hbaseTestingUtil.createTableDescriptor(tableName.getNameAsString());
+    HColumnDescriptor cf1 = new HColumnDescriptor(TestUtils.CF1);
+    htd.addFamily(cf1);
+
+    // Enables replication to all peers, including CBT
+    cf1.setScope(HConstants.REPLICATION_SCOPE_GLOBAL);
+    hbaseTestingUtil.getHBaseAdmin().createTable(htd);
+
+    cbtConnection.getAdmin().createTable(htd);
+  }
+
+  @Test
+  public void testDryRunDoesNotReplicateToCloudBigtable()
+      throws IOException, InterruptedException, ReplicationException {
+    Put put = new Put(TestUtils.ROW_KEY);
+    put.addColumn(TestUtils.CF1, TestUtils.COL_QUALIFIER, 0, TestUtils.VALUE);
+    hbaseTable.put(put);
+
+    // Give enough time for replication to catch up. Nothing should be replicated as its dry-run
+    Thread.sleep(3000);
+
+    ResultScanner cbtScanner = cbtTable.getScanner(new Scan());
+    assertFalse(cbtScanner.iterator().hasNext());
+  }
+}
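
Reviewer note on the new test: it waits a fixed `Thread.sleep(3000)` and then asserts that the Cloud Bigtable table is empty. A negative assertion like this can never be fully conclusive, but polling during the window lets the test fail as soon as a row shows up instead of only checking once at the end. Below is a minimal sketch of such a helper using only the JDK; `QuietPeriod` and `staysFalse` are hypothetical names, not part of `TestUtils` or this commit.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical test helper: asserts that something does NOT happen within a time window. */
final class QuietPeriod {
  private QuietPeriod() {}

  /**
   * Polls the condition until the window elapses. Returns true if the condition stayed false
   * for the whole window, or false as soon as it is observed to be true.
   */
  static boolean staysFalse(BooleanSupplier condition, long windowMillis, long pollMillis) {
    long deadline = System.nanoTime() + windowMillis * 1_000_000L;
    while (System.nanoTime() - deadline < 0) {
      if (condition.getAsBoolean()) {
        return false;
      }
      try {
        Thread.sleep(pollMillis);
      } catch (InterruptedException e) {
        // Restore the interrupt flag and surface the interruption to the caller.
        Thread.currentThread().interrupt();
        throw new IllegalStateException("interrupted while waiting", e);
      }
    }
    // One last check at the end of the window.
    return !condition.getAsBoolean();
  }
}
```

In the test above this could be used as `assertTrue(QuietPeriod.staysFalse(() -> hasRows(cbtTable), 3000, 200))`, where `hasRows` is an assumed helper that opens the `ResultScanner` in a try-with-resources block so the scanner is also closed after each check.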

hbase-migration-tools/bigtable-hbase-replication/bigtable-hbase-1.x-replication/src/test/java/com/google/cloud/bigtable/hbase1_x/replication/HbaseToCloudBigtableReplicationEndpointTest.java

Lines changed: 2 additions & 10 deletions
@@ -39,12 +39,10 @@
 import org.apache.hadoop.hbase.client.Put;
 import org.apache.hadoop.hbase.client.Table;
 import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;
-import org.apache.hadoop.hbase.ipc.RpcServer;
 import org.apache.hadoop.hbase.replication.BaseReplicationEndpoint;
 import org.apache.hadoop.hbase.replication.ReplicationException;
 import org.apache.hadoop.hbase.replication.ReplicationPeerConfig;
 import org.apache.hadoop.hbase.util.Bytes;
-import org.apache.hadoop.hbase.util.ServerRegionReplicaUtil;
 import org.junit.AfterClass;
 import org.junit.Assert;
 import org.junit.Before;
@@ -111,7 +109,6 @@ public boolean replicate(ReplicateContext replicateContext) {
       LoggerFactory.getLogger(HbaseToCloudBigtableReplicationEndpointTest.class);
 
   private static HBaseTestingUtility hbaseTestingUtil = new HBaseTestingUtility();
-  private static Configuration hbaseConfig;
   private static ReplicationAdmin replicationAdmin;
 
   @ClassRule
@@ -129,9 +126,6 @@ public boolean replicate(ReplicateContext replicateContext) {
   public static void setUpCluster() throws Exception {
     // Prepare HBase mini cluster configuration
     Configuration conf = hbaseTestingUtil.getConfiguration();
-    conf.setBoolean(HConstants.REPLICATION_ENABLE_KEY, true);
-    conf.setBoolean(ServerRegionReplicaUtil.REGION_REPLICA_REPLICATION_CONF_KEY, true);
-    conf.setInt(HConstants.HBASE_CLIENT_RETRIES_NUMBER, 5); // less number of retries is needed
 
     // Set CBT related configs.
     conf.set("google.bigtable.instance.id", "test-instance");
@@ -140,8 +134,6 @@ public static void setUpCluster() throws Exception {
     conf.set("google.bigtable.emulator.endpoint.host", "localhost:" + bigtableEmulator.getPort());
 
     hbaseTestingUtil.startMiniCluster(2);
-    hbaseConfig = conf;
-    hbaseConfig.setLong(RpcServer.MAX_REQUEST_SIZE, 102400);
     replicationAdmin = new ReplicationAdmin(hbaseTestingUtil.getConfiguration());
 
     cbtConnection = BigtableConfiguration.connect(conf);
@@ -361,8 +353,8 @@ public void testMultiTableMultiColumnFamilyReplication()
           return TestReplicationEndpoint.replicatedEntries.get() >= 16;
         });
     TestUtils.assertTableEventuallyEquals(
-        hbaseTable,
-        cbtTable,
+        hbaseTable2,
+        cbtTable2,
         () -> {
           return TestReplicationEndpoint.replicatedEntries.get() >= 16;
         });

hbase-migration-tools/bigtable-hbase-replication/bigtable-hbase-2.x-replication/src/main/java/com/google/cloud/bigtable/hbase2_x/replication/HbaseToCloudBigtableReplicationEndpoint.java

Lines changed: 0 additions & 5 deletions
@@ -28,16 +28,11 @@
 import java.util.UUID;
 import org.apache.hadoop.hbase.replication.BaseReplicationEndpoint;
 import org.apache.hadoop.hbase.wal.WAL;
-import org.slf4j.Logger;
-import org.slf4j.LoggerFactory;
 
 // TODO(remove BaseReplicationEndpoint extension).
 @InternalExtensionOnly
 public class HbaseToCloudBigtableReplicationEndpoint extends BaseReplicationEndpoint {
 
-  private static final Logger LOG =
-      LoggerFactory.getLogger(HbaseToCloudBigtableReplicationEndpoint.class);
-
   private final CloudBigtableReplicator cloudBigtableReplicator;
   private final HBaseMetricsExporter metricsExporter;
 