Skip to content
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Original file line number Diff line number Diff line change
@@ -0,0 +1,72 @@
<?xml version="1.0" encoding="UTF-8"?>
<!--
Copyright 2021 Google LCC
Copy link
Copy Markdown
Contributor

@kolea2 kolea2 Oct 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<parent>
<groupId>com.google.cloud.bigtable</groupId>
<artifactId>bigtable-hbase-mirroring-client-1.x-parent</artifactId>
<version>0.0.1-alpha1-SNAPSHOT</version>
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets keep the versioning consistent with the other packages. Typically, we have the same version number for all the artifacts in this repo.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in next PRs.

</parent>

<artifactId>bigtable-hbase-mirroring-client-1.x</artifactId>
<packaging>jar</packaging>
<name>${project.groupId}:${project.artifactId}</name>
<version>0.0.1-alpha1-SNAPSHOT</version>
<description>
</description>

<dependencies>
<dependency>
<groupId>org.apache.hbase</groupId>
<artifactId>hbase-shaded-client</artifactId>
<version>${hbase1.version}</version>
</dependency>

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will depend on bigtable-hbase-1x-hadoop?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We wanted to keep Mirroring Client unrelated to bigtable-hbase client, we are not using it directly anyways and we do not need this dependency. Users should load jars with libraries they want to use, we do not care as long as they implement HBase API (as does bigtable-hbase client).

<!-- Test dependencies-->
<dependency>
<groupId>io.dropwizard.metrics</groupId>
<artifactId>metrics-graphite</artifactId>
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is graphite needed for testing?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll review all our dependencies and fix it in later PRs.

<version>${hbase1-metrics.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>${junit.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.google.truth</groupId>
<artifactId>truth</artifactId>
<version>1.1.2</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>com.google.api</groupId>
<artifactId>api-common</artifactId>
<version>1.10.4</version>
<scope>compile</scope>
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a bit surprising to see a compile scope under a test dependency comment section

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll review all our dependencies and fix it in later PRs.

</dependency>
<dependency>
<groupId>org.mockito</groupId>
<artifactId>mockito-core</artifactId>
<version>3.8.0</version>
<scope>test</scope>
</dependency>
</dependencies>
</project>
Original file line number Diff line number Diff line change
@@ -0,0 +1,92 @@
/*
* Copyright 2021 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.google.cloud.bigtable.mirroring.hbase1_x;

import com.google.api.core.InternalApi;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.ListeningExecutorService;
import java.util.List;
import java.util.concurrent.Callable;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.Table;

/**
* MirroringClient verifies consistency between two databases asynchronously - after the results are
* delivered to the user. HBase Table object does not have an synchronous API, so we simulate it by
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you mean async?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will fix in future PR.

* wrapping the regular Table into AsyncTableWrapper.
*
* <p>Table instances are not thread-safe, every operation is synchronized to prevent concurrent
* accesses to the table from different threads in the executor.
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a little concerned about this approach:
if the service backing this is down or slow, won't the work queue grow unbounded w/o any backpressure?

It's also quite inefficient: you are tying up a thread, which will wait on a mutex which will be populated by an async bigtable client. In the future we might want to special this for Bigtable and use the async apis. Would be good to note it

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also please update the doc and mention that this is ok because this is only used by HBase Table implementations and those are not threadsafe (maybe reference the javadocs for Connection): https://hbase.apache.org/2.0/apidocs/org/apache/hadoop/hbase/client/Connection.html

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have (not present in this PR) a FlowController that limits number of in-flight requests and amount of memory used by the asynchronous calls. If available resources are depleted then the library will block user code calling any of API methods until resources are available.

We are aware of the inefficiency, we wanted to keep the code simple. We can add a special case for bigtable as a secondary or make it more general - add a task queue per table and use a (possibly bounded) thread pool to run tasks from those queues.

I'll add this comment in future PR.

*/
@InternalApi("For internal usage only")
public class AsyncTableWrapper {
private final Table table;
private final ListeningExecutorService executorService;

public AsyncTableWrapper(Table table, ListeningExecutorService executorService) {
this.table = table;
this.executorService = executorService;
}

public ListenableFuture<Result> get(final Get gets) {
return this.executorService.submit(
new Callable<Result>() {
@Override
public Result call() throws Exception {
synchronized (table) {
return table.get(gets);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems concerning. So at any given time, we can only have 1 outstanding get? Do you have any concerns around scaling with this arcitecture.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The table object is not thread safe, and we issue a single asynchronous operation that is synchronized on this table object per single blocking client operation. If the secondary database has the same performance as the primary, then we think that is shouldn't really matter, because this synchronized operation will more-or-less overlap with next synchronous operation issued by the user thread. If it is not the case we will hit flow controller limit anyways.

}
}
});
}

public ListenableFuture<Result[]> get(final List<Get> gets) {
return this.executorService.submit(
new Callable<Result[]>() {
@Override
public Result[] call() throws Exception {
synchronized (table) {
return table.get(gets);
}
}
});
}

public ListenableFuture<Boolean> exists(final Get get) {
return this.executorService.submit(
new Callable<Boolean>() {
@Override
public Boolean call() throws Exception {
synchronized (table) {
return table.exists(get);
}
}
});
}

public ListenableFuture<boolean[]> existsAll(final List<Get> gets) {
return this.executorService.submit(
new Callable<boolean[]>() {
@Override
public boolean[] call() throws Exception {
synchronized (table) {
return table.existsAll(gets);
}
}
});
}
}
Original file line number Diff line number Diff line change
@@ -0,0 +1,157 @@
/*
* Copyright 2021 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/
package com.google.cloud.bigtable.mirroring.hbase1_x;

import static com.google.common.base.Preconditions.checkArgument;

import java.util.Map;
import java.util.Objects;
import java.util.regex.Pattern;
import org.apache.hadoop.conf.Configuration;

public class MirroringConfiguration extends Configuration {
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why does this need to extend a Configuration?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no need to do it, will fix in future PRs.

Configuration primaryConfiguration;
Configuration secondaryConfiguration;
MirroringOptions mirroringOptions;

/**
* Key to set to a name of Connection class that should be used to connect to primary database. It
* is used as hbase.client.connection.impl when creating connection to primary database.
*/
public static final String MIRRORING_PRIMARY_CONNECTION_CLASS_KEY =
"google.bigtable.mirroring.primary-client.connection.impl";

/**
* Key to set to a name of Connection class that should be used to connect to secondary database.
* It is used as hbase.client.connection.impl when creating connection to secondary database.
*/
public static final String MIRRORING_SECONDARY_CONNECTION_CLASS_KEY =
"google.bigtable.mirroring.secondary-client.connection.impl";

/**
* By default all parameters from the Configuration object passed to
* ConnectionFactory#createConnection are passed to Connection instances. If this key is set, then
* only parameters that start with given prefix are passed to primary Connection. Use it if
* primary and secondary connections' configurations share a key that should have different value
* passed to each of connections, e.g. zookeeper url.
*
* <p>Prefixes should not contain dot at the end.
*/
public static final String MIRRORING_PRIMARY_CONFIG_PREFIX_KEY =
"google.bigtable.mirroring.primary-client.prefix";

/**
* If this key is set, then only parameters that start with given prefix are passed to secondary
* Connection.
*/
public static final String MIRRORING_SECONDARY_CONFIG_PREFIX_KEY =
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand the user experience here. Lets say we have an HBase primary and a Bigtable secondary (most common setup), then all the timeouts etc will be common for both, HBase will require zookeeper config while Bigtable will require projectId/InstanceId. In such scenario you will set all the configs without any prefix (primary or secondary) set. Each connection impl will take what it wants from the config and everything will work.

However, if say, we had to setup different timeout values, then we need to specify a prefix (either on primary or secondary), Does that mean that we have to copy all the configurations on that database? A full copy of un-prefixed configs and then a prefixed version with all the configs (including some that will differ)?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as long as all the keys that are common for both clients are the same then no prefix is needed. If The user would want to specify separate configuration for both the databases, we want them to make a copy and keep a prefix. We wanted the user to be explicit about which database is configured to prevent accidental misconfiguration.

"google.bigtable.mirroring.secondary-client.prefix";

public MirroringConfiguration(
Configuration primaryConfiguration,
Configuration secondaryConfiguration,
Configuration mirroringConfiguration) {
super.set("hbase.client.connection.impl", MirroringConnection.class.getCanonicalName());
this.primaryConfiguration = primaryConfiguration;
this.secondaryConfiguration = secondaryConfiguration;
this.mirroringOptions = new MirroringOptions(mirroringConfiguration);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If mirrroing conf has object for both primary and secondary conf, do we need to take them in separately?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This constructor is only for tests, should be package-private and marked as @VisibleForTesting or maybe even removed in future PRs.

}

public MirroringConfiguration(Configuration conf) {
super(conf); // Copy-constructor
// In case the user constructed MirroringConfiguration by hand.
if (conf instanceof MirroringConfiguration) {
MirroringConfiguration mirroringConfiguration = (MirroringConfiguration) conf;
this.primaryConfiguration = new Configuration(mirroringConfiguration.primaryConfiguration);
this.secondaryConfiguration =
new Configuration(mirroringConfiguration.secondaryConfiguration);
this.mirroringOptions = mirroringConfiguration.mirroringOptions;
} else {
checkParameters(conf);
this.primaryConfiguration = constructPrimaryConfiguration(conf);
this.secondaryConfiguration = constructSecondaryConfiguration(conf);
this.mirroringOptions = new MirroringOptions(conf);
}
}

private Configuration constructPrimaryConfiguration(Configuration conf) {
return constructConnectionConfiguration(
conf, MIRRORING_PRIMARY_CONNECTION_CLASS_KEY, MIRRORING_PRIMARY_CONFIG_PREFIX_KEY);
}

private Configuration constructSecondaryConfiguration(Configuration conf) {
return constructConnectionConfiguration(
conf, MIRRORING_SECONDARY_CONNECTION_CLASS_KEY, MIRRORING_SECONDARY_CONFIG_PREFIX_KEY);
}

private Configuration constructConnectionConfiguration(
Configuration conf, String connectionClassKey, String prefixKey) {
String connectionClassName = conf.get(connectionClassKey);
String prefix = conf.get(prefixKey, "");
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do this instead of hardcoding primary/secondary as the prefix?

Are you hoping this will help with swapping primary and secondary? That this will remain hbase/bigtable and you will only change the prefix keys to swap the primary database?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is exactly the reason why we have done it. We could have used bigtable. and hbase. prefixes if we hardcoded them, but we wanted to remain agnostic to underlying implementations and not to make any assumptions that only those two client will be used.

Configuration connectionConfig = extractPrefixedConfig(prefix, conf);
connectionConfig.set("hbase.client.connection.impl", connectionClassName);
return connectionConfig;
}

private static void checkParameters(Configuration conf) {
String primaryConnectionClassName = conf.get(MIRRORING_PRIMARY_CONNECTION_CLASS_KEY);
String secondaryConnectionClassName = conf.get(MIRRORING_SECONDARY_CONNECTION_CLASS_KEY);
String primaryConnectionConfigPrefix = conf.get(MIRRORING_PRIMARY_CONFIG_PREFIX_KEY, "");
String secondaryConnectionConfigPrefix = conf.get(MIRRORING_SECONDARY_CONFIG_PREFIX_KEY, "");

checkArgument(
primaryConnectionClassName != null,
String.format("Specify %s.", MIRRORING_PRIMARY_CONNECTION_CLASS_KEY));
checkArgument(
secondaryConnectionClassName != null,
String.format("Specify %s.", MIRRORING_SECONDARY_CONNECTION_CLASS_KEY));

if (Objects.equals(primaryConnectionClassName, secondaryConnectionClassName)
&& Objects.equals(primaryConnectionConfigPrefix, secondaryConnectionConfigPrefix)) {
if (primaryConnectionConfigPrefix.isEmpty()) {
throw new IllegalArgumentException(
String.format(
"Mirroring connections using the same client class requires a separate "
+ "configuration for one of them. Specify either %s or %s and use its value "
+ "as a prefix for configuration options.",
MIRRORING_PRIMARY_CONFIG_PREFIX_KEY, MIRRORING_SECONDARY_CONFIG_PREFIX_KEY));
} else {
throw new IllegalArgumentException(
String.format(
"Values of %s and %s should be different.",
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use more descriptive message here and explain why we are demanding it to be different?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've created an issue for this.

MIRRORING_PRIMARY_CONFIG_PREFIX_KEY, MIRRORING_SECONDARY_CONFIG_PREFIX_KEY));
}
}
}

private static Configuration extractPrefixedConfig(String prefix, Configuration conf) {
if (prefix.isEmpty()) {
return new Configuration(conf);
}

return stripPrefixFromConfiguration(prefix, conf);
}

private static Configuration stripPrefixFromConfiguration(String prefix, Configuration config) {
Map<String, String> matchingConfigs =
config.getValByRegex("^" + Pattern.quote(prefix) + "\\..*");
Configuration newConfig = new Configuration(false);
for (Map.Entry<String, String> entry : matchingConfigs.entrySet()) {
newConfig.set(entry.getKey().substring(prefix.length() + 1), entry.getValue());
}
return newConfig;
}
}
Loading