Skip to content

Proposal: modernize the Bigtable internals #5934

@coryan

Description

@coryan

Status

In Review | Approved | In Progress | Abandoned

This proposal is under review, we have not committed to implementing it. It
needs to wait until the Async* are retired in any case.

Background

Bigtable is the first library we implemented, and its internals show it. We should consider a cleanup to incorporate the
lessons from the last 3 years into its design. We should do this without breaking customers, or at least minimizing the
breakage.

Bigtable is a "resource centric" library, there are classes to represent a Table or an Instance, whereas newer
libraries are "service centric". If we were writing Bigtable again there would be a bigtable::Client class and would
receive the name of the table in each function (as opposed to caching it in the bigtable::Table class).

Bigtable is hard to mock. The only place where application developers can mock behavior is the (confusingly
named) bigtable::DataClient class. This forces them to think of low-level gRPC concepts,
like grpc::ClientReaderInterface<T>, and they need to implement low-level chunking protocols to meet the expectations
of bigtable::Table.

The bigtable::*Client classes are pure virtual, and the majority of their member functions are protected. While
using these classes to mock behavior ranges from "awkward" to "unnecessarily difficult", some people do manage to
implement mocks through them.

Bigtable uses streaming (read) RPCs, including MutateRows() and ReadRows(). ReadRows() is complicated because a
single row (or even cell!) can be chunked across several responses. Moreover, the protocol to resume is to restart the
read from the last fully received row, there is no simple "token" to resume from the last received chunk.

We should be able to use the micro-generator for the Bigtable administrative APIs. There are a number of helper types (
builders for update and create requests), that we may want to preserve as add-ons.

Design Overview

The design is guided by two principles:

  • We should preserve the user-facing classes, bigtable::Table,
    bigtable::TableAdmin, and bigtable::InstanceAdmin. With the existing API, semantics, and performance tradeoffs.

  • We should continue to support construction from the corresponding
    std::shared_ptr<bigtable::*Client>.

With those in mind the proposal can be summarized as: re-implement everything using *Connection-style APIs, change
the bigtable::*Client to be data-only classes used to construct a *Connection.

Code sketches

A class like bigtable::Table will change in its implementation, it will hold a std::shared_ptr<>
of bigtable::Connection:

class bigtable::Table {
 public:
   Table(std::shared_ptr<Connection> c, TableName name)
     : impl_(std::move(c)), name_(std::move(name)), full_name_(name.FullName())
   {} 

 private:
   std::shared_ptr<bigtable::Connection> impl_;
   // a wrapper for the (project_id, instance_id, table_id) triplet
   TableName name_;
   // the table full name, i.e., 
   // projects/<project-id>/instances/<instance-id>/tables/<table-id>
   std::string full_name_;
};

New applications would use this constructor, and the corresponding
bigtable::MakeConnection() to create new tables.

To support existing applications the existing constructor continues to work:

class bigtable::Table {
 public:
   Table(std::shared_ptr<bigtable::DataClient> client)
     : Table(internal::MakeConnection(std::move(client))
   {}
};

To make this work, the existing bigtable::DataClient needs major changes, basically we would remove all the
protected member functions and we will deprecate functions that no longer make sense:

class bigtable::DataClient final {
 public:
   ~DataClient() = default;
   std::string const& project_id()  /* was = 0 */ {
     return project_id_;
   }
   std::string const& instance_id() /* was = 0 */ {
     return instance_id_;
   }
   [[deprecated]] virtual std::shared_ptr<grpc::Channel> Channel() /* was = 0 */;
   [[deprecated]] virtual void reset() /* was = 0 */;
   
 private:
   friend std::shared_ptr<DataClient> CreateDefaultDataClient(std::string project_id,
                                                    std::string instance_id,
                                                    ClientOptions options);

  DataClient(std::string project_id,
             std::string instance_id,
             ClientOptions options)
    : project_id_(std::move(project_id)),
      instance_id_(std::move(instance_id)),
      client_options_(options)
   {}

   friend internal::GetClientOptions(DataClient const&);
   std::string project_id_;
   std::string instance_id_;
   ClientOptions client_options_;  // or maybe the new Options class.
};

The existing MakeDefaultDataClient() continues to work, but it has a much simpler implementation:

[[deprecated("use MakeConnection instead")]]
std::shared_ptr<DataClient> CreateDefaultDataClient(std::string project_id,
                                                    std::string instance_id,
                                                    ClientOptions options) {
  return std::shared_ptr<DataClient>(new DataClient(
      std::move(project_id), std::move(instance_id), std::move(options)));
}

At this point it should be obvious how internal::MakeConnection(std::shared_ptr<DataClient>) can be implemented.

Downsides

This design breaks any application that was defining their own classes derived from DataClient (and TableAdminClient
and InstanceAdminClient). It might be better to simply define the new DataClient class as final and at least get a
clear compilation error.

It is not clear when we would remove the DataClient class, only applications that mock benefit from the new design,
and compelling them to move to a new design is counterproductive.

There are existing functions called bigtable::TableName and bigtable::InstanceName, we may want to recover those
names for classes that represent the resource names (the analog to spanner::Database).

Metadata

Metadata

Assignees

No one assigned

    Labels

    api: bigtableIssues related to the Bigtable API.api: bigtableadminIssues related to the Cloud Bigtable Admin API API.type: cleanupAn internal cleanup or hygiene concern.

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions