This repository was archived by the owner on Mar 5, 2026. It is now read-only.

feat: writer client mvp #300

Closed
loferris wants to merge 30 commits into main from writer_veneer_sandbox

Conversation

Contributor

@loferris loferris commented Oct 1, 2022

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

@product-auto-label product-auto-label Bot added size: xl Pull request size is extra large. api: bigquerystorage Issues related to the googleapis/nodejs-bigquery-storage API. labels Oct 1, 2022

snippet-bot Bot commented Oct 1, 2022

No region tags are edited in this PR.

This comment is generated by snippet-bot.
If you find problems with this result, please file an issue at:
https://github.com/googleapis/repo-automation-bots/issues.

@product-auto-label product-auto-label Bot added size: l Pull request size is large. size: m Pull request size is medium. and removed size: xl Pull request size is extra large. size: l Pull request size is large. labels Oct 1, 2022
@loferris loferris force-pushed the writer_veneer_sandbox branch from 1f1bb09 to 738357f Compare October 4, 2022 20:39
@product-auto-label product-auto-label Bot added size: xl Pull request size is extra large. and removed size: m Pull request size is medium. labels Oct 4, 2022
@product-auto-label product-auto-label Bot added size: m Pull request size is medium. and removed size: xl Pull request size is extra large. labels Oct 4, 2022
@product-auto-label product-auto-label Bot added size: l Pull request size is large. and removed size: m Pull request size is medium. labels Nov 2, 2022
@product-auto-label product-auto-label Bot added size: xl Pull request size is extra large. and removed size: l Pull request size is large. labels Jan 13, 2023
Contributor

@alvarowolfx alvarowolfx left a comment

I think we need to introduce some extra layers of abstraction to support different use cases of the Storage Write API (like creating a WriteStream manually vs. using the Default Stream), and also to reuse stream connections so rows can be sent to them multiple times.

return this._client_closed;
}

async initializeStreamConnection(clientOptions?: CallOptions): Promise<void> {
Contributor

I think we need to split this method into two actions: creating a WriteStream (if applicable, since users can use the DefaultStream) and creating a "ManagedStream", which is the concept of a stream connection held open so rows can be appended to it.

Contributor

The ManagedStream idea came from the Go implementation; the Python implementation has a similar concept called StreamSession. That class holds a reference to a given streamId and schema/proto descriptor, and keeps a connection open for appending rows. That connection can be reused.
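A minimal sketch of that shape in plain TypeScript (all type names here are illustrative stand-ins; the real class would wrap a BigQueryWriteClient AppendRows connection):

```typescript
// Stand-in types for illustration only.
type ProtoDescriptor = {name: string};
type Connection = {write(chunk: unknown): void; end(): void};

class ManagedStream {
  private streamId: string; // e.g. a created stream id or ".../streams/_default"
  private descriptor: ProtoDescriptor; // schema/proto descriptor used to serialize rows
  private connection: Connection | null = null; // opened once, reused for every append

  constructor(streamId: string, descriptor: ProtoDescriptor) {
    this.streamId = streamId;
    this.descriptor = descriptor;
  }

  // Open the connection on first use and hand back the same one afterwards.
  getConnection(open: () => Connection): Connection {
    if (this.connection === null) {
      this.connection = open();
    }
    return this.connection;
  }

  getStreamId(): string {
    return this.streamId;
  }

  getDescriptor(): ProtoDescriptor {
    return this.descriptor;
  }
}
```

The point of the sketch is only that streamId, descriptor, and the open connection live together on one object, so repeated appends reuse the connection instead of reopening it.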

}
}

async appendRowsToStream(
Contributor

This method would be part of a ManagedStream class that keeps the connection open until we finish sending rows. The issue right now with this function is that it barely reuses the connection, because it closes the stream as soon as we get the first response.

Contributor

This method could behave similarly to a Promise: users call appendRows and get back a "pending write" promise. That way users can call it multiple times and wait on multiple pending writes. Internally the ManagedStream will reuse the same connection, handle the requests in sequence, and mark each pending write as complete. I'll add an example of that in the tests.
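The queuing idea can be sketched in plain TypeScript like this (PendingWrite and QueueingStream are hypothetical names, and the "transport" is a setTimeout stand-in for the real gRPC AppendRows stream):

```typescript
type AppendResult = {offset: number | null};

// A promise-like handle for one appendRows request.
class PendingWrite {
  private promise: Promise<AppendResult>;
  private resolver!: (r: AppendResult) => void;

  constructor() {
    this.promise = new Promise<AppendResult>(res => {
      this.resolver = res;
    });
  }

  complete(result: AppendResult): void {
    this.resolver(result);
  }

  getResult(): Promise<AppendResult> {
    return this.promise;
  }
}

class QueueingStream {
  private pending: PendingWrite[] = []; // completed in FIFO order, one per request

  appendRows(serializedRows: Uint8Array[]): PendingWrite {
    const pw = new PendingWrite();
    this.pending.push(pw);
    this.send(serializedRows); // every call goes over the same shared connection
    return pw;
  }

  // Fake transport: a real implementation would write to the open AppendRows
  // stream and complete the oldest pending write on each response message.
  private send(serializedRows: Uint8Array[]): void {
    setTimeout(() => {
      const pw = this.pending.shift();
      pw?.complete({offset: serializedRows.length});
    }, 0);
  }
}
```

Callers can then fire several appendRows calls and `Promise.all` the results, while the stream resolves them in order over a single connection.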

Comment thread src/managedwriter/writer_client.ts Outdated
return responses;
}

async closeStream(): Promise<void> {
Contributor

Closing the connection/stream and committing/finalizing the stream should be separate actions, especially because the Default Stream is a stream of type COMMITTED that doesn't support calling finalizeWriteStream, yet users will still want to close the connection to end the streaming. For user-created WriteStreams of type PENDING, users will call finalize to commit the changes and then close the connection later.
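The two teardown paths can be sketched like so (StreamHandle is a hypothetical stub; the COMMITTED/PENDING values mirror the write stream types in the API):

```typescript
type StreamType = 'COMMITTED' | 'PENDING';

class StreamHandle {
  closed = false;
  finalized = false;

  constructor(readonly type: StreamType) {}

  // Ending the connection is always allowed, for any stream type.
  close(): void {
    this.closed = true;
  }

  // Only user-created PENDING streams can be finalized; the default stream
  // is COMMITTED, so appended rows are visible immediately.
  finalize(): void {
    if (this.type !== 'PENDING') {
      throw new Error('finalizeWriteStream is not supported for this stream type');
    }
    this.finalized = true;
  }
}
```

So the default-stream flow is just `close()`, while the PENDING flow is `finalize()` (then a batch commit on the client) followed by `close()`.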

})*/
});

describe('appendRowsToStream', () => {
Contributor

We need two versions of this test: one using a user-created WriteStream, and another using the DefaultStream.

it('should invoke appendRowsToStream without errors', async () => {
     /* 
       Client and Proto initialization, rows creation, etc 
     ....
    */
     
      const streamName = await client.createWriteStream();
      const appendRowsResponsesResult: AppendRowsResponse[] = [
        {
          appendResult: {
            offset: offset,
          },
          writeStream: streamName,
        },
      ];
      try {
        const managedStream = await client.createManagedStream(
          streamName,
          protoDescriptor
        );
        const pw = await managedStream.appendRows(
          {
            serializedRows: [serializedRow1Message, serializedRow2Message],
          },
          offset
        );
        const result = await pw.getResult();
        const responses: AppendRowsResponse[] = [
          {
            appendResult: result.appendResult,
            writeStream: result.writeStream,
          },
        ];

        assert.deepEqual(appendRowsResponsesResult, responses);

        const rowCount = await managedStream.finalize();
        managedStream.close();
        assert.equal(rowCount, 2);

        const commitResponse = await client.batchCommitWriteStream({
          parent: client.getParent(),
          writeStreams: [streamName],
        });
        assert.equal(commitResponse.streamErrors?.length, 0);
      } finally {
        client.close();
      }

      return Promise.resolve();
    });

For the Default Stream:

it('should invoke appendRows to default stream without errors', async () => {
        /* 
       Client and Proto initialization, rows creation, etc 
      ...
    */

      const appendRowsResponsesResult: AppendRowsResponse[] = [
        {
          appendResult: {
            offset: null,
          },
          writeStream: parent + '/streams/_default',
        },
      ];
      try {
        const managedStream = await client.createManagedStream(
          bigquerywriterModule.managedwriter.DefaultStream,
          protoDescriptor
        );
        const pw = await managedStream.appendRows({
          serializedRows: [serializedRow1Message, serializedRow2Message],
        });
        const result = await pw.getResult();
        const responses: AppendRowsResponse[] = [
          {
            appendResult: result.appendResult,
            writeStream: result.writeStream,
          },
        ];

        assert.deepEqual(appendRowsResponsesResult, responses);

        managedStream.close();
      } finally {
        client.close();
      }

      return Promise.resolve();
    });  

Contributor

We can also test appending rows multiple times and waiting for the proper responses:

const appendRowsResponsesResult: AppendRowsResponse[] = [
        {
          appendResult: {
            offset: null,
          },
          writeStream: parent + '/streams/_default',
        },
        {
          appendResult: {
            offset: null,
          },
          writeStream: parent + '/streams/_default',
        },
      ];
      try {
        const managedStream = await client.createManagedStream(
          bigquerywriterModule.managedwriter.DefaultStream,
          protoDescriptor
        );
        const pw1 = await managedStream.appendRows({
          serializedRows: [serializedRow1Message, serializedRow2Message],
        });
        const pw2 = await managedStream.appendRows({
          serializedRows: [serializedRow1Message, serializedRow2Message],
        });
        const results = await Promise.all([
          pw1.getResult(),
          pw2.getResult(),
        ]);
        const responses: AppendRowsResponse[] = results.map(result => (
          {
            appendResult: result.appendResult,
            writeStream: result.writeStream,
          }
        ));

        assert.deepEqual(appendRowsResponsesResult, responses);

        managedStream.close();
      } finally {
        client.close();
      }

      return Promise.resolve();


// eslint-disable-next-line @typescript-eslint/no-unused-vars

describe('managedwriter.WriterClient', () => {
Contributor

I think for those tests it would be nice to test the backend behavior in different scenarios; I'm not sure it will be feasible to mock everything. So as part of the initialization process we need to set up a test dataset and table. For testing purposes I ended up creating them manually here, but for CI we might need to automate that, similar to how we do it for Samples testing.

.batchCommitWriteStreams(batchCommitWriteStreamsReq)
.then(result => console.log(result));

this._client_closed = true;
Contributor

nit: client_closed should be clientClosed

};
type streamConnectionsMap = Record<string, gax.CancellableStream>;
type StreamConnections = {
connection_list: StreamConnection[];
Contributor

nit: connection_list should be connectionList

private _writeStreamType: WriteStream['type'] = 'TYPE_UNSPECIFIED';
private _streamId: string;
private _client: BigQueryWriteClient;
private _connections: StreamConnections;
Contributor

Instead of keeping the raw connection directly here, we can keep a reference to the ManagedStream, as I comment on in more detail elsewhere in this review.
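The suggested shape of that registry, as a hypothetical sketch (ManagedStreamRef stands in for the real ManagedStream class):

```typescript
type ManagedStreamRef = {streamId: string};

type StreamConnections = {
  connectionList: ManagedStreamRef[]; // was: raw gax CancellableStream[]
  connectionMap: Record<string, ManagedStreamRef>; // lookup by stream id
};

// Track the ManagedStream in both the list and the per-id map.
function registerStream(conns: StreamConnections, stream: ManagedStreamRef): void {
  conns.connectionList.push(stream);
  conns.connectionMap[stream.streamId] = stream;
}
```

Keeping the higher-level object here lets the client delegate connection lifecycle (open, reuse, close) to each ManagedStream instead of juggling raw streams itself.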

@meredithslota
Contributor

Closing in favor of #328


Labels

api: bigquerystorage Issues related to the googleapis/nodejs-bigquery-storage API. size: xl Pull request size is extra large.

3 participants