This repository was archived by the owner on Jun 24, 2024. It is now read-only.

feat!: Multiple Chunk Upload Support#486

Merged
danielbankhead merged 19 commits into main from multi-upload-by-chunk-support
Dec 30, 2021

Conversation

@danielbankhead
Contributor

@danielbankhead danielbankhead commented Dec 13, 2021

Adding support for multiple chunk uploads. This required a sizable refactor and dozens of new tests -- potentially a breaking change.

@product-auto-label product-auto-label Bot added the api: storage Issues related to the googleapis/gcs-resumable-upload API. label Dec 13, 2021
Comment thread src/index.ts Outdated
@ddelgrosso1
Contributor

Just a general comment: I took a look at the newly added code in index.ts and it makes sense to me. It looks like as of now all the existing unit and conformance tests still pass. Have you verified that you do indeed see multiple HTTP requests when a chunk size is supplied, e.g. against the testbench?

@danielbankhead
Contributor Author

Working on verification now, along with additional tests. Thanks for the preview review!

@generated-files-bot

Warning: This pull request is touching the following templated files:

@danielbankhead danielbankhead marked this pull request as ready for review December 23, 2021 19:32
@danielbankhead danielbankhead requested review from a team and bcoe December 23, 2021 19:32
@danielbankhead danielbankhead changed the title from "feat: Init multi chunk upload support" to "feat: Multiple Chunk Upload Support" Dec 23, 2021
Comment thread package.json
Comment thread src/index.ts Outdated
request: <T = any>(
opts: GaxiosOptions
) => Promise<GaxiosResponse<T>> | GaxiosPromise<T>;
request: (opts: GaxiosOptions) => Promise<GaxiosResponse> | GaxiosPromise;
Contributor

Was this change made because it was defined as the `any` type?

Contributor Author

Yep


Would this potentially cause problems for folks who've provided a shim for authClient, like this:

// Use the consumer client to define storageOptions and create a GCS object.
const storageOptions = {
  projectId: 'my_project_id',
  authClient: {
    sign: () => Promise.reject('unsupported'),
    getCredentials: () => Promise.reject(),
    request: (opts, callback) => {
      return oauth2Client.request(opts, callback);
    },
    authorizeRequest: async (opts) => {
      opts = opts || {};
      const url = opts.url || opts.uri;
      const headers = await oauth2Client.getRequestHeaders(url);
      opts.headers = Object.assign(opts.headers || {}, headers);
      return opts;
    },
  },
};

Wondering whether changing the function signature would break anyone who has attempted doing this in TypeScript?

Contributor Author

Good point -- I reverted this change to a generic `<T>` (without the `any` default).
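For context, a minimal sketch of the reverted shape (illustrative only -- the stand-in types below are not the actual gaxios interfaces; the real signature lives in src/index.ts):

```typescript
// Illustrative sketch: keep the generic <T> so consumers can type the
// response, but drop the `= any` default. Local stand-in types, not gaxios.
interface RequestOptions {
  url: string;
}
interface Response<T> {
  data: T;
  status: number;
}

interface AuthClientLike {
  request: <T>(opts: RequestOptions) => Promise<Response<T>>;
}

// A consumer shim still type-checks; T is inferred or supplied at the call site.
const shim: AuthClientLike = {
  request: async <T>(_opts: RequestOptions) => {
    return { data: JSON.parse('{"ok":true}') as T, status: 200 };
  },
};

shim.request<{ ok: boolean }>({ url: 'https://example.test' }).then((res) => {
  console.log(res.status, res.data.ok); // 200 true
});
```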

@ddelgrosso1
Contributor

Thanks for adding all of those tests.

Comment thread test/test.ts
Comment thread src/index.ts Outdated

@tritone tritone left a comment


A couple of questions; I'm not a Node expert, so I'll rely on @ddelgrosso1 to give a more comprehensive review!

Comment thread README.md Outdated
Comment thread src/index.ts
Comment thread src/index.ts Outdated
Comment thread src/index.ts
Comment thread src/index.ts
Comment thread src/index.ts Outdated
Comment thread src/index.ts
// be fine for the next request.
this.unshiftChunkBuffer(dataToPrependForResending);
this.numBytesWritten -= missingBytes;
this.lastChunkSent = Buffer.alloc(0);

Is the buffer re-allocated for each chunk?



Okay -- just asking because I know in some languages repeated buffer allocs can be painful. Is there any way to create one buffer and reuse it for each chunk? Or is that not really how it works in Node?

Contributor Author

Understandable -- it doesn't really work that way in Node.js (at least for Buffer.alloc(0)).
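A quick sketch of why this is cheap in Node.js (my illustration, not code from this PR): `Buffer.alloc(0)` has no real backing allocation, and chunk slicing is done with views rather than copies.

```typescript
// Buffer.alloc(0) yields a zero-length buffer -- resetting a field to it on
// each chunk is effectively free, so there is no pool to reuse as in C/C++.
const empty = Buffer.alloc(0);
console.log(empty.length); // 0

// Chunk data is typically a view over existing memory, not a fresh copy:
const data = Buffer.from('abcdefgh');
const chunk = data.subarray(0, 4); // view, no copy
console.log(chunk.toString()); // abcd
console.log(data.buffer === chunk.buffer); // true -- shared backing memory
```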

Comment thread src/index.ts Outdated
Comment thread src/index.ts
@ddelgrosso1
Contributor

I think overall the refactor looks good. There are a few failing tests in CI.

@danielbankhead
Contributor Author

@ddelgrosso1 thanks - just fixed some tests for the #isSuccessfulResponse refactor

Contributor

@ddelgrosso1 ddelgrosso1 left a comment

Overall I don't have any major concerns; I think the code and the tests look good. I would like some other eyes on it before merging, though.

@bcoe bcoe left a comment

A few thoughts:

  1. I would be tempted to call this a breaking change: since you rewrite a lot of the original implementation along with adding the chunk upload support, I think a major bump will help protect you from any regressions, because you can make the update to @google-cloud/storage explicit and test thoroughly.
  2. When you do add this to @google-cloud/storage, I would add a few integration tests -- I think you could just generate junk data, and assert that both types of uploading work effectively.
  3. I left a note about a potential refactor, this is non-blocking, but I'd be curious to see if it makes the logic read easier -- the intermingling of single/multi chunk behavior detracts from what to me looks like a great refactor of the library.

Overall, I really like your implementation -- it feels like a solid cleanup of the library, and you've done a great job of adding tests.

The only blocker for me would be that I'd like to advocate for the major bump to this library -- curious what others think.

Comment thread src/index.ts Outdated
Comment thread src/index.ts
*
* @returns if the request is ok to continue as-is
*/
private async ensureUploadingSameObject() {

Is this edge case likely to happen? And what would have happened in the previous implementation of gcs-resumable-upload?

If this is an optimization, it might be nice to do this in follow up work just to simplify this PR.

Comment thread test/test.ts
assert.equal(data.byteLength, CHUNK_SIZE);
});

it('should prepare a valid request if the remaining data is less than `chunkSize`', async () => {

I appreciate how thorough these tests are 🎉

Comment thread src/index.ts
let cachedFirstChunk = this.get('firstChunk');
const firstChunk = chunk.slice(0, 16).valueOf();
// continue uploading next chunk
this.continueUploading();

It looks like the logic of continueUploading hasn't changed much. Is the upload endpoint a constant URI, and it's just Content-Range that changes to support multiple chunk uploads?

Contributor Author

Correct -- it's mainly Content-Range and supporting N requests per upload (which required a lot of decoupling throughout the class).
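To illustrate the wire behavior (a sketch of mine, not this library's code): each chunk goes to the same session URI, and only the Content-Range header advances.

```typescript
// Sketch: the Content-Range value for each chunked request against a single
// resumable-upload session URI. Offsets are 0-based and inclusive.
function contentRange(offset: number, chunkLength: number, total: number): string {
  return `bytes ${offset}-${offset + chunkLength - 1}/${total}`;
}

const total = 20;
const chunkSize = 8;
for (let offset = 0; offset < total; offset += chunkSize) {
  const len = Math.min(chunkSize, total - offset);
  console.log(contentRange(offset, len, total));
}
// bytes 0-7/20
// bytes 8-15/20
// bytes 16-19/20
```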

Comment thread src/index.ts
resp.status === RESUMABLE_INCOMPLETE_STATUS_CODE &&
resp.headers.range;

if (shouldContinueWithNextMultiChunkRequest) {

There are a few places like this, where we have almost completely different logic for chunk upload, vs., regular upload, but share a method.

It makes me wonder if the implementation would read slightly cleaner if we kept `export class Upload extends Pumpify` in index.ts, as it is today, but then introduced an additional class, `export class ChunkUpload extends Upload`.

It might introduce a bit of repeated logic, but I'm curious if it would make the implementation even more elegant.
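The suggested split could look something like this (a rough sketch with made-up names, not the library's actual API; 308 is the resumable "incomplete" status code the snippet above checks):

```typescript
// Hypothetical shape of the suggested refactor: the base class keeps the
// single-request behavior, and a subclass overrides the per-response decision.
class UploadSketch {
  handleResponse(status: number, rangeHeader?: string): string {
    return status >= 200 && status < 300 ? 'done' : 'error';
  }
}

class ChunkUploadSketch extends UploadSketch {
  handleResponse(status: number, rangeHeader?: string): string {
    // 308 plus a Range header means "send the next chunk" for resumable uploads.
    if (status === 308 && rangeHeader) return 'next-chunk';
    return super.handleResponse(status, rangeHeader);
  }
}

console.log(new UploadSketch().handleResponse(200)); // done
console.log(new ChunkUploadSketch().handleResponse(308, 'bytes=0-7')); // next-chunk
```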


+1, I think separating out the logic more between the two upload types would definitely make the code easier to read.

Contributor Author

Good point -- I figured after the refactor to support multiple requests, most of the logic would be the same with the exception of one parameter (`upstreamIterator` in `startUploading`) and three if statements (including this one).

I was hoping `shouldContinueWithNextMultiChunkRequest` would make it easier to understand what's happening, but it seems to have backfired.
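For illustration, the iterator-driven chunking described here could be sketched like so (my approximation; `chunkIterator` is a made-up name standing in for the PR's `upstreamIterator`):

```typescript
// Hypothetical sketch: yield upstream data in chunkSize pieces, buffering any
// remainder across pieces, with a final short chunk at the end.
async function* chunkIterator(
  source: Iterable<Buffer>,
  chunkSize: number
): AsyncGenerator<Buffer> {
  let pending = Buffer.alloc(0);
  for await (const piece of source) {
    pending = Buffer.concat([pending, piece]);
    while (pending.byteLength >= chunkSize) {
      yield pending.subarray(0, chunkSize);
      pending = pending.subarray(chunkSize);
    }
  }
  if (pending.byteLength) yield pending; // final short chunk
}

(async () => {
  const pieces = [Buffer.from('abcdefg'), Buffer.from('hij')];
  const sizes: number[] = [];
  for await (const chunk of chunkIterator(pieces, 4)) sizes.push(chunk.byteLength);
  console.log(sizes.join(',')); // 4,4,2
})();
```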


As I said, not blocking for me, but if you have the time to experiment with the refactor, I'd be curious what it looks like.

Comment thread README.md Outdated
@ddelgrosso1
Contributor

The only blocker for me would be that I'd like to advocate for the major bump to this library -- curious what others think.

I am on board with making this a major version bump. I think this library doesn't see much usage as is, and I don't think we will have to port many fixes backwards, which, as I learned, can be a painful exercise.

That said, if others object, I just want to triple-check that the API surface wasn't changed as a result of this refactor.

@danielbankhead danielbankhead changed the title from "feat: Multiple Chunk Upload Support" to "feat!: Multiple Chunk Upload Support" Dec 30, 2021
@danielbankhead
Contributor Author

Thanks for the feedback @bcoe, @ddelgrosso1, & @tritone - labeled this as breaking.

@danielbankhead danielbankhead merged commit dba1a39 into main Dec 30, 2021
@danielbankhead danielbankhead deleted the multi-upload-by-chunk-support branch December 30, 2021 15:46
@release-please release-please Bot mentioned this pull request Dec 30, 2021