feat!: Multiple Chunk Upload Support #486
Conversation
Just a general comment, I took a look at the newly added code in

Working on verification now, along with additional tests. Thanks for the preview review!
Warning: This pull request is touching the following templated files:
API `writable.writableEnded` isn't available until Node 12.9.0
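A guarded fallback for older runtimes could look like the following sketch (the `hasEnded` helper and `_endedFallback` field are hypothetical illustrations, not part of this PR):

```typescript
import { Writable } from 'stream';

// Hypothetical helper: use writableEnded where available (Node >= 12.9.0),
// otherwise fall back to a manually tracked flag.
function hasEnded(stream: Writable & { _endedFallback?: boolean }): boolean {
  if (typeof stream.writableEnded === 'boolean') {
    return stream.writableEnded;
  }
  return stream._endedFallback === true;
}

const w = new Writable({ write(_chunk, _enc, cb) { cb(); } });
console.log(hasEnded(w)); // false - end() has not been called yet
w.end();
console.log(hasEnded(w)); // true - writableEnded flips as soon as end() is called
```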
…load into multi-upload-by-chunk-support
-  request: <T = any>(
-    opts: GaxiosOptions
-  ) => Promise<GaxiosResponse<T>> | GaxiosPromise<T>;
+  request: (opts: GaxiosOptions) => Promise<GaxiosResponse> | GaxiosPromise;
Was this change made because it was defined with the any type?
Would this potentially cause problems for folks who've provided a shim for authClient, like this:
// Use the consumer client to define storageOptions and create a GCS object.
const storageOptions = {
projectId: 'my_project_id',
authClient: {
sign: () => Promise.reject('unsupported'),
getCredentials: () => Promise.reject(),
request: (opts, callback) => {
return oauth2Client.request(opts, callback);
},
authorizeRequest: async (opts) => {
opts = opts || {};
const url = opts.url || opts.uri;
const headers = await oauth2Client.getRequestHeaders(url);
opts.headers = Object.assign(opts.headers || {}, headers);
return opts;
},
},
};

Wondering if it changes the function signature if anyone has attempted doing this in TypeScript?
There was a problem hiding this comment.
Good point - I reverted this change with a generic <T> (without the any)
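For illustration, the reverted shape could look like this sketch (the `*Like` types are stand-ins for the real gaxios interfaces, and `shim` is a hypothetical example):

```typescript
// A sketch of keeping the generic so callers can type the response body,
// but without defaulting it to `any`.
interface GaxiosOptionsLike { url?: string }
interface GaxiosResponseLike<T> { data: T }

interface AuthClientLike {
  request: <T>(opts: GaxiosOptionsLike) => Promise<GaxiosResponseLike<T>>;
}

// A JS-style shim still type-checks, since T is inferred at the call site:
const shim: AuthClientLike = {
  request: async <T>(_opts: GaxiosOptionsLike) =>
    ({ data: JSON.parse('{"ok":true}') as T }),
};

shim.request<{ ok: boolean }>({ url: 'https://example.com' }).then(res => {
  console.log(res.data.ok); // true
});
```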
Thanks for adding all of those tests.
tritone
left a comment
A couple of questions - I'm not a Node expert, so I'll rely on @ddelgrosso1 to give a more comprehensive review!
// be fine for the next request.
this.unshiftChunkBuffer(dataToPrependForResending);
this.numBytesWritten -= missingBytes;
this.lastChunkSent = Buffer.alloc(0);
Is the buffer re-allocated for each chunk?
Yes, from upstream.
Buffer.alloc(0) creates a "Fast" buffer: https://github.com/nodejs/node/blob/e4aa575b05fcf560c6b77b0fa5036328a3417442/lib/buffer.js#L359-L366
Okay - just asking because I know in some languages repeated buffer allocations can be painful. Is there any way to create one buffer and reuse it for each chunk? Or is that not really how it works in Node?
Understandable - it doesn't really work that way in Node.js (at least for Buffer.alloc(0))
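A minimal illustration of that point (assuming nothing beyond Node's Buffer API):

```typescript
// Buffers are fixed-size once created, so a chunk of a different length
// cannot reuse the old allocation; "growing" one means concatenating into
// a brand-new Buffer anyway. Buffer.alloc(0) is a cheap reset value.
let lastChunkSent = Buffer.alloc(0);
console.log(lastChunkSent.byteLength); // 0

lastChunkSent = Buffer.concat([lastChunkSent, Buffer.from('abc')]);
console.log(lastChunkSent.toString()); // "abc"
```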
I think overall the refactor looks good. There are a few failing tests in CI.
@ddelgrosso1 thanks - just fixed some tests for the
ddelgrosso1
left a comment
Overall I don't have any major concerns, I think the code and the tests look good. I would like some other eyes on it before merging though.
bcoe
left a comment
A few thoughts:

- I would be tempted to call this a breaking change. Since you rewrite a lot of the original implementation, along with adding the chunk upload support, I think a major bump will help protect you from any regressions, because you can make the update to @google-cloud/storage explicit and test thoroughly.
- When you do add this to @google-cloud/storage, I would add a few integration tests -- I think you could just generate junk data and assert that both types of uploading work effectively.
- I left a note about a potential refactor; this is non-blocking, but I'd be curious to see if it makes the logic read easier -- the intermingling of single/multi chunk behavior detracts from what to me looks like a great refactor of the library.

Overall, I really like your implementation; it feels like a solid cleanup of the library, and you've done a great job of adding tests.

The only blocker for me: I'd like to advocate for a major bump to this library -- curious what others think.
 *
 * @returns if the request is ok to continue as-is
 */
private async ensureUploadingSameObject() {
Is this edge case likely to happen? And what would have happened in the previous implementation of gcs-resumable-upload?
If this is an optimization, it might be nice to do it in follow-up work, just to simplify this PR.
This preserves existing behavior - I don't think this is likely to happen (https://github.com/googleapis/gcs-resumable-upload/blob/main/src/index.ts#L577-L596)
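The check the linked code performs can be sketched roughly like this (`isSameObject` is a hypothetical stand-in for the private method; the idea is comparing the cached first 16 bytes of the upload against the incoming data):

```typescript
// Sketch of the preserved behavior: if a caller resumes a session with
// different data, the first bytes won't match the cached fingerprint.
function isSameObject(cachedFirstChunk: Buffer, chunk: Buffer): boolean {
  return cachedFirstChunk.equals(chunk.slice(0, 16));
}

const cached = Buffer.from('0123456789abcdef');
console.log(isSameObject(cached, Buffer.from('0123456789abcdef...more data'))); // true
console.log(isSameObject(cached, Buffer.from('completely different payload'))); // false
```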
  assert.equal(data.byteLength, CHUNK_SIZE);
});

it('should prepare a valid request if the remaining data is less than `chunkSize`', async () => {
I appreciate how thorough these tests are 🎉
let cachedFirstChunk = this.get('firstChunk');
const firstChunk = chunk.slice(0, 16).valueOf();
// continue uploading next chunk
this.continueUploading();
It looks like the logic of continueUploading hasn't changed much. Is the upload endpoint a constant URI, and it's just Content-Range that changes to support multiple chunk uploads?
Correct - it's mainly Content-Range and supporting N requests per upload (which required a lot of decoupling throughout the class).
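As a sketch of those Content-Range mechanics (a simplified hypothetical helper, not the library's code; the resumable-session URI stays constant while only the byte range advances):

```typescript
// Build the Content-Range value for one chunk of a resumable upload.
// '*' signals that the total size is not yet known (streaming uploads).
function contentRange(offset: number, chunkLength: number, totalSize?: number): string {
  const end = offset + chunkLength - 1;
  return `bytes ${offset}-${end}/${totalSize === undefined ? '*' : totalSize}`;
}

console.log(contentRange(0, 262144, 1000000)); // "bytes 0-262143/1000000"
console.log(contentRange(262144, 262144));     // "bytes 262144-524287/*"
```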
  resp.status === RESUMABLE_INCOMPLETE_STATUS_CODE &&
  resp.headers.range;

if (shouldContinueWithNextMultiChunkRequest) {
There are a few places like this, where we have almost completely different logic for chunk upload, vs., regular upload, but share a method.
It makes me wonder if the implementation would read slightly cleaner if we kept export class Upload extends Pumpify in index.ts, as it is today, but then introduced an additional class, export class ChunkUpload extends Upload.
It might introduce a bit of repeated logic, but I'm curious if it would make the implementation even more elegant.
+1, I think separating out the logic more between the two upload types would definitely make the code easier to read.
Good point - I figured after the refactor to support multiple requests most of the logic would be the same with the exception of 1 parameter (upstreamIterator in startUploading) and 3 if statements (including this one).
I was hoping shouldContinueWithNextMultiChunkRequest would make it easier to understand what's happening, but that seems to have backfired.
As I said, not blocking for me, but if you have the time to experiment with the refactor, I'd be curious what it looks like.
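For illustration, the suggested split could be sketched like this (hypothetical, heavily simplified class and method names; only the multi-chunk decision point differs between the two classes):

```typescript
// Hypothetical sketch of the suggested refactor: a base upload that never
// issues a follow-up request, and a chunked subclass that continues on
// HTTP 308 ("Resume Incomplete") with a Range header.
const RESUMABLE_INCOMPLETE_STATUS_CODE = 308;

class UploadBase {
  shouldContinueWithNextRequest(_status: number, _range?: string): boolean {
    return false; // single-shot: one request per upload
  }
}

class ChunkUpload extends UploadBase {
  shouldContinueWithNextRequest(status: number, range?: string): boolean {
    // 308 with a Range header means the server persisted a prefix and
    // expects more bytes in a subsequent request.
    return status === RESUMABLE_INCOMPLETE_STATUS_CODE && typeof range === 'string';
  }
}

console.log(new ChunkUpload().shouldContinueWithNextRequest(308, 'bytes=0-262143')); // true
console.log(new UploadBase().shouldContinueWithNextRequest(308, 'bytes=0-262143'));  // false
```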
I am on board with making this a major version bump. I think this library doesn't see much usage as is. I don't think we will have to port many fixes backwards, which, as I learned, can be a painful exercise. That said, if others object, I want to just triple-check that the API surface wasn't changed as a result of this refactor?
Thanks for the feedback @bcoe, @ddelgrosso1, & @tritone - labeled this as breaking.
Adding support for multiple chunk uploads. This required a sizable refactor and dozens of new tests - a potentially breaking change.