Add --prefixes flag for GCS and pre-sharded bucket#432
Add --prefixes flag for GCS and pre-sharded bucket#432dbishop wants to merge 1 commit intominio:masterfrom
Conversation
|
@klauspost , I left the flag non-hidden as even though this one mentions GCS, but the other use-case was at the same tier-1 CSP where we hit the uint16 thread ID limit and needed to hit static pre-sharded top-level "folders" in our test bucket. Let me know if you still think it should be hidden. Thanks! |
|
Honestly, this seems a crazy amount of code for an extremely specific use-case. |
|
That's fair. I think the use-cases are valid, at least for using Warp for large-scale testing with CSPs. For the code volume & complexity, that naturally arose from maintaining seamless backward compatibility and not changing any existing flags or their semantics. I think the additional unit test mitigates the complexity's risk. How do you see these portions of code changing over time (i.e. how would this extra code in this patch potentially affect development velocity in the future)? I'd really like to not have to carry this patch around on our side, and I think it provides some value to all warp users (again, in some specific circumstances). Is there a simpler way for Warp to support these 2 use-cases? Thanks! |
8393575 to
daed3c6
Compare
|
@klauspost, I also rebased the PR branch to keep it clean. Thanks! |
There are 2 use-cases for a static set of object prefixes ("folders"):
1. GCS makes deleting "folders" difficult, at least with the
heirarchical optimization feature enabled, and S3 API delete calls
can't delete them. So if Warp generates a fresh one for every
thread that ever runs, it makes cleaning up the bucket super
tedious, even though it can have zero objects.
2. Testing a major CSP's object storage involved having them pre-shard
the bucket on known prefixes to ensure a valid test. We needed Warp
to support this, so patched in the support. (That patch was
hackier, and this is a better version of it, though it worked fine
for our usage.)
This change preserves the behavior of the --prefix and --noprefix flags.
We add a new flag, "--prefixes value", which is mutually exclusive with
--prefix and whose value is a comma-separated list of one or more static
prefix values. The case of --prefix and --prefixes with one value
specified are identical (and in fact, --prefix is now implemented as a
single-element --prefixes internally).
When multiple prefixes are given with --prefixes, each object within
each thread has a static prefix chosen from the list in a round-robin
fashion. Also, the listings for --list-existing include all the
prefixes, "chaining" the prefix listings together.
Use-case 1 above will be satisfied by specifying --prefixes
with the desired count of top-level "folders" and --noprefix.
Use-case 2 can be satisfied with just --prefixes (though --noprefix
doesn't hurt it).
The interactions of the flags are complicated, so a new unit test was
added to cover the various combinations.
bench.{delete,get,stat} now use a helper function with an options struct
to reduce logic and code duplication.
Cleaned up separation of concerns between generator.{generator,random},
ensuring all prefix generation logic is in generator.random.
daed3c6 to
685df20
Compare
|
@klauspost Added one more tweak to clean up the separation of concerns between generator.{generator,random}. Now all the prefix-calculation is in generator.random, not spread across the two files. This will be simpler and easier to maintain. |
|
@klauspost Hi, any thoughts on this patch after the 2 adjustments last week? Thanks, |
|
@dbishop Honestly I am not convinced, and while I do appreciate the effort without knowing more it seems to add a huge amount of complexity A) For a very edge case and B) That adds a lot of code for us to maintain. I will leave the decision to @harshavardhana - but my overall thought is that this is something you should keep in a fork. |
|
@klauspost @harshavardhana |
|
Closing as per ^^^ |
Community Contribution License
All community contributions in this pull request are licensed to the project maintainers
under the terms of the Apache 2 license.
By creating this pull request I represent that I have the right to license the
contributions to the project maintainers under the Apache 2 license.
Description
We add a new flag, "--prefixes value", which is mutually exclusive with --prefix and whose value is a comma-separated list of one or more static prefix values. The case of --prefix and --prefixes with one value specified are identical (and in fact, --prefix is now implemented as a single-element --prefixes internally).
When multiple prefixes are given with --prefixes, each object within each thread has a static prefix chosen from the list in a round-robin fashion. Also, the listings for --list-existing include all the prefixes, "chaining" the prefix listings together.
This change preserves the existing behavior of the --prefix and --noprefix flags.
The interactions of the flags are complicated, so a new unit test was added to cover the various combinations.
Motivation and Context
There are 2 use-cases for a static set of object prefixes ("folders"):
Our use-case 1 is now satisfied by specifying --prefixes with the desired count of top-level "folders" and --noprefix.
Our use-case 2 is now satisfied with just --prefixes (though --noprefix doesn't hurt it).
A quick note about the copyright lines: our legal requirement is to include it in any new file and add to any files with "substantial (e.g. not a trivial bug-fix, but a contribution worthy of recognition of ownership)" modifications. We use our best technical judgement here. In this case, it's one file with a small new function, one file with the round-robin implementation, and the file with the new unit test. They don't change the licensing of the submitted code (Apache-2) or what Minio can do with it or the public later (via the AGPL license), and they don't affect any existing code's copyright.
How to test this PR?
The new unit test gives the primary confidence, though I did functionally test the new behavior with a three-phase set of warp invocations (put, get, delete) with
--prefixes=00,01,...,99with and without--noprefixand verified that the resulting bucket folders & objects in GCS behaved as expected. Without --noprefix, the extra per-client-thread sub folders were left lying around as expected and I had to clean those up using the GCS console.Types of changes
Checklist:
commit-idorPR #here)