Skip to content

Add --prefixes flag for GCS and pre-sharded bucket#432

Closed
dbishop wants to merge 1 commit intominio:masterfrom
dbishop:prefixes-flag
Closed

Add --prefixes flag for GCS and pre-sharded bucket#432
dbishop wants to merge 1 commit intominio:masterfrom
dbishop:prefixes-flag

Conversation

@dbishop
Copy link
Copy Markdown
Contributor

@dbishop dbishop commented Nov 20, 2025

Community Contribution License

All community contributions in this pull request are licensed to the project maintainers
under the terms of the Apache 2 license.
By creating this pull request I represent that I have the right to license the
contributions to the project maintainers under the Apache 2 license.

Description

We add a new flag, "--prefixes value", which is mutually exclusive with --prefix and whose value is a comma-separated list of one or more static prefix values. The case of --prefix and --prefixes with one value specified are identical (and in fact, --prefix is now implemented as a single-element --prefixes internally).

When multiple prefixes are given with --prefixes, each object within each thread has a static prefix chosen from the list in a round-robin fashion. Also, the listings for --list-existing include all the prefixes, "chaining" the prefix listings together.

This change preserves the existing behavior of the --prefix and --noprefix flags.

The interactions of the flags are complicated, so a new unit test was added to cover the various combinations.

Motivation and Context

There are 2 use-cases for a static set of object prefixes ("folders"):

  1. GCS makes deleting "folders" difficult, at least with the heirarchical optimization feature enabled, and S3 API delete calls can't delete them. So if Warp generates a fresh one for every thread that ever runs, it makes cleaning up the bucket super tedious, even though the bucket has zero objects.
  2. Testing a major CSP's object storage involved having them pre-shard the bucket on known prefixes to ensure a valid test. We needed Warp to support this, so patched in the support. (That patch was hackier, and this is a better version of it, though it worked fine for our usage.)

Our use-case 1 is now satisfied by specifying --prefixes with the desired count of top-level "folders" and --noprefix.

Our use-case 2 is now satisfied with just --prefixes (though --noprefix doesn't hurt it).

A quick note about the copyright lines: our legal requirement is to include it in any new file and add to any files with "substantial (e.g. not a trivial bug-fix, but a contribution worthy of recognition of ownership)" modifications. We use our best technical judgement here. In this case, it's one file with a small new function, one file with the round-robin implementation, and the file with the new unit test. They don't change the licensing of the submitted code (Apache-2) or what Minio can do with it or the public later (via the AGPL license), and they don't affect any existing code's copyright.

How to test this PR?

The new unit test gives the primary confidence, though I did functionally test the new behavior with a three-phase set of warp invocations (put, get, delete) with --prefixes=00,01,...,99 with and without --noprefix and verified that the resulting bucket folders & objects in GCS behaved as expected. Without --noprefix, the extra per-client-thread sub folders were left lying around as expected and I had to clean those up using the GCS console.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Optimization (provides speedup with no functional changes)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Fixes a regression (If yes, please add commit-id or PR # here)
  • Unit tests added/updated
  • Internal documentation updated
  • Create a documentation update request here

@dbishop
Copy link
Copy Markdown
Contributor Author

dbishop commented Nov 20, 2025

@klauspost , I left the flag non-hidden as even though this one mentions GCS, but the other use-case was at the same tier-1 CSP where we hit the uint16 thread ID limit and needed to hit static pre-sharded top-level "folders" in our test bucket.

Let me know if you still think it should be hidden.

Thanks!

@klauspost
Copy link
Copy Markdown
Collaborator

Honestly, this seems a crazy amount of code for an extremely specific use-case.

@dbishop
Copy link
Copy Markdown
Contributor Author

dbishop commented Nov 20, 2025

That's fair. I think the use-cases are valid, at least for using Warp for large-scale testing with CSPs. For the code volume & complexity, that naturally arose from maintaining seamless backward compatibility and not changing any existing flags or their semantics.

I think the additional unit test mitigates the complexity's risk. How do you see these portions of code changing over time (i.e. how would this extra code in this patch potentially affect development velocity in the future)?

I'd really like to not have to carry this patch around on our side, and I think it provides some value to all warp users (again, in some specific circumstances).

Is there a simpler way for Warp to support these 2 use-cases?

Thanks!

@dbishop
Copy link
Copy Markdown
Contributor Author

dbishop commented Nov 24, 2025

@klauspost,
Hi, I did find an additional improvement to be had in refactoring the logic for list-existing between bench.{delete,get,stat} and did so with a helper function and options struct. I think that's a strict improvement in complexity and duplicated logic and offsets the fact that the single location of list-existing logic now handles multiple prefixes.

I also rebased the PR branch to keep it clean.

Thanks!

There are 2 use-cases for a static set of object prefixes ("folders"):
 1. GCS makes deleting "folders" difficult, at least with the
    heirarchical optimization feature enabled, and S3 API delete calls
    can't delete them.  So if Warp generates a fresh one for every
    thread that ever runs, it makes cleaning up the bucket super
    tedious, even though it can have zero objects.
 2. Testing a major CSP's object storage involved having them pre-shard
    the bucket on known prefixes to ensure a valid test.  We needed Warp
    to support this, so patched in the support.  (That patch was
    hackier, and this is a better version of it, though it worked fine
    for our usage.)

This change preserves the behavior of the --prefix and --noprefix flags.

We add a new flag, "--prefixes value", which is mutually exclusive with
--prefix and whose value is a comma-separated list of one or more static
prefix values.  The case of --prefix and --prefixes with one value
specified are identical (and in fact, --prefix is now implemented as a
single-element --prefixes internally).

When multiple prefixes are given with --prefixes, each object within
each thread has a static prefix chosen from the list in a round-robin
fashion.  Also, the listings for --list-existing include all the
prefixes, "chaining" the prefix listings together.

Use-case 1 above will be satisfied by specifying --prefixes
with the desired count of top-level "folders" and --noprefix.

Use-case 2 can be satisfied with just --prefixes (though --noprefix
doesn't hurt it).

The interactions of the flags are complicated, so a new unit test was
added to cover the various combinations.

bench.{delete,get,stat} now use a helper function with an options struct
to reduce logic and code duplication.

Cleaned up separation of concerns between generator.{generator,random},
ensuring all prefix generation logic is in generator.random.
@dbishop
Copy link
Copy Markdown
Contributor Author

dbishop commented Nov 24, 2025

@klauspost Added one more tweak to clean up the separation of concerns between generator.{generator,random}. Now all the prefix-calculation is in generator.random, not spread across the two files. This will be simpler and easier to maintain.

@dbishop
Copy link
Copy Markdown
Contributor Author

dbishop commented Dec 1, 2025

@klauspost Hi, any thoughts on this patch after the 2 adjustments last week?

Thanks,
Darrell

@klauspost
Copy link
Copy Markdown
Collaborator

klauspost commented Dec 2, 2025

@dbishop Honestly I am not convinced, and while I do appreciate the effort without knowing more it seems to add a huge amount of complexity A) For a very edge case and B) That adds a lot of code for us to maintain.

I will leave the decision to @harshavardhana - but my overall thought is that this is something you should keep in a fork.

@dbishop
Copy link
Copy Markdown
Contributor Author

dbishop commented Dec 2, 2025

@klauspost @harshavardhana
Hedging my bets on this PR, I created #434 to just cover the --list-existing refactoring. I think that's an objective improvement in maintainability, even if not considering this patch.

@klauspost
Copy link
Copy Markdown
Collaborator

Closing as per ^^^

@klauspost klauspost closed this Dec 18, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants