
Conversation

@kkrugler
Contributor

Description

Support a new, optional pushFileNamePattern parameter in the pushJobSpec section of the job yaml. This will filter segments in the outputDir that are pushed to Pinot.

Upgrade Notes

Does this PR prevent a zero down-time upgrade? (Assume upgrade order: Controller, Broker, Server, Minion)

  • Yes (Please label as backward-incompat, and complete the section below on Release Notes)

Does this PR fix a zero-downtime upgrade introduced earlier?

  • Yes (Please label this as backward-incompat, and complete the section below on Release Notes)

Does this PR otherwise need attention when creating release notes? Things to consider:

  • New configuration options
  • Deprecation of configurations
  • Signature changes to public methods/interfaces
  • New plugins added or old plugins removed
  • Yes (Please label this PR as release-notes and complete the section on Release Notes)

Release Notes

Support for name-based filtering of segments being pushed to the Pinot cluster.

Documentation

In the https://docs.pinot.apache.org/configuration-reference/job-specification#push-job-spec section, add:

pushFileNamePattern | Segment name pattern that selects which segments to push; glob and regex patterns are supported. E.g.
'glob:stats_*' will push all segment files under the outputDirURI whose names start with 'stats_'.
'glob:*2022-01*' will push all segment files under the outputDirURI whose names contain '2022-01'.
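
To illustrate how patterns like these select file names, here is a minimal sketch. It assumes the `glob:` and `regex:` prefixes follow the syntax of Java's `java.nio.file.PathMatcher`, which the prefix style suggests but this PR description does not state. The class `PushPatternSketch`, the helper `filter`, and the sample segment names are hypothetical and are not Pinot code.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

public class PushPatternSketch {
    // Hypothetical helper: keep only the file names accepted by the pattern.
    // The pattern must carry a "glob:" or "regex:" prefix, as PathMatcher requires.
    public static List<String> filter(String pattern, List<String> fileNames) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(pattern);
        return fileNames.stream()
            .filter(name -> matcher.matches(Paths.get(name)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up segment file names, used only for illustration.
        List<String> names = Arrays.asList(
            "stats_2022-01-01.tar.gz",
            "stats_2022-02-01.tar.gz",
            "events_2022-01-15.tar.gz");

        // Names starting with "stats_".
        System.out.println(filter("glob:stats_*", names));
        // Names containing "2022-01".
        System.out.println(filter("glob:*2022-01*", names));
        // The same selection as the previous line, written as a regex.
        System.out.println(filter("regex:.*2022-01.*", names));
    }
}
```

Under these assumptions, the first pattern keeps the two `stats_` files, and the second and third keep the two files dated 2022-01.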

@kkrugler
Contributor Author

See also #8141

@Jackie-Jiang added the release-notes label (Referenced by PRs that need attention when compiling the next release notes) on Feb 14, 2022
@Jackie-Jiang merged commit f12e625 into apache:master on Feb 14, 2022
@kkrugler
Contributor Author

Thanks @Jackie-Jiang !

@kai11

kai11 commented Apr 5, 2023

"glob:*2023-04*" doesn't work; however, "glob:**2023-04*" does.
We use HDFS for deep storage, so segment names there are full paths like hdfs://hostname/pinot/pinot-segments/table/table_2023-04_00.tar.gz, not bare file names like table_2023-04_00.tar.gz.
PS. This might be HDFS-specific.
PPS. PR to update documentation: pinot-contrib/pinot-docs#160
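
The behavior described above is consistent with standard Java glob semantics, where `*` does not cross a directory separator but `**` does. The sketch below shows that difference with `java.nio.file.PathMatcher`. Whether Pinot matches the pattern against the full path in exactly this way is an assumption based on this comment. The class `GlobSeparatorSketch` and the helper `matches` are hypothetical, and the `hdfs://hostname` prefix is dropped so the example runs on a local file system.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobSeparatorSketch {
    // Hypothetical helper: test one path string against a "glob:"-prefixed pattern.
    public static boolean matches(String pattern, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher(pattern);
        return matcher.matches(Paths.get(path));
    }

    public static void main(String[] args) {
        // Path portion of the example above, without the hdfs://hostname prefix.
        String fullPath = "/pinot/pinot-segments/table/table_2023-04_00.tar.gz";
        String bareName = "table_2023-04_00.tar.gz";

        // A single '*' is enough when only the bare file name is matched.
        System.out.println(matches("glob:*2023-04*", bareName));   // true
        // A single '*' cannot span the directory separators in a full path.
        System.out.println(matches("glob:*2023-04*", fullPath));   // false
        // '**' crosses directory boundaries, so the full path matches.
        System.out.println(matches("glob:**2023-04*", fullPath));  // true
    }
}
```

If the match really is applied to the full path, a leading `**` is the portable way to write a "name contains" pattern, since it also matches bare file names.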


3 participants