Delay backporting until 7 days after merge and skip reverted PRs#100880
Conversation
Add `ci/jobs/scripts/backport_labels.py` — a new step in the `CherryPick` workflow that runs before `cherry_pick.py`. It searches for merged PRs carrying a backport label (`pr-must-backport`, `pr-must-backport-force`, `pr-critical-bugfix`, `vX.Y-must-backport`) that were merged more than 7 days ago and updated in the last 30 days. For each candidate it checks whether a merged "Revert #N" PR exists: if so, all backport labels are removed; otherwise the `ready-for-backport` label is added. `cherry_pick.py` is updated to require `label:ready-for-backport` in its search query, so only pre-vetted PRs are processed. Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
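The gating described above can be sketched as a query builder. This is an illustrative sketch, not the PR's actual code: the function name, constants, and exact filter values are assumptions.

```python
from datetime import date, timedelta

# Assumed constants mirroring the description above (illustrative only).
READY_DELAY_DAYS = 7
UPDATED_WINDOW_DAYS = 30


def build_candidate_query(repo: str, label: str, today: date) -> str:
    """Build a GitHub search query for merged PRs carrying one backport
    label, merged more than READY_DELAY_DAYS ago and updated within the
    last UPDATED_WINDOW_DAYS."""
    merged_cutoff = today - timedelta(days=READY_DELAY_DAYS)
    updated_cutoff = today - timedelta(days=UPDATED_WINDOW_DAYS)
    return (
        f"repo:{repo} is:pr is:merged "
        f'label:"{label}" '
        f"merged:..{merged_cutoff.isoformat()} "
        f"updated:>={updated_cutoff.isoformat()}"
    )
```

One such query would be run per backport label; the results then go through the revert check before `ready-for-backport` is added.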
    from ci.jobs.scripts.workflow_hooks.pr_labels_and_category import Labels
    from ci.praktika.utils import Shell

    READY_DELAY_DAYS = 7
Maybe 7 days is too long; on the other hand it is safer. I'm OK with it for now.
The previous search query `"Revert" "#{pr_number}"` searched in title and
body, so it could match the original PR itself (whose body often references
its own number). This caused valid PRs to be mistakenly treated as reverted
and have their backport labels removed.
Three protections added to `find_revert_pr`:
- `in:title` — scope the search to titles only.
- Self-match exclusion — skip results whose number equals the original PR.
- Title validation — require the matched title to start with `Revert ` and
contain the exact `#<N>` token (prevents `#1234` matching a search for
`#123`).
Also improve logging: print each label action in non-dry-run mode and show
the revert PR number, title, and URL when one is found.
Fixes: #100880 (comment)
Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
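The three protections described in the commit message could look roughly like the following. This is a sketch only; the function name and signature are illustrative, not the PR's code.

```python
import re


def is_revert_of(candidate_title: str, candidate_number: int, original_number: int) -> bool:
    """Check whether a search hit is plausibly a revert of the original PR."""
    # Self-match exclusion: the original PR's body often references its
    # own number, so skip results with the same number.
    if candidate_number == original_number:
        return False
    # Title validation: require the matched title to start with "Revert ".
    if not candidate_title.startswith("Revert "):
        return False
    # Exact "#<N>" token: a negative lookahead prevents "#1234" from
    # satisfying a search for "#123".
    return re.search(rf"#{original_number}(?!\d)", candidate_title) is not None
```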
ClickHouse revert PRs do not contain the original PR number in the title.
The actual format is:
Title: Revert "<original title>"
Body: Reverts ClickHouse/ClickHouse#<N>
Switch the search to look for `Reverts {repo}#{pr_number}` in the body
(the canonical GitHub revert body format), and validate the title starts
with `Revert "` to confirm it is a GitHub-generated revert PR.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
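The body-based detection this commit describes could be sketched as below. The helper name is an assumption; only the title/body format it checks comes from the commit message.

```python
import re


def looks_like_github_revert(title: str, body: str, repo: str, pr_number: int) -> bool:
    """Detect a GitHub-generated revert PR of the given original PR.

    GitHub-generated revert PRs follow:
        Title: Revert "<original title>"
        Body:  Reverts <owner>/<repo>#<N>
    """
    if not title.startswith('Revert "'):
        return False
    # Negative lookahead so "#99999" does not match a search for "#9999".
    pattern = rf"Reverts {re.escape(repo)}#{pr_number}(?!\d)"
    return re.search(pattern, body) is not None
```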
leshikus left a comment
This is functionality requested by the dev team and has to be implemented; all the comments below are nice-to-haves.
    READY_DELAY_DAYS = 7

    def gh_search(query: str, per_page: int = 100, max_results: int = 1000) -> List[Dict[str, Any]]:
        page = 1
        while True:
            output = Shell.get_output(
                f"gh api -X GET search/issues "
The request looks generic enough and not specific to backporting; it should probably be part of a shared library.
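One way to act on this suggestion is to separate the pagination loop from the `gh` invocation, so the loop becomes a reusable (and unit-testable) library helper. A sketch under that assumption, using `subprocess` directly instead of praktika's `Shell` wrapper:

```python
import json
import subprocess
from typing import Any, Callable, Dict, List


def fetch_page_via_gh(query: str, per_page: int, page: int) -> List[Dict[str, Any]]:
    """Fetch one page of GitHub search results via the real `gh` CLI."""
    out = subprocess.run(
        ["gh", "api", "-X", "GET", "search/issues",
         "-f", f"q={query}", "-F", f"per_page={per_page}", "-F", f"page={page}"],
        capture_output=True, text=True, check=True,  # raise on API failure
    ).stdout
    return json.loads(out).get("items", [])


def gh_search(
    query: str,
    fetch_page: Callable[[str, int, int], List[Dict[str, Any]]] = fetch_page_via_gh,
    per_page: int = 100,
    max_results: int = 1000,
) -> List[Dict[str, Any]]:
    """Generic paginated search, independent of any backport logic."""
    items: List[Dict[str, Any]] = []
    page = 1
    while len(items) < max_results:
        batch = fetch_page(query, per_page, page)
        if not batch:
            break
        items.extend(batch)
        page += 1
    return items[:max_results]
```

Injecting the page fetcher keeps the loop testable with a fake fetcher and swappable between `Shell.get_output` and `subprocess`.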
    3. Have not been reverted (no merged "Revert #<N>" PR found).
    4. Do not already carry `ready-for-backport` or `pr-backports-created`.

    The label is later consumed by `cherry_pick.py`, which skips PRs without it.
Maybe it's better to say "used". Does cherry_pick.py actually delete this label? Even if it does, simpler language than "consumed" would be clearer.
        return bool(gh_search(query, per_page=10, max_results=10))

    def mark_ready(repo: str, pr_number: int, dry_run: bool) -> None:
This function adds a label; it may be clearer to name label-adding functions add_<label>_label.
        )

    def remove_backport_labels(
The function probably removes arbitrary labels. The same comment as on the previous PR about the PR cache applies: it's more conventional to have add_label and remove_label methods on a PR object than to re-implement them each time.
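The conventional helpers the reviewer has in mind could be sketched as below. The names are assumptions; the `--add-label`/`--remove-label` flags of `gh pr edit` are real CLI flags.

```python
import subprocess


def add_label(repo: str, pr_number: int, label: str, dry_run: bool = False) -> None:
    """Add a single label to a PR; reusable instead of per-script wrappers."""
    if dry_run:
        print(f"DRY RUN: would add {label!r} to PR #{pr_number}")
        return
    subprocess.run(
        ["gh", "pr", "edit", str(pr_number), "--repo", repo, "--add-label", label],
        check=True,  # fail fast rather than silently ignoring API errors
    )


def remove_label(repo: str, pr_number: int, label: str, dry_run: bool = False) -> None:
    """Remove a single label from a PR."""
    if dry_run:
        print(f"DRY RUN: would remove {label!r} from PR #{pr_number}")
        return
    subprocess.run(
        ["gh", "pr", "edit", str(pr_number), "--repo", repo, "--remove-label", label],
        check=True,
    )
```

With these in a shared module, `mark_ready` and `remove_backport_labels` reduce to thin loops over label names.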
Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
When running from the repo root, `praktika` (at ci/praktika/) was not
on sys.path, causing an import failure. Add `sys.path.append("ci")` so
that both `praktika.utils` and the transitive `praktika` imports inside
`pr_labels_and_category` resolve correctly.
Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
Replaces cwd-relative sys.path.append with paths derived from the script's own location, so imports work regardless of which directory the script is launched from. Co-Authored-By: Claude Sonnet 4.6 (1M context) <[email protected]>
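The location-derived path computation could look like this, assuming the script lives at `ci/jobs/scripts/backport_labels.py` (per the PR description); the helper name is illustrative.

```python
import sys
from pathlib import Path


def ci_dir_for(script_file: str) -> Path:
    """Derive the repo's ci/ directory from the script's own path,
    assuming a layout of ci/jobs/scripts/<script>.py."""
    # three parents up: scripts/ -> jobs/ -> ci/
    return Path(script_file).resolve().parent.parent.parent


# In backport_labels.py itself this would be used as:
#   sys.path.append(str(ci_dir_for(__file__)))
# so `praktika.*` imports resolve no matter which cwd launched the script.
```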
Aligns the search window with cherry_pick.py so PRs merged up to 90 days ago are eligible, preventing silent skips for PRs with no recent activity.
- Add unit tests covering: exact body match, self-match exclusion,
title mismatch, empty results, multiple candidates.
- Update module docstring to reflect actual revert detection logic
(title Revert "..." + body Reverts {repo}#<N>).
Addresses:
#100880 (comment)
#100880 (comment)
Was accidentally dropped during refactoring, causing NameError on startup. Fixes: https://github.com/ClickHouse/ClickHouse/pull/100880/changes#r3000535407
Switch from get_output (swallows non-zero exit) to get_output_or_raise so that rate-limit, auth, or network failures raise instead of silently returning an empty candidate list and skipping all backport labeling. Fixes: https://github.com/ClickHouse/ClickHouse/pull/100880/changes/BASE..8e3fbdf745bc94d75e119dc5fef1f2df58efa13b#r3000875077
    if __name__ == "__main__":
        sys.exit(main())
label:{label_filter} is built from a comma-joined list of backport labels, but in GitHub search commas in label: act as AND (see existing note in tests/ci/cherry_pick.py around line 680). That makes this query require all backport labels at once, so candidates can be empty and PRs never get ready-for-backport.
Please build OR semantics instead (e.g. separate searches per label and union by PR number, or an explicit OR expression in q).
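The per-label-search-and-union approach the reviewer suggests could be sketched as follows; the function name and injected `search` callable are assumptions for illustration.

```python
from typing import Any, Callable, Dict, List

# Labels listed in the PR description; vX.Y-must-backport variants would
# be discovered dynamically in the real script.
BACKPORT_LABELS = ["pr-must-backport", "pr-must-backport-force", "pr-critical-bugfix"]


def search_candidates(
    base_query: str,
    labels: List[str],
    search: Callable[[str], List[Dict[str, Any]]],
) -> List[Dict[str, Any]]:
    """Run one search per label (OR semantics) and union results by PR
    number, since commas inside a single label: qualifier mean AND in
    GitHub's search syntax."""
    seen: Dict[int, Dict[str, Any]] = {}
    for label in labels:
        for item in search(f'{base_query} label:"{label}"'):
            seen.setdefault(item["number"], item)  # dedupe PRs with several labels
    return list(seen.values())
```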
Add -label:pr-cherrypick and -label:pr-backport to the search query so that backport/cherry-pick PRs derived from the original are never picked up as candidates for the ready-for-backport label.
Is it possible to bypass this and create a backport immediately?
- get_release_branches: switch to get_output_or_raise and raise on empty output, consistent with gh_search; avoids silently dropping all version-specific backport labels on API failure
- remove_backport_labels: rename loop variable l -> lbl (shadows builtin)
- Move the pathlib import into the stdlib block
        print(f"  DRY RUN: would add {Labels.READY_FOR_BACKPORT!r} to PR #{pr_number}")
        return
    print(f"  Adding {Labels.READY_FOR_BACKPORT!r} to PR #{pr_number}")
    Shell.run(
mark_ready ignores failures from gh pr edit: Shell.run defaults to strict=False, and the return code is not checked. If GitHub API calls fail (permissions, rate limit, transient network), this step silently continues and exits successfully, so PRs may never receive ready-for-backport even though the workflow reports success.
Please make this fail-fast (for example, Shell.run(..., strict=True)) or check the return code and raise. The same issue exists in remove_backport_labels.
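The fail-fast behaviour the reviewer asks for amounts to checking the exit code and raising. A minimal sketch using `subprocess` (the real fix would instead pass `strict=True` to praktika's `Shell.run`):

```python
import subprocess


def run_or_raise(cmd: list) -> None:
    """Run a command and raise instead of silently continuing on failure,
    so transient GitHub API errors fail the workflow step visibly."""
    result = subprocess.run(cmd)
    if result.returncode != 0:
        raise RuntimeError(f"command failed ({result.returncode}): {' '.join(cmd)}")


# e.g. run_or_raise(["gh", "pr", "edit", "100880", "--repo",
#                    "ClickHouse/ClickHouse", "--add-label", "ready-for-backport"])
```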
The merged:2020-01-01.. lower bound was unnecessary — the updated:>=90d
filter already limits candidates to recently active PRs in steady state.
Use merged:..{cutoff} to express only the upper bound (not yet eligible).
        return
    print(f"  Removing labels {to_remove} from PR #{pr_number}")
    Shell.run(
        f"gh pr edit {pr_number} {remove_args} --repo {repo}",
remove_backport_labels ignores failures from gh pr edit: Shell.run defaults to strict=False, and the return code is not checked.
If the GitHub call fails (permissions/rate limit/transient network), the script silently continues and leaves stale backport labels on reverted PRs, so they can still be picked up by downstream automation.
Please make this call strict (or check the return code and raise) the same way as other required workflow operations.
add label: ready-for-backport
PRs marked for backporting were previously picked up by `cherry_pick.py` immediately after merge. This had two problems:

Changes

A new step is added to the `CherryPick` workflow (runs hourly) before the cherry-pick job. It runs `ci/jobs/scripts/backport_labels.py`, which:
- searches for merged PRs carrying a backport label (`pr-must-backport`, `pr-must-backport-force`, `pr-critical-bugfix`, `vX.Y-must-backport`) that were merged more than 7 days ago and updated in the last 30 days;
- for each candidate, checks whether a merged revert PR exists: if so, all backport labels are removed; otherwise the PR gets the `ready-for-backport` label.

`cherry_pick.py` is updated to require `label:ready-for-backport` in its search query, so only pre-vetted PRs are processed.

Changelog category (leave one):
Changelog entry:
Backporting is now delayed until 7 days after a PR is merged, and reverted PRs are automatically detected and have their backport labels removed before any cherry-pick branches are created.