chore: add AI review triage skill #1441
Add a repository-local Codex skill for repeatedly triaging AI-generated PR review feedback. The workflow captures comment retrieval methods, actionable classification, default fix targets, and safe handling rules for GitHub review threads. Validation: checked staged files and searched for TODO/stale classification text.
📝 Walkthrough
Adds a new "AI Review Triage" agent skill that defines an end-to-end workflow for resolving a target PR, collecting review context, classifying AI-generated review comments, verifying impactful claims, and choosing/performing edits with reporting requirements.
Changes
AI Review Triage Skill
Sequence Diagram(s)
sequenceDiagram
participant User as User
participant Agent as AI Agent
participant GH as GitHub (API / Connector)
participant CLI as gh CLI
User->>Agent: Request triage (PR number/URL or current branch)
Agent->>GH: Try connector APIs to fetch PR, threads, reviews
GH-->>Agent: PR & review data / auth error
alt GitHub connector failed
Agent->>CLI: Use `gh` CLI (GraphQL/REST) fallback
CLI-->>Agent: PR & review data or auth/network error
alt auth/network error
Agent->>User: Report blocker, request re-authentication
else
Agent->>Agent: Classify comments, verify CI-impacting claims
Agent->>User: Propose actions (Required/Should/Worth)
alt user approves changes
Agent->>GH: Post edits/comments or leave replies/resolutions
Agent->>User: Report addressed/ignored items and validation results
else
Agent->>User: Output triage-only report (Japanese-first when applicable)
end
end
else
Agent->>Agent: Classify & verify, then follow same action/reporting flow
end
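The fallback order in the diagram can be sketched in Python as follows. This is only an illustration under stated assumptions: `RetrievalError`, `TriageResult`, and `fetch_review_data` are hypothetical names, and each retrieval method is passed in as a plain callable rather than a real connector or `gh` call.

```python
from dataclasses import dataclass
from typing import Callable, Optional


class RetrievalError(Exception):
    """Hypothetical error raised when a method cannot return review data."""


@dataclass
class TriageResult:
    source: Optional[str]   # name of the method that produced the data, if any
    data: Optional[dict]    # raw PR and review payload
    blocker: Optional[str]  # message for the user when every method failed


def fetch_review_data(methods: list[tuple[str, Callable[[], dict]]]) -> TriageResult:
    """Try each retrieval method in order; report a blocker instead of guessing."""
    errors = []
    for name, fetch in methods:
        try:
            return TriageResult(source=name, data=fetch(), blocker=None)
        except RetrievalError as exc:
            errors.append(f"{name}: {exc}")
    # All methods failed: surface the failures and ask for re-authentication.
    message = "re-authentication needed (" + "; ".join(errors) + ")"
    return TriageResult(source=None, data=None, blocker=message)
```

Passing `[("connector", ...), ("gh", ...)]` mirrors the diagram: the CLI is consulted only after the connector fails, and classification never starts when `blocker` is set.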
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~12 minutes
🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Pull request overview
Adds a repository-local Codex skill to consistently triage AI-generated PR review feedback (e.g., Copilot/CodeRabbit), including guidance on retrieving review context and a clear actionability classification scheme.
Changes:
- Introduces a new skill definition (`gh-ai-review-triage`) with agent UI metadata.
- Documents an end-to-end triage workflow: retrieving review threads/timelines, classifying comments, verifying claims, and deciding when to act.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| .agents/skills/gh-ai-review-triage/agents/openai.yaml | Defines the skill’s display metadata and default prompt for the OpenAI agent integration. |
| .agents/skills/gh-ai-review-triage/SKILL.md | Provides the triage workflow, retrieval methods, classifications, and expected output format. |
Actionable comments posted: 1
🧹 Nitpick comments (4)
.agents/skills/gh-ai-review-triage/SKILL.md (4)
76-83: ⚖️ Poor tradeoff
Optional: make heuristics slightly more deterministic (reduce “AI style bias”).
The heuristics (e.g., “CodeRabbit ‘Nitpick’ sections are optional by default… promote to `Worth Fixing` when safe…”) are reasonable, but they can produce inconsistent outcomes across PRs because “safe” is subjective.
Optional improvement: tie “safe” to concrete checks you expect the agent to run (tests/format/lint, no behavioral change, etc.), so the promotion rule is less opinion-based.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/skills/gh-ai-review-triage/SKILL.md around lines 76 - 83, The heuristics are subjective around promoting CodeRabbit "Nitpick" sections to `Worth Fixing`; update the SKILL.md heuristics to make promotion deterministic by adding concrete checklist criteria: require passing CI/tests, no behavioral changes (e.g., only formatting/lint/test-only changes), and low-risk scope (single file or non-public API), and prefer automated format/lint/tool confirmation (e.g., cargo clippy/pytest/formatter) before promoting; specifically alter the "CodeRabbit 'Nitpick' sections are optional..." sentence to reference these concrete checks so the agent promotes only when the checks (tests/format/lint/tool output and scope) are satisfied.
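One way to read the checklist above is as a pure predicate over recorded evidence. The sketch below is an assumption about how that could look; `NitpickEvidence` and `classify_nitpick` are hypothetical names, and the field set simply mirrors the criteria listed in the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NitpickEvidence:
    tests_pass: bool          # CI or local tests pass with the change applied
    lint_format_clean: bool   # formatter/linter output confirms the change
    behavior_changed: bool    # True if the change alters runtime behavior
    files_touched: int        # number of files the change modifies
    touches_public_api: bool  # True if a public interface is affected


def classify_nitpick(ev: NitpickEvidence) -> str:
    """Promote a nitpick to 'Worth Fixing' only when every check holds."""
    # Low-risk scope as worded in the review: single file or non-public API.
    low_risk_scope = ev.files_touched == 1 or not ev.touches_public_api
    safe = (
        ev.tests_pass
        and ev.lint_format_clean
        and not ev.behavior_changed
        and low_risk_scope
    )
    return "Worth Fixing" if safe else "Optional"
```

Because the function depends only on recorded evidence, two runs over the same PR give the same classification, which is the determinism the comment asks for.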
19-29: ⚡ Quick win
Add guidance for rate limits/pagination/partial retrieval failures.
Step 3 covers “auth/network fails” for `gh`, but doesn’t mention what to do for common operational issues like GitHub rate limiting, pagination cutoffs, or partial retrieval (e.g., connector returns fewer items than expected).
Because this skill relies on collecting “review threads + merged timelines,” partial data can silently skew classification. Consider adding a rule like:
- If results are incomplete due to rate limit/pagination, report “data may be incomplete” and either retry/backoff (if the tool supports it) or fall back to the next retrieval method; otherwise default to conservative classification (favor `Ignore`/`Optional` when evidence is missing).
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/skills/gh-ai-review-triage/SKILL.md around lines 19 - 29, Add explicit handling for rate limits, pagination, and partial retrieval failures in Step 3: when using connectors or tools like _list_pull_request_review_threads, _fetch_pr_comments, get_pull_request_comments, get_pull_request_reviews, or the gh-based flow (gh pr view / graphql / REST endpoints), detect and surface incomplete results (rate-limit responses, truncated pages, or fewer items than expected), and implement a retry/backoff where supported or automatically fall back to the next retrieval method; if retries/fallbacks are exhausted, emit a clear “data may be incomplete” warning and default to conservative classification (favor Ignore/Optional) rather than guessing from partial data.
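A minimal sketch of the incomplete-data rule, assuming a hypothetical cursor-based `fetch_page` callable. `Page`, `Collected`, and `collect_all` are illustrative names, not part of any connector API, and the retry loop omits the wait a real backoff would need.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Page:
    items: list
    next_cursor: Optional[str]  # None when there are no further pages
    rate_limited: bool = False  # True when the backend refused the request


@dataclass
class Collected:
    items: list = field(default_factory=list)
    complete: bool = True
    warning: Optional[str] = None


def collect_all(fetch_page: Callable[[Optional[str]], Page],
                max_retries: int = 2) -> Collected:
    """Follow pagination to the end; flag the result if a page cannot be read."""
    out = Collected()
    cursor: Optional[str] = None
    while True:
        page = fetch_page(cursor)
        retries = 0
        # A real implementation would sleep with backoff between retries.
        while page.rate_limited and retries < max_retries:
            retries += 1
            page = fetch_page(cursor)
        if page.rate_limited:
            out.complete = False
            out.warning = "data may be incomplete"
            return out
        out.items.extend(page.items)
        if page.next_cursor is None:
            return out
        cursor = page.next_cursor
```

A caller can then fall back to the next retrieval method whenever `complete` is false, and only classify conservatively once every method has been exhausted.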
24-28: 💤 Low value
Fix repeated “If …” phrasing in retrieval step (lint/readability).
Lines 25-28 repeat the same sentence starter (“If …”), which is already flagged by static analysis hints and reduces clarity. Reword to mix sentence starters (e.g., “Prefer…”, “As a fallback…”, “When…”) while keeping the logic identical.
🛠️ Proposed doc wording tweak
-3. Use the best available retrieval method.
-   - If GitHub connector tools are available, prefer `_list_pull_request_review_threads` for inline threads and `_fetch_pr_comments` for the merged PR timeline.
-   - If the GitHub MCP server is available, use `get_pull_request_comments` and `get_pull_request_reviews` as a fallback; note that flat comments may not fully preserve thread resolution state.
-   - If only `gh` is available, use `gh pr view` to resolve the PR, then `gh api graphql` for `reviewThreads` and REST endpoints for `/pulls/{number}/comments`, `/pulls/{number}/reviews`, and `/issues/{number}/comments`.
-   - If `gh` auth or network access fails, report the blocker and ask for re-authentication or permission instead of guessing from incomplete data.
+3. Use the best available retrieval method.
+   - When GitHub connector tools are available, prefer `_list_pull_request_review_threads` for inline threads and `_fetch_pr_comments` for the merged PR timeline.
+   - If the GitHub MCP server is available, use `get_pull_request_comments` and `get_pull_request_reviews` as a fallback; note that flat comments may not fully preserve thread resolution state.
+   - When only `gh` is available, use `gh pr view` to resolve the PR, then `gh api graphql` for `reviewThreads` and REST endpoints for `/pulls/{number}/comments`, `/pulls/{number}/reviews`, and `/issues/{number}/comments`.
+   - If `gh` auth or network access fails, report the blocker and ask for re-authentication or permission instead of guessing from incomplete data.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/skills/gh-ai-review-triage/SKILL.md around lines 24 - 28, The retrieval step bullet list that starts "Use the best available retrieval method" repeats "If ..." at the start of three consecutive bullets; reword for readability while keeping logic identical: change the first to "Prefer the GitHub connector tools: use _list_pull_request_review_threads for inline threads and _fetch_pr_comments for the merged PR timeline," change the second to "As a fallback when the GitHub MCP server is available, use get_pull_request_comments and get_pull_request_reviews (note flat comments may not preserve thread resolution state)," change the third to "When only gh is available, use gh pr view to resolve the PR, then gh api graphql for reviewThreads and REST endpoints /pulls/{number}/comments, /pulls/{number}/reviews, and /issues/{number}/comments," and leave the final sentence about gh auth/network failures unchanged.
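For the `gh`-only path, the commands named in the step could be assembled roughly as below. This is a sketch, not the skill's actual implementation: the function name, the selected `--json` fields, and the GraphQL selection set are assumptions, and the commands are only built as argument lists, never executed.

```python
# Assumed query shape: thread resolution state plus the comments in each thread.
REVIEW_THREADS_QUERY = """
query($owner: String!, $name: String!, $number: Int!) {
  repository(owner: $owner, name: $name) {
    pullRequest(number: $number) {
      reviewThreads(first: 100) {
        nodes {
          isResolved
          isOutdated
          comments(first: 50) { nodes { author { login } path body } }
        }
        pageInfo { hasNextPage endCursor }
      }
    }
  }
}
"""


def gh_commands(owner: str, name: str, number: int) -> list[list[str]]:
    """Build (but do not run) the gh invocations for the gh-only retrieval path."""
    rest_paths = [
        f"repos/{owner}/{name}/pulls/{number}/comments",   # inline review comments
        f"repos/{owner}/{name}/pulls/{number}/reviews",    # review summaries
        f"repos/{owner}/{name}/issues/{number}/comments",  # PR timeline comments
    ]
    cmds = [["gh", "pr", "view", str(number), "--json", "number,url,headRefName"]]
    cmds.append([
        "gh", "api", "graphql",
        "-f", f"query={REVIEW_THREADS_QUERY}",
        "-f", f"owner={owner}",
        "-f", f"name={name}",
        "-F", f"number={number}",  # -F so the value is sent as an integer
    ])
    cmds += [["gh", "api", "--paginate", path] for path in rest_paths]
    return cmds
```

Keeping the commands as argument lists makes them easy to log for the user before anything runs, and avoids shell quoting issues with the multi-line query.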
42-52: ⚡ Quick win
Clarify “triage-only” vs “write-action” gating and output expectations.
You already say “Do not post replies, resolve threads, or submit reviews … unless the user explicitly requests that write action.” However, the spec doesn’t clearly define what “explicitly requests” means in terms of inputs/mode (e.g., a flag like `mode: triage-only` vs `mode: edit`), nor what the agent should do when the user asks for “fix required items” but not “post to GitHub.”
This can lead to inconsistent behavior: the agent might edit locally but still avoid writing (or vice versa) depending on ambiguous interpretation.
Suggestion: add a short “Mode contract” under step 6/7, e.g.:
- Triaging mode: no GitHub writes; only report classification + suggested fixes.
- Edit mode: apply code changes locally and show a validation plan; only push/commit or comment when user explicitly asks.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/skills/gh-ai-review-triage/SKILL.md around lines 42 - 52, Add a short "Mode contract" under steps 6 and 7 that formally defines the two modes and their behaviors: "Triaging mode" (no GitHub writes; only classify items and provide suggested fixes), and "Edit mode" (apply code changes locally, run validation steps, and show a validation/commit plan but do not push or comment unless the user explicitly requests write actions). Also document the input signal expected to select a mode (e.g., a clear flag like mode: triage-only vs mode: edit or explicit phrases such as "only triage" / "apply edits locally") and clarify handling when the user asks to "fix required items" but not to post to GitHub (apply edits locally and report what was changed and which items were skipped from Should/Worth Fixing). Update the prose around steps 6 and 7 to reference this "Mode contract" and the exact user signals that trigger write actions so behavior is unambiguous.
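The proposed mode contract can be illustrated as a small gating function. This is a hedged sketch: `Mode`, `allowed_actions`, and the action labels are hypothetical, chosen only to show that GitHub writes require both edit mode and an explicit request.

```python
from enum import Enum


class Mode(Enum):
    TRIAGE_ONLY = "triage-only"
    EDIT = "edit"


def allowed_actions(mode: Mode, write_requested: bool) -> set[str]:
    """Return the actions the agent may take under the mode contract."""
    actions = {"classify", "report"}  # always permitted
    if mode is Mode.EDIT:
        # Local edits and validation are allowed without touching GitHub.
        actions |= {"edit_locally", "run_validation"}
        if write_requested:
            # Pushing or commenting needs an explicit user request.
            actions |= {"push", "comment_on_github"}
    return actions
```

Under this reading, a request to "fix required items" without a request to post maps to `allowed_actions(Mode.EDIT, False)`: the agent edits and validates locally, then reports what changed.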
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.agents/skills/gh-ai-review-triage/SKILL.md:
- Around line 30-47: The doc currently allows promotion to "Required" before
verification—update SKILL.md to require verification evidence before classifying
something as Required: amend Step 5 (and reference Step 4/Step 6) to state "Only
mark Required if the issue is verifiable with reproduction/test evidence or
deterministically verifiable from available data; if verification isn't
possible, downgrade to Should Fix with a 'needs verification' tag (or Ignore if
clearly stale/duplicate)." Ensure the new sentence is inserted near Step 5's
verification guidance and cross-referenced in Step 4 and Step 6 so agents use
verification feasibility when deciding fix targets.
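The verification gate described in this inline comment could be expressed as follows. It is a sketch under assumptions: `Finding` and `final_classification` are hypothetical names, and `verified` stands for "reproduction/test evidence exists or the claim is deterministically checkable from available data".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    proposed: str                     # classification suggested before verification
    verified: bool                    # evidence exists or the claim is checkable
    stale_or_duplicate: bool = False  # clearly outdated or repeated comment


def final_classification(f: Finding) -> tuple[str, Optional[str]]:
    """Return (classification, tag); 'Required' survives only with verification."""
    if f.proposed == "Required" and not f.verified:
        if f.stale_or_duplicate:
            return ("Ignore", None)
        return ("Should Fix", "needs verification")
    return (f.proposed, None)
```

The tag travels with the classification so the final report can state which claim could not be verified.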
🪄 Autofix (Beta)
✅ Autofix completed
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: a4fb9e4a-c1b6-455d-ad83-cb385d9000e6
📒 Files selected for processing (2)
.agents/skills/gh-ai-review-triage/SKILL.md
.agents/skills/gh-ai-review-triage/agents/openai.yaml
Note: Autofix is a beta feature. Expect some limitations and changes as we gather feedback and continue to improve it.
Fixes Applied Successfully
Fixed 1 file(s) based on 1 unresolved review comment.
Files modified:
Commit: The changes have been pushed to the
Time taken:
Fixed 1 file(s) based on 1 unresolved review comment. Co-authored-by: CodeRabbit <[email protected]>
🧹 Nitpick comments (2)
.agents/skills/gh-ai-review-triage/SKILL.md (2)
56-68: ⚡ Quick win
Make “needs verification” evidence gaps explicit in Japanese output
When a comment is downgraded to `Should Fix` with `needs verification`, Step 5 already defines the downgrade behavior, but the Japanese triage output format doesn’t explicitly require stating what was missing (which claim couldn’t be verified, and why). Add a short requirement under “対応する/しない理由” (reasons for acting or not acting) so agent outputs stay auditable and consistent.
🛠️ Proposed addition to Output Format
 - 対応する/しない理由:
   - ...
+  - 特に「needs verification」でダウングレードした場合は、どの主張をどこまで検証できた/できなかったか(不足しているエビデンス/再現手段)を明記する。
(In English, the added line reads: in particular, when an item is downgraded with “needs verification”, state which claims could or could not be verified and to what extent, including the missing evidence or means of reproduction.)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/skills/gh-ai-review-triage/SKILL.md around lines 56 - 68, Update the Japanese triage output template in SKILL.md (the "When only triaging, summarize in Japanese" block) so that under the "対応する/しない理由:" section the agent must explicitly state any "needs verification" evidence gaps when a comment is downgraded to "Should Fix"; specify the exact claim or item that could not be verified and the reason (e.g., missing log, unclear reproduction steps, absent test), and require a one-line phrase like "検証不足: <missing claim> — <reason>" to ensure auditable, consistent outputs.
24-29: ⚡ Quick win
Style: avoid repeated “If …” starters in Step 3
Step 3 has multiple consecutive bullets starting with “If …”, which makes the instruction block harder to scan and matches the provided LanguageTool hint. Rewording to “When/Otherwise/If” keeps the logic identical while improving readability.
🛠️ Proposed doc rewording
 3. Use the best available retrieval method.
-   - If GitHub connector tools are available, prefer `_list_pull_request_review_threads` for inline threads and `_fetch_pr_comments` for the merged PR timeline.
-   - If the GitHub MCP server is available, use `get_pull_request_comments` and `get_pull_request_reviews` as a fallback; note that flat comments may not fully preserve thread resolution state.
-   - If only `gh` is available, use `gh pr view` to resolve the PR, then `gh api graphql` for `reviewThreads` and REST endpoints for `/pulls/{number}/comments`, `/pulls/{number}/reviews`, and `/issues/{number}/comments`.
+   - When GitHub connector tools are available, prefer `_list_pull_request_review_threads` for inline threads and `_fetch_pr_comments` for the merged PR timeline.
+   - Otherwise, if the GitHub MCP server is available, use `get_pull_request_comments` and `get_pull_request_reviews` as a fallback; note that flat comments may not fully preserve thread resolution state.
+   - Otherwise, if only `gh` is available, use `gh pr view` to resolve the PR, then `gh api graphql` for `reviewThreads` and REST endpoints for `/pulls/{number}/comments`, `/pulls/{number}/reviews`, and `/issues/{number}/comments`.
    - If `gh` auth or network access fails, report the blocker and ask for re-authentication or permission instead of guessing from incomplete data.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.agents/skills/gh-ai-review-triage/SKILL.md around lines 24 - 29, Rewrite Step 3 to avoid repeated “If …” sentence starts by rephrasing the bullets to vary the lead words while preserving logic: e.g., start the first bullet with “Prefer” when recommending `_list_pull_request_review_threads` and `_fetch_pr_comments`, change the MCP fallback to “Otherwise, use” for `get_pull_request_comments` and `get_pull_request_reviews`, use “When only `gh` is available” for the `gh pr view` / `gh api graphql` / REST endpoints (`/pulls/{number}/comments`, `/pulls/{number}/reviews`, `/issues/{number}/comments`, `reviewThreads`), and keep the last bullet as “If `gh` auth or network access fails” but reword slightly to “If auth or network fails, report the blocker…” so the block reads with varied openings and remains functionally identical to the original.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yml
Review profile: CHILL
Plan: Pro
Run ID: 9f8797ed-2e34-4c41-8839-e6ab502228b3
📒 Files selected for processing (1)
.agents/skills/gh-ai-review-triage/SKILL.md
Summary
gh.
`Required`, `Should Fix`, `Worth Fixing`, `Optional`, `Ignore`) so useful AI review comments can be addressed consistently while noise is left alone.
Related Issues
N/A
Type of Change
`fix/` branch)
`feat/` branch)
`refactor/` branch)
`docs/` branch)
`chore/` branch)
Screenshots / Videos
N/A
Test Plan
Validation performed:
rg -n "TODO|\[TODO|Recommended|only what is necessary|必須対応" .agents/skills/gh-ai-review-triage
Checklist
npm run lint && npm run format / cargo tauri-lint && cargo tauri-fmt)
npm test / cargo tauri-test)
Summary by CodeRabbit
New Features
Documentation