Commit 70f34bf

Require real behavior proof for external PRs (#77622)
* ci: require real behavior proof for external PRs
* fix: tighten real behavior proof heuristics
* fix: reject test-only real behavior proof labels

---------

Co-authored-by: Peter Steinberger <[email protected]>
1 parent d02fbc6 commit 70f34bf

10 files changed

Lines changed: 671 additions & 11 deletions

.github/pull_request_template.md

Lines changed: 12 additions & 0 deletions
@@ -35,6 +35,18 @@ If this PR fixes a plugin beta-release blocker, title it `fix(<plugin-id>): beta
 - Related #
 - [ ] This PR fixes a bug or regression
 
+## Real behavior proof (required for external PRs)
+
+External contributors must show after-fix evidence from a real OpenClaw setup. Unit tests, mocks, lint, typechecks, snapshots, and CI are supplemental only. Screenshots are encouraged even for CLI, console, text, or log changes; terminal screenshots and copied live output count.
+
+- Behavior or issue addressed:
+- Real environment tested:
+- Exact steps or command run after this patch:
+- Evidence after fix (screenshot, recording, terminal capture, console output, redacted runtime log, linked artifact, or copied live output):
+- Observed result after fix:
+- What was not tested:
+- Before evidence (optional but encouraged):
+
 ## Root Cause (if applicable)
 
 For bug fixes or regressions, explain why this happened, not just what changed. Otherwise write `N/A`. If the cause is unclear, write `Unknown`.
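
The new section is evaluated by `evaluateRealBehaviorProof` in `real-behavior-proof-policy.mjs`, which is not included in this excerpt. As a rough sketch of the simplest possible presence check, assuming only that the field labels match the template above, the hypothetical `missingProofFields` helper below reports which required bullets are absent or left blank. It does not attempt the stricter heuristics the commit message mentions, such as rejecting test-only proof.

```javascript
// Hypothetical sketch only: the real check lives in real-behavior-proof-policy.mjs,
// which is not shown in this diff. Field labels are copied from the PR template.
const REQUIRED_PROOF_FIELDS = [
  "Behavior or issue addressed",
  "Real environment tested",
  "Exact steps or command run after this patch",
  "Evidence after fix",
  "Observed result after fix",
  "What was not tested",
];

// Return the lines under "## Real behavior proof" up to the next "## " heading.
function proofSectionLines(body) {
  const lines = String(body ?? "").split(/\r?\n/);
  const start = lines.findIndex((line) => /^##\s+Real behavior proof\b/i.test(line));
  if (start === -1) return [];
  const rest = lines.slice(start + 1);
  const end = rest.findIndex((line) => /^##\s/.test(line));
  return end === -1 ? rest : rest.slice(0, end);
}

// List required fields whose bullet is missing or has nothing after its colon.
function missingProofFields(body) {
  const section = proofSectionLines(body);
  return REQUIRED_PROOF_FIELDS.filter((field) => {
    const line = section.find((l) => l.trim().startsWith(`- ${field}`));
    if (!line) return true;
    const colon = line.indexOf(":");
    return colon === -1 || line.slice(colon + 1).trim() === "";
  });
}
```

An untouched template yields all six fields, while "Before evidence" is ignored because the template marks it optional.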

.github/workflows/auto-response.yml

Lines changed: 1 addition & 1 deletion
@@ -6,7 +6,7 @@ on:
   issue_comment:
     types: [created]
   pull_request_target: # zizmor: ignore[dangerous-triggers] maintainer-owned label automation; trusted base checkout only, no untrusted PR code execution
-    types: [opened, edited, synchronize, reopened, labeled]
+    types: [opened, edited, synchronize, reopened, labeled, unlabeled]
 
 env:
   FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
+name: Real behavior proof
+
+on:
+  pull_request_target: # zizmor: ignore[dangerous-triggers] trusted base checkout only; no untrusted PR code execution
+    types: [opened, edited, synchronize, reopened, ready_for_review, labeled, unlabeled]
+
+env:
+  FORCE_JAVASCRIPT_ACTIONS_TO_NODE24: "true"
+
+concurrency:
+  group: ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref || github.run_id }}
+  cancel-in-progress: true
+
+permissions: {}
+
+jobs:
+  real-behavior-proof:
+    name: Real behavior proof
+    permissions:
+      contents: read
+      pull-requests: read
+    runs-on: ubuntu-24.04
+    steps:
+      - uses: actions/checkout@v6
+        with:
+          ref: ${{ github.event.pull_request.base.sha }}
+          persist-credentials: false
+      - name: Check real behavior proof
+        run: node scripts/github/real-behavior-proof-check.mjs
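
In GitHub Actions expressions, `||` evaluates to the first truthy operand, so the `concurrency.group` above resolves to the workflow name plus the PR number for pull request events, and `cancel-in-progress: true` lets a newer run for the same PR (for example, after another body edit) cancel an older one. The sketch below mirrors that expression in JavaScript, whose `||` behaves the same way for these values; `concurrencyGroup` is a hypothetical helper, not part of the commit.

```javascript
// Hypothetical mirror of:
//   ${{ github.workflow }}-${{ github.event.pull_request.number || github.ref || github.run_id }}
function concurrencyGroup({ workflow, event = {}, ref, runId }) {
  const key = (event.pull_request && event.pull_request.number) || ref || runId;
  return `${workflow}-${key}`;
}

const base = { workflow: "Real behavior proof", ref: "refs/heads/main", runId: 1234 };

// Pull request events group by PR number, so runs for the same PR cancel each other.
console.log(concurrencyGroup({ ...base, event: { pull_request: { number: 77622 } } }));
// → Real behavior proof-77622

// Without a pull_request payload, the ref is used instead.
console.log(concurrencyGroup(base));
// → Real behavior proof-refs/heads/main
```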

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ Docs: https://docs.openclaw.ai
 ### Changes
 
 - Gateway/Windows: bind the default loopback gateway listener only to `127.0.0.1` on Windows so libuv's dual-stack `::1` behavior cannot wedge localhost HTTP requests. (#69701, fixes #69674) Thanks @SARAMALI15792.
+- Contributor PRs: require external pull requests to include after-fix real behavior proof from a real OpenClaw setup, with terminal screenshots, console output, redacted runtime logs, linked artifacts, and copied live output treated as valid evidence while unit tests, mocks, lint, typechecks, snapshots, and CI remain supplemental only.
 - Plugins/migration: emit catalog-backed install hints when `plugins.entries` or `plugins.allow` references an official external plugin that is not installed, so upgraded configs point operators to `openclaw plugins install <spec>` instead of telling them to remove valid plugin config. (#77483) Thanks @hclsys.
 - OpenAI/Codex media: advertise Codex audio transcription in runtime and manifest metadata and route active Codex chat models to the OpenAI transcription default instead of sending chat model ids to audio transcription. Thanks @vincentkoc.
 - Dependencies: refresh runtime and provider packages including Pi 0.73.0, ACPX adapters, OpenAI, Anthropic, Slack, and TypeScript native preview, while keeping the Bedrock runtime installer override pinned below the Windows ARM Node 24 npm resolver failure.

CONTRIBUTING.md

Lines changed: 2 additions & 1 deletion
@@ -100,6 +100,7 @@ For coordinated change sets that genuinely need more than 20 PRs, join the **#cl
 ## Before You PR
 
 - Test locally with your OpenClaw instance
+- External PRs must include a filled **Real behavior proof** section in the PR body. Show the real setup you tested, the exact command or steps you ran after the patch, after-fix evidence, the observed result, and anything you did not test. Screenshots, recordings, terminal screenshots, console output, copied live output, linked artifacts, and redacted runtime logs all count. Unit tests, mocks, snapshots, lint, typechecks, and CI are useful but do not satisfy this requirement by themselves. Maintainers may apply `proof: override` only when the proof gate should not apply.
 - Run tests: `pnpm build && pnpm check && pnpm test`
 - For iterative local commits, `scripts/committer --fast "message" <files...>` passes `FAST_COMMIT=1` through to the pre-commit hook so it skips the repo-wide `pnpm check`. Only use it when you've already run equivalent targeted validation for the touched surface.
 - For extension/plugin changes, run the fast local lane first:
@@ -160,7 +161,7 @@ Built with Codex, Claude, or other AI tools? **Awesome - just mark it!**
 Please include in your PR:
 
 - [ ] Mark as AI-assisted in the PR title or description
-- [ ] Note the degree of testing (untested / lightly tested / fully tested)
+- [ ] Include human-run real behavior proof from your own setup. AI-generated tests, mocks, lint, typechecks, and CI output are supplemental only; they do not prove the fix works for users.
 - [ ] Include prompts or session logs if possible (super helpful!)
 - [ ] Confirm you understand what the code does
 - [ ] If you have access to Codex, run `codex review --base origin/main` locally and address the findings before asking for review

scripts/github/barnacle-auto-response.mjs

Lines changed: 57 additions & 8 deletions
@@ -1,5 +1,13 @@
 // Barnacle owns deterministic GitHub triage and auto-response behavior.
 
+import {
+  MOCK_ONLY_PROOF_LABEL,
+  NEEDS_REAL_BEHAVIOR_PROOF_LABEL,
+  PROOF_OVERRIDE_LABEL,
+  evaluateRealBehaviorProof,
+  labelsForRealBehaviorProof,
+} from "./real-behavior-proof-policy.mjs";
+
 const activePrLimit = 20;
 
 const thirdPartyExtensionMessage =
@@ -134,6 +142,18 @@ export const managedLabelSpecs = {
     color: "C5DEF5",
     description: "Candidate: PR template appears mostly untouched.",
   },
+  [NEEDS_REAL_BEHAVIOR_PROOF_LABEL]: {
+    color: "C5DEF5",
+    description: "Candidate: external PR needs after-fix proof from a real setup.",
+  },
+  [MOCK_ONLY_PROOF_LABEL]: {
+    color: "C5DEF5",
+    description: "Candidate: PR proof only shows tests, mocks, snapshots, lint, typecheck, or CI.",
+  },
+  [PROOF_OVERRIDE_LABEL]: {
+    color: "C2E0C6",
+    description: "Maintainer override for the external PR real behavior proof gate.",
+  },
   "triage: dirty-candidate": {
     color: "C5DEF5",
     description: "Candidate: broad unrelated surfaces; may need splitting or cleanup.",
@@ -154,6 +174,8 @@ export const candidateLabels = {
   docsDiscoverability: "triage: docs-discoverability",
   testOnlyNoBug: "triage: test-only-no-bug",
   refactorOnly: "triage: refactor-only",
+  needsRealBehaviorProof: NEEDS_REAL_BEHAVIOR_PROOF_LABEL,
+  mockOnlyProof: MOCK_ONLY_PROOF_LABEL,
   dirtyCandidate: "triage: dirty-candidate",
   riskyInfra: "triage: risky-infra",
   externalPluginCandidate: "triage: external-plugin-candidate",
@@ -196,10 +218,23 @@ const maintainerAuthorLabel = "maintainer";
 const privilegedAuthorAssociations = new Set(["OWNER", "MEMBER", "COLLABORATOR"]);
 const privilegedRepositoryRoles = new Set(["admin", "maintain", "write"]);
 const candidateLabelValues = Object.values(candidateLabels);
+const proofCandidateLabelValues = [NEEDS_REAL_BEHAVIOR_PROOF_LABEL, MOCK_ONLY_PROOF_LABEL];
 const noisyPrMessage =
   "Closing this PR because it looks dirty (too many unrelated or unexpected changes). This usually happens when a branch picks up unrelated commits or a merge went sideways. Please recreate the PR from a clean branch.";
 
 const candidateActionRules = [
+  {
+    label: candidateLabels.needsRealBehaviorProof,
+    close: true,
+    message:
+      "Closing this PR because it does not include real behavior proof. Please reopen or resubmit with after-fix evidence from a real OpenClaw setup; terminal screenshots, console output, redacted logs, recordings, linked artifacts, and copied live output count. Unit tests, mocks, snapshots, lint, typechecks, and CI are supplemental only.",
+  },
+  {
+    label: candidateLabels.mockOnlyProof,
+    close: true,
+    message:
+      "Closing this PR because the proof only shows tests, mocks, snapshots, lint, typechecks, or CI. Please reopen or resubmit with after-fix evidence from a real OpenClaw setup; terminal screenshots, console output, redacted logs, recordings, linked artifacts, and copied live output count.",
+  },
   {
     label: candidateLabels.dirtyCandidate,
     close: true,
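
Each entry in `candidateActionRules` pairs a candidate label with a `close` flag and a closing message. The code that consumes this table is outside the hunks shown, so the following is only a hypothetical sketch, assuming that the first rule whose label is applied wins. The label strings and messages are placeholders (except `triage: dirty-candidate`), since the values behind `NEEDS_REAL_BEHAVIOR_PROOF_LABEL` and `MOCK_ONLY_PROOF_LABEL` are defined in the policy module and not shown here.

```javascript
// Placeholder labels and messages; the real proof label constants come from
// real-behavior-proof-policy.mjs and are not shown in this diff.
const exampleRules = [
  { label: "needs-real-behavior-proof", close: true, message: "Closing: no real behavior proof." },
  { label: "mock-only-proof", close: true, message: "Closing: proof only shows tests or CI." },
  { label: "triage: dirty-candidate", close: true, message: "Closing: dirty branch." },
];

// Hypothetical consumer: return the first rule whose label is applied, or null.
function firstMatchingRule(appliedLabels, rules) {
  const applied = new Set(appliedLabels);
  return rules.find((rule) => applied.has(rule.label)) ?? null;
}

const rule = firstMatchingRule(["bug", "mock-only-proof"], exampleRules);
if (rule && rule.close) {
  console.log(rule.message); // Closing: proof only shows tests or CI.
}
```

Under this assumption, the two new proof rules take precedence over `triage: dirty-candidate` because they are listed first.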
@@ -438,6 +473,14 @@ export function classifyPullRequestCandidateLabels(pullRequest, files) {
     labelsToAdd.push(candidateLabels.blankTemplate);
   }
 
+  labelsToAdd.push(
+    ...labelsForRealBehaviorProof(
+      evaluateRealBehaviorProof({
+        pullRequest,
+      }),
+    ),
+  );
+
   const docsOnly = filenames.every(isMarkdownOrDocsFile);
   const docsSignal =
     /\b(add|adds|update|updates|fix|fixes|improve|cleanup|clean up|typo|readme|docs?|documentation|translation|translate)\b/i.test(
@@ -718,14 +761,18 @@ async function addMissingLabels(github, context, core, issueNumber, labels, labe
 
 async function applyPullRequestCandidateLabels(github, context, core, pullRequest, labelSet) {
   const files = await listPullRequestFiles(github, context, pullRequest);
-  await addMissingLabels(
-    github,
-    context,
-    core,
-    pullRequest.number,
-    classifyPullRequestCandidateLabels(pullRequest, files),
-    labelSet,
+  const classifiedLabels = classifyPullRequestCandidateLabels(
+    {
+      ...pullRequest,
+      labels: [...labelSet].map((name) => ({ name })),
+    },
+    files,
   );
+  const staleProofLabels = proofCandidateLabelValues.filter(
+    (label) => labelSet.has(label) && !classifiedLabels.includes(label),
+  );
+  await removeLabels(github, context, pullRequest.number, staleProofLabels, labelSet);
+  await addMissingLabels(github, context, core, pullRequest.number, classifiedLabels, labelSet);
 }
 
 function isAutomationUser(user, fallbackLogin = "") {
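
`applyPullRequestCandidateLabels` now reclassifies the PR with its current labels attached (presumably so the policy can see a maintainer-applied `proof: override`) and removes proof labels that classification no longer produces, so a PR that gains proof does not stay flagged. The same bookkeeping can be restated as a pure function that is easy to test. `planProofLabelChanges` is a hypothetical name, the label strings are placeholders, and the sketch assumes `addMissingLabels` only adds labels that are not already applied.

```javascript
// Hypothetical pure restatement of the label bookkeeping in
// applyPullRequestCandidateLabels (placeholder label names).
const PROOF_LABELS = ["needs-real-behavior-proof", "mock-only-proof"];

function planProofLabelChanges(currentLabels, classifiedLabels, proofLabels = PROOF_LABELS) {
  const current = new Set(currentLabels);
  // Proof labels that are applied but no longer produced by classification are stale.
  const toRemove = proofLabels.filter(
    (label) => current.has(label) && !classifiedLabels.includes(label),
  );
  // Assumed behavior of addMissingLabels: add only labels not already applied.
  const toAdd = classifiedLabels.filter((label) => !current.has(label));
  return { toRemove, toAdd };
}

// Proof was added to the PR body, so classification no longer emits the proof label.
console.log(JSON.stringify(planProofLabelChanges(["bug", "needs-real-behavior-proof"], [])));
// → {"toRemove":["needs-real-behavior-proof"],"toAdd":[]}
```

Only the two proof labels are ever removed this way; other candidate labels are left alone, which matches the filter over `proofCandidateLabelValues` in the diff.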
@@ -931,7 +978,9 @@ export async function runBarnacleAutoResponse({ github, context, core = console
   const isLabelEvent = context.payload.action === "labeled";
   const isPrCandidateEvent =
     pullRequest &&
-    ["opened", "edited", "synchronize", "reopened", "labeled"].includes(context.payload.action);
+    ["opened", "edited", "synchronize", "reopened", "labeled", "unlabeled"].includes(
+      context.payload.action,
+    );
   if (!hasTriggerLabel && !isLabelEvent && !isPrCandidateEvent) {
     return;
   }

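Both the `auto-response.yml` trigger and the in-script `isPrCandidateEvent` filter gained `unlabeled`, presumably so that removing a label such as `proof: override` re-runs classification instead of leaving a stale result. A hypothetical standalone version of that filter, assuming the same payload shape (`action` and `pull_request` fields):

```javascript
// Hypothetical standalone version of the isPrCandidateEvent check.
const PR_CANDIDATE_ACTIONS = new Set([
  "opened",
  "edited",
  "synchronize",
  "reopened",
  "labeled",
  "unlabeled",
]);

function isPrCandidateEvent(payload) {
  return Boolean(payload.pull_request) && PR_CANDIDATE_ACTIONS.has(payload.action);
}

console.log(isPrCandidateEvent({ action: "unlabeled", pull_request: { number: 1 } })); // true
console.log(isPrCandidateEvent({ action: "closed", pull_request: { number: 1 } })); // false
```

The workflow trigger list and this action list must stay in sync: an action the workflow does not subscribe to never reaches the script.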
Lines changed: 34 additions & 0 deletions
@@ -0,0 +1,34 @@
+#!/usr/bin/env node
+import { readFileSync } from "node:fs";
+import { evaluateRealBehaviorProof } from "./real-behavior-proof-policy.mjs";
+
+function escapeCommandValue(value) {
+  return String(value)
+    .replace(/%/g, "%25")
+    .replace(/\r/g, "%0D")
+    .replace(/\n/g, "%0A")
+    .replace(/:/g, "%3A");
+}
+
+const eventPath = process.env.GITHUB_EVENT_PATH;
+if (!eventPath) {
+  console.error("::error title=Real behavior proof failed::GITHUB_EVENT_PATH is not set.");
+  process.exit(1);
+}
+
+const event = JSON.parse(readFileSync(eventPath, "utf8"));
+const pullRequest = event.pull_request;
+if (!pullRequest) {
+  console.log("No pull_request payload found; skipping real behavior proof gate.");
+  process.exit(0);
+}
+
+const evaluation = evaluateRealBehaviorProof({ pullRequest });
+if (evaluation.passed) {
+  console.log(evaluation.reason);
+  process.exit(0);
+}
+
+const message = `${evaluation.reason} Add after-fix evidence from a real OpenClaw setup in the PR body. Screenshots, recordings, terminal screenshots, console output, redacted runtime logs, linked artifacts, or copied live output count. Unit tests, mocks, snapshots, lint, typechecks, and CI are supplemental only. A maintainer can apply proof: override when appropriate.`;
+console.error(`::error title=Real behavior proof required::${escapeCommandValue(message)}`);
+process.exit(1);
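
The check script escapes `%`, CR, LF, and `:` in the annotation message. For comparison, the `@actions/core` toolkit internally uses two escaping routines: message data escapes only `%`, CR, and LF, while property values such as `title=` also escape `:` and `,`. If the runner decodes message data with the same rules, an escaped colon in the message body (as in `proof: override`) could show up as `%3A` in the annotation. The sketch below follows the toolkit convention; `escapeData`, `escapeProperty`, and `errorCommand` are local helpers written for this example, not imports from the toolkit.

```javascript
// Modelled on the escaping convention in @actions/core: message data escapes
// %, CR and LF; property values (e.g. title=...) additionally escape ':' and ','.
function escapeData(value) {
  return String(value)
    .replace(/%/g, "%25") // escape % first so later %XX sequences stay intact
    .replace(/\r/g, "%0D")
    .replace(/\n/g, "%0A");
}

function escapeProperty(value) {
  return escapeData(value).replace(/:/g, "%3A").replace(/,/g, "%2C");
}

// Hypothetical helper: build an ::error workflow command with an escaped title.
function errorCommand(title, message) {
  return `::error title=${escapeProperty(title)}::${escapeData(message)}`;
}

console.log(errorCommand("Real behavior proof required", "A maintainer can apply proof: override."));
// → ::error title=Real behavior proof required::A maintainer can apply proof: override.
```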
