Commit 7e9b163

Merge branch 'antalya-26.1' into frontport/antalya-26.1/rendezvous_hashing
2 parents 8b0a220 + 18ac1a4 commit 7e9b163

6 files changed (+328, -4 lines)

.cursor/rules/audit-review.mdc

Lines changed: 145 additions & 0 deletions
@@ -0,0 +1,145 @@
+---
+description: Standardize deep feature-audit output and defect reporting
+alwaysApply: true
+---
+
+# Feature Audit Reporting Standard
+
+Use this format when the user asks for a deep audit, fault injection, or review of any feature/change.
+
+## Required Output
+
+- Report **confirmed defects only** first.
+- Classify each finding as **High**, **Medium**, or **Low**.
+- For each finding include:
+  - short title,
+  - concrete impact,
+  - exact file/function reference,
+  - brief proof sketch tied to code path,
+  - at least one **code snippet** that demonstrates the defect condition.
+- Include an **Assumptions & Limits** section for static reasoning:
+  - what was not executed at runtime,
+  - what could not be proven without dynamic testing.
+- Include **audit confidence**:
+  - overall confidence (High/Medium/Low),
+  - what additional evidence would raise confidence.
+
+## Severity Rubric (Required)
+
+- Use consistent severity scoring with these dimensions:
+  - impact on correctness/security/availability,
+  - likelihood under realistic workloads,
+  - blast radius,
+  - exploitability (if security-relevant).
+- Default guidance:
+  - **High**: crash/UB/data corruption/auth bypass/deadlock with realistic trigger.
+  - **Medium**: incorrect behavior or reliability risk requiring narrower preconditions.
+  - **Low**: diagnostics/consistency/maintainability issues without direct correctness break.
+
+## Required Analysis Dimensions
+
+- For very large PRs, require **functional partitioning** before deep analysis:
+  - split review scope into functionality/workstream partitions,
+  - run the full audit loop per partition,
+  - produce per-partition findings and coverage,
+  - deduplicate cross-partition findings by root cause,
+  - end with cross-partition interaction risks and overall summary.
+- Compute and document a **call graph** for changed code before defect analysis:
+  - entrypoints,
+  - dispatch/validation chain,
+  - state/cache/storage interactions,
+  - integration boundaries (network/filesystem/external services),
+  - error/exception propagation paths.
+- Cover full transition flow when relevant:
+  - entrypoint -> processing -> state updates -> outputs/side effects.
+- Include **logical testing of all code paths**:
+  - enumerate reachable branches in changed logic,
+  - define expected outcome per branch (success, handled failure, fail-open/fail-closed, exception),
+  - include malformed input, timeout/integration failure, and concurrency/timing branches.
+- Include **fault category planning before injection**:
+  - define feature-specific logical fault categories from the reviewed code,
+  - list category scope and key transitions affected,
+  - then execute fault injection **category by category** and report findings per category.
+- Require a **full fault-category completion matrix** on every deep audit:
+  - include all generated categories,
+  - process categories one-by-one in explicit order,
+  - mark each category as Executed / Not Applicable / Deferred,
+  - record pass/fail outcome and defects found per category,
+  - provide justification for every Not Applicable or Deferred category.
+- Include **invariant-first analysis**:
+  - define key invariants that must always hold,
+  - map each critical transition to invariant preservation checks.
+- Include **interleaving analysis** for multithreaded paths:
+  - document at least several plausible thread interleavings for shared-state transitions.
+- Include a **transition mapping** from defect to state transition.
+- Include a **logical fault-injection mapping** that shows which injected condition triggers each defect.
+- Include integration impact checks:
+  - config/load-time behavior,
+  - protocol/API behavior,
+  - concurrency/timing behavior,
+  - observability/logging behavior.
+- Include **coverage accounting**:
+  - call-graph nodes covered vs not covered,
+  - transitions reviewed vs not reviewed,
+  - fault categories executed vs skipped (with reason).
+- Include an explicit **coverage stop condition**:
+  - coverage is complete only when each in-scope call-graph node/transition/category is either reviewed or marked skipped with justification.
+- Include **error-contract consistency checks**:
+  - equivalent faults should have equivalent outcomes where intended (reject/exception/error code).
+- Include **performance/resource failure checks**:
+  - high-cardinality input, memory pressure implications, retry storms, lock contention hotspots.
+- Include **rollback/partial-update checks**:
+  - for each mutation sequence, verify state remains consistent if exceptions/cancellation occur mid-path.
+- For C++ codebases, include **major C++ bug-type coverage**:
+  - memory lifetime/use-after-free/use-after-move,
+  - iterator/reference invalidation,
+  - data races and lock-order/deadlock risks,
+  - exception-safety/partial-update hazards,
+  - integer overflow/underflow and signedness errors,
+  - ownership/resource leaks (RAII violations),
+  - undefined behavior from invalid casts/aliasing/lifetime.
+
+## Guardrails
+
+- Do not mix confirmed defects with hypotheticals.
+- Mark uncertain items explicitly as “not confirmed”.
+- Use fail-open/fail-closed language for security-sensitive paths when applicable.
+- Keep summaries concise and actionable.
+- For each confirmed defect, include minimal evidence schema:
+  - trigger condition,
+  - affected transition,
+  - why this is a defect (not just a design preference).
+- For each confirmed defect, also include:
+  - smallest logical reproduction steps,
+  - likely fix direction (one line),
+  - regression test direction (one line),
+  - affected subsystem and blast radius,
+  - code evidence snippet(s) from the referenced file(s).
+- Deduplicate findings by root cause:
+  - one primary defect per root cause, with secondary manifestations listed under it.
+- If no defects are found, explicitly report residual risks and untested paths.
+
+## Canonical Report Order (Required)
+
+1. Scope and partitions (if large PR)
+2. Call graph
+3. Transition matrix
+4. Logical code-path testing summary
+5. Fault categories and category-by-category injection results
+6. Confirmed defects (High/Medium/Low)
+7. Coverage accounting + stop-condition status
+8. Assumptions & Limits
+9. Confidence rating and confidence-raising evidence
+10. Residual risks and untested paths
+
+## Multithreaded DB Priority
+
+For this repository, prioritize concurrency/locking defects early in review because they can cause correctness failures, hangs, and crashes under production load.
+
+- Always check for:
+  - unsynchronized shared-state access and data races,
+  - lock-order inversions and deadlock potential,
+  - iterator/reference invalidation across concurrent mutation,
+  - exception paths that leave shared state partially updated,
+  - shutdown/reload races and stale-pointer/lifetime hazards.
+- Escalate race/deadlock/crash findings with high severity by default unless strong evidence shows limited impact.
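
Review note: the fault-category completion matrix and the coverage stop condition in this rule file can be read as a small data model. The sketch below is one possible interpretation, not part of the commit. The names `CategoryStatus`, `CategoryRow`, `row_is_valid`, and `coverage_complete` are hypothetical, and treating an empty matrix as incomplete is an assumption the rule text does not state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class CategoryStatus(Enum):
    """The three statuses named in the rule file."""
    EXECUTED = "Executed"
    NOT_APPLICABLE = "Not Applicable"
    DEFERRED = "Deferred"


@dataclass
class CategoryRow:
    """One row of a hypothetical fault-category completion matrix."""
    name: str
    status: CategoryStatus
    passed: Optional[bool] = None  # pass/fail outcome, only meaningful when executed
    defects: int = 0               # defects found in this category
    justification: str = ""        # required for Not Applicable / Deferred


def row_is_valid(row: CategoryRow) -> bool:
    """An executed row needs an outcome; a skipped row needs a justification."""
    if row.status is CategoryStatus.EXECUTED:
        return row.passed is not None
    return bool(row.justification.strip())


def coverage_complete(rows: List[CategoryRow]) -> bool:
    """Stop condition: every category is executed or skipped with a reason.

    Assumption: an empty matrix counts as incomplete.
    """
    return bool(rows) and all(row_is_valid(r) for r in rows)
```

Under this reading, a `Deferred` row with no justification keeps the audit open, which matches the rule's requirement to justify every Not Applicable or Deferred category.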
Lines changed: 179 additions & 0 deletions
@@ -0,0 +1,179 @@
+---
+name: audit-review
+description: Perform deep feature audits with transition-matrix and logical fault-injection validation. Use when reviewing complex changes, regressions, state-machine behavior, config interactions, API/protocol flows, and concurrency-sensitive logic.
+---
+
+# Audit Review
+
+## Purpose
+
+Run a repeatable deep audit for any feature and report confirmed defects with severity.
+Default mode is static reasoning unless runtime execution is explicitly performed.
+
+## Workflow
+
+1. If PR scope is large, partition by functionality/workstream first:
+   - define partitions and boundaries,
+   - review each partition independently with the full workflow below,
+   - track per-partition findings and coverage,
+   - deduplicate cross-partition findings by root cause,
+   - finish with cross-partition interaction risks.
+2. Build call graph first:
+   - user/system entrypoints (API, RPC, CLI, worker, scheduler)
+   - dispatch and validation layers
+   - state/storage/cache interactions
+   - downstream integrations (network, filesystem, service calls)
+   - exception and error-propagation paths
+3. Build transition matrix:
+   - request/event entry -> processing stages -> state changes -> outputs/side effects
+   - define key invariants and annotate where each transition must preserve them
+4. Perform logical testing of all code paths:
+   - enumerate all reachable branches in changed logic,
+   - record expected branch outcomes (success, handled failure, fail-open/fail-closed, exception),
+   - include happy path, malformed input, integration timeout/failure, and concurrency/timing branches.
+5. Define logical fault categories from the code under review:
+   - derive categories from actual components, transitions, and dependencies in scope,
+   - document category boundary and affected states/transitions,
+   - prioritize categories by risk and blast radius.
+6. Run logical fault injection category-by-category:
+   - execute one category at a time,
+   - for each category cover success/failure/edge/concurrency paths as applicable,
+   - record pass/fail-open/fail-closed/exception behavior per injected fault.
+   - maintain a category completion matrix with status:
+     - Executed / Not Applicable / Deferred,
+     - outcome,
+     - defects found,
+     - justification for Not Applicable or Deferred.
+7. Confirm each finding with code-path evidence.
+8. Produce coverage accounting:
+   - reviewed vs unreviewed call-graph nodes,
+   - reviewed vs unreviewed transitions,
+   - executed vs skipped fault categories (with reasons).
+   - mark coverage complete only when every in-scope node/transition/category is reviewed or explicitly skipped with justification.
+9. For multithreaded/shared-state paths, perform interleaving analysis:
+   - write several plausible thread interleavings per critical transition,
+   - identify race/deadlock/lifetime hazards per interleaving.
+10. For mutation-heavy paths, perform rollback/partial-update analysis:
+   - reason about exception/cancellation at intermediate points,
+   - verify state invariants still hold.
+
+## C++ Bug-Type Coverage (Required for C++ audits)
+
+- memory lifetime defects (use-after-free/use-after-move/dangling refs)
+- iterator/reference invalidation
+- data races and lock-order/deadlock risks
+- exception-safety and partial-update rollback hazards
+- integer overflow/underflow and signedness conversion bugs
+- ownership/resource leaks (RAII violations)
+- undefined behavior from invalid casts/aliasing/lifetime misuse
+
+## Multithreaded Database Emphasis
+
+For ClickHouse-style multithreaded systems, prioritize these checks before lower-risk issues:
+
+1. Shared mutable state touched by multiple threads without clear synchronization.
+2. Lock hierarchy consistency and potential lock-order inversion/deadlock cycles.
+3. Cross-thread lifetime safety (dangling references/pointers after erase/reload/shutdown).
+4. Concurrent container mutation + iterator/reference use.
+5. Exception/cancellation paths that can leave locks/state inconsistent.
+
+## Output Contract
+
+- Start with confirmed defects only.
+- Group by severity: High, Medium, Low.
+- For each defect include:
+  - title,
+  - impact,
+  - file/function anchor,
+  - fault-injection trigger,
+  - transition mapping,
+  - why it is a defect (not a design preference),
+  - smallest logical repro steps,
+  - likely fix direction (short, concrete: 2-4 bullets or sentences),
+  - regression test direction (short, concrete: 2-4 bullets or sentences),
+  - affected subsystem and blast radius,
+  - at least one code snippet proving the defect.
+- Separate “not confirmed” or “needs runtime proof” from confirmed defects.
+- Include an **Assumptions & Limits** section for static reasoning.
+- Include an overall **confidence rating** and what additional evidence would raise confidence.
+- If no defects are found, include residual risks and untested paths.
+- For large PRs, include per-partition findings/coverage and final cross-partition risk summary.
+- Include a fault-category completion matrix for every deep audit.
+
+### Canonical report order
+
+1. Scope and partitions (if large PR)
+2. Call graph
+3. Transition matrix
+4. Logical code-path testing summary
+5. Fault categories and category-by-category injection results
+6. Confirmed defects (High/Medium/Low)
+7. Coverage accounting + stop-condition status
+8. Assumptions & Limits
+9. Confidence rating and confidence-raising evidence
+10. Residual risks and untested paths
+
+## Standard Audit Report Template (Default: Pointed PR Style)
+
+Default report style should match concise PR review comments:
+- fail-first and action-oriented,
+- only confirmed defects (no pass-by-pass narrative),
+- one short summary line when there are no confirmed defects.
+
+Use the compact template below by default. Use the full 10-section canonical format only when explicitly requested.
+
+```markdown
+Audit update for PR #<id> (<short title/scope>):
+
+Confirmed defects:
+
+- **<Severity>: <short defect title>**
+  - Impact: <concrete user/system impact>
+  - Anchor: `<file>` / `<function or code path>`
+  - Trigger: <smallest condition that triggers defect>
+  - Why defect: <1-2 lines, behavior not preference>
+  - Fix direction (short): <2-4 bullets or sentences>
+  - Regression test direction (short): <2-4 bullets or sentences including positive and edge/failure cases>
+  - Evidence:
+    ```start:end:path
+    // minimal proving snippet
+    ```
+
+<repeat per defect, sorted High -> Medium -> Low>
+
+Coverage summary:
+- Scope reviewed: <partitions or key areas, one line>
+- Categories failed: <count/list>
+- Categories passed: <count only>
+- Assumptions/limits: <one line>
+```
+
+## Severity Rubric
+
+- High: realistic trigger can cause crash/UB/data corruption/auth bypass/deadlock.
+- Medium: correctness/reliability issue with narrower trigger conditions.
+- Low: diagnostics/consistency issues without direct correctness break.
+
+## Checklist
+
+- Verify call graph is explicitly documented before defect analysis.
+- Verify invariants are explicitly listed and checked against transitions.
+- Verify fail-open vs fail-closed behavior where security-sensitive.
+- Verify logical branch coverage for all changed code paths.
+- Verify fault categories are explicitly defined from the reviewed code before injection starts.
+- Verify category-by-category execution and reporting completeness.
+- Verify full fault-category completion matrix is present and complete.
+- Verify concurrency and cache/state transition paths.
+- Verify multithreaded interleavings are explicitly analyzed for critical shared-state paths.
+- Verify rollback/partial-update safety under exception/cancellation points.
+- Verify major C++ bug classes are explicitly covered (or marked not applicable).
+- Verify race/deadlock/crash class defects are prioritized and explicitly reported.
+- Verify error-contract consistency across equivalent fault paths.
+- Verify performance/resource failure classes were considered.
+- Verify findings are deduplicated by root cause.
+- Verify coverage accounting is present (covered vs skipped with reason).
+- Verify stop-condition criteria for coverage completion are explicitly satisfied.
+- Verify every confirmed defect includes code evidence snippets.
+- Verify parser/config/runtime consistency.
+- Verify protocol/API parity across entrypoints.
+- Verify no sensitive-data leakage in logs/errors.
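
Review note: the output contract asks for findings deduplicated by root cause and sorted High -> Medium -> Low. A minimal Python sketch of that post-processing step follows. The `Finding` type, the `root_cause` key, and `dedupe_and_sort` are hypothetical names, and promoting the most severe finding to primary is an assumption, since the skill file only says there should be one primary defect per root cause.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Report order used by the template: High first, Low last.
SEVERITY_ORDER = {"High": 0, "Medium": 1, "Low": 2}


@dataclass
class Finding:
    title: str
    severity: str                 # "High", "Medium", or "Low"
    root_cause: str               # hypothetical key identifying the shared cause
    secondary: List[str] = field(default_factory=list)


def dedupe_and_sort(findings: List[Finding]) -> List[Finding]:
    """Keep one primary finding per root cause, then sort by severity."""
    primary: Dict[str, Finding] = {}
    for f in findings:
        kept = primary.get(f.root_cause)
        if kept is None:
            primary[f.root_cause] = Finding(
                f.title, f.severity, f.root_cause, list(f.secondary)
            )
        elif SEVERITY_ORDER[f.severity] < SEVERITY_ORDER[kept.severity]:
            # Assumption: the more severe finding becomes the primary one,
            # and the previous primary is listed as a secondary manifestation.
            primary[f.root_cause] = Finding(
                f.title,
                f.severity,
                f.root_cause,
                kept.secondary + [kept.title] + list(f.secondary),
            )
        else:
            kept.secondary.append(f.title)
            kept.secondary.extend(f.secondary)
    # sorted() is stable, so findings of equal severity keep discovery order.
    return sorted(primary.values(), key=lambda f: SEVERITY_ORDER[f.severity])
```

Each returned finding can then be rendered into one entry of the compact template, with `secondary` listed under the primary defect.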

.github/workflows/merge_queue.yml

Lines changed: 1 addition & 1 deletion
@@ -182,7 +182,7 @@ jobs:
           fi
 
   fast_test:
-    runs-on: [self-hosted, altinity-on-demand, altinity-func-tester]
+    runs-on: [self-hosted, altinity-on-demand, altinity-builder]
     needs: [config_workflow, dockers_build_amd, dockers_build_arm]
     if: ${{ !cancelled() && !contains(needs.*.outputs.pipeline_status, 'failure') && !contains(needs.*.outputs.pipeline_status, 'undefined') && !contains(fromJson(needs.config_workflow.outputs.data).workflow_config.cache_success_base64, 'RmFzdCB0ZXN0') }}
     name: "Fast test"
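
Review note: the literal `'RmFzdCB0ZXN0'` in the `if:` expression is the base64 encoding of the job name `Fast test`. The Python sketch below reproduces that token and mimics the `contains(...)` check. It assumes `cache_success_base64` is a list of base64-encoded job names whose cached result already succeeded, which this diff does not confirm. The helper names `job_name_token` and `is_cached_success` are hypothetical.

```python
import base64
from typing import List


def job_name_token(job_name: str) -> str:
    """Base64-encode a job name the way the workflow literal appears to be built."""
    return base64.b64encode(job_name.encode("utf-8")).decode("ascii")


def is_cached_success(job_name: str, cache_success_base64: List[str]) -> bool:
    """Mimic contains(cache_success_base64, token) from the workflow expression.

    Assumption: the list holds tokens for jobs that can be skipped because a
    successful cached result exists.
    """
    return job_name_token(job_name) in cache_success_base64


# The token used in the workflow condition for the "Fast test" job.
token = job_name_token("Fast test")
```

Under that assumption, the job runs only when its token is absent from the list and no upstream job reported `failure` or `undefined`. The runner label change in this hunk does not alter the condition.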

.github/workflows/pull_request.yml

Lines changed: 1 addition & 1 deletion
@@ -241,7 +241,7 @@ jobs:
           fi
 
   fast_test:
-    runs-on: [self-hosted, altinity-on-demand, altinity-func-tester]
+    runs-on: [self-hosted, altinity-on-demand, altinity-builder]
     needs: [config_workflow, dockers_build_amd, dockers_build_arm, dockers_build_multiplatform_manifest]
     if: ${{ !cancelled() && !contains(needs.*.outputs.pipeline_status, 'failure') && !contains(needs.*.outputs.pipeline_status, 'undefined') && !contains(fromJson(needs.config_workflow.outputs.data).workflow_config.cache_success_base64, 'RmFzdCB0ZXN0') }}
     name: "Fast test"

.github/workflows/pull_request_community.yml

Lines changed: 1 addition & 1 deletion
@@ -87,7 +87,7 @@ jobs:
           fi
 
   fast_test:
-    runs-on: [self-hosted, altinity-on-demand, altinity-func-tester]
+    runs-on: [self-hosted, altinity-on-demand, altinity-builder]
     needs: [config_workflow]
     if: ${{ !cancelled() && !contains(needs.*.outputs.pipeline_status, 'failure') && !contains(needs.*.outputs.pipeline_status, 'undefined') && !contains(fromJson(needs.config_workflow.outputs.data).workflow_config.cache_success_base64, 'RmFzdCB0ZXN0') }}
     name: "Fast test"

ci/defs/job_configs.py

Lines changed: 1 addition & 1 deletion
@@ -137,7 +137,7 @@ class JobConfigs:
     )
     fast_test = Job.Config(
         name=JobNames.FAST_TEST,
-        runs_on=RunnerLabels.AMD_LARGE,
+        runs_on=RunnerLabels.BUILDER_AMD,
         command="python3 ./ci/jobs/fast_test.py",
         # --network=host required for ec2 metadata http endpoint to work
         # --root/--privileged/--cgroupns=host is required for clickhouse-test --memory-limit
