synthetic: enforce single shared root cause for connection leak scenario (009) #942
Greptile Summary
This PR tightens scenario 009's validation by replacing coarse
Confidence Score: 4/5
Safe to merge after resolving the substring-matching contradiction between the forbidden keyword 'independent faults' and its appearance in the golden model response's causal chain.
One P1 finding: the golden model_response causal chain contains 'independent faults' verbatim, which is also in forbidden_keywords. Because the scorer uses plain substring matching, an agent that correctly says 'these are NOT independent faults' would be incorrectly penalized. All remaining observations are P2 or lower, so the score is 4 rather than 5.
tests/synthetic/rds_postgres/009-dual-fault-connection-cpu/answer.yml: the forbidden_keywords list and model_response causal chain are logically inconsistent.
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[Agent Final State\nroot_cause / validated_claims\nnon_validated_claims / causal_chain] --> B[score_result]
B --> C{required_keywords\nall present?}
C -- No --> F1[FAIL: missing keywords]
C -- Yes --> D{forbidden_categories\nhit?}
D -- Yes --> F2[FAIL: forbidden category]
D -- No --> E{forbidden_keywords\nsubstring match?}
E -- Yes --> F3[FAIL: forbidden keyword\ne.g. 'independent faults'\neven in negation context]
E -- No --> G{required_evidence\npresent?}
G -- No --> F4[FAIL: missing evidence]
G -- Yes --> PASS[PASS]
style F3 fill:#f66,color:#fff
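The gate order in the flowchart above can be sketched in Python as follows. This is a minimal illustration, not the project's `score_result`: the `Rubric` class, the `score` function, its parameters, and the `normalize` helper (assumed here to lowercase and collapse whitespace) are all hypothetical names introduced for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Rubric:
    # Hypothetical container mirroring the four checks in the flowchart.
    required_keywords: list = field(default_factory=list)
    forbidden_categories: list = field(default_factory=list)
    forbidden_keywords: list = field(default_factory=list)
    required_evidence: list = field(default_factory=list)


def normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def score(output: str, category: str, evidence: set, rubric: Rubric) -> str:
    """Apply the checks in flowchart order and return the first failure."""
    text = normalize(output)
    if not all(normalize(k) in text for k in rubric.required_keywords):
        return "FAIL: missing keywords"
    if category in rubric.forbidden_categories:
        return "FAIL: forbidden category"
    # Plain substring match: this also fires when the phrase is negated.
    if any(normalize(k) in text for k in rubric.forbidden_keywords):
        return "FAIL: forbidden keyword"
    if not set(rubric.required_evidence) <= evidence:
        return "FAIL: missing evidence"
    return "PASS"
```

Because the checks short-circuit, an output that hits a forbidden keyword never reaches the evidence check, which is why the F3 branch matters even for otherwise correct answers.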
Reviews (1): Last reviewed commit: "synthetic: fix shared root cause reasoni..."
- CPU
- root
forbidden_keywords:
- independent faults
Forbidden keyword contradicts golden model response
The term independent faults appears verbatim in this scenario's own model_response causal chain: "The dual signal … are not two independent faults — they are the same leaked-connection fault expressing itself through two metrics." Because score_result uses plain substring matching (_normalize_text(kw) in normalized_output), an agent that produces this ideal phrasing will still trigger the forbidden keyword check and fail the scenario, even though it is reasoning correctly. The same risk applies to contributing cause — a response saying "storage is not a contributing cause" would also be penalized.
The model_response should be updated to avoid these exact phrases, or the forbidden keywords should be reworded to phrases that won't appear in correct negations (e.g., "two separate root causes" or "storage contributes to" rather than bare substrings like "independent faults" and "contributing cause").
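The false positive described above is easy to reproduce. The sketch below assumes that normalization only lowercases and collapses whitespace; the helper names `normalize` and `forbidden_hits` are hypothetical stand-ins, not the repository's `_normalize_text` or `score_result`.

```python
def normalize(text: str) -> str:
    # Assumed normalization: lowercase and collapse runs of whitespace.
    return " ".join(text.lower().split())


def forbidden_hits(output: str, forbidden_keywords: list) -> list:
    """Return every forbidden keyword found as a plain substring."""
    text = normalize(output)
    return [kw for kw in forbidden_keywords if normalize(kw) in text]


# A correct answer that negates both phrases.
correct = (
    "These are not two independent faults; "
    "storage is not a contributing cause."
)

bare = ["independent faults", "contributing cause"]
reworded = ["two separate root causes", "storage contributes to"]

print(forbidden_hits(correct, bare))      # ['independent faults', 'contributing cause']
print(forbidden_hits(correct, reworded))  # []
```

The reworded phrases are the ones suggested in the comment: they describe the wrong conclusion in affirmative form, so a correct negation is unlikely to contain them verbatim.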
@cerencamkiran looks good. Could you fix the review comments?

Verified locally: the Greptile P1 is genuinely fixed. I grepped the final model_response for every new forbidden substring (two independent problems, separate root causes, two root causes) and found zero matches. The earlier independent faults keyword had 1 match in the gold via "not two independent faults"; the new phrasing avoids that. LGTM.

Thanks for validating this @hamzzaaamalik. I'll follow up on the matcher and make it negation-aware as a next step.
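One possible shape for a negation-aware matcher is sketched below. It is only an illustration of the idea, not the planned implementation: the function name `affirmed_hits`, the cue list, and the four-token look-back window are all assumptions. It matches on whole tokens rather than raw substrings and ignores a hit when a negation cue appears shortly before it in the same sentence.

```python
import re

# Assumed, deliberately small cue list; a real matcher would need more care.
NEGATION_CUES = {"not", "no", "never", "neither", "nor", "without", "isn't", "aren't"}


def affirmed_hits(output: str, forbidden_keywords: list, window: int = 4) -> list:
    """Return forbidden keywords that occur without a nearby preceding negation."""
    # Split into rough sentences so a cue in one sentence cannot excuse another.
    sentences = [
        re.findall(r"[a-z0-9']+", s)
        for s in re.split(r"[.!?;]+", output.lower())
    ]
    hits = []
    for kw in forbidden_keywords:
        kw_tokens = kw.lower().split()
        n = len(kw_tokens)
        affirmed = False
        for tokens in sentences:
            for i in range(len(tokens) - n + 1):
                if tokens[i:i + n] != kw_tokens:
                    continue
                preceding = tokens[max(0, i - window):i]
                if not NEGATION_CUES.intersection(preceding):
                    affirmed = True
        if affirmed:
            hits.append(kw)
    return hits


print(affirmed_hits("These are not two independent faults.", ["independent faults"]))  # []
print(affirmed_hits("The signals are independent faults.", ["independent faults"]))    # ['independent faults']
```

A fixed look-back window is a heuristic: it can miss distant negations and can wrongly excuse constructions such as "not only ... but", so rewording the forbidden phrases remains the safer first fix.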

btw @muddlebee, I already fixed them.

⚡ LGTM → Merged. @cerencamkiran, your work is in. Every commit counts; thank you for this one.

@cerencamkiran thank you so much. Good to see so many valuable contributions coming from you.

Thank you kindly @muddlebee, really appreciate it! I've really been enjoying the project and learning a lot while contributing. :)




Fixes #605
Summary
Improves reasoning quality and validation for scenario 009 (connection exhaustion + CPU saturation).
This scenario requires identifying a single shared root cause (connection pool leak) rather than incorrectly splitting symptoms into independent issues.
Motivation
Previously, the agent could:
This PR ensures:
Changes
1. Validation refinement
required_keywords to capture core reasoning signals (connection leak, idle sessions, causal linkage)
forbidden_keywords to prevent:
2. QA documentation
QA_VALIDATION.md
Result
PASS 009-dual-fault-connection-cpu
The agent now:
Repro