synthetic: enforce single shared root cause for connection leak scenario (009) by cerencamkiran · Pull Request #942 · Tracer-Cloud/opensre

cerencamkiran · 2026-04-25T11:16:56Z

Fixes #605

Summary

Improves reasoning quality and validation for scenario 009 (connection exhaustion + CPU saturation).

This scenario requires identifying a single shared root cause (connection pool leak) rather than incorrectly splitting symptoms into independent issues.

Motivation

Previously, the agent could:

- treat connection exhaustion and CPU saturation as separate problems
- fail to link CPU load to leaked connections

This PR ensures:

- correct causal reasoning (single root cause)
- stricter validation aligned with the scenario’s intent

Changes

1. Validation refinement

- adjusted required_keywords to capture core reasoning signals (connection leak, idle sessions, causal linkage)
- removed overly strict phrases that caused false negatives
- added forbidden_keywords to prevent:
  - splitting into multiple root causes
  - treating CPU as an independent issue

2. QA documentation

- added QA_VALIDATION.md
- explicitly defines:
  - expected reasoning
  - causal chain
  - failure modes

Result

PASS 009-dual-fault-connection-cpu

The agent now:

- identifies the connection pool leak as the single root cause
- correctly explains CPU saturation as a downstream effect
- links both symptoms through a clear causal chain

Repro

python -m tests.synthetic.rds_postgres.run_suite --scenario 009-dual-fault-connection-cpu --mock-grafana


cerencamkiran · 2026-04-25T11:20:01Z

BEFORE

greptile-apps · 2026-04-25T11:20:11Z

Greptile Summary

This PR tightens scenario 009's validation by replacing coarse required_keywords with more precise signals (connection pool leak, Client:ClientRead) and introducing forbidden_keywords to prevent the agent from splitting correlated symptoms into independent root causes. A new QA_VALIDATION.md file documents the expected causal chain and passing criteria.

P1 (forbidden keyword self-contradiction): "independent faults" is listed in forbidden_keywords, but the scenario's own model_response causal chain reads "are not two independent faults". Because score_result uses plain substring matching (_normalize_text(kw) in normalized_output), an agent that produces this ideal phrasing will trigger a false failure. The same risk exists for "contributing cause": a correct response saying "storage is not a contributing cause" would also be penalized. Either the model response or the forbidden phrase wording should be adjusted so that correct negations are not inadvertently caught.
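The false failure can be reproduced with a minimal sketch. It assumes that `_normalize_text` lowercases and collapses whitespace (the real helper in the suite may do more), and `hits_forbidden` is a hypothetical name used only for illustration. The sample sentence is an illustrative stand-in, not the actual gold text.

```python
import re


def _normalize_text(text: str) -> str:
    """Assumed behavior: lowercase and collapse whitespace runs."""
    return re.sub(r"\s+", " ", text.lower()).strip()


def hits_forbidden(output: str, forbidden_keywords: list[str]) -> list[str]:
    """Return every forbidden keyword found as a plain substring of the output."""
    normalized_output = _normalize_text(output)
    return [kw for kw in forbidden_keywords if _normalize_text(kw) in normalized_output]


# Illustrative response that reasons correctly but negates a forbidden phrase.
response = "CPU saturation and connection exhaustion are not two independent faults."
hits = hits_forbidden(response, ["independent faults", "contributing cause"])
print(hits)  # ['independent faults']
```

The check has no notion of context, so the negated phrase counts as a hit even though the reasoning is the one the scenario wants.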

Confidence Score: 4/5

Safe to merge after resolving the substring-matching contradiction between the forbidden keyword 'independent faults' and its appearance in the golden model response's causal chain.

One P1 finding: the golden model_response causal chain contains 'independent faults' verbatim, which is also in forbidden_keywords. Because the scorer uses plain substring matching, an agent that correctly says 'these are NOT independent faults' would be incorrectly penalized. All remaining observations are P2 or lower, so the score is 4 rather than 5.

tests/synthetic/rds_postgres/009-dual-fault-connection-cpu/answer.yml — the forbidden_keywords list and model_response causal chain are logically inconsistent.

Important Files Changed

Filename	Overview
tests/synthetic/rds_postgres/009-dual-fault-connection-cpu/answer.yml	Adds forbidden_keywords and refines required_keywords for scenario 009; P1 issue: the golden model_response causal chain contains "independent faults" — a substring that appears in forbidden_keywords — so a correct agent response matching the ideal output would fail the forbidden keyword check due to plain substring matching without negation awareness.
tests/synthetic/rds_postgres/009-dual-fault-connection-cpu/QA_VALIDATION.md	New documentation file; clearly defines expected reasoning, causal chain, and failure modes for scenario 009; no code issues.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Agent Final State\nroot_cause / validated_claims\nnon_validated_claims / causal_chain] --> B[score_result]
    B --> C{required_keywords\nall present?}
    C -- No --> F1[FAIL: missing keywords]
    C -- Yes --> D{forbidden_categories\nhit?}
    D -- Yes --> F2[FAIL: forbidden category]
    D -- No --> E{forbidden_keywords\nsubstring match?}
    E -- Yes --> F3[FAIL: forbidden keyword\ne.g. 'independent faults'\neven in negation context]
    E -- No --> G{required_evidence\npresent?}
    G -- No --> F4[FAIL: missing evidence]
    G -- Yes --> PASS[PASS]
    style F3 fill:#f66,color:#fff
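The check order in the flowchart can be sketched in Python as follows. This is an assumption-laden illustration, not the suite's actual `score_result`: the `Rules` dataclass, the `score` signature, the way category and evidence are passed in, and the category name in the usage lines are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Rules:
    """Hypothetical container for the four rule lists named in the flowchart."""
    required_keywords: list[str] = field(default_factory=list)
    forbidden_categories: list[str] = field(default_factory=list)
    forbidden_keywords: list[str] = field(default_factory=list)
    required_evidence: list[str] = field(default_factory=list)


def score(output: str, category: str, evidence: set[str], rules: Rules) -> str:
    """Apply the checks in flowchart order and return the first failure, else PASS."""
    text = output.lower()
    if any(kw.lower() not in text for kw in rules.required_keywords):
        return "FAIL: missing keywords"
    if category in rules.forbidden_categories:
        return "FAIL: forbidden category"
    # Plain substring match: a negated phrase still counts as a hit.
    if any(kw.lower() in text for kw in rules.forbidden_keywords):
        return "FAIL: forbidden keyword"
    if not set(rules.required_evidence) <= evidence:
        return "FAIL: missing evidence"
    return "PASS"


rules = Rules(
    required_keywords=["connection pool leak"],
    forbidden_keywords=["independent faults"],
)
negated = "A connection pool leak explains both signals; these are not independent faults."
plain = "A connection pool leak explains both signals."
print(score(negated, "resource_exhaustion", set(), rules))  # FAIL: forbidden keyword
print(score(plain, "resource_exhaustion", set(), rules))    # PASS
```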

Reviews (1): Last reviewed commit: "synthetic: fix shared root cause reasoni..."

greptile-apps · 2026-04-25T11:20:15Z

+  - CPU
+  - root
+forbidden_keywords:
+  - independent faults


Forbidden keyword contradicts golden model response

The term independent faults appears verbatim in this scenario's own model_response causal chain: "The dual signal … are not two independent faults — they are the same leaked-connection fault expressing itself through two metrics." Because score_result uses plain substring matching (_normalize_text(kw) in normalized_output), an agent that produces this ideal phrasing will still trigger the forbidden keyword check and fail the scenario, even though it is reasoning correctly. The same risk applies to contributing cause — a response saying "storage is not a contributing cause" would also be penalized.

The model_response should be updated to avoid these exact phrases, or the forbidden keywords should be reworded to phrases that won't appear in correct negations (e.g., "two separate root causes" or "storage contributes to" rather than bare substrings like "independent faults" and "contributing cause").
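Whether a given set of forbidden phrases collides with the gold response can be checked mechanically. The sketch below uses a plain dict as a stand-in for the parsed answer.yml; the keys `model_response` and `forbidden_keywords` come from this review, while the flat string layout, the sample gold sentence, and the helper name `forbidden_hits_in_gold` are assumptions.

```python
def _squash(text: str) -> str:
    """Lowercase and collapse whitespace (assumed normalization)."""
    return " ".join(text.lower().split())


def forbidden_hits_in_gold(answer: dict) -> list[str]:
    """List forbidden keywords that occur as substrings of the gold model_response."""
    gold = _squash(str(answer.get("model_response", "")))
    return [kw for kw in answer.get("forbidden_keywords", []) if _squash(kw) in gold]


# Illustrative gold sentence, not the file's actual text.
gold_text = "These are not two independent faults; storage is not a contributing cause."

before = {
    "model_response": gold_text,
    "forbidden_keywords": ["independent faults", "contributing cause"],
}
after = {
    "model_response": gold_text,
    "forbidden_keywords": ["two separate root causes", "storage contributes to"],
}

print(forbidden_hits_in_gold(before))  # ['independent faults', 'contributing cause']
print(forbidden_hits_in_gold(after))   # []
```

Running a check like this against each scenario's answer file would flag a self-contradicting rule set before it reaches review.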

cerencamkiran · 2026-04-25T11:32:08Z

AFTER

muddlebee · 2026-04-27T15:34:48Z

@cerencamkiran looks good. could you fix the review comments?

hamzzaaamalik · 2026-04-27T15:35:19Z

Verified locally:

- Greptile P1 is genuinely fixed. I grepped the final model_response for every new forbidden substring (two independent problems, separate root causes, two root causes): zero matches. The earlier "independent faults" had 1 match in the gold via "not two independent faults"; the new phrasing avoids that.
- All 6 required_keywords appear in the gold model_response.
- QA_VALIDATION.md reads cleanly and matches the YAML rules.
- No production code touched.
- Nit (not blocking): the matcher in run_suite.py:275 is pure substring matching, so "these are not two independent problems" would still false-fail. Worth a follow-up to make it negation-aware, but the new phrases are unusual enough to be a reasonable pragmatic fix.
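One possible shape for the negation-aware follow-up is sketched below. It is a heuristic under stated assumptions, not the matcher in run_suite.py: the cue list, the four-word lookback window, and the names `is_negated` and `forbidden_hits` are all hypothetical.

```python
import re

# Assumed cue list; a real matcher would likely need a broader one.
NEGATION_CUES = {"not", "no", "never", "neither", "nor", "without"}


def is_negated(text: str, start: int, window: int = 4) -> bool:
    """True if a negation cue occurs within `window` words before `start`, in the same sentence."""
    sentence_start = max(text.rfind(ch, 0, start) for ch in ".!?;") + 1
    preceding = re.findall(r"[a-z']+", text[sentence_start:start])
    return any(w in NEGATION_CUES or w.endswith("n't") for w in preceding[-window:])


def forbidden_hits(output: str, forbidden_keywords: list[str]) -> list[str]:
    """Return forbidden keywords that occur at least once without a nearby negation."""
    text = " ".join(output.lower().split())
    hits = []
    for kw in forbidden_keywords:
        needle = " ".join(kw.lower().split())
        if not needle:
            continue  # skip empty entries rather than matching everywhere
        for match in re.finditer(re.escape(needle), text):
            if not is_negated(text, match.start()):
                hits.append(kw)
                break
    return hits


keywords = ["two independent problems"]
print(forbidden_hits("These are not two independent problems.", keywords))  # []
print(forbidden_hits("There are two independent problems here.", keywords))
# ['two independent problems']
```

A window-based heuristic like this trades precision for simplicity: it will miss negations placed far from the phrase or written with typographic apostrophes, which is the kind of gap the follow-up would have to weigh.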

LGTM.

cerencamkiran · 2026-04-27T15:45:48Z

Thanks for validating this @hamzzaaamalik. I’ll follow up on the matcher and make it negation-aware as a next step.

cerencamkiran · 2026-04-28T16:31:49Z

btw @muddlebee, i already fixed them

github-actions · 2026-04-28T17:00:09Z

⚡ LGTM → Merged. @cerencamkiran, your work is in. Every commit counts — thank you for this one.


muddlebee · 2026-04-28T17:01:06Z

@cerencamkiran thank you so much. Good to see so many valuable contributions coming from you.

cerencamkiran · 2026-04-28T17:03:51Z

Thank you kindly @muddlebee, really appreciate it! I’ve really been enjoying the project and learning a lot while contributing. :)

synthetic: fix shared root cause reasoning and validation for scenario 009

7a0fa88

greptile-apps Bot reviewed Apr 25, 2026


fix: avoid negation-sensitive forbidden keyword matches

fe4bb18

muddlebee assigned hamzzaaamalik and unassigned hamzzaaamalik Apr 27, 2026

muddlebee merged commit 5db5049 into Tracer-Cloud:main Apr 28, 2026
7 checks passed

muddlebee merged 2 commits into Tracer-Cloud:main from cerencamkiran:fix-009-shared-root-cause
