synthetic: enforce single shared root cause for connection leak scenario (009) #942

Merged
muddlebee merged 2 commits into Tracer-Cloud:main from cerencamkiran:fix-009-shared-root-cause on Apr 28, 2026

@cerencamkiran
Contributor

Fixes #605

Summary

Improves reasoning quality and validation for scenario 009 (connection exhaustion + CPU saturation).

This scenario requires identifying a single shared root cause (connection pool leak) rather than incorrectly splitting symptoms into independent issues.

Motivation

Previously, the agent could:

  • treat connection exhaustion and CPU saturation as separate problems
  • or fail to link CPU load to leaked connections

This PR ensures:

  • correct causal reasoning (single root cause)
  • stricter validation aligned with the scenario’s intent

Changes

1. Validation refinement

  • adjusted required_keywords to capture core reasoning signals (connection leak, idle sessions, causal linkage)
  • removed overly strict phrases that caused false negatives
  • added forbidden_keywords to prevent:
    • splitting into multiple root causes
    • treating CPU as an independent issue

2. QA documentation

  • added QA_VALIDATION.md
  • explicitly defines:
    • expected reasoning
    • causal chain
    • failure modes

Result

PASS 009-dual-fault-connection-cpu
The agent now:

  • identifies the connection pool leak as the single root cause
  • correctly explains CPU saturation as a downstream effect
  • links both symptoms through a clear causal chain

Repro

python -m tests.synthetic.rds_postgres.run_suite --scenario 009-dual-fault-connection-cpu --mock-grafana

@cerencamkiran
Contributor Author

BEFORE
Screenshot 2026-04-25 133453
Screenshot 2026-04-25 133508

@greptile-apps
Contributor

greptile-apps Bot commented Apr 25, 2026

Greptile Summary

This PR tightens scenario 009's validation by replacing coarse required_keywords with more precise signals (connection pool leak, Client:ClientRead) and introducing forbidden_keywords to prevent the agent from splitting correlated symptoms into independent root causes. A new QA_VALIDATION.md file documents the expected causal chain and passing criteria.

  • P1 (forbidden keyword self-contradiction): independent faults is listed in forbidden_keywords, but the scenario's own model_response causal chain reads "are not two independent faults". Because score_result uses plain substring matching (_normalize_text(kw) in normalized_output), an agent that produces this ideal phrasing will trigger a false failure. The same risk exists for contributing cause: a correct response saying "storage is not a contributing cause" would also be penalized. Either the model response or the forbidden phrase wording should be adjusted so that correct negations are not caught.
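The false failure is easy to reproduce in isolation. The following is a minimal sketch, assuming `_normalize_text` only lowercases and collapses whitespace (the real helper in the suite may do more); `hits_forbidden` is a hypothetical name used purely for illustration, and the gold sentence is abbreviated.

```python
import re


def _normalize_text(text: str) -> str:
    # Assumed behaviour: lowercase and collapse whitespace runs.
    return re.sub(r"\s+", " ", text.lower()).strip()


def hits_forbidden(output: str, forbidden_keywords: list[str]) -> list[str]:
    # Plain substring matching, with no awareness of negation.
    normalized_output = _normalize_text(output)
    return [kw for kw in forbidden_keywords if _normalize_text(kw) in normalized_output]


# Abbreviated form of the gold causal-chain sentence quoted in the review.
gold = "These are not two independent faults; they are the same leaked-connection fault."
hits = hits_forbidden(gold, ["independent faults", "contributing cause"])
print(hits)  # ['independent faults']
```

The gold sentence denies that the faults are independent, yet the substring check still reports a hit, which is exactly the contradiction described above.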

Confidence Score: 4/5

Safe to merge after resolving the substring-matching contradiction between the forbidden keyword 'independent faults' and its appearance in the golden model response's causal chain.

One P1 finding: the golden model_response causal chain contains 'independent faults' verbatim, which is also in forbidden_keywords. Because the scorer uses plain substring matching, an agent that correctly says 'these are NOT independent faults' would be incorrectly penalized. All remaining observations are P2 or lower, so the score is 4 rather than 5.

tests/synthetic/rds_postgres/009-dual-fault-connection-cpu/answer.yml — the forbidden_keywords list and model_response causal chain are logically inconsistent.

Important Files Changed

Filename | Overview
tests/synthetic/rds_postgres/009-dual-fault-connection-cpu/answer.yml | Adds forbidden_keywords and refines required_keywords for scenario 009. P1 issue: the golden model_response causal chain contains "independent faults", a substring that also appears in forbidden_keywords, so a correct agent response matching the ideal output would fail the forbidden keyword check because plain substring matching has no negation awareness.
tests/synthetic/rds_postgres/009-dual-fault-connection-cpu/QA_VALIDATION.md | New documentation file; clearly defines expected reasoning, causal chain, and failure modes for scenario 009; no code issues.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Agent Final State\nroot_cause / validated_claims\nnon_validated_claims / causal_chain] --> B[score_result]
    B --> C{required_keywords\nall present?}
    C -- No --> F1[FAIL: missing keywords]
    C -- Yes --> D{forbidden_categories\nhit?}
    D -- Yes --> F2[FAIL: forbidden category]
    D -- No --> E{forbidden_keywords\nsubstring match?}
    E -- Yes --> F3[FAIL: forbidden keyword\ne.g. 'independent faults'\neven in negation context]
    E -- No --> G{required_evidence\npresent?}
    G -- No --> F4[FAIL: missing evidence]
    G -- Yes --> PASS[PASS]
    style F3 fill:#f66,color:#fff
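
The check order in the flowchart can be sketched as a chain of early returns. This is an illustrative sketch only, not the suite's `score_result`: the `Rules` dataclass, the `score` signature, and the `"resource_leak"` category label are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Rules:
    # Field names follow the flowchart; the real answer.yml schema may differ.
    required_keywords: list[str] = field(default_factory=list)
    forbidden_categories: list[str] = field(default_factory=list)
    forbidden_keywords: list[str] = field(default_factory=list)
    required_evidence: list[str] = field(default_factory=list)


def _norm(text: str) -> str:
    # Lowercase and collapse whitespace so matching ignores formatting.
    return " ".join(text.lower().split())


def score(output: str, category: str, evidence: set[str], rules: Rules) -> str:
    text = _norm(output)
    # 1. Every required keyword must be present.
    missing = [kw for kw in rules.required_keywords if _norm(kw) not in text]
    if missing:
        return f"FAIL: missing keywords {missing}"
    # 2. The reported category must not be forbidden.
    if category in rules.forbidden_categories:
        return f"FAIL: forbidden category {category!r}"
    # 3. Forbidden keywords are matched as plain substrings (negation included).
    hits = [kw for kw in rules.forbidden_keywords if _norm(kw) in text]
    if hits:
        return f"FAIL: forbidden keyword {hits}"
    # 4. Every required evidence source must have been consulted.
    absent = [ev for ev in rules.required_evidence if ev not in evidence]
    if absent:
        return f"FAIL: missing evidence {absent}"
    return "PASS"


rules = Rules(
    required_keywords=["connection pool leak", "Client:ClientRead"],
    forbidden_keywords=["independent faults"],
)
answer = "A connection pool leak leaves sessions idle in Client:ClientRead."
print(score(answer, "resource_leak", set(), rules))  # PASS
```

Appending "These are not two independent faults." to `answer` flips the result to a forbidden-keyword failure at step 3, which is the red node in the flowchart.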

Reviews (1): Last reviewed commit: "synthetic: fix shared root cause reasoni..."

- CPU
- root
forbidden_keywords:
- independent faults

P1 Forbidden keyword contradicts golden model response

The term independent faults appears verbatim in this scenario's own model_response causal chain: "The dual signal … are not two independent faults — they are the same leaked-connection fault expressing itself through two metrics." Because score_result uses plain substring matching (_normalize_text(kw) in normalized_output), an agent that produces this ideal phrasing will still trigger the forbidden keyword check and fail the scenario, even though it is reasoning correctly. The same risk applies to contributing cause — a response saying "storage is not a contributing cause" would also be penalized.

The model_response should be updated to avoid these exact phrases, or the forbidden keywords should be reworded to phrases that won't appear in correct negations (e.g., "two separate root causes" or "storage contributes to" rather than bare substrings like "independent faults" and "contributing cause").
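
Whichever side is reworded, the gold response and the keyword lists can be checked against each other before merging. A minimal sketch, assuming the same lowercase-and-collapse normalization as the scorer; `check_gold` is a hypothetical helper, and the inputs below are an excerpt of the quoted causal chain rather than the contents of answer.yml.

```python
def _norm(text: str) -> str:
    # Lowercase and collapse whitespace, mirroring the assumed scorer behaviour.
    return " ".join(text.lower().split())


def check_gold(model_response, required_keywords, forbidden_keywords):
    """Return (missing_required, forbidden_hits) for a gold response."""
    text = _norm(model_response)
    missing_required = [kw for kw in required_keywords if _norm(kw) not in text]
    forbidden_hits = [kw for kw in forbidden_keywords if _norm(kw) in text]
    return missing_required, forbidden_hits


# Excerpt of the gold causal chain quoted in the review, not the full model_response.
gold_excerpt = (
    "are not two independent faults; they are the same leaked-connection "
    "fault expressing itself through two metrics"
)

# The bare phrase collides with the gold text; the reworded phrase does not.
print(check_gold(gold_excerpt, [], ["independent faults"]))        # ([], ['independent faults'])
print(check_gold(gold_excerpt, [], ["two separate root causes"]))  # ([], [])
```

A scenario is self-consistent when both returned lists are empty for its own gold response.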

@cerencamkiran
Contributor Author

AFTER
Screenshot 2026-04-25 142821
Screenshot 2026-04-25 142850

@muddlebee
Collaborator

@cerencamkiran looks good. could you fix the review comments?

@hamzzaaamalik
Collaborator

hamzzaaamalik commented Apr 27, 2026

Verified locally:

Greptile P1 is genuinely fixed. I grepped the final model_response for every new forbidden substring (two independent problems, separate root causes, two root causes): zero matches. Earlier, independent faults had 1 match in the gold response via "not two independent faults"; the new phrasing avoids that.
All 6 required_keywords appear in the gold model_response.
QA_VALIDATION.md reads cleanly and matches the YAML rules.
No production code touched.
Nit (not blocking): the matcher in run_suite.py:275 is pure substring matching, so "these are not two independent problems" would still false-fail. Worth a follow-up to make it negation-aware, but the new phrases are unusual enough to be a reasonable pragmatic fix.

LGTM.

@cerencamkiran
Contributor Author

cerencamkiran commented Apr 27, 2026

Thanks for validating this @hamzzaaamalik. I’ll follow up on the matcher and make it negation-aware as a next step.
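
One possible shape for that follow-up is a heuristic that ignores a forbidden phrase when a negation cue appears shortly before it in the same sentence. This is a rough sketch under stated assumptions, not the planned implementation: `asserted_hits`, the cue list, and the four-word window are all hypothetical choices.

```python
import re

# Hypothetical cue list and look-back window; both would need tuning.
NEGATION_CUES = {"not", "no", "never", "neither", "nor", "without",
                 "isn't", "aren't", "wasn't", "weren't"}
WINDOW = 4  # number of words before the match to scan for a cue


def _normalize(text: str) -> str:
    return " ".join(text.lower().split())


def asserted_hits(output: str, forbidden_keywords: list[str], window: int = WINDOW) -> list[str]:
    """Return forbidden keywords that occur at least once without a nearby negation."""
    text = _normalize(output)
    hits = []
    for kw in forbidden_keywords:
        needle = _normalize(kw)
        for m in re.finditer(re.escape(needle), text):
            # Only look back within the current sentence.
            sentence_start = max(text.rfind(p, 0, m.start()) for p in ".!?;") + 1
            preceding = re.findall(r"[a-z']+", text[sentence_start:m.start()])[-window:]
            if not NEGATION_CUES.intersection(preceding):
                hits.append(kw)
                break
    return hits


print(asserted_hits("These are not two independent problems.", ["two independent problems"]))  # []
print(asserted_hits("There are two independent problems.", ["two independent problems"]))      # ['two independent problems']
```

A window-based heuristic like this can still be fooled by long-range or double negation, so it reduces false failures rather than eliminating them.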

@cerencamkiran
Contributor Author

btw @muddlebee, i already fixed them

@muddlebee muddlebee merged commit 5db5049 into Tracer-Cloud:main Apr 28, 2026
7 checks passed
@github-actions
Contributor

LGTM → Merged. @cerencamkiran, your work is in. Every commit counts — thank you for this one.


👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

@muddlebee
Collaborator

@cerencamkiran thank you so much. Good to see so many valuable contributions coming from you.

@cerencamkiran
Contributor Author

Thank you kindly @muddlebee, really appreciate it! I’ve really been enjoying the project and learning a lot while contributing. :)

Development

Successfully merging this pull request may close these issues.

[synthetic-qa] 009-dual-fault-connection-cpu: Validate agent finds single shared root cause

3 participants