fix(guardrails): redact union of overlapping spans, never leak sensitive bookends by mayankbharati-ops · Pull Request #780 · Tracer-Cloud/opensre

mayankbharati-ops · 2026-04-23T05:15:59Z

Summary

GuardrailEngine.apply walked sorted redact matches right-to-left with a single seen_end cursor and skipped any match whose end exceeded that cursor. When a wider match overlapped — or fully contained — an already-redacted narrower match, the wider match was silently dropped and its prefix / suffix survived in the output.

Concretely, with two rules (wide: super_secret_token_value, inner: secret_token):

	Output
Before (`main`)	`\"data super_[REDACTED:short]_value end\"`
After (this PR)	`\"data [REDACTED:long] end\"`

The super_ and _value fragments leaked into the output. PR #520's earlier fix handled only the same-start subcase; matches with different start offsets that fully contained a narrower match still slipped through.

Affects the shipped starter rules. With _STARTER_CONFIG in app/guardrails/cli.py, text like config: api_key=AKIAIOSFODNN7EXAMPLE triggers both aws_access_key ([16:36]) and generic_api_token ([8:36]). Pre-fix the prefix api_key= leaked; after, the whole span is redacted. Same for aws_secret_access_key=AKIA...xxxxx....

Fix

Replace the single-cursor walk with the canonical interval-merge algorithm:

Sort redact matches by (start ASC, -end) so same-start ties put the widest match first.
Sweep left-to-right, merging any match whose start < merged[-1].end. Representative rule for the merged span is the contributing match with the largest original width.
Apply replacements right-to-left so string indices stay valid.

Adjacent matches (A.end == B.start) intentionally do not merge. Disjoint matches produce independent redactions. Audit logging is unchanged — match-level entries are still emitted per ScanMatch even when output redactions merge.

Algorithm matches the standard merge-intervals pattern (see Merge Intervals - LeetCode 56).

Deep verification

Algorithm correctness — adversarial audit (10/11 PASS, 1 was a test bug)

Constructed every nasty edge case I could think of:

Case	Result
Empty input	PASS
Single match at start / end / spanning entire text	PASS
Same-span same-rule duplicate via pattern + keyword	PASS
Nested 3 levels deep	PASS
Staircase overlaps (each next overlaps only the prior)	PASS
197 partially-overlapping rules on 1000-char text	PASS, 8.6 ms
1000 disjoint same-keyword matches	PASS, 1.1 ms
Adjacent matches (A.end == B.start) stay separate	PASS
Disjoint with 10000-char gap	PASS
Match at exact text boundary (start=0 or end=len)	PASS
Empty `replacement` falls back to `[REDACTED:name]`	PASS (was test bug, not engine bug)

Security audit — bypass attempts (11/11 SAFE)

Scenario	Result
Custom replacement containing another rule's keyword	SAFE — replacements not re-scanned (documented invariant)
Block + redact overlap	SAFE — block correctly raises `GuardrailBlockedError`
Audit-only rule overlapping a redact rule	SAFE — redact still applied
Disabled rule does not bypass active rule	SAFE
Replacement string mentions another rule's secret	SAFE
10-deep nested matches sharing the same secret	SAFE — outermost merges all
Empty rule list	SAFE (passthrough; no protection promised)
Brackets `[X]` in replacement don't confuse downstream	SAFE

Determinism

Same-input → same-output across repeated runs ✓
get_guardrail_engine() singleton is stable ✓
Tied-width same-span overlap uses first-defined rule name — identical to main's behavior, no regression

E2E validation — full synthetic suites, real Anthropic LLM

HEALTHY_SHORT_CIRCUIT=true, mocked backends, real claude-sonnet-4-6 reasoning + claude-haiku-4-5 toolcall, on this branch.

Suite	Result	Notes
EKS (14 scenarios)	11 PASS, 3 FAIL	All 3 failures `state: alerting` LLM categorization drift — pre-existing on `main`, identical to PR #777's verification
RDS (15 scenarios)	11 PASS, 4 FAIL	All 4 failures `state: alerting`, same pre-existing drift

Confirmed every failure is in state: alerting (programmatic check on each alert.json). The synthetic fixtures don't contain overlapping-rule secrets, so the failures are in territory this PR does not touch.

Unit + integration coverage — 59 tests, 91 in `tests/test_guardrails/` total

Engine-level (46 tests in `test_engine.py`)

36 pre-existing
10 new regression tests: contained-span union (the core regression), longest-rule-name on contained, partial overlap, disjoint, adjacent, three-way transitive overlap, pattern+keyword mixed, two _STARTER_CONFIG-grounded production-rule tests, audit-under-merge invariant.

Integration-level (13 tests in `test_llm_integration.py`)

8 pre-existing
5 new tests — one per call site of engine.apply(), asserting downstream payload contains no secret fragment:
- AnthropicLLMClient.invoke → captured at Anthropic.messages.create
- AnthropicLLMClient.invoke → captured system kwarg
- OpenAILLMClient.invoke → captured at chat.completions.create
- _apply_guardrails_to_messages (LangGraph chat node) — also asserts source untouched (defensive copy)
- End-to-end regression guard with shipped-config patterns

BedrockLLMClient.invoke shares its redaction code path with AnthropicLLMClient.invoke and is therefore covered by the same logic.

Verification — final state

CI (Python 3.13, post-fix: emit healthy-short-circuit claims from investigation keys only #777-merge main): test (ubuntu-latest) PASS, typecheck PASS, quality PASS, CodeQL PASS, Analyze (python) PASS. (Re-running on rebased branch.)
Local pytest tests/ (Python 3.14, no coverage): 3124 pass, 1 skipped, 1 pre-existing failure in tests/nodes/plan_actions/ (unrelated — _InputStub missing model_copy, present on pristine main)
pytest tests/test_guardrails/: 91 pass, 0 fail
ruff check app/ tests/: clean
ruff format --check on touched files: clean
mypy app/guardrails/: clean on touched module

Scope

File	Change
`app/guardrails/engine.py`	~40 line replacement of redaction loop
`tests/test_guardrails/test_engine.py`	10 new tests
`tests/test_guardrails/test_llm_integration.py`	5 new tests

No public API changes. No state-shape changes. No behavior change for non-overlapping matches, audit-only rules, or blocked rules. Directly extends #520, whose same-start case is preserved verbatim.

Rebased onto current main (post-#777). No file overlap with #777.

Security note

The bug class is sensitive-data disclosure. Anywhere guardrail rules filter outputs shown to users, logged, or exported, two overlapping rules covering the same secret could each partially redact and leave fragments in the wild. The shipped _STARTER_CONFIG demonstrates this concretely — api_key=AKIA... and aws_secret_access_key=... both produced partial redactions on main. This PR closes the gap.

greptile-apps · 2026-04-23T05:19:53Z

Greptile Summary

This PR fixes a span-leak bug in GuardrailEngine._redact where overlapping redaction rules (e.g. a wide keyword fully containing a narrower keyword) would silently drop the wider rule's match, leaving its prefix and suffix unredacted in the output. The fix replaces the old single-cursor right-to-left walk with a proper interval merge: sort → left-to-right sweep to merge overlapping spans (tracking the largest individual match width to select the representative rule name) → right-to-left application of replacements to preserve string indices.

Key changes:

apply delegates to a new _redact(text, matches) method, improving testability and separation of concerns.
The sort key changes from reverse=True on (start, end) to (start ASC, -end) — preserving the "widest-match-first at same start" behavior from PR fix: overlapping keyword redaction and hardcoded investigation loop limit #520 while enabling a correct forward sweep.
The seen_end cursor and skip logic are replaced with an O(n) interval merge that handles all three overlap classes: containment, partial overlap, and the transitive chain A∩B, B∩C, A⊥C.
7 new regression tests cover all relevant sub-cases; the 36 existing guardrail tests are unaffected.

Confidence Score: 5/5

Safe to merge — the interval merge algorithm is correct, well-documented, and backed by 7 targeted regression tests covering all overlap sub-cases, with zero regressions in the existing suite.

The core fix (left-to-right interval merge replacing the buggy seen_end cursor) is algorithmically correct: overlapping spans are collapsed, the widest individual match wins the representative rule name, adjacent spans remain separate, and right-to-left application preserves string indices. All edge cases (containment, partial overlap, transitive chain, disjoint, adjacent, regex+keyword mix) have dedicated tests. Code follows project style conventions. The only finding is a non-blocking P2 style suggestion to use a NamedTuple instead of a raw 4-tuple for the merged interval list.

No files require special attention.

Important Files Changed

Filename	Overview
app/guardrails/engine.py	Extracts `_redact` into a dedicated method that performs correct interval merging; the old single-cursor right-to-left walk is fully replaced. Algorithm is sound and well-documented.
tests/test_guardrails/test_engine.py	Adds 7 targeted regression tests covering contained spans, partial overlap, disjoint, adjacent, three-way chain, and pattern-vs-keyword cases. All new tests are well-structured and use the existing `_rule` helper correctly.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["apply(text)"] --> B["scan(text) → ScanResult"]
    B --> C{any matches?}
    C -- No --> D[return text unchanged]
    C -- Yes --> E[audit each match]
    E --> F{blocked?}
    F -- Yes --> G[raise GuardrailBlockedError]
    F -- No --> H["_redact(text, matches)"]
    H --> I["Filter: action == REDACT\nSort by (start ASC, -end)"]
    I --> J["Left-to-right sweep\nmerge overlapping intervals\ntrack widest individual match\nfor rule-name selection"]
    J --> K["Right-to-left application\nof replacements over\nmerged intervals"]
    K --> L[return redacted text]
    style G fill:#f66,color:#fff
    style L fill:#6c6,color:#fff

_{Reviews (1): Last reviewed commit: "fix(guardrails): redact union of overlap..." | Re-trigger Greptile}

greptile-apps · 2026-04-23T05:19:57Z

+        merged: list[tuple[int, int, str, int]] = []
        for match in redact_matches:
-            if match.end > seen_end:
-                continue
-            replacement = self._get_replacement(match.rule_name)
-            redacted = redacted[: match.start] + replacement + redacted[match.end :]
-            seen_end = match.start
+            width = match.end - match.start
+            if merged and match.start < merged[-1][1]:
+                prev_start, prev_end, prev_rule, prev_width = merged[-1]
+                new_end = max(prev_end, match.end)
+                if width > prev_width:
+                    merged[-1] = (prev_start, new_end, match.rule_name, width)
+                else:
+                    merged[-1] = (prev_start, new_end, prev_rule, prev_width)
+            else:
+                merged.append((match.start, match.end, match.rule_name, width))


Consider a NamedTuple for the merged interval for readability

The inner loop reads and writes the merged list by positional tuple index (merged[-1][1], merged[-1][3], etc.). The positions are correct, but the tuple layout (start, end, rule_name, width) is entirely implicit — a reader must mentally map each index to its meaning.

A lightweight NamedTuple would make accesses self-documenting with zero runtime overhead:

from typing import NamedTuple class _MergedInterval(NamedTuple): start: int end: int rule_name: str width: int

Then the sweep body becomes:

merged: list[_MergedInterval] = [] for match in redact_matches: width = match.end - match.start if merged and match.start < merged[-1].end: prev = merged[-1] new_end = max(prev.end, match.end) if width > prev.width: merged[-1] = _MergedInterval(prev.start, new_end, match.rule_name, width) else: merged[-1] = _MergedInterval(prev.start, new_end, prev.rule_name, prev.width) else: merged.append(_MergedInterval(match.start, match.end, match.rule_name, width))

This is a non-blocking suggestion; the current code is functionally correct.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

mayankbharati-ops · 2026-04-25T07:22:43Z

Heads up: the test (ubuntu-latest) CI failure on this PR is a pre-existing bug on main, not introduced here.

tests/nodes/plan_actions/test_node_plan_actions.py::test_node_plan_actions_emits_retrieval_controls raises AttributeError: '_InputStub' object has no attribute 'model_copy' because PR #807 added a model_copy / model_dump round-trip in app/nodes/plan_actions/node.py:55 but did not update the test stub. The same failure reproduces on a clean checkout of main with no other changes.

I opened #929 with the minimal stub repair (14 lines in the test file, no production code touched). Once that lands, I'll rebase this PR — the rest of CI on this branch is green (typecheck, quality, CodeQL, Analyze (python)).

…ive bookends ``GuardrailEngine.apply`` walked the sorted redact matches right-to-left with a single ``seen_end`` cursor and skipped any match whose end exceeded the cursor. When a wider match overlapped or contained an already-redacted narrower match, the wider match was silently dropped — leaving its prefix and suffix in the output. Concretely, with rules matching ``super_secret_token_value`` (wide) and ``secret_token`` (contained): input: "data super_secret_token_value end" before: "data super_[REDACTED:short]_value end" ← super_ and _value leak after: "data [REDACTED:long] end" The prior fix in Tracer-Cloud#520 only handled the same-start subcase by changing the secondary sort key; matches that started at different offsets but fully contained a narrower match still slipped through. Replace the single-cursor walk with a proper merge: 1. Sort redact matches by ``(start ASC, -end)`` so ties at the same offset keep the longest-keyword-wins behavior Tracer-Cloud#520 restored. 2. Sweep left-to-right, merging any match whose ``start`` falls before the current interval's ``end``. The representative rule for the merged interval is whichever contributing match had the largest original width, so the output reflects the most-specific rule that matched anywhere in the merged span. 3. Apply replacements right-to-left over the merged intervals so string indices stay valid as each redaction resizes the output. Adjacent matches (``A.end == B.start``) are intentionally not merged — they don't overlap. Disjoint matches produce independent redactions with intervening text preserved verbatim. Tests added in ``tests/test_guardrails/test_engine.py``: - contained-span union redaction (the regression from the bug report) - longest-rule-name semantics on a contained span - partial (non-containing) overlap produces one union redaction - disjoint matches stay separate, middle text preserved - adjacent matches stay separate (touching at one offset, not overlapping) - three-way transitive overlap chain collapses to one span - regex pattern + keyword on the same region: wider-wins applies across match kinds Existing coverage preserved: - ``test_overlapping_keyword_redaction_prefers_longest`` — same-start case - ``test_overlapping_keyword_same_rule_redaction`` — same-rule overlap - ``test_multiple_rules_on_same_span`` — audit + redact on same span Verification: - ``pytest``: 2774 pass, 1 skipped, 0 failures (was 2767 on main; +7 new) - ``ruff check`` / ``ruff format --check``: clean - ``mypy app/guardrails/``: clean on touched module - Full ``tests/test_guardrails/test_engine.py`` suite: 43 pass, 0 fail

Extend the overlap regression tests with two cases grounded in the exact patterns shipped in _STARTER_CONFIG (app/guardrails/cli.py), and with an audit-behavior assertion that demonstrates the match-level audit record remains complete even when the output-level redactions are merged: - test_real_world_api_key_and_aws_access_key_overlap: config: api_key=AKIAIOSFODNN7EXAMPLE aws_access_key pattern matches the [16:36] span; generic_api_token pattern matches the [8:36] span (contains aws_access_key). Pre-fix output leaked "api_key=". Merged output: "config: [REDACTED:generic_api_token]". - test_real_world_aws_secret_key_contains_aws_access_key: export aws_secret_access_key=AKIA<16>abcdefghijklmnopqrst aws_access_key matches [29:49]; aws_secret_key matches [7:69] and contains it. Pre-fix output leaked the label and the 40-char tail. Merged output: "export [REDACTED:aws_secret_key]". - test_audit_logger_records_every_match_even_when_merged: Confirms the AuditLogger still emits one entry per ScanMatch even when multiple matches merge into a single output redaction — a reviewer tracing audit -> output can still account for every rule that fired. Uses a real AuditLogger with a tmp_path JSONL. Verification: - pytest tests/test_guardrails/test_engine.py: 46 pass (was 43; +3) - pytest tests/: 2777 pass, 1 skipped, 0 failures - ruff check + format: clean

Closes the gap between unit-level ``GuardrailEngine.apply`` tests and the four call sites that invoke it in the pipeline: - ``app/services/llm_client.py::LLMClient.invoke`` (Anthropic messages + system) - ``app/services/llm_client.py::BedrockLLMClient.invoke`` (same shape as Anthropic) - ``app/services/llm_client.py::OpenAILLMClient.invoke`` (OpenAI chat.completions) - ``app/nodes/chat.py::_apply_guardrails_to_messages`` (LangGraph chat node) Five new parametrized integration tests load a ruleset modeled on the shipped ``_STARTER_CONFIG`` — ``aws_access_key`` is a strict substring of ``generic_api_token`` when the token value is itself an AWS key — and assert that the downstream payload (captured by a stub that replaces the real API client) contains no fragment of either the label prefix or the key value: - ``test_anthropic_client_sends_merged_redaction``: ``LLMClient.invoke("Debug dump: api_key=AKIA... from config.yml")`` is captured at ``Anthropic.messages.create`` and asserted to carry a single ``[REDACTED:generic_api_token]`` span with no ``api_key=`` or ``AKIA`` fragment. - ``test_anthropic_system_prompt_also_redacted``: Same shape, but asserts the ``system`` kwarg (distinct code path from per-message ``content``) gets the merged treatment. - ``test_openai_client_sends_merged_redaction``: Same shape for ``OpenAILLMClient.invoke`` -> ``chat.completions.create``. - ``test_chat_node_emits_merged_redaction``: ``_apply_guardrails_to_messages`` on a LangGraph-style message list, asserts original is untouched (defensive copy) and the returned message has merged redaction. - ``test_contained_real_secret_fully_redacted_in_pipeline``: End-to-end regression guard. Asserts none of ``api_key=``, ``AKIA``, ``IOSFODNN``, ``7EXAMPLE`` appear anywhere in the downstream payload. Pre-fix main leaks ``api_key=`` and the surrounding characters. Verification: - ``pytest tests/``: 2782 pass (was 2777; +5 integration) - ``pytest tests/test_guardrails/``: 59 pass (46 engine + 13 integration) - ``ruff check`` / ``ruff format --check``: clean - ``mypy app/guardrails/``: clean on touched module

Address Greptile P2 readability suggestion on PR Tracer-Cloud#780. The interval-merge loop in ``GuardrailEngine._redact`` previously read and wrote the merged list as raw 4-tuples accessed by positional index (``merged[-1][1]``, ``merged[-1][3]``, etc.), which obscured what each slot meant. Replace the raw tuple with a typed ``_MergedSpan`` NamedTuple carrying named fields (``start``, ``end``, ``rule_name``, ``width``) so the loop reads naturally: merged[-1].end # was merged[-1][1] prev.width # was prev_width _MergedSpan(start, ...) # was (start, ..., width) No behavior change — same algorithm, same outputs, same 91 tests pass. Verification: - ``pytest tests/test_guardrails/``: 91 pass, 0 fail (unchanged) - ``ruff check`` / ``ruff format --check``: clean - ``mypy app/guardrails/``: clean on touched module

yashksaini-coder

This is a well-engineered security fix. Here's what I verified:

Algorithm correctness — the interval merge is textbook correct:

Sort by (start ASC, -end) puts widest match first at same start offset, preserving the behavior from #520 while enabling a correct forward sweep
Merge condition match.start < merged[-1].end correctly catches containment and partial overlap; adjacent spans (A.end == B.start) intentionally stay separate ✅
max(prev.end, match.end) handles partial overlap correctly; width comparison picks the widest individual match's rule name for the representative label ✅
Right-to-left application over non-overlapping merged intervals keeps string indices valid throughout ✅
Empty match list → empty merged → text returned unchanged ✅

I traced through all adversarial shapes (3-way transitive, containment, same-span duplicate, widest-first) — all produce correct output.

Tests — 91/91 pass. The 10 new engine regression tests cover every overlap subclass that the old seen_end cursor got wrong. The 5 LLM integration tests asserting no secret fragment reaches an API call are the right layer to test — good defensive coverage.

Security impact — this closes a real data-disclosure gap. With the shipped _STARTER_CONFIG, api_key=AKIA... and aws_secret_access_key=... both triggered two rules simultaneously and the prefix/suffix bookends leaked on main. Fixed.

CI — all checks green: test, typecheck, quality, CodeQL, Analyze (python). The pre-existing _InputStub failure the author called out was fixed by #929 which is already on main. The branch is clean and mergeable.

No issues. Excellent work.

muddlebee · 2026-04-28T05:00:55Z

@mayankbharati-ops still trying to understand the context for this change? could you help

mayankbharati-ops · 2026-04-28T06:12:55Z

@muddlebee — happy to walk through it. The TL;DR is a sensitive-data leak in GuardrailEngine's redaction pass when two rules overlap on the same span — including with the rules we ship in _STARTER_CONFIG. I just re-ran a live repro on main to confirm everything below.

1. The bug, in one screenshot of output

I ran the exact same scenarios through GuardrailEngine.apply on main (current HEAD) and on this branch:

	input	`main` (current shipping behavior)	this PR
Synthetic	`data super_secret_token_value end`	`'data super_[REDACTED:short]_value end'` ❌	`'data [REDACTED:long] end'` ✅
Shipping `_STARTER_CONFIG`	`config: api_key=AKIAIOSFODNN7EXAMPLE rest`	`'config: api_key=[REDACTED:aws_access_key] rest'` ❌	`'config: [REDACTED:generic_api_token] rest'` ✅
3-way transitive	`abcdefghijklmno tail`	`'[REDACTED:a]h[REDACTED:c] tail'` ❌	`'[REDACTED:a] tail'` ✅
Disjoint (regression check)	`foo middle bar`	`'[REDACTED:a] middle [REDACTED:b]'` ✅	`'[REDACTED:a] middle [REDACTED:b]'` ✅
Adjacent (regression check)	`foobar tail`	`'[REDACTED:a][REDACTED:b] tail'` ✅	`'[REDACTED:a][REDACTED:b] tail'` ✅

Row 2 is the worst one: with app/guardrails/cli.py:_STARTER_CONFIG (the rules shipped to every user via opensre guardrails init), the literal substring api_key= survives in any text we redact and forward to an LLM — because aws_access_key (matches AKIA… only, span [16:36]) and generic_api_token (matches api_key=AKIA…, span [8:36]) both fire, and main's loop drops the wider one. Same shape with aws_secret_access_key=AKIA…xxxxx….

2. Why `main` does this — the algorithm

GuardrailEngine.apply had this redact loop on main (engine.py:127-138):

redact_matches = sorted(matches_with_REDACT_action, key=(start, end), reverse=True)
seen_end = len(text)
for match in redact_matches:
    if match.end > seen_end:        # ← drops the wider match
        continue
    redacted = redacted[:match.start] + replacement + redacted[match.end:]
    seen_end = match.start

It walks right-to-left with a single seen_end cursor and silently skips any match whose end exceeds the cursor. PR #520 fixed this for same-start overlaps (longest-keyword-wins) but matches with different start offsets that fully contain a narrower match still slip through — the wider match's prefix and suffix are never written over.

3. The fix — canonical interval merge

engine.py change replaces the cursor walk with the standard merge-intervals algorithm:

Sort redact matches by (start ASC, -end) so same-start ties put the widest match first (preserves PR fix: overlapping keyword redaction and hardcoded investigation loop limit #520's behavior).
Sweep left-to-right. Merge any match whose start < merged[-1].end. Track the contributing match with the largest original width — that one wins the representative rule_name for the merged span.
Apply replacements right-to-left over the merged intervals so string indices stay valid.

Adjacent matches (A.end == B.start) intentionally do not merge — verified by the test_adjacent_matches_not_merged test. Disjoint matches stay independent. Audit logging is unchanged: every individual ScanMatch still emits its own audit entry; we only collapse the output redactions.

(The merged-span list uses a _MergedSpan: NamedTuple — addressing the Greptile P2 nit on the original commit; this is what d26c631 refactor(guardrails): use NamedTuple for merged-span records does.)

4. Test coverage I added on top

Layer	What it pins
`tests/test_guardrails/test_engine.py` (+10 tests)	contained span (the core regression), longest-rule-name on contained, partial overlap, disjoint, adjacent, three-way transitive `A∩B∩C`, pattern + keyword on the same span, two `_STARTER_CONFIG`-grounded production-rule tests, audit-under-merge invariant
`tests/test_guardrails/test_llm_integration.py` (+5 tests)	one per call site of `engine.apply` — that the merged redacted text actually reaches downstream APIs: `Anthropic.messages.create` body, `Anthropic` `system` kwarg, `OpenAI.chat.completions.create` body, the LangGraph chat node `_apply_guardrails_to_messages` (also asserts the source list is not mutated — defensive copy invariant), and an end-to-end regression guard with the shipped-config patterns. `BedrockLLMClient` shares Anthropic's redaction code path so it's covered transitively.

Live verification just now: 91/91 tests in tests/test_guardrails/ pass on this branch. CI on the PR is green: test, typecheck, quality, CodeQL, Analyze (python). Approved by @yashksaini-coder on 2026-04-25 with an independent algorithm walkthrough.

5. Why this is worth merging

The class of bug is sensitive-data disclosure to an LLM provider (Anthropic / OpenAI / Bedrock). Anywhere a Tracer user enables guardrails over their default _STARTER_CONFIG, two rules covering the same secret will leak the prefix/suffix bookends into every outgoing prompt, system message, and chat node. This PR makes the redaction step actually idempotent over overlapping rule sets.

Happy to step through any specific scenario on a call or in a screenshare if any part of the merge algorithm itself is the unclear bit — that's the part most worth eyeballing carefully.

muddlebee

Pls take a look.

muddlebee · 2026-04-28T07:34:46Z

+    """
+
+    start: int
+    end: int


The width field tracks the original single-match width, not the current merged span's width (end - start). In a chained-overlap scenario (A merges B, then C overlaps the grown span), the rule-name winner is whichever individual source match had the largest original width — not whichever rule covered the widest post-merge range. This is intentional and documented in the class docstring, but the field name width alone is ambiguous. Renaming to source_width (or adding a short note in the docstring: "original match width before any merging") would prevent the next reader from assuming it equals end - start.

muddlebee · 2026-04-28T07:34:55Z

+        assert "IOSFODNN7EXAMPLE" not in content
+        # Representative rule is the wider one (generic_api_token).
+        assert "[REDACTED:generic_api_token]" in content
+        assert "[REDACTED:aws_access_key]" not in content


The _FakeMessages and _FakeClient inner classes are duplicated verbatim in all five tests in this class. If the fake's interface ever changes (e.g., to return a streaming response type) all five copies need updating in sync — a common source of drift. A shared @pytest.fixture returning a (client, captured) tuple would eliminate the duplication and keep this test class under ~150 lines.

muddlebee · 2026-04-28T07:35:05Z

+        assert "api_key=" not in captured["system"]
+        assert "AKIA" not in captured["system"]
+        assert "[REDACTED:generic_api_token]" in captured["system"]
+


The comment in this test admits the system-prompt injection path is awkward (passing a {"role": "system", ...} entry mixed into the messages list). The assertion assert captured.get("system") is not None passes even if LLMClient embeds the system role inside captured["messages"] rather than hoisting it to a separate system kwarg — the assertion would silently pass with None returned by .get() if the key is absent. Consider assert captured["system"] != "" (explicit KeyError on absence) or — better — test the system-prompt path through a purpose-built helper or by inspecting captured["messages"] when the Anthropic client doesn't separate system from user messages.

@muddlebee

… naming, fixture dedupe, system-prompt assertion Three points from @muddlebee's 2026-04-28 review on PR Tracer-Cloud#780: 1. ``_MergedSpan.width`` was ambiguous — it tracks the *original single ScanMatch* width used for representative-rule selection, not the merged span's ``end - start``. Renamed to ``source_width`` and expanded both the NamedTuple docstring and the ``_redact`` algorithm docstring to spell out the distinction (``A∩B then C`` chained- overlap example) so a future reader can't conflate the two. 2. ``_FakeMessages`` / ``_FakeClient`` and the OpenAI equivalents were duplicated verbatim across four tests in ``TestOverlappingRedactionReachesDownstream``. Extracted them into two module-level pytest fixtures, ``anthropic_capture`` and ``openai_capture``, each yielding ``(client, captured)``. Each test now reads as one line of setup + the actual assertion. The two shared fake-response factories ``_anthropic_fake_response`` and ``_openai_fake_response`` keep the response-shape literals in one place so a future Anthropic / OpenAI SDK change touches one location instead of four. 3. ``test_anthropic_system_prompt_also_redacted`` used ``captured.get("system") is not None`` which was both unclear in intent and silent on the more interesting failure mode (system role smuggled into the messages list rather than hoisted to the top-level ``system`` kwarg). Replaced with subscript (KeyError on absent key), a non-empty assertion, and a structural check that no entry in ``captured["messages"]`` carries ``role: system``. Fail mode is now obvious in pytest output. Verification: - ``pytest tests/test_guardrails/`` — 91 / 91 pass - Full unit suite — 3159 pass / 2 skipped / 1 xfailed - ``ruff check`` and ``ruff format --check`` clean on both files - ``mypy app/guardrails/`` clean (the 5 errors are pre-existing missing-stub warnings in ``app/integrations/`` unrelated to this PR)

mayankbharati-ops · 2026-04-28T14:07:31Z

@muddlebee — pushed 5f1b3b6 addressing all three review nits. Detail per comment:

1. `app/guardrails/engine.py:42` — `width` field name ambiguity

Renamed _MergedSpan.width → source_width and expanded both the NamedTuple docstring and the _redact algorithm comment to spell out the distinction explicitly. The docstring now reads:

source_width is the width of the single ScanMatch that contributed the current rule_name — i.e. that match's end - start before any merging happened. It is not the merged span's width (end - start of the _MergedSpan); in a chained-overlap scenario where A merges B and then C overlaps the grown span, the rule-name winner is whichever individual source match was the widest, not whichever rule covered the widest post-merge range.

The inline comment in _redact also flags this on the line where source_width is computed, so anyone editing the loop body sees the distinction in context. Good catch — width alone was genuinely ambiguous.

2. `tests/test_guardrails/test_llm_integration.py:340` — `_FakeMessages` / `_FakeClient` duplication

Right — the four ad-hoc copies were a drift hazard. Extracted into two module-level pytest fixtures:

@pytest.fixture
def anthropic_capture(monkeypatch) -> tuple[Any, dict[str, Any]]:
    """Patch LLMClient (Anthropic surface) to capture messages.create kwargs."""
    ...
    return client, captured

@pytest.fixture
def openai_capture(monkeypatch) -> tuple[Any, dict[str, Any]]:
    """Patch OpenAILLMClient to capture chat.completions.create kwargs."""
    ...
    return client, captured

Each test in TestOverlappingRedactionReachesDownstream is now ~5–10 lines of actual test logic instead of 25+ of mostly-identical fake-class scaffolding. The two response-shape literals (_anthropic_fake_response / _openai_fake_response) are also factored out, so a future SDK change touches one location instead of four.

3. `tests/test_guardrails/test_llm_integration.py:382` — system-prompt assertion

Tightened to:

# Subscript (not ``.get``) so a missing ``system`` kwarg surfaces a
# KeyError immediately — the whole point of the test is that the
# client hoisted the system entry out of the messages list.
system_kwarg = captured["system"]
assert system_kwarg, "system kwarg must be non-empty after redaction"
assert "api_key=" not in system_kwarg
assert "[REDACTED:generic_api_token]" in system_kwarg
# System must be a separate kwarg, not smuggled into ``messages``.
for msg in captured.get("messages", []):
    assert msg.get("role") != "system", (
        f"system role leaked into messages list: {msg!r}"
    )

KeyError on a missing system kwarg is now louder than the old .get(...) is not None (which produced an unhelpful AssertionError if the kwarg was absent), and the new structural assertion catches the failure mode you flagged — Anthropic client failing to hoist role: system out of the messages list. The test now actually exercises both "the system kwarg exists and was redacted" and "the system role was not smuggled into messages."

Verification

pytest tests/test_guardrails/: 91 / 91 pass
Full unit suite: 3159 pass / 2 skipped / 1 xfailed
ruff check and ruff format --check clean on both files
mypy app/guardrails/ clean (the 5 errors mypy reports are pre-existing missing-stub warnings in app/integrations/ — unrelated)
CI on 5f1b3b6: all 5 active checks SUCCESS (quality, Analyze (python), typecheck, test, CodeQL)

Ready for another look.

muddlebee · 2026-04-28T14:14:58Z

Looks good from me. I will wait for additional reviews..

muddlebee · 2026-04-28T14:15:10Z

@VaibhavUpreti

github-actions · 2026-05-01T13:19:21Z

🏄 Some PRs rot in review for six weeks. @mayankbharati-ops's said "not today" and merged like it owned the place. 🌊

👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

greptile-apps Bot reviewed Apr 23, 2026

View reviewed changes

mayankbharati-ops force-pushed the fix/guardrail-overlapping-redaction-union branch from 7b221e1 to 7565e7d Compare April 25, 2026 07:17

mayankbharati-ops added 3 commits April 25, 2026 20:01

mayankbharati-ops force-pushed the fix/guardrail-overlapping-redaction-union branch from 7565e7d to 39eb5a8 Compare April 25, 2026 14:32

mayankbharati-ops mentioned this pull request Apr 25, 2026

refactor(utils): extract shared HTTP-post helper for delivery modules #952

Merged

7 tasks

yashksaini-coder approved these changes Apr 25, 2026

View reviewed changes

muddlebee reviewed Apr 28, 2026

View reviewed changes

muddlebee approved these changes Apr 28, 2026

View reviewed changes

muddlebee merged commit c60125c into Tracer-Cloud:main May 1, 2026
7 checks passed

Conversation

mayankbharati-ops commented Apr 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Fix

Deep verification

Algorithm correctness — adversarial audit (10/11 PASS, 1 was a test bug)

Security audit — bypass attempts (11/11 SAFE)

Determinism

E2E validation — full synthetic suites, real Anthropic LLM

Unit + integration coverage — 59 tests, 91 in tests/test_guardrails/ total

Engine-level (46 tests in test_engine.py)

Integration-level (13 tests in test_llm_integration.py)

Verification — final state

Scope

Security note

Uh oh!

greptile-apps Bot commented Apr 23, 2026

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

greptile-apps Bot Apr 23, 2026

Choose a reason for hiding this comment

Uh oh!

mayankbharati-ops commented Apr 25, 2026

Uh oh!

yashksaini-coder left a comment

Choose a reason for hiding this comment

Uh oh!

muddlebee commented Apr 28, 2026

Uh oh!

mayankbharati-ops commented Apr 28, 2026

1. The bug, in one screenshot of output

2. Why main does this — the algorithm

3. The fix — canonical interval merge

4. Test coverage I added on top

5. Why this is worth merging

Uh oh!

muddlebee left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

muddlebee Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

muddlebee Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

muddlebee Apr 28, 2026

Choose a reason for hiding this comment

Uh oh!

mayankbharati-ops commented Apr 28, 2026

1. app/guardrails/engine.py:42 — width field name ambiguity

2. tests/test_guardrails/test_llm_integration.py:340 — _FakeMessages / _FakeClient duplication

3. tests/test_guardrails/test_llm_integration.py:382 — system-prompt assertion

Verification

Uh oh!

muddlebee commented Apr 28, 2026

Uh oh!

muddlebee commented Apr 28, 2026

Uh oh!

Uh oh!

github-actions Bot commented May 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

mayankbharati-ops commented Apr 23, 2026 •

edited

Loading

Unit + integration coverage — 59 tests, 91 in `tests/test_guardrails/` total

Engine-level (46 tests in `test_engine.py`)

Integration-level (13 tests in `test_llm_integration.py`)

2. Why `main` does this — the algorithm

muddlebee left a comment •

edited

Loading

1. `app/guardrails/engine.py:42` — `width` field name ambiguity

2. `tests/test_guardrails/test_llm_integration.py:340` — `_FakeMessages` / `_FakeClient` duplication

3. `tests/test_guardrails/test_llm_integration.py:382` — system-prompt assertion