fix(guardrails): redact union of overlapping spans, never leak sensitive bookends#780
Conversation
Greptile SummaryThis PR fixes a span-leak bug in Key changes:
Confidence Score: 5/5Safe to merge — the interval merge algorithm is correct, well-documented, and backed by 7 targeted regression tests covering all overlap sub-cases, with zero regressions in the existing suite. The core fix (left-to-right interval merge replacing the buggy No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["apply(text)"] --> B["scan(text) → ScanResult"]
B --> C{any matches?}
C -- No --> D[return text unchanged]
C -- Yes --> E[audit each match]
E --> F{blocked?}
F -- Yes --> G[raise GuardrailBlockedError]
F -- No --> H["_redact(text, matches)"]
H --> I["Filter: action == REDACT\nSort by (start ASC, -end)"]
I --> J["Left-to-right sweep\nmerge overlapping intervals\ntrack widest individual match\nfor rule-name selection"]
J --> K["Right-to-left application\nof replacements over\nmerged intervals"]
K --> L[return redacted text]
style G fill:#f66,color:#fff
style L fill:#6c6,color:#fff
Reviews (1): Last reviewed commit: "fix(guardrails): redact union of overlap..." | Re-trigger Greptile |
| merged: list[tuple[int, int, str, int]] = [] | ||
| for match in redact_matches: | ||
| if match.end > seen_end: | ||
| continue | ||
| replacement = self._get_replacement(match.rule_name) | ||
| redacted = redacted[: match.start] + replacement + redacted[match.end :] | ||
| seen_end = match.start | ||
| width = match.end - match.start | ||
| if merged and match.start < merged[-1][1]: | ||
| prev_start, prev_end, prev_rule, prev_width = merged[-1] | ||
| new_end = max(prev_end, match.end) | ||
| if width > prev_width: | ||
| merged[-1] = (prev_start, new_end, match.rule_name, width) | ||
| else: | ||
| merged[-1] = (prev_start, new_end, prev_rule, prev_width) | ||
| else: | ||
| merged.append((match.start, match.end, match.rule_name, width)) |
There was a problem hiding this comment.
Consider a
NamedTuple for the merged interval for readability
The inner loop reads and writes the merged list by positional tuple index (merged[-1][1], merged[-1][3], etc.). The positions are correct, but the tuple layout (start, end, rule_name, width) is entirely implicit — a reader must mentally map each index to its meaning.
A lightweight NamedTuple would make accesses self-documenting with zero runtime overhead:
from typing import NamedTuple
class _MergedInterval(NamedTuple):
start: int
end: int
rule_name: str
width: intThen the sweep body becomes:
merged: list[_MergedInterval] = []
for match in redact_matches:
width = match.end - match.start
if merged and match.start < merged[-1].end:
prev = merged[-1]
new_end = max(prev.end, match.end)
if width > prev.width:
merged[-1] = _MergedInterval(prev.start, new_end, match.rule_name, width)
else:
merged[-1] = _MergedInterval(prev.start, new_end, prev.rule_name, prev.width)
else:
merged.append(_MergedInterval(match.start, match.end, match.rule_name, width))This is a non-blocking suggestion; the current code is functionally correct.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
7b221e1 to
7565e7d
Compare
|
Heads up: the
I opened #929 with the minimal stub repair (14 lines in the test file, no production code touched). Once that lands, I'll rebase this PR — the rest of CI on this branch is green ( |
…ive bookends ``GuardrailEngine.apply`` walked the sorted redact matches right-to-left with a single ``seen_end`` cursor and skipped any match whose end exceeded the cursor. When a wider match overlapped or contained an already-redacted narrower match, the wider match was silently dropped — leaving its prefix and suffix in the output. Concretely, with rules matching ``super_secret_token_value`` (wide) and ``secret_token`` (contained): input: "data super_secret_token_value end" before: "data super_[REDACTED:short]_value end" ← super_ and _value leak after: "data [REDACTED:long] end" The prior fix in Tracer-Cloud#520 only handled the same-start subcase by changing the secondary sort key; matches that started at different offsets but fully contained a narrower match still slipped through. Replace the single-cursor walk with a proper merge: 1. Sort redact matches by ``(start ASC, -end)`` so ties at the same offset keep the longest-keyword-wins behavior Tracer-Cloud#520 restored. 2. Sweep left-to-right, merging any match whose ``start`` falls before the current interval's ``end``. The representative rule for the merged interval is whichever contributing match had the largest original width, so the output reflects the most-specific rule that matched anywhere in the merged span. 3. Apply replacements right-to-left over the merged intervals so string indices stay valid as each redaction resizes the output. Adjacent matches (``A.end == B.start``) are intentionally not merged — they don't overlap. Disjoint matches produce independent redactions with intervening text preserved verbatim. Tests added in ``tests/test_guardrails/test_engine.py``: - contained-span union redaction (the regression from the bug report) - longest-rule-name semantics on a contained span - partial (non-containing) overlap produces one union redaction - disjoint matches stay separate, middle text preserved - adjacent matches stay separate (touching at one offset, not overlapping) - three-way transitive overlap chain collapses to one span - regex pattern + keyword on the same region: wider-wins applies across match kinds Existing coverage preserved: - ``test_overlapping_keyword_redaction_prefers_longest`` — same-start case - ``test_overlapping_keyword_same_rule_redaction`` — same-rule overlap - ``test_multiple_rules_on_same_span`` — audit + redact on same span Verification: - ``pytest``: 2774 pass, 1 skipped, 0 failures (was 2767 on main; +7 new) - ``ruff check`` / ``ruff format --check``: clean - ``mypy app/guardrails/``: clean on touched module - Full ``tests/test_guardrails/test_engine.py`` suite: 43 pass, 0 fail
Extend the overlap regression tests with two cases grounded in the exact patterns shipped in _STARTER_CONFIG (app/guardrails/cli.py), and with an audit-behavior assertion that demonstrates the match-level audit record remains complete even when the output-level redactions are merged: - test_real_world_api_key_and_aws_access_key_overlap: config: api_key=AKIAIOSFODNN7EXAMPLE aws_access_key pattern matches the [16:36] span; generic_api_token pattern matches the [8:36] span (contains aws_access_key). Pre-fix output leaked "api_key=". Merged output: "config: [REDACTED:generic_api_token]". - test_real_world_aws_secret_key_contains_aws_access_key: export aws_secret_access_key=AKIA<16>abcdefghijklmnopqrst aws_access_key matches [29:49]; aws_secret_key matches [7:69] and contains it. Pre-fix output leaked the label and the 40-char tail. Merged output: "export [REDACTED:aws_secret_key]". - test_audit_logger_records_every_match_even_when_merged: Confirms the AuditLogger still emits one entry per ScanMatch even when multiple matches merge into a single output redaction — a reviewer tracing audit -> output can still account for every rule that fired. Uses a real AuditLogger with a tmp_path JSONL. Verification: - pytest tests/test_guardrails/test_engine.py: 46 pass (was 43; +3) - pytest tests/: 2777 pass, 1 skipped, 0 failures - ruff check + format: clean
Closes the gap between unit-level ``GuardrailEngine.apply`` tests and the
four call sites that invoke it in the pipeline:
- ``app/services/llm_client.py::LLMClient.invoke`` (Anthropic messages + system)
- ``app/services/llm_client.py::BedrockLLMClient.invoke`` (same shape as Anthropic)
- ``app/services/llm_client.py::OpenAILLMClient.invoke`` (OpenAI chat.completions)
- ``app/nodes/chat.py::_apply_guardrails_to_messages`` (LangGraph chat node)
Five new parametrized integration tests load a ruleset modeled on the
shipped ``_STARTER_CONFIG`` — ``aws_access_key`` is a strict substring of
``generic_api_token`` when the token value is itself an AWS key — and
assert that the downstream payload (captured by a stub that replaces the
real API client) contains no fragment of either the label prefix or the
key value:
- ``test_anthropic_client_sends_merged_redaction``:
``LLMClient.invoke("Debug dump: api_key=AKIA... from config.yml")`` is
captured at ``Anthropic.messages.create`` and asserted to carry a
single ``[REDACTED:generic_api_token]`` span with no ``api_key=`` or
``AKIA`` fragment.
- ``test_anthropic_system_prompt_also_redacted``:
Same shape, but asserts the ``system`` kwarg (distinct code path from
per-message ``content``) gets the merged treatment.
- ``test_openai_client_sends_merged_redaction``:
Same shape for ``OpenAILLMClient.invoke`` -> ``chat.completions.create``.
- ``test_chat_node_emits_merged_redaction``:
``_apply_guardrails_to_messages`` on a LangGraph-style message list,
asserts original is untouched (defensive copy) and the returned
message has merged redaction.
- ``test_contained_real_secret_fully_redacted_in_pipeline``:
End-to-end regression guard. Asserts none of ``api_key=``, ``AKIA``,
``IOSFODNN``, ``7EXAMPLE`` appear anywhere in the downstream payload.
Pre-fix main leaks ``api_key=`` and the surrounding characters.
Verification:
- ``pytest tests/``: 2782 pass (was 2777; +5 integration)
- ``pytest tests/test_guardrails/``: 59 pass (46 engine + 13 integration)
- ``ruff check`` / ``ruff format --check``: clean
- ``mypy app/guardrails/``: clean on touched module
7565e7d to
39eb5a8
Compare
Address Greptile P2 readability suggestion on PR Tracer-Cloud#780. The interval-merge loop in ``GuardrailEngine._redact`` previously read and wrote the merged list as raw 4-tuples accessed by positional index (``merged[-1][1]``, ``merged[-1][3]``, etc.), which obscured what each slot meant. Replace the raw tuple with a typed ``_MergedSpan`` NamedTuple carrying named fields (``start``, ``end``, ``rule_name``, ``width``) so the loop reads naturally: merged[-1].end # was merged[-1][1] prev.width # was prev_width _MergedSpan(start, ...) # was (start, ..., width) No behavior change — same algorithm, same outputs, same 91 tests pass. Verification: - ``pytest tests/test_guardrails/``: 91 pass, 0 fail (unchanged) - ``ruff check`` / ``ruff format --check``: clean - ``mypy app/guardrails/``: clean on touched module
yashksaini-coder
left a comment
There was a problem hiding this comment.
This is a well-engineered security fix. Here's what I verified:
Algorithm correctness — the interval merge is textbook correct:
- Sort by
(start ASC, -end)puts widest match first at same start offset, preserving the behavior from #520 while enabling a correct forward sweep - Merge condition
match.start < merged[-1].endcorrectly catches containment and partial overlap; adjacent spans (A.end == B.start) intentionally stay separate ✅ max(prev.end, match.end)handles partial overlap correctly; width comparison picks the widest individual match's rule name for the representative label ✅- Right-to-left application over non-overlapping merged intervals keeps string indices valid throughout ✅
- Empty match list → empty
merged→textreturned unchanged ✅
I traced through all adversarial shapes (3-way transitive, containment, same-span duplicate, widest-first) — all produce correct output.
Tests — 91/91 pass. The 10 new engine regression tests cover every overlap subclass that the old seen_end cursor got wrong. The 5 LLM integration tests asserting no secret fragment reaches an API call are the right layer to test — good defensive coverage.
Security impact — this closes a real data-disclosure gap. With the shipped _STARTER_CONFIG, api_key=AKIA... and aws_secret_access_key=... both triggered two rules simultaneously and the prefix/suffix bookends leaked on main. Fixed.
CI — all checks green: test, typecheck, quality, CodeQL, Analyze (python). The pre-existing _InputStub failure the author called out was fixed by #929 which is already on main. The branch is clean and mergeable.
No issues. Excellent work.
|
@mayankbharati-ops still trying to understand the context for this change? could you help |
|
@muddlebee — happy to walk through it. The TL;DR is a sensitive-data leak in 1. The bug, in one screenshot of outputI ran the exact same scenarios through
Row 2 is the worst one: with 2. Why
|
| Layer | What it pins |
|---|---|
tests/test_guardrails/test_engine.py (+10 tests) |
contained span (the core regression), longest-rule-name on contained, partial overlap, disjoint, adjacent, three-way transitive A∩B∩C, pattern + keyword on the same span, two _STARTER_CONFIG-grounded production-rule tests, audit-under-merge invariant |
tests/test_guardrails/test_llm_integration.py (+5 tests) |
one per call site of engine.apply — that the merged redacted text actually reaches downstream APIs: Anthropic.messages.create body, Anthropic system kwarg, OpenAI.chat.completions.create body, the LangGraph chat node _apply_guardrails_to_messages (also asserts the source list is not mutated — defensive copy invariant), and an end-to-end regression guard with the shipped-config patterns. BedrockLLMClient shares Anthropic's redaction code path so it's covered transitively. |
Live verification just now: 91/91 tests in tests/test_guardrails/ pass on this branch. CI on the PR is green: test, typecheck, quality, CodeQL, Analyze (python). Approved by @yashksaini-coder on 2026-04-25 with an independent algorithm walkthrough.
5. Why this is worth merging
The class of bug is sensitive-data disclosure to an LLM provider (Anthropic / OpenAI / Bedrock). Anywhere a Tracer user enables guardrails over their default _STARTER_CONFIG, two rules covering the same secret will leak the prefix/suffix bookends into every outgoing prompt, system message, and chat node. This PR makes the redaction step actually idempotent over overlapping rule sets.
Happy to step through any specific scenario on a call or in a screenshare if any part of the merge algorithm itself is the unclear bit — that's the part most worth eyeballing carefully.
| """ | ||
|
|
||
| start: int | ||
| end: int |
There was a problem hiding this comment.
The width field tracks the original single-match width, not the current merged span's width (end - start). In a chained-overlap scenario (A merges B, then C overlaps the grown span), the rule-name winner is whichever individual source match had the largest original width — not whichever rule covered the widest post-merge range. This is intentional and documented in the class docstring, but the field name width alone is ambiguous. Renaming to source_width (or adding a short note in the docstring: "original match width before any merging") would prevent the next reader from assuming it equals end - start.
| assert "IOSFODNN7EXAMPLE" not in content | ||
| # Representative rule is the wider one (generic_api_token). | ||
| assert "[REDACTED:generic_api_token]" in content | ||
| assert "[REDACTED:aws_access_key]" not in content |
There was a problem hiding this comment.
The _FakeMessages and _FakeClient inner classes are duplicated verbatim in all five tests in this class. If the fake's interface ever changes (e.g., to return a streaming response type) all five copies need updating in sync — a common source of drift. A shared @pytest.fixture returning a (client, captured) tuple would eliminate the duplication and keep this test class under ~150 lines.
| assert "api_key=" not in captured["system"] | ||
| assert "AKIA" not in captured["system"] | ||
| assert "[REDACTED:generic_api_token]" in captured["system"] | ||
|
|
There was a problem hiding this comment.
The comment in this test admits the system-prompt injection path is awkward (passing a {"role": "system", ...} entry mixed into the messages list). The assertion assert captured.get("system") is not None passes even if LLMClient embeds the system role inside captured["messages"] rather than hoisting it to a separate system kwarg — the assertion would silently pass with None returned by .get() if the key is absent. Consider assert captured["system"] != "" (explicit KeyError on absence) or — better — test the system-prompt path through a purpose-built helper or by inspecting captured["messages"] when the Anthropic client doesn't separate system from user messages.
… naming, fixture dedupe, system-prompt assertion Three points from @muddlebee's 2026-04-28 review on PR Tracer-Cloud#780: 1. ``_MergedSpan.width`` was ambiguous — it tracks the *original single ScanMatch* width used for representative-rule selection, not the merged span's ``end - start``. Renamed to ``source_width`` and expanded both the NamedTuple docstring and the ``_redact`` algorithm docstring to spell out the distinction (``A∩B then C`` chained- overlap example) so a future reader can't conflate the two. 2. ``_FakeMessages`` / ``_FakeClient`` and the OpenAI equivalents were duplicated verbatim across four tests in ``TestOverlappingRedactionReachesDownstream``. Extracted them into two module-level pytest fixtures, ``anthropic_capture`` and ``openai_capture``, each yielding ``(client, captured)``. Each test now reads as one line of setup + the actual assertion. The two shared fake-response factories ``_anthropic_fake_response`` and ``_openai_fake_response`` keep the response-shape literals in one place so a future Anthropic / OpenAI SDK change touches one location instead of four. 3. ``test_anthropic_system_prompt_also_redacted`` used ``captured.get("system") is not None`` which was both unclear in intent and silent on the more interesting failure mode (system role smuggled into the messages list rather than hoisted to the top-level ``system`` kwarg). Replaced with subscript (KeyError on absent key), a non-empty assertion, and a structural check that no entry in ``captured["messages"]`` carries ``role: system``. Fail mode is now obvious in pytest output. Verification: - ``pytest tests/test_guardrails/`` — 91 / 91 pass - Full unit suite — 3159 pass / 2 skipped / 1 xfailed - ``ruff check`` and ``ruff format --check`` clean on both files - ``mypy app/guardrails/`` clean (the 5 errors are pre-existing missing-stub warnings in ``app/integrations/`` unrelated to this PR)
|
@muddlebee — pushed 1.
|
|
Looks good from me. I will wait for additional reviews.. |
|
🏄 Some PRs rot in review for six weeks. @mayankbharati-ops's said "not today" and merged like it owned the place. 🌊 👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome. |

Summary
GuardrailEngine.applywalked sorted redact matches right-to-left with a singleseen_endcursor and skipped any match whose end exceeded that cursor. When a wider match overlapped — or fully contained — an already-redacted narrower match, the wider match was silently dropped and its prefix / suffix survived in the output.Concretely, with two rules (wide:
super_secret_token_value, inner:secret_token):main)\"data super_[REDACTED:short]_value end\"\"data [REDACTED:long] end\"The
super_and_valuefragments leaked into the output. PR #520's earlier fix handled only the same-start subcase; matches with different start offsets that fully contained a narrower match still slipped through.Affects the shipped starter rules. With
_STARTER_CONFIGinapp/guardrails/cli.py, text likeconfig: api_key=AKIAIOSFODNN7EXAMPLEtriggers bothaws_access_key([16:36]) andgeneric_api_token([8:36]). Pre-fix the prefixapi_key=leaked; after, the whole span is redacted. Same foraws_secret_access_key=AKIA...xxxxx....Fix
Replace the single-cursor walk with the canonical interval-merge algorithm:
(start ASC, -end)so same-start ties put the widest match first.start < merged[-1].end. Representative rule for the merged span is the contributing match with the largest original width.Adjacent matches (
A.end == B.start) intentionally do not merge. Disjoint matches produce independent redactions. Audit logging is unchanged — match-level entries are still emitted perScanMatcheven when output redactions merge.Algorithm matches the standard merge-intervals pattern (see Merge Intervals - LeetCode 56).
Deep verification
Algorithm correctness — adversarial audit (10/11 PASS, 1 was a test bug)
Constructed every nasty edge case I could think of:
replacementfalls back to[REDACTED:name]Security audit — bypass attempts (11/11 SAFE)
GuardrailBlockedError[X]in replacement don't confuse downstreamDeterminism
get_guardrail_engine()singleton is stable ✓main's behavior, no regressionE2E validation — full synthetic suites, real Anthropic LLM
HEALTHY_SHORT_CIRCUIT=true, mocked backends, realclaude-sonnet-4-6reasoning +claude-haiku-4-5toolcall, on this branch.state: alertingLLM categorization drift — pre-existing onmain, identical to PR #777's verificationstate: alerting, same pre-existing driftConfirmed every failure is in
state: alerting(programmatic check on eachalert.json). The synthetic fixtures don't contain overlapping-rule secrets, so the failures are in territory this PR does not touch.Unit + integration coverage — 59 tests, 91 in
tests/test_guardrails/totalEngine-level (46 tests in
test_engine.py)_STARTER_CONFIG-grounded production-rule tests, audit-under-merge invariant.Integration-level (13 tests in
test_llm_integration.py)engine.apply(), asserting downstream payload contains no secret fragment:AnthropicLLMClient.invoke→ captured atAnthropic.messages.createAnthropicLLMClient.invoke→ capturedsystemkwargOpenAILLMClient.invoke→ captured atchat.completions.create_apply_guardrails_to_messages(LangGraph chat node) — also asserts source untouched (defensive copy)BedrockLLMClient.invokeshares its redaction code path withAnthropicLLMClient.invokeand is therefore covered by the same logic.Verification — final state
test (ubuntu-latest)PASS,typecheckPASS,qualityPASS,CodeQLPASS,Analyze (python)PASS. (Re-running on rebased branch.)pytest tests/(Python 3.14, no coverage): 3124 pass, 1 skipped, 1 pre-existing failure intests/nodes/plan_actions/(unrelated —_InputStubmissingmodel_copy, present on pristinemain)pytest tests/test_guardrails/: 91 pass, 0 failruff check app/ tests/: cleanruff format --checkon touched files: cleanmypy app/guardrails/: clean on touched moduleScope
app/guardrails/engine.pytests/test_guardrails/test_engine.pytests/test_guardrails/test_llm_integration.pyNo public API changes. No state-shape changes. No behavior change for non-overlapping matches, audit-only rules, or blocked rules. Directly extends #520, whose same-start case is preserved verbatim.
Rebased onto current
main(post-#777). No file overlap with #777.Security note
The bug class is sensitive-data disclosure. Anywhere guardrail rules filter outputs shown to users, logged, or exported, two overlapping rules covering the same secret could each partially redact and leave fragments in the wild. The shipped
_STARTER_CONFIGdemonstrates this concretely —api_key=AKIA...andaws_secret_access_key=...both produced partial redactions onmain. This PR closes the gap.