Skip to content

fix(guardrails): redact union of overlapping spans, never leak sensitive bookends#780

Merged
muddlebee merged 5 commits intoTracer-Cloud:mainfrom
mayankbharati-ops:fix/guardrail-overlapping-redaction-union
May 1, 2026
Merged

fix(guardrails): redact union of overlapping spans, never leak sensitive bookends#780
muddlebee merged 5 commits intoTracer-Cloud:mainfrom
mayankbharati-ops:fix/guardrail-overlapping-redaction-union

Conversation

@mayankbharati-ops
Copy link
Copy Markdown
Contributor

@mayankbharati-ops mayankbharati-ops commented Apr 23, 2026

Summary

GuardrailEngine.apply walked sorted redact matches right-to-left with a single seen_end cursor and skipped any match whose end exceeded that cursor. When a wider match overlapped — or fully contained — an already-redacted narrower match, the wider match was silently dropped and its prefix / suffix survived in the output.

Concretely, with two rules (wide: super_secret_token_value, inner: secret_token):

Output
Before (main) \"data super_[REDACTED:short]_value end\"
After (this PR) \"data [REDACTED:long] end\"

The super_ and _value fragments leaked into the output. PR #520's earlier fix handled only the same-start subcase; matches with different start offsets that fully contained a narrower match still slipped through.

Affects the shipped starter rules. With _STARTER_CONFIG in app/guardrails/cli.py, text like config: api_key=AKIAIOSFODNN7EXAMPLE triggers both aws_access_key ([16:36]) and generic_api_token ([8:36]). Pre-fix the prefix api_key= leaked; after, the whole span is redacted. Same for aws_secret_access_key=AKIA...xxxxx....

Fix

Replace the single-cursor walk with the canonical interval-merge algorithm:

  1. Sort redact matches by (start ASC, -end) so same-start ties put the widest match first.
  2. Sweep left-to-right, merging any match whose start < merged[-1].end. Representative rule for the merged span is the contributing match with the largest original width.
  3. Apply replacements right-to-left so string indices stay valid.

Adjacent matches (A.end == B.start) intentionally do not merge. Disjoint matches produce independent redactions. Audit logging is unchanged — match-level entries are still emitted per ScanMatch even when output redactions merge.

Algorithm matches the standard merge-intervals pattern (see Merge Intervals - LeetCode 56).

Deep verification

Algorithm correctness — adversarial audit (10/11 PASS, 1 was a test bug)

Constructed every nasty edge case I could think of:

Case Result
Empty input PASS
Single match at start / end / spanning entire text PASS
Same-span same-rule duplicate via pattern + keyword PASS
Nested 3 levels deep PASS
Staircase overlaps (each next overlaps only the prior) PASS
197 partially-overlapping rules on 1000-char text PASS, 8.6 ms
1000 disjoint same-keyword matches PASS, 1.1 ms
Adjacent matches (A.end == B.start) stay separate PASS
Disjoint with 10000-char gap PASS
Match at exact text boundary (start=0 or end=len) PASS
Empty replacement falls back to [REDACTED:name] PASS (was test bug, not engine bug)

Security audit — bypass attempts (11/11 SAFE)

Scenario Result
Custom replacement containing another rule's keyword SAFE — replacements not re-scanned (documented invariant)
Block + redact overlap SAFE — block correctly raises GuardrailBlockedError
Audit-only rule overlapping a redact rule SAFE — redact still applied
Disabled rule does not bypass active rule SAFE
Replacement string mentions another rule's secret SAFE
10-deep nested matches sharing the same secret SAFE — outermost merges all
Empty rule list SAFE (passthrough; no protection promised)
Brackets [X] in replacement don't confuse downstream SAFE

Determinism

  • Same-input → same-output across repeated runs ✓
  • get_guardrail_engine() singleton is stable ✓
  • Tied-width same-span overlap uses first-defined rule name — identical to main's behavior, no regression

E2E validation — full synthetic suites, real Anthropic LLM

HEALTHY_SHORT_CIRCUIT=true, mocked backends, real claude-sonnet-4-6 reasoning + claude-haiku-4-5 toolcall, on this branch.

Suite Result Notes
EKS (14 scenarios) 11 PASS, 3 FAIL All 3 failures state: alerting LLM categorization drift — pre-existing on main, identical to PR #777's verification
RDS (15 scenarios) 11 PASS, 4 FAIL All 4 failures state: alerting, same pre-existing drift

Confirmed every failure is in state: alerting (programmatic check on each alert.json). The synthetic fixtures don't contain overlapping-rule secrets, so the failures are in territory this PR does not touch.

Unit + integration coverage — 59 tests, 91 in tests/test_guardrails/ total

Engine-level (46 tests in test_engine.py)

  • 36 pre-existing
  • 10 new regression tests: contained-span union (the core regression), longest-rule-name on contained, partial overlap, disjoint, adjacent, three-way transitive overlap, pattern+keyword mixed, two _STARTER_CONFIG-grounded production-rule tests, audit-under-merge invariant.

Integration-level (13 tests in test_llm_integration.py)

  • 8 pre-existing
  • 5 new tests — one per call site of engine.apply(), asserting downstream payload contains no secret fragment:
    • AnthropicLLMClient.invoke → captured at Anthropic.messages.create
    • AnthropicLLMClient.invoke → captured system kwarg
    • OpenAILLMClient.invoke → captured at chat.completions.create
    • _apply_guardrails_to_messages (LangGraph chat node) — also asserts source untouched (defensive copy)
    • End-to-end regression guard with shipped-config patterns

BedrockLLMClient.invoke shares its redaction code path with AnthropicLLMClient.invoke and is therefore covered by the same logic.

Verification — final state

  • CI (Python 3.13, post-fix: emit healthy-short-circuit claims from investigation keys only #777-merge main): test (ubuntu-latest) PASS, typecheck PASS, quality PASS, CodeQL PASS, Analyze (python) PASS. (Re-running on rebased branch.)
  • Local pytest tests/ (Python 3.14, no coverage): 3124 pass, 1 skipped, 1 pre-existing failure in tests/nodes/plan_actions/ (unrelated — _InputStub missing model_copy, present on pristine main)
  • pytest tests/test_guardrails/: 91 pass, 0 fail
  • ruff check app/ tests/: clean
  • ruff format --check on touched files: clean
  • mypy app/guardrails/: clean on touched module

Scope

File Change
app/guardrails/engine.py ~40 line replacement of redaction loop
tests/test_guardrails/test_engine.py 10 new tests
tests/test_guardrails/test_llm_integration.py 5 new tests

No public API changes. No state-shape changes. No behavior change for non-overlapping matches, audit-only rules, or blocked rules. Directly extends #520, whose same-start case is preserved verbatim.

Rebased onto current main (post-#777). No file overlap with #777.

Security note

The bug class is sensitive-data disclosure. Anywhere guardrail rules filter outputs shown to users, logged, or exported, two overlapping rules covering the same secret could each partially redact and leave fragments in the wild. The shipped _STARTER_CONFIG demonstrates this concretely — api_key=AKIA... and aws_secret_access_key=... both produced partial redactions on main. This PR closes the gap.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 23, 2026

Greptile Summary

This PR fixes a span-leak bug in GuardrailEngine._redact where overlapping redaction rules (e.g. a wide keyword fully containing a narrower keyword) would silently drop the wider rule's match, leaving its prefix and suffix unredacted in the output. The fix replaces the old single-cursor right-to-left walk with a proper interval merge: sort → left-to-right sweep to merge overlapping spans (tracking the largest individual match width to select the representative rule name) → right-to-left application of replacements to preserve string indices.

Key changes:

  • apply delegates to a new _redact(text, matches) method, improving testability and separation of concerns.
  • The sort key changes from reverse=True on (start, end) to (start ASC, -end) — preserving the "widest-match-first at same start" behavior from PR fix: overlapping keyword redaction and hardcoded investigation loop limit #520 while enabling a correct forward sweep.
  • The seen_end cursor and skip logic are replaced with an O(n) interval merge that handles all three overlap classes: containment, partial overlap, and the transitive chain A∩B, B∩C, A⊥C.
  • 7 new regression tests cover all relevant sub-cases; the 36 existing guardrail tests are unaffected.

Confidence Score: 5/5

Safe to merge — the interval merge algorithm is correct, well-documented, and backed by 7 targeted regression tests covering all overlap sub-cases, with zero regressions in the existing suite.

The core fix (left-to-right interval merge replacing the buggy seen_end cursor) is algorithmically correct: overlapping spans are collapsed, the widest individual match wins the representative rule name, adjacent spans remain separate, and right-to-left application preserves string indices. All edge cases (containment, partial overlap, transitive chain, disjoint, adjacent, regex+keyword mix) have dedicated tests. Code follows project style conventions. The only finding is a non-blocking P2 style suggestion to use a NamedTuple instead of a raw 4-tuple for the merged interval list.

No files require special attention.

Important Files Changed

Filename Overview
app/guardrails/engine.py Extracts _redact into a dedicated method that performs correct interval merging; the old single-cursor right-to-left walk is fully replaced. Algorithm is sound and well-documented.
tests/test_guardrails/test_engine.py Adds 7 targeted regression tests covering contained spans, partial overlap, disjoint, adjacent, three-way chain, and pattern-vs-keyword cases. All new tests are well-structured and use the existing _rule helper correctly.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A["apply(text)"] --> B["scan(text) → ScanResult"]
    B --> C{any matches?}
    C -- No --> D[return text unchanged]
    C -- Yes --> E[audit each match]
    E --> F{blocked?}
    F -- Yes --> G[raise GuardrailBlockedError]
    F -- No --> H["_redact(text, matches)"]
    H --> I["Filter: action == REDACT\nSort by (start ASC, -end)"]
    I --> J["Left-to-right sweep\nmerge overlapping intervals\ntrack widest individual match\nfor rule-name selection"]
    J --> K["Right-to-left application\nof replacements over\nmerged intervals"]
    K --> L[return redacted text]
    style G fill:#f66,color:#fff
    style L fill:#6c6,color:#fff
Loading

Reviews (1): Last reviewed commit: "fix(guardrails): redact union of overlap..." | Re-trigger Greptile

Comment thread app/guardrails/engine.py Outdated
Comment on lines +155 to +166
merged: list[tuple[int, int, str, int]] = []
for match in redact_matches:
if match.end > seen_end:
continue
replacement = self._get_replacement(match.rule_name)
redacted = redacted[: match.start] + replacement + redacted[match.end :]
seen_end = match.start
width = match.end - match.start
if merged and match.start < merged[-1][1]:
prev_start, prev_end, prev_rule, prev_width = merged[-1]
new_end = max(prev_end, match.end)
if width > prev_width:
merged[-1] = (prev_start, new_end, match.rule_name, width)
else:
merged[-1] = (prev_start, new_end, prev_rule, prev_width)
else:
merged.append((match.start, match.end, match.rule_name, width))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Consider a NamedTuple for the merged interval for readability

The inner loop reads and writes the merged list by positional tuple index (merged[-1][1], merged[-1][3], etc.). The positions are correct, but the tuple layout (start, end, rule_name, width) is entirely implicit — a reader must mentally map each index to its meaning.

A lightweight NamedTuple would make accesses self-documenting with zero runtime overhead:

from typing import NamedTuple

class _MergedInterval(NamedTuple):
    start: int
    end: int
    rule_name: str
    width: int

Then the sweep body becomes:

merged: list[_MergedInterval] = []
for match in redact_matches:
    width = match.end - match.start
    if merged and match.start < merged[-1].end:
        prev = merged[-1]
        new_end = max(prev.end, match.end)
        if width > prev.width:
            merged[-1] = _MergedInterval(prev.start, new_end, match.rule_name, width)
        else:
            merged[-1] = _MergedInterval(prev.start, new_end, prev.rule_name, prev.width)
    else:
        merged.append(_MergedInterval(match.start, match.end, match.rule_name, width))

This is a non-blocking suggestion; the current code is functionally correct.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@mayankbharati-ops mayankbharati-ops force-pushed the fix/guardrail-overlapping-redaction-union branch from 7b221e1 to 7565e7d Compare April 25, 2026 07:17
@mayankbharati-ops
Copy link
Copy Markdown
Contributor Author

Heads up: the test (ubuntu-latest) CI failure on this PR is a pre-existing bug on main, not introduced here.

tests/nodes/plan_actions/test_node_plan_actions.py::test_node_plan_actions_emits_retrieval_controls raises AttributeError: '_InputStub' object has no attribute 'model_copy' because PR #807 added a model_copy / model_dump round-trip in app/nodes/plan_actions/node.py:55 but did not update the test stub. The same failure reproduces on a clean checkout of main with no other changes.

I opened #929 with the minimal stub repair (14 lines in the test file, no production code touched). Once that lands, I'll rebase this PR — the rest of CI on this branch is green (typecheck, quality, CodeQL, Analyze (python)).

…ive bookends

``GuardrailEngine.apply`` walked the sorted redact matches right-to-left with a
single ``seen_end`` cursor and skipped any match whose end exceeded the cursor.
When a wider match overlapped or contained an already-redacted narrower match,
the wider match was silently dropped — leaving its prefix and suffix in the
output. Concretely, with rules matching ``super_secret_token_value`` (wide)
and ``secret_token`` (contained):

  input:  "data super_secret_token_value end"
  before: "data super_[REDACTED:short]_value end"   ← super_ and _value leak
  after:  "data [REDACTED:long] end"

The prior fix in Tracer-Cloud#520 only handled the same-start subcase by changing the
secondary sort key; matches that started at different offsets but fully
contained a narrower match still slipped through.

Replace the single-cursor walk with a proper merge:

1. Sort redact matches by ``(start ASC, -end)`` so ties at the same offset
   keep the longest-keyword-wins behavior Tracer-Cloud#520 restored.
2. Sweep left-to-right, merging any match whose ``start`` falls before the
   current interval's ``end``. The representative rule for the merged
   interval is whichever contributing match had the largest original width,
   so the output reflects the most-specific rule that matched anywhere in
   the merged span.
3. Apply replacements right-to-left over the merged intervals so string
   indices stay valid as each redaction resizes the output.

Adjacent matches (``A.end == B.start``) are intentionally not merged —
they don't overlap. Disjoint matches produce independent redactions
with intervening text preserved verbatim.

Tests added in ``tests/test_guardrails/test_engine.py``:

- contained-span union redaction (the regression from the bug report)
- longest-rule-name semantics on a contained span
- partial (non-containing) overlap produces one union redaction
- disjoint matches stay separate, middle text preserved
- adjacent matches stay separate (touching at one offset, not overlapping)
- three-way transitive overlap chain collapses to one span
- regex pattern + keyword on the same region: wider-wins applies across
  match kinds

Existing coverage preserved:
- ``test_overlapping_keyword_redaction_prefers_longest`` — same-start case
- ``test_overlapping_keyword_same_rule_redaction`` — same-rule overlap
- ``test_multiple_rules_on_same_span`` — audit + redact on same span

Verification:

- ``pytest``: 2774 pass, 1 skipped, 0 failures (was 2767 on main; +7 new)
- ``ruff check`` / ``ruff format --check``: clean
- ``mypy app/guardrails/``: clean on touched module
- Full ``tests/test_guardrails/test_engine.py`` suite: 43 pass, 0 fail
Extend the overlap regression tests with two cases grounded in the exact
patterns shipped in _STARTER_CONFIG (app/guardrails/cli.py), and with an
audit-behavior assertion that demonstrates the match-level audit record
remains complete even when the output-level redactions are merged:

- test_real_world_api_key_and_aws_access_key_overlap:
  config: api_key=AKIAIOSFODNN7EXAMPLE
  aws_access_key pattern matches the [16:36] span; generic_api_token
  pattern matches the [8:36] span (contains aws_access_key). Pre-fix
  output leaked "api_key=". Merged output: "config: [REDACTED:generic_api_token]".

- test_real_world_aws_secret_key_contains_aws_access_key:
  export aws_secret_access_key=AKIA<16>abcdefghijklmnopqrst
  aws_access_key matches [29:49]; aws_secret_key matches [7:69] and
  contains it. Pre-fix output leaked the label and the 40-char tail.
  Merged output: "export [REDACTED:aws_secret_key]".

- test_audit_logger_records_every_match_even_when_merged:
  Confirms the AuditLogger still emits one entry per ScanMatch even
  when multiple matches merge into a single output redaction — a
  reviewer tracing audit -> output can still account for every rule
  that fired. Uses a real AuditLogger with a tmp_path JSONL.

Verification:
- pytest tests/test_guardrails/test_engine.py: 46 pass (was 43; +3)
- pytest tests/: 2777 pass, 1 skipped, 0 failures
- ruff check + format: clean
Closes the gap between unit-level ``GuardrailEngine.apply`` tests and the
four call sites that invoke it in the pipeline:

- ``app/services/llm_client.py::LLMClient.invoke``  (Anthropic messages + system)
- ``app/services/llm_client.py::BedrockLLMClient.invoke``  (same shape as Anthropic)
- ``app/services/llm_client.py::OpenAILLMClient.invoke``  (OpenAI chat.completions)
- ``app/nodes/chat.py::_apply_guardrails_to_messages``  (LangGraph chat node)

Five new parametrized integration tests load a ruleset modeled on the
shipped ``_STARTER_CONFIG`` — ``aws_access_key`` is a strict substring of
``generic_api_token`` when the token value is itself an AWS key — and
assert that the downstream payload (captured by a stub that replaces the
real API client) contains no fragment of either the label prefix or the
key value:

- ``test_anthropic_client_sends_merged_redaction``:
  ``LLMClient.invoke("Debug dump: api_key=AKIA... from config.yml")`` is
  captured at ``Anthropic.messages.create`` and asserted to carry a
  single ``[REDACTED:generic_api_token]`` span with no ``api_key=`` or
  ``AKIA`` fragment.

- ``test_anthropic_system_prompt_also_redacted``:
  Same shape, but asserts the ``system`` kwarg (distinct code path from
  per-message ``content``) gets the merged treatment.

- ``test_openai_client_sends_merged_redaction``:
  Same shape for ``OpenAILLMClient.invoke`` -> ``chat.completions.create``.

- ``test_chat_node_emits_merged_redaction``:
  ``_apply_guardrails_to_messages`` on a LangGraph-style message list,
  asserts original is untouched (defensive copy) and the returned
  message has merged redaction.

- ``test_contained_real_secret_fully_redacted_in_pipeline``:
  End-to-end regression guard.  Asserts none of ``api_key=``, ``AKIA``,
  ``IOSFODNN``, ``7EXAMPLE`` appear anywhere in the downstream payload.
  Pre-fix main leaks ``api_key=`` and the surrounding characters.

Verification:
- ``pytest tests/``: 2782 pass (was 2777; +5 integration)
- ``pytest tests/test_guardrails/``: 59 pass (46 engine + 13 integration)
- ``ruff check`` / ``ruff format --check``: clean
- ``mypy app/guardrails/``: clean on touched module
@mayankbharati-ops mayankbharati-ops force-pushed the fix/guardrail-overlapping-redaction-union branch from 7565e7d to 39eb5a8 Compare April 25, 2026 14:32
Address Greptile P2 readability suggestion on PR Tracer-Cloud#780. The interval-merge
loop in ``GuardrailEngine._redact`` previously read and wrote the merged
list as raw 4-tuples accessed by positional index (``merged[-1][1]``,
``merged[-1][3]``, etc.), which obscured what each slot meant. Replace
the raw tuple with a typed ``_MergedSpan`` NamedTuple carrying named
fields (``start``, ``end``, ``rule_name``, ``width``) so the loop reads
naturally:

    merged[-1].end             # was merged[-1][1]
    prev.width                 # was prev_width
    _MergedSpan(start, ...)    # was (start, ..., width)

No behavior change — same algorithm, same outputs, same 91 tests pass.

Verification:
- ``pytest tests/test_guardrails/``: 91 pass, 0 fail (unchanged)
- ``ruff check`` / ``ruff format --check``: clean
- ``mypy app/guardrails/``: clean on touched module
Copy link
Copy Markdown
Collaborator

@yashksaini-coder yashksaini-coder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a well-engineered security fix. Here's what I verified:

Algorithm correctness — the interval merge is textbook correct:

  • Sort by (start ASC, -end) puts widest match first at same start offset, preserving the behavior from #520 while enabling a correct forward sweep
  • Merge condition match.start < merged[-1].end correctly catches containment and partial overlap; adjacent spans (A.end == B.start) intentionally stay separate ✅
  • max(prev.end, match.end) handles partial overlap correctly; width comparison picks the widest individual match's rule name for the representative label ✅
  • Right-to-left application over non-overlapping merged intervals keeps string indices valid throughout ✅
  • Empty match list → empty mergedtext returned unchanged ✅

I traced through all adversarial shapes (3-way transitive, containment, same-span duplicate, widest-first) — all produce correct output.

Tests — 91/91 pass. The 10 new engine regression tests cover every overlap subclass that the old seen_end cursor got wrong. The 5 LLM integration tests asserting no secret fragment reaches an API call are the right layer to test — good defensive coverage.

Security impact — this closes a real data-disclosure gap. With the shipped _STARTER_CONFIG, api_key=AKIA... and aws_secret_access_key=... both triggered two rules simultaneously and the prefix/suffix bookends leaked on main. Fixed.

CI — all checks green: test, typecheck, quality, CodeQL, Analyze (python). The pre-existing _InputStub failure the author called out was fixed by #929 which is already on main. The branch is clean and mergeable.

No issues. Excellent work.

@muddlebee
Copy link
Copy Markdown
Collaborator

@mayankbharati-ops still trying to understand the context for this change? could you help

@mayankbharati-ops
Copy link
Copy Markdown
Contributor Author

@muddlebee — happy to walk through it. The TL;DR is a sensitive-data leak in GuardrailEngine's redaction pass when two rules overlap on the same span — including with the rules we ship in _STARTER_CONFIG. I just re-ran a live repro on main to confirm everything below.

1. The bug, in one screenshot of output

I ran the exact same scenarios through GuardrailEngine.apply on main (current HEAD) and on this branch:

input main (current shipping behavior) this PR
Synthetic data super_secret_token_value end 'data super_[REDACTED:short]_value end' 'data [REDACTED:long] end'
Shipping _STARTER_CONFIG config: api_key=AKIAIOSFODNN7EXAMPLE rest 'config: api_key=[REDACTED:aws_access_key] rest' 'config: [REDACTED:generic_api_token] rest'
3-way transitive abcdefghijklmno tail '[REDACTED:a]h[REDACTED:c] tail' '[REDACTED:a] tail'
Disjoint (regression check) foo middle bar '[REDACTED:a] middle [REDACTED:b]' '[REDACTED:a] middle [REDACTED:b]'
Adjacent (regression check) foobar tail '[REDACTED:a][REDACTED:b] tail' '[REDACTED:a][REDACTED:b] tail'

Row 2 is the worst one: with app/guardrails/cli.py:_STARTER_CONFIG (the rules shipped to every user via opensre guardrails init), the literal substring api_key= survives in any text we redact and forward to an LLM — because aws_access_key (matches AKIA… only, span [16:36]) and generic_api_token (matches api_key=AKIA…, span [8:36]) both fire, and main's loop drops the wider one. Same shape with aws_secret_access_key=AKIA…xxxxx….

2. Why main does this — the algorithm

GuardrailEngine.apply had this redact loop on main (engine.py:127-138):

redact_matches = sorted(matches_with_REDACT_action, key=(start, end), reverse=True)
seen_end = len(text)
for match in redact_matches:
    if match.end > seen_end:        # ← drops the wider match
        continue
    redacted = redacted[:match.start] + replacement + redacted[match.end:]
    seen_end = match.start

It walks right-to-left with a single seen_end cursor and silently skips any match whose end exceeds the cursor. PR #520 fixed this for same-start overlaps (longest-keyword-wins) but matches with different start offsets that fully contain a narrower match still slip through — the wider match's prefix and suffix are never written over.

3. The fix — canonical interval merge

engine.py change replaces the cursor walk with the standard merge-intervals algorithm:

  1. Sort redact matches by (start ASC, -end) so same-start ties put the widest match first (preserves PR fix: overlapping keyword redaction and hardcoded investigation loop limit #520's behavior).
  2. Sweep left-to-right. Merge any match whose start < merged[-1].end. Track the contributing match with the largest original width — that one wins the representative rule_name for the merged span.
  3. Apply replacements right-to-left over the merged intervals so string indices stay valid.

Adjacent matches (A.end == B.start) intentionally do not merge — verified by the test_adjacent_matches_not_merged test. Disjoint matches stay independent. Audit logging is unchanged: every individual ScanMatch still emits its own audit entry; we only collapse the output redactions.

(The merged-span list uses a _MergedSpan: NamedTuple — addressing the Greptile P2 nit on the original commit; this is what d26c631 refactor(guardrails): use NamedTuple for merged-span records does.)

4. Test coverage I added on top

Layer What it pins
tests/test_guardrails/test_engine.py (+10 tests) contained span (the core regression), longest-rule-name on contained, partial overlap, disjoint, adjacent, three-way transitive A∩B∩C, pattern + keyword on the same span, two _STARTER_CONFIG-grounded production-rule tests, audit-under-merge invariant
tests/test_guardrails/test_llm_integration.py (+5 tests) one per call site of engine.apply — that the merged redacted text actually reaches downstream APIs: Anthropic.messages.create body, Anthropic system kwarg, OpenAI.chat.completions.create body, the LangGraph chat node _apply_guardrails_to_messages (also asserts the source list is not mutated — defensive copy invariant), and an end-to-end regression guard with the shipped-config patterns. BedrockLLMClient shares Anthropic's redaction code path so it's covered transitively.

Live verification just now: 91/91 tests in tests/test_guardrails/ pass on this branch. CI on the PR is green: test, typecheck, quality, CodeQL, Analyze (python). Approved by @yashksaini-coder on 2026-04-25 with an independent algorithm walkthrough.

5. Why this is worth merging

The class of bug is sensitive-data disclosure to an LLM provider (Anthropic / OpenAI / Bedrock). Anywhere a Tracer user enables guardrails over their default _STARTER_CONFIG, two rules covering the same secret will leak the prefix/suffix bookends into every outgoing prompt, system message, and chat node. This PR makes the redaction step actually idempotent over overlapping rule sets.

Happy to step through any specific scenario on a call or in a screenshare if any part of the merge algorithm itself is the unclear bit — that's the part most worth eyeballing carefully.

Copy link
Copy Markdown
Collaborator

@muddlebee muddlebee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pls take a look.

Comment thread app/guardrails/engine.py
"""

start: int
end: int
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The width field tracks the original single-match width, not the current merged span's width (end - start). In a chained-overlap scenario (A merges B, then C overlaps the grown span), the rule-name winner is whichever individual source match had the largest original width — not whichever rule covered the widest post-merge range. This is intentional and documented in the class docstring, but the field name width alone is ambiguous. Renaming to source_width (or adding a short note in the docstring: "original match width before any merging") would prevent the next reader from assuming it equals end - start.

assert "IOSFODNN7EXAMPLE" not in content
# Representative rule is the wider one (generic_api_token).
assert "[REDACTED:generic_api_token]" in content
assert "[REDACTED:aws_access_key]" not in content
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _FakeMessages and _FakeClient inner classes are duplicated verbatim in all five tests in this class. If the fake's interface ever changes (e.g., to return a streaming response type) all five copies need updating in sync — a common source of drift. A shared @pytest.fixture returning a (client, captured) tuple would eliminate the duplication and keep this test class under ~150 lines.

assert "api_key=" not in captured["system"]
assert "AKIA" not in captured["system"]
assert "[REDACTED:generic_api_token]" in captured["system"]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment in this test admits the system-prompt injection path is awkward (passing a {"role": "system", ...} entry mixed into the messages list). The assertion assert captured.get("system") is not None passes even if LLMClient embeds the system role inside captured["messages"] rather than hoisting it to a separate system kwarg — the assertion would silently pass with None returned by .get() if the key is absent. Consider assert captured["system"] != "" (explicit KeyError on absence) or — better — test the system-prompt path through a purpose-built helper or by inspecting captured["messages"] when the Anthropic client doesn't separate system from user messages.

… naming, fixture dedupe, system-prompt assertion

Three points from @muddlebee's 2026-04-28 review on PR Tracer-Cloud#780:

1. ``_MergedSpan.width`` was ambiguous — it tracks the *original single
   ScanMatch* width used for representative-rule selection, not the
   merged span's ``end - start``. Renamed to ``source_width`` and
   expanded both the NamedTuple docstring and the ``_redact`` algorithm
   docstring to spell out the distinction (``A∩B then C`` chained-
   overlap example) so a future reader can't conflate the two.

2. ``_FakeMessages`` / ``_FakeClient`` and the OpenAI equivalents were
   duplicated verbatim across four tests in
   ``TestOverlappingRedactionReachesDownstream``. Extracted them into
   two module-level pytest fixtures, ``anthropic_capture`` and
   ``openai_capture``, each yielding ``(client, captured)``. Each test
   now reads as one line of setup + the actual assertion. The two
   shared fake-response factories ``_anthropic_fake_response`` and
   ``_openai_fake_response`` keep the response-shape literals in one
   place so a future Anthropic / OpenAI SDK change touches one
   location instead of four.

3. ``test_anthropic_system_prompt_also_redacted`` used
   ``captured.get("system") is not None`` which was both unclear in
   intent and silent on the more interesting failure mode (system
   role smuggled into the messages list rather than hoisted to the
   top-level ``system`` kwarg). Replaced with subscript (KeyError
   on absent key), a non-empty assertion, and a structural check
   that no entry in ``captured["messages"]`` carries
   ``role: system``. Fail mode is now obvious in pytest output.

Verification:
- ``pytest tests/test_guardrails/`` — 91 / 91 pass
- Full unit suite — 3159 pass / 2 skipped / 1 xfailed
- ``ruff check`` and ``ruff format --check`` clean on both files
- ``mypy app/guardrails/`` clean (the 5 errors are pre-existing
  missing-stub warnings in ``app/integrations/`` unrelated to this PR)
@mayankbharati-ops
Copy link
Copy Markdown
Contributor Author

@muddlebee — pushed 5f1b3b6 addressing all three review nits. Detail per comment:

1. app/guardrails/engine.py:42width field name ambiguity

Renamed _MergedSpan.widthsource_width and expanded both the NamedTuple docstring and the _redact algorithm comment to spell out the distinction explicitly. The docstring now reads:

source_width is the width of the single ScanMatch that contributed the current rule_name — i.e. that match's end - start before any merging happened. It is not the merged span's width (end - start of the _MergedSpan); in a chained-overlap scenario where A merges B and then C overlaps the grown span, the rule-name winner is whichever individual source match was the widest, not whichever rule covered the widest post-merge range.

The inline comment in _redact also flags this on the line where source_width is computed, so anyone editing the loop body sees the distinction in context. Good catch — width alone was genuinely ambiguous.

2. tests/test_guardrails/test_llm_integration.py:340_FakeMessages / _FakeClient duplication

Right — the four ad-hoc copies were a drift hazard. Extracted into two module-level pytest fixtures:

@pytest.fixture
def anthropic_capture(monkeypatch) -> tuple[Any, dict[str, Any]]:
    """Patch LLMClient (Anthropic surface) to capture messages.create kwargs."""
    ...
    return client, captured

@pytest.fixture
def openai_capture(monkeypatch) -> tuple[Any, dict[str, Any]]:
    """Patch OpenAILLMClient to capture chat.completions.create kwargs."""
    ...
    return client, captured

Each test in TestOverlappingRedactionReachesDownstream is now ~5–10 lines of actual test logic instead of 25+ of mostly-identical fake-class scaffolding. The two response-shape literals (_anthropic_fake_response / _openai_fake_response) are also factored out, so a future SDK change touches one location instead of four.

3. tests/test_guardrails/test_llm_integration.py:382 — system-prompt assertion

Tightened to:

# Subscript (not ``.get``) so a missing ``system`` kwarg surfaces a
# KeyError immediately — the whole point of the test is that the
# client hoisted the system entry out of the messages list.
system_kwarg = captured["system"]
assert system_kwarg, "system kwarg must be non-empty after redaction"
assert "api_key=" not in system_kwarg
assert "[REDACTED:generic_api_token]" in system_kwarg
# System must be a separate kwarg, not smuggled into ``messages``.
for msg in captured.get("messages", []):
    assert msg.get("role") != "system", (
        f"system role leaked into messages list: {msg!r}"
    )

KeyError on a missing system kwarg is now louder than the old .get(...) is not None (which produced an unhelpful AssertionError if the kwarg was absent), and the new structural assertion catches the failure mode you flagged — Anthropic client failing to hoist role: system out of the messages list. The test now actually exercises both "the system kwarg exists and was redacted" and "the system role was not smuggled into messages."

Verification

  • pytest tests/test_guardrails/: 91 / 91 pass
  • Full unit suite: 3159 pass / 2 skipped / 1 xfailed
  • ruff check and ruff format --check clean on both files
  • mypy app/guardrails/ clean (the 5 errors mypy reports are pre-existing missing-stub warnings in app/integrations/ — unrelated)
  • CI on 5f1b3b6: all 5 active checks SUCCESS (quality, Analyze (python), typecheck, test, CodeQL)

Ready for another look.

@muddlebee
Copy link
Copy Markdown
Collaborator

Looks good from me. I will wait for additional reviews..

@muddlebee
Copy link
Copy Markdown
Collaborator

@VaibhavUpreti

@muddlebee muddlebee merged commit c60125c into Tracer-Cloud:main May 1, 2026
7 checks passed
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 1, 2026

🏄 Some PRs rot in review for six weeks. @mayankbharati-ops's said "not today" and merged like it owned the place. 🌊


👋 Join us on Discord - OpenSRE : hang out, contribute, or hunt for features and issues. Everyone's welcome.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants