fix: overlapping keyword redaction and hardcoded investigation loop limit by mayankbharati-ops · Pull Request #520 · Tracer-Cloud/opensre

mayankbharati-ops · 2026-04-09T20:57:21Z

Summary

Guardrails engine: Overlapping keywords at the same start position (e.g. "secret" and "secret_key") could leave partial text unredacted (e.g. "[REDACTED:r1]_key=xyz" instead of "[REDACTED:r2]=xyz"). Fixed by adding match end position to the sort key so the longest match is always processed first.
Root-cause diagnosis node: Hardcoded loop_count < 3 prevented investigation recommendations from being generated on loops 3–4, even though MAX_INVESTIGATION_LOOPS = 4 allows up to 4 iterations. Extracted the constant to app/pipeline/constants.py (avoiding circular imports) and replaced both hardcoded checks.

Test plan

Added test_overlapping_keyword_redaction_prefers_longest — verifies longest keyword wins across rules
Added test_overlapping_keyword_same_rule_redaction — verifies longest keyword wins within a single rule
All 36 guardrails engine tests pass (pytest tests/test_guardrails/test_engine.py)
All 93 non-dependency-related tests pass (pytest tests/test_guardrails/ tests/tools/test_compaction.py)
Ruff lint passes on all changed files
Full CI (make test-cov, make lint, make typecheck) on environment with all deps installed

greptile-apps · 2026-04-09T21:01:45Z

Greptile Summary

This PR fixes two independent bugs: (1) the guardrails redaction engine could leave partial keywords unredacted when two keywords shared the same start position (e.g. "secret" and "secret_key") because the sort was only by m.start — fixed by adding m.end as a secondary sort key so the longest match always wins; (2) the root-cause diagnosis node hardcoded loop_count < 3 instead of using MAX_INVESTIGATION_LOOPS, silently suppressing investigation recommendations on loop 3. Both fixes are backed by targeted unit tests and an end-to-end lifecycle test in tests/test_bug_fixes_e2e.py.

Confidence Score: 5/5

Safe to merge — both fixes are correct, well-tested, and narrowly scoped.

Both bugs are clearly diagnosed and properly fixed. The sort key change in the redaction engine is minimal and provably correct. The constants extraction cleanly solves the circular-import problem. All remaining findings are P2 style/clarity notes that don't affect correctness or reliability.

No files require special attention; the two P2 comments on app/pipeline/routing.py and app/nodes/root_cause_diagnosis/node.py are purely advisory.

Important Files Changed

Filename	Overview
app/guardrails/engine.py	Sort key extended to `(m.start, m.end)` so the longest match at any given start position is processed first, preventing partial-keyword leakage. Logic and tests are correct.
app/investigation_constants.py	New thin module exporting `MAX_INVESTIGATION_LOOPS = 4` to break the circular import between `app.pipeline` and `app.nodes`. Clean and minimal.
app/nodes/root_cause_diagnosis/node.py	Hardcoded `< 3` replaced with `< MAX_INVESTIGATION_LOOPS`. Minor: `_handle_insufficient_evidence` unconditionally increments loop count even when no recommendations are generated, inconsistent with `diagnose_root_cause`.
app/pipeline/routing.py	Now imports `MAX_INVESTIGATION_LOOPS` from the new constants module. Uses `>` boundary (allows loop_count == MAX to continue) while the node uses `<` boundary; functionally correct but worth documenting.
tests/test_guardrails/test_engine.py	Two new tests added — `test_overlapping_keyword_redaction_prefers_longest` and `test_overlapping_keyword_same_rule_redaction` — directly exercise the fixed sort key.
tests/test_bug_fixes_e2e.py	New end-to-end test file covering both fixes: overlapping keyword redaction (5 scenarios) and investigation loop lifecycle (6 scenarios including the previously broken loop_count=3 case).

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[scan: collect all keyword + pattern matches] --> B[sort by start DESC, end DESC]
    B --> C{same start position?}
    C -->|yes| D[longest match end processed first]
    C -->|no| E[right-to-left order]
    D --> F[apply replacement, set seen_end = match.start]
    E --> F
    F --> G{next match.end > seen_end?}
    G -->|yes| H[skip - overlapping region already redacted]
    G -->|no| F

    subgraph Investigation Loop
        I[diagnose_root_cause node] -->|loop_count < MAX_INVESTIGATION_LOOPS| J[generate recommendations]
        I -->|loop_count >= MAX_INVESTIGATION_LOOPS| K[no recommendations]
        J --> L{routing: loop_count > MAX?}
        K --> L
        L -->|no + has recs| M[investigate]
        L -->|yes or no recs| N[publish]
        M --> I
    end

Prompt To Fix All With AI

This is a comment left during a code review.
Path: app/pipeline/routing.py
Line: 60-61

Comment:
**Routing boundary uses `>` while diagnosis node uses `<` — easy to mis-read**

The stop condition here (`loop_count > MAX_INVESTIGATION_LOOPS`) allows `loop_count == 4` to pass through and continue investigating if recommendations exist. In practice that never happens because both node paths generate recommendations only when `loop_count < MAX_INVESTIGATION_LOOPS` (i.e., 0–3). But the asymmetry means a future change to the diagnosis conditions could silently add an extra iteration without routing ever enforcing the ceiling. A brief inline comment tying the two together would prevent confusion:

```suggestion
        if loop_count > MAX_INVESTIGATION_LOOPS:
            # Safety net: node paths gate recommendation generation on
            # loop_count < MAX_INVESTIGATION_LOOPS, so this branch is
            # only reached if a new code path forgets that check.
            debug_print(f"Max loops ({MAX_INVESTIGATION_LOOPS}) exceeded -> publish")
```

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: app/nodes/root_cause_diagnosis/node.py
Line: 170

Comment:
**Loop count always increments in `_handle_insufficient_evidence`, unlike `diagnose_root_cause`**

`_handle_insufficient_evidence` bumps `next_loop_count` unconditionally, even when no recommendations are generated (e.g. `grafana_service_names` is absent). The sibling path `diagnose_root_cause` only increments when recommendations exist (`loop_count + 1 if recommendations else loop_count`). The routing publishes immediately when recommendations are empty so the stale loop count never re-enters the node, but the inconsistency means the returned state carries an inflated counter that could confuse observability tooling (e.g. LangSmith traces showing `loop_count=1` after an instant publish). Consider aligning them:

```suggestion
        next_loop_count = loop_count + 1 if recommendations else loop_count
```

How can I resolve this? If you propose a fix, please make it concise.

_{Reviews (5): Last reviewed commit: "merge: resolve PR #520 conflicts with ma..." | Re-trigger Greptile}

…imit Two bugs fixed: 1. Guardrails engine: overlapping keywords at the same start position (e.g. "secret" and "secret_key") could leave partial text unredacted. The sort key now includes match end position so the longest match is always processed first. 2. Root-cause diagnosis node: hardcoded `loop_count < 3` prevented recommendations from being generated on loops 3-4, even though MAX_INVESTIGATION_LOOPS=4 allows up to 4 iterations. Replaced the magic number with the shared constant.

Avoids circular import: app.pipeline.__init__ → graph → app.nodes → diagnosis node → app.pipeline.constants would fail because app.nodes is still partially initialized when graph.py tries to import it. Placing the constant in app/investigation_constants.py (outside both packages) breaks the cycle. Also fixes import ordering (ruff I001).

…fixes - 5 guardrails tests: overlapping keywords across rules, within rules, multiple occurrences, pattern+keyword overlap, full scan-apply cycle - 6 investigation loop tests: constant sharing, routing at all valid loop counts, diagnosis node at loop_count=3 (the previously broken case), full loop lifecycle from 0 to termination

The old max_investigation_loops: 3 was calibrated against the buggy hardcoded `loop_count < 3`. Now that the diagnosis node uses MAX_INVESTIGATION_LOOPS (4), investigations can legitimately run 4 loops. Updated all 14 answer.yml files accordingly.

davincios · 2026-04-16T21:47:27Z

Approved workflow to run :)

Bring the overlapping-redaction fix branch forward onto main and keep the review-approved test cleanup so the PR is mergeable again. Made-with: Cursor

davincios · 2026-04-19T10:57:15Z

Thank you very much for your valuable contribution.

…ive bookends ``GuardrailEngine.apply`` walked the sorted redact matches right-to-left with a single ``seen_end`` cursor and skipped any match whose end exceeded the cursor. When a wider match overlapped or contained an already-redacted narrower match, the wider match was silently dropped — leaving its prefix and suffix in the output. Concretely, with rules matching ``super_secret_token_value`` (wide) and ``secret_token`` (contained): input: "data super_secret_token_value end" before: "data super_[REDACTED:short]_value end" ← super_ and _value leak after: "data [REDACTED:long] end" The prior fix in Tracer-Cloud#520 only handled the same-start subcase by changing the secondary sort key; matches that started at different offsets but fully contained a narrower match still slipped through. Replace the single-cursor walk with a proper merge: 1. Sort redact matches by ``(start ASC, -end)`` so ties at the same offset keep the longest-keyword-wins behavior Tracer-Cloud#520 restored. 2. Sweep left-to-right, merging any match whose ``start`` falls before the current interval's ``end``. The representative rule for the merged interval is whichever contributing match had the largest original width, so the output reflects the most-specific rule that matched anywhere in the merged span. 3. Apply replacements right-to-left over the merged intervals so string indices stay valid as each redaction resizes the output. Adjacent matches (``A.end == B.start``) are intentionally not merged — they don't overlap. Disjoint matches produce independent redactions with intervening text preserved verbatim. Tests added in ``tests/test_guardrails/test_engine.py``: - contained-span union redaction (the regression from the bug report) - longest-rule-name semantics on a contained span - partial (non-containing) overlap produces one union redaction - disjoint matches stay separate, middle text preserved - adjacent matches stay separate (touching at one offset, not overlapping) - three-way transitive overlap chain collapses to one span - regex pattern + keyword on the same region: wider-wins applies across match kinds Existing coverage preserved: - ``test_overlapping_keyword_redaction_prefers_longest`` — same-start case - ``test_overlapping_keyword_same_rule_redaction`` — same-rule overlap - ``test_multiple_rules_on_same_span`` — audit + redact on same span Verification: - ``pytest``: 2774 pass, 1 skipped, 0 failures (was 2767 on main; +7 new) - ``ruff check`` / ``ruff format --check``: clean - ``mypy app/guardrails/``: clean on touched module - Full ``tests/test_guardrails/test_engine.py`` suite: 43 pass, 0 fail

@muddlebee

…ive bookends (#780) * fix(guardrails): redact union of overlapping spans, never leak sensitive bookends ``GuardrailEngine.apply`` walked the sorted redact matches right-to-left with a single ``seen_end`` cursor and skipped any match whose end exceeded the cursor. When a wider match overlapped or contained an already-redacted narrower match, the wider match was silently dropped — leaving its prefix and suffix in the output. Concretely, with rules matching ``super_secret_token_value`` (wide) and ``secret_token`` (contained): input: "data super_secret_token_value end" before: "data super_[REDACTED:short]_value end" ← super_ and _value leak after: "data [REDACTED:long] end" The prior fix in #520 only handled the same-start subcase by changing the secondary sort key; matches that started at different offsets but fully contained a narrower match still slipped through. Replace the single-cursor walk with a proper merge: 1. Sort redact matches by ``(start ASC, -end)`` so ties at the same offset keep the longest-keyword-wins behavior #520 restored. 2. Sweep left-to-right, merging any match whose ``start`` falls before the current interval's ``end``. The representative rule for the merged interval is whichever contributing match had the largest original width, so the output reflects the most-specific rule that matched anywhere in the merged span. 3. Apply replacements right-to-left over the merged intervals so string indices stay valid as each redaction resizes the output. Adjacent matches (``A.end == B.start``) are intentionally not merged — they don't overlap. Disjoint matches produce independent redactions with intervening text preserved verbatim. Tests added in ``tests/test_guardrails/test_engine.py``: - contained-span union redaction (the regression from the bug report) - longest-rule-name semantics on a contained span - partial (non-containing) overlap produces one union redaction - disjoint matches stay separate, middle text preserved - adjacent matches stay separate (touching at one offset, not overlapping) - three-way transitive overlap chain collapses to one span - regex pattern + keyword on the same region: wider-wins applies across match kinds Existing coverage preserved: - ``test_overlapping_keyword_redaction_prefers_longest`` — same-start case - ``test_overlapping_keyword_same_rule_redaction`` — same-rule overlap - ``test_multiple_rules_on_same_span`` — audit + redact on same span Verification: - ``pytest``: 2774 pass, 1 skipped, 0 failures (was 2767 on main; +7 new) - ``ruff check`` / ``ruff format --check``: clean - ``mypy app/guardrails/``: clean on touched module - Full ``tests/test_guardrails/test_engine.py`` suite: 43 pass, 0 fail * test(guardrails): cover production-rule overlaps and audit-under-merge Extend the overlap regression tests with two cases grounded in the exact patterns shipped in _STARTER_CONFIG (app/guardrails/cli.py), and with an audit-behavior assertion that demonstrates the match-level audit record remains complete even when the output-level redactions are merged: - test_real_world_api_key_and_aws_access_key_overlap: config: api_key=AKIAIOSFODNN7EXAMPLE aws_access_key pattern matches the [16:36] span; generic_api_token pattern matches the [8:36] span (contains aws_access_key). Pre-fix output leaked "api_key=". Merged output: "config: [REDACTED:generic_api_token]". - test_real_world_aws_secret_key_contains_aws_access_key: export aws_secret_access_key=AKIA<16>abcdefghijklmnopqrst aws_access_key matches [29:49]; aws_secret_key matches [7:69] and contains it. Pre-fix output leaked the label and the 40-char tail. Merged output: "export [REDACTED:aws_secret_key]". - test_audit_logger_records_every_match_even_when_merged: Confirms the AuditLogger still emits one entry per ScanMatch even when multiple matches merge into a single output redaction — a reviewer tracing audit -> output can still account for every rule that fired. Uses a real AuditLogger with a tmp_path JSONL. Verification: - pytest tests/test_guardrails/test_engine.py: 46 pass (was 43; +3) - pytest tests/: 2777 pass, 1 skipped, 0 failures - ruff check + format: clean * test(guardrails): assert merged redaction reaches every LLM call site Closes the gap between unit-level ``GuardrailEngine.apply`` tests and the four call sites that invoke it in the pipeline: - ``app/services/llm_client.py::LLMClient.invoke`` (Anthropic messages + system) - ``app/services/llm_client.py::BedrockLLMClient.invoke`` (same shape as Anthropic) - ``app/services/llm_client.py::OpenAILLMClient.invoke`` (OpenAI chat.completions) - ``app/nodes/chat.py::_apply_guardrails_to_messages`` (LangGraph chat node) Five new parametrized integration tests load a ruleset modeled on the shipped ``_STARTER_CONFIG`` — ``aws_access_key`` is a strict substring of ``generic_api_token`` when the token value is itself an AWS key — and assert that the downstream payload (captured by a stub that replaces the real API client) contains no fragment of either the label prefix or the key value: - ``test_anthropic_client_sends_merged_redaction``: ``LLMClient.invoke("Debug dump: api_key=AKIA... from config.yml")`` is captured at ``Anthropic.messages.create`` and asserted to carry a single ``[REDACTED:generic_api_token]`` span with no ``api_key=`` or ``AKIA`` fragment. - ``test_anthropic_system_prompt_also_redacted``: Same shape, but asserts the ``system`` kwarg (distinct code path from per-message ``content``) gets the merged treatment. - ``test_openai_client_sends_merged_redaction``: Same shape for ``OpenAILLMClient.invoke`` -> ``chat.completions.create``. - ``test_chat_node_emits_merged_redaction``: ``_apply_guardrails_to_messages`` on a LangGraph-style message list, asserts original is untouched (defensive copy) and the returned message has merged redaction. - ``test_contained_real_secret_fully_redacted_in_pipeline``: End-to-end regression guard. Asserts none of ``api_key=``, ``AKIA``, ``IOSFODNN``, ``7EXAMPLE`` appear anywhere in the downstream payload. Pre-fix main leaks ``api_key=`` and the surrounding characters. Verification: - ``pytest tests/``: 2782 pass (was 2777; +5 integration) - ``pytest tests/test_guardrails/``: 59 pass (46 engine + 13 integration) - ``ruff check`` / ``ruff format --check``: clean - ``mypy app/guardrails/``: clean on touched module * refactor(guardrails): use NamedTuple for merged-span records Address Greptile P2 readability suggestion on PR #780. The interval-merge loop in ``GuardrailEngine._redact`` previously read and wrote the merged list as raw 4-tuples accessed by positional index (``merged[-1][1]``, ``merged[-1][3]``, etc.), which obscured what each slot meant. Replace the raw tuple with a typed ``_MergedSpan`` NamedTuple carrying named fields (``start``, ``end``, ``rule_name``, ``width``) so the loop reads naturally: merged[-1].end # was merged[-1][1] prev.width # was prev_width _MergedSpan(start, ...) # was (start, ..., width) No behavior change — same algorithm, same outputs, same 91 tests pass. Verification: - ``pytest tests/test_guardrails/``: 91 pass, 0 fail (unchanged) - ``ruff check`` / ``ruff format --check``: clean - ``mypy app/guardrails/``: clean on touched module * refactor(guardrails): address PR #780 review nits — width naming, fixture dedupe, system-prompt assertion Three points from @muddlebee's 2026-04-28 review on PR #780: 1. ``_MergedSpan.width`` was ambiguous — it tracks the *original single ScanMatch* width used for representative-rule selection, not the merged span's ``end - start``. Renamed to ``source_width`` and expanded both the NamedTuple docstring and the ``_redact`` algorithm docstring to spell out the distinction (``A∩B then C`` chained- overlap example) so a future reader can't conflate the two. 2. ``_FakeMessages`` / ``_FakeClient`` and the OpenAI equivalents were duplicated verbatim across four tests in ``TestOverlappingRedactionReachesDownstream``. Extracted them into two module-level pytest fixtures, ``anthropic_capture`` and ``openai_capture``, each yielding ``(client, captured)``. Each test now reads as one line of setup + the actual assertion. The two shared fake-response factories ``_anthropic_fake_response`` and ``_openai_fake_response`` keep the response-shape literals in one place so a future Anthropic / OpenAI SDK change touches one location instead of four. 3. ``test_anthropic_system_prompt_also_redacted`` used ``captured.get("system") is not None`` which was both unclear in intent and silent on the more interesting failure mode (system role smuggled into the messages list rather than hoisted to the top-level ``system`` kwarg). Replaced with subscript (KeyError on absent key), a non-empty assertion, and a structural check that no entry in ``captured["messages"]`` carries ``role: system``. Fail mode is now obvious in pytest output. Verification: - ``pytest tests/test_guardrails/`` — 91 / 91 pass - Full unit suite — 3159 pass / 2 skipped / 1 xfailed - ``ruff check`` and ``ruff format --check`` clean on both files - ``mypy app/guardrails/`` clean (the 5 errors are pre-existing missing-stub warnings in ``app/integrations/`` unrelated to this PR)

mayankbharati-ops added 2 commits April 10, 2026 02:35

mayankbharati-ops force-pushed the fix/overlapping-redaction-and-loop-limit branch from a4abcd5 to fe8eec8 Compare April 9, 2026 21:05

mayankbharati-ops added 2 commits April 10, 2026 02:42

Devesh36 reviewed Apr 11, 2026

View reviewed changes

Comment thread app/guardrails/engine.py

Comment thread tests/test_bug_fixes_e2e.py Outdated

github-advanced-security AI found potential problems Apr 16, 2026

View reviewed changes

Comment thread tests/test_bug_fixes_e2e.py Fixed

merge: resolve PR Tracer-Cloud#520 conflicts with main

94953e5

Bring the overlapping-redaction fix branch forward onto main and keep the review-approved test cleanup so the PR is mergeable again. Made-with: Cursor

davincios merged commit 7aaa851 into Tracer-Cloud:main Apr 19, 2026
11 checks passed

mayankbharati-ops mentioned this pull request Apr 23, 2026

fix(guardrails): redact union of overlapping spans, never leak sensitive bookends #780

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: overlapping keyword redaction and hardcoded investigation loop limit#520

fix: overlapping keyword redaction and hardcoded investigation loop limit#520
davincios merged 5 commits intoTracer-Cloud:mainfrom
mayankbharati-ops:fix/overlapping-redaction-and-loop-limit

mayankbharati-ops commented Apr 9, 2026 •

edited

Loading

Uh oh!

greptile-apps Bot commented Apr 9, 2026 •

edited

Loading

Flowchart

Uh oh!

Uh oh!

Uh oh!

davincios commented Apr 16, 2026

Uh oh!

Uh oh!

Uh oh!

davincios commented Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

mayankbharati-ops commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Uh oh!

greptile-apps Bot commented Apr 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Greptile Summary

Confidence Score: 5/5

Important Files Changed

Flowchart

Uh oh!

Uh oh!

Uh oh!

davincios commented Apr 16, 2026

Uh oh!

Uh oh!

Uh oh!

davincios commented Apr 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

mayankbharati-ops commented Apr 9, 2026 •

edited

Loading

greptile-apps Bot commented Apr 9, 2026 •

edited

Loading