
Garbage Filter

Marty McEnroe edited this page Feb 14, 2026 · 1 revision

The garbage filter is a set of 95 compiled regular expressions, defined in src/transcript_filters.py, that identify and remove terminal rendering noise from session mirror output. It is the single source of truth for what counts as "garbage", shared between live mirror filtering (unleashed-c-21.py) and post-session transcript cleaning (clean_transcript.py).
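Because both consumers use the same compiled patterns, a pattern added once takes effect in the live mirror and in post-session cleaning. The sketch below shows that arrangement in miniature. The names GARBAGE_PATTERNS, is_garbage, filter_live and clean_file are hypothetical, and only six example patterns from the Pattern Categories table are included; the real module is not reproduced here.

```python
import re
from typing import Iterable, Iterator, List

# Hypothetical stand-in for src/transcript_filters.py: six of the 95 patterns,
# compiled once at import time so every consumer shares the same objects.
GARBAGE_PATTERNS = [
    re.compile(p)
    for p in (
        r"^\d+m\s?\d+s$",       # timing fragment, e.g. "3m 42s"
        r"^\d+\.\ds$",          # timing fragment, e.g. "8.2s"
        r"^─+$",                # progress-bar rule
        r"^Plan mode$",         # status-bar words
        r"Esc to cancel",       # permission UI
        r"^(Bash|Read|Glob)$",  # bare tool labels
    )
]


def is_garbage(line: str) -> bool:
    """True if the stripped line matches any shared pattern."""
    text = line.strip()
    return any(p.search(text) for p in GARBAGE_PATTERNS)


def filter_live(lines: Iterable[str]) -> Iterator[str]:
    """Live-mirror style consumer: yield lines as they arrive, minus garbage."""
    for line in lines:
        if not is_garbage(line):
            yield line


def clean_file(lines: List[str]) -> List[str]:
    """Post-session style consumer: the same predicate over a whole transcript."""
    return [line for line in lines if not is_garbage(line)]
```

Adding a seventh entry to GARBAGE_PATTERNS changes the behavior of both consumers at once, which is the property a single source of truth is meant to provide.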

What Is Terminal Garbage?

Claude Code is built with Ink, a React-like terminal framework. Ink doesn't append text — it maintains a virtual screen buffer and repaints regions. After ANSI stripping, the residual artifacts include:

  • Spinner ticks: single-glyph animation frames (Braille dots, arrows, Unicode spinner symbols), hundreds per minute
  • Timing fragments: 3m 42s, 42s, 8.2s, ─── (progress bars)
  • Status bars: Plan mode, Auto-approval on, Read, Write
  • Agent trees: ├── Agent: task description, │ └── reading file
  • Permission UI: Allow this action?, > Yes No, Esc to cancel
  • Checklist artifacts: ☐, ☑, ✓, ✗
  • CLI help text: Usage: claude, Commands:, Options:
  • Garbled merges: Characters from different rendering frames that overlap
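To see why repaints leave residue, consider a spinner line that is erased and redrawn in place. The sketch below strips standard CSI and OSC escape sequences (ECMA-48 syntax) and shows that each redrawn frame survives as its own line of text. The strip_ansi helper, the frame contents, and the exact erase and cursor sequences are illustrative assumptions, not the project's stripper or captured Claude Code output.

```python
import re

# ECMA-48 CSI sequence: ESC [ parameter bytes, intermediate bytes, final byte.
CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")
# OSC sequence: ESC ] payload, terminated by BEL or by ST (ESC \).
OSC_RE = re.compile(r"\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)")


def strip_ansi(raw: str) -> str:
    """Remove complete OSC and CSI sequences, keeping the printable text."""
    return CSI_RE.sub("", OSC_RE.sub("", raw))


# Illustrative repaints of one spinner line: erase the line (ESC[2K),
# move to column 1 (ESC[1G), redraw with the next glyph and elapsed time.
frames = [
    "\x1b[2K\x1b[1G⠋ Thinking… 1s",
    "\x1b[2K\x1b[1G⠙ Thinking… 2s",
    "\x1b[2K\x1b[1G⠹ Thinking… 3s",
]
residue = [strip_ansi(f) for f in frames]
for line in residue:
    print(line)
```

On screen each frame overwrites the previous one. Once the escapes are gone, nothing records that overwrite, so all three frames land in the mirror as separate lines of spinner and timing noise. Note that this stripper only removes complete OSC sequences; an unterminated one is left behind.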

Pattern Categories

The 95 patterns in transcript_filters.py are organized into these categories:

| Category | Patterns | Examples |
|---|---|---|
| Timing fragments | 8 | `^\d+m\s?\d+s$`, `^\d+\.\ds$`, `^─+$` |
| Status-bar words | 12 | `^Read$`, `^Write$`, `^Plan mode$`, `^Auto-approval` |
| Spinner characters | 6 | Braille dots, arrows, Unicode spinners |
| Permission UI | 10 | `Allow this action`, `Esc to cancel`, `Yes.*No` |
| Agent tree lines | 8 | `^[├│└─]`, `Agent:`, `reading`, `writing` |
| Checklist artifacts | 5 | `^[☐☑✓✗]`, `^\[[ x]\]` |
| CLI help fragments | 7 | `^Usage:`, `^Commands:`, `^Options:` |
| Short garbage | 15 | Lines under 3 characters, single Unicode symbols, stray brackets |
| Garbled text | 12 | Repeated single characters, common merge artifacts |
| Tool labels | 12 | `^Bash$`, `^Read$`, `^Glob$` (bare tool names left by rendering) |
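Keeping the patterns grouped by category also makes it possible to report which category removed a line, which can speed up false-positive triage. The sketch assumes a dict-of-lists layout; CATEGORIES, COMPILED and match_category are hypothetical names, and the real transcript_filters.py may be organized differently. Only the example regexes from the table above are used.

```python
import re
from typing import Dict, List, Optional

# Hypothetical layout: raw patterns grouped by category, taken from the
# examples in the table above. Spinner and garbled-text rows are omitted
# because the table describes them rather than listing regexes.
CATEGORIES: Dict[str, List[str]] = {
    "timing": [r"^\d+m\s?\d+s$", r"^\d+\.\ds$", r"^─+$"],
    "status_bar": [r"^Plan mode$", r"^Auto-approval"],
    "permission_ui": [r"Allow this action", r"Esc to cancel", r"Yes.*No"],
    "agent_tree": [r"^[├│└─]"],
    "checklist": [r"^[☐☑✓✗]", r"^\[[ x]\]"],
    "cli_help": [r"^Usage:", r"^Commands:", r"^Options:"],
    "short": [r"^.{1,2}$"],
    "tool_labels": [r"^Bash$", r"^Read$", r"^Glob$"],
}

# Compile every pattern once, remembering which category it came from.
COMPILED = [
    (name, re.compile(raw)) for name, raws in CATEGORIES.items() for raw in raws
]


def match_category(line: str) -> Optional[str]:
    """Return the first category whose pattern matches, or None for content."""
    text = line.strip()
    for name, pattern in COMPILED:
        if pattern.search(text):
            return name
    return None


print(match_category("├── Agent: task description"))
print(match_category("The tests pass on all platforms."))
```

Categories are checked in dict order, so a line that several patterns would catch is attributed to the first one listed.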

How Patterns Are Developed

Each pattern follows a development cycle:

  1. Observe: Run a session, review the mirror transcript
  2. Identify: Find repeated noise patterns that aren't real content
  3. Regex: Write a pattern that matches the noise without catching content
  4. Test: Run against saved transcripts to check for false positives
  5. Ship: Add to transcript_filters.py
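Steps 3 and 4 can be made repeatable by keeping two small fixture lists, one of observed noise and one of real content that resembles it, and checking every candidate against both. In this sketch the fixtures and check_candidate are hypothetical; it shows an unanchored timing pattern catching a real sentence while the anchored version from the table does not.

```python
import re
from typing import List, Tuple

# Hypothetical fixtures gathered during the Observe and Identify steps.
KNOWN_GARBAGE = ["3m 42s", "12m 5s", "1m 3s"]
KNOWN_CONTENT = ["Took 3m 42s to finish the build", "42 tests passed"]


def check_candidate(raw: str) -> Tuple[List[str], List[str]]:
    """Return (garbage the pattern misses, content it wrongly matches)."""
    pattern = re.compile(raw)
    missed = [s for s in KNOWN_GARBAGE if not pattern.search(s.strip())]
    false_pos = [s for s in KNOWN_CONTENT if pattern.search(s.strip())]
    return missed, false_pos


print(check_candidate(r"\d+m\s?\d+s"))    # unanchored: catches a real sentence
print(check_candidate(r"^\d+m\s?\d+s$"))  # anchored: clean on both lists
```

A candidate is ready to ship only when both lists come back empty; every false positive found later in a real transcript can be appended to KNOWN_CONTENT so it stays covered.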

The trickiest patterns are those that overlap with real content. For example:

  • Read is both a status bar label AND a real word in English
  • Box-drawing characters are garbage in status bars BUT legitimate in tree output (#37)
  • Short lines (1-2 characters) are usually spinner artifacts BUT sometimes real output
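Anchoring narrows a match but cannot remove the ambiguity entirely. The sketch compares the broad short-line pattern from the table with a hypothetical narrower variant that matches only one or two non-alphanumeric symbols; `^Read$` is included to show that a bare "Read" reply is still removed even though a sentence starting with "Read" survives.

```python
import re

broad_short = re.compile(r"^.{1,2}$")         # table pattern: any 1-2 char line
symbol_short = re.compile(r"^[^\w\s]{1,2}$")  # hypothetical narrower variant
bare_read = re.compile(r"^Read$")             # status-bar / tool label

lines = ["⠋", "✻", "42", "OK", "Read", "Read the config first"]

broad_hits = [s for s in lines if broad_short.search(s)]
symbol_hits = [s for s in lines if symbol_short.search(s)]
read_hits = [s for s in lines if bare_read.search(s)]

print(broad_hits)   # spinner glyphs plus the short answers
print(symbol_hits)  # spinner glyphs only
print(read_hits)    # the bare label, even if it was a real reply
```

The narrower variant keeps answers such as 42 and OK, at the cost of letting short alphanumeric noise through. That is the kind of trade-off the test step has to settle against real transcripts.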

False Positive Examples

| Pattern | Garbage (correct match) | Real content (false positive) |
|---|---|---|
| `^[─│├└┘┐┌]+$` | Status bar fragments | Tree output, box-drawing tables (#37) |
| `^.{1,2}$` | Spinner characters | Single-character answers, short outputs |
| `ORPHAN_OSC_RE` | Unterminated OSC sequences | Lines starting with characters that match the OSC prefix (#36) |
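One way to reduce the box-drawing false positive is to look at neighboring lines: a rule line next to a line that mixes box-drawing characters with text is probably part of a table or tree. The heuristic below is a hypothetical illustration (drop_box_noise and is_structural are invented names), not the fix applied for #37.

```python
import re
from typing import List

BOX_ONLY = re.compile(r"^[─│├└┘┐┌]+$")  # the false-positive-prone pattern above
BOX_CHAR = re.compile(r"[─│├└┘┐┌]")
WORD_CHAR = re.compile(r"\w")


def is_structural(line: str) -> bool:
    """True if a line mixes box-drawing characters with text (table/tree row)."""
    return bool(BOX_CHAR.search(line)) and bool(WORD_CHAR.search(line))


def drop_box_noise(lines: List[str]) -> List[str]:
    """Drop box-drawing-only lines unless a neighboring line is structural."""
    kept = []
    for i, line in enumerate(lines):
        if BOX_ONLY.search(line.strip()):
            prev_line = lines[i - 1] if i > 0 else ""
            next_line = lines[i + 1] if i + 1 < len(lines) else ""
            if not (is_structural(prev_line) or is_structural(next_line)):
                continue  # isolated rule: treat as status-bar garbage
        kept.append(line)
    return kept


print(drop_box_noise(["Build finished", "──────", "All tests passed"]))
print(drop_box_noise(["┌──────┐", "│ name │", "└──────┘"]))
```

A bare rule between two content lines is dropped, while the same characters framing a table row or continuing a tree survive. Looking only one line away is a deliberate simplification; stacked border lines with no text between them would still be removed.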

How to Add a New Pattern

  1. Identify the garbage in a mirror transcript
  2. Write a regex that matches it
  3. Add to transcript_filters.py in the appropriate category section
  4. Test against 3-5 saved transcripts:
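    # Quick validation: print every line the new pattern would match
    # (list 3-5 saved transcripts after the "-"; the quoted 'EOF' stops
    # the shell from expanding $, \ and backticks inside the regex)
    poetry run python - data/transcript.clean <<'EOF'
    import re
    import sys

    pattern = re.compile(r'your_new_pattern')
    for path in sys.argv[1:]:
        with open(path, encoding='utf-8') as f:
            for i, line in enumerate(f, 1):
                if pattern.search(line.strip()):
                    print(f'{path}:{i}: {line.rstrip()}')
    EOF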
  5. If no false positives, commit

The Arms Race

Garbage filtering is fundamentally an arms race. Every Claude Code update may introduce new rendering patterns. Every Ink version may change how repaints work. The 95 patterns represent 4+ weeks of daily observation and iteration.

The filter will never be complete. The goal is to catch enough garbage that the mirror is useful — not to produce a perfect transcript.

Related Files

| File | Role |
|---|---|
| src/transcript_filters.py | 95 compiled patterns (single source of truth) |
| src/clean_transcript.py | Post-session cleaner (uses the same patterns plus block dedup) |
| src/unleashed-c-21.py | Live mirror pipeline (applies the patterns during the session) |
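clean_transcript.py pairs the shared patterns with block dedup. A minimal sketch of that combination follows, assuming filtering runs first and that "block dedup" means dropping a run of lines that exactly repeats the block just kept. The names clean_transcript, dedup_blocks and max_block are hypothetical here, and the real cleaner's rules are not described on this page.

```python
import re
from typing import List

# Hypothetical subset of the shared patterns.
PATTERNS = [re.compile(p) for p in (r"^\d+m\s?\d+s$", r"^Plan mode$")]


def is_garbage(line: str) -> bool:
    text = line.strip()
    return any(p.search(text) for p in PATTERNS)


def dedup_blocks(lines: List[str], max_block: int = 5) -> List[str]:
    """Skip any run of up to max_block lines that repeats the block just kept."""
    out: List[str] = []
    i = 0
    while i < len(lines):
        for k in range(max_block, 0, -1):
            if len(out) >= k and lines[i:i + k] == out[-k:]:
                i += k  # looks like a repaint of the previous block: drop it
                break
        else:
            out.append(lines[i])
            i += 1
    return out


def clean_transcript(lines: List[str]) -> List[str]:
    """Filter with the shared patterns, then collapse repainted blocks."""
    return dedup_blocks([line for line in lines if not is_garbage(line)])


raw = ["Plan mode", "Reading config", "Found 3 files", "3m 42s",
       "Reading config", "Found 3 files", "Done"]
print(clean_transcript(raw))
```

Filtering first matters in this sketch: the timing line between the two copies would otherwise break the repetition. The trade-off is that genuinely repeated content, such as two identical lines of code, is also collapsed.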
