Garbage Filter
The garbage filter is a 95-pattern compiled regex engine that identifies and removes terminal rendering noise from session mirror output. It is the single source of truth for what counts as "garbage", shared between live mirror filtering (`unleashed-c-21.py`) and post-session transcript cleaning (`clean_transcript.py`).
Claude Code is built with Ink, a React-like terminal framework. Ink doesn't append text — it maintains a virtual screen buffer and repaints regions. After ANSI stripping, the residual artifacts include:
- Spinner ticks: `⠋`, `⠙`, `⠹`, `⠸`, `⠼`, `⠴`, `⠦`, `⠧`, `⠇`, `⠏` (hundreds per minute)
- Timing fragments: `3m 42s`, `42s`, `8.2s`, `───` (progress bars)
- Status bars: `Plan mode`, `Auto-approval on`, `Read`, `Write`
- Agent trees: `├── Agent: task description`, `│ └── reading file`
- Permission UI: `Allow this action?`, `> Yes No`, `Esc to cancel`
- Checklist artifacts: `☐`, `☑`, `✓`, `✗`
- CLI help text: `Usage: claude`, `Commands:`, `Options:`
- Garbled merges: characters from different rendering frames that overlap
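The residue that survives ANSI stripping can be illustrated with a minimal sketch. It assumes a simple CSI/OSC-stripping regex (`ANSI_RE`) and uses a handful of hypothetical patterns, one per artifact type, rather than the project's actual 95; `strip_ansi` and `is_garbage` are illustrative names, not the project's API.

```python
import re

# Assumed ANSI stripper: removes CSI sequences (ESC [ params final-byte) and
# OSC sequences terminated by BEL or ST (ESC \). Real pipelines may need more.
ANSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]|\x1b\][^\x07\x1b]*(?:\x07|\x1b\\)")

# Illustrative patterns only, one per artifact type listed above.
SAMPLE_GARBAGE = [
    re.compile(r"^[⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏]$"),  # spinner tick
    re.compile(r"^\d+m\s?\d+s$"),       # timing fragment: "3m 42s"
    re.compile(r"^\d+\.\ds$"),          # timing fragment: "8.2s"
    re.compile(r"^─+$"),                # progress-bar rule
    re.compile(r"^Plan mode$"),         # status-bar word
    re.compile(r"Esc to cancel"),       # permission UI
]


def strip_ansi(text: str) -> str:
    """Remove escape sequences, leaving only the printable residue."""
    return ANSI_RE.sub("", text)


def is_garbage(line: str) -> bool:
    """True if any sample pattern matches the stripped line."""
    s = line.strip()
    return any(p.search(s) for p in SAMPLE_GARBAGE)


# A repaint-style chunk: clear-line/cursor codes, a spinner, one real line,
# a timer, a progress bar and a permission hint.
raw = "\x1b[2K\x1b[1G⠋\n\x1b[32mHello from the model\x1b[0m\n3m 42s\n──────\nEsc to cancel\n"
kept = [ln for ln in strip_ansi(raw).splitlines() if not is_garbage(ln)]
print(kept)  # -> ['Hello from the model']
```

Stripping the escape codes alone would still leave five lines; only the regex pass reduces them to the one line of real content.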
The 95 patterns in `transcript_filters.py` are organized into these categories:

| Category | Patterns | Examples |
|---|---|---|
| Timing fragments | 8 | `^\d+m\s?\d+s$`, `^\d+\.\ds$`, `^─+$` |
| Status-bar words | 12 | `^Read$`, `^Write$`, `^Plan mode$`, `^Auto-approval` |
| Spinner characters | 6 | Braille dots, arrows, Unicode spinners |
| Permission UI | 10 | `Allow this action`, `Esc to cancel`, `Yes.*No` |
| Agent tree lines | 8 | `^[├│└─]`, `Agent:`, `reading`, `writing` |
| Checklist artifacts | 5 | `^[☐☑✓✗]`, `^\[[ x]\]` |
| CLI help fragments | 7 | `^Usage:`, `^Commands:`, `^Options:` |
| Short garbage | 15 | Lines under 3 characters, single Unicode symbols, stray brackets |
| Garbled text | 12 | Repeated single characters, common merge artifacts |
| Tool labels | 12 | `^Bash$`, `^Read$`, `^Glob$` (bare tool names left by rendering) |
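One way to store and query categorized patterns is sketched below. It assumes a hypothetical `PATTERNS_BY_CATEGORY` dict built only from the example regexes in the table. The bare `reading`/`writing` examples are left out because on their own they would match ordinary prose. `classify()` reports which category caught a line, which helps when tracing a false positive back to its source. The real `transcript_filters.py` may be organized differently.

```python
from __future__ import annotations

import re
from typing import Optional

# Hypothetical layout: category name -> compiled patterns (table examples only).
PATTERNS_BY_CATEGORY: dict[str, list[re.Pattern[str]]] = {
    "timing": [re.compile(p) for p in (r"^\d+m\s?\d+s$", r"^\d+\.\ds$", r"^─+$")],
    "status_bar": [re.compile(p) for p in (r"^Read$", r"^Write$", r"^Plan mode$", r"^Auto-approval")],
    "permission_ui": [re.compile(p) for p in (r"Allow this action", r"Esc to cancel", r"Yes.*No")],
    "agent_tree": [re.compile(p) for p in (r"^[├│└─]", r"Agent:")],
    "checklist": [re.compile(p) for p in (r"^[☐☑✓✗]", r"^\[[ x]\]")],
    "cli_help": [re.compile(p) for p in (r"^Usage:", r"^Commands:", r"^Options:")],
}


def classify(line: str) -> Optional[str]:
    """Return the first category whose pattern matches, or None for content."""
    s = line.strip()
    for category, patterns in PATTERNS_BY_CATEGORY.items():
        if any(p.search(s) for p in patterns):
            return category
    return None


for sample in ("3m 42s", "Plan mode", "│ └── reading file", "The build passed."):
    print(f"{sample!r}: {classify(sample)}")
```

Because dicts keep insertion order, the category order doubles as a priority order when a line matches patterns in more than one category.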
Each pattern follows a development cycle:

- Observe: Run a session, review the mirror transcript
- Identify: Find repeated noise patterns that aren't real content
- Regex: Write a pattern that matches the noise without catching content
- Test: Run against saved transcripts to check for false positives
- Ship: Add to `transcript_filters.py`
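The Regex and Test steps can be tightened with a small labelled fixture: lines you know are garbage and lines you know are content. The helper below, `validate_candidate`, is a hypothetical sketch of that check, and the sample lines are made up for illustration.

```python
from __future__ import annotations

import re


def validate_candidate(pattern: str, garbage: list[str], content: list[str]) -> list[str]:
    """Check a candidate regex against hand-labelled sample lines.

    Returns a list of problems; an empty list means the pattern matched
    every garbage sample and none of the content samples.
    """
    rx = re.compile(pattern)
    problems = []
    for line in garbage:
        if not rx.search(line.strip()):
            problems.append(f"missed garbage: {line!r}")
    for line in content:
        if rx.search(line.strip()):
            problems.append(f"false positive: {line!r}")
    return problems


GARBAGE_SAMPLES = ["8.2s", "0.4s"]
CONTENT_SAMPLES = ["Build finished in 8.2s", "Version 1.2 is out"]

print(validate_candidate(r"^\d+\.\ds$", GARBAGE_SAMPLES, CONTENT_SAMPLES))  # -> []
print(validate_candidate(r"\d+\.\ds", GARBAGE_SAMPLES, CONTENT_SAMPLES))
```

The unanchored variant flags the timing value inside a real sentence. This is why most examples in the category table are anchored with `^...$`.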
The trickiest patterns are those that overlap with real content. For example:
- `Read` is both a status-bar label AND a real English word
- `─` (box-drawing) is garbage in status bars BUT legitimate in tree output (#37)
- Short lines (1-2 characters) are usually spinner artifacts BUT sometimes real output
| Pattern | Garbage (correct match) | Real content (false positive) |
|---|---|---|
| `^[─│├└┘┐┌]+$` | Status bar fragments | Tree output, box-drawing tables (#37) |
| `^.{1,2}$` | Spinner characters | Single-character answers, short outputs |
| `ORPHAN_OSC_RE` | Unterminated OSC sequences | Lines starting with characters that match the OSC prefix (#36) |
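The sketch below makes the trade-off concrete by running the first two patterns from the table against four made-up lines. Together they remove all four: the status-bar rule and the spinner tick as intended, but also a table's top border and a two-character answer. The `dropped` helper is illustrative only.

```python
import re

# The first two patterns from the false-positive table.
BOX_RULE = re.compile(r"^[─│├└┘┐┌]+$")
SHORT_LINE = re.compile(r"^.{1,2}$")


def dropped(line: str) -> bool:
    """True if either pattern would remove this (stripped) line."""
    s = line.strip()
    return bool(BOX_RULE.search(s) or SHORT_LINE.search(s))


# Illustrative lines: two are real garbage, two are real content.
SAMPLES = [
    ("────────", "garbage: status-bar rule"),
    ("⠋", "garbage: spinner tick"),
    ("┌──────┐", "content: top border of a box-drawing table"),
    ("42", "content: a short command output"),
]

for text, label in SAMPLES:
    verdict = "DROP" if dropped(text) else "keep"
    print(f"{verdict}  {text!r:<12} {label}")
```

Lines that mix box-drawing characters with text, such as `│ a │ b │`, survive both patterns. The loss is limited to pure border lines and very short outputs.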
To add a new pattern:

- Identify the garbage in a mirror transcript
- Write a regex that matches it
- Add it to `transcript_filters.py` in the appropriate category section
- Test against 3-5 saved transcripts, changing the path for each one:

  ```bash
  # Quick validation: search a saved transcript for false positives
  poetry run python -c "
  import re
  pattern = re.compile(r'your_new_pattern')
  with open('data/transcript.clean') as f:
      for i, line in enumerate(f, 1):
          if pattern.search(line.strip()):
              print(f'{i}: {line.rstrip()}')
  "
  ```

- If there are no false positives, commit
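The one-line check in the steps above inspects a single file. Below is a sketch of a reusable helper that scans several saved transcripts at once and groups hits by file. The names `scan` and `report` are hypothetical and not part of the project.

```python
from __future__ import annotations

import re
from pathlib import Path


def scan(pattern: str, paths: list[Path]) -> dict[Path, list[tuple[int, str]]]:
    """Return {path: [(line_no, line), ...]} for every line the pattern matches."""
    rx = re.compile(pattern)
    hits: dict[Path, list[tuple[int, str]]] = {}
    for path in paths:
        with path.open(encoding="utf-8", errors="replace") as f:
            matched = [
                (i, line.rstrip())
                for i, line in enumerate(f, 1)
                if rx.search(line.strip())
            ]
        if matched:  # only list transcripts that actually have hits
            hits[path] = matched
    return hits


def report(hits: dict[Path, list[tuple[int, str]]]) -> None:
    """Print matches grouped by transcript, one 'line_no: text' per hit."""
    for path, matched in hits.items():
        print(f"== {path} ({len(matched)} matches)")
        for i, line in matched:
            print(f"{i}: {line}")
```

For example, `report(scan(r'your_new_pattern', sorted(Path('data').glob('*.clean'))))` assumes saved transcripts live in `data/` with a `.clean` suffix, like `data/transcript.clean`. Every hit it prints is a line to review by hand.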
Garbage filtering is fundamentally an arms race. Every Claude Code update may introduce new rendering patterns. Every Ink version may change how repaints work. The 95 patterns represent 4+ weeks of daily observation and iteration.
The filter will never be complete. The goal is to catch enough garbage that the mirror is useful — not to produce a perfect transcript.
The filter spans three files:

| File | Role |
|---|---|
| `src/transcript_filters.py` | 95 compiled patterns (single source of truth) |
| `src/clean_transcript.py` | Post-session cleaner (uses the same patterns plus block dedup) |
| `src/unleashed-c-21.py` | Live mirror pipeline (applies the patterns during the session) |
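The sketch below shows how the two consumers might share one pattern list. The live path can only judge each line as it arrives, while the post-session cleaner sees the whole transcript and can also deduplicate. The block-dedup rule used here is an assumption about what `clean_transcript.py` does: it drops a blank-line-separated block identical to the previous one, as a repaint would produce. `GARBAGE`, `filter_live`, and `clean_transcript` are illustrative names.

```python
from __future__ import annotations

import re
from typing import Iterable

# Stand-in for the shared list; both consumers would import it from
# transcript_filters.py in the real project.
GARBAGE = [re.compile(r"^[⠋⠙⠹⠸⠼⠴⠦⠧⠇⠏]$"), re.compile(r"^\d+m\s?\d+s$")]


def is_garbage(line: str) -> bool:
    s = line.strip()
    return any(p.search(s) for p in GARBAGE)


def filter_live(line: str) -> str | None:
    """Live path: decide line by line, with no look-back."""
    return None if is_garbage(line) else line


def clean_transcript(lines: Iterable[str]) -> list[str]:
    """Post-session path: same filter, then collapse repeated blocks.

    Assumed dedup rule: a block is a run of non-blank lines; a block equal
    to the previously kept block is treated as a repaint and dropped.
    """
    kept = [ln for ln in lines if not is_garbage(ln)]
    blocks: list[list[str]] = [[]]
    for ln in kept:
        if ln.strip():
            blocks[-1].append(ln)
        elif blocks[-1]:
            blocks.append([])  # a blank line closes the current block
    unique: list[list[str]] = []
    for block in blocks:
        if block and (not unique or block != unique[-1]):
            unique.append(block)
    result: list[str] = []
    for i, block in enumerate(unique):
        if i:
            result.append("")  # restore one blank line between blocks
        result.extend(block)
    return result


print(clean_transcript(["Answer: 42", "", "⠋", "Answer: 42", "", "3m 42s", "Next step"]))
```

Only consecutive duplicates are collapsed, so a block that legitimately reappears later in the session is kept.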