Problem
The session mirror (Tab 2, tail -f of logs/session-*.log) produces noisy, barely-readable output. Two distinct problems:
1. Garbage lines not filtered
The mirror's _log_to_mirror() has only 4 noise patterns in NOISE_RE plus a 12-char minimum length filter. Meanwhile clean_transcript.py has 95 garbage patterns that effectively remove TUI artifacts post-hoc. The mirror lets through all of the following:
| Garbage type | Example from live session |
|---|---|
| Thinking fragments | g (thinking), n (thinking), ✶ s n (thinking) |
| Spinner activity lines | Harmonizing… 1, ✢Harmonizing… 2, *Harmonizing… 5 |
| Spinner activity with timing | Harmonizing…(30s · ↓2.2k tokens) |
| Token/timing fragments | 1 · 6.7k tokens, 2 s · 6.7k tokens, 15.1k tokens |
| Permission UI chrome | Esctocancel·Tabtoamend·ctrl+etoexplain |
| "Do you want to proceed?" UI | Doyouwanttoproceed? |
| Permission option repaints | ❯1.Yes, 2.Yes,allowreadingfromProjects\fromthisproject |
| Agent tree lines | ├─ Explore (Read core source files) · 10 tool uses |
| Agent initializing | ⎿ Initializing… |
| "ctrl+b to run in background" | ctrl+b to run in background |
| "ctrl+o to expand" | +18 more tool uses (ctrl+o to expand) |
| Running N agents | Running 2 gents…(ctrl+o to expand) |
| Status bar timestamps | [02-1401:41:44] /c/Users/mcwiz/Projects/Hermes (main) |
| File count lines | 2 s, reading 19 files… |
| Bash command labels | Bash command |
| Bare "thought for Ns" | (thought for 2s) |
2. Word-merging and letter drops
The mirror_strip_ansi() cursor-tracking parser inserts spaces only when the cursor jumps past the current column position. This works for simple cases like:
\x1b[7;1HIt\x1b[7;4His → "It" + gap(3→4) + "is" → "It is" ✓
But the Ink TUI renders many words character-by-character at adjacent columns with no gap:
\x1b[5;20HI\x1b[5;21H'\x1b[5;22Hl\x1b[5;23Hl\x1b[5;24Hr\x1b[5;25He\x1b[5;26Ha\x1b[5;27Hd
→ No column gaps → "I'llread" (correct per parser, wrong per English)
The parser cannot distinguish "adjacent chars in the same word" from "adjacent chars at the start of a new word" because the TUI uses the same positioning pattern for both.
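The gap rule described above can be reproduced with a minimal sketch. This is not the actual mirror_strip_ansi() implementation: the function name strip_with_gaps and the regex CUP_RE are hypothetical, and the sketch assumes the input contains only cursor-position sequences of the form ESC[row;colH.

```python
import re

# Matches a cursor-position (CUP) sequence such as "\x1b[7;4H".
CUP_RE = re.compile(r'\x1b\[(\d+);(\d+)H')

def strip_with_gaps(data: str) -> str:
    """Rebuild text from CUP-positioned fragments (hypothetical helper).

    A space is inserted only when the cursor jumps past the column
    where the previous fragment ended on the same row.
    """
    out = []
    row = col = None  # where the next character would land without a jump
    pos = 0
    for m in CUP_RE.finditer(data):
        text = data[pos:m.start()]
        out.append(text)
        if col is not None:
            col += len(text)  # the cursor advances one column per character
        new_row, new_col = int(m.group(1)), int(m.group(2))
        if row is not None:
            if new_row != row:
                out.append('\n')  # different row: start a new line
            elif new_col > col:
                out.append(' ')   # column gap: treat it as a word boundary
        row, col = new_row, new_col
        pos = m.end()
    out.append(data[pos:])
    return ''.join(out)

print(strip_with_gaps("\x1b[7;1HIt\x1b[7;4His"))  # It is
print(strip_with_gaps("\x1b[5;20HI\x1b[5;21H'\x1b[5;22Hl\x1b[5;23Hl"
                      "\x1b[5;24Hr\x1b[5;25He\x1b[5;26Ha\x1b[5;27Hd"))  # I'llread
```

In the second call every character lands exactly where the previous one ended, so the gap rule never fires and the word boundary between "I'll" and "read" leaves no trace in the byte stream.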
Examples from live session:
| Mirror output | Should be |
|---|---|
| I'llreadthecodebase,issues,andwikiinparallel. | I'll read the codebase, issues, and wiki in parallel. |
| Doyouwanttoproceed? | Do you want to proceed? |
| ls-la/c/Users/mcwiz/Projects/HermesWiki/ | ls -la /c/Users/mcwiz/Projects/HermesWiki/ |
| 2>/dev/null\|\|echo"Nolocalwikiclone" | 2>/dev/null \|\| echo "No local wiki clone" |
| 2 gents | 2 agents (dropped letter) |
| loal | local (dropped letter) |
| Lst | List (dropped letter) |
| one-l esummris | one-line summaries (dropped letters + wrong space) |
| un acked | untracked (dropped letters + wrong space) |
| remoebranches | remote branches (dropped letters + merged) |
| Sow recent session logs | Show recent session logs (dropped letter) |
Letter drops happen when the TUI repaints a line and the PTY read lands mid-repaint — some characters from the old render and some from the new render get interleaved, losing characters at the boundary.
Current filtering (insufficient)
# _log_to_mirror() filters:
NOISE_RE = [
re.compile(r'^\s*$'), # blank lines
re.compile(r'^[\u2500\u2550]{10,}$'), # ─ or ═ horizontal rules
re.compile(r'^\xb7\s+\S+ing'), # · spinner lines (middle dot only)
re.compile(r'^\s*\d+ files? '), # file count status
]
# Plus: 12-char minimum, separator filter, timestamp fragment filter, 32-line dedup
Plan: B + D + F
B — Shared filter module (garbage filtering)
Extract GARBAGE_PATTERNS and is_garbage() from clean_transcript.py into a shared src/transcript_filters.py module. Both clean_transcript.py and unleashed-c-20.py import from it.
- Single source of truth — add a pattern once, both scripts get it
- No pattern drift — the mirror and the post-hoc cleaner always match
- No performance concern: matching 95 precompiled regexes against one line of text takes microseconds. Python's re module compiles each pattern once and runs the match in its C engine, so the cost should stay negligible even with 100 terminals open.
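A rough sketch of what the shared module could look like follows. The patterns here are an illustrative subset written against the examples in the table above, not the 95 patterns in clean_transcript.py, and the is_garbage() signature (one line in, bool out) is an assumption.

```python
import re

# Illustrative subset only. The real GARBAGE_PATTERNS list lives in
# clean_transcript.py; these few are written against the table examples.
GARBAGE_PATTERNS = [
    re.compile(r'^\s*$'),                       # blank lines
    re.compile(r'\(thinking\)\s*$'),            # thinking fragments
    re.compile(r'^\(thought for \d+s\)$'),      # bare "thought for Ns"
    re.compile(r'^\W?\s*[A-Z][a-z]+ing…'),      # spinner / initializing lines
    re.compile(r'ctrl\+[a-z]\s*to\s*\w+'),      # ctrl+X hints, merged or spaced
    re.compile(r'[\d.]+k tokens'),              # token count fragments
]

def is_garbage(line: str) -> bool:
    """Return True if the line matches any garbage pattern (assumed signature)."""
    stripped = line.strip()
    return any(p.search(stripped) for p in GARBAGE_PATTERNS)

for sample in ["✢Harmonizing… 2",
               "Esctocancel·Tabtoamend·ctrl+etoexplain",
               "I'll read the codebase, issues, and wiki in parallel."]:
    print(is_garbage(sample), sample)
```

With this shape, both scripts would call the same is_garbage() and a new pattern would only need to be added in one place.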
D — Accept word-merging in live mirror
The cursor-tracking parser is fundamentally limited by how the Ink TUI renders text. Word boundaries without column gaps are undetectable at the ANSI level. Accept merged words in the live mirror. Use clean_transcript.py --fix-spaces (wordninja) post-session for readable prose.
Word-merging in the live mirror is a separate problem tracked in its own issue — see "Related" below.
F — Rate-limit mirror writes
Buffer PTY output for 200-500ms before processing for the mirror. Larger chunks → fewer mid-repaint reads → fewer letter drops and less garbage.
- Mirror becomes slightly delayed, but it's tail -f, so 200ms is invisible
- Reduces both garbage volume AND letter drops cheaply
- Combines naturally with B (filtered after buffering)
Implementation steps
- Create src/transcript_filters.py: extract GARBAGE_PATTERNS, COMPACTION_KEEP, is_garbage(), normalize_for_dedup() from clean_transcript.py
- Update clean_transcript.py to import from shared module
- Update unleashed-c-20.py _log_to_mirror() to use is_garbage() instead of NOISE_RE
- Add 200ms write buffer in _log_to_mirror() (accumulate data, flush on timer)
- Test: run a session and compare mirror output before/after
Related
- src/clean_transcript.py: 95 garbage patterns, wordninja integration
- docs/runbooks/0908-transcript-cleaning.md: post-session cleaning workflow
- Separate issue needed for live word-splitting (wordninja or alternative in the mirror itself)