Parent
Epic: #426
Summary
Deduplicate repetitive log output by normalizing timestamps, UUIDs, hex addresses, IP:port pairs, and long numbers, then counting identical patterns.
Expected Savings
70-85% on verbose logs.
Behavior
# Before (cargo run output, 500 lines; request ids abbreviated here, assumed to be full UUIDs in real output):
2025-01-15T10:00:01Z INFO agent: processing request id=abc-123
2025-01-15T10:00:02Z INFO agent: processing request id=def-456
2025-01-15T10:00:03Z INFO agent: processing request id=ghi-789
... (repeated 200 times)
2025-01-15T10:00:04Z ERROR agent: connection refused addr=127.0.0.1:5432
2025-01-15T10:00:05Z ERROR agent: connection refused addr=127.0.0.1:5432
# After:
[×200] INFO agent: processing request id=<UUID>
[×2] ERROR agent: connection refused addr=<ADDR>
Normalization Rules
- Strip ISO timestamps:
\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.*?\s
- Replace UUIDs:
[0-9a-f]{8}-[0-9a-f]{4}-... → <UUID>
- Replace hex:
0x[0-9a-fA-F]+ → <HEX>
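- Replace IP:port (before long numbers, otherwise the port is rewritten to <NUM> and the address no longer matches):
\d+\.\d+\.\d+\.\d+:\d+ → <ADDR>
- Replace long numbers (last, because \b\d{4,}\b also matches the year in a timestamp, all-digit UUID groups, and ports):
\b\d{4,}\b → <NUM>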
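The rules above can be sketched as an ordered list of substitutions. This is a minimal illustration in Python, not the project's implementation. The names `RULES` and `normalize` are hypothetical, and the full UUID pattern is an assumption (the canonical 8-4-4-4-12 lowercase form), since the rule above is elided. The sketch applies the IP:port rule before the long-number rule, because a port such as `5432` would otherwise become `<NUM>` and the address pattern would no longer match.

```python
import re

# Ordered (pattern, replacement) pairs; order matters.
# Assumption: the UUID pattern is the canonical 8-4-4-4-12 lowercase form.
RULES = [
    # Strip ISO timestamps, including any suffix up to the next whitespace.
    (re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.*?\s"), ""),
    (re.compile(r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}"), "<UUID>"),
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),
    # IP:port runs before long numbers so the port is not replaced first.
    (re.compile(r"\d+\.\d+\.\d+\.\d+:\d+"), "<ADDR>"),
    (re.compile(r"\b\d{4,}\b"), "<NUM>"),
]


def normalize(line: str) -> str:
    """Apply every rule in order and return the normalized pattern."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line


if __name__ == "__main__":
    print(normalize("2025-01-15T10:00:04Z ERROR agent: connection refused addr=127.0.0.1:5432"))
```

Two lines that differ only in timestamp, UUID, hex value, address, or long number normalize to the same string, which is what makes them countable as one pattern.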
Implementation
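LogDeduplicationFilter: a generic filter, applied when no command-specific filter matches and the output has many lines (>50) with a high repetition ratio
- HashMap of normalized patterns → (count, first_raw_line)
- Emit in order of first occurrence (a standard HashMap does not preserve insertion order, so track the first-seen position separately, e.g. a Vec of keys or an index stored with the count)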
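The counting and ordering steps above could look roughly like the following Python sketch. It is illustrative only, and several details are assumptions not stated in this issue:

- The function names `dedup_lines`, `should_apply`, and `demo_normalize` are hypothetical.
- The repetition ratio is assumed to be `1 - unique_patterns / total_lines`, with a hypothetical threshold of 0.5.
- A pattern seen only once is assumed to be emitted as its first raw line, without a `[×1]` prefix.

The normalizer is passed in as a parameter so the block runs on its own. `demo_normalize` is a stand-in that strips only the leading timestamp.

```python
import re
from typing import Callable, List

Normalizer = Callable[[str], str]


def dedup_lines(lines: List[str], normalize: Normalizer) -> List[str]:
    """Collapse lines sharing a normalized pattern, in first-occurrence order."""
    # Python dicts preserve insertion order, so iterating the dict yields
    # patterns in the order they were first seen.
    groups = {}  # pattern -> [count, first_raw_line]
    for raw in lines:
        key = normalize(raw)
        if key in groups:
            groups[key][0] += 1
        else:
            groups[key] = [1, raw]
    out = []
    for key, (count, first_raw) in groups.items():
        # Assumption: singletons are emitted unchanged as their raw line.
        out.append(first_raw if count == 1 else f"[×{count}] {key}")
    return out


def should_apply(lines: List[str], normalize: Normalizer,
                 min_lines: int = 50, min_repetition: float = 0.5) -> bool:
    """Hypothetical gate: more than min_lines lines and a high repetition ratio."""
    if len(lines) <= min_lines:
        return False
    unique = len({normalize(line) for line in lines})
    return 1 - unique / len(lines) >= min_repetition


def demo_normalize(line: str) -> str:
    """Stand-in normalizer for the demo: strips a leading ISO timestamp only."""
    return re.sub(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.*?\s", "", line)
```

Storing the first raw line next to the count keeps the original text available for patterns that never repeat, which is one plausible reason for the `(count, first_raw_line)` tuple.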
Acceptance Criteria