
feat: log deduplication with pattern normalization and counting #432

@bug-ops

Description

Parent

Epic: #426

Summary

Deduplicate repetitive log output by normalizing timestamps, UUIDs, hex addresses, and long numbers, then counting identical patterns.

Expected Savings

70-85% token reduction on verbose, repetitive logs.

Behavior

# Before (cargo run output, 500 lines):
2025-01-15T10:00:01Z INFO agent: processing request id=abc-123
2025-01-15T10:00:02Z INFO agent: processing request id=def-456
2025-01-15T10:00:03Z INFO agent: processing request id=ghi-789
... (repeated 200 times)
2025-01-15T10:00:04Z ERROR agent: connection refused addr=127.0.0.1:5432
2025-01-15T10:00:05Z ERROR agent: connection refused addr=127.0.0.1:5432

# After:
[×200] INFO agent: processing request id=<UUID>
[×2] ERROR agent: connection refused addr=<ADDR>

Normalization Rules

  • Strip ISO timestamps: \d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}.*?\s
  • Replace UUIDs: [0-9a-f]{8}-[0-9a-f]{4}-... → <UUID>
  • Replace hex: 0x[0-9a-fA-F]+ → <HEX>
  • Replace long numbers: \b\d{4,}\b → <NUM>
  • Replace IP:port: \d+\.\d+\.\d+\.\d+:\d+ → <ADDR>
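
In practice these rules would likely be a handful of compiled patterns from the `regex` crate. As a dependency-free illustration, here is a std-only sketch of just the hex and long-number rules; the `normalize` helper name is hypothetical, not part of the implementation:

```rust
/// Replace 0x-prefixed hex literals with "<HEX>" and runs of 4+ ASCII
/// digits with "<NUM>". Std-only sketch covering two of the rules; the
/// others follow the same scan-and-replace shape.
fn normalize(line: &str) -> String {
    let chars: Vec<char> = line.chars().collect();
    let mut out = String::new();
    let mut i = 0;
    while i < chars.len() {
        // "0x" followed by at least one hex digit -> <HEX>
        if chars[i] == '0' && i + 1 < chars.len() && chars[i + 1] == 'x' {
            let mut j = i + 2;
            while j < chars.len() && chars[j].is_ascii_hexdigit() {
                j += 1;
            }
            if j > i + 2 {
                out.push_str("<HEX>");
                i = j;
                continue;
            }
        }
        // run of 4 or more decimal digits -> <NUM>
        if chars[i].is_ascii_digit() {
            let mut j = i;
            while j < chars.len() && chars[j].is_ascii_digit() {
                j += 1;
            }
            if j - i >= 4 {
                out.push_str("<NUM>");
                i = j;
                continue;
            }
        }
        out.push(chars[i]);
        i += 1;
    }
    out
}

fn main() {
    // Short numbers (e.g. ports under 4 digits) pass through untouched.
    println!("{}", normalize("seq=123456 ptr=0xdeadbeef port 80"));
}
```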

Implementation

  • LogDeduplicationFilter — generic, applied when no command-specific filter matches and output has many lines (>50) with high repetition ratio
  • HashMap of normalized patterns → (count, first_raw_line)
  • Emit in order of first occurrence
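
The pipeline above can be sketched in a few lines of std-only Rust, assuming a caller-supplied normalizer; the `dedup` name and signature are illustrative, not the actual `LogDeduplicationFilter` API:

```rust
use std::collections::HashMap;

/// Collapse repeated lines: group by normalized pattern, count occurrences,
/// keep the first raw line of each group, and emit groups in the order
/// their pattern was first seen.
fn dedup(lines: &[&str], normalize: impl Fn(&str) -> String) -> Vec<String> {
    // normalized pattern -> (count, first raw line)
    let mut groups: HashMap<String, (usize, String)> = HashMap::new();
    let mut order: Vec<String> = Vec::new(); // keys in first-seen order
    for line in lines {
        let key = normalize(line);
        match groups.get_mut(&key) {
            Some(entry) => entry.0 += 1,
            None => {
                groups.insert(key.clone(), (1, line.to_string()));
                order.push(key);
            }
        }
    }
    order
        .into_iter()
        .map(|key| {
            let (count, first) = &groups[&key];
            if *count > 1 {
                // repeated: show the normalized pattern with its count
                format!("[×{}] {}", count, key)
            } else {
                // unique: keep the raw line untouched
                first.clone()
            }
        })
        .collect()
}

fn main() {
    // Illustrative normalizer: mask every digit (stand-in for the real rules).
    let mask = |l: &str| l.replace(|c: char| c.is_ascii_digit(), "<NUM>");
    for out in dedup(&["a: ok id=1", "a: ok id=2", "b: err"], mask) {
        println!("{}", out);
        // prints:
        // [×2] a: ok id=<NUM>
        // b: err
    }
}
```

Emitting the normalized pattern for repeated groups matches the "After" sample above, where the collapsed lines show `<UUID>` and `<ADDR>` placeholders rather than any one raw line.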

Acceptance Criteria

  • Deduplication with counts
  • Normalization preserves error context
  • Only activates on repetitive output (>30% duplication)
  • Unit tests
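
The activation gate from the criteria above (long output, >30% duplication) can be sketched as a simple ratio check; the thresholds come from this issue's text, while the `should_dedup` name is hypothetical:

```rust
use std::collections::HashSet;

/// Activate deduplication only when output is long enough and repetitive
/// enough: >50 lines (per the implementation notes) and >30% duplicates
/// among the already-normalized lines.
fn should_dedup(normalized: &[String]) -> bool {
    if normalized.len() <= 50 {
        return false;
    }
    let unique: HashSet<&String> = normalized.iter().collect();
    let duplicates = normalized.len() - unique.len();
    duplicates as f64 / normalized.len() as f64 > 0.30
}

fn main() {
    let repetitive: Vec<String> = (0..100).map(|_| "x".to_string()).collect();
    println!("{}", should_dedup(&repetitive)); // prints "true"
}
```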

Metadata

    Labels

    • P2: High value, medium complexity
    • size/M: Medium PR (51-200 lines)
    • token-savings: Token economy improvements
    • tools: Tool execution and MCP integration
