feat(summarize): hierarchical multi-level conversation summarization (#2) #63

Merged
Siddhant-K-code merged 1 commit into main from feat/2-hierarchical-summarization on May 2, 2026

@Siddhant-K-code
Owner

Closes #2

Summary

Adds pkg/summarize — a rule-based hierarchical summarizer that compresses conversation turns progressively as they age, without requiring an LLM.

Compression levels

| Level | Name | Description |
|---|---|---|
| 0 | LevelFull | Original content (recent turns) |
| 1 | LevelParagraph | First paragraph + code blocks |
| 2 | LevelSentence | First 1–2 sentences |
| 3 | LevelKeywords | Top-12 significant words |
| 4 | LevelEvicted | Dropped entirely (last resort) |
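
The levels above could be modelled as a small Go enum, with extractive helpers for the sentence and keyword levels. This is a minimal sketch, not the code in pkg/summarize: the helper names `firstSentences` and `topKeywords`, the stop-word list, the 3-character minimum word length, and the frequency-based ranking are all assumptions made for illustration. Only the level names and numbers come from the table.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
	"unicode"
)

// Level mirrors the five compression levels listed in the table.
type Level int

const (
	LevelFull      Level = iota // 0: original content (recent turns)
	LevelParagraph              // 1: first paragraph + code blocks
	LevelSentence               // 2: first 1-2 sentences
	LevelKeywords               // 3: top-12 significant words
	LevelEvicted                // 4: dropped entirely
)

// firstSentences (hypothetical helper) returns the first n sentences of text.
// A sentence ends at '.', '!' or '?' followed by whitespace or end of input.
// If fewer than n sentences are found, the trimmed text is returned unchanged.
func firstSentences(text string, n int) string {
	text = strings.TrimSpace(text)
	count := 0
	for i, r := range text {
		if r != '.' && r != '!' && r != '?' {
			continue
		}
		end := i + 1 // terminators are single-byte ASCII
		if end == len(text) || text[end] == ' ' || text[end] == '\n' {
			count++
			if count == n {
				return text[:end]
			}
		}
	}
	return text
}

// stopWords is an assumed, deliberately tiny list of insignificant words.
var stopWords = map[string]bool{
	"the": true, "and": true, "that": true, "this": true,
	"with": true, "for": true, "are": true, "was": true,
}

// topKeywords (hypothetical helper) returns up to k words ranked by frequency.
// Ties keep their order of first appearance because the sort is stable.
func topKeywords(text string, k int) []string {
	if k < 0 {
		k = 0
	}
	words := strings.FieldsFunc(strings.ToLower(text), func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r)
	})
	count := map[string]int{}
	var order []string
	for _, w := range words {
		if len(w) < 3 || stopWords[w] {
			continue
		}
		if count[w] == 0 {
			order = append(order, w)
		}
		count[w]++
	}
	sort.SliceStable(order, func(i, j int) bool {
		return count[order[i]] > count[order[j]]
	})
	if len(order) > k {
		order = order[:k]
	}
	return order
}

func main() {
	msg := "Use a mutex. It is simpler. Channels add overhead."
	fmt.Println(firstSentences(msg, 2)) // Use a mutex. It is simpler.
	kw := topKeywords("the cache miss and the cache error", 12)
	fmt.Println(strings.Join(kw, " ")) // cache miss error
}
```

LevelParagraph is left out of the sketch because preserving code blocks alongside the first paragraph needs fence-aware parsing, which the description does not detail.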

Key behaviours

  • PreserveRecent: N most recent turns always kept at LevelFull
  • ImportanceThreshold: turns scoring above threshold never exceed LevelParagraph
  • ScoreImportance: heuristic scoring — code blocks (+0.4), error keywords (+0.3), decision keywords (+0.2), system role (1.0)
  • enforceTokenBudget: second pass with progressive eviction when still over budget after first pass
  • estimateTokens: 4-chars-per-token approximation for token estimation
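
The importance heuristic and the threshold cap described above might look roughly like the following. The weights (+0.4, +0.3, +0.2, and 1.0 for the system role) are taken from the list; everything else is an assumption for illustration: the lower-case names `scoreImportance` and `capLevel`, the `Turn` fields, the keyword lists, and the rule that a code block is detected by a triple-backtick fence.

```go
package main

import (
	"fmt"
	"strings"
)

// Level values follow the PR's numbering (0 = full ... 4 = evicted).
type Level int

const (
	LevelFull Level = iota
	LevelParagraph
	LevelSentence
	LevelKeywords
	LevelEvicted
)

// Turn is a simplified stand-in for the real type; the fields are assumed.
type Turn struct {
	Role    string
	Content string
}

var (
	// codeFence is built at runtime to avoid a literal fence in this example.
	codeFence = strings.Repeat("`", 3)
	// Both keyword lists are assumptions; the PR does not enumerate them.
	errorKeywords    = []string{"error", "panic", "failed", "exception"}
	decisionKeywords = []string{"decided", "agreed", "we will"}
)

// containsAny reports whether s contains at least one of the keywords.
func containsAny(s string, keywords []string) bool {
	for _, k := range keywords {
		if strings.Contains(s, k) {
			return true
		}
	}
	return false
}

// scoreImportance applies the additive weights from the PR description.
// System turns short-circuit to the maximum score of 1.0.
func scoreImportance(t Turn) float64 {
	if t.Role == "system" {
		return 1.0
	}
	lower := strings.ToLower(t.Content)
	score := 0.0
	if strings.Contains(t.Content, codeFence) {
		score += 0.4 // contains a code block
	}
	if containsAny(lower, errorKeywords) {
		score += 0.3 // mentions an error
	}
	if containsAny(lower, decisionKeywords) {
		score += 0.2 // records a decision
	}
	return score
}

// capLevel keeps a turn scoring above threshold at LevelParagraph or lower.
func capLevel(proposed Level, score, threshold float64) Level {
	if score > threshold && proposed > LevelParagraph {
		return LevelParagraph
	}
	return proposed
}

func main() {
	t := Turn{Role: "user", Content: "The build failed with an error in main.go"}
	s := scoreImportance(t)
	// The turn matches error keywords only, so it scores 0.3 and is capped.
	fmt.Printf("score=%.1f level=%d\n", s, capLevel(LevelKeywords, s, 0.25)) // score=0.3 level=1
}
```

With these weights a non-system turn tops out at 0.9, so only system turns reach 1.0 in this sketch.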

Files

  • pkg/summarize/summarize.go: Turn, Level, SummarizeOptions, Summarizer interface
  • pkg/summarize/importance.go: ScoreImportance, ScoreTurns, estimateTokens
  • pkg/summarize/hierarchy.go: HierarchicalSummarizer with all compression logic
  • pkg/summarize/summarize_test.go: 10 tests covering all major behaviours
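
To show how the types listed above could fit together, here is a hedged sketch of a budget-enforcement pass behind a `Summarizer` interface. The type and function names come from the PR, but every field, every signature, and the eviction order (oldest unprotected turn first) are assumptions; `TokenBudget` in particular is a hypothetical option name, and rounding the 4-chars-per-token estimate up is a guess.

```go
package main

import "fmt"

type Level int

const (
	LevelFull Level = iota
	LevelParagraph
	LevelSentence
	LevelKeywords
	LevelEvicted
)

// Turn, SummarizeOptions and Summarizer use assumed fields and signatures.
type Turn struct {
	Role    string
	Content string
	Level   Level
}

type SummarizeOptions struct {
	PreserveRecent int // N most recent turns are never touched
	TokenBudget    int // hypothetical name for the overall token limit
}

type Summarizer interface {
	Summarize(turns []Turn, opts SummarizeOptions) []Turn
}

// estimateTokens uses the 4-chars-per-token approximation, rounded up
// (the rounding is an assumption) so non-empty text never costs zero.
func estimateTokens(s string) int {
	return (len(s) + 3) / 4
}

// totalTokens sums the estimated cost of all remaining content.
func totalTokens(turns []Turn) int {
	total := 0
	for _, t := range turns {
		total += estimateTokens(t.Content)
	}
	return total
}

// enforceTokenBudget evicts the oldest unprotected turns one at a time
// until the estimate fits the budget or only protected turns remain.
func enforceTokenBudget(turns []Turn, opts SummarizeOptions) []Turn {
	out := append([]Turn(nil), turns...) // copy; leave the input untouched
	limit := len(out) - opts.PreserveRecent
	if limit < 0 {
		limit = 0
	}
	if limit > len(out) {
		limit = len(out)
	}
	for i := 0; i < limit && totalTokens(out) > opts.TokenBudget; i++ {
		out[i].Level = LevelEvicted
		out[i].Content = ""
	}
	return out
}

// budgetOnly is a toy Summarizer that runs only the budget pass.
type budgetOnly struct{}

func (budgetOnly) Summarize(turns []Turn, opts SummarizeOptions) []Turn {
	return enforceTokenBudget(turns, opts)
}

func main() {
	var s Summarizer = budgetOnly{}
	turns := []Turn{
		{Role: "user", Content: "an old question that is fairly long here"},
		{Role: "assistant", Content: "an old answer that is also fairly long.."},
		{Role: "user", Content: "the newest message"},
	}
	for i, t := range s.Summarize(turns, SummarizeOptions{PreserveRecent: 1, TokenBudget: 15}) {
		fmt.Printf("turn %d: level=%d tokens=%d\n", i, t.Level, estimateTokens(t.Content))
	}
}
```

In the merged code this pass runs second, after age-based compression has already been applied; the sketch skips that first pass to keep the budget logic visible.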

@Siddhant-K-code added the enhancement and priority: high labels on May 2, 2026
Implements issue #2. Compresses conversation turns progressively as they
age using extractive techniques (no LLM required):

- LevelFull (0): original content, recent turns
- LevelParagraph (1): first paragraph + code blocks preserved
- LevelSentence (2): first 1-2 sentences
- LevelKeywords (3): top-12 significant words
- LevelEvicted (4): dropped entirely when budget is exhausted

Key behaviours:
- PreserveRecent: N most recent turns always kept at LevelFull
- ImportanceThreshold: turns above threshold never exceed LevelParagraph
- ScoreImportance: heuristic scoring (code blocks, error keywords, role)
- enforceTokenBudget: second pass with progressive eviction when over budget
- estimateTokens: 4-chars-per-token approximation

Co-authored-by: Ona <[email protected]>
@Siddhant-K-code force-pushed the feat/2-hierarchical-summarization branch from fccd972 to 04cb5a1 on May 2, 2026 14:30
@Siddhant-K-code merged commit 677ae8f into main on May 2, 2026
@Siddhant-K-code deleted the feat/2-hierarchical-summarization branch on May 2, 2026 14:30
Development

Successfully merging this pull request may close these issues.

[Feature] Hierarchical Summarization