fix: remove silent 8-line AI response truncation in convo_miner #708

Merged

bensig merged 1 commit into MemPalace:develop from sanjay3290:fix/convo-miner-truncation on Apr 12, 2026
Conversation

@sanjay3290
Contributor

Summary

Fixes #692

_chunk_by_exchange() in convo_miner.py was silently truncating AI responses to 8 lines (ai_lines[:8] on line 73). Any content beyond line 8 was permanently discarded during mining, violating the project's core principle of verbatim storage.

What changed

  • Removed the [:8] slice — the full AI response is now preserved
  • Added CHUNK_SIZE = 800 (aligned with miner.py) to convo_miner.py
  • Long exchanges are split across multiple drawers instead of being truncated — when a user-turn + AI response exceeds CHUNK_SIZE, it is chunked into consecutive drawers so nothing is lost
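The splitting behaviour described above can be sketched as follows. This is an illustrative sketch, not the actual `_chunk_by_exchange()` code from convo_miner.py; only the `CHUNK_SIZE = 800` constant comes from the PR, and the helper name `split_exchange` is invented for the example:

```python
# Illustrative sketch of the new splitting behaviour -- not the real
# convo_miner.py implementation. CHUNK_SIZE matches the constant this
# PR adds (aligned with miner.py).
CHUNK_SIZE = 800

def split_exchange(exchange_text: str, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split a user turn + AI response verbatim into <= chunk_size pieces.

    Nothing is discarded: a short exchange comes back as a single chunk,
    a long one is cut into consecutive chunks (drawers).
    """
    if len(exchange_text) <= chunk_size:
        return [exchange_text]
    return [exchange_text[i:i + chunk_size]
            for i in range(0, len(exchange_text), chunk_size)]
```

Joining the returned chunks reproduces the original exchange exactly, which is the verbatim-storage property the old `[:8]` slice violated.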

Why this matters

Conversations with detailed AI responses (code explanations, architecture discussions, multi-step instructions) were losing most of their content. Only the first ~8 lines survived mining. This made the palace unreliable for recall on any non-trivial exchange.

Test results

All 14 existing unit tests pass:

tests/test_convo_miner_unit.py — 14 passed in 0.03s

The _chunk_by_exchange() function was silently truncating AI responses
to 8 lines via ai_lines[:8]. Any content beyond line 8 was discarded,
violating the project's verbatim storage principle.

Now the full AI response is preserved. When a combined exchange exceeds
CHUNK_SIZE (800 chars, aligned with miner.py), it is split across
consecutive drawers instead of being truncated.
Collaborator

@bensig bensig left a comment


Code review + security audit clean.

@bensig bensig merged commit 9b60c6e into MemPalace:develop Apr 12, 2026
jphein added a commit to jphein/mempalace that referenced this pull request Apr 12, 2026
Upstream merged MemPalace#682-684 (our splits), MemPalace#687 (dry-run None room),
MemPalace#695/MemPalace#708 (convo_miner full response), MemPalace#732 (0-chunk re-processing),
plus VitePress docs site. Conflicts:
- config.py: take upstream's [^\W_] regex (our MemPalace#683 merged version)
- miner.py: integrate upstream's early-return for tiny files, dedupe
  dry-run read path
- test_miner.py: keep our detect_room tests + upstream's dry-run test
- CONTRIBUTING.md: take upstream's org URL update

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@igorls igorls mentioned this pull request Apr 13, 2026
gnusam pushed a commit to gnusam/mempalace-pgsql that referenced this pull request Apr 25, 2026
… 0-chunk files

Three upstream fixes ported together because they're conceptually one
"convo_miner polish" pass on the same exchange-chunking path.

1. Remove ai_lines[:8] truncation (upstream d52d6c9, PR MemPalace#695). The
   _chunk_by_exchange path was silently dropping every line past line 8
   of the AI response, violating the verbatim-storage principle.

2. Split oversize exchanges across drawers (upstream 9b60c6e, PR MemPalace#708).
   Now that the full response is preserved, an exchange that exceeds
   CHUNK_SIZE (800 chars, aligned with miner.py) is split into
   consecutive drawers instead of a single oversized one. Adds
   CHUNK_SIZE module constant.

3. Register a no-embedding sentinel for files that produce zero chunks
   (upstream 87e8baf, PR MemPalace#732). mine_convos has three early-exit paths
   (OSError, content too short, zero chunks) that previously wrote
   nothing — file_already_mined() then returned False on the next run
   and the file was re-read every time.

Adapted fix 3 for the PG backend: the upstream sentinel uses
collection.upsert() (ChromaDB API). This fork instead adds a
PalaceDB.register_empty_file() method that inserts a row directly with
embedding=NULL and metadata.ingest_mode='registry', so the sentinel is
free of embedding cost and invisible to vector search. file_already_mined()
already keys on source_file + source_mtime, so the existing path picks
up the sentinel without further changes.
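The sentinel mechanism described above can be sketched like this, using sqlite3 as a stand-in for the fork's Postgres backend. The table and column names are assumptions based on the commit message; only `register_empty_file()`, `file_already_mined()`, `embedding=NULL`, and `ingest_mode='registry'` come from the text:

```python
# Illustrative sketch of the no-embedding sentinel, with sqlite3
# standing in for the fork's PG backend. Schema is an assumption.
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE drawers (
    source_file TEXT, source_mtime REAL,
    embedding BLOB, metadata TEXT)""")

def register_empty_file(conn, source_file: str, source_mtime: float) -> None:
    """Insert a sentinel row with embedding=NULL so the file is not re-read.

    No embedding is computed, and a NULL embedding keeps the row
    invisible to vector search.
    """
    conn.execute(
        "INSERT INTO drawers VALUES (?, ?, NULL, ?)",
        (source_file, source_mtime, json.dumps({"ingest_mode": "registry"})),
    )

def file_already_mined(conn, source_file: str, source_mtime: float) -> bool:
    """Keyed on source_file + source_mtime, so it picks up the sentinel too."""
    row = conn.execute(
        "SELECT 1 FROM drawers WHERE source_file = ? AND source_mtime = ?",
        (source_file, source_mtime),
    ).fetchone()
    return row is not None

register_empty_file(conn, "empty_convo.md", 1712900000.0)
```

A changed mtime makes `file_already_mined()` return False again, so edited files are still re-mined.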

Three behavioural tests added: full AI response preserved, oversize
exchange split across drawers, and the sentinel + file_already_mined
round trip.

Upstream:
  MemPalace@d52d6c9
  MemPalace@9b60c6e
  MemPalace@87e8baf

Co-authored-by: shafdev <[email protected]>
Co-authored-by: Sanjay Ramadugu <[email protected]>
Co-authored-by: Mikhail Valentsev <[email protected]>
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>


Development

Successfully merging this pull request may close these issues.

convo_miner: AI responses silently truncated to 8 lines in _chunk_by_exchange()

2 participants