
fix: store full AI response in convo_miner exchange chunking#695

Merged
bensig merged 1 commit into MemPalace:develop from shafdev:fix/convo-miner-response-truncation
Apr 12, 2026

Conversation

Contributor

@shafdev shafdev commented Apr 12, 2026

Fixes #692

What does this PR do?

Removes an undocumented 8-line cap on AI responses in _chunk_by_exchange()
inside convo_miner.py.

Before:

ai_response = " ".join(ai_lines[:8])

After:

ai_response = " ".join(ai_lines)

The [:8] slice silently discarded everything beyond the 8th line of any AI
response, violating the project's core "verbatim first" principle
(CONTRIBUTING.md). The fallback _chunk_by_paragraph() path has no equivalent
cap, which suggests the slice was an unintentional oversight from the initial
commit rather than a deliberate limit.
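For context, here is a minimal sketch of what the fixed exchange-chunking logic looks like. Only the `" ".join(ai_lines)` line comes from the actual diff; the surrounding structure of `_chunk_by_exchange()` (the `Human:` prefix convention, the return shape) is an assumption for illustration:

```python
def chunk_by_exchange(lines):
    """Pair each human prompt with the *full* AI response that follows it.

    Hypothetical sketch -- the real _chunk_by_exchange() in convo_miner.py
    may parse speaker markers differently; only the join line is from the PR.
    """
    exchanges = []
    human = None
    ai_lines = []
    for line in lines:
        if line.startswith("Human:"):
            if human is not None:
                # The fix: join *all* AI lines, not ai_lines[:8].
                exchanges.append({"human": human, "ai": " ".join(ai_lines)})
            human = line.removeprefix("Human:").strip()
            ai_lines = []
        else:
            ai_lines.append(line.removeprefix("AI:").strip())
    if human is not None:
        exchanges.append({"human": human, "ai": " ".join(ai_lines)})
    return exchanges
```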

How to test

# Run the targeted unit tests
python -m pytest tests/test_convo_miner_unit.py tests/test_convo_miner.py -v

# Run the full suite
python -m pytest tests/ -v

A new regression test test_long_ai_response_not_truncated in
tests/test_convo_miner_unit.py verifies that a 13-line AI response is stored
in full.

Note: Targeting develop branch per CONTRIBUTING.md PR guidelines.

Checklist

  • Tests pass (python -m pytest tests/ -v)
  • No hardcoded paths
  • Linter passes (ruff check .)

@shafdev shafdev marked this pull request as ready for review April 12, 2026 13:36
Collaborator

@bensig bensig left a comment


Code review + security audit clean.

@bensig bensig merged commit d52d6c9 into MemPalace:develop Apr 12, 2026
jphein added a commit to jphein/mempalace that referenced this pull request Apr 12, 2026
Upstream merged MemPalace#682-684 (our splits), MemPalace#687 (dry-run None room),
MemPalace#695/MemPalace#708 (convo_miner full response), MemPalace#732 (0-chunk re-processing),
plus VitePress docs site. Conflicts:
- config.py: take upstream's [^\W_] regex (our MemPalace#683 merged version)
- miner.py: integrate upstream's early-return for tiny files, dedupe
  dry-run read path
- test_miner.py: keep our detect_room tests + upstream's dry-run test
- CONTRIBUTING.md: take upstream's org URL update

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@igorls igorls mentioned this pull request Apr 13, 2026
ichoosetoaccept pushed a commit to detailobsessed/mempalace that referenced this pull request Apr 13, 2026
Remove the ai_lines[:8] truncation in _chunk_by_exchange() so full AI
responses are stored. Add _register_empty_file() sentinel drawer for
conversation files that produce 0 chunks, preventing infinite
re-processing on subsequent runs.

Also fix pre-existing test_miner assertion that expected int from
process_file (now returns tuple since PR #45).

Ports upstream MemPalace#654, MemPalace#692, MemPalace#695.
gnusam pushed a commit to gnusam/mempalace-pgsql that referenced this pull request Apr 25, 2026
… 0-chunk files

Three upstream fixes ported together because they're conceptually one
"convo_miner polish" pass on the same exchange-chunking path.

1. Remove ai_lines[:8] truncation (upstream d52d6c9, PR MemPalace#695). The
   _chunk_by_exchange path was silently dropping every line past line 8
   of the AI response, violating the verbatim-storage principle.

2. Split oversize exchanges across drawers (upstream 9b60c6e, PR MemPalace#708).
   Now that the full response is preserved, an exchange that exceeds
   CHUNK_SIZE (800 chars, aligned with miner.py) is split into
   consecutive drawers instead of a single oversized one. Adds
   CHUNK_SIZE module constant.

3. Register a no-embedding sentinel for files that produce zero chunks
   (upstream 87e8baf, PR MemPalace#732). mine_convos has three early-exit paths
   (OSError, content too short, zero chunks) that previously wrote
   nothing — file_already_mined() then returned False on the next run
   and the file was re-read every time.
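The splitting described in point 2 can be sketched as follows. The CHUNK_SIZE value of 800 characters is from the commit message above; the word-boundary splitting strategy is an assumption, not the fork's verified implementation:

```python
CHUNK_SIZE = 800  # chars, aligned with miner.py per the commit message

def split_exchange(text, chunk_size=CHUNK_SIZE):
    """Split an oversize exchange into consecutive drawer-sized pieces.

    Illustrative sketch: breaks on word boundaries so no chunk exceeds
    chunk_size (except a single word longer than the limit).
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = (current + " " + word).strip()
        if len(candidate) > chunk_size and current:
            chunks.append(current)   # close the current drawer
            current = word           # start the next one
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```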

Adapted fix 3 for the PG backend: the upstream sentinel uses
collection.upsert() (ChromaDB API). This fork instead adds a
PalaceDB.register_empty_file() method that inserts a row directly with
embedding=NULL and metadata.ingest_mode='registry', so the sentinel is
free of embedding cost and invisible to vector search. file_already_mined()
already keys on source_file + source_mtime, so the existing path picks
up the sentinel without further changes.
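An illustrative sketch of that sentinel pattern, using stdlib sqlite3 in place of the fork's actual PostgreSQL layer so the example is self-contained. The table name, column names, and function signatures are all assumptions:

```python
import sqlite3

# Illustrative schema only: the real fork targets PostgreSQL with a
# vector embedding column; sqlite3 stands in here for a runnable sketch.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE drawers (
        source_file  TEXT,
        source_mtime REAL,
        content      TEXT,
        embedding    BLOB,   -- NULL for registry sentinels
        ingest_mode  TEXT
    )
""")

def register_empty_file(path, mtime):
    """Insert a no-embedding sentinel so a 0-chunk file is never re-mined."""
    db.execute(
        "INSERT INTO drawers VALUES (?, ?, '', NULL, 'registry')",
        (path, mtime),
    )

def file_already_mined(path, mtime):
    """Keyed on source_file + source_mtime, so sentinels count as mined."""
    row = db.execute(
        "SELECT 1 FROM drawers WHERE source_file = ? AND source_mtime = ?",
        (path, mtime),
    ).fetchone()
    return row is not None

register_empty_file("convo_2026-04-12.txt", 1744400000.0)
```

Because the sentinel row carries embedding=NULL, any vector-search query that filters on non-NULL embeddings never sees it, which is the "invisible to vector search" property the commit describes.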

Three behavioural tests added: full AI response preserved, oversize
exchange split across drawers, and the sentinel + file_already_mined
round trip.

Upstream:
  MemPalace@d52d6c9
  MemPalace@9b60c6e
  MemPalace@87e8baf

Co-authored-by: shafdev <[email protected]>
Co-authored-by: Sanjay Ramadugu <[email protected]>
Co-authored-by: Mikhail Valentsev <[email protected]>
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>


Development

Successfully merging this pull request may close these issues.

convo_miner: AI responses silently truncated to 8 lines in _chunk_by_exchange()

2 participants