
fix: store full AI response in convo_miner exchange chunking#695

Merged
bensig merged 1 commit into MemPalace:develop from shafdev:fix/convo-miner-response-truncation
Apr 12, 2026

Conversation

Contributor

@shafdev shafdev commented Apr 12, 2026

Fixes #692

What does this PR do?

Removes an undocumented 8-line cap on AI responses in _chunk_by_exchange()
inside convo_miner.py.

Before:

ai_response = " ".join(ai_lines[:8])

After:

ai_response = " ".join(ai_lines)

The [:8] slice silently discarded everything beyond the 8th line of any AI
response, violating the project's core "verbatim first" principle
(CONTRIBUTING.md). The fallback _chunk_by_paragraph() path has no equivalent
cap, which suggests the slice was an unintentional oversight from the initial
commit rather than a deliberate limit.
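For context, here is a minimal sketch of what the fixed exchange-chunking logic looks like. Only the `" ".join(ai_lines)` line comes from the actual diff; the surrounding structure of `_chunk_by_exchange()` (the `Human:` prefix convention, the return shape) is an assumption for illustration:

```python
def chunk_by_exchange(lines):
    """Pair each human prompt with the *full* AI response that follows it.

    Hypothetical sketch -- the real _chunk_by_exchange() in convo_miner.py
    may parse speaker markers differently; only the join line is from the PR.
    """
    exchanges = []
    human = None
    ai_lines = []
    for line in lines:
        if line.startswith("Human:"):
            if human is not None:
                # The fix: join *all* AI lines, not ai_lines[:8].
                exchanges.append({"human": human, "ai": " ".join(ai_lines)})
            human = line.removeprefix("Human:").strip()
            ai_lines = []
        else:
            ai_lines.append(line.removeprefix("AI:").strip())
    if human is not None:
        exchanges.append({"human": human, "ai": " ".join(ai_lines)})
    return exchanges
```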

How to test

# Run the targeted unit tests
python -m pytest tests/test_convo_miner_unit.py tests/test_convo_miner.py -v

# Run the full suite
python -m pytest tests/ -v

A new regression test test_long_ai_response_not_truncated in
tests/test_convo_miner_unit.py verifies that a 13-line AI response is stored
in full.

Note: Targeting develop branch per CONTRIBUTING.md PR guidelines.

Checklist

  • Tests pass (python -m pytest tests/ -v)
  • No hardcoded paths
  • Linter passes (ruff check .)

@shafdev shafdev marked this pull request as ready for review April 12, 2026 13:36
Collaborator

@bensig bensig left a comment


Code review + security audit clean.

@bensig bensig merged commit d52d6c9 into MemPalace:develop Apr 12, 2026
jphein added a commit to jphein/mempalace that referenced this pull request Apr 12, 2026
Upstream merged MemPalace#682-684 (our splits), MemPalace#687 (dry-run None room),
MemPalace#695/MemPalace#708 (convo_miner full response), MemPalace#732 (0-chunk re-processing),
plus VitePress docs site. Conflicts:
- config.py: take upstream's [^\W_] regex (our MemPalace#683 merged version)
- miner.py: integrate upstream's early-return for tiny files, dedupe
  dry-run read path
- test_miner.py: keep our detect_room tests + upstream's dry-run test
- CONTRIBUTING.md: take upstream's org URL update

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@igorls igorls mentioned this pull request Apr 13, 2026
ichoosetoaccept pushed a commit to detailobsessed/mempalace that referenced this pull request Apr 13, 2026
Remove the ai_lines[:8] truncation in _chunk_by_exchange() so full AI
responses are stored. Add _register_empty_file() sentinel drawer for
conversation files that produce 0 chunks, preventing infinite
re-processing on subsequent runs.

Also fix pre-existing test_miner assertion that expected int from
process_file (now returns tuple since PR #45).

Ports upstream MemPalace#654, MemPalace#692, MemPalace#695.
gnusam pushed a commit to gnusam/mempalace-pgsql that referenced this pull request Apr 25, 2026
… 0-chunk files

Three upstream fixes ported together because they're conceptually one
"convo_miner polish" pass on the same exchange-chunking path.

1. Remove ai_lines[:8] truncation (upstream d52d6c9, PR MemPalace#695). The
   _chunk_by_exchange path was silently dropping every line past line 8
   of the AI response, violating the verbatim-storage principle.

2. Split oversize exchanges across drawers (upstream 9b60c6e, PR MemPalace#708).
   Now that the full response is preserved, an exchange that exceeds
   CHUNK_SIZE (800 chars, aligned with miner.py) is split into
   consecutive drawers instead of a single oversized one. Adds
   CHUNK_SIZE module constant.

3. Register a no-embedding sentinel for files that produce zero chunks
   (upstream 87e8baf, PR MemPalace#732). mine_convos has three early-exit paths
   (OSError, content too short, zero chunks) that previously wrote
   nothing — file_already_mined() then returned False on the next run
   and the file was re-read every time.
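The splitting described in point 2 can be sketched as follows. The CHUNK_SIZE value of 800 characters is from the commit message above; the word-boundary splitting strategy is an assumption, not the fork's verified implementation:

```python
CHUNK_SIZE = 800  # chars, aligned with miner.py per the commit message

def split_exchange(text, chunk_size=CHUNK_SIZE):
    """Split an oversize exchange into consecutive drawer-sized pieces.

    Illustrative sketch: breaks on word boundaries so no chunk exceeds
    chunk_size (except a single word longer than the limit).
    """
    chunks, current = [], ""
    for word in text.split():
        candidate = (current + " " + word).strip()
        if len(candidate) > chunk_size and current:
            chunks.append(current)   # close the current drawer
            current = word           # start the next one
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```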

Adapted fix 3 for the PG backend: the upstream sentinel uses
collection.upsert() (ChromaDB API). This fork instead adds a
PalaceDB.register_empty_file() method that inserts a row directly with
embedding=NULL and metadata.ingest_mode='registry', so the sentinel is
free of embedding cost and invisible to vector search. file_already_mined()
already keys on source_file + source_mtime, so the existing path picks
up the sentinel without further changes.
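An illustrative sketch of that sentinel pattern, using stdlib sqlite3 in place of the fork's actual PostgreSQL layer so the example is self-contained. The table name, column names, and function signatures are all assumptions:

```python
import sqlite3

# Illustrative schema only: the real fork targets PostgreSQL with a
# vector embedding column; sqlite3 stands in here for a runnable sketch.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE drawers (
        source_file  TEXT,
        source_mtime REAL,
        content      TEXT,
        embedding    BLOB,   -- NULL for registry sentinels
        ingest_mode  TEXT
    )
""")

def register_empty_file(path, mtime):
    """Insert a no-embedding sentinel so a 0-chunk file is never re-mined."""
    db.execute(
        "INSERT INTO drawers VALUES (?, ?, '', NULL, 'registry')",
        (path, mtime),
    )

def file_already_mined(path, mtime):
    """Keyed on source_file + source_mtime, so sentinels count as mined."""
    row = db.execute(
        "SELECT 1 FROM drawers WHERE source_file = ? AND source_mtime = ?",
        (path, mtime),
    ).fetchone()
    return row is not None

register_empty_file("convo_2026-04-12.txt", 1744400000.0)
```

Because the sentinel row carries embedding=NULL, any vector-search query that filters on non-NULL embeddings never sees it, which is the "invisible to vector search" property the commit describes.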

Three behavioural tests added: full AI response preserved, oversize
exchange split across drawers, and the sentinel + file_already_mined
round trip.

Upstream:
  MemPalace@d52d6c9
  MemPalace@9b60c6e
  MemPalace@87e8baf

Co-authored-by: shafdev <[email protected]>
Co-authored-by: Sanjay Ramadugu <[email protected]>
Co-authored-by: Mikhail Valentsev <[email protected]>
Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>


Development

Successfully merging this pull request may close these issues.

convo_miner: AI responses silently truncated to 8 lines in _chunk_by_exchange()

2 participants