fix(indexing): stream endpoint namespace + errors parity (#590) #591

Merged
memtomem merged 1 commit into main from fix/590-index-stream-parity
Apr 30, 2026

@memtomem
Owner

Summary

Closes #590.

IndexEngine.index_path_stream and GET /api/index/stream were missing two contract elements that the non-stream index_path / POST /api/index pair already had:

  1. namespace parameter: silently dropped. When folder-mode UI / CLI / MCP callers streamed an index, the user's namespace selection was ignored.
  2. Error aggregation: _index_file returns result["errors"] per file (engine.py:606, 628, 684, 752), but the stream loop summed only chunk counters (754-757) and never propagated errors. The complete event lacked an errors field, so partial-failure runs appeared successful in the UI.
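
To make the second point concrete, here is a minimal sketch of a consumer that renders a summary from the complete event. The function name `summarize_complete`, the `indexed_chunks` key, and the message wording are assumptions for illustration; only the `errors` list and the 5-cap "+N more" behaviour come from this PR's description. With no `errors` key in the payload, a partial-failure run takes the same branch as a clean run.

```python
from typing import Any

MAX_SHOWN = 5  # cap described for the non-stream UI handlers


def summarize_complete(event: dict[str, Any]) -> str:
    """Render a one-line summary from a hypothetical ``complete`` payload."""
    # Before the fix the key was absent, so this was always an empty list.
    errors = event.get("errors", [])
    indexed = event.get("indexed_chunks", 0)  # hypothetical counter name
    if not errors:
        return f"Indexed {indexed} chunks"
    shown = "; ".join(errors[:MAX_SHOWN])
    extra = len(errors) - MAX_SHOWN
    suffix = f" (+{extra} more)" if extra > 0 else ""
    return f"Indexed {indexed} chunks, {len(errors)} errors: {shown}{suffix}"
```

A payload without `errors` and a payload with `errors: []` render identically, which is why the empty-path branch can emit an empty list without changing what the user sees.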

Changes

  • Engine (packages/memtomem/src/memtomem/indexing/engine.py:707-784):
    • Add namespace: str | None = None to index_path_stream and forward it to _index_file(fp, force, namespace=namespace) (which already accepted the param).
    • Accumulate result["errors"] into all_errors and emit errors: list[str] in the complete event. This is the same loose shape as IndexingStats.errors, so the non-stream UI handlers (5-cap "+N more" rendering, error-row toggle) can be reused verbatim.
    • Empty-path branch (no files / not-a-path) now emits errors: [] for shape consistency.
    • Stream-level uncaught-exception branch (line 752) now formats its error as f"{fp.name}: {exc}" so consumers see the same shape regardless of which branch caught the error. Matches non-stream's asyncio.gather(return_exceptions=True) branch (line 394).
  • Route (packages/memtomem/src/memtomem/web/routes/system.py:768-790): add namespace: str | None = None query param to GET /api/index/stream, forward to engine.
  • Tests (packages/memtomem/tests/test_indexing_engine.py):
    • test_index_path_stream_namespace_propagates — pass namespace="ns590", verify indexed chunks have metadata.namespace == "ns590".
    • test_index_path_stream_complete_errors_no_silent_drop — write a file with a NUL byte (binary-detected branch at engine.py:621-628), assert complete.errors is non-empty and mentions the file name. Test name documents intent so future grep finds the regression guard.

Compatibility

Out of scope

  • Per-file streaming of errors (option ii/iii from the issue's planning discussion). complete.errors is sufficient for current UX; revisit if user reports indicate large indexing runs (>100 files) where waiting for complete is impractical.
  • Normalizing _index_file's mixed path-prefix convention — some branches prefix f"{file_path.name}: ..." (file-too-large, binary), others don't (embedding failures at engine.py:684). Separate non-stream bug; this PR preserves the existing loose contract and only path-prefixes the stream-level uncaught-exception branch (line 752) for consistency with the non-stream outer-gather branch.
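
For reference, the deferred normalization could be as small as an idempotent prefix helper. This is purely a hypothetical sketch (the name `prefix_error` does not exist in the codebase) and is not part of this PR.

```python
def prefix_error(file_name: str, message: str) -> str:
    """Return ``message`` prefixed with ``"<file_name>: "`` unless already prefixed."""
    prefix = f"{file_name}: "
    return message if message.startswith(prefix) else prefix + message
```

Because the helper leaves already-prefixed messages alone, it could be applied to every branch without double-prefixing the file-too-large and binary errors.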

Refs

Test plan

  • uv run ruff check packages/memtomem/src && uv run ruff format --check packages/memtomem/src
  • uv run pytest packages/memtomem/tests/test_indexing_engine.py -k "test_index_path_stream" -m "not ollama" — 4/4 pass (2 existing exclude-guard tests + 2 new parity guards)
  • uv run pytest packages/memtomem/tests/test_indexing_engine.py packages/memtomem/tests/test_web_routes.py packages/memtomem/tests/test_web_exclude_guard.py packages/memtomem/tests/test_init_cmd.py -m "not ollama" — 441/441 pass (combined). Confirms no regression in route handlers, exclude guards, or the mm init wizard's seed flow.

🤖 Generated with Claude Code

``IndexEngine.index_path_stream`` and the ``GET /api/index/stream``
endpoint were missing two contract elements that ``index_path`` /
``POST /api/index`` already had: the ``namespace`` parameter (silently
dropped, so streamed indexes ignored user namespace selection) and
error aggregation in the ``complete`` event (per-file
``result["errors"]`` was summed only into chunk counters and never
emitted, so partial-failure runs appeared successful in the UI).

Engine: thread ``namespace`` through to ``_index_file``, accumulate
per-file errors, include them as ``errors: list[str]`` in the
``complete`` event — same loose shape as ``IndexingStats.errors`` so
non-stream UI handlers reuse verbatim. Path-prefix the stream-level
uncaught-exception branch (``f"{fp.name}: {exc}"``) to match the
non-stream ``asyncio.gather(return_exceptions=True)`` branch.

Route: forward the new ``namespace`` query param to the engine.

Tests: ``test_index_path_stream_namespace_propagates`` (chunks gain
``metadata.namespace`` for the streamed run) and
``test_index_path_stream_complete_errors_no_silent_drop`` (the binary
file triggers a per-file error that surfaces in ``complete.errors``).
The test names document intent so future grep finds the regression
guards.

Out of scope: per-file streaming of errors (option ii/iii from the
planning discussion); normalizing ``_index_file``'s mixed
path-prefix convention. Both noted in the issue.

Unblocks #582 PR-B (Prev #1 — Index tab two-button collapse) and
informs #582 PR #6 (4.11 — indexing in-flight visibility).

Refs #590.

Co-Authored-By: Claude <[email protected]>
@memtomem memtomem merged commit e9ce9c0 into main Apr 30, 2026
7 checks passed
@memtomem memtomem deleted the fix/590-index-stream-parity branch April 30, 2026 04:01
@github-actions github-actions Bot locked and limited conversation to collaborators Apr 30, 2026
Development

Successfully merging this pull request may close these issues.

Streaming indexing endpoint missing namespace + errors parity

2 participants