
Streaming indexing endpoint missing namespace + errors parity #590

@memtomem


What

IndexEngine.index_path_stream (engine.py:707) and the GET /api/index/stream endpoint (system.py:768) lack two contract elements that the non-stream index_path / POST /api/index pair already has:

  1. namespace parameter: not accepted, so any UI/CLI/MCP caller that streams an index silently loses the user's namespace selection.
  2. Error aggregation: _index_file returns result["errors"] per file (engine.py:606, 628, 684, 752), but the stream loop sums only the chunk counters (engine.py:754-757) and never propagates errors. The complete event has no errors field, so partial-failure runs (e.g. ONNX missing on a subset of files, file too large, binary file detected) appear successful in the UI.
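To make the silent drop concrete, here is a minimal reproduction in plain asyncio. It is a sketch, not the real engine: the _index_file stub, the indexed_chunks key, and the index_path_stream_current name are hypothetical stand-ins; only result["errors"] and the complete event come from the description above.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator
from pathlib import Path
from typing import Any


# Hypothetical stub for _index_file; only the result["errors"] key is from the issue.
async def _index_file(fp: Path) -> dict[str, Any]:
    if fp.suffix == ".bin":
        # A handled per-file failure, reported via result["errors"].
        return {"indexed_chunks": 0, "errors": [f"{fp.name}: binary file detected"]}
    return {"indexed_chunks": 3, "errors": []}


# Mirrors the current stream behavior described above (simplified).
async def index_path_stream_current(files: list[Path]) -> AsyncIterator[dict[str, Any]]:
    total_chunks = 0
    for fp in files:
        result = await _index_file(fp)  # no namespace forwarded
        total_chunks += result["indexed_chunks"]  # result["errors"] is never read
    yield {"type": "complete", "indexed_chunks": total_chunks}  # no "errors" field


async def last_event(files: list[Path]) -> dict[str, Any]:
    events = [event async for event in index_path_stream_current(files)]
    return events[-1]


complete = asyncio.run(last_event([Path("notes.md"), Path("blob.bin")]))
print(complete)  # {'type': 'complete', 'indexed_chunks': 3}
```

The binary-file failure is produced but never reaches the complete event, so a UI reading that event has nothing to report.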

Fix

  • Engine: add namespace: str | None = None to index_path_stream and pass it through to _index_file. Collect each result["errors"] into a loop accumulator and include it as errors: tuple[str, ...] in the complete event, the same shape as IndexingStats.errors, so the existing non-stream UI handler can be reused verbatim. In the per-file uncaught-exception branch (engine.py:752), prefix messages as f"{fp.name}: {exc}" to match the non-stream asyncio.gather(return_exceptions=True) branch (engine.py:394).
  • Route: add a namespace query param to GET /api/index/stream and forward it to the engine call.
  • Tests:
    • test_index_stream_namespace_propagates: namespace="x" → indexed chunks have metadata.namespace == "x".
    • test_index_stream_complete_errors_no_silent_drop: force a file-level failure → complete.errors is non-empty (a regression guard against the current silent-drop behavior; the test name documents intent so a future grep finds it).
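The engine change could look roughly like the sketch below, assuming a simplified async-generator shape for the stream. The _index_file stub, its indexed_chunks key, the calls list, the collect helper, and the progress event fields are hypothetical; the parts taken from this issue are the namespace passthrough, the errors accumulator, the errors: tuple[str, ...] field on the complete event, and the f"{fp.name}: {exc}" prefix on uncaught exceptions.

```python
from __future__ import annotations

import asyncio
from collections.abc import AsyncIterator
from pathlib import Path
from typing import Any

# Hypothetical: records (file name, namespace) per call so passthrough is observable.
calls: list[tuple[str, str | None]] = []


# Hypothetical stand-in for IndexEngine._index_file; only result["errors"] is from the issue.
async def _index_file(fp: Path, namespace: str | None = None) -> dict[str, Any]:
    calls.append((fp.name, namespace))
    if fp.suffix == ".bin":
        # Handled failure: reported through result["errors"], no exception raised.
        return {"indexed_chunks": 0, "errors": [f"{fp.name}: binary file detected"]}
    if fp.name == "boom.md":
        # Unhandled failure: escapes _index_file as an exception.
        raise RuntimeError("embedding backend unavailable")
    return {"indexed_chunks": 2, "errors": []}


async def index_path_stream(
    files: list[Path], namespace: str | None = None
) -> AsyncIterator[dict[str, Any]]:
    total_chunks = 0
    errors: list[str] = []  # loop accumulator for per-file errors
    for i, fp in enumerate(files, start=1):
        try:
            result = await _index_file(fp, namespace=namespace)  # forward namespace
        except Exception as exc:
            # Uncaught-exception branch: prefix with the file name, like the
            # non-stream asyncio.gather(return_exceptions=True) branch.
            errors.append(f"{fp.name}: {exc}")
        else:
            total_chunks += result["indexed_chunks"]
            errors.extend(result["errors"])  # keep _index_file's messages as-is
        yield {"type": "progress", "file": fp.name, "current": i, "total": len(files)}
    # Same shape as IndexingStats.errors: an immutable tuple of strings.
    yield {"type": "complete", "indexed_chunks": total_chunks, "errors": tuple(errors)}


async def collect(files: list[Path], namespace: str | None) -> list[dict[str, Any]]:
    return [event async for event in index_path_stream(files, namespace=namespace)]


events = asyncio.run(collect([Path("a.md"), Path("img.bin"), Path("boom.md")], "x"))
complete = events[-1]
```

Streaming a.md, img.bin, and boom.md with namespace="x" ends in a complete event with indexed_chunks == 2 and both failures listed in errors, which is the kind of outcome the two proposed tests would check against the real engine. Keeping the errors as a tuple rather than a list is what lets the complete event share IndexingStats.errors' shape.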

Why now

Unblocks #582 PR-B (Prev #1: Index tab two-button collapse) and informs #582 PR #6 (4.11: indexing in-flight visibility), which will surface complete.errors via a toast.

Out of scope

  • Per-file streaming of errors (option ii/iii from the planning discussion). complete.errors is sufficient for the current UX. Revisit if user reports show large indexing runs (>100 files) where waiting for complete to surface partial failures is impractical.
  • Normalizing _index_file's mixed path-prefix convention: some branches prefix f"{file_path.name}: ..." (file-too-large, binary), others don't (embedding failures at engine.py:684). That is a separate non-stream bug. The PR for this issue preserves the existing loose contract and only path-prefixes the stream-level uncaught-exception branch (engine.py:752) to match non-stream's outer-gather branch.
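For comparison, a minimal sketch of the outer-gather pattern that the stream-level prefix is meant to match. It assumes a hypothetical _index_one worker and a simplified index_path signature; it is not the actual index_path code, but it shows why only exceptions caught by gather gain the prefix while errors returned by the worker pass through untouched, as the scope note above describes.

```python
from __future__ import annotations

import asyncio
from pathlib import Path
from typing import Any


# Hypothetical per-file worker; raises for one file to exercise the gather branch.
async def _index_one(fp: Path) -> dict[str, Any]:
    if fp.name == "broken.md":
        raise ValueError("unexpected token")
    return {"indexed_chunks": 1, "errors": []}


async def index_path(files: list[Path]) -> tuple[int, tuple[str, ...]]:
    # return_exceptions=True puts raised exceptions into the results list
    # (in input order) instead of propagating the first one.
    results = await asyncio.gather(*(_index_one(fp) for fp in files), return_exceptions=True)
    total = 0
    errors: list[str] = []
    for fp, res in zip(files, results):
        if isinstance(res, BaseException):
            errors.append(f"{fp.name}: {res}")  # outer-gather branch: path-prefixed
        else:
            total += res["indexed_chunks"]
            errors.extend(res["errors"])  # worker-reported errors pass through as-is
    return total, tuple(errors)


total, errors = asyncio.run(index_path([Path("ok.md"), Path("broken.md")]))
```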

Refs

  • Engine: packages/memtomem/src/memtomem/indexing/engine.py:348-414 (non-stream), :707-776 (stream).
  • Route: packages/memtomem/src/memtomem/web/routes/system.py:768-804 (stream), :807-832 (non-stream).
  • IndexingStats: packages/memtomem/src/memtomem/models.py:172-185.

🤖 Generated with Claude Code

Metadata

Metadata

Assignees

No one assigned

Labels: bug (Something isn't working)