feat(cli): convert mm index to stream + click.progressbar (closes #656) #741

Merged
pandas-studio merged 2 commits into main from feat/cli-index-stream-progress-656
May 3, 2026

@pandas-studio
Collaborator

Stacked on #740 (rebases onto main once #740 merges).

Motivation

Direct CLI users running mm index <large-dir> previously saw nothing until
the run finished — index_path() blocks for the entire duration. On
multi-minute embedder runs (bge-m3 / CPU, large corpus) this looked hung.
Switch _index() to consume index_path_stream() and render the same
throttled progress bar that #740 just polished for the wizard's seed flow.

Design

Lifted the streaming + bar lifecycle into a new module
memtomem.cli._index_progress so both call sites (_seed_with_progress and
mm index's _index) drive identical UX (throttled chunk-progress label,
lazy bar creation, file-unit length). _collect_seed_scale moved with it,
re-exported from init_cmd so existing test imports keep working.
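
The shared lifecycle described above (lazy bar creation, throttled chunk-progress label, file-unit length) can be sketched roughly as follows. This is a minimal illustration, not the contents of memtomem.cli._index_progress: the event dictionaries, their field names, `FakeBar`, and `drive_bar` are all hypothetical, and a plain stand-in object is used where the real helper renders a click.progressbar.

```python
import time
from dataclasses import dataclass


@dataclass
class FakeBar:
    """Hypothetical stand-in for a progress bar measured in files."""

    length: int
    pos: int = 0
    label: str = ""
    closed: bool = False

    def update(self, n: int) -> None:
        self.pos += n

    def close(self) -> None:
        self.closed = True


def drive_bar(events, total_files, make_bar=FakeBar,
              min_interval=0.1, clock=time.monotonic):
    """Consume stream events and drive one bar; returns the bar (or None).

    Assumed event shapes (not taken from the real engine):
      {"type": "chunk_progress", "file": str, "done": int, "total": int}
      {"type": "progress"}   # one file finished
    """
    bar = None
    last_label_at = float("-inf")
    try:
        for event in events:
            if bar is None:
                # Lazy creation: nothing is drawn until the first event.
                bar = make_bar(length=total_files)
            if event["type"] == "chunk_progress":
                # Chunk events only refresh the label, and at most once
                # per min_interval seconds; they never advance the bar.
                now = clock()
                if now - last_label_at >= min_interval:
                    bar.label = (
                        f"{event['file']} "
                        f"({event['done']}/{event['total']} chunks)"
                    )
                    last_label_at = now
            elif event["type"] == "progress":
                # File events advance the bar by one file unit.
                bar.update(1)
    finally:
        # Close on every exit path, including KeyboardInterrupt.
        if bar is not None:
            bar.close()
    return bar
```

Keeping the bar length in file units means chunk events can arrive at any rate without distorting the percentage, and a single driver like this gives both call sites the same behavior.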

Only _index() is converted. The debounce closure at indexing.py:194
keeps the non-stream API since hook-driven drains (--flush /
--debounce-window) don't want progress noise. The non-stream
index_path() stays for tests, MCP mem_index, and the closure — issue
text covers this explicitly.

Invariants preserved

  • Summary line shape Indexed N file(s): N new, N unchanged, N deleted (Nms) — stable interface scripts may grep.
  • --force, --namespace, --recursive forward to index_path_stream.
  • Ctrl-C prints Cancelled. Resume with: mm index <path> (idempotent re-run via content-hash dedup).
  • Exit code unchanged on success / partial failure / hard failure.
  • _seed_with_progress keeps its zero-chunks warning, multi-path web hint, graceful-skip — verified by the 264 existing wizard tests (no regressions).
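
To illustrate why the summary-line shape and the Ctrl-C hint matter as stable interfaces, here is a small sketch of how a script might depend on them. `format_summary`, `SUMMARY_RE`, and `run_cancellable` are hypothetical names, and whether the real CLI swallows or re-raises KeyboardInterrupt after printing the hint is an assumption made only for this example.

```python
import re

# Pattern a downstream script might use to parse the summary line.
SUMMARY_RE = re.compile(
    r"^Indexed (\d+) file\(s\): (\d+) new, (\d+) unchanged, "
    r"(\d+) deleted \((\d+)ms\)$"
)


def format_summary(files, new, unchanged, deleted, elapsed_ms):
    """Render the summary line in the shape quoted in the invariants."""
    return (
        f"Indexed {files} file(s): {new} new, {unchanged} unchanged, "
        f"{deleted} deleted ({elapsed_ms}ms)"
    )


def run_cancellable(work, path, echo=print):
    """Run work(); on Ctrl-C, print the resume hint and report failure.

    Returning False instead of re-raising is an assumption of this sketch.
    """
    try:
        work()
    except KeyboardInterrupt:
        echo(f"Cancelled. Resume with: mm index {path}")
        return False
    return True
```

Any change to word order or punctuation in the summary line would make a pattern like `SUMMARY_RE` stop matching, which is the kind of breakage a legacy summary-line pin in the tests guards against.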

Test plan

  • 5 new tests in tests/test_indexing_cli.py: stream-event pass-through with legacy summary-line pin, error-line rendering, Ctrl-C resume hint, --namespace/--force kwarg propagation, default-no-namespace stub-engine compat.
  • Full wizard suite (264 tests in test_init_cmd.py) green, including the two new chunk-progress regression tests from #740.
  • test_cli_index_noop_e2e.py (inline + subprocess) green.
  • Full sweep: 3991 passed, 46 deselected.
  • ruff check and ruff format --check clean.
  • mypy clean on _index_progress.py, indexing.py, init_cmd.py.

Closes #656

🤖 Generated with Claude Code

@memtomem memtomem force-pushed the feat/cli-init-chunk-progress-655 branch 2 times, most recently from 4bea6da to 54a8693 Compare May 3, 2026 08:31
Owner

@memtomem memtomem left a comment

Solid extraction: the bar lifecycle is sound (the helper's finally: _close_bar() covers KeyboardInterrupt cleanup before propagation), the chunk-event no-advance contract is pinned by a test (bar.pos == 1 after 4 chunk events + 1 progress event), and the summary-line shape is preserved verbatim for grep-stable scripts.

Two follow-ups filed (separate issues) — non-blocking:

  1. namespace conditional kwarg shim in run_with_progress is a stub-compat workaround for wizard test fakes that predate the kwarg. Should be cleaned up by updating the fakes to accept it.
  2. _collect_seed_scale only counts .md files for bar length — fine for the wizard (markdown memos), but mm index ./src/ or mm index ./docs/api.json gets no bar at all. Engine-provided count would fix both this and the duplicated FS walk.
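
Both follow-ups can be pictured with a short sketch. The functions below are hypothetical reconstructions for illustration, not the code in run_with_progress or _collect_seed_scale: the first forwards namespace only when it is set, so older fakes that lack the parameter still work, and the second counts only .md files, which is why a source directory or a single JSON file would yield a length of zero.

```python
from pathlib import Path


def call_stream(stream_fn, path, *, force=False, namespace=None):
    """Forward namespace only when given (conditional kwarg shim).

    Older test fakes that do not accept a namespace parameter keep
    working as long as callers leave namespace as None.
    """
    kwargs = {"force": force}
    if namespace is not None:
        kwargs["namespace"] = namespace
    return stream_fn(path, **kwargs)


def count_markdown_files(root: Path) -> int:
    """Count .md files under root; other file types contribute nothing."""
    if root.is_file():
        return 1 if root.suffix == ".md" else 0
    return sum(1 for p in root.rglob("*.md") if p.is_file())
```

An engine-provided file count, as the review suggests, would remove the need for the second function entirely and avoid walking the filesystem twice.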

Borderline rule-of-three (two Python callers, JS as referenced third) is the right judgment call — both implementations would otherwise land identical in this cycle and the contract is well-specified by the chunk_progress event shape.

Approving.

Direct CLI users running ``mm index <large-dir>`` previously saw nothing
until the run finished — index_path() blocks for the entire duration. On
multi-minute embedder runs (bge-m3 / CPU, large corpus) this looked hung.

Switch _index() to consume index_path_stream() and render the same
throttled progress bar that #740 just polished for the wizard's seed flow.
Lift the streaming + bar lifecycle into a shared module
``memtomem.cli._index_progress`` so both call sites (``_seed_with_progress``
and ``mm index``) drive identical UX. _collect_seed_scale moves with it
(re-exported from init_cmd to keep test imports working).

Only ``_index`` is converted — the debounce closure at indexing.py:194
keeps the non-stream API since hook-driven drains don't want progress noise.
The non-stream index_path() stays for tests, MCP mem_index, and the closure.

Invariants preserved:
- Summary line shape "Indexed N file(s): N new, N unchanged, N deleted (Nms)"
  — stable interface that scripts may grep.
- --force / --namespace / --recursive flow through to index_path_stream.
- Ctrl-C prints the same yellow "Cancelled. Resume with: mm index <path>"
  hint as before; idempotent re-run via content-hash dedup.
- Exit code unchanged on success / partial failure / hard failure.
- _seed_with_progress retains its zero-chunks warning, multi-path web hint,
  and graceful-skip behavior — verified by the 264 existing wizard tests.

Tests: 5 new tests in tests/test_indexing_cli.py covering stream-event
pass-through with summary line pin, KeyboardInterrupt resume hint,
--namespace/--force kwarg propagation, and the namespace-None-omitted
stub-engine compat path. Full sweep: 3991 passed, 46 deselected.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
@memtomem memtomem force-pushed the feat/cli-index-stream-progress-656 branch from 53f6798 to 7cda14d Compare May 3, 2026 08:59
@pandas-studio pandas-studio changed the base branch from feat/cli-init-chunk-progress-655 to main May 3, 2026 09:14
@pandas-studio pandas-studio merged commit ca15f87 into main May 3, 2026
9 checks passed
@github-actions github-actions Bot locked and limited conversation to collaborators May 3, 2026