feat(cli): convert mm index to stream + click.progressbar (closes #656) #741
pandas-studio merged 2 commits into main (May 3, 2026).
Branch updated from 4bea6da to 54a8693.
memtomem (Owner) approved these changes on May 3, 2026:
Solid extraction. The bar lifecycle is sound (the helper's `finally: _close_bar()` covers KeyboardInterrupt cleanup before propagation), the chunk-event no-advance contract is pinned by a test (`bar.pos == 1` after 4 chunk events + 1 progress event), and the summary-line shape is preserved verbatim for grep-stable scripts.
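The lifecycle the review describes (lazy bar creation, chunk events that relabel without advancing, a `finally` that closes the bar before a KeyboardInterrupt propagates) can be sketched as follows. This is not the code in `memtomem.cli._index_progress`: the function name `drive_bar`, the event fields (`done`/`total` on `chunk_progress`, and a `file_done` event type) and the `FakeBar` stand-in are all assumptions. `FakeBar` mirrors the `pos`, `label`, and `update()` members of click's `ProgressBar` so the sketch runs without click installed.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class FakeBar:
    """Hypothetical stand-in for a click ProgressBar (pos, label, update)."""
    length: int
    pos: int = 0
    label: str = ""
    closed: bool = False

    def update(self, n: int) -> None:
        self.pos += n

    def close(self) -> None:
        self.closed = True


def drive_bar(
    events: Iterable[dict],
    total_files: Optional[int],
    make_bar: Callable[[int], FakeBar],
) -> Optional[FakeBar]:
    """Drive a file-unit progress bar from a stream of index events.

    Assumed event shapes (hypothetical):
      {"type": "chunk_progress", "done": int, "total": int}  -> relabel only
      {"type": "file_done"}                                 -> advance one file
    """
    bar: Optional[FakeBar] = None

    def _close_bar() -> None:
        # Reads the enclosing `bar` at call time, so it sees the lazily created bar.
        if bar is not None and not bar.closed:
            bar.close()

    try:
        for event in events:
            # Lazy creation: no bar until the first event, and none at all
            # when the file count is unknown or zero.
            if bar is None and total_files:
                bar = make_bar(total_files)
            if bar is None:
                continue
            if event["type"] == "chunk_progress":
                # Chunk events update the label but never advance the bar,
                # so its length stays in file units.
                bar.label = f"chunks {event['done']}/{event['total']}"
            elif event["type"] == "file_done":
                bar.update(1)
    finally:
        # Runs on normal completion and on KeyboardInterrupt, before it propagates.
        _close_bar()
    return bar
```

Feeding four `chunk_progress` events and one `file_done` event leaves `bar.pos == 1`, the same shape as the pinned test the review mentions.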
Two non-blocking follow-ups, filed as separate issues:
- The `namespace`-conditional kwarg shim in `run_with_progress` is a stub-compat workaround for wizard test fakes that predate the kwarg. It should be cleaned up by updating the fakes to accept it.
- `_collect_seed_scale` only counts `.md` files for bar length. That is fine for the wizard (markdown memos), but `mm index ./src/` or `mm index ./docs/api.json` gets no bar at all. An engine-provided count would fix both this and the duplicated FS walk.
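The conditional-kwarg shim in the first follow-up can be illustrated with a minimal sketch. All names here (`stream_kwargs`, `LegacyFakeEngine`, `CurrentFakeEngine`) are hypothetical: `namespace` is forwarded only when it is set, so a fake whose `index_path_stream` predates the parameter still binds when no namespace is given, and fails with `TypeError` once one is passed.

```python
from typing import Any, Dict, Optional


def stream_kwargs(*, force: bool, recursive: bool, namespace: Optional[str]) -> Dict[str, Any]:
    """Build kwargs for a stream call, omitting `namespace` when it is None (hypothetical)."""
    kwargs: Dict[str, Any] = {"force": force, "recursive": recursive}
    if namespace is not None:
        kwargs["namespace"] = namespace
    return kwargs


class LegacyFakeEngine:
    # Predates the namespace kwarg, like the wizard test fakes the review describes.
    def index_path_stream(self, path, *, force=False, recursive=True):
        yield {"type": "file_done", "path": path}


class CurrentFakeEngine:
    # Accepts namespace=None, which is what updating the fakes would look like.
    def index_path_stream(self, path, *, force=False, recursive=True, namespace=None):
        yield {"type": "file_done", "path": path, "namespace": namespace}
```

Once every fake accepts `namespace=None` (as `CurrentFakeEngine` does), callers can always pass the kwarg and the conditional in `stream_kwargs` becomes unnecessary, which is the cleanup the review asks for.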
Extracting at a borderline rule-of-three (two Python callers, with the JS implementation as the referenced third) is the right judgment call: both implementations would otherwise land identical in this cycle, and the contract is well specified by the `chunk_progress` event shape.
Approving.
Direct CLI users running ``mm index <large-dir>`` previously saw nothing until the run finished, because index_path() blocks for the entire duration. On multi-minute embedder runs (bge-m3 / CPU, large corpus) this looked hung.

Switch _index() to consume index_path_stream() and render the same throttled progress bar that #740 just polished for the wizard's seed flow. Lift the streaming + bar lifecycle into a shared module ``memtomem.cli._index_progress`` so both call sites (``_seed_with_progress`` and ``mm index``) drive identical UX. _collect_seed_scale moves with it (re-exported from init_cmd to keep test imports working).

Only ``_index`` is converted; the debounce closure at indexing.py:194 keeps the non-stream API since hook-driven drains don't want progress noise. The non-stream index_path() stays for tests, MCP mem_index, and the closure.

Invariants preserved:
- Summary line shape "Indexed N file(s): N new, N unchanged, N deleted (Nms)" is unchanged; it is a stable interface that scripts may grep.
- --force / --namespace / --recursive flow through to index_path_stream.
- Ctrl-C prints the same yellow "Cancelled. Resume with: mm index <path>" hint as before; re-running is idempotent via content-hash dedup.
- Exit code unchanged on success / partial failure / hard failure.
- _seed_with_progress retains its zero-chunks warning, multi-path web hint, and graceful-skip behavior, verified by the 264 existing wizard tests.

Tests: 5 new tests in tests/test_indexing_cli.py covering stream-event pass-through with summary line pin, KeyboardInterrupt resume hint, --namespace/--force kwarg propagation, and the namespace-None-omitted stub-engine compat path. Full sweep: 3991 passed, 46 deselected.

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
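Because the summary line is treated as a stable interface, a script might consume it along these lines. This is a sketch: `format_summary` and `parse_summary` are hypothetical helpers, not part of memtomem, and the regex is derived from the template "Indexed N file(s): N new, N unchanged, N deleted (Nms)" on the assumption that every N is a plain integer (a fractional millisecond value would not match).

```python
import re

# Derived from the documented template; assumes integer counts and integer milliseconds.
SUMMARY_RE = re.compile(
    r"^Indexed (?P<files>\d+) file\(s\): (?P<new>\d+) new, "
    r"(?P<unchanged>\d+) unchanged, (?P<deleted>\d+) deleted \((?P<ms>\d+)ms\)$"
)


def format_summary(files: int, new: int, unchanged: int, deleted: int, ms: int) -> str:
    """Render a line in the documented summary shape (hypothetical helper)."""
    return f"Indexed {files} file(s): {new} new, {unchanged} unchanged, {deleted} deleted ({ms}ms)"


def parse_summary(output: str) -> dict:
    """Return the counts from the first summary line in CLI output, or {} if none is found."""
    for line in output.splitlines():
        match = SUMMARY_RE.match(line.strip())
        if match:
            return {key: int(value) for key, value in match.groupdict().items()}
    return {}
```

A cancelled run prints the resume hint instead of a summary, so `parse_summary` returns an empty dict for it and a script can treat that as "not finished".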
Branch updated from 53f6798 to 7cda14d.
Stacked on #740 (rebases onto main once #740 merges).
Motivation
Direct CLI users running `mm index <large-dir>` previously saw nothing until the run finished, because `index_path()` blocks for the entire duration. On multi-minute embedder runs (bge-m3 / CPU, large corpus) this looked hung.
Switch `_index()` to consume `index_path_stream()` and render the same throttled progress bar that #740 just polished for the wizard's seed flow.
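Throttling matters because chunk events can arrive far faster than the terminal needs redrawing. The PR does not say how its throttle works; the sketch below assumes a simple time-based gate, using a hypothetical `LabelThrottle` class with an assumed 100 ms default interval and an injectable clock so it can be tested deterministically. `force=True` lets a caller always render a final state.

```python
import time
from typing import Callable, Optional


class LabelThrottle:
    """Allow at most one label refresh per `interval` seconds (hypothetical helper)."""

    def __init__(self, interval: float = 0.1, clock: Callable[[], float] = time.monotonic):
        self.interval = interval
        self.clock = clock
        self._last: Optional[float] = None

    def should_update(self, force: bool = False) -> bool:
        """Return True if the label should be redrawn now, and record the redraw time."""
        now = self.clock()
        if force or self._last is None or now - self._last >= self.interval:
            self._last = now
            return True
        return False
```

A caller would check `should_update()` on each `chunk_progress` event and only then set the bar label, while file-completion events advance the bar unconditionally.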
Design
Lifted the streaming + bar lifecycle into a new module, `memtomem.cli._index_progress`, so both call sites (`_seed_with_progress` and `mm index`'s `_index`) drive identical UX: throttled chunk-progress label, lazy bar creation, and file-unit length. `_collect_seed_scale` moved with it and is re-exported from `init_cmd` so existing test imports keep working.

Only `_index()` is converted. The debounce closure at `indexing.py:194` keeps the non-stream API, since hook-driven drains (`--flush` / `--debounce-window`) don't want progress noise. The non-stream `index_path()` stays for tests, MCP `mem_index`, and the closure; the issue text covers this explicitly.
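The bar length is measured in files, which `_collect_seed_scale` derives from its own filesystem walk. The sketch below uses hypothetical `count_files` / `bar_length` helpers (not the real function) to show why a markdown-only count leaves `mm index ./src/` without a bar, as the review's second follow-up notes, and how counting every file changes that. It still performs a separate walk; the review's preferred fix, an engine-provided count, would avoid that duplication.

```python
from pathlib import Path
from typing import Iterable, Optional, Set


def count_files(paths: Iterable[Path], suffixes: Optional[Set[str]] = None) -> int:
    """Count regular files under `paths`, optionally filtered by suffix.

    suffixes={".md"} mirrors the markdown-only counting described for
    _collect_seed_scale; suffixes=None counts every regular file.
    """
    total = 0
    for root in paths:
        candidates = [root] if root.is_file() else root.rglob("*")
        for p in candidates:
            if p.is_file() and (suffixes is None or p.suffix in suffixes):
                total += 1
    return total


def bar_length(paths: Iterable[Path], suffixes: Optional[Set[str]] = None) -> Optional[int]:
    """Bar length in file units; None means "skip the bar" (lazy creation never fires)."""
    n = count_files(paths, suffixes)
    return n or None
```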
Invariants preserved
- `Indexed N file(s): N new, N unchanged, N deleted (Nms)`: stable interface scripts may grep.
- `--force`, `--namespace`, `--recursive` forward to `index_path_stream`.
- Ctrl-C prints `Cancelled. Resume with: mm index <path>` (idempotent re-run via content-hash dedup).
- `_seed_with_progress` keeps its zero-chunks warning, multi-path web hint, and graceful-skip behavior, verified by the 264 existing wizard tests (no regressions).

Test plan
- `tests/test_indexing_cli.py`: stream-event pass-through with legacy summary-line pin, error-line rendering, Ctrl-C resume hint, `--namespace`/`--force` kwarg propagation, default-no-namespace stub-engine compat.
- Wizard suite (`test_init_cmd.py`) green, including the two new chunk-progress regression tests from #740 (feat(cli): show chunk progress in mm init wizard's `_seed_with_progress`, closes #655).
- `test_cli_index_noop_e2e.py` (inline + subprocess) green.
- `ruff check` and `ruff format --check` clean.
- `mypy` clean on `_index_progress.py`, `indexing.py`, `init_cmd.py`.

Closes #656
🤖 Generated with Claude Code