feat(cli): show chunk progress in mm init wizard's _seed_with_progress (closes #655) #740
Merged
pandas-studio merged 1 commit into main on May 3, 2026
memtomem (Owner) approved these changes on May 3, 2026 and left a comment:
Mirrors the web Index tab's chunk_progress pattern (app.js): 100ms throttle, final-tick bypass, reset on file boundary, lazy bar creation on first event of either type. Helper kept self-contained per #659 rule-of-three deferral — correct call, the third call-site (#656 mm index streaming) is the natural extraction trigger. Solid regression test for the 'label refresh, no advance' contract (chunk events must not double-count file-unit bar length). Approving.
a8fe4d5 to
4bea6da
PR #653 added per-chunk SSE progress (`chunk_progress` events) and surfaced it in the web Index tab. The wizard's `_seed_with_progress` consumed the same `index_path_stream` but ignored `chunk_progress`, so multi-minute embedder runs from the wizard looked hung: exactly the UX bug `chunk_progress` was added to fix, just on a different surface.

Adopt option 1 from issue #655 ("refresh sub-label without advance"): the bar's length stays in **file units** (no chunk pre-count), and `chunk_progress` events call `bar.update(0, (file, done, total))` to refresh the label only. `item_show_func` dispatches on `tuple` vs `str` so the existing `progress`-event path keeps the basename label.

Throttle/render rules mirror the web Index tab (`app.js` ~L4219-4256) so the two surfaces feel consistent: 100ms gap between intermediate renders via `time.monotonic()`, final tick (`chunks_done >= total`) bypasses throttle, clock resets to 0 on every `progress` event so the next file's first chunk renders immediately. `_ensure_bar()` extracted so the bar can come into existence on either the first `chunk_progress` or `progress` event (whichever arrives first).

Switched the basename render from `rsplit("/", 1)` to `Path(...).name` so Windows backslash paths display correctly. This is a drive-by fix made while the format function was being touched.

Server-side gating in `indexing/engine.py:786` already filters small files (default `progress_threshold=32`), so the CLI doesn't need to threshold itself; small files stay quiet, matching the web Index tab.

KeyboardInterrupt cleanup and the multi-path aggregation summary are unchanged.

Issue #659 (extract a shared throttle helper between this and the JS sites) becomes natural once #656 lands. This is the second of the three call-sites it tracks; defer per rule-of-three.

Closes #655

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
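The throttle rules in the commit message above could be sketched roughly as follows. This is an illustration only, not the code in `init_cmd.py`: the class name `ChunkRenderThrottle`, its method names, and the injectable `clock` parameter are all hypothetical. The commit message says the clock resets to 0; the sketch uses `None` for the same "nothing rendered for this file yet" state so it does not depend on the clock's absolute value.

```python
import time
from typing import Callable, Optional


class ChunkRenderThrottle:
    """Hypothetical helper: decides whether a chunk_progress event re-renders."""

    def __init__(
        self,
        min_gap_s: float = 0.1,  # 100ms gap between intermediate renders
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._min_gap_s = min_gap_s
        self._clock = clock
        # None means no chunk has been rendered for the current file.
        self._last_render: Optional[float] = None

    def reset(self) -> None:
        """Call on every file-level `progress` event (file boundary)."""
        self._last_render = None

    def should_render(self, chunks_done: int, total: int) -> bool:
        now = self._clock()
        is_final = chunks_done >= total          # final tick bypasses throttle
        is_first = self._last_render is None     # first chunk after a reset
        gap_elapsed = (
            self._last_render is not None
            and now - self._last_render >= self._min_gap_s
        )
        if is_final or is_first or gap_elapsed:
            self._last_render = now
            return True
        return False
```

Passing a fake `clock` makes the three rules (gap, final-tick bypass, reset on file boundary) checkable without sleeping, which is the main reason the sketch takes the clock as a parameter.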
4bea6da to
54a8693
Summary
The wizard's `_seed_with_progress` consumed `chunk_progress` SSE events from `index_path_stream` (added in #653) but ignored them, so multi-minute embedder runs from `mm init` looked hung: exactly the UX issue `chunk_progress` was introduced to fix, just on a different surface. This wires the wizard up to the same chunk-level signal the web Index tab uses.

Design
Option 1 from #655: refresh the bar's sub-label without advancing.
- `chunk_progress` -> `bar.update(0, (file, done, total))` re-renders the label only. `item_show_func` dispatches on `tuple` vs `str` so the existing `progress`-event path keeps the basename label.
- Throttle/render rules mirror the web Index tab (`app.js` ~L4219-4256): 100ms gap between intermediate renders via `time.monotonic()`, final tick (`chunks_done >= total`) bypasses throttle, clock resets to 0 on every `progress` event so the next file's first chunk renders immediately.
- `_ensure_bar()` extracted so the bar can lazily appear on either the first `chunk_progress` or `progress` event.
- Basename render switched from `rsplit("/", 1)` to `Path(...).name` so Windows backslash paths display correctly.

Server-side gating in `indexing/engine.py:786` already filters small files (default `progress_threshold=32`), so the CLI doesn't need to threshold itself; small files stay quiet, matching the web Index tab.

Invariants preserved
- KeyboardInterrupt cleanup (`_close_bar()` + `try/except` block intact).
- Multi-path aggregation summary (`complete` event feeds it).
- No changes to `index_path_stream` / `engine.py` / any web static file.

#659 (shared throttle helper between CLI and JS sites) deferred to rule-of-three; becomes natural after #656 lands. This is the second of the three call-sites it tracks.

Test plan
- Tests in `TestInitialSeedThreshold` pin the no-advance contract (chunk events leave `bar.pos == file_count`) and the no-regression contract (zero `chunk_progress` events == today's behavior).
- `uv run pytest packages/memtomem/tests/test_init_cmd.py -m "not ollama" -q` -> 264 passed.
- `uv run ruff check` + `ruff format --check` clean.
- `uv run mypy packages/memtomem/src/memtomem/cli/init_cmd.py` clean.

Closes #655
Co-Authored-By: Claude Opus 4.7 (1M context) [email protected]
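
For readers who want to see the "label refresh, no advance" contract in isolation, here is a rough sketch. It is not the test code from `test_init_cmd.py`: `StubBar` is a hypothetical stand-in for the real progress bar, the event dictionaries and their field names are assumed shapes, and the `(done/total chunks)` label format is invented for illustration. The sketch uses `PureWindowsPath` rather than `Path` so the backslash handling behaves the same on every platform.

```python
from pathlib import PureWindowsPath
from typing import Dict, Iterable, Optional, Tuple, Union

# A bar item is either a plain path (file-level event) or
# a (file, done, total) tuple (chunk-level event).
Item = Union[str, Tuple[str, int, int]]


def basename(path: str) -> str:
    # PureWindowsPath splits on both "/" and "\", unlike rsplit("/", 1).
    return PureWindowsPath(path).name


def show_item(item: Optional[Item]) -> str:
    """Dispatch on tuple vs str, as the Design section describes."""
    if item is None:
        return ""
    if isinstance(item, tuple):
        file, done, total = item
        return f"{basename(file)} ({done}/{total} chunks)"  # assumed format
    return basename(item)


class StubBar:
    """Hypothetical stand-in bar: tracks position and the last label."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.pos = 0
        self.label = ""

    def update(self, n_steps: int, current_item: Optional[Item] = None) -> None:
        self.pos += n_steps
        if current_item is not None:
            self.label = show_item(current_item)


def feed(bar: StubBar, events: Iterable[Dict]) -> None:
    """Apply events to the bar; event shapes are assumptions."""
    for ev in events:
        if ev["type"] == "chunk_progress":
            # Refresh the label only: zero steps, tuple item.
            bar.update(0, (ev["file"], ev["chunks_done"], ev["total"]))
        elif ev["type"] == "progress":
            # One file finished: advance by one file unit.
            bar.update(1, ev["file"])
```

Feeding the same two `progress` events with and without interleaved `chunk_progress` events should leave `pos` equal to the file count in both cases, which is the property the test plan describes.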