
feat(cli): show chunk progress in mm init wizard's _seed_with_progress (closes #655) #740

Merged
pandas-studio merged 1 commit into main from feat/cli-init-chunk-progress-655
May 3, 2026


@pandas-studio
Collaborator

Summary

The wizard's _seed_with_progress consumed the chunk_progress SSE events that #653 added to index_path_stream, but ignored them, so multi-minute embedder runs from mm init looked hung. That is exactly the UX issue chunk_progress was introduced to fix, just on a different surface. This PR wires the wizard up to the same chunk-level signal the web Index tab uses.

Design

Option 1 from #655: refresh the bar's sub-label without advancing.

  • Bar length stays in file units (no chunk pre-count).
  • chunk_progress -> bar.update(0, (file, done, total)) re-renders the label only.
  • item_show_func dispatches on tuple vs str so the existing progress-event path keeps the basename label.
  • Throttle/render rules mirror the web Index tab (app.js ~L4219-4256): 100ms gap between intermediate renders via time.monotonic(), final tick (chunks_done >= total) bypasses throttle, clock resets to 0 on every progress event so the next file's first chunk renders immediately.
  • _ensure_bar() extracted so the bar can lazily appear on either the first chunk_progress or progress event.
  • Drive-by: switched basename render from rsplit("/", 1) to Path(...).name so Windows backslash paths display correctly.
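The throttle rules above can be sketched as a small gate with an injectable clock. This is a minimal sketch under stated assumptions, not the code in init_cmd.py: the class name ChunkRenderThrottle and its methods are hypothetical, and it uses a None sentinel where the PR resets the clock to 0. Both make the next file's first chunk render immediately; the sentinel simply does not depend on the origin of time.monotonic().

```python
import time
from typing import Callable, Optional


class ChunkRenderThrottle:
    """Hypothetical gate deciding whether a chunk_progress event re-renders the label."""

    def __init__(
        self,
        interval: float = 0.1,  # 100ms between intermediate renders
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.interval = interval
        self._clock = clock
        # None = nothing rendered since the last file boundary.
        self._last_render: Optional[float] = None

    def should_render(self, chunks_done: int, total: int) -> bool:
        now = self._clock()
        is_final = chunks_done >= total  # final tick bypasses the throttle
        is_due = self._last_render is None or now - self._last_render >= self.interval
        if is_final or is_due:
            self._last_render = now
            return True
        return False

    def reset(self) -> None:
        """Call on every `progress` event so the next file's first chunk renders at once."""
        self._last_render = None


# Usage with a scripted clock (seconds):
ticks = iter([10.00, 10.05, 10.12, 10.13, 10.14])
throttle = ChunkRenderThrottle(clock=lambda: next(ticks))
decisions = [
    throttle.should_render(1, 4),  # first chunk of the file: render
    throttle.should_render(2, 4),  # 50ms after the last render: skip
    throttle.should_render(3, 4),  # 120ms after the last render: render
    throttle.should_render(4, 4),  # final tick: render regardless of the gap
]
throttle.reset()  # `progress` event: the next file begins
decisions.append(throttle.should_render(1, 8))  # first chunk after reset: render
print(decisions)  # [True, False, True, True, True]
```

In a consumer along these lines, a True decision would translate into bar.update(0, (file, done, total)), and each progress event would call reset() before advancing the bar by one file.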

Server-side gating in indexing/engine.py:786 already filters small files (default progress_threshold=32), so the CLI doesn't need to threshold itself — small files stay quiet, matching the web Index tab.

Invariants preserved

  • KeyboardInterrupt cleanup unchanged (_close_bar() + try/except block intact).
  • Multi-path aggregation summary line unchanged (only the complete event feeds it).
  • No changes to index_path_stream / engine.py / any web static file.
  • #659 (a shared throttle helper for the CLI and JS call-sites) is deferred per the rule of three; extraction becomes natural once #656 lands. This PR is the second of the three call-sites it tracks.

Test plan

  • Two new tests in TestInitialSeedThreshold pin the no-advance contract (chunk events leave bar.pos == file_count) and the no-regression contract (zero chunk_progress events == today's behavior).
  • uv run pytest packages/memtomem/tests/test_init_cmd.py -m "not ollama" -q -> 264 passed.
  • uv run ruff check and uv run ruff format --check -> clean.
  • uv run mypy packages/memtomem/src/memtomem/cli/init_cmd.py clean.
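The two contracts in the test plan can be illustrated with a fake bar and a stripped-down consumer. This is a sketch under stated assumptions, not the tests added in TestInitialSeedThreshold: FakeBar, consume, and the event payload keys (file, chunks_done, total) are hypothetical, and the throttle is omitted because it only decides whether a label refresh happens, never how far the bar moves.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple

# Hypothetical event shape: (event_type, payload).
Event = Tuple[str, Dict[str, Any]]


class FakeBar:
    """Stand-in for a click progress bar: records position and items shown."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.pos = 0
        self.items: List[Any] = []

    def update(self, n_steps: int, current_item: Any = None) -> None:
        self.pos += n_steps
        self.items.append(current_item)


def consume(events: Iterable[Event], file_count: int) -> Optional[FakeBar]:
    """Simplified option-1 consumer: chunk events refresh, progress events advance."""
    bar: Optional[FakeBar] = None

    def ensure_bar() -> FakeBar:
        # Lazily create the bar on whichever event type arrives first.
        nonlocal bar
        if bar is None:
            bar = FakeBar(length=file_count)
        return bar

    for kind, data in events:
        if kind == "chunk_progress":
            # Zero steps: the label changes, the file-unit position does not.
            ensure_bar().update(0, (data["file"], data["chunks_done"], data["total"]))
        elif kind == "progress":
            ensure_bar().update(1, data["file"])
    return bar


events: List[Event] = [
    ("chunk_progress", {"file": "a.md", "chunks_done": 1, "total": 2}),
    ("chunk_progress", {"file": "a.md", "chunks_done": 2, "total": 2}),
    ("progress", {"file": "a.md"}),
    ("progress", {"file": "b.md"}),
]

# No-advance contract: chunk events leave pos == file_count.
with_chunks = consume(events, file_count=2)
assert with_chunks is not None and with_chunks.pos == 2

# No-regression contract: without chunk events the result is the same.
plain = consume([e for e in events if e[0] == "progress"], file_count=2)
assert plain is not None and plain.pos == 2
```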

Closes #655

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>

memtomem (Owner) left a comment


Mirrors the web Index tab's chunk_progress pattern (app.js): 100ms throttle, final-tick bypass, reset on file boundary, lazy bar creation on first event of either type. Helper kept self-contained per #659 rule-of-three deferral — correct call, the third call-site (#656 mm index streaming) is the natural extraction trigger. Solid regression test for the 'label refresh, no advance' contract (chunk events must not double-count file-unit bar length). Approving.

memtomem force-pushed the feat/cli-init-chunk-progress-655 branch from a8fe4d5 to 4bea6da on May 3, 2026 08:17
PR #653 added per-chunk SSE progress (`chunk_progress` events) and
surfaced it in the web Index tab. The wizard's `_seed_with_progress`
consumed the same `index_path_stream` but ignored `chunk_progress`,
so multi-minute embedder runs from the wizard looked hung — exactly
the UX bug `chunk_progress` was added to fix, just on a different
surface.

Adopt option 1 from issue #655 ("refresh sub-label without advance"):
the bar's length stays in **file units** (no chunk pre-count), and
`chunk_progress` events call `bar.update(0, (file, done, total))` to
refresh the label only. `item_show_func` dispatches on `tuple` vs
`str` so the existing `progress`-event path keeps the basename label.
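A hypothetical `item_show_func` that dispatches on `tuple` vs `str` might look like this. The function name `show_item`, the `ChunkItem` alias, and the "(done/total chunks)" label format are assumptions; click calls `item_show_func` with the bar's current item (which can be `None`) and shows whatever string it returns.

```python
from pathlib import Path
from typing import Optional, Tuple, Union

# Hypothetical alias for the tuple passed via bar.update(0, (file, done, total)).
ChunkItem = Tuple[str, int, int]


def show_item(item: Optional[Union[str, ChunkItem]]) -> Optional[str]:
    """Label for the bar's current item; a sketch of the tuple-vs-str dispatch."""
    if item is None:
        # click passes None when no current item is set; show no label.
        return None
    if isinstance(item, tuple):
        # chunk_progress path: basename plus a chunk counter (format is assumed).
        file, done, total = item
        return f"{Path(file).name} ({done}/{total} chunks)"
    # progress path: unchanged basename label.
    return Path(item).name


print(show_item("notes/2026/todo.md"))      # todo.md
print(show_item(("notes/big.md", 12, 40)))  # big.md (12/40 chunks)
```

Assuming the wizard builds its bar with `click.progressbar(length=file_count, item_show_func=...)`, a function of this shape would be passed as the `item_show_func` argument.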

Throttle/render rules mirror the web Index tab (`app.js` ~L4219-4256)
so the two surfaces feel consistent: 100ms gap between intermediate
renders via `time.monotonic()`, final tick (`chunks_done >= total`)
bypasses throttle, clock resets to 0 on every `progress` event so
the next file's first chunk renders immediately. `_ensure_bar()`
extracted so the bar can come into existence on either the first
`chunk_progress` or `progress` event (whichever arrives first).

Switched the basename render from `rsplit("/", 1)` to `Path(...).name`
so Windows backslash paths display correctly — drive-by fix while
the format function was being touched.
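Why the switch matters can be shown with pathlib's pure path classes, which parse Windows and POSIX paths on any OS. This assumes the old code took the last element of `rsplit("/", 1)`; the sample path is illustrative.

```python
from pathlib import PurePosixPath, PureWindowsPath

win_path = r"C:\Users\me\notes\todo.md"  # illustrative path

# Old approach (assumed): keep what follows the last "/". A backslash path has
# no "/", so the whole path becomes the label.
old_label = win_path.rsplit("/", 1)[-1]

# Path(...) is a WindowsPath on Windows; PureWindowsPath applies the same
# parsing on any OS, so this line behaves identically everywhere.
new_label = PureWindowsPath(win_path).name

# POSIX-style paths are unaffected by the switch.
posix_label = PurePosixPath("notes/todo.md").name

print(old_label)    # C:\Users\me\notes\todo.md
print(new_label)    # todo.md
print(posix_label)  # todo.md
```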

Server-side gating in `indexing/engine.py:786` already filters small
files (default `progress_threshold=32`), so the CLI doesn't need to
threshold itself; small files stay quiet, matching the web Index
tab. KeyboardInterrupt cleanup and the multi-path aggregation summary
are unchanged.

Issue #659 (extract a shared throttle helper between this and the JS
sites) becomes natural once #656 lands — this is the second of the
three call-sites it tracks; defer per rule-of-three.

Closes #655

Co-Authored-By: Claude Opus 4.7 (1M context) <[email protected]>
memtomem force-pushed the feat/cli-init-chunk-progress-655 branch from 4bea6da to 54a8693 on May 3, 2026 08:31
pandas-studio merged commit 7eaf6bd into main on May 3, 2026
9 checks passed
github-actions (bot) locked and limited the conversation to collaborators on May 3, 2026

Development

Successfully merging this pull request may close these issues.

feat(cli): show chunk progress in mm init wizard's _seed_with_progress

2 participants