cli: use engine discovery count for mm index bar length (replace _collect_seed_scale .md-only walk) #743

@pandas-studio

Context

PR #741 reuses `_collect_seed_scale` (originally a wizard helper for markdown-memo seeding) as the bar-length pre-computation for `mm index`:

```python
expected_total = _collect_seed_scale(resolved)[0]  # cli/indexing.py
```

`_collect_seed_scale` only counts `.md` files. For the wizard's "seed initial markdown memos" workflow that is correct. For `mm index` it is not:

  • `mm index ./docs/api.json` → `expected_total = 0` → no bar renders. The user sees nothing until the summary line.
  • `mm index ./src/` (Python project) → same result: `expected_total = 0` and no bar, so long runs look hung.
  • `mm index ./large-corpus/` (mixed file types) → the bar undercounts, because only the `.md` files are included in its length.
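To make the undercount concrete, here is a minimal repro sketch. `count_md_only` is a hypothetical stand-in for an `.md`-only count (it is not the real `_collect_seed_scale`, whose body is not shown here), and `count_all_files` assumes, purely for illustration, that the engine would process every file.

```python
import tempfile
from pathlib import Path


def count_md_only(root: Path) -> int:
    """Hypothetical stand-in for an .md-only count (not the real helper)."""
    if root.is_file():
        return 1 if root.suffix == ".md" else 0
    return sum(1 for p in root.rglob("*.md") if p.is_file())


def count_all_files(root: Path) -> int:
    """Illustrative assumption: the engine processes every file under root."""
    if root.is_file():
        return 1
    return sum(1 for p in root.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    # A Python-only tree: the .md-only count is zero, so no bar would render.
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "a.py").write_text("x = 1\n")
    (src / "b.py").write_text("y = 2\n")
    src_md, src_all = count_md_only(src), count_all_files(src)

    # A mixed tree: the .md-only count is non-zero but too small.
    mixed = Path(tmp) / "large-corpus"
    mixed.mkdir()
    (mixed / "notes.md").write_text("# notes\n")
    (mixed / "api.json").write_text("{}\n")
    (mixed / "tool.py").write_text("z = 3\n")
    mixed_md, mixed_all = count_md_only(mixed), count_all_files(mixed)
```

In the first tree the two counts disagree completely (no bar at all); in the second they disagree partially (a bar that is too short).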

Two issues, one root cause

  1. UX: bar disappears for non-`.md` corpora.
  2. Perf: `_collect_seed_scale` does a separate `rglob("*.md")` walk before the engine's own walk — duplicated I/O on huge trees.

Suggested direction

Have `IndexEngine.index_path_stream` emit an early `discovery` event (or include a `files_total` field on the first `progress` event) carrying the count of files the engine plans to process, using whatever extension filter the engine actually applies. The progress helper in `cli/_index_progress.py` then consumes that count to create the bar lazily with the correct length, instead of pre-computing it with a separate walk.

This collapses the duplicate walk and fixes the non-`.md` case in one shot.
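A rough sketch of the shape this could take, under stated assumptions. Only the `discovery`/`progress` event names and the `files_total` field come from the proposal above; `StreamEvent`, `FakeBar`, `consume`, the `extensions` parameter, and the use of `rglob` inside the stream are hypothetical and do not reflect the actual `IndexEngine` API.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path
from typing import Iterable, Iterator, Optional


@dataclass
class StreamEvent:
    """Hypothetical event type; the real stream's event shape may differ."""
    kind: str                      # "discovery" or "progress"
    files_total: int = 0           # meaningful on "discovery" only
    path: Optional[Path] = None    # meaningful on "progress" only


def index_path_stream(root: Path, extensions: Iterable[str]) -> Iterator[StreamEvent]:
    """Toy stand-in for the engine stream: one walk, one discovery event."""
    allowed = set(extensions)
    candidates = [root] if root.is_file() else sorted(root.rglob("*"))
    files = [p for p in candidates if p.is_file() and p.suffix in allowed]
    # The count uses the same filter as the loop below, so they cannot drift.
    yield StreamEvent("discovery", files_total=len(files))
    for path in files:
        # Real indexing work would happen here.
        yield StreamEvent("progress", path=path)


class FakeBar:
    """Minimal stand-in for whatever progress bar the CLI uses."""

    def __init__(self, length: int) -> None:
        self.length = length
        self.pos = 0

    def update(self, n: int = 1) -> None:
        self.pos += n


def consume(events: Iterable[StreamEvent]) -> Optional[FakeBar]:
    """Create the bar lazily, only once the stream reports a non-zero total."""
    bar: Optional[FakeBar] = None
    for event in events:
        if event.kind == "discovery" and event.files_total > 0:
            bar = FakeBar(event.files_total)
        elif event.kind == "progress" and bar is not None:
            bar.update(1)
    return bar


with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "src"
    src.mkdir()
    (src / "a.py").write_text("x = 1\n")
    (src / "b.py").write_text("y = 2\n")
    (src / "notes.txt").write_text("ignored by the filter\n")
    bar = consume(index_path_stream(src, {".py", ".json", ".md"}))
```

Because the total is derived from the same file list the stream iterates, the bar length matches the number of `progress` events by construction, and the consumer never needs to walk the tree itself.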

Files (likely)

  • `packages/memtomem/src/memtomem/indexing/engine.py` — emit discovery count
  • `packages/memtomem/src/memtomem/cli/_index_progress.py` — consume discovery event, drop `expected_total` parameter (or accept it as override hint)
  • Wizard caller can stop using its own `_collect_seed_scale` sum for the bar length too (the count is still needed for the seed-or-skip gate, see Acceptance)

Acceptance

  • `mm index ./src/` shows a progress bar
  • No duplicate `rglob` per index invocation
  • Wizard seed flow still pre-computes the count for the file-count gate decision (the seed-or-skip threshold logic), but the bar length comes from the stream
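One possible way to check the "no duplicate `rglob`" criterion in a test is to count calls to `Path.rglob` during a run. This is only a sketch: `precount_then_index` and `single_walk_index` are hypothetical stand-ins for the current and proposed behavior, and it assumes the engine walks via `Path.rglob` (if it uses `os.walk` or similar, the patch target would need to change).

```python
import tempfile
from pathlib import Path
from unittest import mock


def count_rglob_calls(fn, root: Path) -> int:
    """Run fn(root) and return how many times Path.rglob was called."""
    calls = []
    original = Path.rglob

    def counting_rglob(self, pattern, **kwargs):
        calls.append(pattern)
        return original(self, pattern, **kwargs)

    # Patch at class level so every Path instance goes through the counter.
    with mock.patch.object(Path, "rglob", counting_rglob):
        fn(root)
    return len(calls)


def precount_then_index(root: Path):
    """Hypothetical shape of the current flow: .md pre-count, then a walk."""
    expected_total = sum(1 for _ in root.rglob("*.md"))
    files = [p for p in root.rglob("*") if p.is_file()]
    return expected_total, files


def single_walk_index(root: Path):
    """Hypothetical shape of the proposed flow: the count comes from the walk."""
    files = [p for p in root.rglob("*") if p.is_file()]
    return len(files), files


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.py").write_text("x = 1\n")
    double_walks = count_rglob_calls(precount_then_index, root)
    single_walks = count_rglob_calls(single_walk_index, root)
```

A real test would call the actual `mm index` entry point in place of the stand-ins and assert that the call count is exactly one.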

References
