feat(wizard): explicit guardrails for failed init steps (closes #626)#661
Merged
feat(wizard): explicit guardrails for failed init steps (closes #626)#661
Conversation
The `mm init` wizard previously continued silently after a step failed, which is how the user reported scenario looked: a broken Ollama server caused `_ollama_has_model` to misread "server timed out" as "model not installed", the wizard offered a doomed `ollama pull`, the pull also failed, and the wizard then advanced straight to the Reranker step with no acknowledgement that anything was wrong. This change adds an explicit recovery prompt at each Ollama failure point. The user must now choose retry / back / quit — the wizard never silently moves past a failed action. Mechanism (`wizard.py`): - `StepRetry` exception, caught by `run_steps` to re-invoke the same step, or by the step itself to retry only an inner action without re-prompting earlier inputs. - `fail_step(message, *, retryable=True)` helper that surfaces a red failure message and prompts for r/b/q (or just b/q when `retryable=False`). Always raises — never returns. Wiring (`init_cmd.py`): - `_ollama_has_model` now returns `bool | None` so callers can distinguish "server unreachable / errored" (`None`) from "model absent" (`False`). The original `False`-on-error conflation was the root of the misread. - `_step_embedding` Ollama branch: server-reachability check, model verification, and `ollama pull` are each guarded with `fail_step`. The action region is wrapped in nested retry loops so the outer `StepRetry` re-runs server check + model select, and the inner `StepRetry` re-runs only the pull. - The misleading "Ollama not running. Starting 'ollama serve'..." message is dropped — the wizard never actually started the server, it only printed the line. The new path tells the user to start it themselves and offers retry once it is up. Incidental cleanup: the OpenAI branch's "API key invalid + decline to continue" path used `raise SystemExit(1)`. Switched to `WizardCancel` so cancellation is uniform with `q`. Exit code changes from 1 to 0 — this is a deliberate cancel, not an error. Out of scope (follow-up): silent-continue audit of other steps (`_step_memory_dir` dir creation, `_step_settings` writes), automatic backoff retry, and starting `ollama serve` from the wizard. Co-Authored-By: Claude <[email protected]>
…CHANGELOG Addresses self-review feedback on #661: - Drop the 2-line "Was raise SystemExit(1)" comment in init_cmd.py. CLAUDE.md prohibits removed-code comments; the context already lives in the commit history and PR body. - Add ``test_invalid_key_decline_continue_raises_wizard_cancel`` and ``test_invalid_key_decline_does_not_advance_to_next_step`` to pin the OpenAI bad-key + decline → ``WizardCancel`` change. The exit code 1 → 0 shift is now load-bearing in tests, so a future revert is loud. - Add ``test_has_model_none_r_retries_then_succeeds`` symmetric to the pull-failure retry test — covers the tri-state ``None`` retry path via the outer loop, complementing the existing ``b`` branch test. - Add a CHANGELOG entry under [Unreleased] / Changed documenting the wizard guardrail and the OpenAI exit-code behavior change so users monitoring exit codes notice on upgrade. Co-Authored-By: Claude <[email protected]>
Owner
Author
|
Admin-merging despite Evidence the failure is pre-existing and unrelated to this PR:
The flake is upstream of this PR and should be triaged separately. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to subscribe to this conversation on GitHub.
Already have an account?
Sign in.
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
StepRetryexception andfail_step()helper towizard.pyso a failed wizard step prompts the user to retry / back / quit instead of silently advancing._step_embedding's Ollama branch — server reachability check, model verification, andollama pullare now guarded._ollama_has_modelnow returnsbool | None(tri-state) so callers distinguish "server unreachable / errored" from "model absent". The originalFalse-on-error conflation is the root cause of the [feature]: add guardrails for failed steps in themm initwizard #626 misread (a server timeout was being interpreted as "model not installed", triggering a doomed pull and silent fall-through to the Reranker step).Reproduction (the #626 scenario)
After this PR the user sees:
rre-runs only the pull (model selection is preserved).breturns to the previous step.qcancels the wizard via the same path as theqshortcut.Incidental scope
The OpenAI branch's "API key invalid + decline to continue" path used
raise SystemExit(1); switched toWizardCancelfor uniform cancel UX. Exit code on this path changes from 1 to 0 — this is a deliberate cancel by the user, not an error. No tests pinned the previous exit code.Out of scope (follow-up)
_step_memory_dirdir creation,_step_settingswrites, etc.). The mechanism added here can be reused per step in a follow-up PR.ollama servefrom the wizard.Test plan
uv run ruff check + format --check packages/memtomem/{src,tests}cleanuv run pytest -m "not ollama"— 3873 passedStepRetry/fail_stepchoice→exception mapping (12 cases) and the_ollama_has_modeltri-state contract (5 cases)mm initwizard #626 invariant:_step_embeddingdoes NOT silently advance to the next step on Ollama failure (sentinel next-step check)mm init --advancedwith a fakeollamabinary, in 5 scenarios:q) — red✗ Ollama server is not reachable.prompt fires, wizard cancels (exit 0). TheStarting 'ollama serve'...lie is gone.b) — falls back to(already at first step), re-runs_step_embeddingfrom the top. Wizard never advances past it.rwith server still broken +q) — outer retry loop re-runs the server check only; the embedding-provider menu does NOT re-render.r+ still fails +q) — inner retry loop re-runs only the pull action; the model-selection menu is NOT re-prompted.b) — wizard returns to the embedding choice;_step_reranker(next step) never runs. This is the core [feature]: add guardrails for failed steps in themm initwizard #626 invariant.🤖 Generated with Claude Code