docs: slim READMEs, add configuration + embeddings guides #6

Merged
tsdata merged 1 commit into main from docs/restructure-readmes-with-guides on Apr 9, 2026

tsdata (Collaborator) commented on Apr 9, 2026

Summary

Both READMEs were doubling as user manuals. The PyPI inner README (377 lines) included a 47-line env var reference and 100 lines of tool usage examples; the top-level GitHub README (251 lines) duplicated much of it. New users had to scroll past hundreds of lines of reference material to figure out whether memtomem was for them.

This PR slims both READMEs to landing-page form, extracts the reference material into dedicated guides under docs/guides/, and verifies every documented command, flag, and env var against the actual source code first.

Sizes

| File | Before | After | Δ |
|---|---|---|---|
| packages/memtomem/README.md (PyPI) | 377 | 87 | -77% |
| README.md (top-level / GitHub) | 251 | 129 | -49% |
| docs/guides/configuration.md (NEW) | | 106 | new |
| docs/guides/embeddings.md (NEW) | | 77 | new |

What's in the new READMEs

packages/memtomem/README.md (PyPI landing page)

Tagline → "Built for:" personas → install (MCP server + optional CLI) → 3-call quickstart → key features list with doc links → documentation tree (absolute GitHub URLs because PyPI doesn't resolve relative links to the repo) → license. That's it.

README.md (GitHub landing page)

Existing "Why memtomem?" table → 3-step quickstart → features list (now with optional STM pointer) → documentation tree → contributing block. Removed the embedded embedding-models table and glossary; those moved into docs/guides.

What moved out

| Removed from | Moved to |
|---|---|
| 47-line "Environment Variables" section (PyPI README) | docs/guides/configuration.md |
| 30-line "Embedding Providers" section (PyPI README) + 6-line "Embedding Models" table (top-level README) | docs/guides/embeddings.md |
| 100-line "Key Tool Usage Examples" section (PyPI README) | already covered by docs/guides/user-guide.md (deleted to avoid drift) |
| 16-line "CLI Usage" section (PyPI README) | already covered by docs/guides/getting-started.md (deleted) |
| 12-line MCP tool category tables (PyPI README) | already covered by docs/guides/user-guide.md |
| "Glossary" (top-level README) | already covered by docs/guides/user-guide.md (deleted) |

"Docs as tests" — verified before writing

  • All mm subcommands listed in the CLI references exist in mm --help: add, config, context, embedding-reset, index, init, recall, search, shell, watchdog, web. (No mm stm: it was already removed in #5, "docs: post-STM-extraction cleanup + ground-truth count fixes".)
  • The mm embedding-reset --mode values status, apply-current, and revert-to-stored all exist (verified via --help).
  • mm context subcommands: detect, diff, generate, init, sync.
  • All 22 MEMTOMEM_* env vars in configuration.md match pydantic config classes (StorageConfig, EmbeddingConfig, SearchConfig, IndexingConfig, DecayConfig, MMRConfig, NamespaceConfig, ContextWindowConfig). Env prefix MEMTOMEM_, nested delimiter __, both confirmed via Mem2MemConfig.model_config.
  • All embedding model dimensions match what each provider actually returns (nomic-embed-text 768, bge-m3 1024, text-embedding-3-small 1536, text-embedding-3-large 3072).
  • Tool counts (72 / 9 / ~32 / 63 actions) match the values verified in #5 ("docs: post-STM-extraction cleanup + ground-truth count fixes") against tool_registry.ACTIONS and the full-mode tool count.
  • Test count 886 verified via pytest --co -q.
  • [all] extra exists in packages/memtomem/pyproject.toml.
  • All guide links in both READMEs point to existing docs/guides/*.md files.
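
To make the env-var convention above concrete, here is a minimal stdlib-only sketch of how a `MEMTOMEM_` prefix combined with a `__` nested delimiter maps flat environment variables onto nested config sections. It is an illustration of the convention, not memtomem's loader (which the PR says is driven by `Mem2MemConfig.model_config`); the function name and the example variable `MEMTOMEM_EMBEDDING__MODEL` are assumptions made for this sketch.

```python
from typing import Any, Dict, Mapping

ENV_PREFIX = "MEMTOMEM_"   # prefix stated in the PR description
NESTED_DELIMITER = "__"    # nested delimiter stated in the PR description


def nested_settings(environ: Mapping[str, str]) -> Dict[str, Any]:
    """Group prefixed env vars into nested dicts (hypothetical helper)."""
    settings: Dict[str, Any] = {}
    for key, value in environ.items():
        if not key.startswith(ENV_PREFIX):
            continue  # ignore variables that belong to other programs
        parts = key[len(ENV_PREFIX):].lower().split(NESTED_DELIMITER)
        node = settings
        for part in parts[:-1]:
            child = node.get(part)
            if not isinstance(child, dict):
                # Create (or replace a scalar with) a nested section.
                child = {}
                node[part] = child
            node = child
        node[parts[-1]] = value
    return settings


# The variable name below is illustrative, not taken from configuration.md.
example = nested_settings({"MEMTOMEM_EMBEDDING__MODEL": "bge-m3", "PATH": "/usr/bin"})
```

Under this convention the part before `__` selects the config section and the part after it selects the field, which is why one prefix and one delimiter are enough to address every nested setting.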

Local checks

  • ruff check packages/memtomem/src → All checks passed
  • pytest --co → 886 tests collected

Test plan

  • CI lint job passes
  • CI typecheck job runs
  • CI test job passes (886 tests, no code changes)
  • PyPI README preview renders (the new layout uses absolute GitHub URLs since PyPI doesn't resolve relative links)
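
Because PyPI does not resolve relative links, the preview check above could be approximated before upload with a small script that lists any relative Markdown link targets left in the PyPI README. This is a rough sketch under assumptions: the regex only handles plain inline links of the form `[text](target)`, and the function name is hypothetical.

```python
import re
from typing import List

# Matches inline Markdown links and captures the target up to ")" or whitespace.
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")

# Targets with these prefixes do not depend on where the README is rendered.
ABSOLUTE_PREFIXES = ("http://", "https://", "#", "mailto:")


def relative_links(markdown: str) -> List[str]:
    """Return link targets that would break when rendered outside the repo."""
    return [t for t in LINK_RE.findall(markdown) if not t.startswith(ABSOLUTE_PREFIXES)]


sample = (
    "[config](docs/guides/configuration.md) "
    "[site](https://example.com/guide) "
    "[top](#install)"
)
broken = relative_links(sample)
```

An empty result for the PyPI README would suggest every link is absolute; any entry in the list is a link to convert.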

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
tsdata merged commit 5074ba7 into main on Apr 9, 2026
tsdata deleted the docs/restructure-readmes-with-guides branch on April 9, 2026 13:40
memtomem pushed a commit that referenced this pull request Apr 29, 2026
… in toast, align canonical_root style

Follow-up to PR #549 review:

- **#1 (low-medium)**: drop the hardcoded `_CTX_EMPTY_HINT_META` JS map.
  GET `/api/context/{skills,commands,agents}` now also returns
  `canonical_root: str` and `scanned_dirs: list[str]` (computed once at
  module import from `SKILL_DIRS` / `AGENT_DIRS` / `COMMAND_DIRS`), so the
  empty-state hint pulls from the wire instead of duplicating the
  detector layout client-side. The JS still has a one-line fallback
  (`.memtomem/${type}` + empty list) for older backends, but it never
  fires when the cache-bust takes effect.
- **#2 (cosmetic)**: skills sync response now uses the
  `CANONICAL_SKILL_ROOT = ".memtomem/skills"` constant directly,
  matching the `CANONICAL_AGENT_ROOT` / `CANONICAL_COMMAND_ROOT` pattern
  in agents/commands routes. Drops the `_safe_rel(canonical_skills_root(
  project_root), project_root)` round-trip — both forms resolve to the
  same string today, but a constant won't drift if a future PR adds env
  / scope resolution to `canonical_skills_root()`.
- **#3 (low)**: import-no-runtimes toast renders
  `basename(project_root)` instead of the absolute path. The wire still
  carries the absolute string (consistent with `/api/system/*` and
  useful for debugging / reverse-proxy contexts) but the user-facing
  toast keeps it short — `scanned_dirs` already gives full orientation.
  New `_ctxBasename()` helper handles the POSIX path split.

Cache-bust bumped to `context-gateway.js?v=3`.

Tests:
- New assertions on `TestListSkills.test_empty`,
  `TestListCommands.test_empty`, `TestListAgents.test_empty` verify
  GET response now includes `canonical_root` + `scanned_dirs`.
- `pytest -m "not ollama"` 3152 passed; `ruff check` + `ruff format
  --check` clean.

Skipped per reviewer guidance:
- #4 `Final[str]` → `Literal` / `StrEnum` (mypy is advisory).
- #5 additional positive route assertions for `invalid_name` /
  `already_imported` / `toml_parse_error` (route layer is a trivial
  dict comprehension; covered at the core layer).
- #6 Korean `<이름>` ("name") literal angle-bracket (intentional placeholder
  guide, not an unsubstituted variable).

Co-Authored-By: Claude <[email protected]>
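
The first and third review items can be sketched together: a list response that carries `canonical_root` and `scanned_dirs` computed once at import time, and a basename helper for the toast. The real helper, `_ctxBasename()`, lives in the JavaScript client; the Python below is only an analogue, and the `SKILL_DIRS` contents, the `skills` key, and both function names are assumptions for illustration.

```python
from pathlib import PurePosixPath
from typing import Any, Dict, Iterable, List

# Hypothetical stand-in for the detector's SKILL_DIRS constant.
SKILL_DIRS = (".claude/skills", ".gemini/skills")
CANONICAL_SKILL_ROOT = ".memtomem/skills"  # constant named in the commit message

# Computed once at module import, as the commit message describes.
_SCANNED_SKILL_DIRS: List[str] = [str(d) for d in SKILL_DIRS]


def list_skills_payload(skills: Iterable[str]) -> Dict[str, Any]:
    """Build a list response that also tells the client where to look."""
    return {
        "skills": list(skills),
        "canonical_root": CANONICAL_SKILL_ROOT,
        "scanned_dirs": list(_SCANNED_SKILL_DIRS),
    }


def ctx_basename(path: str) -> str:
    """Last component of a POSIX path; falls back to the input for '/'."""
    return PurePosixPath(path).name or path


payload = list_skills_payload([])
short = ctx_basename("/home/user/my-project")
```

With the directories on the wire, the empty-state hint no longer needs its own copy of the detector layout, and the toast can stay short while the full path remains available in the response.
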
memtomem added a commit that referenced this pull request Apr 29, 2026
* fix(web): surface "why nothing happened" on Skills/Commands/Agents Sync+Import

Users reported that "both Sync and Import behave strangely" (translated from Korean): both buttons in
mm web → Settings → Skills/Commands/Agents looked successful (a green
toast popped) but nothing actually moved on disk. The root cause was a
shape mismatch in the response handling, not a wiring bug:

- For skills, the sync response carries `skipped` (e.g. "no canonical
  skills") but never `dropped`. The client only read `dropped`, so the
  skipped reason was silently dropped on the floor.
- The import response carried no metadata at all, so a 0-imported,
  0-skipped result (the common case when no `.claude/skills/` exists in
  the project) just rendered "Import completed" with no clue that the
  scanner found nothing because the directories didn't exist.

This change keeps cwd-bound single-project behavior unchanged and adds
machine-readable plumbing the UI can match against:

- All three context types' `Sync` and `Import` core layers
  (`memtomem.context.{skills,commands,agents}`) now record skipped items
  as `(name, reason, reason_code)` triples. Reason codes come from a new
  `memtomem.context._skip_reasons` module: `no_canonical_root`,
  `unknown_runtime`, `parse_error`, `invalid_name`, `already_imported`,
  `canonical_exists`, `toml_parse_error`. Human `reason` strings stay
  for CLI/log output.
- Web route handlers (`web/routes/context_{skills,commands,agents}.py`)
  surface `reason_code` per skipped item, plus `canonical_root` on sync
  responses and `project_root` + `scanned_dirs` on import responses.
- The web client (`context-gateway.js`) reads `data.skipped` (in
  addition to `data.dropped` for commands/agents field-level drops),
  matches on `reason_code === "no_canonical_root"` to show
  "No canonical {type} under {canonical}. Create one first." instead of
  a generic success, and on a 0/0 import shows "No runtime {type} found
  in {project_root}. Scanned: {scan_dirs}." so users see exactly which
  paths the detector inspected.
- The empty list state hint now points at the canonical and runtime
  paths instead of "Create one or import from existing runtimes." — a
  fresh user can drop a `SKILL.md` into the right directory without
  guessing.
- Three new placeholder-form i18n keys (`empty_hint`,
  `import_no_runtimes`, `sync_empty_canonical`) cover all three types
  via `{type}` / `{canonical}` / `{root}` / `{scan_dirs}` substitution
  rather than 9 type-specific keys.
- `context-gateway.js?v=2` cache-bust.
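
The plumbing in the list above can be sketched end to end. The real client logic is JavaScript in `context-gateway.js`; this Python analogue only illustrates the shape: skipped items travel as `(name, reason, reason_code)` triples, the route serializes them, and the message is chosen by matching on `reason_code`. The two specific toast strings come from the commit message; the function names, the `generated`/`imported` payload layout, and the generic fallback strings are assumptions.

```python
from typing import Any, Dict, Iterable, List, NamedTuple

NO_CANONICAL_ROOT = "no_canonical_root"  # one of the seven codes listed above


class Skipped(NamedTuple):
    name: str
    reason: str       # human-readable, kept for CLI/log output
    reason_code: str  # machine-readable, matched by the UI


def sync_response(generated: Iterable[str], skipped: Iterable[Skipped],
                  canonical_root: str) -> Dict[str, Any]:
    """Serialize a sync result the way a route handler might (sketch)."""
    return {
        "generated": list(generated),
        "skipped": [
            {"name": n, "reason": r, "reason_code": c} for n, r, c in skipped
        ],
        "canonical_root": canonical_root,
    }


def sync_toast(data: Dict[str, Any], type_label: str) -> str:
    """Pick a sync message by reason_code instead of assuming success."""
    for item in data.get("skipped", []):
        if item.get("reason_code") == NO_CANONICAL_ROOT:
            return (f"No canonical {type_label} under "
                    f"{data['canonical_root']}. Create one first.")
    return "Sync completed"  # hypothetical generic fallback


def import_toast(data: Dict[str, Any], type_label: str) -> str:
    """On a 0-imported, 0-skipped result, say which paths were scanned."""
    if not data.get("imported") and not data.get("skipped"):
        scanned: List[str] = data.get("scanned_dirs", [])
        return (f"No runtime {type_label} found in {data['project_root']}. "
                f"Scanned: {', '.join(scanned)}.")
    return "Import completed"
```

Matching on the code rather than the human string keeps the UI stable if the wording of `reason` changes for CLI or log output.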

All response changes are additive — existing clients that only read
`imported`/`generated`/`skipped[].name|runtime|reason` keep working
unchanged. The CLI (`mm context skills/agents/commands ...`) and the
MCP `context` tool also unpack the new triple shape via `_code` discard
so their human-readable output is byte-identical.
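
The "`_code` discard" mentioned above is ordinary tuple unpacking: a consumer that only needs the human-readable fields binds the third element to a throwaway name. A minimal sketch follows; the output format and function name are assumptions, not the CLI's actual wording.

```python
from typing import Iterable, List, Tuple


def format_skipped(skipped: Iterable[Tuple[str, str, str]]) -> List[str]:
    """Render skipped items for humans, ignoring the machine-readable code."""
    lines = []
    for name, reason, _code in skipped:  # third element is deliberately unused
        lines.append(f"skipped {name}: {reason}")
    return lines


lines = format_skipped([("demo", "no canonical skills", "no_canonical_root")])
```

Because the formatted string never touches `_code`, widening the tuple from two to three elements leaves this output unchanged.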

Tests:
- 3 core test suites (`test_context_{skills,commands,agents}.py`)
  updated to the 3-tuple shape.
- New web route assertions verify `reason_code`, `canonical_root`,
  `project_root`, and `scanned_dirs` are present in sync/import
  responses for the empty-canonical and empty-runtime paths.
- Full suite green: `pytest -m "not ollama"` 3152 passed; `ruff check`
  + `ruff format --check` clean.

Co-Authored-By: Claude <[email protected]>

* fix(web/context): address review — surface scan dirs on GET, basename in toast, align canonical_root style

* refactor(context): narrow reason_code to Literal SkipCode

Tightens the typing on the third element of `(name, reason, reason_code)`
triples produced by `SkillSyncResult.skipped`, `CommandSyncResult.skipped`,
`AgentSyncResult.skipped`, and `ExtractResult.skipped`. The new
`memtomem.context._skip_reasons.SkipCode` is a `Literal` of the seven
codes already enumerated in that module, so a typo at the construction
site fails type-check instead of slipping through to the wire.
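
A sketch of what such a narrowing could look like, using the seven codes named earlier in this thread. The alias name `SkipCode` comes from the commit message; the `SkippedItem` alias and both helper functions are assumptions added for illustration.

```python
from typing import Literal, Tuple, get_args

SkipCode = Literal[
    "no_canonical_root",
    "unknown_runtime",
    "parse_error",
    "invalid_name",
    "already_imported",
    "canonical_exists",
    "toml_parse_error",
]

# Hypothetical alias for the (name, reason, reason_code) triple.
SkippedItem = Tuple[str, str, SkipCode]


def make_skip(name: str, reason: str, code: SkipCode) -> SkippedItem:
    """A type checker rejects calls whose code is not one of the literals."""
    return (name, reason, code)


def is_skip_code(value: str) -> bool:
    """Runtime counterpart, since Literal is only enforced statically."""
    return value in get_args(SkipCode)


item = make_skip("demo", "no canonical skills", "no_canonical_root")
```

A call such as `make_skip("demo", "oops", "no_canonical_rot")` still runs at runtime, but a checker like mypy or pyright would flag the misspelled literal, which is the benefit the commit describes.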

Also strengthens the import-empty web-route tests for skills, commands,
and agents to assert `data["project_root"] == str(tmp_path)` instead of
just key presence — confirms the response actually echoes the
fixture's project root rather than some unrelated path.

No runtime behavior change; addresses follow-up polish flagged in the
review of this PR.

Co-Authored-By: Claude <[email protected]>

---------

Co-authored-by: pandas-studio <[email protected]>
Co-authored-by: Claude <[email protected]>
memtomem added a commit that referenced this pull request Apr 30, 2026
``IndexEngine.index_path_stream`` and the ``GET /api/index/stream``
endpoint were missing two contract elements that ``index_path`` /
``POST /api/index`` already had: the ``namespace`` parameter (silently
dropped, so streamed indexes ignored user namespace selection) and
error aggregation in the ``complete`` event (per-file
``result["errors"]`` was summed only into chunk counters and never
emitted, so partial-failure runs appeared successful in the UI).

Engine: thread ``namespace`` through to ``_index_file``, accumulate
per-file errors, include them as ``errors: list[str]`` in the
``complete`` event — same loose shape as ``IndexingStats.errors`` so
non-stream UI handlers reuse verbatim. Path-prefix the stream-level
uncaught-exception branch (``f"{fp.name}: {exc}"``) to match the
non-stream ``asyncio.gather(return_exceptions=True)`` branch.

Route: forward the new ``namespace`` query param to the engine.
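
The engine change can be pictured as an async generator that threads `namespace` into each per-file call and carries accumulated errors into the final event. This is a simplified sketch, not `IndexEngine.index_path_stream` itself: the `_index_file` stand-in, the event dictionaries, and every field name other than `errors` and `namespace` are assumptions.

```python
import asyncio
from pathlib import Path
from typing import Any, AsyncIterator, Dict, List, Optional, Sequence


async def _index_file(fp: Path, namespace: Optional[str]) -> Dict[str, Any]:
    """Hypothetical stand-in for the per-file indexer."""
    if fp.suffix == ".bin":
        return {"indexed_chunks": 0,
                "errors": [f"{fp.name}: cannot decode binary file"]}
    return {"indexed_chunks": 1, "errors": [], "namespace": namespace}


async def index_path_stream(
    files: Sequence[Path], namespace: Optional[str] = None
) -> AsyncIterator[Dict[str, Any]]:
    total_chunks = 0
    errors: List[str] = []
    for current, fp in enumerate(files, start=1):
        try:
            result = await _index_file(fp, namespace)  # namespace threaded through
        except Exception as exc:
            # Path-prefix uncaught errors, matching the non-stream branch.
            errors.append(f"{fp.name}: {exc}")
        else:
            total_chunks += result["indexed_chunks"]
            errors.extend(result["errors"])  # accumulate instead of dropping
        yield {"type": "progress", "file": fp.name,
               "current": current, "total": len(files)}
    # Partial failures stay visible because errors ride along on completion.
    yield {"type": "complete", "indexed_chunks": total_chunks, "errors": errors}


async def _collect(files: Sequence[Path]) -> List[Dict[str, Any]]:
    return [event async for event in index_path_stream(files, namespace="notes")]


events = asyncio.run(_collect([Path("a.md"), Path("b.bin")]))
```

The last event then reports one indexed chunk alongside the binary-file error, so a consumer can tell a partial failure from a clean run.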

Tests: ``test_index_path_stream_namespace_propagates`` (chunks gain
``metadata.namespace`` for the streamed run) and
``test_index_path_stream_complete_errors_no_silent_drop`` (the binary
file triggers a per-file error that surfaces in ``complete.errors``).
The test names document intent so future grep finds the regression
guards.

Out of scope: per-file streaming of errors (option ii/iii from the
planning discussion); normalizing ``_index_file``'s mixed
path-prefix convention. Both noted in the issue.

Unblocks #582 PR-B (Prev #1 — Index tab two-button collapse) and
informs #582 PR #6 (4.11 — indexing in-flight visibility).

Refs #590.

Co-authored-by: pandas-studio <[email protected]>
Co-authored-by: Claude <[email protected]>
memtomem added a commit that referenced this pull request Apr 30, 2026
…592)

The folder-mode panel exposed two side-by-side buttons — primary
``Index`` (POST /api/index) and ghost ``Index with Progress`` (SSE
stream). The two-button shape made users guess which one to pick and
diluted the form's primary action. Drop the non-stream button entirely
and route the single primary ``인덱스 / Index`` button through the
streaming handler so progress is always visible.

This is option (a) from the umbrella discussion (Prev #1). Option (b)
— single button + a "show progress" toggle — would have kept two
indexing paths alive and required a more complex ``STATE.indexing``
shape in #582 PR #6 (4.11). With (a) the indexing path is unified;
PR #6 can use a single boolean flag.

Made possible by #591 (closes #590), which gave ``index_path_stream``
the missing ``namespace`` parameter and a ``complete.errors`` field —
both consumed here. The streaming handler now matches the full UX of
the removed inline POST handler:

- Reads ``namespace`` from the form input and forwards it as a query
  param when non-empty.
- On ``complete``: renders ``r-errors-row`` with the same 5-cap
  "+N more" rule that the inline handler used (#354 partial-failure
  surface), then dispatches either ``toast.index_partial`` (red, with
  error count + first error) or ``toast.indexed_count`` (green, with
  the "Register as Source" toast action — folder mode is one-shot,
  so this nudge mirrors the inline behavior).
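
The handler behavior described in the two bullets can be sketched as three small functions. The real code is JavaScript in the web client; this Python analogue only shows the logic. The endpoint path, the 5-item cap with a "+N more" tail, and the two toast keys come from this thread; the `path` query parameter name, the function names, and the returned dictionary shape are assumptions.

```python
from typing import Any, Dict, List
from urllib.parse import urlencode


def build_stream_url(path: str, namespace: str = "") -> str:
    """Forward namespace as a query param only when it is non-empty."""
    params = {"path": path}
    if namespace.strip():
        params["namespace"] = namespace.strip()
    return "/api/index/stream?" + urlencode(params)


def render_errors(errors: List[str], cap: int = 5) -> List[str]:
    """Show at most `cap` errors, then summarize the remainder."""
    shown = list(errors[:cap])
    extra = len(errors) - cap
    if extra > 0:
        shown.append(f"+{extra} more")
    return shown


def pick_toast(errors: List[str], indexed: int) -> Dict[str, Any]:
    """Choose the partial-failure toast when any error was reported."""
    if errors:
        return {"key": "toast.index_partial",
                "error_count": len(errors), "first_error": errors[0]}
    return {"key": "toast.indexed_count", "count": indexed}


url = build_stream_url("/notes", "work")
rows = render_errors([f"file{i}.md: failed" for i in range(7)])
```

Keeping the cap and the toast choice in separate functions mirrors the description: the error row and the toast are driven by the same `complete.errors` list but rendered independently.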

Removes ``index.stream_btn``, ``toast.stream_complete``, and the
``.index-btn-row`` CSS rule (now wraps a single button → row container
unnecessary). Updates ``test_i18n.py``'s required key set so the
parity guard stays green.

Refs #582 (Prev #1).

Co-authored-by: pandas-studio <[email protected]>
Co-authored-by: Claude <[email protected]>