Summary
User search queries containing punctuation common in markdown, YAML, URLs, and filesystem paths can make BM25 search fail with fts5: syntax error near ".". The error is caught and logged, but the HTTP response is still 200, so users see degraded (dense-only) results without any UI signal.
Found by Playwright UX review of v0.1.34 prod (2026-05-02). See docs/reports/mm-web-prod-v0.1.34-playwright-review.md (P1 — BM25 search can fail on raw markdown/YAML-like queries).
Evidence
Sanitization gap — packages/memtomem/src/memtomem/storage/fts_tokenizer.py:18
_FTS5_SPECIAL_RE = re.compile(r'[*"()\-+^:]')
The character class omits . / \ < >, none of which FTS5 accepts in an unquoted bareword (barewords may contain only ASCII letters, digits, underscore, and non-ASCII characters). URLs (https://example.com), filesystem paths (a/b/c), dotted filenames (file.name.ext), and YAML/frontmatter (key: value, ---) all flow through unquoted and trip the parser.
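A minimal sketch of the gap, for illustration only: the regex is copied from the line quoted above, while `is_bareword_char` and `unhandled_chars` are hypothetical helpers that do not exist in the repo. For each sample query it lists the characters that the regex does not match and that FTS5 would not accept in an unquoted bareword.

```python
import re

# Copied from fts_tokenizer.py:18 as quoted in this issue.
_FTS5_SPECIAL_RE = re.compile(r'[*"()\-+^:]')


def is_bareword_char(ch: str) -> bool:
    """Hypothetical helper: True if FTS5 allows ch in an unquoted bareword."""
    if ord(ch) > 127:  # non-ASCII characters are allowed
        return True
    return ch.isalnum() or ch == "_"


def unhandled_chars(query: str) -> list:
    """Distinct characters the regex misses and FTS5 rejects when unquoted."""
    seen = []
    for ch in query:
        if ch.isspace() or is_bareword_char(ch) or _FTS5_SPECIAL_RE.match(ch):
            continue
        if ch not in seen:
            seen.append(ch)
    return seen


for sample in ["https://example.com", "a/b/c", "file.name.ext", "<tag>"]:
    print(repr(sample), unhandled_chars(sample))
```

An empty list for a query means every character is either covered by the regex or safe as a bareword; anything else is a candidate for the parser error described above.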
Silent degradation — packages/memtomem/src/memtomem/search/pipeline.py:468-474
if use_bm25:
    try:
        bm25_results = await bm25_task
    except Exception as exc:
        logger.warning("BM25 search failed: %s", exc)
        bm25_results = []
        bm25_error = str(exc)
bm25_error lands in RetrievalStats but the web UI does not surface it. Users get dense-only results and have no signal that keyword search broke.
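The effect of that pattern can be reproduced in isolation. The sketch below is not the repo's pipeline: `failing_bm25`, `search`, the dict-shaped response, and the `RuntimeError` stand-in for the SQLite error are all assumptions made for illustration. It shows a response that looks healthy while the error is only visible in the stats.

```python
import asyncio
import logging

logger = logging.getLogger("search_sketch")


async def failing_bm25(query: str) -> list:
    # Stand-in for the real BM25 call; raises the error text reported in this issue.
    raise RuntimeError('fts5: syntax error near "."')


async def search(query: str, use_bm25: bool = True) -> dict:
    """Hypothetical pipeline: dense results always succeed, BM25 may fail."""
    dense_results = ["doc-1", "doc-2"]  # pretend vector hits
    bm25_results = []
    bm25_error = None
    if use_bm25:
        bm25_task = asyncio.create_task(failing_bm25(query))
        try:
            bm25_results = await bm25_task
        except Exception as exc:
            logger.warning("BM25 search failed: %s", exc)
            bm25_results = []
            bm25_error = str(exc)
    # The status stays 200 and results are non-empty; only stats carries the failure.
    return {
        "status": 200,
        "results": dense_results + bm25_results,
        "stats": {"bm25_error": bm25_error},
    }


response = asyncio.run(search("file.name.ext"))
print(response["status"], response["stats"])
```

A client that only checks the status code and the result list has no way to tell this response apart from a fully successful hybrid search.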
Tests — no FTS5-syntax-in-query regression cases in packages/memtomem/tests/. Queries with frontmatter, code spans, URLs, paths, punctuation are uncovered.
Suggested fix
1. Extend _FTS5_SPECIAL_RE to include . / \ < >, and re-audit the rest of the set against the SQLite FTS5 docs. Because unquoted barewords accept only ASCII letters, digits, underscore, and non-ASCII characters, other punctuation (backtick, comma, and so on) can fail the same way.
2. Add regression cases to tests/ that pass each of: frontmatter (---\nkey: value), URL (https://example.com/path), dotted filename (file.name.ext), unix path (a/b/c), and a code-span fragment.
3. Surface bm25_error in the web UI as a non-blocking warning (e.g., "Keyword search degraded, using vector results only") rather than swallowing it. This matches the repo's loud-vs-silent invariant.
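As a starting point for the sanitizer and regression cases, here is a hedged sketch. It uses a different technique from extending the regex: every whitespace-separated term is wrapped in an FTS5 string literal, so characters missing from any denylist cannot reach the parser. `quote_for_fts5`, `check_queries`, the `docs` table, and the concrete code-span sample are hypothetical and not taken from the repo; how the existing tokenizer applies `_FTS5_SPECIAL_RE` is not shown in this issue, so the real fix may look different.

```python
import sqlite3


def quote_for_fts5(query: str) -> str:
    """Wrap each term in double quotes so FTS5 parses it as a string.

    Embedded double quotes are doubled, which is the FTS5 string escape.
    """
    terms = query.split()
    return " ".join('"' + term.replace('"', '""') + '"' for term in terms)


# The query shapes listed in the suggested fix; the code-span sample is made up.
REGRESSION_QUERIES = [
    "---\nkey: value",
    "https://example.com/path",
    "file.name.ext",
    "a/b/c",
    "`foo.bar()`",
]


def check_queries(queries) -> None:
    """Run each sanitized query against a throwaway FTS5 table.

    Requires a SQLite build with FTS5; a syntax error would surface as
    sqlite3.OperationalError.
    """
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE VIRTUAL TABLE docs USING fts5(body)")
        conn.execute(
            "INSERT INTO docs(body) VALUES (?)",
            ("see file.name.ext under a/b/c",),
        )
        for query in queries:
            conn.execute(
                "SELECT rowid FROM docs WHERE docs MATCH ?",
                (quote_for_fts5(query),),
            ).fetchall()
    finally:
        conn.close()
```

Calling `check_queries(REGRESSION_QUERIES)` is expected to return without raising on a SQLite build that includes FTS5. In the real test suite each query would more naturally be its own parametrized case exercising the repo's actual sanitizer.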
Notes for first-time contributors
Items (1) and (2) together make a self-contained change sized for a good first issue. Item (3) touches the web layer and is a separate follow-up; keep it out of this PR unless requested.
References
- Review: docs/reports/mm-web-prod-v0.1.34-playwright-review.md
- Tracking umbrella: TBD (linked once opened)