Skip to content

fix: guard against empty ChromaDB query results in search paths#305

Open
dr-robert-li wants to merge 1 commit intoMemPalace:developfrom
dr-robert-li:fix/195-empty-chromadb-results
Open

fix: guard against empty ChromaDB query results in search paths#305
dr-robert-li wants to merge 1 commit intoMemPalace:developfrom
dr-robert-li:fix/195-empty-chromadb-results

Conversation

@dr-robert-li
Copy link
Copy Markdown

What does this PR do?

Guards searcher.search, searcher.search_memories, layers.Layer3.search, and layers.Layer3.search_raw against empty ChromaDB query results. ChromaDB 1.x can return {documents: []} (empty outer list) for queries against a fresh collection or filter that excludes everything; the pre-fix code did results['documents'][0] and crashed with IndexError.

ChromaDB 0.6.x returns {documents: [[]]} (empty inner) which the existing if not docs check already handled, so this only manifests on 1.x — but the guard hardens both shapes for free.

Fixes #195.

How to test

python -m pytest tests/test_searcher.py::test_search_memories_handles_empty_outer_list                  tests/test_searcher.py::test_layer3_search_raw_handles_empty_outer_list -v

Tests inject the empty-outer shape via mock so they exercise the guard regardless of installed ChromaDB version. Without the fix they raise IndexError: list index out of range.

Checklist

  • Tests pass (python -m pytest tests/ -v) — 119 passed, 0 failed
  • No hardcoded paths
  • Linter passes (ruff check .)

ChromaDB 1.x can return {documents: []} (empty outer list) for queries against a fresh collection or filter that excludes everything. The pre-fix code did results['documents'][0] without guarding the outer list, raising IndexError. ChromaDB 0.6.x returns {documents: [[]]} (empty inner) which the existing 'if not docs' check already handled, so this manifests only on 1.x.

Fixed in searcher.search, searcher.search_memories, layers.Layer3.search, and layers.Layer3.search_raw. Regression tests inject the empty-outer shape via mock to exercise the guard regardless of installed ChromaDB version.

Closes MemPalace#195
@bgauryy
Copy link
Copy Markdown

bgauryy commented Apr 8, 2026

PR Review: fix: guard against empty ChromaDB query results in search paths

Executive Summary

Aspect Value
PR Goal Guard 4 search paths against IndexError when ChromaDB 1.x returns {documents: []} instead of {documents: [[]]}
Files Changed 3
Risk Level 🟢 LOW - purely defensive guards, no behavior change on success path
Review Effort 1 - small, focused fix with clear pattern
Recommendation ✅ APPROVE

Affected Areas: mempalace/searcher.py (search, search_memories), mempalace/layers.py (Layer3.search, Layer3.search_raw)

Business Impact: Prevents user-facing crashes when searching an empty or freshly initialized palace, or when ChromaDB filters exclude all results.

Flow Changes: None for the success path. The empty-result path now returns gracefully instead of crashing with IndexError.

Ratings

Aspect Score
Correctness 5/5
Security 5/5
Performance 5/5
Maintainability 4/5

PR Health

The Bug

ChromaDB 0.6.x returns {documents: [[]]} (empty inner list) for zero-result queries. The existing if not docs: guard (after docs = results["documents"][0]) handled this correctly.

ChromaDB 1.x returns {documents: []} (empty outer list). The pre-fix code did results["documents"][0] before any check, causing IndexError: list index out of range.

The Fix

Guard pattern applied consistently across all 4 paths:

if not results.get("documents") or not results["documents"][0]:
    return <appropriate empty response>

This guard handles:

  • {documents: []} — empty outer (ChromaDB 1.x) → not [] is True → early return
  • {documents: [[]]} — empty inner (ChromaDB 0.6.x) → passes first check, not [] on [0] → early return
  • Missing documents key (theoretical) → .get() returns None → early return

The old if not docs: check after [0] access is correctly removed as redundant.

Medium Priority Issues

🎨 #1: Missing test for Layer3.search()

Location: tests/test_searcher.py | Confidence: ⚠️ MED

Tests cover search_memories and Layer3.search_raw but not Layer3.search(), which also received the guard. Since Layer3.search() returns a string (not sys.exit), it's trivially testable with the same mock pattern:

def test_layer3_search_handles_empty_outer_list(monkeypatch, tmp_path):
    """Issue #195: Layer3.search also affected by the empty-outer shape."""
    _mock_empty_chromadb(monkeypatch, "mempalace.layers")
    result = Layer3(palace_path=str(tmp_path)).search("anything")
    assert result == "No results found."

Skipping searcher.search() is understandable since it uses sys.exit(1) on errors, making it harder to test without capturing process exit.


Verification Notes

  • Test mock targets are correct: Both searcher.py and layers.py import chromadb directly on the main branch, so monkeypatch.setattr(f"{module}.chromadb.PersistentClient", ...) resolves correctly.
  • Return values are consistent: Each guarded path returns the same type/shape as the function's normal empty-result behavior (string for Layer3.search, list for search_raw, dict for search_memories, None for CLI search).
  • No regression risk: The guard is pure short-circuit logic that only activates on the previously-crashing input shape.

Created by Octocode MCP https://octocode.ai 🔍🐙

Perseusxrltd added a commit to Perseusxrltd/mnemion that referenced this pull request Apr 9, 2026
- pyproject.toml: widen chromadb to <2.0 for Python 3.14 compat (MemPalace#302)
- config.py + miner/convo_miner/mcp_server: add hnsw:space=cosine so
  similarity = 1 - distance stays in [0,1] instead of negative L2 (MemPalace#304)
- searcher.py + layers.py: guard against ChromaDB 1.x empty-outer query
  results (IndexError on fresh collections) (MemPalace#305)
- mcp_server.py: redirect stdout→stderr at import to protect JSON-RPC
  wire from chromadb/posthog chatter (MemPalace#306)
- mcp_server.py: replace 10k-limited col.get with paginating
  _iter_all_metadatas helper; stop swallowing errors silently (MemPalace#307)
- mcp_server.py: drop undocumented wait_for_previous arg injected by
  Gemini MCP clients (MemPalace#322)
- searcher.py + hybrid_searcher.py + mcp_server.py: add min_similarity
  threshold filter so callers get a clean "no results" signal (MemPalace#350)
- entity_detector.py: add CODE_KEYWORDS blocklist (~80 terms) to stop
  Rust types / React / framework names being detected as entities (MemPalace#349)

Co-Authored-By: Claude Sonnet 4.6 <[email protected]>
@milla-jovovich
Copy link
Copy Markdown
Collaborator

Hey @dr-robert-li — I've taken a look and ran it through CLI. Thanks for the fix! There are three open PRs addressing #195 (empty ChromaDB query results) and going with #197 as the most focused diff. Closing this one as superseded — the bug is still getting fixed, just via the other PR. Really appreciate the contribution. 💜

@bensig
Copy link
Copy Markdown
Collaborator

bensig commented Apr 9, 2026

Reopening — this was closed too quickly and your work deserves proper review. Standing by for merge once we clear the current security baseline. Thank you for the contribution.

@bensig bensig reopened this Apr 9, 2026
Copy link
Copy Markdown

@web3guru888 web3guru888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good defensive fix. The guard against {documents: []} (empty outer list) is necessary for ChromaDB 1.x compatibility, and the approach of checking before [0] indexing is correct.

Coverage is comprehensive — all 4 search paths are guarded:

  • searcher.search() — CLI search
  • searcher.search_memories() — library/MCP search
  • layers.Layer3.search() — memory stack search
  • layers.Layer3.search_raw() — raw result search

The guard logic is consistent across all paths:

if not results.get("documents") or not results["documents"][0]:

This handles both {documents: []} (1.x) and {documents: [[]]} (0.6.x) in one check, which is nice.

One subtle note: The PR moves the guard before the docs = results["documents"][0] assignment and removes the old if not docs check that came after the assignment. This is correct — the old check was dead code for 1.x since it would crash before reaching it. The new placement prevents the crash entirely.

Test approach is good — using mocks to inject the empty-outer shape means the tests work regardless of installed ChromaDB version. This is the right way to test version-dependent behavior.

We pin chromadb >=0.6.0,<0.7.0 in our integration so we haven't hit this specific shape, but the guard is forward-compatible and low-risk. Clean fix.

🔭 Reviewed as part of the MemPalace-AGI integration project — autonomous research with perfect memory. Community interaction updates are posted regularly on the dashboard.

@bensig bensig changed the base branch from main to develop April 11, 2026 22:22
@igorls igorls added area/search Search and retrieval bug Something isn't working labels Apr 14, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area/search Search and retrieval bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

IndexError when ChromaDB query returns empty results

6 participants