test(memory): document_integration tests are flaky — Qdrant testcontainer timing race #2413
Closed
Labels
- P3 (Research, medium-high complexity)
- bug (Something isn't working)
- memory (zeph-memory crate (SQLite))
Description
Problem
zeph-memory::document_integration tests fail intermittently in CI with assertion errors unrelated to code changes. Different tests fail on different runs:
- ingested_chunks_have_correct_payload: assert_eq!(all.len(), 1) fails with left: 0 (run #23723176788, PR fix(skills): wire two_stage_matching and confusability_threshold at startup; migrate legacy bundled skills #2410)
- ingest_single_document: similar failure (run #23722894900, main branch)
The failures are non-deterministic, and the tests pass on rerun without any code changes.
Root cause
The tests use testcontainers to spin up a Qdrant instance. The race is likely between:
- Container startup / port readiness check completing
- Qdrant collection becoming ready to accept writes
- scroll_all returning 0 results because the ingested point was not yet flushed/indexed
ensure_collection does not wait for the collection to be fully ready before ingesting, and scroll_all may observe an empty state if Qdrant's internal flush hasn't completed.
Impact
- CI flakiness causes spurious PR failures requiring manual reruns
- Blocks CI-gate confidence
Suggested fix
Add a retry loop or readiness probe after ensure_collection before ingesting. Alternatively, use Qdrant's wait=true parameter on upsert to ensure the operation is acknowledged before the scroll query runs.
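The retry-loop option could look roughly like the sketch below. It is only an illustration under stated assumptions: wait_until is a hypothetical helper that does not exist in zeph-memory, it uses only the standard library, and it blocks the thread. The real tests are probably async, so an async variant with a non-blocking sleep would likely be needed there.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of zeph-memory): polls `probe` until it
/// returns `Some(value)` or `timeout` elapses. Returns `None` on timeout.
/// The probe is always called at least once.
fn wait_until<T>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Option<T>,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        // Success: the condition we are waiting for now holds.
        if let Some(value) = probe() {
            return Some(value);
        }
        // Give up once the deadline has passed.
        if Instant::now() >= deadline {
            return None;
        }
        // Back off briefly before probing again.
        sleep(interval);
    }
}
```

In a test, the probe would wrap the scroll_all call and return Some only once the expected number of points is visible, so the assertion runs against a settled state instead of racing the ingest. If upserting with wait=true turns out to be sufficient, that is the simpler fix and the polling helper becomes unnecessary.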