test(memory): document_integration tests are flaky — Qdrant testcontainer timing race #2413

@bug-ops

Problem

zeph-memory::document_integration tests fail intermittently in CI with assertion errors unrelated to code changes. Different tests fail on different runs.

The failures are non-deterministic, and the same tests pass on rerun without any code changes.

Root cause

The tests use testcontainers to spin up a Qdrant instance. The race is likely between:

  1. Container startup / port readiness check completing
  2. Qdrant collection becoming ready to accept writes
  3. scroll_all returning 0 results because the ingested point was not yet flushed/indexed

ensure_collection does not wait for the collection to be fully ready before ingesting, and scroll_all may observe an empty state if Qdrant's internal flush hasn't completed.

Impact

  • CI flakiness causes spurious PR failures requiring manual reruns
  • Blocks CI-gate confidence

Suggested fix

Add a retry loop or readiness probe after ensure_collection and before ingesting. Alternatively, set Qdrant's wait=true parameter on upsert so the call returns only once the operation has been applied (status completed rather than merely acknowledged), before the scroll query runs.
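
A minimal sketch of the retry-loop option, using only the standard library. The names poll_until and example_wait_for_points are hypothetical and do not exist in zeph-memory or testcontainers. The closure stands in for whatever check the tests would actually run (a collection-readiness probe, or a scroll_all call that must return at least one point). The real tests are presumably async, so an actual fix would likely use an async sleep instead of std::thread::sleep.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Polls `probe` until it yields `Some(value)` or `timeout` elapses.
/// Hypothetical helper for illustration; not part of zeph-memory.
fn poll_until<T>(
    timeout: Duration,
    interval: Duration,
    mut probe: impl FnMut() -> Option<T>,
) -> Option<T> {
    let deadline = Instant::now() + timeout;
    loop {
        // Always probe at least once, even with a zero timeout.
        if let Some(value) = probe() {
            return Some(value);
        }
        if Instant::now() >= deadline {
            return None;
        }
        sleep(interval);
    }
}

/// Simulated usage: the "store" reports no points on the first two polls,
/// mimicking a scroll that runs before the ingested point is visible.
fn example_wait_for_points() -> Option<usize> {
    let mut calls = 0;
    poll_until(Duration::from_secs(1), Duration::from_millis(10), || {
        calls += 1;
        // Stand-in for a scroll_all call (assumption: it returns a point count).
        let visible_points = if calls < 3 { 0 } else { 1 };
        (visible_points > 0).then_some(visible_points)
    })
}
```

In a test, the assertion on the scroll result would move inside or after such a loop, so a temporarily empty result leads to another poll instead of an immediate failure. A bounded timeout keeps a genuinely broken ingest from hanging CI.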

Labels

  • P3: Research, medium-high complexity
  • bug: Something isn't working
  • memory: zeph-memory crate (SQLite)