bug(memory): tier promotion intermittent SQLite lock contention causes merge_failures #2511

@bug-ops

Summary

During live testing (CI-350, 2026-03-31), the tier promotion background task intermittently fails with a "database is locked" error (SQLite error code 5, SQLITE_BUSY):

WARN zeph_memory::tiers: tier promotion: cluster merge failed, skipping cluster_size=2 error=database error: error returned from database: (code: 5) database is locked
INFO zeph_memory::tiers: tier promotion: sweep complete candidates=7 clusters=1 promoted=0 merge_failures=1 elapsed_ms=2379
INFO zeph_memory::tiers: tier promotion: sweep complete candidates=7 clusters=1 promoted=1 merge_failures=0 elapsed_ms=1894

The merge failure is transient: the next sweep succeeds. It occurs when the tier promotion background writer contends with the main agent loop writer on the file-backed SQLite database (pool_size=5, busy_timeout=5s).

Root Cause

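merge_cluster_and_promote() runs inside a background tokio task while the main agent loop may be holding a write transaction. WAL checkpointing or exclusive write locks taken by the agent loop can cause the background tier task to fail on the file lock before SQLite's busy_timeout has elapsed. The failed sweep in the log above finished in 2379 ms, well under the 5 s busy_timeout, which is consistent with the error being returned without waiting out the full timeout.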

Impact

  • Tier promotion is skipped for that sweep but retried next interval (every ~30s)
  • Memory consolidation is delayed, not lost
  • The merge_failures counter increments, but no metric or alert is exposed to the user
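
One way the merge_failures count could be surfaced is to track how many sweeps in a row reported failures and flag the streak once it passes a threshold. The sketch below is only an illustration: MergeFailureTracker, record_sweep, and the alert_after threshold are hypothetical names, not part of zeph_memory.

```rust
/// Hypothetical tracker (not part of zeph_memory) for sweeps that
/// reported merge failures.
#[derive(Debug, Default)]
pub struct MergeFailureTracker {
    /// Number of sweeps in a row with merge_failures > 0.
    consecutive_failed_sweeps: u32,
    /// Running total of merge failures across all sweeps.
    total_merge_failures: u64,
}

impl MergeFailureTracker {
    /// Records the merge_failures value of one completed sweep.
    /// Returns true once `alert_after` sweeps in a row have failed,
    /// which is the point where a caller could emit an alert.
    pub fn record_sweep(&mut self, merge_failures: u32, alert_after: u32) -> bool {
        if merge_failures == 0 {
            // A clean sweep ends the streak; the total is kept.
            self.consecutive_failed_sweeps = 0;
            return false;
        }
        self.total_merge_failures += u64::from(merge_failures);
        self.consecutive_failed_sweeps += 1;
        self.consecutive_failed_sweeps >= alert_after
    }

    /// Total merge failures seen so far.
    pub fn total(&self) -> u64 {
        self.total_merge_failures
    }
}
```

With a threshold of 3, the single transient failure from the log would not raise an alert, while three failed sweeps in a row would.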

Steps to Reproduce

  1. Use testing.toml with [memory] graph enabled
  2. Seed multiple memory entries across sessions
  3. Run a session that triggers both main-loop writes (graph entity extraction) and background tier promotion
  4. Observe WARN zeph_memory::tiers: tier promotion: cluster merge failed

Expected Behavior

Tier promotion should not fail with lock contention errors. Possible fixes:

  • Increase busy_timeout for the tier promotion transaction
  • Use BEGIN DEFERRED instead of BEGIN IMMEDIATE for tier reads
  • Add exponential backoff retry inside merge_cluster_and_promote
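
The backoff option could look roughly like the sketch below. It is a minimal, synchronous illustration using only the standard library; DbError and retry_on_busy are hypothetical names, and the real merge_cluster_and_promote signature and error type are not shown in this issue. In the actual tokio task the blocking thread::sleep would be replaced by an async sleep such as tokio::time::sleep.

```rust
use std::thread;
use std::time::Duration;

/// SQLite primary result code for "database is locked".
const SQLITE_BUSY: i32 = 5;

/// Hypothetical stand-in for the real database error type.
#[derive(Debug, PartialEq)]
pub struct DbError {
    /// SQLite result code, possibly an extended code such as
    /// SQLITE_BUSY_SNAPSHOT (517).
    pub code: i32,
}

impl DbError {
    /// Extended result codes keep the primary code in the low 8 bits.
    pub fn is_busy(&self) -> bool {
        self.code & 0xff == SQLITE_BUSY
    }
}

/// Runs `op` at least once and at most `max_attempts` times.
/// After each busy error it sleeps, then doubles the delay.
/// Any other error, or the last busy error, is returned to the caller.
pub fn retry_on_busy<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, DbError>,
) -> Result<T, DbError> {
    let mut delay = base_delay;
    let mut attempt = 1;
    loop {
        match op() {
            Err(e) if e.is_busy() && attempt < max_attempts => {
                thread::sleep(delay);
                delay = delay.saturating_mul(2);
                attempt += 1;
            }
            other => return other,
        }
    }
}
```

Retrying only on the busy code keeps genuine failures (constraint violations, I/O errors) visible on the first attempt instead of hiding them behind several delays.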

Config

  • testing.toml, file DB (not in-memory)
  • graph memory enabled
  • tier promotion background task active

Labels

  • P3: Research, medium-high complexity
  • bug: Something isn't working
  • memory: zeph-memory crate (SQLite)