
feat(memory): entity canonicalization with alias table (#1231) #1256

Merged

bug-ops merged 5 commits into main from feat-m33-entity-canonicalization on Mar 6, 2026

bug-ops (Owner) commented Mar 6, 2026

Summary

  • Add canonical_name column to graph_entities with UNIQUE(canonical_name, entity_type) constraint
  • Add graph_entity_aliases lookup table mapping variant names to canonical entity IDs
  • Rewrite EntityResolver.resolve() with alias-first resolution pipeline
  • Deduplicate graph_recall results by canonical name instead of surface name
  • Migration 023 backfills existing entities and seeds initial aliases (renumbered to 024 before merge to avoid a collision with the FTS5 migration from main)

Closes #1231
Epic: #1222
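Deduplicating graph_recall results by canonical name can be pictured with a small in-memory sketch. RecalledFact and dedup_by_canonical are hypothetical names, not the crate's API, and the sketch assumes the results arrive already sorted best-first, so the first hit per canonical name is the one kept.

```rust
use std::collections::HashSet;

/// Hypothetical recall hit; only the fields needed for dedup are modeled.
#[derive(Debug, Clone, PartialEq)]
pub struct RecalledFact {
    /// Name as it appeared in the source text, e.g. "rust-lang".
    pub surface_name: String,
    /// Normalized key shared by every surface form of the same entity.
    pub canonical_name: String,
    pub score: f32,
}

/// Keeps the first fact seen for each canonical name. Assumes `facts` is
/// already sorted best-first, so the kept fact is the highest-ranked one.
pub fn dedup_by_canonical(facts: Vec<RecalledFact>) -> Vec<RecalledFact> {
    let mut seen = HashSet::new();
    facts
        .into_iter()
        // HashSet::insert returns false for a canonical name already seen.
        .filter(|f| seen.insert(f.canonical_name.clone()))
        .collect()
}
```

Keying on the surface name instead would keep "Rust" and "rust-lang" as two separate hits for what is the same entity.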

Details

Multiple surface forms (e.g., "Rust", "rust-lang", "Rust language") now resolve to the same canonical entity. The resolver normalizes input (trim, lowercase, strip control characters, truncate to 512 bytes), checks aliases first, then the canonical name, and creates a new entity only if neither matches. find_entity_by_alias filters by entity type and uses ORDER BY id ASC, so when an alias matches more than one entity the first-registered one wins deterministically.
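A minimal sketch of this normalize-then-resolve order, using HashMaps in place of the SQLite tables. InMemoryResolver, normalize, and add_alias are hypothetical stand-ins; only the 512-byte limit comes from the PR. Two further assumptions: truncation backs off to the nearest UTF-8 character boundary, and "first-registered wins" is modeled as alias registration order.

```rust
use std::collections::HashMap;

/// Byte limit taken from the PR (MAX_ENTITY_NAME_BYTES = 512).
const MAX_ENTITY_NAME_BYTES: usize = 512;

/// Trim, lowercase, strip control characters, then truncate to at most
/// MAX_ENTITY_NAME_BYTES, backing off so a UTF-8 character is never split.
pub fn normalize(raw: &str) -> String {
    let mut s: String = raw
        .trim()
        .to_lowercase()
        .chars()
        .filter(|c| !c.is_control())
        .collect();
    if s.len() > MAX_ENTITY_NAME_BYTES {
        let mut cut = MAX_ENTITY_NAME_BYTES;
        while !s.is_char_boundary(cut) {
            cut -= 1;
        }
        s.truncate(cut);
    }
    s
}

/// Hypothetical in-memory resolver; the real one queries SQLite.
#[derive(Debug, Default)]
pub struct InMemoryResolver {
    next_id: i64,
    /// (canonical_name, entity_type) -> id, mirroring UNIQUE(canonical_name, entity_type).
    canonical: HashMap<(String, String), i64>,
    /// (alias, entity_type) -> entity ids in registration order.
    aliases: HashMap<(String, String), Vec<i64>>,
}

impl InMemoryResolver {
    /// Registers `alias` for entity `id` under `entity_type` (hypothetical helper).
    pub fn add_alias(&mut self, alias: &str, entity_type: &str, id: i64) {
        let ids = self
            .aliases
            .entry((normalize(alias), entity_type.to_string()))
            .or_default();
        if !ids.contains(&id) {
            ids.push(id);
        }
    }

    /// Alias lookup first, then canonical name, then create a new entity.
    pub fn resolve(&mut self, raw: &str, entity_type: &str) -> i64 {
        let key = (normalize(raw), entity_type.to_string());
        // 1. Alias lookup, filtered by type: the first-registered entity wins.
        if let Some(&id) = self.aliases.get(&key).and_then(|ids| ids.first()) {
            return id;
        }
        // 2. Canonical-name lookup.
        if let Some(&id) = self.canonical.get(&key) {
            return id;
        }
        // 3. No match anywhere: create a new entity.
        self.next_id += 1;
        self.canonical.insert(key, self.next_id);
        self.next_id
    }
}
```

Keying both maps on (name, entity_type) mirrors the type filter in find_entity_by_alias and the UNIQUE(canonical_name, entity_type) constraint, so the same string under two entity types stays two entities.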

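Migration 023 uses -- @@DISABLE_TRANSACTION because SQLite ignores PRAGMA foreign_keys inside a transaction. Running outside a transaction lets the migration set PRAGMA foreign_keys = OFF while it recreates graph_entities, so the table recreation does not cascade-delete graph edges.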

Files changed

  • crates/zeph-memory/migrations/023_entity_canonicalization.sql — new migration
  • crates/zeph-memory/src/graph/types.rs — Entity.canonical_name, EntityAlias struct
  • crates/zeph-memory/src/graph/store.rs — alias CRUD methods, updated SELECT queries
  • crates/zeph-memory/src/graph/resolver.rs — alias-first resolve pipeline
  • crates/zeph-memory/src/graph/retrieval.rs — canonical-name dedup in graph_recall
  • crates/zeph-memory/src/graph/mod.rs — re-export EntityAlias
  • docs/src/concepts/graph-memory.md — entity aliases documentation
  • crates/zeph-memory/README.md — updated graph memory section
  • CHANGELOG.md — added to [Unreleased]

Test plan

  • 17 new tests (8 store, 7 resolver, 1 migration backfill, 1 alias conflict)
  • Migration backfill test validates edges survive table recreation
  • cargo +nightly fmt --check clean
  • cargo clippy --workspace --features full -- -D warnings clean
  • cargo nextest run --workspace --features full --lib --bins — 4261 passed, 11 skipped

Add canonical_name column to graph_entities and graph_entity_aliases
lookup table. The entity resolver checks aliases before creating new
entities, preventing duplicates from name variations (e.g., "Rust",
"rust-lang" resolve to the same canonical entity).

Migration 023 recreates graph_entities with UNIQUE(canonical_name,
entity_type) constraint, backfills canonical_name from existing names,
and seeds initial aliases. Uses @@DISABLE_TRANSACTION to allow PRAGMA
foreign_keys guards during table recreation.

Key changes:
- Entity struct gains canonical_name field, new EntityAlias type
- upsert_entity accepts surface_name (display) and canonical_name (key)
- find_entity_by_alias filters by entity_type with ORDER BY id ASC
- find_entities_fuzzy searches both entity names and aliases
- graph_recall deduplicates by canonical_name for stable results
- Alias input truncated to MAX_ENTITY_NAME_BYTES (512) security boundary
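Searching both entity names and aliases in find_entities_fuzzy can be sketched as the union of two substring scans. This assumes LIKE '%query%' matching (SQLite's LIKE is case-insensitive for ASCII by default); EntityRow, AliasRow, and find_fuzzy are hypothetical, and the real method may rank or limit results differently.

```rust
use std::collections::BTreeSet;

/// Hypothetical row shapes standing in for graph_entities and graph_entity_aliases.
pub struct EntityRow {
    pub id: i64,
    pub name: String,
}

pub struct AliasRow {
    pub entity_id: i64,
    pub alias_name: String,
}

/// Approximates `LIKE '%query%'` (ASCII case-insensitive) against entity names
/// and alias names, returning each matching entity id once, in ascending order.
/// Unlike LIKE, `%` and `_` in the query are matched literally here.
pub fn find_fuzzy(query: &str, entities: &[EntityRow], aliases: &[AliasRow]) -> Vec<i64> {
    let q = query.to_ascii_lowercase();
    let mut hits = BTreeSet::new();
    for e in entities {
        if e.name.to_ascii_lowercase().contains(q.as_str()) {
            hits.insert(e.id);
        }
    }
    for a in aliases {
        if a.alias_name.to_ascii_lowercase().contains(q.as_str()) {
            hits.insert(a.entity_id);
        }
    }
    hits.into_iter().collect()
}
```

Collecting ids into a set means an entity matched by both its name and one of its aliases is returned only once.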

Closes #1231
github-actions bot added the documentation, memory, rust, enhancement, and size/XL labels Mar 6, 2026
Update upsert_entity call sites in semantic.rs and context.rs tests
from main branch to use the new 4-argument signature (surface_name,
canonical_name, entity_type, summary).
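The four-argument upsert_entity can be pictured as an upsert keyed on (canonical_name, entity_type). The in-memory EntityTable below is a hypothetical stand-in for the SQLite table and its UNIQUE constraint. Whether the real store overwrites the display name on conflict is an assumption: this sketch does, and keeps the old summary when none is passed.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the relevant Entity fields.
#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub id: i64,
    /// Display (surface) name.
    pub name: String,
    /// Dedup key; unique together with entity_type.
    pub canonical_name: String,
    pub entity_type: String,
    pub summary: Option<String>,
}

/// In-memory stand-in for graph_entities with UNIQUE(canonical_name, entity_type).
#[derive(Debug, Default)]
pub struct EntityTable {
    rows: Vec<Entity>,
    by_key: HashMap<(String, String), usize>,
}

impl EntityTable {
    /// Inserts a row for an unseen (canonical_name, entity_type) pair; otherwise
    /// updates the existing row's display name (and summary, if given) and
    /// returns the existing id.
    pub fn upsert_entity(
        &mut self,
        surface_name: &str,
        canonical_name: &str,
        entity_type: &str,
        summary: Option<&str>,
    ) -> i64 {
        let key = (canonical_name.to_string(), entity_type.to_string());
        if let Some(&idx) = self.by_key.get(&key) {
            let row = &mut self.rows[idx];
            row.name = surface_name.to_string();
            if let Some(s) = summary {
                row.summary = Some(s.to_string());
            }
            return row.id;
        }
        let id = self.rows.len() as i64 + 1;
        self.rows.push(Entity {
            id,
            name: surface_name.to_string(),
            canonical_name: canonical_name.to_string(),
            entity_type: entity_type.to_string(),
            summary: summary.map(|s| s.to_string()),
        });
        self.by_key.insert(key, self.rows.len() - 1);
        id
    }

    /// Looks up a row by id.
    pub fn get(&self, id: i64) -> Option<&Entity> {
        self.rows.iter().find(|e| e.id == id)
    }
}
```

Separating the display name from the key is what lets "Rust" and "rust-lang" land on one row while the table still records a human-readable name.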
github-actions bot added the core label Mar 6, 2026
bug-ops added 2 commits March 6, 2026 03:06
Rename entity canonicalization migration from 023 to 024 to avoid
collision with FTS5 migration merged from main. Rebuild FTS5 triggers
in migration 024 after table recreation. Fix find_entities_fuzzy alias
LIKE to use original query instead of FTS5-sanitized form. Add DB
cleanup in bootstrap test to prevent stale migration version errors.
bug-ops enabled auto-merge (squash) March 6, 2026 02:47
bug-ops merged commit 76f379b into main Mar 6, 2026
28 checks passed
bug-ops deleted the feat-m33-entity-canonicalization branch March 6, 2026 03:01

Labels

  • core: zeph-core crate
  • documentation: Improvements or additions to documentation
  • enhancement: New feature or request
  • memory: zeph-memory crate (SQLite)
  • rust: Rust code changes
  • size/XL: Extra large PR (500+ lines)
