feat(memory): entity canonicalization with alias table (#1231) #1256
Merged
Add a canonical_name column to graph_entities and a graph_entity_aliases lookup table. The entity resolver checks aliases before creating new entities, preventing duplicates from name variations (e.g., "Rust" and "rust-lang" resolve to the same canonical entity).

Migration 023 recreates graph_entities with a UNIQUE(canonical_name, entity_type) constraint, backfills canonical_name from existing names, and seeds initial aliases. It uses @@DISABLE_TRANSACTION to allow PRAGMA foreign_keys guards during table recreation.

Key changes:
- Entity struct gains a canonical_name field; new EntityAlias type
- upsert_entity accepts surface_name (display) and canonical_name (key)
- find_entity_by_alias filters by entity_type with ORDER BY id ASC
- find_entities_fuzzy searches both entity names and aliases
- graph_recall deduplicates by canonical_name for stable results
- Alias input is truncated to MAX_ENTITY_NAME_BYTES (512) as a security boundary

Closes #1231
Update upsert_entity call sites in semantic.rs and context.rs tests from main branch to use the new 4-argument signature (surface_name, canonical_name, entity_type, summary).
Rename entity canonicalization migration from 023 to 024 to avoid collision with FTS5 migration merged from main. Rebuild FTS5 triggers in migration 024 after table recreation. Fix find_entities_fuzzy alias LIKE to use original query instead of FTS5-sanitized form. Add DB cleanup in bootstrap test to prevent stale migration version errors.
Summary
- Add `canonical_name` column to `graph_entities` with `UNIQUE(canonical_name, entity_type)` constraint
- Add `graph_entity_aliases` lookup table mapping variant names to canonical entity IDs
- `EntityResolver.resolve()` with alias-first resolution pipeline
- Deduplicate `graph_recall` results by canonical name instead of surface name

Closes #1231
Epic: #1222
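For reviewers, a minimal sketch of what deduplicating recall results by canonical name could look like. The `Entity` struct here is a hypothetical, trimmed-down stand-in (not the type in `types.rs`), `dedup_by_canonical` is an illustrative helper name, and keeping the first hit per canonical name is an assumption rather than the confirmed behavior of `graph_recall`.

```rust
use std::collections::HashSet;

/// Hypothetical, reduced view of an entity hit (assumption for illustration).
#[derive(Debug, Clone, PartialEq)]
struct Entity {
    name: String,           // surface name as displayed
    canonical_name: String, // normalized key
}

/// Keep the first hit for each canonical name, preserving input order.
fn dedup_by_canonical(hits: Vec<Entity>) -> Vec<Entity> {
    let mut seen = HashSet::new();
    hits.into_iter()
        // `insert` returns false when the key was already present.
        .filter(|e| seen.insert(e.canonical_name.clone()))
        .collect()
}

fn main() {
    let hits = vec![
        Entity { name: "Rust".into(), canonical_name: "rust".into() },
        Entity { name: "rust-lang".into(), canonical_name: "rust".into() },
        Entity { name: "Tokio".into(), canonical_name: "tokio".into() },
    ];
    let unique = dedup_by_canonical(hits);
    let names: Vec<&str> = unique.iter().map(|e| e.name.as_str()).collect();
    assert_eq!(names, ["Rust", "Tokio"]);
}
```

Keying on the canonical name rather than the surface name means two hits that differ only in spelling collapse into one result.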
Details
Multiple surface forms (e.g., "Rust", "rust-lang", "Rust language") now resolve to the same canonical entity. The resolver normalizes input (trim, lowercase, strip control chars, truncate to 512 bytes), checks aliases first, then canonical name, and only creates a new entity if no match is found.
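The resolution order described above can be sketched roughly as follows. This is an in-memory illustration only: `Store`, `add_alias`, and the `HashMap` layout are hypothetical stand-ins for the SQLite-backed store, and backing off to a UTF-8 character boundary when truncating is an assumption about how the 512-byte cap is applied.

```rust
use std::collections::HashMap;

/// Byte cap named in this PR.
const MAX_ENTITY_NAME_BYTES: usize = 512;

/// Trim, lowercase, strip control chars, then truncate to the byte cap
/// without splitting a UTF-8 character (boundary handling is an assumption).
fn normalize(input: &str) -> String {
    let mut out: String = input
        .trim()
        .to_lowercase()
        .chars()
        .filter(|c| !c.is_control())
        .collect();
    if out.len() > MAX_ENTITY_NAME_BYTES {
        let mut end = MAX_ENTITY_NAME_BYTES;
        while !out.is_char_boundary(end) {
            end -= 1;
        }
        out.truncate(end);
    }
    out
}

/// Hypothetical in-memory stand-in for the entity and alias tables.
#[derive(Default)]
struct Store {
    aliases: HashMap<(String, String), u64>,   // (alias, entity_type) -> id
    canonical: HashMap<(String, String), u64>, // (canonical_name, entity_type) -> id
    next_id: u64,
}

impl Store {
    /// Alias lookup first, then canonical name, then create a new entity.
    fn resolve(&mut self, surface: &str, entity_type: &str) -> u64 {
        let key = (normalize(surface), entity_type.to_string());
        if let Some(&id) = self.aliases.get(&key) {
            return id;
        }
        if let Some(&id) = self.canonical.get(&key) {
            return id;
        }
        self.next_id += 1;
        let id = self.next_id;
        self.canonical.insert(key, id);
        id
    }

    /// Register a variant name for an existing entity.
    fn add_alias(&mut self, alias: &str, entity_type: &str, id: u64) {
        self.aliases
            .insert((normalize(alias), entity_type.to_string()), id);
    }
}

fn main() {
    let mut store = Store::default();
    let rust = store.resolve("Rust", "language");
    store.add_alias("rust-lang", "language", rust);
    // Variants resolve to the same entity; a different type does not.
    assert_eq!(store.resolve("  RUST-lang\n", "language"), rust);
    assert_eq!(store.resolve("rust", "language"), rust);
    assert_ne!(store.resolve("rust", "game"), rust);
}
```

Because both maps are keyed on the normalized name plus the entity type, the same string under a different type still creates a separate entity, consistent with the `UNIQUE(canonical_name, entity_type)` constraint.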
`find_entity_by_alias` filters by entity type and uses `ORDER BY id ASC` for deterministic first-registered-wins semantics.

Migration 023 uses `-- @@DISABLE_TRANSACTION` to allow `PRAGMA foreign_keys = OFF` during table recreation, preventing cascade deletion of graph edges.

Files changed
- `crates/zeph-memory/migrations/023_entity_canonicalization.sql`: new migration
- `crates/zeph-memory/src/graph/types.rs`: `Entity.canonical_name`, `EntityAlias` struct
- `crates/zeph-memory/src/graph/store.rs`: alias CRUD methods, updated SELECT queries
- `crates/zeph-memory/src/graph/resolver.rs`: alias-first resolve pipeline
- `crates/zeph-memory/src/graph/retrieval.rs`: canonical-name dedup in `graph_recall`
- `crates/zeph-memory/src/graph/mod.rs`: re-export `EntityAlias`
- `docs/src/concepts/graph-memory.md`: entity aliases documentation
- `crates/zeph-memory/README.md`: updated graph memory section
- `CHANGELOG.md`: added to `[Unreleased]`

Test plan
- `cargo +nightly fmt --check` clean
- `cargo clippy --workspace --features full -- -D warnings` clean
- `cargo nextest run --workspace --features full --lib --bins`: 4261 passed, 11 skipped