fix(memory): graph entity extraction populates zeph_graph_entities with structural noise instead of semantic facts #1912
Closed
Labels
- bug: Something isn't working
- memory: zeph-memory crate (SQLite)
Description
Problem
The zeph_graph_entities Qdrant collection (121 points) contains low-value structural tokens extracted from code and config rather than meaningful semantic entities about the user, their projects, and domain knowledge.
Observed examples (actual data from collection)
Extracted entities include:
- type, provider_type, allowed_commands, recall_limit: TOML config keys
- https, orchestrator, vector_collections: generic terms
- go: programming language with no context
- src/: file path fragment
- read_file, wget: tool names
Expected behavior
Graph entities should represent:
- People: user identity, collaborators
- Projects: repo names, purpose, tech stack
- Decisions: architectural choices, preferences
- Domain concepts: meaningful nouns from conversations
Root cause hypothesis
The LLM-based entity extractor in graph_commands.rs / GraphMemory is:
- Applied to tool outputs (code, config files) in addition to conversational messages, or
- Using a prompt that does not filter structural/syntactic tokens, or
- Applying extraction to raw JSON/TOML content that floods the entity graph with keys
Impact
- Entity graph is polluted — semantic search returns noise
- Graph-based recall (fetch_graph_facts) retrieves irrelevant context
- BFS traversal from meaningful nodes hits dead ends through noise nodes
- 121 noisy entities degrade rather than improve response quality
Suggested fix
- Limit entity extraction to conversational messages only (exclude tool results)
- Add entity type filter to extraction prompt: only extract Person, Project, Technology, Decision, Preference, Location
- Add a minimum entity length filter (reject very short single-word entities such as "go" or "type")
- Add deduplication pass with EntityResolver before upsert
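A rough sketch of how the four suggestions could fit together is shown below. It is not taken from the zeph-memory code: `Role`, `EntityType`, `Entity`, `is_extractable`, `accept`, `dedup`, and the `MIN_NAME_CHARS` threshold are all hypothetical names chosen for illustration, and `dedup` is only a naive stand-in for whatever `EntityResolver` actually does. The rejection of names containing `/` or `_` is an extra heuristic, assumed here because most of the observed noise is snake_case keys or path fragments.

```rust
use std::collections::HashSet;

/// Who produced a message. Hypothetical; the real crate's role type may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
    ToolResult,
}

/// The entity types listed in the suggested fix.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EntityType {
    Person,
    Project,
    Technology,
    Decision,
    Preference,
    Location,
}

impl EntityType {
    /// Parse the type label returned by the extractor; unknown labels are rejected.
    pub fn parse(label: &str) -> Option<Self> {
        match label.trim().to_ascii_lowercase().as_str() {
            "person" => Some(Self::Person),
            "project" => Some(Self::Project),
            "technology" => Some(Self::Technology),
            "decision" => Some(Self::Decision),
            "preference" => Some(Self::Preference),
            "location" => Some(Self::Location),
            _ => None,
        }
    }
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Entity {
    pub name: String,
    pub entity_type: EntityType,
}

/// Suggestion 1: only conversational messages are sent to the extractor.
pub fn is_extractable(role: Role) -> bool {
    matches!(role, Role::User | Role::Assistant)
}

/// Assumed threshold; the issue does not specify a value.
pub const MIN_NAME_CHARS: usize = 3;

/// Suggestions 2 and 3: keep a candidate only if it is long enough,
/// does not look like a path or identifier, and has an allowed type.
pub fn accept(name: &str, type_label: &str) -> Option<Entity> {
    let name = name.trim();
    if name.chars().count() < MIN_NAME_CHARS {
        return None;
    }
    // Assumed heuristic: path fragments and snake_case keys are structural noise.
    if name.contains('/') || name.contains('_') {
        return None;
    }
    let entity_type = EntityType::parse(type_label)?;
    Some(Entity { name: name.to_string(), entity_type })
}

/// Suggestion 4 (naive stand-in for EntityResolver): drop entities whose
/// lowercased name and type were already seen, keeping the first occurrence.
pub fn dedup(entities: Vec<Entity>) -> Vec<Entity> {
    let mut seen = HashSet::new();
    entities
        .into_iter()
        .filter(|e| seen.insert((e.name.to_lowercase(), e.entity_type)))
        .collect()
}
```

Note that a length threshold alone would reject "go" but not "type"; in this sketch a token like "type" is only dropped if the extractor assigns it a label outside the allowed set, so the prompt-level type filter still carries most of the weight.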