fix(memory): graph entity extraction populates zeph_graph_entities with structural noise instead of semantic facts #1912

@bug-ops

Problem

The zeph_graph_entities Qdrant collection (121 points) contains low-value structural tokens extracted from code and config rather than meaningful semantic entities about the user, their projects, and domain knowledge.

Observed examples (actual data from collection)

Extracted entities include:

  • type, provider_type, allowed_commands, recall_limit — TOML config keys
  • https, orchestrator, vector_collections — generic terms
  • go — programming language with no context
  • src/ — file path fragment
  • read_file, wget — tool names

Expected behavior

Graph entities should represent:

  • People: user identity, collaborators
  • Projects: repo names, purpose, tech stack
  • Decisions: architectural choices, preferences
  • Domain concepts: meaningful nouns from conversations

Root cause hypothesis

The LLM-based entity extractor in graph_commands.rs / GraphMemory is likely doing one or more of the following:

  1. Running on tool outputs (code, config files) in addition to conversational messages
  2. Using a prompt that does not filter out structural/syntactic tokens
  3. Extracting from raw JSON/TOML content, which floods the entity graph with keys

Impact

  • Entity graph is polluted — semantic search returns noise
  • Graph-based recall (fetch_graph_facts) retrieves irrelevant context
  • BFS traversal from meaningful nodes hits dead ends through noise nodes
  • 121 noisy entities degrade rather than improve response quality

Suggested fix

  1. Limit entity extraction to conversational messages only (exclude tool results)
  2. Add entity type filter to extraction prompt: only extract Person, Project, Technology, Decision, Preference, Location
  3. Add minimum entity length filter (reject short, context-free single words like "go", "type")
  4. Add deduplication pass with EntityResolver before upsert
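
The four steps above could be sketched roughly as follows. This is a minimal illustration, not the actual zeph-memory code: `Role`, `Candidate`, `is_extractable`, `looks_structural`, and `filter_candidates` are hypothetical names, the 3-character threshold is an assumed value, and the case-insensitive `HashSet` check is only a stand-in for whatever `EntityResolver` really does.

```rust
use std::collections::HashSet;

/// Hypothetical message roles; the real zeph message types may differ.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Role {
    User,
    Assistant,
    ToolResult,
}

/// Entity types from the allow-list proposed in step 2.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
pub enum EntityType {
    Person,
    Project,
    Technology,
    Decision,
    Preference,
    Location,
}

impl EntityType {
    /// Parse the type label returned by the extractor; unknown labels yield None.
    pub fn parse(label: &str) -> Option<Self> {
        match label.trim().to_ascii_lowercase().as_str() {
            "person" => Some(Self::Person),
            "project" => Some(Self::Project),
            "technology" => Some(Self::Technology),
            "decision" => Some(Self::Decision),
            "preference" => Some(Self::Preference),
            "location" => Some(Self::Location),
            _ => None,
        }
    }
}

/// Hypothetical raw extractor output: a name plus a free-form type label.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Candidate {
    pub name: String,
    pub type_label: String,
}

/// Step 1: only conversational messages are eligible for extraction.
pub fn is_extractable(role: Role) -> bool {
    matches!(role, Role::User | Role::Assistant)
}

/// Step 3 (heuristic): very short names, path fragments, and
/// snake_case identifiers are treated as structural noise.
fn looks_structural(name: &str) -> bool {
    const MIN_CHARS: usize = 3; // assumed threshold
    let snake_case = name.contains('_')
        && name
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_');
    name.chars().count() < MIN_CHARS || name.contains('/') || snake_case
}

/// Steps 2-4: keep allow-listed types, drop structural tokens, and
/// deduplicate case-insensitively before anything is upserted.
pub fn filter_candidates(candidates: Vec<Candidate>) -> Vec<(String, EntityType)> {
    let mut seen: HashSet<(String, EntityType)> = HashSet::new();
    let mut kept = Vec::new();
    for c in candidates {
        let name = c.name.trim();
        let ty = match EntityType::parse(&c.type_label) {
            Some(ty) => ty,
            None => continue, // type not on the allow-list
        };
        if looks_structural(name) {
            continue;
        }
        if seen.insert((name.to_lowercase(), ty)) {
            kept.push((name.to_string(), ty));
        }
    }
    kept
}
```

In this sketch, `is_extractable` would be checked before a message is sent to the LLM, and `filter_candidates` would run on the extractor output before the upsert. The snake_case heuristic is deliberately aggressive and would also reject a legitimate repo name such as `my_project`, so a real fix may need to lean more on the prompt-level type filter than on string heuristics.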

Metadata

Labels: bug (Something isn't working), memory (zeph-memory crate (SQLite))
