feat(memory): LLM-powered entity and relation extraction pipeline #1225

@bug-ops

Summary

Phase 2 of graph memory (#1222): LLM extraction pipeline for entities and relationships from conversation messages.

Depends on: #1224 (schema & types)

Tasks

1. Extraction Types (graph/extractor.rs)

ExtractionResult, ExtractedEntity, and ExtractedEdge all derive JsonSchema for structured LLM output. The GraphExtractor struct holds an &AnyProvider reference.
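
A rough sketch of what these types might look like. The field names (name, entity_type, source, target, relation, fact, temporal_hint) are assumptions inferred from the prompt rules in this issue, not the actual definitions. The JsonSchema derive (presumably from the schemars crate) is left out so the snippet compiles with the standard library alone.

```rust
// Sketch only: field names are assumptions based on the prompt rules in this
// issue. The real types would additionally derive JsonSchema (and likely the
// serde traits) so the provider can request structured output.

#[derive(Debug, Clone, PartialEq)]
pub struct ExtractedEntity {
    /// Normalized entity name, e.g. "Rust" or "Alice".
    pub name: String,
    /// Raw type label as returned by the LLM; validated later.
    pub entity_type: String,
}

#[derive(Debug, Clone, PartialEq)]
pub struct ExtractedEdge {
    /// Name of the source entity.
    pub source: String,
    /// Name of the target entity.
    pub target: String,
    /// Relation verb, e.g. "uses".
    pub relation: String,
    /// Full fact sentence describing the relationship.
    pub fact: String,
    /// Optional temporal hint such as "last week".
    pub temporal_hint: Option<String>,
}

#[derive(Debug, Clone, Default, PartialEq)]
pub struct ExtractionResult {
    pub entities: Vec<ExtractedEntity>,
    pub edges: Vec<ExtractedEdge>,
}
```

Deriving Default on ExtractionResult gives the "empty arrays" case for messages with nothing to extract without any extra code.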

2. Extraction Prompt

System prompt for entity/relation extraction with rules:

  1. Extract named entities (people, tools, concepts, projects, languages, files, configs, organizations)
  2. Extract relationships as (source, target, relation_verb, fact_sentence)
  3. Use a context window of the last 4 messages for coreference resolution
  4. Normalize entity names (capitalize proper nouns, use canonical tool names)
  5. Output empty arrays for messages with no extractable content
  6. Cap the number of entities and edges at the configured limits
  7. Skip greetings, acknowledgments, short conversational messages
  8. Do not extract PII (emails, phone numbers, addresses)
  9. Always output in English regardless of conversation language
  10. Temporal hints: if a message implies a time ("last week", "since January"), include a temporal_hint
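
Rule 3 (the 4-message context window) could be implemented roughly as below. The Message shape, the CONTEXT_WINDOW constant, and the prompt layout are all hypothetical; the actual template is in the architecture document.

```rust
/// Hypothetical message shape; the real Message type is defined elsewhere.
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Number of preceding messages included for coreference resolution (rule 3).
const CONTEXT_WINDOW: usize = 4;

/// Builds the user part of the extraction prompt: the last CONTEXT_WINDOW
/// context messages, followed by the message to extract from.
pub fn build_user_prompt(message: &str, context: &[Message]) -> String {
    // saturating_sub avoids underflow when fewer than 4 messages exist.
    let start = context.len().saturating_sub(CONTEXT_WINDOW);
    let mut prompt = String::from("Context:\n");
    for m in &context[start..] {
        prompt.push_str(&format!("{}: {}\n", m.role, m.content));
    }
    prompt.push_str("\nMessage to extract from:\n");
    prompt.push_str(message);
    prompt
}
```

With six context messages, only the last four end up in the prompt; with fewer than four, all of them do.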

3. Entity Resolver (graph/resolver.rs)

EntityResolver with methods:

  • resolve_entity(extracted: &ExtractedEntity, store: &GraphStore): exact name+type match (MVP). Returns the existing entity ID or creates a new entity.
  • resolve_edge(extracted: &ExtractedEdge, source_id: i64, target_id: i64, store: &GraphStore): checks for semantically similar existing edges between the same entity pair. If the new edge contradicts an existing one, the old edge is invalidated.
  • scrub_content integration: when redact_credentials is enabled, entity names pass through scrub before storage.
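
A minimal sketch of the exact name+type resolution, using an in-memory map as a stand-in for GraphStore (which is SQLite-backed in zeph-memory). InMemoryStore and its method signature are hypothetical and differ from the resolve_entity signature above, which takes &GraphStore.

```rust
use std::collections::HashMap;

/// In-memory stand-in for GraphStore, for illustration only.
#[derive(Default)]
pub struct InMemoryStore {
    /// Maps (name, entity_type) to the entity ID.
    ids: HashMap<(String, String), i64>,
    next_id: i64,
}

impl InMemoryStore {
    /// Returns the existing ID on an exact (name, type) match,
    /// otherwise allocates a new ID and records the entity.
    pub fn resolve_entity(&mut self, name: &str, entity_type: &str) -> i64 {
        let key = (name.to_string(), entity_type.to_string());
        if let Some(&id) = self.ids.get(&key) {
            return id;
        }
        self.next_id += 1;
        self.ids.insert(key, self.next_id);
        self.next_id
    }
}
```

Because the match is exact, "Rust" as a Language and "Rust" as a Project resolve to two different entities, and differences in casing are only merged if the prompt's name normalization (rule 4) has already made them identical.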

4. Extraction Pipeline

GraphExtractor::extract(&self, message: &str, context: &[Message]) -> Result<ExtractionResult>:

  1. Build prompt with context window
  2. Call provider.chat_typed_erased::<ExtractionResult>()
  3. Validate: coerce entities with unknown types to Concept, then truncate entities and edges to the configured limits
  4. Return structured result
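
The validation step might look something like this. The EntityType variants mirror the categories in prompt rule 1, but the enum itself, the string labels it accepts, and the max_entities parameter are assumptions for the sake of the example.

```rust
/// Assumed entity categories, mirroring prompt rule 1.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EntityType {
    Person,
    Tool,
    Concept,
    Project,
    Language,
    File,
    Config,
    Organization,
}

/// Maps a raw LLM type label to a known type; anything unknown becomes Concept.
pub fn parse_entity_type(raw: &str) -> EntityType {
    match raw.trim().to_lowercase().as_str() {
        "person" => EntityType::Person,
        "tool" => EntityType::Tool,
        "project" => EntityType::Project,
        "language" => EntityType::Language,
        "file" => EntityType::File,
        "config" => EntityType::Config,
        "organization" => EntityType::Organization,
        // Covers "concept" as well as any label the schema does not know.
        _ => EntityType::Concept,
    }
}

/// Truncates (name, raw_type) pairs to the limit and coerces their types.
pub fn validate(
    mut raw: Vec<(String, String)>,
    max_entities: usize,
) -> Vec<(String, EntityType)> {
    raw.truncate(max_entities);
    raw.into_iter()
        .map(|(name, ty)| {
            let parsed = parse_entity_type(&ty);
            (name, parsed)
        })
        .collect()
}
```

Coercing rather than dropping keeps the entity in the graph while preventing new type labels from leaking into the schema.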

GraphExtractor::process(&self, result: ExtractionResult, store: &GraphStore, resolver: &EntityResolver, episode_id: MessageId):

  1. For each entity: resolve → upsert
  2. For each edge: resolve source+target entities → check duplicates → insert or invalidate+insert
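
The "check duplicates → insert or invalidate+insert" step for edges could be sketched as follows. This uses a plain Vec instead of GraphStore, and it treats "same entity pair, same relation, different fact" as a contradiction. That rule is an assumption made for the example; the issue calls for a semantic similarity check, which is not modeled here. Edge, EdgeOutcome, and upsert_edge are hypothetical names.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub source_id: i64,
    pub target_id: i64,
    pub relation: String,
    pub fact: String,
    /// Timestamp at which the edge became valid.
    pub valid_from: u64,
    /// Set when a newer, contradictory edge replaces this one.
    pub invalidated_at: Option<u64>,
}

#[derive(Debug, PartialEq)]
pub enum EdgeOutcome {
    Duplicate,
    Inserted,
    Superseded,
}

/// True if `e` is still active and connects the same pair with the same relation.
fn same_slot(e: &Edge, new: &Edge) -> bool {
    e.invalidated_at.is_none()
        && e.source_id == new.source_id
        && e.target_id == new.target_id
        && e.relation == new.relation
}

/// Inserts `new`, skipping exact duplicates and invalidating active edges
/// in the same slot that state a different fact.
pub fn upsert_edge(edges: &mut Vec<Edge>, new: Edge) -> EdgeOutcome {
    // Pass 1: an identical active edge means there is nothing to do.
    if edges.iter().any(|e| same_slot(e, &new) && e.fact == new.fact) {
        return EdgeOutcome::Duplicate;
    }
    // Pass 2: invalidate conflicting edges, stamped with the new edge's time.
    let mut outcome = EdgeOutcome::Inserted;
    for e in edges.iter_mut() {
        if same_slot(e, &new) {
            e.invalidated_at = Some(new.valid_from);
            outcome = EdgeOutcome::Superseded;
        }
    }
    edges.push(new);
    outcome
}
```

Invalidated edges stay in the list with their timestamp set, so the history of a relationship remains queryable instead of being overwritten.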

Architecture Reference

See .local/plan/graph-memory-architecture.md Section 4 for prompt template, resolution algorithm, and PII scrubbing details.

Acceptance Criteria

  • Entity extraction from sample conversations produces reasonable entities
  • Unknown entity types coerced to Concept (no schema drift)
  • Entity resolution merges duplicates on exact name+type match
  • Contradictory edges invalidated with correct timestamps
  • Empty/short messages produce empty extraction (not errors)
  • PII scrubbing applied when redact_credentials enabled
  • English output regardless of input language
  • ~18 tests (15 unit + 3 integration)

Metadata

Labels

  • enhancement: New feature or request
  • graph-memory: Knowledge graph memory feature
  • llm: zeph-llm crate (Ollama, Claude)
  • memory: zeph-memory crate (SQLite)
