feat(memory): LLM-powered entity and relation extraction pipeline #1225
Closed
Labels
- enhancement: New feature or request
- graph-memory: Knowledge graph memory feature
- llm: zeph-llm crate (Ollama, Claude)
- memory: zeph-memory crate (SQLite)
Description
Summary
Phase 2 of graph memory (#1222): LLM extraction pipeline for entities and relationships from conversation messages.
Depends on: #1224 (schema & types)
Tasks
1. Extraction Types (graph/extractor.rs)
ExtractionResult, ExtractedEntity, and ExtractedEdge all derive JsonSchema for structured LLM output. The GraphExtractor struct holds an &AnyProvider reference.
2. Extraction Prompt
System prompt for entity/relation extraction with rules:
- Extract named entities (people, tools, concepts, projects, languages, files, configs, organizations)
- Extract relationships as (source, target, relation_verb, fact_sentence)
- Use context window of last 4 messages for coreference resolution
- Normalize entity names (capitalize proper nouns, use canonical tool names)
- Output empty arrays for messages with no extractable content
- Cap the number of entities and edges at the configured limits
- Skip greetings, acknowledgments, short conversational messages
- Do not extract PII (emails, phone numbers, addresses)
- Always output in English regardless of conversation language
- Temporal hints: if message implies time ("last week", "since January"), include temporal_hint
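The context-window rule above could be sketched as follows. This is a minimal illustration, not the project's prompt builder: the `Message` struct, the `build_user_prompt` function, and the prompt layout are all hypothetical; only the window size of four messages comes from the rules above.

```rust
/// Hypothetical stand-in for the crate's message type (assumption).
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Number of prior messages kept for coreference resolution (from the rules above).
const CONTEXT_WINDOW: usize = 4;

/// Builds the user part of an extraction prompt: at most the last
/// `CONTEXT_WINDOW` context messages, followed by the message to extract from.
/// The textual layout is an illustrative assumption.
pub fn build_user_prompt(message: &str, context: &[Message]) -> String {
    // saturating_sub avoids underflow when fewer than four messages exist.
    let start = context.len().saturating_sub(CONTEXT_WINDOW);
    let mut prompt = String::from("Context:\n");
    for m in &context[start..] {
        prompt.push_str(&format!("[{}] {}\n", m.role, m.content));
    }
    prompt.push_str("\nExtract from:\n");
    prompt.push_str(message);
    prompt
}
```

Older messages are dropped from the front, so the model only sees the most recent turns when resolving pronouns such as "it" or "that tool".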
3. Entity Resolver (graph/resolver.rs)
EntityResolver with methods:
- resolve_entity(extracted: &ExtractedEntity, store: &GraphStore): exact name+type match (MVP). Returns existing entity ID or creates new.
- resolve_edge(extracted: &ExtractedEdge, source_id: i64, target_id: i64, store: &GraphStore): check for semantically similar existing edges between same entity pair. If contradictory, invalidate old edge.
- scrub_content integration: when redact_credentials is enabled, entity names pass through scrub before storage.
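A minimal sketch of the exact name+type resolution, assuming an in-memory map in place of the SQLite-backed GraphStore. `InMemoryStore`, the `EntityType` variants, and the `&mut` store parameter are assumptions made for illustration; the issue's signature takes `&GraphStore`.

```rust
use std::collections::HashMap;

/// Illustrative subset of entity types (variant names are assumptions).
#[derive(Clone, Debug, PartialEq, Eq, Hash)]
pub enum EntityType {
    Person,
    Tool,
    Concept,
}

/// Simplified extracted entity (the real type also derives JsonSchema).
pub struct ExtractedEntity {
    pub name: String,
    pub entity_type: EntityType,
}

/// Hypothetical in-memory stand-in for GraphStore.
#[derive(Default)]
pub struct InMemoryStore {
    ids: HashMap<(String, EntityType), i64>,
    next_id: i64,
}

/// Returns the ID of an existing entity with the same name and type,
/// or allocates a new ID when no exact match exists.
pub fn resolve_entity(extracted: &ExtractedEntity, store: &mut InMemoryStore) -> i64 {
    let key = (extracted.name.clone(), extracted.entity_type.clone());
    if let Some(&id) = store.ids.get(&key) {
        return id;
    }
    store.next_id += 1;
    let id = store.next_id;
    store.ids.insert(key, id);
    id
}
```

Because the key is the pair (name, type), "Rust" as a Tool and "Rust" as a Concept resolve to different entities, which is why name normalization in the prompt matters for this MVP strategy.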
4. Extraction Pipeline
GraphExtractor::extract(&self, message: &str, context: &[Message]) -> Result<ExtractionResult>:
- Build prompt with context window
- Call provider.chat_typed_erased::<ExtractionResult>()
- Validate: coerce entities with unknown types to Concept, truncate to limits
- Return structured result
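The validation step could look roughly like the sketch below. It assumes the LLM returns the entity type as a free-form string; `RawEntity`, `Entity`, `validate_entities`, and the exact `EntityType` variant names are hypothetical, chosen to mirror the categories listed in the extraction prompt.

```rust
/// Entity categories mirroring the prompt rules (variant names are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub enum EntityType {
    Person,
    Tool,
    Concept,
    Project,
    Language,
    File,
    Config,
    Organization,
}

impl EntityType {
    /// Maps a raw type string to a known variant, falling back to Concept
    /// so that unexpected LLM output cannot introduce new types.
    pub fn parse_or_concept(raw: &str) -> EntityType {
        match raw.trim().to_ascii_lowercase().as_str() {
            "person" => EntityType::Person,
            "tool" => EntityType::Tool,
            "project" => EntityType::Project,
            "language" => EntityType::Language,
            "file" => EntityType::File,
            "config" => EntityType::Config,
            "organization" => EntityType::Organization,
            _ => EntityType::Concept,
        }
    }
}

/// Entity as returned by the LLM, before validation (hypothetical shape).
pub struct RawEntity {
    pub name: String,
    pub entity_type: String,
}

/// Entity after validation.
pub struct Entity {
    pub name: String,
    pub entity_type: EntityType,
}

/// Truncates to `max_entities` and coerces unknown types to Concept.
pub fn validate_entities(raw: Vec<RawEntity>, max_entities: usize) -> Vec<Entity> {
    raw.into_iter()
        .take(max_entities)
        .map(|e| Entity {
            entity_type: EntityType::parse_or_concept(&e.entity_type),
            name: e.name,
        })
        .collect()
}
```

An empty input yields an empty output rather than an error, which matches the expectation that short or empty messages produce empty extractions.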
GraphExtractor::process(&self, result: ExtractionResult, store: &GraphStore, resolver: &EntityResolver, episode_id: MessageId):
- For each entity: resolve → upsert
- For each edge: resolve source+target entities → check duplicates → insert or invalidate+insert
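The "insert or invalidate+insert" branch for edges might be sketched as below. This is a deliberate simplification: the issue calls for a semantic similarity check, whereas this sketch treats an active edge with the same entity pair and relation verb but a different fact sentence as contradictory. `Edge`, `EdgeStore`, `EdgeOutcome`, `upsert_edge`, and the integer timestamps are all hypothetical.

```rust
/// Simplified edge record (field names are assumptions).
#[derive(Debug, Clone, PartialEq)]
pub struct Edge {
    pub source_id: i64,
    pub target_id: i64,
    pub relation: String,
    pub fact: String,
    pub valid_from: i64,
    /// Set when a newer, contradictory edge supersedes this one.
    pub invalidated_at: Option<i64>,
}

/// Hypothetical in-memory stand-in for the edge table.
#[derive(Default)]
pub struct EdgeStore {
    pub edges: Vec<Edge>,
}

#[derive(Debug, PartialEq)]
pub enum EdgeOutcome {
    Duplicate,
    Inserted,
    Replaced,
}

/// Skips exact duplicates, invalidates a conflicting active edge,
/// and inserts the new edge otherwise.
pub fn upsert_edge(
    store: &mut EdgeStore,
    source_id: i64,
    target_id: i64,
    relation: &str,
    fact: &str,
    now: i64,
) -> EdgeOutcome {
    let mut outcome = EdgeOutcome::Inserted;
    for e in store.edges.iter_mut() {
        let active_match = e.invalidated_at.is_none()
            && e.source_id == source_id
            && e.target_id == target_id
            && e.relation == relation;
        if !active_match {
            continue;
        }
        if e.fact == fact {
            return EdgeOutcome::Duplicate;
        }
        // Same pair and relation but a different fact: mark the old edge invalid.
        e.invalidated_at = Some(now);
        outcome = EdgeOutcome::Replaced;
    }
    store.edges.push(Edge {
        source_id,
        target_id,
        relation: relation.to_string(),
        fact: fact.to_string(),
        valid_from: now,
        invalidated_at: None,
    });
    outcome
}
```

Invalidated edges stay in the store with a timestamp instead of being deleted, so the history of a relationship remains queryable.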
Architecture Reference
See .local/plan/graph-memory-architecture.md Section 4 for prompt template, resolution algorithm, and PII scrubbing details.
Acceptance Criteria
- Entity extraction from sample conversations produces reasonable entities
- Unknown entity types coerced to Concept (no schema drift)
- Entity resolution merges duplicates on exact name+type match
- Contradictory edges invalidated with correct timestamps
- Empty/short messages produce empty extraction (not errors)
- PII scrubbing applied when redact_credentials enabled
- English output regardless of input language
- ~18 tests (15 unit + 3 integration)