A Deno proof-of-concept demonstrating two nested LLM loops for grounded, evidence-backed document analysis:
- Ralph loop (outer quality loop): generate, validate, feedback, retry
- LCM mode (inner long-context pattern): use bounded active context, retrieval operators, and memory compaction rather than stuffing full corpora into a single prompt
- Two-model split: Claude (Anthropic) for generation, GPT (OpenAI) for validation
- GEPA optimization: closed-loop tuning of agent configuration (temperature, step budgets) based on real session trace metrics, with Pareto-front variant selection
The system uses `@anthropic-ai/sdk` and `openai` directly; there is no framework intermediary.
- Deno (v2.6+)
- API keys:
  - `ANTHROPIC_APIKEY` for Claude generation
  - `OPENAI_APIKEY` for GPT validation and embeddings
```sh
cp .env.example .env
# fill in OPENAI_APIKEY and ANTHROPIC_APIKEY
```

The system supports two modes: QA (question-answering with evidence) and Task (general task completion with iterative reasoning).
Generates a structured answer with verbatim evidence quotes from a document.
```sh
deno task demo -- --mode qa --query "Explain Ralph loop and LCM" --doc docs/long.txt
```

Output: `answer` (3-7 bullet lines) + `evidence` (3-8 verbatim quotes that must appear in the document).
Reads a document, reasons about a task, and iteratively improves the output using accumulated memory and retrieval.
```sh
deno task demo -- --query "Summarize the key architectural decisions" --doc docs/long.txt
```

Output: `output` (task completion) + `memoryUpdate` (findings persisted across iterations).
Breaks complex tasks into a sub-task DAG before executing them. Each sub-task runs through the full pipeline independently, then results are merged by an aggregation pass.
```sh
deno task demo -- --query "..." --doc docs/long.txt --decompose
```

Evaluates stored session traces and generates Pareto-optimal config variants by tuning temperature and step budgets. The optimizer forms a closed loop with normal sessions: run sessions to produce traces, run `--optimize` to select the best variant, then run more sessions under that variant to measure real performance.
Run at least one session first so traces exist in the output directory:
```sh
# 1. Run a session (produces traces)
deno task demo -- --query "..." --doc docs/long.txt --out out

# 2. Optimize (reads traces, sets active variant)
deno task demo -- --optimize --out out

# 3. Run another session (uses active variant, tags traces with variant ID)
deno task demo -- --query "..." --doc docs/long.txt --out out

# 4. Optimize again (variant now has real metrics instead of projections)
deno task demo -- --optimize --out out
```

On first run, candidates use heuristic projections (logged as "provisional"). After real sessions with the active variant, subsequent optimize runs use measured metrics. The Pareto front converges to reality over successive cycles.
```sh
deno task demo -- \
  --mode task \
  --query "..." \
  --doc docs/long.txt \
  --maxIters 6 \
  --out out \
  --progressMs 5000 \
  --externalDocs https://docs.vendor.com/api \
  --externalDocs https://developer.vendor-b.com/reference \
  --crawlDepth 2 \
  --externalRefresh ttl \
  --externalProvider pageindex \
  --externalTopN 20 \
  --externalTopK 6 \
  --externalWeight 0.35
```

- Generate (Claude + LCM): the worker agent explores the document with LCM operators (`retrieve`, `llm_map`) and produces bullet-point answers with verbatim evidence quotes.
- Validate:
  - Hard checks (local): format, bullet count (3-7), evidence count (3-8), quote length (<= 160 chars), no duplicates, each quote is a verbatim substring of the document.
  - Semantic judge (GPT): checks whether each bullet is supported by the provided evidence contexts (220-char windows around each cited quote).
- Feedback + retry: validation failures become explicit constraints appended to the next generation request, up to `maxIters` attempts.
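The hard checks are deterministic and run locally, with no LLM call. A minimal sketch of that logic, assuming hypothetical names (`QaResult`, `hardValidate`) rather than the actual hard_validate.ts exports:

```typescript
// Illustrative sketch of the QA hard checks; the types and function name here
// are assumptions for illustration, not the real hard_validate.ts API.
interface QaResult {
  answer: string[]; // bullet lines
  evidence: string[]; // verbatim quotes
}

function hardValidate(result: QaResult, doc: string): string[] {
  const failures: string[] = [];
  if (result.answer.length < 3 || result.answer.length > 7) {
    failures.push(`answer must have 3-7 bullets, got ${result.answer.length}`);
  }
  if (result.evidence.length < 3 || result.evidence.length > 8) {
    failures.push(`evidence must have 3-8 quotes, got ${result.evidence.length}`);
  }
  if (new Set(result.evidence).size !== result.evidence.length) {
    failures.push("duplicate evidence quotes");
  }
  for (const quote of result.evidence) {
    if (quote.length > 160) failures.push("quote exceeds 160 chars");
    // Verbatim-substring check: the quote must appear exactly in the document.
    if (!doc.includes(quote)) failures.push("quote is not a verbatim substring of the document");
  }
  return failures; // empty array means all hard checks passed
}
```

Any non-empty failure list is turned into feedback constraints for the next generation attempt.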
- Retrieval (pre-phase): if the OpenAI client supports embeddings, the loop runs a two-stage retrieval over primary document chunks, past episodes, and optional external-doc chunks before calling DocReader. Stage 1 is ANN by cosine similarity; stage 2 is optional late-interaction reranking (token-level MaxSim). Top-K ranked snippets are passed to DocReader as `retrievedContext`.
- DocReader (Claude + LCM): extracts and summarizes relevant information from the document, accumulated memory, and retrieved context into a compact brief.
- TaskReasoner (Claude): reasons about the brief to complete the task and produces findings for memory.
- Validate:
  - Hard checks (local): output and memoryUpdate must meet minimum length thresholds. Extended checks run on non-trivial output: duplicate paragraph detection, lightweight contradiction heuristics (negation pattern matching), evidence linkage (phrase overlap between brief and output), and sub-task coverage (when decomposition is active).
  - TaskJudge (GPT): evaluates whether the output substantively completes the task. Issues are categorized as `missing_scope`, `unsupported_claim`, `low_specificity`, or `conflict`. Each category drives specific targeted constraints for the next iteration.
- Memory + retry: the reasoner's `memoryUpdate` is appended to a persistent `context.md` file and written as a structured episode to `episodes.jsonl`. When episode count exceeds a threshold (default 6), a `MemoryCompactor` LLM call synthesizes a compact summary across all episodes rather than dropping the oldest blocks. Failures produce feedback constraints (base + category-targeted) for the next iteration. Phase timings and failure tags are recorded in every trace.
In this codebase, LCM is the long-context runtime pattern used by task mode: bounded context assembly, retrieval-first DocReader input, persistent memory updates, and semantic compaction.
LCM advantages (internal task execution):
- Token-pressure control: DocReader consumes selected context rather than raw full corpora, reducing prompt bloat and irrelevant tokens.
- Iteration continuity: memory updates and compaction preserve useful findings across retries, which reduces repeated extraction work.
- Feedback targeting: judge failures become `docReaderHints`, so extraction converges on missing evidence instead of re-reading blindly.
- Operational observability: retrieval diagnostics, phase timings, and failure tags are trace-visible for debugging and optimization.
PageIndex advantages (external-doc ingestion and structure):
- Hierarchy-aware extraction: external documents are represented as section trees (node ids, titles, summaries), preserving document structure.
- Better API/spec recall: structured section nodes are robust for endpoint docs where terminology and hierarchy matter more than loose semantic similarity.
- Stable provenance: node-level identifiers and URLs make external evidence auditable in traces and output citations.
Why combining them is better than either alone:
- LCM without PageIndex: strong loop control but weaker structure-aware external-doc semantics.
- PageIndex without LCM: good external indexing, but no iterative task/judge/memory loop to refine answers over retries.
- Combined: LCM orchestrates iterative extraction/reasoning/validation while PageIndex contributes high-fidelity external structure; this improves precision, reduces hallucination risk, and keeps provenance explicit.
When --decompose is passed, a TaskDecomposer agent first converts the task
into a sub-task DAG (max 5 sub-tasks, max depth 2). Sub-tasks are executed in
topological order (with priority tie-breaking); each runs through the full
DocReader->TaskReasoner->TaskJudge pipeline with a bounded iteration budget
(floor(maxIters / 2), minimum 2). A TaskAggregator agent then merges the
sub-task outputs into a single coherent result.
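The ordering step described above can be sketched as Kahn's algorithm with a priority-sorted ready set. The `SubTask` fields below are assumptions for illustration; the real shape in types.ts may differ:

```typescript
// Hypothetical sketch of sub-task ordering: topological order over the DAG,
// breaking ties among ready tasks by priority (highest first).
interface SubTask {
  id: string;
  deps: string[]; // ids of sub-tasks that must complete first
  priority: number;
}

function topoOrder(tasks: SubTask[]): SubTask[] {
  const indegree = new Map(tasks.map((t) => [t.id, t.deps.length]));
  const order: SubTask[] = [];
  const ready = tasks.filter((t) => t.deps.length === 0);
  while (ready.length > 0) {
    ready.sort((a, b) => b.priority - a.priority); // priority tie-breaking
    const next = ready.shift()!;
    order.push(next);
    for (const t of tasks) {
      if (t.deps.includes(next.id)) {
        const d = indegree.get(t.id)! - 1;
        indegree.set(t.id, d);
        if (d === 0) ready.push(t); // all dependencies satisfied
      }
    }
  }
  return order;
}

// Each ordered sub-task then runs with a bounded iteration budget:
// Math.max(2, Math.floor(maxIters / 2)).
```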
```
src/
  main.ts                  CLI entry point, wires all modes
  lib/
    llm_client.ts          Unified LLMClient interface + Anthropic/OpenAI
                           implementations (includes embed() on the OpenAI client)
    ai.ts                  Client factories with model validation
    agent.ts               Non-LCM agent: tool-based structured output loop
    lcm/
      lcm_agent.ts         LCM agent loop with operators + output tool
      lcm_prompt.ts        LCM system prompt builder
      operators.ts         LCM operators (retrieve, llm_map, expand)
      context_assembler.ts Bounded active-context assembly
      store.ts             Session message store
      summary_dag.ts       Summary DAG for compaction-aware context
      compactor.ts         Context compaction orchestration
      file_handler.ts      File reference storage and summarization helpers
      types.ts             LCM config and reference types
    worker.ts              QA worker agent config (LCM)
    doc_reader.ts          Task-mode document reader agent config (LCM,
                           accepts retrievedContext)
    judge.ts               QA semantic judge agent config
    task_reasoner.ts       Task-mode reasoning agent config
    task_judge.ts          Task-mode judge config (categorized issues)
    task_decomposer.ts     Task decomposition agent (sub-task DAG, topological sort)
    task_aggregator.ts     Sub-task result aggregation agent
    ralph.ts               QA outer loop orchestration
    task_loop.ts           Task-mode outer loop (retrieval, decomposition,
                           episode memory, phase timings)
    hard_validate.ts       Deterministic QA validation rules
    task_validate.ts       Deterministic task validation (duplicates, contradictions,
                           linkage, sub-task coverage)
    loop_helpers.ts        Shared loop utilities (heartbeat, phase/worker error
                           classification)
    retrieval.ts           Two-stage retrieval: chunker, embedder, ANN,
                           late-interaction reranker
    external_docs.ts       External docs crawling, normalization, cache, and
                           chunk materialization
    types.ts               Shared type definitions (Episode, PhaseTimings,
                           FailureTag, SubTask, etc.)
    env.ts                 Environment variable helpers
    memory.ts              Persistent memory: context.md writes, JSONL episode
                           store, LLM compaction
    git_memory.ts          Session trace indexing and archival
    gepa/
      evaluator.ts         Load traces, compute per-session metrics (pass rate,
                           latency, issue density)
      optimizer.ts         Candidate prompt/config evolution, Pareto front computation
      config_store.ts      Versioned agent config storage and retrieval
```
llm_client.ts defines an `LLMClient` type that both the Anthropic and OpenAI implementations satisfy. It handles message-format translation, tool-definition mapping, and token-usage extraction. The OpenAI client additionally implements `embed(texts)` using `text-embedding-3-small`, which the retrieval pipeline uses.
Non-LCM agents (agent.ts) run a simple loop: send messages with a structured
output tool, parse the tool call response, retry up to maxSteps. Used by the
judge, task reasoner, task judge, task decomposer, and task aggregator.
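That non-LCM loop is small enough to sketch. The `LLMClient` shape below is a simplification for illustration, not the actual interface in llm_client.ts:

```typescript
// Rough shape of the non-LCM agent loop: ask for a structured-output tool call,
// parse it, and retry with a corrective nudge up to maxSteps. All names here
// are simplified assumptions.
interface ToolCall {
  name: string;
  input: unknown;
}
interface LLMClient {
  chat(messages: { role: string; content: string }[]): Promise<{ toolCall?: ToolCall }>;
}

async function runAgent<T>(
  client: LLMClient,
  messages: { role: string; content: string }[],
  parse: (input: unknown) => T | null, // validates the tool-call payload
  maxSteps = 3,
): Promise<T> {
  for (let step = 0; step < maxSteps; step++) {
    const res = await client.chat(messages);
    const parsed = res.toolCall ? parse(res.toolCall.input) : null;
    if (parsed !== null) return parsed;
    // No valid tool call: push a corrective message and try again.
    messages.push({ role: "user", content: "Respond via the structured output tool." });
  }
  throw new Error(`no valid structured output after ${maxSteps} steps`);
}
```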
LCM agents (lcm/lcm_agent.ts) run with bounded context assembly from the LCM
store and support operator tools (retrieve, llm_map, optional expand).
They persist interaction state in the message store, enforce operator budgets,
and trigger compaction when thresholds are exceeded.
lcm/context_assembler.ts builds an active context window from recent messages
plus compact summaries (lcm/summary_dag.ts). lcm/compactor.ts decides when
to compact and executes compaction levels to keep context within token
thresholds. lcm/operators.ts provides retrieval and parallel analysis tools.
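The assembly idea can be approximated in a few lines. This is a deliberately simplified sketch (character-count/4 token estimate, plain strings instead of message objects); the real context_assembler.ts logic is more involved:

```typescript
// Simplified sketch of bounded active-context assembly: compact summaries are
// always included, then recent messages are added newest-first until the token
// budget is exhausted. Tokens are approximated as length / 4 for illustration.
function assembleContext(summaries: string[], recent: string[], maxTokens: number): string[] {
  const est = (s: string) => Math.ceil(s.length / 4);
  let used = summaries.reduce((n, s) => n + est(s), 0);
  const picked: string[] = [];
  for (let i = recent.length - 1; i >= 0; i--) {
    if (used + est(recent[i]) > maxTokens) break; // budget hit: drop older messages
    picked.unshift(recent[i]); // keep chronological order
    used += est(recent[i]);
  }
  return [...summaries, ...picked];
}
```

When older messages no longer fit, compaction (rather than silent truncation) turns them into new summaries so their content survives in compressed form.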
retrieval.ts runs before each DocReader call when embeddings are available. It
chunks the primary document (500-char windows, 100-char overlap), adds past
episodes, and optionally adds external-doc chunks from external_docs.ts.
External chunks come from either:
- `local` provider: crawler + HTML/text normalization + chunking
- `pageindex` provider: crawler + normalization + PageIndex markdown tree nodes
The pipeline embeds chunks plus query via the OpenAI client, performs ANN
retrieval by cosine similarity, then optionally reranks with token-level MaxSim
(ColBERT-style late interaction). Top-K results are injected into DocReader as
retrievedContext with provenance labels (source=doc|episode|external).
Diagnostics (candidate count, source counts, latency, selected chunk IDs and
source refs) are recorded in the iteration trace.
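The two scoring stages can be sketched as follows; function names and the chunk shape are assumptions for illustration, not the retrieval.ts API:

```typescript
// Stage 1 scores each chunk by cosine similarity against a single query vector.
// Stage 2 (ColBERT-style late interaction) scores token-level: for each query
// token embedding, take its best match among the chunk's token embeddings,
// then sum those maxima.
type Vec = number[];

function cosine(a: Vec, b: Vec): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function maxSim(queryTokens: Vec[], chunkTokens: Vec[]): number {
  return queryTokens.reduce(
    (sum, q) => sum + Math.max(...chunkTokens.map((c) => cosine(q, c))),
    0,
  );
}

// Stage 1: rank all chunks by cosine similarity, keep the top-N candidates
// (which stage 2 may then rerank with maxSim).
function retrieve(queryVec: Vec, chunks: { id: string; vec: Vec }[], topN: number) {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(queryVec, c.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}
```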
Each iteration writes a structured episode record to <memDir>/episodes.jsonl
containing the task, brief, output, memoryUpdate, and validation results. When
the episode count reaches the compaction threshold (default 6), a
compactMemory call sends all episodes to Claude with instructions to
synthesize a compact summary within the memory budget, then clears the episode
file. The context.md file receives both per-iteration appends and compacted
summaries.
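An in-memory sketch of that compaction policy (the real memory.ts works against the filesystem and calls Claude; the class and field names here are illustrative assumptions):

```typescript
// Minimal in-memory model of the episode store: every record appends to the
// context log, and crossing the threshold triggers a compaction pass that
// summarizes ALL episodes instead of dropping the oldest ones.
interface Episode {
  task: string;
  memoryUpdate: string;
}

class EpisodeStore {
  episodes: Episode[] = []; // stands in for episodes.jsonl
  contextLog: string[] = []; // stands in for context.md appends
  private compact: (eps: Episode[]) => string; // stands in for the LLM compaction call
  private threshold: number;

  constructor(compact: (eps: Episode[]) => string, threshold = 6) {
    this.compact = compact;
    this.threshold = threshold;
  }

  record(ep: Episode) {
    this.episodes.push(ep);
    this.contextLog.push(ep.memoryUpdate); // per-iteration append
    if (this.episodes.length > this.threshold) {
      this.contextLog.push(this.compact(this.episodes)); // synthesized summary
      this.episodes = []; // clear the episode file after compaction
    }
  }
}
```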
gepa/evaluator.ts reads all sessions from the session index, loads their
traces, and computes per-session metrics: pass rate, average latency, issue
density, and retry count. gepa/optimizer.ts generates candidate config
variants (temperature, step budgets, prompt variants), scores them against
session metrics, and computes the Pareto front across pass rate vs. latency.
Variants are stored via gepa/config_store.ts.
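The Pareto front over those two objectives keeps every variant not dominated by another (dominated = another variant is at least as good on both pass rate and latency, and strictly better on one). A sketch, with an assumed variant shape:

```typescript
// Pareto front over pass rate (maximize) vs. latency (minimize). The Variant
// shape is an illustrative assumption, not the optimizer.ts type.
interface Variant {
  id: string;
  passRate: number;
  latencyMs: number;
}

function paretoFront(variants: Variant[]): Variant[] {
  return variants.filter((v) =>
    !variants.some((o) =>
      o !== v &&
      o.passRate >= v.passRate &&
      o.latencyMs <= v.latencyMs &&
      // strict improvement on at least one objective
      (o.passRate > v.passRate || o.latencyMs < v.latencyMs)
    )
  );
}
```

Selecting from the front (rather than a single weighted score) preserves both fast-but-weaker and slow-but-stronger candidates for the next cycle.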
- Per-iteration traces: `out/iter-XX.json`
- Session archives: `out/sessions/<session-id>/iter-XX.json`
- Session index: `out/session-index.json`
- Episode store: `<memDir>/episodes.jsonl`
- Sub-task traces: `out/subtask-<id>/iter-XX.json` (decompose mode)
Each trace includes phase timings (docReaderMs, reasonerMs, judgeMs,
memoryMs), an optional failureTag (doc_reader_error, reasoner_error,
judge_error, memory_error), compiled feedback, and retrieval diagnostics.
Query traces programmatically:
```ts
import { querySessionTraces } from "./src/lib/git_memory.ts";

const traces = await querySessionTraces("2026-02-20/ralph-d8eb40c5");
```

| Variable | Default | Description |
|---|---|---|
| `ANTHROPIC_APIKEY` | (required) | Anthropic API key |
| `OPENAI_APIKEY` | (required) | OpenAI API key (also used for embeddings) |
| `GENERATE_MODEL` | `claude-sonnet-4-20250514` | Claude model for generation |
| `VALIDATE_MODEL` | `gpt-4o-mini` | OpenAI model for validation |
| `MAX_ITERS` | `4` | Max outer-loop iterations |
| `WORKER_MAX_STEPS` | `80` | Max LCM agent steps per iteration |
| `WORKER_MAX_LLM_CALLS` | `60` | Max LCM operator calls per iteration |
| `PROGRESS_HEARTBEAT_MS` | `8000` | Progress log interval during long phases |
| `OUT_DIR` | `out` | Output directory for traces |
| `EXTERNAL_DOCS_DEFAULT_DEPTH` | `1` | Default crawl depth for external docs |
| `EXTERNAL_DOCS_REFRESH` | `ttl` | External doc refresh policy |
| `EXTERNAL_DOCS_PROVIDER` | `local` | External docs pipeline (`local` or `pageindex`) |
| `EXTERNAL_DOCS_CACHE_DIR` | `out/external_docs` | External doc cache directory |
| `EXTERNAL_DOCS_TOP_N` | `15` | ANN candidate count from external docs |
| `EXTERNAL_DOCS_TOP_K` | `5` | Final selected chunks when external docs are enabled |
| `EXTERNAL_DOCS_WEIGHT` | `0.3` | External source weight in fusion ranking |
| `EXTERNAL_DOCS_TTL_MIN` | `1440` | Cache TTL in minutes for `refresh=ttl` |
| `EXTERNAL_DOCS_MAX_PAGES` | `200` | Maximum pages crawled per source |
| `PAGEINDEX_API_KEY` | (empty) | API key for PageIndex markdown tree extraction |
| `PAGEINDEX_BASE_URL` | `https://api.pageindex.ai` | Base URL for PageIndex API |
- Worker step-budget errors: increase `WORKER_MAX_STEPS` (try doubling) and `WORKER_MAX_LLM_CALLS` proportionally.
- Long silent pauses: reduce `PROGRESS_HEARTBEAT_MS` or pass `--progressMs 3000`.
- Model not recognized: check the allowed model sets in `src/lib/ai.ts`. The system falls back to defaults for unrecognized model names.
- Retrieval skipped: retrieval requires an OpenAI client with embedding support. If `gptAI.embed` is undefined, the pre-phase is silently skipped and DocReader runs without `retrievedContext`.
- External docs not used: pass one or more `--externalDocs <url>` flags (or `--externalDocsFile <path>`). Crawled/cache artifacts are stored under `out/external_docs` by default.
- PageIndex provider not active: set `--externalProvider pageindex` and `PAGEINDEX_API_KEY`. If API calls fail, the system automatically falls back to local chunking and logs the reason.
- GEPA no sessions found: run at least one task-mode session first so traces are written to the session index before running `--optimize`.