A Deno proof-of-concept demonstrating two nested LLM loops for grounded, evidence-backed document analysis:
- Ralph loop (outer quality loop): generate, validate, feedback, retry
- LCM mode (inner long-context pattern): use bounded active context, retrieval operators, and memory compaction rather than stuffing full corpora into a single prompt
- Two-model split: Claude (Anthropic) for generation, GPT (OpenAI) for validation
- GEPA optimization: closed-loop tuning of agent configuration (temperature, step budgets) based on real session trace metrics, with Pareto-front variant selection
The system uses `@anthropic-ai/sdk` and `openai` directly; there is no framework intermediary.
- Deno (v2.6+)
- API keys:
  - `ANTHROPIC_APIKEY` for Claude generation
  - `OPENAI_APIKEY` for GPT validation and embeddings
```sh
cp .env.example .env
# fill in OPENAI_APIKEY and ANTHROPIC_APIKEY
```

The system supports two modes: QA (question-answering with evidence) and Task (general task completion with iterative reasoning).
Generates a structured answer with verbatim evidence quotes from a document.
```sh
deno task demo -- --mode qa --query "Explain Ralph loop and LCM" --doc docs/long.txt
```

Output: `answer` (3-7 bullet lines) + `evidence` (3-8 verbatim quotes that must appear in the document).
Reads a document, reasons about a task, and iteratively improves the output using accumulated memory and retrieval.
```sh
deno task demo -- --query "Summarize the key architectural decisions" --doc docs/long.txt
```

Output: `output` (task completion) + `memoryUpdate` (findings persisted across iterations).
Breaks complex tasks into a sub-task DAG before executing them. Each sub-task runs through the full pipeline independently, then results are merged by an aggregation pass.
```sh
deno task demo -- --query "..." --doc docs/long.txt --decompose
```

Evaluates stored session traces and generates Pareto-optimal config variants by tuning temperature and step budgets. The optimizer forms a closed loop with normal sessions: run sessions to produce traces, run `--optimize` to select the best variant, then run more sessions under that variant to measure real performance.
Run at least one session first so traces exist in the output directory:
```sh
# 1. Run a session (produces traces)
deno task demo -- --query "..." --doc docs/long.txt --out out

# 2. Optimize (reads traces, sets active variant)
deno task demo -- --optimize --out out

# 3. Run another session (uses active variant, tags traces with variant ID)
deno task demo -- --query "..." --doc docs/long.txt --out out

# 4. Optimize again (variant now has real metrics instead of projections)
deno task demo -- --optimize --out out
```

On first run, candidates use heuristic projections (logged as "provisional"). After real sessions with the active variant, subsequent optimize runs use measured metrics. The Pareto front converges to reality over successive cycles.
```sh
deno task demo -- \
  --mode task \
  --query "..." \
  --doc docs/long.txt \
  --maxIters 6 \
  --out out \
  --progressMs 5000 \
  --externalDocs https://docs.vendor.com/api \
  --externalDocs https://developer.vendor-b.com/reference \
  --crawlDepth 2 \
  --externalRefresh ttl \
  --externalProvider pageindex \
  --externalTopN 20 \
  --externalTopK 6 \
  --externalWeight 0.35
```

- Generate (Claude + LCM): the worker agent explores the document with LCM operators (`retrieve`, `llm_map`) and produces bullet-point answers with verbatim evidence quotes.
- Validate:
  - Hard checks (local): format, bullet count (3-7), evidence count (3-8), quote length (<= 160 chars), no duplicates, each quote is a verbatim substring of the document.
  - Semantic judge (GPT): checks whether each bullet is supported by the provided evidence contexts (220-char windows around each cited quote).
- Feedback + retry: validation failures become explicit constraints appended to the next generation request, up to `maxIters` attempts.
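The hard checks are deterministic and run locally, with no LLM call. A minimal sketch of that logic, assuming hypothetical names (`QaResult`, `hardValidate`) rather than the actual hard_validate.ts exports:

```typescript
// Illustrative sketch of the QA hard checks; the types and function name here
// are assumptions for illustration, not the real hard_validate.ts API.
interface QaResult {
  answer: string[]; // bullet lines
  evidence: string[]; // verbatim quotes
}

function hardValidate(result: QaResult, doc: string): string[] {
  const failures: string[] = [];
  if (result.answer.length < 3 || result.answer.length > 7) {
    failures.push(`answer must have 3-7 bullets, got ${result.answer.length}`);
  }
  if (result.evidence.length < 3 || result.evidence.length > 8) {
    failures.push(`evidence must have 3-8 quotes, got ${result.evidence.length}`);
  }
  if (new Set(result.evidence).size !== result.evidence.length) {
    failures.push("duplicate evidence quotes");
  }
  for (const quote of result.evidence) {
    if (quote.length > 160) failures.push("quote exceeds 160 chars");
    // Verbatim-substring check: the quote must appear exactly in the document.
    if (!doc.includes(quote)) failures.push("quote is not a verbatim substring of the document");
  }
  return failures; // empty array means all hard checks passed
}
```

Any non-empty failure list is turned into feedback constraints for the next generation attempt.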
- Retrieval (pre-phase): if the OpenAI client supports embeddings, the loop runs a two-stage retrieval over primary document chunks, past episodes, and optional external-doc chunks before calling DocReader. Stage 1 is ANN by cosine similarity; stage 2 is optional late-interaction reranking (token-level MaxSim). Top-K ranked snippets are passed to DocReader as `retrievedContext`.
- DocReader (Claude + LCM): extracts and summarizes relevant information from the document, accumulated memory, and retrieved context into a compact brief.
- TaskReasoner (Claude): reasons about the brief to complete the task and produces findings for memory.
- Validate:
  - Hard checks (local): output and memoryUpdate must meet minimum length thresholds. Extended checks run on non-trivial output: duplicate paragraph detection, lightweight contradiction heuristics (negation pattern matching), evidence linkage (phrase overlap between brief and output), and sub-task coverage (when decomposition is active).
  - TaskJudge (GPT): evaluates whether the output substantively completes the task. Issues are categorized as `missing_scope`, `unsupported_claim`, `low_specificity`, or `conflict`. Each category drives specific targeted constraints for the next iteration.
- Memory + retry: the reasoner's `memoryUpdate` is appended to a persistent `context.md` file and written as a structured episode to `episodes.jsonl`. When episode count exceeds a threshold (default 6), a `MemoryCompactor` LLM call synthesizes a compact summary across all episodes rather than dropping the oldest blocks. Failures produce feedback constraints (base + category-targeted) for the next iteration. Phase timings and failure tags are recorded in every trace.
In this codebase, LCM is the long-context runtime pattern used by task mode: bounded context assembly, retrieval-first DocReader input, persistent memory updates, and semantic compaction.
LCM advantages (internal task execution):
- Token-pressure control: DocReader consumes selected context rather than raw full corpora, reducing prompt bloat and irrelevant tokens.
- Iteration continuity: memory updates and compaction preserve useful findings across retries, which reduces repeated extraction work.
- Feedback targeting: judge failures become `docReaderHints`, so extraction converges on missing evidence instead of re-reading blindly.
- Operational observability: retrieval diagnostics, phase timings, and failure tags are trace-visible for debugging and optimization.
PageIndex advantages (external-doc ingestion and structure):
- Hierarchy-aware extraction: external documents are represented as section trees (node ids, titles, summaries), preserving document structure.
- Better API/spec recall: structured section nodes are robust for endpoint docs where terminology and hierarchy matter more than loose semantic similarity.
- Stable provenance: node-level identifiers and URLs make external evidence auditable in traces and output citations.
Why combining them is better than either alone:
- LCM without PageIndex: strong loop control but weaker structure-aware external-doc semantics.
- PageIndex without LCM: good external indexing, but no iterative task/judge/memory loop to refine answers over retries.
- Combined: LCM orchestrates iterative extraction/reasoning/validation while PageIndex contributes high-fidelity external structure; this improves precision, reduces hallucination risk, and keeps provenance explicit.
When --decompose is passed, a TaskDecomposer agent first converts the task
into a sub-task DAG (max 5 sub-tasks, max depth 2). Sub-tasks are executed in
topological order (with priority tie-breaking); each runs through the full
DocReader->TaskReasoner->TaskJudge pipeline with a bounded iteration budget
(floor(maxIters / 2), minimum 2). A TaskAggregator agent then merges the
sub-task outputs into a single coherent result.
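The ordering step described above can be sketched as Kahn's algorithm with a priority-sorted ready set. The `SubTask` fields below are assumptions for illustration; the real shape in types.ts may differ:

```typescript
// Hypothetical sketch of sub-task ordering: topological order over the DAG,
// breaking ties among ready tasks by priority (highest first).
interface SubTask {
  id: string;
  deps: string[]; // ids of sub-tasks that must complete first
  priority: number;
}

function topoOrder(tasks: SubTask[]): SubTask[] {
  const indegree = new Map(tasks.map((t) => [t.id, t.deps.length]));
  const order: SubTask[] = [];
  const ready = tasks.filter((t) => t.deps.length === 0);
  while (ready.length > 0) {
    ready.sort((a, b) => b.priority - a.priority); // priority tie-breaking
    const next = ready.shift()!;
    order.push(next);
    for (const t of tasks) {
      if (t.deps.includes(next.id)) {
        const d = indegree.get(t.id)! - 1;
        indegree.set(t.id, d);
        if (d === 0) ready.push(t); // all dependencies satisfied
      }
    }
  }
  return order;
}

// Each ordered sub-task then runs with a bounded iteration budget:
// Math.max(2, Math.floor(maxIters / 2)).
```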
```
src/
  main.ts                  CLI entry point, wires all modes
  lib/
    llm_client.ts          Unified LLMClient interface + Anthropic/OpenAI
                           implementations (includes embed() on the OpenAI client)
    ai.ts                  Client factories with model validation
    agent.ts               Non-LCM agent: tool-based structured output loop
    lcm/
      lcm_agent.ts         LCM agent loop with operators + output tool
      lcm_prompt.ts        LCM system prompt builder
      operators.ts         LCM operators (retrieve, llm_map, expand)
      context_assembler.ts Bounded active-context assembly
      store.ts             Session message store
      summary_dag.ts       Summary DAG for compaction-aware context
      compactor.ts         Context compaction orchestration
      file_handler.ts      File reference storage and summarization helpers
      types.ts             LCM config and reference types
    worker.ts              QA worker agent config (LCM)
    doc_reader.ts          Task-mode document reader agent config (LCM,
                           accepts retrievedContext)
    judge.ts               QA semantic judge agent config
    task_reasoner.ts       Task-mode reasoning agent config
    task_judge.ts          Task-mode judge config (categorized issues)
    task_decomposer.ts     Task decomposition agent (sub-task DAG, topological sort)
    task_aggregator.ts     Sub-task result aggregation agent
    ralph.ts               QA outer loop orchestration
    task_loop.ts           Task-mode outer loop (retrieval, decomposition,
                           episode memory, phase timings)
    hard_validate.ts       Deterministic QA validation rules
    task_validate.ts       Deterministic task validation (duplicates, contradictions,
                           linkage, sub-task coverage)
    loop_helpers.ts        Shared loop utilities (heartbeat, phase/worker error
                           classification)
    retrieval.ts           Two-stage retrieval: chunker, embedder, ANN,
                           late-interaction reranker
    external_docs.ts       External docs crawling, normalization, cache, and
                           chunk materialization
    types.ts               Shared type definitions (Episode, PhaseTimings,
                           FailureTag, SubTask, etc.)
    env.ts                 Environment variable helpers
    memory.ts              Persistent memory: context.md writes, JSONL episode
                           store, LLM compaction
    git_memory.ts          Session trace indexing and archival
    gepa/
      evaluator.ts         Load traces, compute per-session metrics (pass rate,
                           latency, issue density)
      optimizer.ts         Candidate prompt/config evolution, Pareto front computation
      config_store.ts      Versioned agent config storage and retrieval
```
llm_client.ts defines an `LLMClient` type that both the Anthropic and OpenAI implementations satisfy. It handles message-format translation, tool-definition mapping, and token-usage extraction. The OpenAI client additionally implements `embed(texts)` using `text-embedding-3-small`, which the retrieval pipeline uses.
Non-LCM agents (agent.ts) run a simple loop: send messages with a structured
output tool, parse the tool call response, retry up to maxSteps. Used by the
judge, task reasoner, task judge, task decomposer, and task aggregator.
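That non-LCM loop is small enough to sketch. The `LLMClient` shape below is a simplification for illustration, not the actual interface in llm_client.ts:

```typescript
// Rough shape of the non-LCM agent loop: ask for a structured-output tool call,
// parse it, and retry with a corrective nudge up to maxSteps. All names here
// are simplified assumptions.
interface ToolCall {
  name: string;
  input: unknown;
}
interface LLMClient {
  chat(messages: { role: string; content: string }[]): Promise<{ toolCall?: ToolCall }>;
}

async function runAgent<T>(
  client: LLMClient,
  messages: { role: string; content: string }[],
  parse: (input: unknown) => T | null, // validates the tool-call payload
  maxSteps = 3,
): Promise<T> {
  for (let step = 0; step < maxSteps; step++) {
    const res = await client.chat(messages);
    const parsed = res.toolCall ? parse(res.toolCall.input) : null;
    if (parsed !== null) return parsed;
    // No valid tool call: push a corrective message and try again.
    messages.push({ role: "user", content: "Respond via the structured output tool." });
  }
  throw new Error(`no valid structured output after ${maxSteps} steps`);
}
```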
LCM agents (lcm/lcm_agent.ts) run with bounded context assembly from the LCM
store and support operator tools (retrieve, llm_map, optional expand).
They persist interaction state in the message store, enforce operator budgets,
and trigger compaction when thresholds are exceeded.
lcm/context_assembler.ts builds an active context window from recent messages
plus compact summaries (lcm/summary_dag.ts). lcm/compactor.ts decides when
to compact and executes compaction levels to keep context within token
thresholds. lcm/operators.ts provides retrieval and parallel analysis tools.
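The assembly idea can be approximated in a few lines. This is a deliberately simplified sketch (character-count/4 token estimate, plain strings instead of message objects); the real context_assembler.ts logic is more involved:

```typescript
// Simplified sketch of bounded active-context assembly: compact summaries are
// always included, then recent messages are added newest-first until the token
// budget is exhausted. Tokens are approximated as length / 4 for illustration.
function assembleContext(summaries: string[], recent: string[], maxTokens: number): string[] {
  const est = (s: string) => Math.ceil(s.length / 4);
  let used = summaries.reduce((n, s) => n + est(s), 0);
  const picked: string[] = [];
  for (let i = recent.length - 1; i >= 0; i--) {
    if (used + est(recent[i]) > maxTokens) break; // budget hit: drop older messages
    picked.unshift(recent[i]); // keep chronological order
    used += est(recent[i]);
  }
  return [...summaries, ...picked];
}
```

When older messages no longer fit, compaction (rather than silent truncation) turns them into new summaries so their content survives in compressed form.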
retrieval.ts runs before each DocReader call when embeddings are available. It
chunks the primary document (500-char windows, 100-char overlap), adds past
episodes, and optionally adds external-doc chunks from external_docs.ts.
External chunks come from either:
- `local` provider: crawler + HTML/text normalization + chunking
- `pageindex` provider: crawler + normalization + PageIndex markdown tree nodes
The pipeline embeds chunks plus query via the OpenAI client, performs ANN
retrieval by cosine similarity, then optionally reranks with token-level MaxSim
(ColBERT-style late interaction). Top-K results are injected into DocReader as
retrievedContext with provenance labels (source=doc|episode|external).
Diagnostics (candidate count, source counts, latency, selected chunk IDs and
source refs) are recorded in the iteration trace.
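The two scoring stages can be sketched as follows; function names and the chunk shape are assumptions for illustration, not the retrieval.ts API:

```typescript
// Stage 1 scores each chunk by cosine similarity against a single query vector.
// Stage 2 (ColBERT-style late interaction) scores token-level: for each query
// token embedding, take its best match among the chunk's token embeddings,
// then sum those maxima.
type Vec = number[];

function cosine(a: Vec, b: Vec): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] ** 2;
    nb += b[i] ** 2;
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function maxSim(queryTokens: Vec[], chunkTokens: Vec[]): number {
  return queryTokens.reduce(
    (sum, q) => sum + Math.max(...chunkTokens.map((c) => cosine(q, c))),
    0,
  );
}

// Stage 1: rank all chunks by cosine similarity, keep the top-N candidates
// (which stage 2 may then rerank with maxSim).
function retrieve(queryVec: Vec, chunks: { id: string; vec: Vec }[], topN: number) {
  return chunks
    .map((c) => ({ id: c.id, score: cosine(queryVec, c.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topN);
}
```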
Each iteration writes a structured episode record to <memDir>/episodes.jsonl
containing the task, brief, output, memoryUpdate, and validation results. When
the episode count reaches the compaction threshold (default 6), a
compactMemory call sends all episodes to Claude with instructions to
synthesize a compact summary within the memory budget, then clears the episode
file. The context.md file receives both per-iteration appends and compacted
summaries.
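An in-memory sketch of that compaction policy (the real memory.ts works against the filesystem and calls Claude; the class and field names here are illustrative assumptions):

```typescript
// Minimal in-memory model of the episode store: every record appends to the
// context log, and crossing the threshold triggers a compaction pass that
// summarizes ALL episodes instead of dropping the oldest ones.
interface Episode {
  task: string;
  memoryUpdate: string;
}

class EpisodeStore {
  episodes: Episode[] = []; // stands in for episodes.jsonl
  contextLog: string[] = []; // stands in for context.md appends
  private compact: (eps: Episode[]) => string; // stands in for the LLM compaction call
  private threshold: number;

  constructor(compact: (eps: Episode[]) => string, threshold = 6) {
    this.compact = compact;
    this.threshold = threshold;
  }

  record(ep: Episode) {
    this.episodes.push(ep);
    this.contextLog.push(ep.memoryUpdate); // per-iteration append
    if (this.episodes.length > this.threshold) {
      this.contextLog.push(this.compact(this.episodes)); // synthesized summary
      this.episodes = []; // clear the episode file after compaction
    }
  }
}
```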
gepa/evaluator.ts reads all sessions from the session index, loads their
traces, and computes per-session metrics: pass rate, average latency, issue
density, and retry count. gepa/optimizer.ts generates candidate config
variants (temperature, step budgets, prompt variants), scores them against
session metrics, and computes the Pareto front across pass rate vs. latency.
Variants are stored via gepa/config_store.ts.
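The Pareto front over those two objectives keeps every variant not dominated by another (dominated = another variant is at least as good on both pass rate and latency, and strictly better on one). A sketch, with an assumed variant shape:

```typescript
// Pareto front over pass rate (maximize) vs. latency (minimize). The Variant
// shape is an illustrative assumption, not the optimizer.ts type.
interface Variant {
  id: string;
  passRate: number;
  latencyMs: number;
}

function paretoFront(variants: Variant[]): Variant[] {
  return variants.filter((v) =>
    !variants.some((o) =>
      o !== v &&
      o.passRate >= v.passRate &&
      o.latencyMs <= v.latencyMs &&
      // strict improvement on at least one objective
      (o.passRate > v.passRate || o.latencyMs < v.latencyMs)
    )
  );
}
```

Selecting from the front (rather than a single weighted score) preserves both fast-but-weaker and slow-but-stronger candidates for the next cycle.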
- Per-iteration traces: `out/iter-XX.json`
- Session archives: `out/sessions/<session-id>/iter-XX.json`
- Session index: `out/session-index.json`
- Episode store: `<memDir>/episodes.jsonl`
- Sub-task traces: `out/subtask-<id>/iter-XX.json` (decompose mode)
Each trace includes phase timings (docReaderMs, reasonerMs, judgeMs,
memoryMs), an optional failureTag (doc_reader_error, reasoner_error,
judge_error, memory_error), compiled feedback, and retrieval diagnostics.
Query traces programmatically:
```ts
import { querySessionTraces } from "./src/lib/git_memory.ts";

const traces = await querySessionTraces("2026-02-20/ralph-d8eb40c5");
```

| Variable | Default | Description |
|---|---|---|
| `ANTHROPIC_APIKEY` | (required) | Anthropic API key |
| `OPENAI_APIKEY` | (required) | OpenAI API key (also used for embeddings) |
| `GENERATE_MODEL` | `claude-sonnet-4-20250514` | Claude model for generation |
| `VALIDATE_MODEL` | `gpt-4o-mini` | OpenAI model for validation |
| `MAX_ITERS` | `4` | Max outer-loop iterations |
| `WORKER_MAX_STEPS` | `80` | Max LCM agent steps per iteration |
| `WORKER_MAX_LLM_CALLS` | `60` | Max LCM operator calls per iteration |
| `PROGRESS_HEARTBEAT_MS` | `8000` | Progress log interval during long phases |
| `OUT_DIR` | `out` | Output directory for traces |
| `EXTERNAL_DOCS_DEFAULT_DEPTH` | `1` | Default crawl depth for external docs |
| `EXTERNAL_DOCS_REFRESH` | `ttl` | External doc refresh policy |
| `EXTERNAL_DOCS_PROVIDER` | `local` | External docs pipeline (`local` or `pageindex`) |
| `EXTERNAL_DOCS_CACHE_DIR` | `out/external_docs` | External doc cache directory |
| `EXTERNAL_DOCS_TOP_N` | `15` | ANN candidate count from external docs |
| `EXTERNAL_DOCS_TOP_K` | `5` | Final selected chunks when external docs are enabled |
| `EXTERNAL_DOCS_WEIGHT` | `0.3` | External source weight in fusion ranking |
| `EXTERNAL_DOCS_TTL_MIN` | `1440` | Cache TTL in minutes for `refresh=ttl` |
| `EXTERNAL_DOCS_MAX_PAGES` | `200` | Maximum pages crawled per source |
| `PAGEINDEX_API_KEY` | (empty) | API key for PageIndex markdown tree extraction |
| `PAGEINDEX_BASE_URL` | `https://api.pageindex.ai` | Base URL for PageIndex API |
- Worker step-budget errors: increase `WORKER_MAX_STEPS` (try doubling) and `WORKER_MAX_LLM_CALLS` proportionally.
- Long silent pauses: reduce `PROGRESS_HEARTBEAT_MS` or pass `--progressMs 3000`.
- Model not recognized: check the allowed model sets in `src/lib/ai.ts`. The system falls back to defaults for unrecognized model names.
- Retrieval skipped: retrieval requires an OpenAI client with embedding support. If `gptAI.embed` is undefined, the pre-phase is silently skipped and DocReader runs without `retrievedContext`.
- External docs not used: pass one or more `--externalDocs <url>` flags (or `--externalDocsFile <path>`). Crawled/cache artifacts are stored under `out/external_docs` by default.
- PageIndex provider not active: set `--externalProvider pageindex` and `PAGEINDEX_API_KEY`. If API calls fail, the system automatically falls back to local chunking and logs the reason.
- GEPA no sessions found: run at least one task-mode session first so traces are written to the session index before running `--optimize`.