Ralph Loop + LCM + PageIndex

[System design diagram]

A Deno proof-of-concept demonstrating two nested LLM loops for grounded, evidence-backed document analysis:

  • Ralph loop (outer quality loop): generate, validate, feedback, retry
  • LCM mode (inner long-context pattern): use bounded active context, retrieval operators, and memory compaction rather than stuffing full corpora into a single prompt
  • Two-model split: Claude (Anthropic) for generation, GPT (OpenAI) for validation
  • GEPA optimization: closed-loop tuning of agent configuration (temperature, step budgets) based on real session trace metrics, with Pareto-front variant selection

The system uses @anthropic-ai/sdk and openai directly, with no framework intermediary.
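
A minimal sketch of that direct usage under Deno (illustrative only; the repo's actual client factories live in src/lib/ai.ts, and the model names here are the defaults listed under "Environment variables" below):

// Two-model split: Claude generates, GPT validates.
import Anthropic from "npm:@anthropic-ai/sdk";
import OpenAI from "npm:openai";

// Note: this repo's env var names use the APIKEY spelling (see the env table).
const claude = new Anthropic({ apiKey: Deno.env.get("ANTHROPIC_APIKEY") });
const gpt = new OpenAI({ apiKey: Deno.env.get("OPENAI_APIKEY") });

// Generation call (Claude).
const gen = await claude.messages.create({
  model: "claude-sonnet-4-20250514",
  max_tokens: 1024,
  messages: [{ role: "user", content: "Summarize the key decisions in ..." }],
});

// Validation call (GPT).
const judged = await gpt.chat.completions.create({
  model: "gpt-4o-mini",
  messages: [{ role: "user", content: "Does this answer cite verbatim evidence? ..." }],
});

console.log(gen.content);
console.log(judged.choices[0].message.content);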


Prerequisites

  • Deno (v2.6+)
  • API keys:
    • ANTHROPIC_APIKEY for Claude generation
    • OPENAI_APIKEY for GPT validation and embeddings

Setup

cp .env.example .env
# fill in OPENAI_APIKEY and ANTHROPIC_APIKEY

Modes

The system supports two modes: QA (question-answering with evidence) and Task (general task completion with iterative reasoning).

QA mode

Generates a structured answer with verbatim evidence quotes from a document.

deno task demo -- --mode qa --query "Explain Ralph loop and LCM" --doc docs/long.txt

Output: answer (3-7 bullet lines) + evidence (3-8 verbatim quotes that must appear in the document).
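
As a type sketch, the result shape implied by that description (field names are assumptions, not the exact schema):

// Illustrative QA-mode result shape, inferred from the output description above.
type QaResult = {
  answer: string[];   // 3-7 bullet lines
  evidence: string[]; // 3-8 verbatim quotes, each a substring of the document
};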

Task mode (default)

Reads a document, reasons about a task, and iteratively improves the output using accumulated memory and retrieval.

deno task demo -- --query "Summarize the key architectural decisions" --doc docs/long.txt

Output: output (task completion) + memoryUpdate (findings persisted across iterations).

Task mode with decomposition

Breaks a complex task into a sub-task DAG before executing it. Each sub-task runs through the full pipeline independently, then an aggregation pass merges the results.

deno task demo -- --query "..." --doc docs/long.txt --decompose

GEPA optimization

Evaluates stored session traces and generates Pareto-optimal config variants by tuning temperature and step budgets. The optimizer forms a closed loop with normal sessions: run sessions to produce traces, run --optimize to select the best variant, then run more sessions under that variant to measure real performance.

Run at least one session first so traces exist in the output directory:

# 1. Run a session (produces traces)
deno task demo -- --query "..." --doc docs/long.txt --out out

# 2. Optimize (reads traces, sets active variant)
deno task demo -- --optimize --out out

# 3. Run another session (uses active variant, tags traces with variant ID)
deno task demo -- --query "..." --doc docs/long.txt --out out

# 4. Optimize again (variant now has real metrics instead of projections)
deno task demo -- --optimize --out out

On first run, candidates use heuristic projections (logged as "provisional"). After real sessions with the active variant, subsequent optimize runs use measured metrics. The Pareto front converges to reality over successive cycles.

Full flag set

deno task demo -- \
  --mode task \
  --query "..." \
  --doc docs/long.txt \
  --maxIters 6 \
  --out out \
  --progressMs 5000 \
  --externalDocs https://docs.vendor.com/api \
  --externalDocs https://developer.vendor-b.com/reference \
  --crawlDepth 2 \
  --externalRefresh ttl \
  --externalProvider pageindex \
  --externalTopN 20 \
  --externalTopK 6 \
  --externalWeight 0.35

How the QA loop works

  1. Generate (Claude + LCM): the worker agent explores the document with LCM operators (retrieve, llm_map) and produces bullet-point answers with verbatim evidence quotes.

  2. Validate:

    • Hard checks (local): format, bullet count (3-7), evidence count (3-8), quote length (<= 160 chars), no duplicates, each quote is a verbatim substring of the document (see the sketch after this list).
    • Semantic judge (GPT): checks whether each bullet is supported by the provided evidence contexts (220-char windows around each cited quote).
  3. Feedback + retry: validation failures become explicit constraints appended to the next generation request, up to maxIters attempts.
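
A condensed sketch of the hard checks from step 2 (illustrative; the actual rules live in src/lib/hard_validate.ts and may differ in detail):

// Returns an empty array on pass; each string becomes a feedback constraint.
type QaOutput = { answer: string[]; evidence: string[] };

function hardValidate(out: QaOutput, doc: string): string[] {
  const errors: string[] = [];
  if (out.answer.length < 3 || out.answer.length > 7) {
    errors.push(`answer must have 3-7 bullets, got ${out.answer.length}`);
  }
  if (out.evidence.length < 3 || out.evidence.length > 8) {
    errors.push(`evidence must have 3-8 quotes, got ${out.evidence.length}`);
  }
  for (const q of out.evidence) {
    if (q.length > 160) errors.push(`quote too long (${q.length} chars)`);
    if (!doc.includes(q)) errors.push(`not a verbatim substring: "${q.slice(0, 40)}..."`);
  }
  if (new Set(out.evidence).size !== out.evidence.length) {
    errors.push("duplicate evidence quotes");
  }
  return errors;
}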

How the Task loop works

  1. Retrieval (pre-phase): if the OpenAI client supports embeddings, the loop runs a two-stage retrieval over primary document chunks, past episodes, and optional external-doc chunks before calling DocReader. Stage 1 is ANN by cosine similarity; stage 2 is optional late-interaction reranking (token-level MaxSim). Top-K ranked snippets are passed to DocReader as retrievedContext.

  2. DocReader (Claude + LCM): extracts and summarizes relevant information from the document, accumulated memory, and retrieved context into a compact brief.

  3. TaskReasoner (Claude): reasons about the brief to complete the task and produces findings for memory.

  4. Validate:

    • Hard checks (local): output and memoryUpdate must meet minimum length thresholds. Extended checks run on non-trivial output: duplicate paragraph detection, lightweight contradiction heuristics (negation pattern matching), evidence linkage (phrase overlap between brief and output), and sub-task coverage (when decomposition is active).
    • TaskJudge (GPT): evaluates whether the output substantively completes the task. Issues are categorized: missing_scope, unsupported_claim, low_specificity, conflict. Each category drives specific targeted constraints for the next iteration.
  5. Memory + retry: the reasoner's memoryUpdate is appended to a persistent context.md file and written as a structured episode to episodes.jsonl. When episode count exceeds a threshold (default 6), a MemoryCompactor LLM call synthesizes a compact summary across all episodes rather than dropping the oldest blocks. Failures produce feedback constraints (base + category-targeted) for the next iteration. Phase timings and failure tags are recorded in every trace.
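
Condensed to a skeleton, the loop looks roughly like this (names and shapes are assumptions; the real orchestration, with retrieval, episodes, and phase timings, is src/lib/task_loop.ts):

type Judgement = { pass: boolean; issues: string[] };
type Agent<I, O> = { run: (input: I) => Promise<O> };

async function taskLoop(
  task: string,
  docReader: Agent<{ task: string; memory: string; hints: string[] }, string>,
  reasoner: Agent<{ task: string; brief: string }, { output: string; memoryUpdate: string }>,
  judge: Agent<{ task: string; output: string }, Judgement>,
  maxIters = 4,
): Promise<string> {
  let memory = "";
  let hints: string[] = [];
  for (let i = 0; i < maxIters; i++) {
    const brief = await docReader.run({ task, memory, hints }); // compact brief
    const { output, memoryUpdate } = await reasoner.run({ task, brief });
    memory += "\n" + memoryUpdate; // persisted to context.md / episodes.jsonl in the repo
    const verdict = await judge.run({ task, output });
    if (verdict.pass) return output; // hard checks omitted for brevity
    hints = verdict.issues; // categorized issues become next-iteration constraints
  }
  throw new Error("max iterations exhausted");
}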

Why LCM + PageIndex is advantageous

In this codebase, LCM is the long-context runtime pattern used by task mode: bounded context assembly, retrieval-first DocReader input, persistent memory updates, and semantic compaction.

LCM advantages (internal task execution):

  • Token-pressure control: DocReader consumes selected context rather than raw full corpora, reducing prompt bloat and irrelevant tokens.
  • Iteration continuity: memory updates and compaction preserve useful findings across retries, which reduces repeated extraction work.
  • Feedback targeting: judge failures become docReaderHints, so extraction converges on missing evidence instead of re-reading blindly.
  • Operational observability: retrieval diagnostics, phase timings, and failure tags are trace-visible for debugging and optimization.

PageIndex advantages (external-doc ingestion and structure):

  • Hierarchy-aware extraction: external documents are represented as section trees (node ids, titles, summaries), preserving document structure.
  • Better API/spec recall: structured section nodes are robust for endpoint docs where terminology and hierarchy matter more than loose semantic similarity.
  • Stable provenance: node-level identifiers and URLs make external evidence auditable in traces and output citations.

Why combining them is better than either alone:

  • LCM without PageIndex: strong loop control but weaker structure-aware external-doc semantics.
  • PageIndex without LCM: good external indexing, but no iterative task/judge/memory loop to refine answers over retries.
  • Combined: LCM orchestrates iterative extraction/reasoning/validation while PageIndex contributes high-fidelity external structure; this improves precision, reduces hallucination risk, and keeps provenance explicit.

Decomposed task loop

When --decompose is passed, a TaskDecomposer agent first converts the task into a sub-task DAG (max 5 sub-tasks, max depth 2). Sub-tasks are executed in topological order (with priority tie-breaking); each runs through the full DocReader->TaskReasoner->TaskJudge pipeline with a bounded iteration budget (floor(maxIters / 2), minimum 2). A TaskAggregator agent then merges the sub-task outputs into a single coherent result.
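
The scheduling part can be sketched as a plain topological sort with priority tie-breaking (assumed shapes; the real decomposer is src/lib/task_decomposer.ts):

type SubTask = { id: string; dependsOn: string[]; priority: number };

function topoOrder(tasks: SubTask[]): SubTask[] {
  const done = new Set<string>();
  const order: SubTask[] = [];
  const pending = [...tasks];
  while (pending.length > 0) {
    // Ready = all dependencies satisfied; break ties by priority.
    const ready = pending
      .filter((t) => t.dependsOn.every((d) => done.has(d)))
      .sort((a, b) => b.priority - a.priority);
    if (ready.length === 0) throw new Error("cycle in sub-task DAG");
    const next = ready[0];
    order.push(next);
    done.add(next.id);
    pending.splice(pending.indexOf(next), 1);
  }
  return order;
}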


Architecture

src/
  main.ts                    CLI entry point, wires all modes
  lib/
    llm_client.ts            Unified LLMClient interface + Anthropic/OpenAI implementations
                             (includes embed() on the OpenAI client)
    ai.ts                    Client factories with model validation
    agent.ts                 Non-LCM agent: tool-based structured output loop
    lcm/
      lcm_agent.ts           LCM agent loop with operators + output tool
      lcm_prompt.ts          LCM system prompt builder
      operators.ts           LCM operators (`retrieve`, `llm_map`, `expand`)
      context_assembler.ts   Bounded active-context assembly
      store.ts               Session message store
      summary_dag.ts         Summary DAG for compaction-aware context
      compactor.ts           Context compaction orchestration
      file_handler.ts        File reference storage and summarization helpers
      types.ts               LCM config and reference types
    worker.ts                QA worker agent config (LCM)
    doc_reader.ts            Task-mode document reader agent config (LCM, accepts retrievedContext)
    judge.ts                 QA semantic judge agent config
    task_reasoner.ts         Task-mode reasoning agent config
    task_judge.ts            Task-mode judge config (categorized issues)
    task_decomposer.ts       Task decomposition agent (sub-task DAG, topological sort)
    task_aggregator.ts       Sub-task result aggregation agent
    ralph.ts                 QA outer loop orchestration
    task_loop.ts             Task-mode outer loop (retrieval, decomposition, episode memory, phase timings)
    hard_validate.ts         Deterministic QA validation rules
    task_validate.ts         Deterministic task validation (duplicates, contradictions, linkage, sub-task coverage)
    loop_helpers.ts          Shared loop utilities (heartbeat, phase/worker error classification)
    retrieval.ts             Two-stage retrieval: chunker, embedder, ANN, late-interaction reranker
    external_docs.ts         External docs crawling, normalization, cache, and chunk materialization
    types.ts                 Shared type definitions (Episode, PhaseTimings, FailureTag, SubTask, etc.)
    env.ts                   Environment variable helpers
    memory.ts                Persistent memory: context.md writes, JSONL episode store, LLM compaction
    git_memory.ts            Session trace indexing and archival
    gepa/
      evaluator.ts           Load traces, compute per-session metrics (pass rate, latency, issue density)
      optimizer.ts           Candidate prompt/config evolution, Pareto front computation
      config_store.ts        Versioned agent config storage and retrieval

LLM client layer

llm_client.ts defines an LLMClient type that both the Anthropic and OpenAI implementations satisfy. It handles message format translation, tool definition mapping, and token usage extraction. The OpenAI client additionally implements embed(texts) using text-embedding-3-small, used by the retrieval pipeline.
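
A rough shape of that interface (illustrative; the actual definition in src/lib/llm_client.ts will differ in detail):

type Message = { role: "user" | "assistant"; content: string };
type ToolDef = { name: string; description: string; inputSchema: unknown };

interface LLMClient {
  complete(opts: {
    messages: Message[];
    tools?: ToolDef[];
    temperature?: number;
  }): Promise<{
    text: string;
    toolCalls: unknown[];
    usage: { input: number; output: number };
  }>;
  // Only the OpenAI implementation provides this; retrieval checks for it.
  embed?(texts: string[]): Promise<number[][]>;
}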

Agent layer

Non-LCM agents (agent.ts) run a simple loop: send messages with a structured output tool, parse the tool call response, retry up to maxSteps. Used by the judge, task reasoner, task judge, task decomposer, and task aggregator.

LCM agents (lcm/lcm_agent.ts) run with bounded context assembly from the LCM store and support operator tools (retrieve, llm_map, optional expand). They persist interaction state in the message store, enforce operator budgets, and trigger compaction when thresholds are exceeded.

LCM runtime

lcm/context_assembler.ts builds an active context window from recent messages plus compact summaries (lcm/summary_dag.ts). lcm/compactor.ts decides when to compact and executes compaction levels to keep context within token thresholds. lcm/operators.ts provides retrieval and parallel analysis tools.
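
In simplified form, bounded assembly keeps the compact summaries and then admits recent messages until a token budget is exhausted (assumed logic; the real assembler will differ):

type Entry = { text: string; tokens: number };

function assembleContext(summaries: Entry[], recent: Entry[], budget: number): string[] {
  // Compact summaries always stay in the active window.
  let used = summaries.reduce((n, s) => n + s.tokens, 0);
  // Then admit as many recent messages as fit, newest kept last.
  const kept: string[] = [];
  for (let i = recent.length - 1; i >= 0; i--) {
    if (used + recent[i].tokens > budget) break;
    used += recent[i].tokens;
    kept.unshift(recent[i].text);
  }
  return [...summaries.map((s) => s.text), ...kept];
}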

Retrieval pipeline

retrieval.ts runs before each DocReader call when embeddings are available. It chunks the primary document (500-char windows, 100-char overlap), adds past episodes, and optionally adds external-doc chunks from external_docs.ts. External chunks come from either:

  • local provider: crawler + HTML/text normalization + chunking
  • pageindex provider: crawler + normalization + PageIndex markdown tree nodes

The pipeline embeds chunks plus query via the OpenAI client, performs ANN retrieval by cosine similarity, then optionally reranks with token-level MaxSim (ColBERT-style late interaction). Top-K results are injected into DocReader as retrievedContext with provenance labels (source=doc|episode|external). Diagnostics (candidate count, source counts, latency, selected chunk IDs and source refs) are recorded in the iteration trace.
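
The scoring core of both stages, in simplified form (the repo applies MaxSim at token level; this sketch takes per-segment vectors as given, and brute-force top-N stands in for ANN):

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Stage 1: top-N candidates by cosine similarity of whole-chunk embeddings.
function topN(query: number[], chunks: { id: string; vec: number[] }[], n: number) {
  return chunks
    .map((c) => ({ ...c, score: cosine(query, c.vec) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, n);
}

// Stage 2: late-interaction MaxSim — for each query vector, take its best
// match among the chunk's vectors, then sum across query vectors.
function maxSim(queryVecs: number[][], chunkVecs: number[][]): number {
  return queryVecs.reduce(
    (sum, q) => sum + Math.max(...chunkVecs.map((d) => cosine(q, d))),
    0,
  );
}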

Memory and episode store

Each iteration writes a structured episode record to <memDir>/episodes.jsonl containing the task, brief, output, memoryUpdate, and validation results. When the episode count reaches the compaction threshold (default 6), a compactMemory call sends all episodes to Claude with instructions to synthesize a compact summary within the memory budget, then clears the episode file. The context.md file receives both per-iteration appends and compacted summaries.
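
A sketch of that flow (assumed shapes and helper; the real code is in src/lib/memory.ts):

type Episode = { task: string; brief: string; output: string; memoryUpdate: string };

async function recordEpisode(memDir: string, ep: Episode, threshold = 6) {
  const path = `${memDir}/episodes.jsonl`;
  await Deno.writeTextFile(path, JSON.stringify(ep) + "\n", { append: true });

  const lines = (await Deno.readTextFile(path)).trim().split("\n");
  if (lines.length >= threshold) {
    const episodes = lines.map((l) => JSON.parse(l) as Episode);
    const summary = await compactWithClaude(episodes); // LLM compaction call
    await Deno.writeTextFile(`${memDir}/context.md`, summary + "\n", { append: true });
    await Deno.writeTextFile(path, ""); // clear the episode file
  }
}

// Hypothetical stand-in for the MemoryCompactor call, not the repo's helper.
async function compactWithClaude(eps: Episode[]): Promise<string> {
  return `Compacted summary of ${eps.length} episodes`; // stub
}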

GEPA optimization

gepa/evaluator.ts reads all sessions from the session index, loads their traces, and computes per-session metrics: pass rate, average latency, issue density, and retry count. gepa/optimizer.ts generates candidate config variants (temperature, step budgets, prompt variants), scores them against session metrics, and computes the Pareto front across pass rate vs. latency. Variants are stored via gepa/config_store.ts.
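
Pareto-front selection over (pass rate, latency) reduces to a dominance filter; a minimal sketch, not the optimizer's exact code:

// Higher pass rate and lower latency are both better.
// A variant survives if no other variant dominates it.
type Variant = { id: string; passRate: number; latencyMs: number };

function paretoFront(variants: Variant[]): Variant[] {
  return variants.filter((v) =>
    !variants.some((o) =>
      o !== v &&
      o.passRate >= v.passRate &&
      o.latencyMs <= v.latencyMs &&
      (o.passRate > v.passRate || o.latencyMs < v.latencyMs)
    )
  );
}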


Trace output

  • Per-iteration traces: out/iter-XX.json
  • Session archives: out/sessions/<session-id>/iter-XX.json
  • Session index: out/session-index.json
  • Episode store: <memDir>/episodes.jsonl
  • Sub-task traces: out/subtask-<id>/iter-XX.json (decompose mode)

Each trace includes phase timings (docReaderMs, reasonerMs, judgeMs, memoryMs), an optional failureTag (doc_reader_error, reasoner_error, judge_error, memory_error), compiled feedback, and retrieval diagnostics.

Query traces programmatically:

import { querySessionTraces } from "./src/lib/git_memory.ts";
const traces = await querySessionTraces("2026-02-20/ralph-d8eb40c5");

Environment variables

Variable                     Default                   Description
ANTHROPIC_APIKEY             (required)                Anthropic API key
OPENAI_APIKEY                (required)                OpenAI API key (also used for embeddings)
GENERATE_MODEL               claude-sonnet-4-20250514  Claude model for generation
VALIDATE_MODEL               gpt-4o-mini               OpenAI model for validation
MAX_ITERS                    4                         Max outer loop iterations
WORKER_MAX_STEPS             80                        Max LCM agent steps per iteration
WORKER_MAX_LLM_CALLS         60                        Max LCM operator calls per iteration
PROGRESS_HEARTBEAT_MS        8000                      Progress log interval during long phases
OUT_DIR                      out                       Output directory for traces
EXTERNAL_DOCS_DEFAULT_DEPTH  1                         Default crawl depth for external docs
EXTERNAL_DOCS_REFRESH        ttl                       External doc refresh policy
EXTERNAL_DOCS_PROVIDER       local                     External docs pipeline (local or pageindex)
EXTERNAL_DOCS_CACHE_DIR      out/external_docs         External doc cache directory
EXTERNAL_DOCS_TOP_N          15                        ANN candidate count from external docs
EXTERNAL_DOCS_TOP_K          5                         Final selected chunks when external docs enabled
EXTERNAL_DOCS_WEIGHT         0.3                       External source weight in fusion ranking
EXTERNAL_DOCS_TTL_MIN        1440                      Cache TTL in minutes for refresh=ttl
EXTERNAL_DOCS_MAX_PAGES      200                       Maximum pages crawled per source
PAGEINDEX_API_KEY            (empty)                   API key for PageIndex markdown tree extraction
PAGEINDEX_BASE_URL           https://api.pageindex.ai  Base URL for PageIndex API

Troubleshooting

  • Worker step-budget errors: increase WORKER_MAX_STEPS (try doubling) and WORKER_MAX_LLM_CALLS proportionally.
  • Long silent pauses: reduce PROGRESS_HEARTBEAT_MS or pass --progressMs 3000.
  • Model not recognized: check the allowed model sets in src/lib/ai.ts. The system falls back to defaults for unrecognized model names.
  • Retrieval skipped: retrieval requires an OpenAI client with embedding support. If gptAI.embed is undefined, the pre-phase is silently skipped and DocReader runs without retrievedContext.
  • External docs not used: pass one or more --externalDocs <url> flags (or --externalDocsFile <path>). Crawled/cache artifacts are stored under out/external_docs by default.
  • PageIndex provider not active: set --externalProvider pageindex and PAGEINDEX_API_KEY. If API calls fail, the system automatically falls back to local chunking and logs the reason.
  • GEPA no sessions found: run at least one task-mode session first so traces are written to the session index before running --optimize.
