Skip to content

JinyiHan99/GA-Technical-Report

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

97 Commits
 
 
 
 
 
 
 
 
 
 

Repository files navigation

GenericAgent

GenericAgent (GA)

A Self-Evolving LLM Agent via Contextual Information Density Maximization

Technical Report Code Version License

Advantage AI Agent Lab (A³ LAB) · Shenzhen Aquaintelling Technology × Fudan University


🎉 News

🔥 Highlights

  • 🧠 Self-evolving by design — an autonomous trajectory → SOP → executable-code distillation pipeline, no manual prompt tuning
  • 🪶 Nine atomic tools, not fifty — broad capability through composition, not tool enumeration
  • 📉 ~1/3 the token cost of today's leading agent systems, at matched or better task success
  • 📚 No external vector DB needed — beats embedding-based retrievers on LoCoMo with pure hierarchical memory
  • 🔁 Evolves with use — nine-round longitudinal runs show –89.6% tokens, –78% runtime, –84% LLM calls

🏗️ Architecture at a Glance

GA framework

GA follows a unified agent loop that builds an execution context from the current task and relevant memory, emits an output or tool call, and updates the system through structured feedback. The loop is supported by four mechanisms, a minimal atomic tool set, hierarchical memory, reflection-driven self-evolution, and structured browser extraction, that together maximize context information density across the full lifecycle of an interaction.


⚙️ The Four Core Mechanisms

GA instantiates context information-density maximization across the entire lifecycle of contextual information.

1. Minimal Atomic Toolset — density before execution

Instead of exposing dozens of specialized tools, GA ships 9 atomic primitives across 5 capability classes (file ops, code execution, web interaction, memory management, human-in-the-loop). Broad capability emerges from composition, not enumeration. Result: a smaller tool schema, a smaller action space, fewer selection errors, and no prompt bloat.

2. Hierarchical Memory — density during execution

A layered memory system where only a compact "always-on" orientation layer sits in the prompt. Richer factual knowledge (L2), procedural SOPs (L3), and archived interaction history are kept off-prompt and retrieved on demand.

3. Self-Evolution — density that compounds over time

A reflection-driven pipeline that compresses verified trajectories into reusable SOPs → executable code, in three autonomous stages (natural-language → textual SOP → codified SOP). Transitions are triggered by the memory system itself, not by the user.

4. Context Truncation & Compression — density preserved under pressure

Layered management of historical content: head/tail truncation of tool outputs, tag-level compression of older messages, temporal eviction past budget, plus a continuously injected working-memory anchor. The active context stays task-relevant instead of growing linearly with turns.


🌟 Emergent System Capabilities

On top of the four mechanisms, GA exhibits a set of system-level behaviors that together make it deployable as a self-driving agent:

  • Subagent dispatch — spawn bounded-scope workers with their own tool sets and context budgets
  • Reflect Mode - continuously monitors for environmental changes and automatically triggers the corresponding task once a specific condition is detected.
    • Watchdog mode — reactive execution triggered by environmental events, no user prompt required
    • Scheduled tasks — cron-style recurring execution reusing the main agent loop

📈 Evaluation — Five Dimensions

Dimension Question Benchmarks used
1. Task Completion & Token Efficiency Can GA complete hard tasks more cheaply than leading agents? SOP-Bench, Lifelong AgentBench, RealFin-Benchmark
2. Tool-Use Efficiency Can a minimal atomic toolset solve what specialized toolsets solve, with less overhead? Tool Efficiency Benchmark (11 simple + 5 long-horizon tasks)
3. Memory System Effectiveness Does condensed hierarchical memory beat full/redundant memory and embedding-based retrievers? SOP-Bench (dangerous goods), LoCoMo, 20-skill stress test
4. Self-Evolution Capability Can the agent distill experience into reusable SOPs and code, without intervention? 9-round LangChain longitudinal study, 8-task cross-task web benchmark
5. Web Browsing Capability Does density-driven design survive the open web? WebCanvas, BrowseComp-ZH, Custom Tasks (22)

Baselines across these dimensions include Claude Code, OpenAI CodeX, and OpenClaw, evaluated under Claude Sonnet 4.6, Claude Opus 4.6, GPT-5.4, and MiniMax M2.7 backbones.

Tool-use efficiency radar
Tool-use efficiency radar. GA dominates token, request, and tool-call axes while preserving quality across four task dimensions.
Cross-task self-evolution convergence
Cross-task self-evolution. Second- and third-run GA executions converge to a stable low-cost regime across eight web tasks, while OpenClaw shows no such convergence.

📁 Repository Layout

GA-Technical-Report/
├── main.pdf                       ← Full technical report (V1.0)
├── README.md                      ← This file
├── assets/                        ← README visuals (logo, framework, demos, result charts)
└── datasets/                      ← All evaluation datasets used in the report
    ├── sop_bench/                    — SOP-Bench (dangerous goods subset, 20 tasks)
    ├── lifelong_agentbench/          — Lifelong AgentBench (DB-Bench, 20 SQL tasks)
    ├── realfin_benchmark/            — RealFin-Benchmark (40 financial analysis tasks)
    ├── tool_efficiency_benchmark/    — 11 simple + 5 long-horizon tool-use tasks (+ assets & graders)
    ├── locomo/                       — LoCoMo long-conversation memory (10 conversations, ~2k QA)
    └── web_browsing/                 — WebCanvas (12) + BrowseComp-ZH (10) per-task runs vs. OpenClaw

📝 Citing GA

@techreport{generic_agent_2026,
  title        = {GenericAgent: A Self-Evolving LLM Agent via Contextual Information Density Maximization},
  author       = {Jiaqing Liang, Jinyi Han, Weijia Li, Xinyi Wang, Zhoujia Zhang, Zishang Jiang, Ying Liao, Tingyun Li, Ying Huang, Hao Shen, Hanyu Wu, Fang Guo, Keyi Wang, Zhonghua Hong, Zhiyu Lu, Lipeng Ma, Sihang Jiang, Yanghua Xiao
},
  institution  = {Shenzhen Aquaintelling and Technology Fudan University},
  year         = {2026},
  type         = {Technical Report},
  version      = {V1.0},
  url          = {https://github.com/JinyiHan99/GA-Technical-Report}
}

© 2026 Advantage AI Agent Lab (A³ LAB). Released alongside the GenericAgent open-source system at github.com/lsdefine/GenericAgent.

About

Technical Report & Evaluation Datasets for GenericAgent (GA)

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors