A curated list of tools, frameworks, and resources for agent harness engineering — the discipline of designing environments, constraints, and feedback loops that make AI coding agents reliable at scale.
An agent harness is the infrastructure that wraps around an LLM coding agent. It's everything except the model itself: session management, context delivery, tool design, architectural enforcement, failure recovery, and human oversight.
OpenAI's Harness Engineering blog post defined the term: "When a software engineering team's primary job is no longer to write code, but to design environments, specify intent, and build feedback loops that allow agents to do reliable work." Their team built 1M+ lines of production code with zero human-written lines using this approach.
Anthropic's Claude Code team discovered the same principles from the tool design side: the harness matters more than the model. Fewer, more expressive tools beat a long menu of narrow ones. Progressive disclosure — letting the agent recursively discover context across layers — outperforms loading everything upfront. "Designing an agent's action space is as much an art as it is a science."
From the two seminal references above:
- Humans steer, agents execute — Engineers design environments and review outcomes rather than write code
- Repository knowledge is the system of record — If it's not in the repo, it doesn't exist to the agent. Slack threads, Google Docs, and tribal knowledge are invisible
- AGENTS.md is a table of contents, not an encyclopedia — Point to deeper sources of truth; don't dump everything in one file
- Enforce architecture mechanically — Custom linters, structural tests, and CI checks replace code review for invariants
- Agent legibility is the goal — Optimize code for agent readability first, human readability second
- Fewer tools, more expressiveness — Progressive disclosure and composable primitives beat sprawling toolkits
- See like an agent — Read the model's outputs, watch where it struggles, and evolve the harness accordingly
- Corrections are cheap, waiting is expensive — At high agent throughput, fix-forward beats blocking merge gates
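The "enforce architecture mechanically" principle can be illustrated with a small structural test. This is a minimal Python sketch, not taken from either reference: the layer names (`domain`, `api`, `ui`), the `FORBIDDEN_IMPORTS` table, and the `layer_violations` helper are all hypothetical. It uses only the standard-library `ast` module to flag imports that cross a forbidden layer boundary, the kind of invariant a CI check could enforce without a human reviewer.

```python
import ast

# Hypothetical layering rule: modules in the key layer must not import
# from any of the listed layers.
FORBIDDEN_IMPORTS = {
    "domain": {"api", "ui"},
}


def layer_violations(layer: str, source: str) -> list[str]:
    """Return the forbidden modules imported by `source`, which belongs to `layer`."""
    forbidden = FORBIDDEN_IMPORTS.get(layer, set())
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) are ignored in this sketch.
            names = [node.module]
        else:
            continue
        for name in names:
            # Compare only the top-level package against the forbidden set.
            if name.split(".")[0] in forbidden:
                violations.append(name)
    return violations
```

A CI job could run such a check over every file in a layer and fail the build whenever the returned list is non-empty, so the rule holds regardless of who (or what) wrote the code.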
- What is an Agent Harness?
- Core Principles
- Full Lifecycle Platforms
- Agent Orchestrators
- Task Runners
- Agent Harness Frameworks
- Agent Runtimes
- Coding Agents
- Requirements & Spec Tools
- Standards & Protocols
- Methodologies & Workflows
- Reference & Knowledge
- Contributing
Tools that span from requirements to delivery with human-in-the-loop approval.
- Chorus — Agent harness for requirements-to-delivery. Task DAGs, sub-agent orchestration (Agent Teams), proof of work, human approval gates. AI proposes, humans verify.
- GitHub Agentic Workflows — GitHub Actions with coding agent engines (Copilot, Claude Code, Codex). Issue → agent → PR with sandboxing and permissions.
- Almirant — Operating system for human-agent teams. Persistent context across sessions, shared memory between agents, structured task lifecycle (plan → implement → review → deploy), and human approval gates. Designed for teams where humans and agents work together continuously — not just one-shot task execution.
Orchestrators solve the throughput problem: at high agent velocity, you need parallel execution with worktree isolation so agents don't step on each other. As OpenAI found, "corrections are cheap, waiting is expensive" — these tools maximize concurrent agent throughput.
- Vibe Kanban — Kanban-based orchestrator with git worktree isolation per agent. Supports 10+ coding agents. Enforces the "one agent, one worktree" pattern that keeps parallel execution clean.
- Emdash — Open-source Agentic Development Environment (YC W26). Runs parallel agents in isolated worktrees, locally or over SSH — making the "corrections are cheap" principle practical for remote teams.
- Warp — Agentic development environment built for coding with multiple AI agents.
- Oh My OpenCode — Performance optimization harness for OpenCode with 44 lifecycle hooks.
- Everything Claude Code — Skills, instincts, memory, and security harness for Claude Code and Codex.
- Desplega Agent Swarm — Open-source multi-agent orchestration framework. Coordinates specialized AI agents (Claude Code-powered) through task delegation, session continuity, shared memory, and service discovery. Features include epics, scheduling, Slack integration, and cross-agent communication channels.
- Composio Agent Orchestrator — Agentic orchestrator for parallel coding agents. Plans tasks, spawns agents in isolated worktrees, autonomously handles CI fixes, merge conflicts, and code reviews.
- Oh My AG — Multi-agent harness for Google Antigravity with 6 specialized agents.
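The "one agent, one worktree" pattern mentioned above can be sketched with plain `git worktree`. The snippet below is an illustration under stated assumptions, not how any listed orchestrator is implemented: the `agent/<id>` branch naming, the sibling `<repo>-worktrees` directory, and both helper functions are hypothetical. Only the `git worktree add -b <branch> <path> <commit-ish>` invocation is standard Git.

```python
import subprocess
from pathlib import Path


def worktree_command(repo: Path, agent_id: str, base: str = "main") -> list[str]:
    """Build the git command that gives one agent its own branch and checkout."""
    # Hypothetical layout: worktrees live next to the repo, one directory per agent.
    path = repo.parent / f"{repo.name}-worktrees" / agent_id
    return [
        "git", "-C", str(repo),
        "worktree", "add",
        "-b", f"agent/{agent_id}",  # hypothetical branch naming scheme
        str(path),
        base,
    ]


def create_worktree(repo: Path, agent_id: str, base: str = "main") -> None:
    """Run the command; raises CalledProcessError if git reports a failure."""
    subprocess.run(worktree_command(repo, agent_id, base), check=True)
```

Because each agent gets a separate directory and branch, parallel agents never share a working tree or index, and their changes meet only at merge time.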
Task runners bridge the gap between issue trackers and coding agents. They embody the "humans steer, agents execute" principle: a human (or PM agent) creates the issue, the runner spawns an agent, and the output is a PR ready for review.
- Symphony — OpenAI's reference implementation of harness engineering. A daemon that polls Linear issues, spawns isolated Codex agents per task, and delivers PRs. Embodies "humans steer, agents execute" at scale.
- Baton — Go implementation of Symphony. Polls Linear for claimable issues, spawns isolated Codex workspaces per issue, streams workflow prompts, and cleans up on completion.
- Linear Coding Agent Harness — Linear → autonomous coding agent → PR pipeline.
- GitHub Copilot Coding Agent — Built-in GitHub issue → Copilot agent → PR.
- Axon — Kubernetes-native framework. Apply a Task CRD, get back a PR and cost in USD. TaskSpawner watches GitHub Issues.
- Dexto — Coding agent and general agent harness for building agentic applications.
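The issue-to-PR loop that these runners share can be sketched in a few lines. This is a hedged illustration only: the `Issue` type, `run_once`, and the injected `fetch_claimable` and `run_agent` callables are hypothetical stand-ins, and no real Linear, GitHub, or Codex API is called.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Issue:
    id: str
    title: str


def run_once(
    fetch_claimable: Callable[[], Iterable[Issue]],
    run_agent: Callable[[Issue], str],
    seen: set[str],
) -> dict[str, str]:
    """One polling pass: start an agent for each unseen issue, collect PR references."""
    prs: dict[str, str] = {}
    for issue in fetch_claimable():
        if issue.id in seen:
            continue  # already claimed in an earlier pass
        seen.add(issue.id)
        # In a real runner this would spawn an isolated agent workspace;
        # here the callable simply returns a PR reference.
        prs[issue.id] = run_agent(issue)
    return prs
```

A daemon would call `run_once` on a timer. Keeping the tracker and the agent behind callables means the same loop could sit in front of any issue source or coding agent.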
Frameworks for building custom harnesses. Following the principle that "fewer tools, more expressiveness" beats sprawling toolkits, these provide composable primitives rather than opinionated workflows.
- Deep Agents — Agent harness built on LangChain/LangGraph. Implements progressive disclosure through planning tools and subagent spawning — agents discover context layer by layer rather than loading everything upfront.
- Gambit — Framework for building, running, and verifying LLM workflows.
- Harness Kit — Patterns and engineering practices for building with AI agents.
- Desloppify — Agent harness focused on making AI-generated code well-engineered.
- Bridle — TUI/CLI config manager for agent harnesses (Amp, Claude Code, OpenCode, Goose, Copilot CLI, Droid).
- DeerFlow 2.0 — ByteDance's open-source SuperAgent harness. Skill system with on-demand loading, sub-agent orchestration, sandboxed execution, and persistent memory. Built on LangGraph/LangChain.
- Zylos — Persistent agent harness for Claude Code. Tiered memory system, skill-based progressive disclosure, multi-channel communication bridge, task scheduler, and activity monitor — enabling autonomous, long-running agents that remember across sessions.
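Progressive disclosure, which several of these frameworks implement through skills, can be reduced to a simple idea: show the agent a short index first and load full content only on request. The sketch below assumes a hypothetical `SkillIndex` class and does not reflect the API of any framework listed here.

```python
class SkillIndex:
    """Holds skills as name -> (one-line summary, full body)."""

    def __init__(self, skills: dict[str, tuple[str, str]]) -> None:
        self._skills = skills

    def overview(self) -> str:
        """The only text placed in the agent's context upfront: names and summaries."""
        return "\n".join(
            f"- {name}: {summary}"
            for name, (summary, _body) in sorted(self._skills.items())
        )

    def load(self, name: str) -> str:
        """Return the full skill body, fetched only when the agent asks for it."""
        if name not in self._skills:
            raise KeyError(f"unknown skill: {name}")
        return self._skills[name][1]
```

With this shape, context cost grows with the number of skills the agent actually uses rather than the number that exist.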
The persistent infrastructure layer. Agent runtimes give coding agents long-running capabilities they lack natively: persistent memory, cron scheduling, multi-channel messaging, and sub-agent spawning. If orchestrators solve throughput and task runners solve issue-to-PR, runtimes solve "how does an agent stay alive and connected between tasks?"
- OpenClaw — AI agent runtime. Orchestrates agents across messaging channels with skill system, sub-agent spawning, and persistent session management.
The execution layer. In harness engineering, the agent is a commodity — the harness is the differentiator. These agents write code; everything above them determines whether that code is useful.
- Claude Code — Anthropic's coding agent. The team's own harness pioneered "seeing like an agent" — progressive disclosure via skill files, fewer composable tools over many narrow ones. Agent Teams enables multi-agent coordination. The Claude Agent SDK extends the harness beyond coding.
- Codex — OpenAI's coding agent. Cloud and CLI modes.
- OpenCode — Open-source coding agent with a plugin system (44 lifecycle hooks), server mode HTTP API, and TypeScript SDK. The most extensible harness integration point for custom workflows.
- Gemini CLI — Google's CLI coding agent.
- Kiro CLI — AWS's CLI coding agent with spec-driven workflow.
- Amp — Sourcegraph's coding agent.
- Cursor — Anysphere's coding agent IDE. New Automations feature (March 2026) enables event-triggered agent launches from code changes, Slack messages, or timers.
- GitHub Copilot CLI — GitHub's CLI coding agent.
- Aider — AI pair programming in your terminal.
The planning layer addresses the biggest harness gap: agents can write code, but someone has to decide what to build. "Repository knowledge is the system of record" — these tools generate the specs and requirements that agents consume.
- Kiro IDE — AWS's spec-driven development IDE. Generates structured specs and manages requirements.
- OpenSpec — Spec-driven development CLI. Generate structured specs from natural language.
- Spec Kit — GitHub's spec generation toolkit.
- agents.md — Open standard for project-level agent instructions. Following the principle that "AGENTS.md is a table of contents, not an encyclopedia" — it should point to deeper sources of truth.
- Pencil — MCP-enabled design canvas inside VSCode/Cursor. Design files live in the repo under Git version control, bridging visual spec to code generation. Closed source.
- Open Pencil — Open-source AI-native design editor (MIT). 75+ tools and an MCP server let coding agents read/write .fig files headlessly.
- MCP (Model Context Protocol) — Open standard for connecting AI models to external tools and data sources.
- agents.md — Open standard for project-level agent configuration.
- AGENTS.md — OpenAI's convention for repository-level agent instructions.
- GitAgent — Git-native, framework-agnostic standard for defining AI agents. Your repo is the agent: agent.yaml manifest + SOUL.md identity + RULES.md constraints.
- ACP (Agent Communication Protocol) — Open protocol for agent-to-agent and agent-to-harness communication. Enables interoperability across coding agents and orchestrators.
- HXA-Connect — B2B messaging hub for AI agents. WebSocket-based real-time communication with org-scoped authentication, collaboration threads, @mention routing, and reply-to threading. Enables multi-agent coordination across different harnesses.
Development methodologies and workflow definitions designed for agentic software development.
- AI-DLC Workflows — AWS's AI-Driven Development Life Cycle. A three-phase adaptive workflow (understand → plan → build) implemented as agent rules for Amazon Q, Claude Code, and other coding agents. Generates structured specs, enforces quality gates, and keeps humans in control. Based on the AI-DLC methodology.
- Harness Engineering: Leveraging Codex in an Agent-First World — OpenAI's defining blog post. How they built 1M+ lines with zero human-written code. Introduced the concepts of repository knowledge as system of record, progressive context disclosure, and mechanical architecture enforcement.
- Lessons from Building Claude Code: Seeing Like an Agent — Thariq (Claude Code lead) on designing agent action spaces. Fewer tools beat more tools. Progressive disclosure outperforms upfront loading. The harness must evolve with the model.
- Building Effective Agents — Anthropic's guide: simple, composable patterns beat complex frameworks.
- Building an AI-Native Engineering Team — OpenAI's guide to structuring teams around AI-first workflows.
- The Emerging "Harness Engineering" Playbook — How OpenAI is retooling engineering teams.
- Conductors to Orchestrators: The Future of Agentic Coding — O'Reilly overview of the orchestration landscape.
- My LLM Coding Workflow Going into 2026 — Addy Osmani's specs-first approach.
- How the Claude Code Team Designs Agent Tools — Analysis of progressive disclosure and tool subtraction in agent design.
- Your Agent Needs a Harness, Not a Framework — Inngest on why agents need runtime harnesses over frameworks. References OpenClaw and pi coding-agent patterns.
- Harness Engineering: Why Agent Context Isn't Enough — Hugo Bowne-Anderson on why the environment matters more than the model.
- Harness Engineering Is Cybernetics — George traces harness engineering back to Watt's governor and Wiener's cybernetics. The pattern repeats: sensor + actuator close the loop at a new layer. LLMs close it at the architectural layer — but only if you externalize your judgment into machine-readable specs.
- Agent Harness vs Agent Framework — Tony Kipkemboi (ex-CrewAI) maps agent development on a spectrum: raw API → framework (CrewAI/LangChain) → harness (OpenClaw). Frameworks give building blocks, harnesses give turnkey systems.
- The Anatomy of an Agent Harness — LangChain's Vivek Trivedy breaks down the harness into core components: state, tool execution, feedback loops, and enforceable constraints. Derives each piece from what models can't do out of the box.
- The Future of Coding is Agents — Andrej Karpathy (YC) — Landmark talk on the trajectory from assistants to agents.
- Agentic Coding — Armin Ronacher — Creator of Flask on adopting agentic workflows in practice.
- 12 Rules of Harness Engineering — Practical rules derived from OpenAI's approach.
- agent-harness — Principles, checklists, and invariants for AI coding workflows.
- harness-kit — Engineering patterns for building with AI agents.
Contributions welcome! When suggesting additions, include:
- A one-line description of what the tool does
- Why it belongs in this list (which layer of the stack it addresses)

Append new entries to the end of their category rather than inserting them at the top.