Engineering at Anthropic

Building effective agents

Published Dec 19, 2024

Over the past year, we've worked with dozens of teams building large
language model (LLM) agents across industries. Consistently, the most
successful implementations weren't using complex frameworks or
specialized libraries. Instead, they were building with simple, composable
patterns.

In this post, we share what we’ve learned from working with our customers
and building agents ourselves, and give practical advice for developers on
building effective agents.

What are agents?



"Agent" can be defined in several ways. Some customers define agents as


fully autonomous systems that operate independently over extended
periods, using various tools to accomplish complex tasks. Others use the
term to describe more prescriptive implementations that follow predefined
workflows. At Anthropic, we categorize all these variations as agentic
systems, but draw an important architectural distinction between
workflows and agents:

Workflows are systems where LLMs and tools are orchestrated through
predefined code paths.
Agents, on the other hand, are systems where LLMs dynamically direct
their own processes and tool usage, maintaining control over how they
accomplish tasks.

Below, we will explore both types of agentic systems in detail. In Appendix 1
(“Agents in Practice”), we describe two domains where customers have found
particular value in using these kinds of systems.

When (and when not) to use agents


When building applications with LLMs, we recommend finding the simplest
solution possible, and only increasing complexity when needed. This might
mean not building agentic systems at all. Agentic systems often trade
latency and cost for better task performance, and you should consider when
this tradeoff makes sense.

When more complexity is warranted, workflows offer predictability and
consistency for well-defined tasks, whereas agents are the better option
when flexibility and model-driven decision-making are needed at scale. For
many applications, however, optimizing single LLM calls with retrieval and
in-context examples is usually enough.

When and how to use frameworks


There are many frameworks that make agentic systems easier to implement,
including:

LangGraph from LangChain;
Amazon Bedrock's AI Agent framework;
Rivet, a drag and drop GUI LLM workflow builder; and
Vellum, another GUI tool for building and testing complex workflows.

These frameworks make it easy to get started by simplifying standard low-
level tasks like calling LLMs, defining and parsing tools, and chaining calls
together. However, they often create extra layers of abstraction that can
obscure the underlying prompts and responses, making them harder to
debug. They can also make it tempting to add complexity when a simpler
setup would suffice.

We suggest that developers start by using LLM APIs directly: many patterns
can be implemented in a few lines of code. If you do use a framework, ensure
you understand the underlying code. Incorrect assumptions about what's
under the hood are a common source of customer error.

See our cookbook for some sample implementations.
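
To illustrate how little code a direct API call requires, here is a minimal sketch using the Anthropic Python SDK; the helper name and model string are our own choices, not from the post:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def llm_call(prompt: str, system: str = "You are a helpful assistant.",
             model: str = "claude-3-5-sonnet-latest") -> str:
    """A single LLM call -- the unit that the patterns below compose."""
    response = client.messages.create(
        model=model,  # model name is illustrative; pin a version in production
        max_tokens=1024,
        system=system,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.content[0].text
```

The workflow sketches later in this post reuse this `llm_call` helper.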

Building blocks, workflows, and agents


In this section, we’ll explore the common patterns for agentic systems we’ve
seen in production. We'll start with our foundational building block—the
augmented LLM—and progressively increase complexity, from simple
compositional workflows to autonomous agents.

Building block: The augmented LLM


The basic building block of agentic systems is an LLM enhanced with
augmentations such as retrieval, tools, and memory. Our current models can
actively use these capabilities—generating their own search queries,
selecting appropriate tools, and determining what information to retain.


The augmented LLM

We recommend focusing on two key aspects of the implementation: tailoring
these capabilities to your specific use case and ensuring they provide an
easy, well-documented interface for your LLM. While there are many ways to
implement these augmentations, one approach is through our recently
released Model Context Protocol, which allows developers to integrate with a
growing ecosystem of third-party tools with a simple client implementation.

For the remainder of this post, we'll assume each LLM call has access to
these augmented capabilities.
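
As one concrete illustration, a retrieval augmentation exposed through the Messages API might look like the following sketch (reusing the `client` from above). The schema format is the API's standard tool definition, but the tool itself is invented for this example:

```python
# Hypothetical retrieval tool: the model generates its own search queries.
search_tool = {
    "name": "search_knowledge_base",
    "description": "Search the internal knowledge base and return the most "
                   "relevant passages for a natural-language query.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "What to search for."}
        },
        "required": ["query"],
    },
}

response = client.messages.create(
    model="claude-3-5-sonnet-latest",
    max_tokens=1024,
    tools=[search_tool],
    messages=[{"role": "user", "content": "What is our refund policy?"}],
)

# The model decides for itself whether retrieval is needed; if so, it emits
# a tool_use block containing the search query it generated.
for block in response.content:
    if block.type == "tool_use":
        print(block.name, block.input)
```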

Workflow: Prompt chaining


Prompt chaining decomposes a task into a sequence of steps, where each
LLM call processes the output of the previous one. You can add
programmatic checks (see “gate” in the diagram below) on any intermediate
steps to ensure that the process is still on track.


The prompt chaining workflow

When to use this workflow: This workflow is ideal for situations where the
task can be easily and cleanly decomposed into fixed subtasks. The main
goal is to trade off latency for higher accuracy, by making each LLM call an
easier task.

Examples where prompt chaining is useful:

Generating marketing copy, then translating it into a different language.
Writing an outline of a document, checking that the outline meets
certain criteria, then writing the document based on the outline.
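
A sketch of the second example, reusing the `llm_call` helper from earlier; the gate criterion is a placeholder for whatever programmatic check fits your task:

```python
def write_document(topic: str) -> str:
    # Step 1: draft an outline.
    outline = llm_call(f"Write a bulleted outline for a document about: {topic}")

    # Gate: a programmatic check between steps keeps the chain on track.
    if outline.count("\n") < 3:  # placeholder criterion: outline looks too thin
        raise ValueError("Outline failed the gate check")

    # Step 2: an easier, more focused call that builds on the checked output.
    return llm_call(f"Write the document following this outline:\n\n{outline}")
```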

Workflow: Routing
Routing classifies an input and directs it to a specialized followup task. This
workflow allows for separation of concerns, and building more specialized
prompts. Without this workflow, optimizing for one kind of input can hurt
performance on other inputs.


The routing workflow

When to use this workflow: Routing works well for complex tasks where
there are distinct categories that are better handled separately, and where
classification can be handled accurately, either by an LLM or a more
traditional classification model/algorithm.

Examples where routing is useful:

Directing different types of customer service queries (general questions,
refund requests, technical support) into different downstream
processes, prompts, and tools.
Routing easy/common questions to smaller models like Claude 3.5
Haiku and hard/unusual questions to more capable models like Claude
3.5 Sonnet to optimize cost and speed.
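
A sketch combining both examples, again built on `llm_call`; the labels, system prompts, and model choices are illustrative:

```python
ROUTES = {
    "general": ("claude-3-5-haiku-latest", "You answer general account questions concisely."),
    "refund": ("claude-3-5-sonnet-latest", "You handle refund requests step by step."),
    "technical": ("claude-3-5-sonnet-latest", "You are a technical support specialist."),
}

def handle_query(query: str) -> str:
    # A cheap classification call decides the downstream path.
    label = llm_call(
        f"Classify this support query as one of {sorted(ROUTES)}. "
        f"Reply with the label only.\n\nQuery: {query}",
        model="claude-3-5-haiku-latest",
    ).strip().lower()
    model, system = ROUTES.get(label, ROUTES["general"])  # fall back on unexpected labels
    return llm_call(query, system=system, model=model)
```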

Workflow: Parallelization
LLMs can sometimes work simultaneously on a task and have their outputs
aggregated programmatically. This workflow, parallelization, manifests in
two key variations:

Sectioning: Breaking a task into independent subtasks run in parallel.
Voting: Running the same task multiple times to get diverse outputs.


The parallelization workflow

When to use this workflow: Parallelization is effective when the divided
subtasks can be parallelized for speed, or when multiple perspectives or
attempts are needed for higher confidence results. For complex tasks with
multiple considerations, LLMs generally perform better when each
consideration is handled by a separate LLM call, allowing focused attention
on each specific aspect.

Examples where parallelization is useful:

Sectioning:
Implementing guardrails where one model instance processes user
queries while another screens them for inappropriate content or
requests. This tends to perform better than having the same LLM call
handle both guardrails and the core response.
Automating evals for evaluating LLM performance, where each LLM
call evaluates a different aspect of the model’s performance on a
given prompt.

Voting:
Reviewing a piece of code for vulnerabilities, where several different
prompts review and flag the code if they find a problem.
Evaluating whether a given piece of content is inappropriate, with
multiple prompts evaluating different aspects or requiring different
vote thresholds to balance false positives and negatives.
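
A sketch of the code-review voting example; the review prompts and threshold are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

REVIEW_PROMPTS = [
    "Check this code for injection vulnerabilities.",
    "Check this code for authentication and authorization flaws.",
    "Check this code for unsafe handling of untrusted input.",
]

def flag_vulnerable(code: str, threshold: int = 2) -> bool:
    """Run independent reviews in parallel; flag the code if enough object."""
    instruction = "Start your reply with VULNERABLE or SAFE, then explain.\n\n"
    with ThreadPoolExecutor() as pool:
        reviews = list(pool.map(
            lambda p: llm_call(instruction + p + "\n\n" + code), REVIEW_PROMPTS
        ))
    votes = sum(r.strip().upper().startswith("VULNERABLE") for r in reviews)
    return votes >= threshold  # tune to trade false positives for negatives
```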


Workflow: Orchestrator-workers
In the orchestrator-workers workflow, a central LLM dynamically breaks
down tasks, delegates them to worker LLMs, and synthesizes their results.

The orchestrator-workers workflow

When to use this workflow: This workflow is well-suited for complex tasks
where you can’t predict the subtasks needed (in coding, for example, the
number of files that need to be changed and the nature of the change in each
file likely depend on the task). While it’s topographically similar, the key
difference from parallelization is its flexibility—subtasks aren't pre-defined,
but determined by the orchestrator based on the specific input.

Examples where orchestrator-workers is useful:

Coding products that make complex changes to multiple files each time.
Search tasks that involve gathering and analyzing information from
multiple sources for possible relevant information.
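
A minimal sketch of the pattern; real implementations need sturdier plan parsing than this:

```python
import json

def orchestrate(task: str) -> str:
    # The orchestrator decides at runtime which subtasks the input needs.
    plan = llm_call(
        "Break the following task into independent subtasks. "
        f"Reply with a JSON array of strings only.\n\nTask: {task}"
    )
    subtasks = json.loads(plan)  # sketch only: validate and repair in practice

    # Workers each take one dynamically chosen subtask (could run in parallel).
    results = [llm_call(f"Complete this subtask:\n{sub}") for sub in subtasks]

    # The orchestrator synthesizes the workers' outputs into one result.
    return llm_call(
        f"Task: {task}\n\nSubtask results:\n\n" + "\n---\n".join(results)
        + "\n\nSynthesize these into a single coherent answer."
    )
```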

Workflow: Evaluator-optimizer
In the evaluator-optimizer workflow, one LLM call generates a response
while another provides evaluation and feedback in a loop.


The evaluator-optimizer workflow

When to use this workflow: This workflow is particularly effective when we
have clear evaluation criteria, and when iterative refinement provides
measurable value. The two signs of good fit are, first, that LLM responses can
be demonstrably improved when a human articulates their feedback; and
second, that the LLM can provide such feedback. This is analogous to the
iterative writing process a human writer might go through when producing a
polished document.

Examples where evaluator-optimizer is useful:

Literary translation where there are nuances that the translator LLM
might not capture initially, but where an evaluator LLM can provide
useful critiques.
Complex search tasks that require multiple rounds of searching and
analysis to gather comprehensive information, where the evaluator
decides whether further searches are warranted.
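
A sketch of the loop; the PASS convention and round limit are our own choices, not from the post:

```python
def generate_with_feedback(task: str, max_rounds: int = 3) -> str:
    draft = llm_call(task)
    for _ in range(max_rounds):
        # The evaluator scores the draft against explicit criteria.
        feedback = llm_call(
            "Critique the response below against the task. If it fully meets "
            f"the task's criteria, reply with exactly PASS.\n\nTask: {task}\n\n"
            f"Response:\n{draft}"
        )
        if feedback.strip() == "PASS":
            return draft
        # The optimizer revises using the articulated feedback.
        draft = llm_call(
            f"Task: {task}\n\nDraft:\n{draft}\n\nFeedback:\n{feedback}\n\n"
            "Rewrite the draft to address the feedback."
        )
    return draft
```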

Agents
Agents are emerging in production as LLMs mature in key capabilities—
understanding complex inputs, engaging in reasoning and planning, using
tools reliably, and recovering from errors. Agents begin their work with
either a command from, or interactive discussion with, the human user.
Once the task is clear, agents plan and operate independently, potentially
returning to the human for further information or judgement. During
execution, it's crucial for the agents to gain “ground truth” from the
environment at each step (such as tool call results or code execution) to
assess their progress. Agents can then pause for human feedback at
checkpoints or when encountering blockers. The task often terminates upon
completion, but it’s also common to include stopping conditions (such as a
maximum number of iterations) to maintain control.

Agents can handle sophisticated tasks, but their implementation is often
straightforward. They are typically just LLMs using tools based on
environmental feedback in a loop. It is therefore crucial to design toolsets
and their documentation clearly and thoughtfully. We expand on best
practices for tool development in Appendix 2 ("Prompt Engineering your
Tools").

Autonomous agent
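
The loop described above can be sketched in a few dozen lines, reusing the `client` from earlier. Here `execute_tool` stands in for your own hypothetical dispatcher, and the iteration cap is one example of a stopping condition:

```python
def run_agent(task: str, tools: list[dict], max_turns: int = 10) -> str:
    """A bare agent loop: the model chooses tools, the environment answers."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):  # stopping condition to maintain control
        response = client.messages.create(
            model="claude-3-5-sonnet-latest",
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return response.content[0].text  # the model decided it is done
        # Execute each requested tool and feed the ground-truth result back.
        tool_results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": execute_tool(block.name, block.input),  # hypothetical dispatcher
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": tool_results})
    raise RuntimeError("Agent stopped at the iteration limit")
```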

When to use agents: Agents can be used for open-ended problems where it’s
difficult or impossible to predict the required number of steps, and where
you can’t hardcode a fixed path. The LLM will potentially operate for many
turns, and you must have some level of trust in its decision-making. Agents'
autonomy makes them ideal for scaling tasks in trusted environments.

The autonomous nature of agents means higher costs, and the potential for
compounding errors. We recommend extensive testing in sandboxed
environments, along with the appropriate guardrails.

Examples where agents are useful:

The following examples are from our own implementations:

A coding agent to resolve SWE-bench tasks, which involve edits to many
files based on a task description;
Our “computer use” reference implementation, where Claude uses a
computer to accomplish tasks.

High-level flow of a coding agent

Combining and customizing these patterns


These building blocks aren't prescriptive. They're common patterns that
developers can shape and combine to fit different use cases. The key to
success, as with any LLM features, is measuring performance and iterating
on implementations. To repeat: you should consider adding complexity only
when it demonstrably improves outcomes.

Summary
Success in the LLM space isn't about building the most sophisticated system.
It's about building the right system for your needs. Start with simple
prompts, optimize them with comprehensive evaluation, and add multi-step
agentic systems only when simpler solutions fall short.

When implementing agents, we try to follow three core principles:

1. Maintain simplicity in your agent's design.
2. Prioritize transparency by explicitly showing the agent’s planning steps.
3. Carefully craft your agent-computer interface (ACI) through thorough
tool documentation and testing.

Frameworks can help you get started quickly, but don't hesitate to reduce
abstraction layers and build with basic components as you move to
production. By following these principles, you can create agents that are not
only powerful but also reliable, maintainable, and trusted by their users.

Acknowledgements
Written by Erik Schluntz and Barry Zhang. This work draws upon our
experiences building agents at Anthropic and the valuable insights shared by
our customers, for which we're deeply grateful.

Appendix 1: Agents in practice


Our work with customers has revealed two particularly promising
applications for AI agents that demonstrate the practical value of the
patterns discussed above. Both applications illustrate how agents add the
most value for tasks that require both conversation and action, have clear
success criteria, enable feedback loops, and integrate meaningful human
oversight.

A. Customer support
Customer support combines familiar chatbot interfaces with enhanced
capabilities through tool integration. This is a natural fit for more open-
ended agents because:

Support interactions naturally follow a conversation flow while
requiring access to external information and actions;
Tools can be integrated to pull customer data, order history, and
knowledge base articles;
Actions such as issuing refunds or updating tickets can be handled
programmatically; and
Success can be clearly measured through user-defined resolutions.

Several companies have demonstrated the viability of this approach through
usage-based pricing models that charge only for successful resolutions,
showing confidence in their agents' effectiveness.

B. Coding agents
The software development space has shown remarkable potential for LLM
features, with capabilities evolving from code completion to autonomous
problem-solving. Agents are particularly effective because:

Code solutions are verifiable through automated tests;
Agents can iterate on solutions using test results as feedback;
The problem space is well-defined and structured; and
Output quality can be measured objectively.

In our own implementation, agents can now solve real GitHub issues in the
SWE-bench Verified benchmark based on the pull request description alone.
However, while automated testing helps verify functionality, human
review remains crucial for ensuring solutions align with broader system
requirements.

Appendix 2: Prompt engineering your tools


No matter which agentic system you're building, tools will likely be an
important part of your agent. Tools enable Claude to interact with external
services and APIs by specifying their exact structure and definition in our
API. When Claude responds, it will include a tool use block in the API
response if it plans to invoke a tool. Tool definitions and specifications
should be given just as much prompt engineering attention as your overall
prompts. In this brief appendix, we describe how to prompt engineer your
tools.

There are often several ways to specify the same action. For instance, you
can specify a file edit by writing a diff, or by rewriting the entire file. For
structured output, you can return code inside markdown or inside JSON. In
software engineering, differences like these are cosmetic and can be
converted losslessly from one to the other. However, some formats are much
more difficult for an LLM to write than others. Writing a diff requires
knowing how many lines are changing in the chunk header before the new
code is written. Writing code inside JSON (compared to markdown) requires
extra escaping of newlines and quotes.

Our suggestions for deciding on tool formats are the following:

Give the model enough tokens to "think" before it writes itself into a
corner.
Keep the format close to what the model has seen naturally occurring in
text on the internet.
Make sure there's no formatting "overhead" such as having to keep an
accurate count of thousands of lines of code, or string-escaping any code
it writes.

One rule of thumb is to think about how much effort goes into human-
computer interfaces (HCI), and plan to invest just as much effort in creating
good agent-computer interfaces (ACI). Here are some thoughts on how to do so:

Put yourself in the model's shoes. Is it obvious how to use this tool, based
on the description and parameters, or would you need to think carefully
about it? If so, then it’s probably also true for the model. A good tool
definition often includes example usage, edge cases, input format
requirements, and clear boundaries from other tools.
How can you change parameter names or descriptions to make things
more obvious? Think of this as writing a great docstring for a junior
developer on your team. This is especially important when using many
similar tools.
Test how the model uses your tools: Run many example inputs in our
workbench to see what mistakes the model makes, and iterate.
Poka-yoke your tools. Change the arguments so that it is harder to make
mistakes.

While building our agent for SWE-bench, we actually spent more time
optimizing our tools than the overall prompt. For example, we found that the
model would make mistakes with tools using relative filepaths after the
agent had moved out of the root directory. To fix this, we changed the tool to
always require absolute filepaths—and we found that the model used this
method flawlessly.
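
A hypothetical version of such a tool, with the constraint stated in the description and enforced in the implementation, so the model gets immediate corrective feedback rather than a subtly wrong result:

```python
import os

read_file_tool = {
    "name": "read_file",
    "description": "Read a file and return its contents. `path` MUST be an "
                   "absolute path (e.g. /repo/src/main.py); relative paths "
                   "are rejected.",
    "input_schema": {
        "type": "object",
        "properties": {
            "path": {"type": "string", "description": "Absolute path to the file."}
        },
        "required": ["path"],
    },
}

def read_file(path: str) -> str:
    # Poka-yoke: reject relative paths loudly instead of resolving them
    # against a working directory the model may have lost track of.
    if not os.path.isabs(path):
        raise ValueError(f"path must be absolute, got {path!r}")
    with open(path) as f:
        return f.read()
```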
