Building a minimal AI agent from scratch
for software engineering, terminal use, and more
So you want to build your own AI agent from scratch? The good news: It's super simple, especially with more recent language models. We won't be using any external packages (other than to query the LM), and our initial minimal agent is only around 50 lines long.
And if you think this is too simplified and can never work in practice: Our mini agent is built in exactly the same way, and is used for research at Princeton, Stanford, NVIDIA, Anyscale, essentials.ai, and more.
Using this simple guide, you can score up to 74% on SWE-bench Verified, only a few percentage points below highly optimized agents.
Our first prototype in 50 lines¶
Let's get started: From a top-level view, an AI agent is just a big loop: you start with a prompt, the agent proposes an action, you execute the action, tell the LM the output, and repeat.
To keep track of what has happened, we keep appending to the messages list.
Pseudocode:
messages = [{"role": "user", "content": "Help me fix the ValueError in main.py"}]
while True:
lm_output = query_lm(messages)
print("LM output", output)
messages.append({"role": "assistant", "content": lm_output}) # remember what the LM said
action = parse_action(lm_output) # separate the action from output
print("Action", action)
if action == "exit":
break
output = execute_action(action)
print("Output", output)
messages.append({"role": "user", "content": output}) # send command output back
What's up with the role field?
The role field indicates who sent the message in the conversation. Common roles are:
"user"- Messages from the user/human"assistant"- Messages from the AI model"system"- System prompts that set context/instructions
Different LM APIs may have slightly different conventions for how to structure these messages.
So to get this to work, we only need to implement three things:

- Querying the LM API (this can get annoying if you want to support all LMs or want detailed cost information, but it is very simple if you already know which model you want)
- Parsing the action (parse_action). You don't need this if you use your LM's tool-calling functionality (if it supports it), but that is more provider-specific, so we won't cover it in this guide for now (don't worry, performance should not be impacted by this).
- Executing the action (very simple: in our case we will simply execute any action of the LM as a bash command in the terminal)
Querying the LM¶
Let's start with the first step. Click on the tabs to find the right LM for you.
litellm supports most LMs, so this is a good default option if your LM is not explicitly mentioned.
Install the OpenAI package (docs):
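pip install openai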
Here's the minimal code to query the API:
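from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

def query_lm(messages: list[dict[str, str]]) -> str:
    response = client.chat.completions.create(
        model="gpt-4o",  # example model name; pick whichever OpenAI model you want
        messages=messages,
    )
    return response.choices[0].message.content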
Install the Anthropic package (docs):
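pip install anthropic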
Here's the minimal code to query the API:
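import anthropic

client = anthropic.Anthropic()  # reads the ANTHROPIC_API_KEY environment variable

def query_lm(messages: list[dict[str, str]]) -> str:
    # note: the Anthropic API takes system prompts via the separate system= parameter,
    # not as a message with role "system"
    response = client.messages.create(
        model="claude-3-5-sonnet-20241022",  # example model name; pick any Claude model
        max_tokens=4096,
        messages=messages,
    )
    return response.content[0].text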
Install the OpenAI package (OpenRouter docs) - OpenRouter uses an OpenAI-compatible API:
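pip install openai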
Here's the minimal code to query the API:
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ.get("OPENROUTER_API_KEY", "your-api-key-here"),  # env var or hardcoded key
)

def query_lm(messages: list[dict[str, str]]) -> str:
    response = client.chat.completions.create(
        model="anthropic/claude-3.5-sonnet",  # or any model on OpenRouter
        messages=messages,
    )
    return response.choices[0].message.content
Install the LiteLLM package (docs) - supports 100+ LLM providers:
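pip install litellm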
Here's the minimal code to query the API:
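from litellm import completion

def query_lm(messages: list[dict[str, str]]) -> str:
    response = completion(
        model="gpt-4o",  # example model name; litellm routes to the right provider
        messages=messages,
    )
    return response.choices[0].message.content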
Install the Zhipu AI package (docs):
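pip install zhipuai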
Here's the minimal code to query the API:
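from zhipuai import ZhipuAI

client = ZhipuAI(api_key="your-api-key-here")  # assuming the v2 zhipuai SDK, which mirrors the OpenAI interface

def query_lm(messages: list[dict[str, str]]) -> str:
    response = client.chat.completions.create(
        model="glm-4",  # example model name
        messages=messages,
    )
    return response.choices[0].message.content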
How to set environment variables
Instead of hardcoding your API key in the code, you can set it as an environment variable. Note that these commands set the variable only for your current terminal session (not persistent).
export OPENAI_API_KEY="your-api-key-here"
export ANTHROPIC_API_KEY="your-api-key-here"
export GOOGLE_API_KEY="your-api-key-here"
To make them persistent, add them to ~/.bashrc, ~/.zshrc, or your shell config file.
Alternatively, you can use a .env file with python-dotenv.
Type hints in Python
In case you're wondering about the list[dict[str, str]] and the -> str in the
previous code example: these are "type hints". They are optional in Python,
but they help your IDE or static checker (or even just yourself) understand
the inputs and outputs of the function.
Let's test it
Here's a quick test to verify your LM query function works:
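messages = [{"role": "user", "content": "Roll a die and tell me the result!"}]  # any simple prompt works
print(query_lm(messages))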
You should see the model's response, something like a dice roll result or explanation!
In production
If you want to see how this is done in production, check out the model classes in mini-swe-agent.
Parse the action¶
Let's parse the action. There are two simple ways in which the LM can "encode" the action, typically either triple backticks or XML-style tags (again, you don't need this if you use tool calls, but in this tutorial we'll keep it simpler).
For most models, either way works well, and we recommend using triple backticks. However, some models (especially small or open-source models) are slightly less general, so you might want to try both. Here's a quick regular expression to parse the action:
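import re

def parse_action(lm_output: str) -> str:
    # extract the command between the ```bash-action and ``` markers
    matches = re.findall(r"```bash-action\n(.*?)\n```", lm_output, re.DOTALL)
    return matches[0].strip()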
Understanding the regular expression
r"..."- Raw string: Therprefix makes it a raw string, so backslashes are treated literally. Without it, you'd need to write\\ninstead of\n,\\sinstead of\s, etc.(.*?)- Capturing group with non-greedy matching: The parentheses()capture the content we want to extract. The.*?matches any characters, but?makes it stop at the first closing pattern (non-greedy) rather than the last.re.DOTALLflag - Makes.match newlines too, allowing multi-line commands to be captured.
findall returns only what's inside the parentheses, not the surrounding markers.
In production
If you want to see how this is done in production, check out the parse_action implementation in default.py in mini-swe-agent.
Execute the action¶
Now as for executing the action, it's actually very simple: we can just use Python's subprocess module (or os.system, though that's generally discouraged):
import subprocess
import os

def execute_action(command: str) -> str:
    """Execute action, return output"""
    result = subprocess.run(
        command,
        shell=True,
        text=True,
        env=os.environ,
        encoding="utf-8",
        errors="replace",
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        timeout=30,
    )
    return result.stdout
Understanding subprocess.run arguments
Let's break down the keyword arguments we're using:
- shell=True - Allows running arbitrary shell commands given as a string (like cd, ls, pipes, etc.). Be careful with untrusted input!
- text=True - Returns output as strings instead of bytes
- env=os.environ - Passes the current environment variables to the subprocess
- encoding="utf-8" - Specifies UTF-8 encoding for text output
- errors="replace" - Replaces invalid characters instead of raising errors
- stdout=subprocess.PIPE - Captures standard output
- stderr=subprocess.STDOUT - Redirects stderr to stdout (so we capture both in one stream)
- timeout=30 - Stops execution after 30 seconds (raising an exception, which we'll handle gracefully later)
In production
If you want to see how this is done in production, check out mini-swe-agent's environment classes:
- Local environment - the closest equivalent to the code above
- Docker environment - almost the same as local, except commands are executed via docker exec instead of subprocess.run
There are a couple of limitations to this:
- The agent will not be able to cd to a different directory (each command runs in a fresh subshell)
- The agent cannot easily persist environment variables
However, in practice, we have found these limitations to be not very limiting at all.
In fact, reducing the amount of hidden state and forcing the agent to work with absolute paths might well be helpful for language models in many instances.
This is also similar to Claude Code: while it can change directories, it cannot persist environment variables, because it likewise uses subshells to execute commands.
Add a system prompt¶
We still need to tell the LM a bit more about how to behave:
messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. When you want to run a command, wrap it in ```bash-action\n<command>\n```. To finish, run the exit command.",
    }
]
Let's put it together & run it!¶
You should now have code that looks something like this (this example uses litellm + triple backticks; the model name is just an example):
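import os
import re
import subprocess

from litellm import completion

def query_lm(messages: list[dict[str, str]]) -> str:
    response = completion(
        model="gpt-4o",  # example model name; use any model that litellm supports
        messages=messages,
    )
    return response.choices[0].message.content

def parse_action(lm_output: str) -> str:
    # extract the command between the ```bash-action and ``` markers
    matches = re.findall(r"```bash-action\n(.*?)\n```", lm_output, re.DOTALL)
    return matches[0].strip()

def execute_action(command: str) -> str:
    """Execute action, return output"""
    result = subprocess.run(
        command,
        shell=True,
        text=True,
        env=os.environ,
        encoding="utf-8",
        errors="replace",
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,
        timeout=30,
    )
    return result.stdout

messages = [
    {
        "role": "system",
        "content": "You are a helpful assistant. When you want to run a command, wrap it in ```bash-action\n<command>\n```. To finish, run the exit command.",
    },
    {"role": "user", "content": "Help me fix the ValueError in main.py"},  # your task here
]
while True:
    lm_output = query_lm(messages)
    print("LM output:", lm_output)
    messages.append({"role": "assistant", "content": lm_output})
    action = parse_action(lm_output)
    print("Action:", action)
    if action == "exit":
        break
    output = execute_action(action)
    print("Output:", output)
    messages.append({"role": "user", "content": output})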
Let's make it more robust¶
The following sections are some tweaks to improve performance.
Nothing fancy, just making sure that the agent doesn't get stuck and can deal with things that go wrong.
This section is a bit more advanced.
Instead of showing the complete code at the end, we encourage everyone to check out the source code of our mini agent; it includes all of these features with very little fluff around it (also see the next section to get started with reading the code).
Dealing with exceptions in the control flow¶
The idea here: Whenever a known exception arises (timeouts, format errors, etc.), let's just tell the LM and let it handle it itself. This means adapting our while loop a bit:
while True:
    try:
        ...  # loop body as before
    except Exception as e:
        messages.append({"role": "user", "content": str(e)})
That's it!
For example, if the agent does something stupid (like calling vim) and the timeout is triggered, the error message will be appended to the messages and the
LM can pick up from there, hopefully realizing what it did wrong.
However, we might want to limit this behavior to some known problems, or add more information to the message. In that case, we can be more specific, for example:
class OurTimeoutError(RuntimeError): ...

def execute_action(action: str) -> str:
    try:
        ...  # as before
    except subprocess.TimeoutExpired as e:
        # subprocess.run raises TimeoutExpired (not TimeoutError) when the timeout hits
        raise OurTimeoutError("Your last command timed out, you might want to ...") from e
and just like this, we've added additional information for the LM.
You might also want to be more specific with what exceptions are handed to the LM and which just cause the program to crash. In this case it might make sense to define a custom exception class and only catch that in the while loop:
class NonterminatingException(RuntimeError): ...
class OurTimeoutError(NonterminatingException): ...

while True:
    try:
        ...
    except NonterminatingException as e:
        ...
mini-swe-agent additionally defines a TerminatingException class which is used instead of the if action == "exit" mechanism to stop the while loop in a graceful way:
class TerminatingException(RuntimeError): ...
class Submitted(TerminatingException): ...  # agent wants to stop

def execute_action(action: str) -> str:
    if action == "exit":
        raise TerminatingException("LM requested to quit")
    ...

while True:
    try:
        ...
    except NonterminatingException as e:
        ...
    except TerminatingException as e:
        print("Stopping because of", str(e))
        break
Handling malformatted outputs¶
Sometimes (especially with weaker LMs), the LM will not properly format its action. It's good to remind it of the correct format in that case. This should be very straightforward now that we have the general exception handling in place:
incorrect_format_message = """Your output was malformatted.
Please include exactly 1 action formatted as in the following example:
```bash-action
ls -R
```
"""

class FormatError(NonterminatingException): ...  # so the message is sent back to the LM

def parse_action(action: str) -> str:
    matches = ...
    if not len(matches) == 1:
        raise FormatError(incorrect_format_message)
    ...
Environment variables¶
There are a couple of environment variables that we can set to disable interactive
elements in command-line tools, so the agent doesn't get stuck (you can see them being set in the mini-swe-agent SWE-bench config):
env_vars = {
    "PAGER": "cat",
    "MANPAGER": "cat",
    "LESS": "-R",
    "PIP_PROGRESS_BAR": "off",
    "TQDM_DISABLE": "1",
}
# ...

def execute_action(command: str) -> str:
    # ...
    result = subprocess.run(
        command,
        # ...
        env=os.environ | env_vars,
        # ...
    )
mini-swe-agent¶
mini-swe-agent is built exactly according to the blueprint of this tutorial, and it should be very easy for you to understand its source code.
The only important thing to note is that it is built in a more modular way, so that you can swap out all components.
The Agent class (full code) contains the big while loop in the run function
class Agent:
    def __init__(self, model, environment):
        self.model = model
        self.environment = environment
        ...

    def run(self, task: str):
        while True:
            ...
The model class (example for litellm) handles different LMs,
and the environment class (local environment) executes actions.
mini-swe-agent provides different environment classes that, for example, allow executing actions in Docker containers instead of directly in your local environment.
Sounds more complicated? It really isn't: all we do is switch from subprocess.run to calls to docker exec.
Contribute to this guide¶
We welcome contributions on GitHub to improve this guide!
Contribution
The following PRs will be merged immediately:
- Bug fixes
- Typo fixes
The following PRs are much appreciated and will most likely be merged fast:
- Adding support for popular LMs that aren't mentioned yet (please make sure to test your implementation)

The following things should be discussed first (via GitHub issue):
- Additional sections
- Significant expansions of sections
Please understand that the larger your changes are, the more time we will need to review them and the less likely it is that we can accept them (unless we discussed them beforehand).
To contribute:
- Fork the repository
- Make your changes
- Submit a pull request
You can find the source code on GitHub.
If you have questions or comments, please comment below. Note that GitHub issues are still preferred for bug reports and discussions about further developing this page.