Environments
An environment is the harness an agent operates in. It packages tools (what agents can do) and scenarios (how agents are evaluated) into a single deployable unit.Scenarios
A scenario defines how an agent is evaluated. It is an async generator function with two yields — the first yield sends a prompt to the agent, and the second yield returns a reward. Here’s the simplest possible scenario:| Section | Where | What it does |
|---|---|---|
| Setup (optional) | Before the first yield | Seed a database, navigate to a URL, prepare initial state |
| Prompt | The first yield | Sends instructions to the agent; receives the agent’s answer |
| Scoring | After the first yield, ending with the second yield | Checks results and returns a reward between 0.0 and 1.0 |
Tools
Tools are functions that an agent can call while it’s working on a task. You define a tool by decorating a function with@env.tool():
Pre-built Tools
Most real environments don’t need custom tools from scratch. HUD ships a library of standard tools you can compose into complex environments. Computer use environment — give an agent mouse, keyboard, and screenshot control:Connectors
Connectors let you pull external tools into your HUD environment. If you have tools defined somewhere else — another HUD environment, an external MCP server, or an existing API — connectors bring them in so your agent can use them alongside your own tools.Tasks
A task is a scenario instantiated with specific arguments. It’s what you actually run an agent against:How They Fit Together
- An Environment contains Tools and Scenarios
- A Scenario + arguments = a Task
- Tasks group into Tasksets
- Run a taskset across models -> collect Traces with rewards
- Use the traces to compare models, sell training data to labs, or fine-tune your own agent
Running an Agent Against a Task
Thehud.eval() context manager is how you run any agent against a task:
create_agent() is a convenience that picks the right agent class for each model. You can also bring your own agent loop:
Advanced topics
HUD covers a lot of use cases for building environments at scale and agent development.| Topic | What it is | When you’ll need it |
|---|---|---|
| Harbor conversion | Importing external benchmarks | Migrating existing benchmarks |
| REST API | Programmatic platform access | Custom integrations |
| Framework integrations | LangChain, CrewAI, AutoGen, etc. | When using those frameworks |
| Chat scenarios | Multi-turn conversational agents | Building chat products |
| AgentTool | Hierarchical sub-agent delegation | Complex multi-agent workflows |
| Slack integration | Running agents from Slack | Team workflows |
Next Steps
Quick Start
Install and run your first environment
Environments
Tools, scenarios, and local development
Best Practices
Patterns for reliable environments and evals
Tasks & Training
Run evaluations and train models