HUD Documentation — Evaluations and RL Environments.

You’ve built an environment with tools and scenarios. Deploy it to the platform and you can run evals at scale—hundreds of parallel runs across models, all traced, all generating training data.

hud deploy

The simplest path. One command builds and deploys your environment directly to HUD:

hud deploy

This:

Packages your build context (respects .dockerignore)
Uploads to HUD’s build service
Builds remotely via AWS CodeBuild
Streams logs in real-time
Links this directory to the deployed environment
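
To make the packaging step concrete, here is a rough sketch of how a build context could be filtered by ignore patterns before upload. This is not HUD's packaging code: `filter_context` is a hypothetical helper, and it relies on `fnmatch` globbing, which only approximates real `.dockerignore` semantics (no `!` negation, no `**` handling).

```python
from fnmatch import fnmatch


def filter_context(paths, ignore_patterns):
    """Return the paths that survive a simplified .dockerignore filter.

    Assumption: patterns are plain globs matched against the full relative
    path or against any leading directory of it. Real .dockerignore rules
    such as "!" negation are not handled.
    """
    kept = []
    for path in paths:
        parts = path.split("/")
        # For "a/b/c.py" the candidates are "a", "a/b", and "a/b/c.py".
        prefixes = ["/".join(parts[: i + 1]) for i in range(len(parts))]
        ignored = any(
            fnmatch(prefix, pattern.rstrip("/"))
            for pattern in ignore_patterns
            for prefix in prefixes
        )
        if not ignored:
            kept.append(path)
    return kept


files = ["Dockerfile", "env.py", ".git/config", "data/cache.bin", "notes.md"]
patterns = [".git", "data/", "*.md"]
print(filter_context(files, patterns))  # ['Dockerfile', 'env.py']
```

Keeping large or sensitive paths in `.dockerignore` shrinks what gets uploaded to the build service.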

Once complete, your environment appears on the platform.

See your environment’s tools, scenarios, and builds at hud.ai/environments. For full details on managing environments through the platform UI, see Platform Environments.

Rebuilding

Run hud deploy again in the same directory. HUD reads .hud/deploy.json to find your existing environment and builds a new version:

hud deploy  # v0.1.0
# make changes...
hud deploy  # v0.1.1
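
The version progression in the example can be sketched as a patch bump. This assumes each rebuild increments the last component, as the comments above suggest; HUD decides the actual versioning rule, and `bump_patch` is a hypothetical helper for illustration only.

```python
import re


def bump_patch(version):
    """Increment the patch component of a version string like "v0.1.0"."""
    match = re.fullmatch(r"v?(\d+)\.(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version: {version!r}")
    major, minor, patch = (int(part) for part in match.groups())
    # Preserve the leading "v" only if the input had one.
    prefix = "v" if version.startswith("v") else ""
    return f"{prefix}{major}.{minor}.{patch + 1}"


print(bump_patch("v0.1.0"))  # v0.1.1
```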

Configuration

Environment Variables, Build Args & Secrets

Three flags for different purposes:

Flag	When	Use For
`--env` / `-e`	Runtime	API keys, config
`--build-arg`	Build time	Repo URLs, build modes
`--secret`	Build time (not stored in image)	Private repo tokens

# Runtime env vars (encrypted, injected when container runs)
hud deploy -e API_KEY=secret

# Build args (for Dockerfile ARG directives)
hud deploy --build-arg REPO_URL=https://github.com/org/repo

# Build secrets (for private repos, not baked into image)
hud deploy --secret id=GITHUB_TOKEN,env=GITHUB_TOKEN

See hud deploy reference for full details.
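
If you script deployments, the three flags can be assembled programmatically. The sketch below is one way to do that, using only the flags shown above. `build_deploy_command` is a hypothetical helper, and combining all three flags in a single invocation is an assumption, so check the `hud deploy` reference before relying on it.

```python
import shlex


def build_deploy_command(env=None, build_args=None, secrets=None):
    """Assemble a `hud deploy` argv from runtime env vars, build args, and secrets."""
    cmd = ["hud", "deploy"]
    for key, value in (env or {}).items():
        cmd += ["-e", f"{key}={value}"]  # runtime env var
    for key, value in (build_args or {}).items():
        cmd += ["--build-arg", f"{key}={value}"]  # Dockerfile ARG
    for secret_id, env_name in (secrets or {}).items():
        cmd += ["--secret", f"id={secret_id},env={env_name}"]  # build secret
    return cmd


cmd = build_deploy_command(
    env={"API_KEY": "secret"},
    build_args={"REPO_URL": "https://github.com/org/repo"},
    secrets={"GITHUB_TOKEN": "GITHUB_TOKEN"},
)
print(shlex.join(cmd))
# To actually deploy: subprocess.run(cmd, check=True)
```

Passing the command as a list rather than a single string avoids shell-quoting problems when values contain spaces or special characters.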

GitHub Auto-Deploy

For teams and CI/CD, connect a GitHub repository. HUD rebuilds automatically when you push:

Go to hud.ai → New → Environment
Click Connect GitHub and install the HUD GitHub App
Select your repository and branch
Push changes—rebuilds happen automatically

This is better for long-term projects because:

CI/CD integration: Rebuilds on every push to your branch
Team collaboration: Anyone with repo access can trigger deploys
Version history: See which commit each build came from
Rollback: Deploy previous commits if needed

Switching Between Methods

Started with hud deploy but want GitHub integration later? Just connect the same repo on the platform. HUD links them by environment ID. Going the other way? Use hud sync env to connect a local directory to an existing platform environment:

hud sync env my-env-name

This links your local directory and verifies your scenarios match the deployed environment.

Comparison

Feature	`hud deploy`	GitHub Integration
Setup	One command	Connect repo on platform
Rebuilds	Manual (`hud deploy`)	Automatic on push
Best for	Solo dev, quick iteration	Teams, CI/CD
Env vars / Build args / Secrets	CLI flags	Platform settings

Both methods result in the same deployed environment. Choose based on your workflow.

Running Externally

Every HUD image supports scenario operations via hud scenario. Setup and grading are shell commands; agents interact with tools via the MCP server at :8080/mcp.

The default Dockerfile CMD uses --stdio for the HUD platform. For external use, override the command to start an HTTP server:

With Docker

# Build and push the image to a registry
hud build .
docker tag my-env:latest <your-registry>/my-env:latest
docker push <your-registry>/my-env:latest

# Start the environment with HTTP server (overrides default stdio CMD)
docker run -d --name my-env -p 8080:8080 my-env:latest \
  hud dev env:env --port 8080

# List available scenarios
docker exec my-env hud scenario list

# Setup a scenario (prints the prompt)
docker exec my-env hud scenario setup count \
  --args '{"text": "strawberry", "letter": "r"}'

# Your agent runs against MCP tools at localhost:8080/mcp

# Grade (prints reward as JSON)
docker exec my-env hud scenario grade count --answer "3"

# Test graders without an agent (setup + grade in one shot)
docker exec my-env hud scenario run count \
  --args '{"text": "mississippi", "letter": "s"}' --answer "4"
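
The same setup and grade steps can be driven from Python with `subprocess`. This is a minimal sketch, assuming the container is named `my-env` as in the example above. `scenario_command`, `run`, and `evaluate` are hypothetical helpers, `agent` is any callable you supply, and the shape of the grading JSON is not specified here, so the code returns it unparsed beyond `json.loads`.

```python
import json
import subprocess

CONTAINER = "my-env"  # container name from the `docker run` example


def scenario_command(action, scenario, args=None, answer=None):
    """Build the `docker exec ... hud scenario ...` argv for one step."""
    cmd = ["docker", "exec", CONTAINER, "hud", "scenario", action, scenario]
    if args is not None:
        cmd += ["--args", json.dumps(args)]
    if answer is not None:
        cmd += ["--answer", str(answer)]
    return cmd


def run(cmd):
    """Run a command and return its stdout; raises if it exits non-zero."""
    done = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return done.stdout


def evaluate(scenario, args, agent):
    """Set up a scenario, hand the prompt to `agent`, then grade its answer."""
    prompt = run(scenario_command("setup", scenario, args=args))
    answer = agent(prompt)  # hypothetical callable: prompt -> answer string
    return json.loads(run(scenario_command("grade", scenario, answer=answer)))


setup_cmd = scenario_command("setup", "count", args={"text": "strawberry", "letter": "r"})
grade_cmd = scenario_command("grade", "count", answer="3")
print(setup_cmd)
print(grade_cmd)

# Sanity check of the example answers, assuming `count` scores letter occurrences.
print("strawberry".count("r"), "mississippi".count("s"))  # 3 4
```

Calling `evaluate("count", {"text": "strawberry", "letter": "r"}, my_agent)` requires the container from the `docker run` step to be up; the command builders can be inspected without Docker.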

With a Sandbox SDK (Python)

Any platform that can run a Docker image and exec into it works. Here are two options:

Daytona
Modal

Daytona spins up HUD images as sandboxed workspaces:

import json
from daytona import Daytona, CreateSandboxFromImageParams

daytona = Daytona()
sandbox = daytona.create(CreateSandboxFromImageParams(
    image="my-image:latest",
    language="python",
))

# Setup — returns the prompt
prompt = sandbox.process.exec(
    'hud scenario setup count --args \'{"text": "strawberry", "letter": "r"}\''
).result

# Agent runs against MCP tools at the sandbox
# ... your agent loop here ...

# Grade
reward = json.loads(sandbox.process.exec(
    'hud scenario grade count --answer "3"'
).result)

daytona.delete(sandbox)

Modal runs containers serverlessly with GPU support:

import json
import modal

image = modal.Image.from_registry("my-image:latest")
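# Look up (or create) a deployed app so the sandbox can be created outside `app.run()`
app = modal.App.lookup("hud-eval", create_if_missing=True)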
sandbox = modal.Sandbox.create(image=image, app=app)

# Setup
prompt = sandbox.exec("hud", "scenario", "setup", "count",
    "--args", '{"text": "strawberry", "letter": "r"}').stdout.read()

# Agent runs against MCP tools
# ... your agent loop here ...

# Grade
reward = json.loads(sandbox.exec("hud", "scenario", "grade",
    "count", "--answer", "3").stdout.read())

sandbox.terminate()

The same pattern works on Kubernetes (kubectl exec), E2B, Fly.io, or any platform that runs containers.
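
Because every backend reduces to "run a command in the container and read its output", the pattern can be captured once and reused. The sketch below is one possible shape, not part of the HUD SDK: `ScenarioRunner` and `ExecFn` are hypothetical names, and `fake_exec` returns made-up placeholder output so the example runs without a container.

```python
import json
import shlex
from typing import Callable

# An exec backend takes one shell command string and returns its stdout.
ExecFn = Callable[[str], str]


class ScenarioRunner:
    """Drive `hud scenario` through any backend that can exec a command."""

    def __init__(self, exec_fn: ExecFn):
        self.exec_fn = exec_fn

    def setup(self, scenario: str, args: dict) -> str:
        cmd = ["hud", "scenario", "setup", scenario, "--args", json.dumps(args)]
        return self.exec_fn(shlex.join(cmd))

    def grade(self, scenario: str, answer: str):
        cmd = ["hud", "scenario", "grade", scenario, "--answer", answer]
        return json.loads(self.exec_fn(shlex.join(cmd)))


def fake_exec(command: str) -> str:
    """Stand-in backend; both outputs are placeholders, not real HUD output."""
    if " grade " in command:
        return '{"reward": 1.0}'
    return "How many r's in strawberry?"


runner = ScenarioRunner(fake_exec)
print(runner.setup("count", {"text": "strawberry", "letter": "r"}))
print(runner.grade("count", "3"))
```

To target a real backend, swap in a different `exec_fn`: for the Daytona example above that would be `lambda c: sandbox.process.exec(c).result`, and for Kubernetes a small wrapper that shells out to `kubectl exec`.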

What’s Next

Testing & Evaluation

Define tasks, test locally, sync to platform, run at scale

Platform Environments

Full platform environment guide

Best Practices

Patterns for good environments, evals, and grading
