Zhaoyang Wang1,
Canwen Xu2,
Boyi Liu2,
Yite Wang2,
Siwei Han1,
Zhewei Yao2,
Huaxiu Yao1,
Yuxiong He2
1UNC-Chapel Hill · 2Snowflake AI Research
Agent World Model (AWM) is a fully synthetic environment generation pipeline that synthesizes 1,000 executable, SQL-database-backed tool-use environments, exposed via a unified MCP interface, for large-scale multi-turn agentic reinforcement learning.
- Mar 16, 2026: we added the verification demo; see the Verification section!
- Feb 10, 2026: we open-sourced the synthesis pipeline, the 1,000 synthesized environments, and the RL-trained agents on Hugging Face!
The AWM synthesis pipeline includes the following steps:
- Start from a high-level scenario (e.g., "an online shopping platform")
- Generate user tasks that serve as functional requirements
- Synthesize a SQLite database (schema + sample data) as the state backend
- Generate a Python interface layer (FastAPI + MCP) as the action/observation space
- Generate verification code that inspects database state changes for reward signals
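For intuition, here is a toy-scale sketch of what one synthesized environment reduces to: a SQLite state backend, a tool that mutates it, and a verifier that reads the resulting state change. The names (`orders`, `cancel_order`) and schema are purely illustrative, not the pipeline's actual output, and the real tool layer is exposed over FastAPI + MCP rather than called directly.

```python
import sqlite3

# Toy state backend: a minimal schema plus initial state (the database step).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT, status TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'laptop', 'pending')")

# Toy tool from the interface layer: in the real pipeline this would be an
# MCP tool served over FastAPI, not a plain local function.
def cancel_order(order_id: int) -> str:
    conn.execute("UPDATE orders SET status = 'cancelled' WHERE id = ?", (order_id,))
    return f"order {order_id} cancelled"

# Toy verifier: the reward is derived from the database state change.
def verify(task_order_id: int) -> float:
    row = conn.execute(
        "SELECT status FROM orders WHERE id = ?", (task_order_id,)
    ).fetchone()
    return 1.0 if row and row[0] == "cancelled" else 0.0

cancel_order(1)
print(verify(1))  # 1.0 once the state reflects the completed task
```

Because the reward reads only database state, the verifier stays independent of how the agent phrased its tool calls.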
We released the 1,000 synthesized executable environments, with their tasks, databases, and verification code, on Hugging Face. Please check out the repo at Snowflake/AgentWorldModel-1K.
| Resource | Link |
|---|---|
| Paper | arxiv.org/abs/2602.10090 |
| Code | Snowflake-Labs/agent-world-model |
| AgentWorldModel-1K | Snowflake/AgentWorldModel-1K |
| Arctic-AWM-4B | Snowflake/Arctic-AWM-4B |
| Arctic-AWM-8B | Snowflake/Arctic-AWM-8B |
| Arctic-AWM-14B | Snowflake/Arctic-AWM-14B |
If you want to use our synthesized environments directly, download them with:
hf download Snowflake/AgentWorldModel-1K --repo-type dataset --local-dir ./outputs/

Then you can skip ahead to the Environment Management and Agent Demo sections to start using the environments.
Run uv sync to set up the Python environment, then set your LLM API credentials:
# OpenAI or any other compatible services
export AWM_SYN_LLM_PROVIDER="openai"
export OPENAI_API_KEY="your-api-key"
# optional, if you are using a custom base url
export OPENAI_BASE_URL="http://xxxxxx"
# Azure OpenAI
export AWM_SYN_LLM_PROVIDER="azure"
export AZURE_ENDPOINT_URL="https://your-endpoint.openai.azure.com/"
export AZURE_OPENAI_API_KEY="your-api-key"
# configure the model/LLM for synthesis
export AWM_SYN_OVERRIDE_MODEL="your-model-name such as gpt-5"

All synthesis is exposed through the awm command-line tool. Run awm --help to see available commands:
awm --help
Available commands:
gen Synthesis pipeline commands
├── scenario Generate scenario names from seed set
├── task Generate user tasks per scenario
├── db Generate database schema and create SQLite databases
├── sample Generate and insert sample data into databases
├── spec Generate API specification for each scenario
├── env Generate MCP environment code
├── verifier Generate verification code for tasks
└── all Run the full synthesis pipeline
env Environment management commands
├── start Start MCP server for a scenario
├── check Check if an MCP server is running and list its tools
├── check_all Check all generated environments
└── reset_db Reset databases to initial state
agent Run a tool-use agent to solve a task by interacting with the environment
verify Verify agent run outputs using code-augmented LLM-as-a-Judge or purely code-based Judge
Use awm <command> --help to see options for any command, e.g. awm gen task --help.
We start with a seed set of scenarios and generate 1,000 unique scenario descriptions. Note that only the names are used as seeds; the descriptions are included in the seed file for ease of use.
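The pipeline requests an embedding API credential, which suggests scenario uniqueness is enforced by filtering near-duplicates via embedding similarity. The exact logic lives inside `awm gen scenario`; the following is only a hedged sketch of a filter of that general shape, with a stubbed embedder in place of a real embedding API.

```python
import math

def cosine(a, b):
    # Cosine similarity between two embedding vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def filter_near_duplicates(candidates, embed, threshold=0.9):
    """Keep a candidate scenario only if it is not too similar to any kept one.
    Hypothetical illustration, not the pipeline's actual algorithm."""
    kept, kept_vecs = [], []
    for name in candidates:
        vec = embed(name)
        if all(cosine(vec, v) < threshold for v in kept_vecs):
            kept.append(name)
            kept_vecs.append(vec)
    return kept

# Stub embedder with 2-d toy vectors; the pipeline would call a real embedding model.
toy = {
    "online shopping": [1.0, 0.0],
    "e-commerce store": [0.98, 0.2],   # near-duplicate of "online shopping"
    "hospital scheduler": [0.0, 1.0],
}
print(filter_near_duplicates(toy, toy.get))  # ['online shopping', 'hospital scheduler']
```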
export EMBEDDING_OPENAI_API_KEY="your-api-key for the embedding model"
awm gen scenario \
--input_path outputs/seed_scenario.jsonl \
--output_path outputs/gen_scenario.jsonl \
--target_count 1000

We generate 10 tasks per scenario, which also serve as the functional requirements for building the environment.
awm gen task \
--input outputs/gen_scenario.jsonl \
--output outputs/gen_tasks.jsonl

We define the database schema and populate the initial state to fully support the generated tasks.
# database schema
awm gen db \
--input outputs/gen_tasks.jsonl \
--output outputs/gen_db.jsonl
# sample data for initial state
awm gen sample \
--input_task outputs/gen_tasks.jsonl \
--input_db outputs/gen_db.jsonl \
--output outputs/gen_sample.jsonl

We first generate an API spec, which guides the generation of the environment's Python code behind the MCP interface.
# API spec (interface schema)
awm gen spec \
--input_task outputs/gen_tasks.jsonl \
--input_db outputs/gen_db.jsonl \
--output outputs/gen_spec.jsonl
# Environment code
awm gen env \
--input_spec outputs/gen_spec.jsonl \
--input_db outputs/gen_db.jsonl \
--output outputs/gen_envs.jsonl

We provide two options for verification:
- code-augmented LLM-as-a-Judge (sql)
- purely code-based Judge (code)
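To illustrate the purely code-based option, a generated verifier ultimately reduces to a deterministic check of database state. The sketch below is hypothetical (the table name, expected row, and function names are invented for illustration); it compares an initial and a final SQLite database and returns a binary reward iff exactly the expected row was added.

```python
import os
import sqlite3
import tempfile

def make_db(path, rows):
    # Helper to build a stand-in database so the sketch is runnable.
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE orders (id INTEGER, item TEXT)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    conn.commit()
    conn.close()

def table_rows(path, table):
    conn = sqlite3.connect(path)
    try:
        return set(conn.execute(f"SELECT * FROM {table}"))  # table name trusted here
    finally:
        conn.close()

def verify_new_order(initial_db, final_db, expected_row):
    """Hypothetical code-mode judge: reward 1.0 iff exactly `expected_row`
    was added to the orders table between the initial and final state."""
    diff = table_rows(final_db, "orders") - table_rows(initial_db, "orders")
    return 1.0 if diff == {expected_row} else 0.0

tmp = tempfile.mkdtemp()
initial = os.path.join(tmp, "initial.db")
final = os.path.join(tmp, "final.db")
make_db(initial, [(1, "laptop")])
make_db(final, [(1, "laptop"), (2, "mouse")])  # the agent placed an order for a mouse
print(verify_new_order(initial, final, (2, "mouse")))  # 1.0
```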
awm gen verifier \
--mode sql \
--input_task outputs/gen_tasks.jsonl \
--output outputs/gen_verifier.jsonl

Run and check each environment. The MCP endpoint will be available at http://localhost:8001/mcp.
# Reset databases to initial state
awm env reset_db \
--input_db outputs/gen_db.jsonl \
--input_sample outputs/gen_sample.jsonl
# Start MCP server for a scenario
awm env start \
--scenario "scenario_name" \
--envs_load_path outputs/gen_envs.jsonl \
--port 8001
# Check if MCP server is running
awm env check --url http://localhost:8001/mcp
# Batch test all generated environments
awm env check_all --output outputs/gen_envs.jsonl

AWM includes a simple agent demo that connects to an MCP environment and solves tasks via multi-turn tool calling. Start the environment and serve the model with vLLM before running the agent.
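The demo's control flow is a standard multi-turn tool-calling loop. The sketch below is a stubbed simplification: the real agent obtains actions via chat completions from the vLLM OpenAI-compatible endpoint and executes tools over MCP, whereas here both the policy and the single tool are scripted stand-ins.

```python
def run_agent(policy, tools, task, max_turns=8):
    """Hypothetical sketch of the agent loop: the model alternates between
    tool calls (actions) and observations until it emits a final answer."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_turns):
        action = policy(messages)  # in the demo: a chat completion with tool schemas
        if action["type"] == "final":
            return action["content"]
        observation = tools[action["tool"]](**action["args"])  # execute the tool
        messages.append({"role": "tool", "content": str(observation)})
    return None  # turn budget exhausted without a final answer

# Scripted stand-ins for the model and one MCP tool, for illustration only.
def scripted_policy(messages):
    if len(messages) == 1:
        return {"type": "tool", "tool": "top_products", "args": {"n": 10}}
    return {"type": "final", "content": messages[-1]["content"]}

tools = {"top_products": lambda n: [f"product_{i}" for i in range(n)]}
print(run_agent(scripted_policy, tools, "show me the top 10 most expensive products"))
```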
# serve the model
vllm serve Snowflake/Arctic-AWM-4B --host 127.0.0.1 --port 8000
# start the environment; this creates an isolated folder outputs/servers/<timestamp> for environment files such as initial.db and final.db
awm env start --scenario e_commerce_33 --envs_load_path outputs/gen_envs.jsonl --port 8001
# run the agent
awm agent \
--task "show me the top 10 most expensive products" \
--mcp_url http://localhost:8001/mcp \
--api_url http://localhost:8000/v1 \
--model Snowflake/Arctic-AWM-4B

AWM supports two types of verification:
- sql: the recommended code-augmented LLM-as-a-Judge (requires LLM env vars; see the Setup section)
- code: purely code-based Judge
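As a rough illustration of the sql mode as we understand it, task-specific verification SQL gathers evidence from the final database state, and an LLM judge then scores the task against that evidence. Everything below is a stubbed sketch: the schema, the verification query, and especially `llm_judge`, which is a trivial stand-in for the real LLM call.

```python
import os
import sqlite3
import tempfile

# Build a stand-in final.db so the sketch is runnable offline.
db = os.path.join(tempfile.mkdtemp(), "final.db")
conn = sqlite3.connect(db)
conn.execute("CREATE TABLE orders (id INTEGER, status TEXT)")
conn.execute("INSERT INTO orders VALUES (1, 'cancelled')")
conn.commit()
conn.close()

def gather_evidence(db_path, verification_sql):
    """Run task-specific verification SQL against the final database state."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(verification_sql).fetchall()
    finally:
        conn.close()

def llm_judge(task, evidence):
    # Stand-in for the real LLM-as-a-Judge call, which would reason over
    # the task description together with the SQL evidence.
    return 1.0 if evidence else 0.0

evidence = gather_evidence(db, "SELECT * FROM orders WHERE status = 'cancelled'")
print(llm_judge("cancel order 1", evidence))  # 1.0
```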
# launch an isolated environment and run the agent to finish the given task in the corresponding scenario
awm agent \
--scenario e_commerce_33 \
--task_id 0 \
--api_url http://localhost:8000/v1 \
--model Snowflake/Arctic-AWM-4B
# after the interaction, the trajectory is saved to outputs/agents/<timestamp>; verify it with
awm verify --input outputs/agents/<timestamp> --mode sql

If you find this work useful, please cite:
@article{wang2026agentworldmodelinfinity,
title={Agent World Model: Infinity Synthetic Environments for Agentic Reinforcement Learning},
author={Zhaoyang Wang and Canwen Xu and Boyi Liu and Yite Wang and Siwei Han and Zhewei Yao and Huaxiu Yao and Yuxiong He},
year={2026},
eprint={2602.10090},
archivePrefix={arXiv},
primaryClass={cs.AI},
url={https://arxiv.org/abs/2602.10090},
}