trajectoryRL/trajectoryRL

TrajectoryRL

Bittensor Subnet 11 — Optimize AI agent policies through decentralized competition

License: MIT | Python 3.10+ | Bittensor

TrajectoryRL is a Bittensor subnet where miners compete to optimize AI agent policies for real-world tasks. Validators evaluate policy packs using deterministic scenarios, rewarding agents that are safe, efficient, and reliable.

Overview

┌──────────────────────────────────────────────────────────────┐
│                   TRAJECTORYRL SUBNET (SN11)                 │
│                                                              │
│  MINERS                              VALIDATORS              │
│  ┌───────────────┐                   ┌───────────────────┐   │
│  │ Upload        │   on-chain        │ Read commitments  │   │
│  │ pack.json to  │   commitment      │ from chain        │   │
│  │ public HTTP   │─────────────────> │                   │   │
│  │ endpoint      │                   │ Fetch packs via   │   │
│  └───────────────┘                   │ HTTP, verify      │   │
│        │                             │ hash + timestamp  │   │
│        │                             │                   │   │
│        │                             │ Evaluate via      │   │
│        │                             │ ClawBench         │   │
│        │                             └───────────────────┘   │
│        │                                      │              │
│        │                                      │ set_weights  │
│        ▼                                      ▼              │
│  ┌──────────────────────────────────────────────────────┐    │
│  │              BITTENSOR BLOCKCHAIN                    │    │
│  │   Commitments, weights, TAO rewards                  │    │
│  └──────────────────────────────────────────────────────┘    │
└──────────────────────────────────────────────────────────────┘
  • No server required: miners upload packs to any HTTP endpoint and commit metadata on-chain. No public IP or uptime is needed.
  • Two-phase evaluation: ClawBench scenarios run with fixed fixtures, and an LLM-as-judge scores the resulting trajectories against natural-language criteria (Phase 1: pack integrity; Phase 2: trajectory quality).
  • Content-addressed: packs are identified by their SHA256 hash, which is verified against the on-chain commitment.
  • Winner-take-all: the best miner receives 100% of rewards; a first-mover advantage protects early innovators.
  • Anti-copy: on-chain block timestamps, NCD (normalized compression distance) similarity detection, and a first-mover threshold (delta=0.05).
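To make the content-addressing and anti-copy bullets concrete, here is a minimal Python sketch of the two checks. It assumes the on-chain commitment is the hex SHA256 of the exact pack bytes served over HTTP, and it uses zlib as the NCD compressor; `verify_pack`, `ncd`, and the toy pack bytes are hypothetical illustrations, not the validator's actual code (see INCENTIVE_MECHANISM.md for the real procedure).

```python
import hashlib
import zlib


def verify_pack(raw: bytes, committed_hash: str) -> bool:
    """True if the fetched pack bytes hash to the on-chain SHA256 commitment."""
    return hashlib.sha256(raw).hexdigest() == committed_hash.strip().lower()


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance with zlib as the compressor C:

        NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))

    Near 0 for near-duplicates, near 1 for unrelated inputs.
    """
    cx = len(zlib.compress(x, 9))
    cy = len(zlib.compress(y, 9))
    cxy = len(zlib.compress(x + y, 9))
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    pack = b'{"AGENTS.md": "Triage the inbox, then stop."}'  # hypothetical toy pack bytes
    commitment = hashlib.sha256(pack).hexdigest()
    print(verify_pack(pack, commitment))         # True
    print(verify_pack(pack + b" ", commitment))  # False: any changed byte breaks the hash

    original = b"Always confirm before sending email. Stop after three tool calls. " * 20
    lightly_edited = original.replace(b"three", b"four")
    unrelated = bytes(range(256)) * 5
    print(ncd(original, lightly_edited), ncd(original, unrelated))
```

A lightly edited copy of a pack compresses almost for free once the original is in the compressor's window, so its NCD to the original is far lower than that of unrelated content; that gap is what makes compression distance usable as a copy signal.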

See INCENTIVE_MECHANISM.md for full scoring, rewards, and anti-gaming details.

Example ROI (1,000 tasks/day)

Unoptimized GLM-5:                       $12,300/month

Stage 1 — Prompt optimization (AGENTS.md tuning):
  Optimized prompts + stop rules:         $3,300/month  (73% reduction)

Stage 2 — Hybrid routing (AGENTS.md + injected skills):
  Multi-LLM dynamic routing:               $900/month  (93% reduction)
    ├─ Qwen 3.5 (Alibaba) handles 40% of sub-tasks (tool calls, lookups)
    ├─ GLM-5 (Z.ai) handles 25% (structured extraction, formatting)
    ├─ Gemini 3 Flash (Google) handles 20% (search, summarization)
    ├─ GPT-5.2 (OpenAI) handles 10% (reasoning, drafting)
    └─ Claude Opus 4.6 (Anthropic) handles 5% (complex judgment calls)
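The percentages follow directly from the monthly totals. This short check recomputes them and adds a per-task cost under an extra assumption of a 30-day month (30,000 tasks); it reports 73% and 93% reductions, matching the figures above, or roughly $0.41, $0.11, and $0.03 per task.

```python
BASELINE = 12_300  # $/month, unoptimized GLM-5
STAGES = {
    "Stage 1: prompt optimization": 3_300,
    "Stage 2: hybrid routing": 900,
}
TASKS_PER_MONTH = 1_000 * 30  # assumption: 1,000 tasks/day over a 30-day month


def reduction(cost: float, baseline: float = BASELINE) -> float:
    """Percentage cost reduction relative to the baseline."""
    return 100 * (1 - cost / baseline)


print(f"Baseline: ${BASELINE / TASKS_PER_MONTH:.3f}/task")
for name, cost in STAGES.items():
    print(f"{name}: {reduction(cost):.0f}% reduction, ${cost / TASKS_PER_MONTH:.3f}/task")

# The routing mix should account for every sub-task exactly once.
ROUTING_SHARE = {"Qwen 3.5": 40, "GLM-5": 25, "Gemini 3 Flash": 20, "GPT-5.2": 10, "Claude Opus 4.6": 5}
assert sum(ROUTING_SHARE.values()) == 100
```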

Quick Start

For Validators

Validators run in Docker and update themselves automatically: when new code is pushed to prod, GitHub Actions builds a new image and publishes it to GHCR, and Watchtower pulls it and restarts the validator within 5 minutes.

1. Prerequisites (one-time)

# Install btcli
pip install bittensor-cli

# Create or import your wallet
btcli wallet create --wallet-name my-validator

# Register hotkey on SN11 (~0.2 TAO burn fee)
btcli subnets register --wallet-name my-validator --hotkey default --netuid 11

# Stake alpha so your weights count (must be top 64 by stake for validator permit)
btcli stake add --wallet-name my-validator --hotkey default --netuid 11 --amount 100

2. Configure environment

cat > .env.validator <<'EOF'
WALLET_NAME=my-validator
WALLET_HOTKEY=default
NETUID=11
NETWORK=finney
CLAWBENCH_LLM_API_KEY=your-api-key
CLAWBENCH_LLM_BASE_URL=https://open.bigmodel.cn/api/paas/v4/
CLAWBENCH_DEFAULT_MODEL=zhipu/glm-5
EOF

Supported providers (any OpenAI-compatible API works):

Provider              CLAWBENCH_LLM_BASE_URL                  CLAWBENCH_DEFAULT_MODEL
Zhipu AI (default)    https://open.bigmodel.cn/api/paas/v4    zhipu/glm-5
Chutes                https://llm.chutes.ai/v1                chutes/zai-org/GLM-5-TEE
OpenRouter            https://openrouter.ai/api/v1            openrouter/zhipu/glm-5

Environment variables:

Variable                  Required  Description
WALLET_NAME               Yes       Bittensor wallet name
WALLET_HOTKEY             Yes       Hotkey name (usually default)
NETUID                    Yes       Subnet UID (11)
NETWORK                   Yes       finney, test, or local
CLAWBENCH_LLM_API_KEY     Yes       API key for the LLM provider (e.g. Zhipu AI, Chutes, OpenRouter)
CLAWBENCH_LLM_BASE_URL    Yes       Base URL for the OpenAI-compatible API
CLAWBENCH_DEFAULT_MODEL   Yes       LLM model for evaluation (default: zhipu/glm-5)
JUDGE_MODEL               No        LLM model for the judge (defaults to CLAWBENCH_DEFAULT_MODEL)
JUDGE_API_KEY             No        API key for the judge (defaults to CLAWBENCH_LLM_API_KEY)
JUDGE_BASE_URL            No        Base URL for the judge (defaults to CLAWBENCH_LLM_BASE_URL)
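The fallback rules in the table can be expressed as a small resolver. This is a hypothetical sketch (`resolve_config` is not part of the validator) that additionally assumes an empty value counts as unset; it checks the required variables and fills each JUDGE_* setting from its CLAWBENCH_* counterpart when absent.

```python
import os
from collections.abc import Mapping

REQUIRED = (
    "WALLET_NAME", "WALLET_HOTKEY", "NETUID", "NETWORK",
    "CLAWBENCH_LLM_API_KEY", "CLAWBENCH_LLM_BASE_URL", "CLAWBENCH_DEFAULT_MODEL",
)

# Optional judge settings and the variable each one falls back to.
JUDGE_FALLBACKS = {
    "JUDGE_MODEL": "CLAWBENCH_DEFAULT_MODEL",
    "JUDGE_API_KEY": "CLAWBENCH_LLM_API_KEY",
    "JUDGE_BASE_URL": "CLAWBENCH_LLM_BASE_URL",
}


def resolve_config(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Validate required variables and apply the judge fallbacks."""
    missing = [k for k in REQUIRED if not env.get(k)]
    if missing:
        raise ValueError(f"missing required variables: {', '.join(missing)}")
    if env["NETWORK"] not in ("finney", "test", "local"):
        raise ValueError(f"NETWORK must be finney, test, or local, got {env['NETWORK']!r}")
    cfg = {k: env[k] for k in REQUIRED}
    for judge_key, fallback in JUDGE_FALLBACKS.items():
        cfg[judge_key] = env.get(judge_key) or env[fallback]
    return cfg
```

Because the judge settings fall back to the evaluation settings, a single provider is enough to start; set JUDGE_* only when you want the judge on a different model or endpoint.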

3. Start validator

# Start validator + Watchtower (auto-updates from GHCR)
docker compose -f docker/docker-compose.validator.yml --env-file .env.validator up -d

# View logs
docker compose -f docker/docker-compose.validator.yml logs -f validator

The Docker container reads wallet keyfiles from the mounted ~/.bittensor/wallets/ directory. No btcli is needed inside the container.

Tip: Watchtower checks for new images every 5 minutes. To update immediately:

docker compose -f docker/docker-compose.validator.yml pull
docker compose -f docker/docker-compose.validator.yml --env-file .env.validator up -d

See VALIDATOR_OPERATIONS.md for cost model, auto-update details, and operational guidance.

For Miners

Mining means writing policy packs — system prompts, tool usage rules, and stop conditions — that make AI agents perform tasks safely and cheaply. No GPU, no server, no uptime required.

IP Notice: All policy packs submitted to TrajectoryRL are published to public repositories and licensed under the MIT License. By submitting a pack, you agree that your submission is freely available for anyone — including TrajectoryRL, other miners, and third parties — to use, modify, and redistribute. Do not submit content you are not willing to release publicly under MIT.

1. Prerequisites (one-time)

pip install bittensor-cli

btcli wallet create --wallet-name my-miner
btcli subnets register --wallet-name my-miner --hotkey default --netuid 11

2. Configure environment

cat > .env.miner <<'EOF'
WALLET_NAME=my-miner
WALLET_HOTKEY=default
NETUID=11
NETWORK=finney
LLM_API_KEY=your-api-key
LLM_BASE_URL=https://open.bigmodel.cn/api/paas/v4/
LLM_MODEL=zhipu/glm-5
EOF

Tip: Any OpenAI-compatible provider works. For OpenRouter, use LLM_BASE_URL=https://openrouter.ai/api/v1 and LLM_MODEL=zhipu/glm-5.

3. Start mining

git clone https://github.com/trajectoryRL/trajectoryRL.git
cd trajectoryRL
pip install -e .

# Run in default mode: generates AGENTS.md → builds pack → uploads → submits
python neurons/miner.py run --mode default

Note: Letting the LLM generate AGENTS.md without guidance may not get you a good score. You need to actively optimize your policy pack: study the ClawBench scenarios, understand what makes an agent perform well, and iteratively refine your prompts, tool rules, and stop conditions.

4. Manual operations (optional)

# Build pack from your own AGENTS.md
python neurons/miner.py build --agents-md ./AGENTS.md -o pack.json

# Validate pack locally
python neurons/miner.py validate pack.json

# Check on-chain status
python neurons/miner.py status
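Before committing, it can be worth confirming that the copy at your public HTTP endpoint is byte-identical to your local pack.json, since validators verify the fetched pack against the hash you commit on-chain. This sketch assumes the commitment is the SHA256 of the raw file bytes (check MINER_OPERATIONS.md for the exact format); `check_hosted_copy` is a hypothetical helper, not a miner.py subcommand.

```python
import hashlib
import urllib.request
from pathlib import Path


def sha256_hex(data: bytes) -> str:
    """Hex SHA256 digest of raw bytes."""
    return hashlib.sha256(data).hexdigest()


def check_hosted_copy(local_path: str, url: str, timeout: float = 10.0) -> bool:
    """True if the pack served at `url` hashes the same as the local file."""
    local = Path(local_path).read_bytes()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        remote = resp.read()
    return sha256_hex(local) == sha256_hex(remote)
```

For example, `check_hosted_copy("pack.json", "https://<your-endpoint>/pack.json")` returning False usually means the upload is stale or the host rewrote the file (for instance by re-serializing the JSON).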

5. Local testing with ClawBench

cd clawbench
pip install -e .
# Set CLAWBENCH_LLM_API_KEY, CLAWBENCH_LLM_BASE_URL, CLAWBENCH_DEFAULT_MODEL in .env
# Example Zhipu:      CLAWBENCH_LLM_BASE_URL=https://open.bigmodel.cn/api/paas/v4/, CLAWBENCH_DEFAULT_MODEL=zhipu/glm-5
# Example Chutes:     CLAWBENCH_LLM_BASE_URL=https://llm.chutes.ai/v1,              CLAWBENCH_DEFAULT_MODEL=chutes/zai-org/GLM-5-TEE
# Example OpenRouter: CLAWBENCH_LLM_BASE_URL=https://openrouter.ai/api/v1,           CLAWBENCH_DEFAULT_MODEL=openrouter/zhipu/glm-5

# Test a single scenario
python scripts/run_episode.py --scenario inbox_triage --variant optimized --json

# Test all scenarios
python scripts/run_batch.py
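A quick pre-flight check can catch a missing variable before a batch run. The sketch below assumes the .env file mentioned above uses plain KEY=VALUE lines; it is a deliberately minimal parser (python-dotenv handles more edge cases), and `parse_dotenv` and `missing_keys` are hypothetical helpers.

```python
from pathlib import Path

REQUIRED_KEYS = ("CLAWBENCH_LLM_API_KEY", "CLAWBENCH_LLM_BASE_URL", "CLAWBENCH_DEFAULT_MODEL")


def parse_dotenv(text: str) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blanks and comments, strips optional quotes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first "=" only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_keys(env_path: str = ".env") -> list[str]:
    """Required ClawBench keys that are absent or empty in the .env file."""
    values = parse_dotenv(Path(env_path).read_text())
    return [k for k in REQUIRED_KEYS if not values.get(k)]
```

Run `print(missing_keys())` from the clawbench directory; an empty list means all three variables are set.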

See MINER_OPERATIONS.md for full details: automated mode, S3 upload, pack format, and scoring targets.

trajrl CLI

A standalone CLI for querying live subnet data — validators, miners, scores, submissions, and eval logs. Designed for both humans and AI agents (Claude Code, Cursor, Codex, OpenClaw, Manus).

pip install trajrl

trajrl status                       # Network health overview
trajrl validators                   # List all validators
trajrl scores                       # Per-miner scores (auto-picks validator)
trajrl miner --uid <uid>            # Miner detail + diagnostics
trajrl download -u <uid>            # Download miner's pack + eval results
trajrl submissions --failed         # Recent failed submissions
trajrl logs --show                  # Download and display latest cycle log
trajrl logs --type cycle            # List cycle log archives

Outputs JSON automatically when piped, Rich tables when interactive. See trajrl/README.md for full documentation.

License

This project is licensed under the MIT License.

All miner-submitted policy packs are public and released under the same MIT License. By participating as a miner, you acknowledge that your submissions become open-source contributions available to everyone.


Built on Bittensor | Powered by ClawBench

About

Bittensor Subnet 11 - Decentralized Reinforcement Learning for optimizing agent trajectories, making agents cheaper, faster, and more reliable.