AI token proxy powered by Oracle DB 26ai Free. Sits between your apps and upstream APIs (Anthropic, OpenAI-compatible), tracking every token, caching semantically similar prompts, enforcing budgets, and routing requests to the right model.
TokenWatch is a transparent HTTP proxy that intercepts AI API traffic. Point your apps at localhost:8877 instead of the upstream API, and TokenWatch handles the rest.
Every request flows through a pipeline: budget check, semantic cache lookup, smart routing, upstream forwarding, and structured logging. All data lands in Oracle DB 26ai Free, which also powers vector similarity search for the prompt cache.
v2 bolts on Oracle AI Vector Search for semantic caching, a budget kill switch, model routing, A/B testing, prompt replay, and OpenTelemetry export. The original transparent proxy and cost tracking are still there, just running on a proper database now.
- Semantic prompt caching via Oracle AI Vector Search (cosine similarity matching, configurable threshold)
- Budget enforcement with kill switch (per-app, per-model, per-tag limits with hard/warn/notify actions)
- Smart model routing (route requests by app, tag, content pattern, or time-of-day)
- A/B testing framework (split traffic between models, compare cost/latency/quality)
- Prompt replay and regression testing (re-run stored prompts against different models)
- Cost attribution and tagging (tag requests by feature, team, or session for granular cost breakdowns)
- Real-time WebSocket dashboard (4-tab UI with live updates, no polling)
- OpenTelemetry export (push traces and metrics to any OTLP-compatible collector)
- Multi-provider failover (automatic retry across upstream endpoints with health tracking)
Pick a strong system password and reuse that value in the setup command below.
docker run -d --name oracle-free -p 1521:1521 -e ORACLE_PWD='YOUR_ORACLE_SYSTEM_PASSWORD' container-registry.oracle.com/database/free:latest # pragma: allowlist secretIf you already have an oracle-free container running on a different host port, use that mapped port in the DSN instead of assuming 1521:
docker port oracle-free 1521
# Example output: 0.0.0.0:1523TokenWatch does not ship with weak default DB credentials anymore. Replace the placeholder passwords below with your real values and create a dedicated user first:
docker exec oracle-free bash -lc "cat <<'SQL' | sqlplus -s system/YOUR_ORACLE_SYSTEM_PASSWORD@localhost:1521/FREEPDB1 # pragma: allowlist secret
whenever sqlerror exit failure
begin
execute immediate 'create user TOKENWATCH identified by YOUR_TOKENWATCH_DB_PASSWORD'; -- pragma: allowlist secret
exception
when others then
if sqlcode != -1920 then raise; end if;
end;
/
alter user TOKENWATCH identified by YOUR_TOKENWATCH_DB_PASSWORD; -- pragma: allowlist secret
grant connect, resource to TOKENWATCH;
grant create session, create table, create view, create sequence, create procedure to TOKENWATCH;
exit
SQL"If your container is mapped to another host port, keep the SQL command as-is inside the container, but export the host-side DSN with the mapped port.
pip install -e .
export TOKENWATCH_ORACLE_DSN="localhost:1521/FREEPDB1"
export TOKENWATCH_ORACLE_USER="TOKENWATCH"
export TOKENWATCH_ORACLE_PASSWORD="YOUR_TOKENWATCH_DB_PASSWORD" # pragma: allowlist secret
tokenwatch startBy default, TokenWatch binds to 127.0.0.1. If you need another interface, pass --host or set TOKENWATCH_HOST.
Point your apps at the proxy:
- Anthropic:
http://127.0.0.1:8877/anthropic - OpenAI:
http://127.0.0.1:8877/openai
Open http://127.0.0.1:8878 for the dashboard.
All settings via environment variables or .env file:
| Variable | Default | Description |
|---|---|---|
TOKENWATCH_ORACLE_DSN |
localhost:1521/FREEPDB1 |
Oracle DB connection string |
TOKENWATCH_ORACLE_USER |
"" |
Oracle DB username. Required. No insecure default. |
TOKENWATCH_ORACLE_PASSWORD |
"" |
Oracle DB password. Required. No insecure default. |
TOKENWATCH_HOST |
127.0.0.1 |
Bind host for proxy and dashboard |
TOKENWATCH_PROXY_PORT |
8877 |
Proxy listen port |
TOKENWATCH_DASHBOARD_PORT |
8878 |
Dashboard listen port |
TOKENWATCH_ANTHROPIC_URL |
https://api.anthropic.com |
Anthropic upstream URL |
TOKENWATCH_OPENAI_URL |
https://api.z.ai |
OpenAI-compatible upstream URL |
TOKENWATCH_CONNECT_TIMEOUT |
10 |
Connection timeout (seconds) |
TOKENWATCH_OVERALL_TIMEOUT |
300 |
Overall request timeout (seconds) |
TOKENWATCH_CACHE_ENABLED |
false |
Enable semantic prompt caching |
TOKENWATCH_CACHE_TTL |
86400 |
Cache entry TTL (seconds) |
TOKENWATCH_CACHE_SIMILARITY_THRESHOLD |
0.05 |
Vector distance threshold for cache hits (lower = stricter) |
TOKENWATCH_STORE_PROMPTS |
false |
Store prompt and response bodies for replay |
TOKENWATCH_REDACT_STORED_PROMPTS |
true |
Redact obvious secrets and emails before stored payloads are written |
TOKENWATCH_PROMPT_RETENTION_DAYS |
30 |
Days to retain stored prompts |
TOKENWATCH_BUDGET_ENABLED |
true |
Enable budget enforcement |
TOKENWATCH_OTEL_ENABLED |
false |
Enable OpenTelemetry export |
TOKENWATCH_OTEL_ENDPOINT |
http://localhost:4317 |
OTLP collector endpoint |
TOKENWATCH_OTEL_SERVICE_NAME |
tokenwatch |
Service name for traces |
| Command | Description |
|---|---|
tokenwatch start |
Start proxy + dashboard servers |
tokenwatch explain-request |
Show how TokenWatch would route, tag, budget-check, and forward a request without sending it |
tokenwatch stats |
Show token usage statistics |
tokenwatch tail |
Show recent requests |
tokenwatch status |
Check if proxy is running |
tokenwatch reset |
Clear all usage data |
tokenwatch budget set |
Set a spending budget (per-app, per-model, per-tag) |
tokenwatch budget status |
Show all budgets and current spend |
tokenwatch budget remove |
Remove a budget |
tokenwatch cost by-tag |
Cost breakdown by feature tag |
tokenwatch cost by-app |
Cost breakdown by source application |
tokenwatch cost by-session |
Most expensive conversations |
tokenwatch cost forecast |
Project future spending from recent trends |
tokenwatch route add |
Add a smart routing rule |
tokenwatch route list |
Show all routing rules |
tokenwatch route enable/disable |
Toggle a routing rule |
tokenwatch cache stats |
Show cache hit rate and size |
tokenwatch cache clear |
Clear cached responses |
tokenwatch ab create |
Create an A/B test between two models |
tokenwatch ab list |
Show active A/B tests |
tokenwatch ab report |
Show A/B test comparison report |
tokenwatch ab pause/complete |
Pause or finish an A/B test |
tokenwatch replay |
Replay stored prompts against a different model |
tokenwatch upstream add |
Add an upstream provider endpoint |
tokenwatch upstream list |
Show all upstream endpoints |
tokenwatch upstream remove |
Remove an upstream endpoint |
The web dashboard at http://127.0.0.1:8878 has 4 tabs:
- Overview: real-time token usage charts, request table, and live request feed.
- Cost Intelligence: cost by tag, app, and session, plus a forecast panel.
- Experiments: A/B test status and replay placeholder UI.
- System: routing stats, cache stats, budgets, and upstream health.
The dashboard uses WebSocket live events plus a periodic refresh loop for summary panels.
Every request passes through the same pipeline:
- Budget gate: checks matching budgets. If a blocking budget is over limit, the request gets a
429before it leaves the proxy. - Cache lookup: if semantic caching is enabled and the request is cache-eligible, TokenWatch checks exact then semantic matches.
- Tag resolution: explicit
X-TokenWatch-Tagwins. Otherwise TokenWatch can auto-tag from configuredtag_appandtag_regexrules. - Smart router: routing rules are evaluated in priority order and can switch the model or upstream.
- A/B splitter: if no routing rule already changed the model, an active A/B test can choose the final model.
- Upstream forward: the request goes to the selected provider. If one upstream connect attempt fails or times out, TokenWatch tries the next candidate.
- Log and store: the response is parsed for token counts, cost is estimated, and everything is written to Oracle DB. If prompt storage is enabled, stored payloads are redacted by default before replay data is written.
- Broadcast: a summary is pushed to all connected WebSocket clients for the live dashboard.
| Model | Input ($/1M tokens) | Output ($/1M tokens) |
|---|---|---|
| Claude Opus 4.6 | $15.00 | $75.00 |
| Claude Sonnet 4.5 | $3.00 | $15.00 |
| Claude Haiku 4.5 | $0.80 | $4.00 |
| GLM-4.7 | $0.60 | $0.60 |
| GLM-4.7 Flash | $0.06 | $0.06 |