Bug type
Regression (worked before, now fails)
Summary
`normalizeModelCompat()` forces `supportsUsageInStreaming: false` for all non-`api.openai.com` endpoints, including Azure OpenAI (`*.openai.azure.com`). This prevents pi-ai from sending `stream_options: { include_usage: true }`, so Azure never returns usage data in streaming chunks. Result: `/status` shows `Context: 0/128k (0%)`, `totalTokens` is always `undefined`, and `memoryFlush` can never trigger.
Steps to reproduce
- Configure an Azure OpenAI provider with `api: "openai-completions"` and a `baseUrl` pointing to `*.openai.azure.com`
- Send a message to the agent
- Run `/status`
Expected behavior
`/status` should show a non-zero `Context` reflecting actual prompt token usage (e.g., `Context: 17k/128k (13%)`), and `memoryFlush` should be able to trigger when the threshold is reached.
Actual behavior
`/status` shows `Context: 0/128k (0%)`. `totalTokens` in `sessions.json` is `null`/`undefined` for all Azure model sessions. `memoryFlush` never triggers because the threshold check depends on `totalTokens > 0`.
OpenClaw version
2026.3.3
Operating system
Ubuntu 24.04 (Docker)
Install method
docker (local build from source)
Logs, screenshots, and evidence
🦞 OpenClaw 2026.3.3
🧠 Model: azure-gpt5mini/gpt-5-mini · 🔑 api-key (models.json)
🗄️ Cache: 100% hit · 16k cached, 0 new
📚 Context: 0/128k (0%) · 🧹 Compactions: 0
Root cause trace:
- `src/agents/model-compat.ts:14-21` — `isOpenAINativeEndpoint()` only matches `host === "api.openai.com"`. Azure's `*.openai.azure.com` returns `false`.
- `src/agents/model-compat.ts:79` — `needsForce = true` for all non-native endpoints (including Azure).
- `src/agents/model-compat.ts:91` — Forces `{ supportsDeveloperRole: false, supportsUsageInStreaming: false }`, overwriting any user config (even an explicit `supportsUsageInStreaming: true` in `models.json`).
- `node_modules/@mariozechner/pi-ai/dist/providers/openai-completions.js:319` — Only adds `stream_options: { include_usage: true }` when `compat.supportsUsageInStreaming !== false`. Since it IS `false`, `stream_options` is never sent.
- Without `stream_options`, Azure doesn't return usage in streaming chunks → usage stays `{ input: 0, output: 0 }`.
- `src/agents/usage.ts:149` — `derivePromptTokens({ input: 0, cacheRead: 0, cacheWrite: 0 })` returns `undefined` (sum = 0).
- `src/auto-reply/reply/session-usage.ts:104` — `totalTokens = undefined`.
After fix — `/status` correctly shows:
🧮 Tokens: 9.3k in / 401 out · 💵 Cost: $0.0031
🗄️ Cache: 44% hit · 7.3k cached, 0 new
📚 Context: 17k/128k (13%) · 🧹 Compactions: 0
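The failure chain above can be reduced to a few lines. This is a hypothetical sketch, not the actual OpenClaw source — the function names mirror the report (`isOpenAINativeEndpoint`, `normalizeModelCompat`, `derivePromptTokens`), but the real signatures may differ:

```typescript
// Step 1: the native-endpoint check matches only api.openai.com,
// so any *.openai.azure.com host is treated as "non-native".
function isOpenAINativeEndpoint(baseUrl: string): boolean {
  return new URL(baseUrl).host === "api.openai.com";
}

// Step 2: non-native endpoints get supportsUsageInStreaming forced to false,
// overriding even an explicit `true` from models.json.
function normalizeModelCompat(
  baseUrl: string,
  compat: { supportsUsageInStreaming?: boolean },
) {
  if (!isOpenAINativeEndpoint(baseUrl)) {
    return { ...compat, supportsDeveloperRole: false, supportsUsageInStreaming: false };
  }
  return compat;
}

// Step 3: with usage never streamed, the token counts stay 0, and a zero
// sum is reported as undefined rather than 0 — so totalTokens never exists.
function derivePromptTokens(
  u: { input: number; cacheRead: number; cacheWrite: number },
): number | undefined {
  const sum = u.input + u.cacheRead + u.cacheWrite;
  return sum > 0 ? sum : undefined;
}

const azure = normalizeModelCompat("https://myres.openai.azure.com/openai", {
  supportsUsageInStreaming: true, // explicit user config is still overridden
});
console.log(azure.supportsUsageInStreaming); // false
console.log(derivePromptTokens({ input: 0, cacheRead: 0, cacheWrite: 0 })); // undefined
```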
Impact and severity
Affected: All users using Azure OpenAI with `api: "openai-completions"` (Chat Completions API)
Severity: High — silently breaks context tracking and `memoryFlush`
Frequency: 100% repro on any Azure OpenAI endpoint
Consequence: `/status` always shows 0 context; `memoryFlush` never triggers (threshold depends on `totalTokens`); users cannot monitor context window utilization
Additional information
Introduced by commit 2671f04 (`fix(agents): disable usage streaming chunks on non-native openai-completions`, 2026-03-05), which was fixing #8714 — usage-only stream chunks breaking the `choices[0]` parser on some non-native backends. That fix was correct for generic backends but did not account for Azure OpenAI, which fully supports `stream_options: { include_usage: true }`.
Fix: Add an `isAzureOpenAIEndpoint()` check matching `host.endsWith(".openai.azure.com")`. For Azure endpoints, only force `supportsDeveloperRole: false` (Azure rejects the `developer` role) but preserve `supportsUsageInStreaming` (default `true`).
Config-only workaround (setting `compat.supportsUsageInStreaming: true` in `models.json`) does NOT work because `normalizeModelCompat` explicitly overrides user-set values for non-native endpoints.
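For illustration, the proposed fix could look roughly like this. It is a sketch under the assumptions stated above — `isAzureOpenAIEndpoint` and the exact `ModelCompat` shape are hypothetical, not the actual patch:

```typescript
// Sketch of the proposed fix; names and types are assumptions, not the real patch.
interface ModelCompat {
  supportsDeveloperRole?: boolean;
  supportsUsageInStreaming?: boolean;
}

function isAzureOpenAIEndpoint(baseUrl: string): boolean {
  return new URL(baseUrl).host.endsWith(".openai.azure.com");
}

function normalizeModelCompat(baseUrl: string, compat: ModelCompat): ModelCompat {
  if (new URL(baseUrl).host === "api.openai.com") {
    return compat; // native endpoint: leave user config untouched
  }
  if (isAzureOpenAIEndpoint(baseUrl)) {
    // Azure rejects the developer role, but DOES support
    // stream_options: { include_usage: true } — keep it enabled (default true).
    return { supportsUsageInStreaming: true, ...compat, supportsDeveloperRole: false };
  }
  // Other non-native backends: keep the conservative behavior from #8714.
  return { ...compat, supportsDeveloperRole: false, supportsUsageInStreaming: false };
}
```

The Azure branch spreads the default before `compat`, so an explicit `supportsUsageInStreaming: false` in `models.json` would still be honored — only the missing-value default changes.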