
Conversation

@roomote roomote bot commented Oct 25, 2025

This PR attempts to address Issue #8821. Feedback and guidance are welcome.

Problem

GLM 4.6 Turbo via Chutes was failing with the error:

"Requested token count exceeds the model's maximum context length of 202752 tokens. You requested a total of 233093 tokens: 30341 tokens from the input messages and 202752 tokens for the completion."

The issue was that maxTokens was set to 202752, using the entire context window for output and leaving no room for input tokens.

Solution

  • Adjusted maxTokens from 202752 to 40960 (20% of the 200K context window); a sketch of the adjusted entry follows after this list
  • This allocation leaves sufficient room for input tokens while maintaining generous output capacity
  • Added clarifying comment about the 20% calculation
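For reference, here is a minimal sketch of what the adjusted model entry might look like; the object shape and field names are assumptions based on this discussion, not the actual contents of packages/types/src/providers/chutes.ts.

    // Hypothetical sketch only; the real chutes.ts entry has more fields and may differ in shape.
    const glm46TurboSketch = {
      contextWindow: 202752, // maximum context length reported in the error message
      maxTokens: 40960, // previously 202752, which reserved the whole window for output; 40960 ≈ 20% of a 200K (204800-token) window
    }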

Testing

  • All existing tests pass
  • Type checking passes
  • Linting passes

Fixes #8821


Important

Adjust GLM-4.6-turbo max output tokens to 40960 and implement a centralized 20% cap for max tokens to prevent context limit errors.

  • Behavior:
    • Adjust maxTokens for zai-org/GLM-4.6-turbo in chutes.ts from 202752 to 40960 to prevent context limit errors.
    • Implement centralized 20% cap for max tokens in base-openai-compatible-provider.ts (see the sketch after this list).
  • Testing:
    • Update tests in chutes.spec.ts and zai.spec.ts to verify the 20% cap logic.
    • Ensure tests pass for new max token logic.
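To illustrate the cap itself, here is a rough, self-contained sketch of the arithmetic; the real helper is getModelMaxOutputTokens() in the Roo Code source and handles more cases than shown.

    // Illustration of the 20% output cap; not the actual getModelMaxOutputTokens() implementation.
    function capOutputTokens(contextWindow: number, modelMaxTokens?: number): number {
      const cap = Math.ceil(contextWindow * 0.2) // reserve at most 20% of the window for output
      return Math.min(modelMaxTokens ?? cap, cap)
    }

    // GLM-4.6-turbo via Chutes: min(202752, ceil(202752 * 0.2)) = 40551
    console.log(capOutputTokens(202752, 202752))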

This description was created by Ellipsis for d461d3e.

Fixes issue where GLM-4.6-turbo was requesting the entire context window
(202752 tokens) for output, leaving no room for input tokens. Now set to
40960 tokens (20% of 200k context) to allow sufficient input space.

Fixes #8821
@roomote roomote bot requested review from cte, jr and mrubens as code owners October 25, 2025 11:15
@dosubot dosubot bot added the size:S (This PR changes 10-29 lines, ignoring generated files) and bug (Something isn't working) labels Oct 25, 2025

roomote bot commented Oct 25, 2025

✅ Code Review Complete

No issues found. The revert restores the raw Chutes API value (202752) while the centralized 20% cap in getModelMaxOutputTokens() ensures the effective max tokens will be ~40,551 (ceil(202752 * 0.2)), preventing the original token limit error.


@roomote roomote bot left a comment

No issues found.

@hannesrudolph hannesrudolph added the Issue/PR - Triage (New issue. Needs quick review to confirm validity and assign labels.) label Oct 25, 2025

mrubens commented Oct 25, 2025

@roomote there should already be logic to cap the output tokens to 20% of the context window. Please figure out why that’s not working correctly.

roomote bot commented Oct 25, 2025

Thanks for the heads up. Investigating why the 20% output cap did not apply to GLM-4.6-turbo via Chutes. I will trace the centralized capping logic, verify contextWindow and model metadata, and check provider specific overrides and request building. Will follow up with root cause and fix shortly.

…roviders (Chutes)

Root cause: BaseOpenAiCompatibleProvider sent model.info.maxTokens directly, bypassing getModelMaxOutputTokens, so the 20% context-window cap wasn't applied for Chutes (e.g., GLM-4.6-turbo). Fix applies getModelMaxOutputTokens with format="openai" before sending max_tokens, preventing context limit overruns.

Tests: update Chutes spec to expect clamped max_tokens; all affected tests pass.
@dosubot dosubot bot added the size:M (This PR changes 30-99 lines, ignoring generated files) label and removed the size:S (This PR changes 10-29 lines, ignoring generated files) label Oct 25, 2025
…e zai.spec to expect clamped max_tokens (min(model.maxTokens, ceil(contextWindow*0.2))) for OpenAI-compatible requests.
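As a rough illustration of that expectation (Vitest-style assertions assumed; the real chutes.spec.ts and zai.spec.ts exercise the provider and inspect the outgoing request):

    // Illustrative spec only; mirrors the formula min(model.maxTokens, ceil(contextWindow * 0.2)).
    import { describe, expect, it } from "vitest"

    const expectedMaxTokens = (maxTokens: number, contextWindow: number) =>
      Math.min(maxTokens, Math.ceil(contextWindow * 0.2))

    describe("OpenAI-compatible max_tokens clamp", () => {
      it("caps GLM-4.6-turbo output at 20% of its context window", () => {
        expect(expectedMaxTokens(202752, 202752)).toBe(40551)
      })
    })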

@roomote roomote bot left a comment

No issues found.

roomote bot commented Oct 25, 2025

Summary: I traced the 20 percent max output cap. The centralized logic already exists in getModelMaxOutputTokens(), but OpenAI-compatible providers were bypassing it.

Root cause: BaseOpenAiCompatibleProvider was passing model.info.maxTokens straight through to max_tokens in createStream, so providers like Chutes and Z AI did not get the 20 percent clamp.

Fix: BaseOpenAiCompatibleProvider now calls getModelMaxOutputTokens with format=openai before sending max_tokens, enforcing the 20 percent cap while preserving existing exceptions (for example GPT-5). The change is in BaseOpenAiCompatibleProvider, in the parameter calculation within createStream.

Tests: updated expectations to match the centralized clamp in chutes.spec and zai.spec. All local tests pass and CI is green across unit, integration, and compile.

This should prevent GLM-4.6-turbo via Chutes from requesting output tokens beyond the 20 percent cap and eliminate context limit errors.
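To make the before-and-after concrete, here is a self-contained sketch of the routing change described above; the types and the stand-in helper are simplified assumptions, not the real BaseOpenAiCompatibleProvider or getModelMaxOutputTokens code.

    // Self-contained sketch; names and shapes are simplified assumptions.
    interface SketchModelInfo {
      contextWindow: number
      maxTokens?: number
    }

    // Stand-in for the centralized helper: clamp output tokens to 20% of the context window.
    function getModelMaxOutputTokensSketch(info: SketchModelInfo): number {
      const cap = Math.ceil(info.contextWindow * 0.2)
      return Math.min(info.maxTokens ?? cap, cap)
    }

    const info: SketchModelInfo = { contextWindow: 202752, maxTokens: 202752 }

    // Before (bug): the raw metadata value was forwarded straight into max_tokens.
    const maxTokensBefore = info.maxTokens // 202752, leaving no room for input tokens

    // After (fix): route through the centralized clamp before building the request.
    const maxTokensAfter = getModelMaxOutputTokensSketch(info) // 40551

    console.log(maxTokensBefore, maxTokensAfter)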

mrubens commented Oct 25, 2025

@roomote can you undo the change in fc04a3a then?

@roomote roomote bot left a comment

No issues found.

roomote bot commented Oct 25, 2025

Thanks for the quick review. I will revert the change introduced in fc04a3a on this branch and keep the centralized 20% output cap in place. I will push an update shortly.

…2 in types; rely on centralized 20% cap at request time

roomote bot commented Oct 25, 2025

Update: reverted the change from fc04a3a. Restored packages/types/src/providers/chutes.ts entry for model zai-org/GLM-4.6-turbo to maxTokens 202752. The centralized 20 percent output cap remains enforced at request time via getModelMaxOutputTokens with format=openai in BaseOpenAiCompatibleProvider, so runtime behavior still prevents context limit overruns. All tests pass locally and CI checks are green.

@roomote roomote bot left a comment

No issues found.

@dosubot dosubot bot added the lgtm (This PR has been approved by a maintainer) label Oct 25, 2025
@mrubens mrubens merged commit 98b8d5b into main Oct 25, 2025
11 checks passed
@mrubens mrubens deleted the fix/glm-4.6-turbo-max-tokens branch October 25, 2025 18:08
@github-project-automation github-project-automation bot moved this from Triage to Done in Roo Code Roadmap Oct 25, 2025
@github-project-automation github-project-automation bot moved this from New to Done in Roo Code Roadmap Oct 25, 2025

Development

Successfully merging this pull request may close these issues.

[BUG] GLM 4.6 Turbo via Chutes doesn't work because of incorrect max output token count
