Bug: Apps extension inner LLM call has no max_tokens, causing truncation and 'missing field html' error #7239
Description
Summary
The Apps extension's inner LLM call to generate HTML app content does not set max_tokens, so the API defaults to 8,192 output tokens. This is insufficient for generating any non-trivial HTML app. The response gets truncated (finish_reason: "length"), the html field is never written, and the user sees a cryptic error:
Error: Failed to parse tool response: missing field `html`
Reproduction
- Use the Databricks provider with `databricks-claude-opus-4-6` (or any provider where the default max output tokens is ~8K)
- Ask goose to create a non-trivial app (e.g., "Create a figure skating game app")
- The `apps__create_app` tool returns `Error: Failed to parse tool response: missing field html` every time
Root Cause
In crates/goose/src/agents/apps_extension.rs, generate_new_app_content() and generate_updated_app_content() call:
```rust
let (response, _usage) = provider
    .complete(session_id, &system_prompt, &messages, &tools)
    .await
```
This uses `provider.complete()`, which inherits the provider's `ModelConfig`. When `max_tokens` is `None` in the config, no `max_tokens` is sent to the API, and the Databricks/Claude API defaults to 8,192 output tokens.
The inner LLM is asked to call a create_app_content tool where the html field (a complete HTML/CSS/JS app) is serialized as part of the JSON arguments. The LLM generates fields in order — name, description, width, height, resizable — and hits the 8,192 token ceiling before ever reaching html.
Evidence from diagnostic logs — all 3 inner LLM calls show the same pattern:
| Inner LLM Call | finish_reason | output_tokens | Has `html`? |
|---|---|---|---|
| Attempt 1 | `length` | 8,192 | No |
| Attempt 2 | `length` | 8,192 | No |
| Attempt 3 | `length` | 8,192 | No |
The returned arguments each time:
```json
{"name":"figure-skating-championship","description":"Interactive figure skating game...","height":700,"resizable":false,"width":900}
```
No `html` field — the LLM ran out of tokens before starting it.
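A minimal sketch of what the truncation check in suggested fix #2 below could look like. The function and signature here are illustrative stand-ins, not goose's actual API; the only assumption taken from this issue is that a truncated response carries `finish_reason: "length"`:

```rust
// Hypothetical sketch: fail fast on a truncated inner-LLM response instead of
// letting serde surface a cryptic `missing field html` error. Names are
// illustrative, not goose's real API.
fn check_tool_response(finish_reason: &str, raw_args: &str) -> Result<String, String> {
    // A response cut off at the max_tokens ceiling reports finish_reason "length".
    if finish_reason == "length" {
        return Err(format!(
            "inner LLM response was truncated at the provider's max_tokens limit \
             (finish_reason = \"length\"); partial arguments: {raw_args}"
        ));
    }
    // Only after ruling out truncation do we hand the arguments to the JSON
    // parser, so a genuine schema problem still shows up as a parse error.
    Ok(raw_args.to_string())
}

fn main() {
    let truncated = check_tool_response("length", "{\"name\":\"figure-skating-championship\"");
    assert!(truncated.is_err());

    let complete = check_tool_response("stop", "{\"name\":\"app\",\"html\":\"<html></html>\"}");
    assert!(complete.is_ok());
    println!("truncation check behaves as expected");
}
```

With a check like this, the user would see a message pointing at the token limit rather than a serde field error.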
Suggested Fix
- Set a higher `max_tokens` for the inner LLM call — use `complete_with_model()` with a `ModelConfig` that overrides `max_tokens` to something like 16,384+ (or even higher for complex apps)
- Detect truncation — check `finish_reason` / usage in the response and return a meaningful error instead of the cryptic serde error
- Ideally, both
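A hedged sketch of fix #1. `ModelConfig` below is a minimal stand-in defined inline so the example is self-contained; goose's real `ModelConfig` and `complete_with_model()` may differ in shape, and the 16,384 floor is the value suggested in this issue, not a measured requirement:

```rust
// Minimal stand-in for goose's ModelConfig, just enough to show the override.
#[derive(Clone, Debug)]
struct ModelConfig {
    model_name: String,
    // None => the provider/API default applies (8,192 output tokens for
    // Claude on Databricks, per this issue).
    max_tokens: Option<u32>,
}

/// Build a config for the Apps extension's inner call: keep the session's
/// model, but guarantee a generous output budget so the `html` field fits.
fn apps_inner_config(base: &ModelConfig) -> ModelConfig {
    let mut cfg = base.clone();
    // Raise max_tokens to at least 16,384; keep a larger user-set value as-is.
    cfg.max_tokens = Some(cfg.max_tokens.unwrap_or(0).max(16_384));
    cfg
}

fn main() {
    let base = ModelConfig {
        model_name: "databricks-claude-opus-4-6".to_string(),
        max_tokens: None,
    };
    let cfg = apps_inner_config(&base);
    assert_eq!(cfg.max_tokens, Some(16_384));
    println!("inner call config: {cfg:?}");
}
```

The resulting config would then be passed to something like `complete_with_model()` in place of the plain `complete()` call shown in the root-cause section.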
Environment
- Provider: Databricks
- Model: `databricks-claude-opus-4-6`
- `max_tokens` in config: `null` (not set)
- API default: 8,192 output tokens