
Bug: Apps extension inner LLM call has no max_tokens, causing truncation and 'missing field html' error #7239

@blackgirlbytes

## Summary

The Apps extension's inner LLM call to generate HTML app content does not set `max_tokens`, so the API defaults to 8,192 output tokens. This is insufficient for generating any non-trivial HTML app. The response gets truncated (`finish_reason: "length"`), the `html` field is never written, and the user sees a cryptic error:

```
Error: Failed to parse tool response: missing field `html`
```

## Reproduction

  1. Use the Databricks provider with databricks-claude-opus-4-6 (or any provider where the default max output tokens is ~8K)
  2. Ask goose to create a non-trivial app (e.g., "Create a figure skating game app")
  3. The `apps__create_app` tool returns ``Error: Failed to parse tool response: missing field `html` `` every time

## Root Cause

In `crates/goose/src/agents/apps_extension.rs`, `generate_new_app_content()` and `generate_updated_app_content()` call:

```rust
let (response, _usage) = provider
    .complete(session_id, &system_prompt, &messages, &tools)
    .await?;
```

This uses `provider.complete()`, which inherits the provider's `ModelConfig`. When `max_tokens` is `None` in the config, no `max_tokens` is sent to the API, and the Databricks/Claude API defaults to 8,192 output tokens.

The inner LLM is asked to call a `create_app_content` tool whose `html` field (a complete HTML/CSS/JS app) is serialized as part of the JSON arguments. The LLM generates fields in order (`name`, `description`, `width`, `height`, `resizable`) and hits the 8,192-token ceiling before ever reaching `html`.
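The inherited-config failure mode can be sketched with a toy `ModelConfig` (struct shape and field names are assumed for illustration; goose's real type differs). The point is that the inner app-generation call should carry its own explicit output-token budget rather than inheriting `None` from the provider:

```rust
// Sketch only: a stand-in for goose's ModelConfig, with assumed fields.
#[derive(Clone, Debug, PartialEq)]
struct ModelConfig {
    model_name: String,
    max_tokens: Option<u32>, // None => the API's default ceiling (8,192 here)
}

/// Hypothetical helper: derive the config for the inner app-generation call
/// from the provider's base config, forcing a floor on max_tokens.
fn inner_call_config(base: &ModelConfig) -> ModelConfig {
    ModelConfig {
        max_tokens: Some(base.max_tokens.unwrap_or(0).max(16_384)),
        ..base.clone()
    }
}

fn main() {
    // The reported configuration: max_tokens unset in the provider config.
    let base = ModelConfig {
        model_name: "databricks-claude-opus-4-6".into(),
        max_tokens: None,
    };
    let inner = inner_call_config(&base);
    assert_eq!(inner.max_tokens, Some(16_384));
    println!("{:?}", inner.max_tokens); // prints Some(16384)
}
```

With a floor like this, a user-configured budget larger than 16,384 is left alone, while an unset or too-small budget is raised.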

Evidence from diagnostic logs: all 3 inner LLM calls show the same pattern.

| Inner LLM call | finish_reason | output_tokens | Has `html`? |
|---|---|---|---|
| Attempt 1 | `length` | 8,192 | No |
| Attempt 2 | `length` | 8,192 | No |
| Attempt 3 | `length` | 8,192 | No |

The returned arguments each time:

```json
{"name":"figure-skating-championship","description":"Interactive figure skating game...","height":700,"resizable":false,"width":900}
```

No `html` field: the LLM ran out of tokens before starting it.

## Suggested Fix

  1. Set a higher `max_tokens` for the inner LLM call: use `complete_with_model()` with a `ModelConfig` that overrides `max_tokens` to something like 16,384+ (or even higher for complex apps)
  2. Detect truncation: check `finish_reason` / usage in the response and return a meaningful error instead of the cryptic serde error
  3. Ideally, both
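Fix 2 can be sketched as a pre-deserialization guard (function name and error shape are illustrative, not goose's actual code): inspect `finish_reason` before parsing the tool arguments, so truncation surfaces as an actionable error rather than a serde failure.

```rust
/// Hypothetical guard: fail fast on a truncated completion instead of
/// letting deserialization report a confusing "missing field `html`".
fn check_truncation(finish_reason: &str, output_tokens: u32) -> Result<(), String> {
    if finish_reason == "length" {
        return Err(format!(
            "app generation truncated at {output_tokens} output tokens; \
             increase max_tokens for the inner LLM call"
        ));
    }
    Ok(())
}

fn main() {
    // Mirrors the diagnostic logs above: every attempt stopped at 8,192 tokens.
    assert!(check_truncation("length", 8_192).is_err());
    // A normal completion passes through to argument parsing.
    assert!(check_truncation("stop", 3_100).is_ok());
    println!("ok");
}
```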

## Environment

  • Provider: Databricks
  • Model: `databricks-claude-opus-4-6`
  • `max_tokens` in config: `null` (not set)
  • API default: 8,192 output tokens

Labels: core (Pertains to core goose functionality)
