fix: return ContextLengthExceeded when prompt exceeds effective KV cache size#7815

Merged
DOsinga merged 1 commit into main from fix/local-inference-kv-cache-context-overflow
Mar 11, 2026
Conversation

@DOsinga DOsinga commented Mar 11, 2026

Problem

When using local inference with llama.cpp, if the prompt token count exceeds the effective context size (which can be capped by context_limit, n_ctx_train, or available memory), the KV cache is allocated with fewer slots than the prompt requires. Attempting to prefill more tokens than the cache can hold causes llama.cpp to return an opaque error:

Execution error: Prefill decode failed: Decode Error 1: NoKvCacheSlot

This happens because validate_and_compute_context only checked the prompt against the memory-based limit, but not against the final effective_ctx value. For example, with prompt_token_count = 5000 and effective_ctx = 4096, a 4096-token KV cache is created and then 5000 tokens are fed into it — failing around token 4096.
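To make the overflow concrete, here is a minimal sketch of how an effective context size might be derived as the smallest of the three caps mentioned above. The function and variable names are illustrative, not the actual goose code:

```rust
// Hypothetical helper: the effective context is the smallest of the
// user-configured limit, the model's trained context, and a memory-based cap.
fn effective_ctx(context_limit: usize, n_ctx_train: usize, memory_cap: usize) -> usize {
    context_limit.min(n_ctx_train).min(memory_cap)
}

fn main() {
    // A 5000-token prompt against a 4096-slot KV cache: without a guard,
    // prefill runs out of slots around token 4096 (NoKvCacheSlot).
    let prompt_token_count = 5000;
    let ctx = effective_ctx(8192, 4096, 6000);
    assert_eq!(ctx, 4096);
    assert!(prompt_token_count > ctx);
    println!("effective_ctx = {}", ctx);
}
```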

Fix

Add a guard in validate_and_compute_context that checks prompt_token_count >= effective_ctx and returns a clear ContextLengthExceeded error with an actionable message, instead of letting the decode fail with an opaque KV cache error.
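A minimal sketch of that guard, assuming a simplified error type; the enum shape and message wording are illustrative rather than the exact goose implementation:

```rust
// Hypothetical error type standing in for the real ContextLengthExceeded variant.
#[derive(Debug, PartialEq)]
enum InferenceError {
    ContextLengthExceeded { prompt_tokens: usize, effective_ctx: usize },
}

// Guard: reject the prompt up front instead of letting prefill exhaust
// the KV cache and surface an opaque NoKvCacheSlot decode error.
fn validate_prompt_fits(
    prompt_token_count: usize,
    effective_ctx: usize,
) -> Result<(), InferenceError> {
    if prompt_token_count >= effective_ctx {
        return Err(InferenceError::ContextLengthExceeded {
            prompt_tokens: prompt_token_count,
            effective_ctx,
        });
    }
    Ok(())
}

fn main() {
    assert!(validate_prompt_fits(1000, 4096).is_ok());
    let err = validate_prompt_fits(5000, 4096).unwrap_err();
    let InferenceError::ContextLengthExceeded { prompt_tokens, effective_ctx } = err;
    // An actionable message the caller can surface to the user.
    println!(
        "prompt has {} tokens but the effective context is {}; \
         raise context_limit or shorten the prompt",
        prompt_tokens, effective_ctx
    );
}
```

Note the check uses `>=` rather than `>`, matching the PR: generation needs at least one free slot beyond the prompt.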

Testing

  • All existing inference_engine unit tests pass
  • cargo clippy --all-targets -- -D warnings clean
  • cargo fmt clean

…che size

When the prompt token count exceeds the effective context size (capped by
context_limit, n_ctx_train, or available memory), the KV cache is allocated
with fewer slots than needed. Attempting to prefill more tokens than the
cache can hold causes llama.cpp to return a NoKvCacheSlot error.

Add a guard in validate_and_compute_context that checks prompt_token_count
against effective_ctx and returns a clear ContextLengthExceeded error
instead of letting the decode fail with an opaque KV cache error.
@DOsinga DOsinga enabled auto-merge March 11, 2026 19:10
@DOsinga DOsinga added this pull request to the merge queue Mar 11, 2026
Merged via the queue into main with commit f462d73 Mar 11, 2026
20 checks passed
@DOsinga DOsinga deleted the fix/local-inference-kv-cache-context-overflow branch March 11, 2026 19:21
lifeizhou-ap added a commit that referenced this pull request Mar 12, 2026
* main: (270 commits)
  test(acp): align provider and server test parity (#7822)
  fix(acp): register MCP extensions when resuming a session (#7806)
  fix(goose): load .gitignore in prompt_manager for hint file filtering (#7795)
  fix: remap max_completion_tokens to max_tokens for OpenAI-compatible providers (#7765)
  fix(openai): preserve Responses API tool call/output linkage (#7759)
  chore(deps): bump @hono/node-server from 1.19.9 to 1.19.11 in /evals/open-model-gym/mcp-harness (#7687)
  fix: return ContextLengthExceeded when prompt exceeds effective KV cache size (#7815)
  feat: MCP Roots support (#7790)
  fix(google): use `includeThoughts/part.thought` for thinking handling (#7593)
  refactor: simplify tokenizer initialization — remove unnecessary Result wrapper (#7744)
  Fix model selector showing wrong model in tabs (#7784)
  Stop collecting goosed stderr after startup (#7814)
  fix: avoid word splitting by space for windows shell commands (#7781) (#7810)
  Simplify and make it not break on linux (#7813)
  Add preferred microphone selection  (#7805)
  Remove dependency on posthog-rs (#7811)
  feat: load hints in nested subdirs (#7772)
  feat(acp): add read tool and delegate filesystem I/O to ACP clients (#7668)
  Support secret interpolation in streamable HTTP extension URLs (#7782)
  More logging for command injection classifier model training (#7779)
  ...
