
fix(llm): kimi-k2.6 rejects temperature=0 — raises 400 on every call #610

@furkankoykiran

Description

Problem

extract_corpus_parallel(files, backend="kimi") always raises openai.BadRequestError (400) before extracting a single node; the request is rejected before it ever reaches the model.

openai.BadRequestError: Error code: 400 - {
  'error': {
    'message': 'invalid temperature: only 1 is allowed for this model',
    'type': 'invalid_request_error'
  }
}

Root cause

_call_openai_compat hardcodes temperature=0:

# graphify/llm.py, _call_openai_compat
resp = client.chat.completions.create(
    model=model,
    ...
    temperature=0,   # <-- always sent, regardless of backend
)

kimi-k2.6 (and kimi-k2.5) enforce a model-level fixed temperature. The official Kimi API docs state:

k2.6/k2.5 will use a fixed value of 1.0 in thinking mode and 0.6 in instant mode. Any other value will result in an error.

This is a model constraint, not an account tier restriction. Every user hitting the Kimi backend gets this error regardless of their plan.

Repro

import os
from pathlib import Path
from graphify.llm import extract_corpus_parallel

os.environ["MOONSHOT_API_KEY"] = "<your-key>"
extract_corpus_parallel([Path("README.md")], backend="kimi")
# → openai.BadRequestError: 400 invalid temperature
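
For contrast, the same request goes through when the temperature kwarg is omitted entirely. A minimal sketch outside graphify, assuming the documented Moonshot endpoint (use the .ai host for the international endpoint) and the model name as used by the kimi backend above:

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.moonshot.cn/v1",  # assumed Moonshot endpoint
    api_key=os.environ["MOONSHOT_API_KEY"],
)
resp = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[{"role": "user", "content": "ping"}],
    # no temperature kwarg → the model falls back to its fixed default
)
print(resp.choices[0].message.content)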

Fix

Make temperature backend-configurable in BACKENDS and skip the parameter when None:

BACKENDS = {
    "claude": {
        ...
        "temperature": 0,   # deterministic extraction
    },
    "kimi": {
        ...
        "temperature": None,  # model enforces its own fixed temperature; sending any value raises 400
    },
}

def _call_openai_compat(base_url, api_key, model, user_message, temperature=0):
    create_kwargs = {
        "model": model,
        "messages": [...],
        "max_completion_tokens": 8192,
    }
    if temperature is not None:  # omit the kwarg entirely when unset
        create_kwargs["temperature"] = temperature
    resp = client.chat.completions.create(**create_kwargs)
    return resp

def extract_files_direct(...):
    ...
    return _call_openai_compat(
        cfg["base_url"], key, mdl, user_msg,
        temperature=cfg.get("temperature", 0)
    )

This keeps temperature=0 for Claude (deterministic), omits the parameter for Kimi (lets the model use its built-in default), and is trivially extensible for future backends.

The same pattern applies to any future OpenAI-compat backend that restricts or ignores the temperature field.
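
For illustration, adding such a backend would be a one-entry config change ("acme" and its endpoint are invented here; only the temperature: None convention matters):

# Hypothetical future backend that rejects or ignores temperature.
BACKENDS["acme"] = {
    "base_url": "https://api.acme.example/v1",
    "model": "acme-chat-1",
    "temperature": None,  # omit the kwarg entirely for this backend
}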

Notes

  • Same root cause affects kimi-k2.5 — both models have the fixed-temperature constraint.
  • The _call_claude path is unaffected (uses the anthropic SDK directly, no temperature kwarg sent).
  • No behavior change for existing Claude users.

Happy to submit a PR with tests if useful.
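
For reference, a sketch of what those tests could look like. It assumes _call_openai_compat builds its client via an OpenAI class imported in graphify/llm.py and returns the raw response without post-processing; the patch target may need adjusting to the real import path:

from unittest.mock import patch

from graphify.llm import _call_openai_compat

@patch("graphify.llm.OpenAI")  # hypothetical patch target
def test_temperature_omitted_when_none(mock_openai):
    client = mock_openai.return_value
    _call_openai_compat("https://api.example/v1", "key", "kimi-k2.6",
                        "hello", temperature=None)
    _, kwargs = client.chat.completions.create.call_args
    assert "temperature" not in kwargs

@patch("graphify.llm.OpenAI")  # hypothetical patch target
def test_temperature_forwarded_when_set(mock_openai):
    client = mock_openai.return_value
    _call_openai_compat("https://api.example/v1", "key", "claude-model",
                        "hello", temperature=0)
    _, kwargs = client.chat.completions.create.call_args
    assert kwargs["temperature"] == 0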
