Default system prompt degrades kimi-k2.5 performance on coding benchmarks #20258
Description
We are reporting this on behalf of Moonshot AI. In our internal evaluations, we found that the current default system prompt appears to degrade kimi-k2.5 performance on coding- and reasoning-oriented benchmarks.
Summary of observed impact
| Benchmark | With fine-tuned prompt | With default prompt |
|---|---|---|
| Benchmark A | 58.0 ± 2.4 | 54.1 ± 3.8 |
| Benchmark B | 67.1 ± 1.0 | 60.0 ± 2.4 |
Across both benchmarks, the default prompt is not neutral for Kimi. It appears to reduce both average performance and result stability.
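For clarity on how to read the table, assuming the ± figures denote the mean and sample standard deviation over independent evaluation runs (our assumption; the report does not state the aggregation method), they would be computed as:

```python
import statistics

def summarize(scores):
    """Return (mean, sample standard deviation) for a list of per-run scores."""
    return statistics.mean(scores), statistics.stdev(scores)

# Hypothetical per-run scores for illustration only; not the actual benchmark data.
runs = [56.0, 53.0, 58.5, 50.5]
mean, std = summarize(runs)
print(f"{mean:.1f} ± {std:.1f}")  # mean ± sample std across runs
```

Under this reading, the wider ± band in the default-prompt column reflects lower run-to-run stability, not just a lower average.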
Why the default prompt may be harmful
Based on prompt inspection, we believe there are at least three concrete issues.
1. Overly aggressive brevity constraints
The prompt repeatedly instructs the model to minimize response length, including guidance such as:
- “minimize output tokens as much as possible”
- “should NOT answer with unnecessary preamble or postamble”
- “MUST answer concisely with fewer than 4 lines”
- “One word answers are best”
For a reasoning-oriented coding model, these constraints appear too aggressive. They bias the model toward underspecified or shallow responses and may suppress useful planning, explanation, and intermediate reasoning behavior.
2. Misaligned few-shot examples
The few-shot examples in the default prompt are primarily trivial question-answer pairs, such as:
- “2+2”
- “How many golf balls fit inside a jetta?”
- “is 11 a prime number?”
These examples do not resemble the kinds of tasks the model is expected to perform in coding and engineering settings.
3. Internally conflicting instructions
The prompt also appears to contain contradictory guidance. For example, it instructs the model to explain what a command does and why it is being run, while also discouraging explanatory text before or after responses.
These competing instructions likely cause instability in response style and behavior, which may contribute to the higher variance we observe in the benchmark results.
Related issues:
- default system prompt materiallly freaks out / degrades reasoning of every high reasoning model #10927
- System prompt block Qwen3.5 natural thinking process #18799
Plugins
No response
OpenCode version
1.2.27
Steps to reproduce
We are unable to provide a public reproduction workflow. The underlying evaluation datasets and benchmark setup are internal only.
Screenshot and/or share link
No response
Operating System
Ubuntu 22.04
Terminal
No response