Uncommon characters such as '’' (U+2019) in system prompts can degrade LLM performance

**Describe the bug**

Files such as `crates/goose/src/prompts/plan.md` and `crates/goose/src/prompts/system_gpt_4.1.md` contain non-ASCII characters such as '’' (U+2019: RIGHT SINGLE QUOTATION MARK). Such characters are probably much less common in LLM training data than their ASCII equivalents and lead to unusual tokenization, so they can degrade the quality of LLM output, however subtly, especially in system prompts and tool-use instructions.

**To Reproduce**
Steps to reproduce the behavior:
1. Clone the Goose Git repository.
2. In a (Bourne-like) shell whose current directory is the root of the cloned repository, run the following command:
   `$ grep --color='always' -r -P -n "[^\x00-\x7F]" . | less -R`
3. Note the lines with uncommon quotation characters (ignore emoji).
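
For environments where `grep -P` is unavailable, the same check can be sketched in Python. This is only an illustrative sketch, not part of Goose: the function name `find_non_ascii` and the sample string are made up for this example, and it reports every character above U+007F (emoji included) together with its Unicode name.

```python
import unicodedata


def find_non_ascii(text):
    """Return (line, column, char, unicode_name) for each non-ASCII character.

    Hypothetical helper for illustration; line and column are 1-based.
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ord(ch) > 0x7F:  # anything outside the 7-bit ASCII range
                name = unicodedata.name(ch, "UNKNOWN")
                hits.append((line_no, col, ch, name))
    return hits


# Made-up sample text; \u2019 is the right single quotation mark.
sample = "Follow the user\u2019s instructions.\nPlain ASCII line."
for hit in find_non_ascii(sample):
    print(hit)
```

To scan a real prompt file, the text could be read with `pathlib.Path(path).read_text(encoding="utf-8")` and passed to the same function.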

**Expected behavior**
System prompts and tool instructions should contain strictly ASCII characters, especially where a character is attached to a word as a prefix or suffix. Clearly separated emoji are the exception.
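
One possible way to get there is a small normalization pass over the prompt text. The sketch below is an assumption about how this could be done, not an existing Goose utility: the name `to_ascii_punctuation` and the choice of characters in the mapping are hypothetical, and it deliberately leaves every other character (emoji included) untouched.

```python
# Hypothetical mapping from typographic quotation marks to ASCII equivalents.
TYPOGRAPHIC_TO_ASCII = {
    0x2018: "'",  # LEFT SINGLE QUOTATION MARK
    0x2019: "'",  # RIGHT SINGLE QUOTATION MARK
    0x201C: '"',  # LEFT DOUBLE QUOTATION MARK
    0x201D: '"',  # RIGHT DOUBLE QUOTATION MARK
}


def to_ascii_punctuation(text):
    """Replace the typographic quotes listed above with ASCII quotes."""
    return text.translate(TYPOGRAPHIC_TO_ASCII)


print(to_ascii_punctuation("the user\u2019s \u201cplan\u201d"))
```

Restricting the mapping to a short explicit list keeps the change reviewable; a broader transliteration would risk altering characters that are intentional.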

**Screenshots**

![Image](https://github.com/user-attachments/assets/d1c46cd8-df0f-4d0d-9ed8-0e5d6e39405c)
OpenAI online tokenizer output for text containing a right single quotation mark. Notice that the important word `user's` is split into two tokens, which can influence LLM attention.

![Image](https://github.com/user-attachments/assets/cf510760-bd6a-41c3-b8e8-43c495b0dc75)
OpenAI online tokenizer output for the same text, this time containing a common `'` apostrophe. The important word `user's` is tokenized as a single token.

**Please provide following information:**
 - **OS & Arch:** [WSL Ubuntu 24.04.2 LTS x86]
 - **Interface:** [CLI]
 - **Version:** [v1.0.24]
 - **Extensions enabled:** [Developer Tools, Computer Controller, Memory]
 - **Provider & Model:** [Ollama - qwen3:14b-16k-ctx]

**Additional context**

N/A

Uncommon characters such as '’' (U+2019) in system prompts can degrade LLM performance #2812