fix(citation): use position-based insertion for Gemini grounding supports by DeJeune · Pull Request #13646 · CherryHQ/cherry-studio

DeJeune · 2026-03-19T15:26:26Z

What this PR does

Before this PR:
Gemini groundingSupports segments with very short text (e.g. `**` for markdown bold markers) caused all matching occurrences in the content to receive citation tags, resulting in dense, broken rendering and significant UI lag.

After this PR:
Citation tags are inserted at the exact position indicated by the segment's startIndex/endIndex, preventing over-matching on short or repeated text patterns.
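The sketch below contrasts the two strategies. `insertByText` and `insertByPosition` are hypothetical names, not the project's code. The `[cite:N]` marker format follows the snapshot-test commit on this PR. The segment is assumed to carry a JS character offset in `endIndex`; the UTF-8 caveat is a separate concern.

```typescript
// Assumed, simplified segment shape for illustration (not the project's types).
interface Segment {
  startIndex?: number
  endIndex: number // assumed here to be a JS character offset
  text: string
}

// Simplified text-based approach: segment.text is used as a global search key.
function insertByText(content: string, segment: Segment, mark: string): string {
  // Escape regex metacharacters so "**" is matched literally.
  const escaped = segment.text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  // The "g" flag appends the mark after EVERY occurrence, not just the grounded one.
  return content.replace(new RegExp(escaped, 'g'), (m) => m + mark)
}

// Position-based approach: insert exactly once, right after segment.endIndex.
function insertByPosition(content: string, segment: Segment, mark: string): string {
  return content.slice(0, segment.endIndex) + mark + content.slice(segment.endIndex)
}

const content = '**Paris** is the capital of **France**.'
// Suppose the grounded segment is only the first "**" (characters 0..2).
const segment: Segment = { startIndex: 0, endIndex: 2, text: '**' }

console.log(insertByText(content, segment, '[cite:1]'))
// → "**[cite:1]Paris**[cite:1] is the capital of **[cite:1]France**[cite:1]."
console.log(insertByPosition(content, segment, '[cite:1]'))
// → "**[cite:1]Paris** is the capital of **France**."
```

One short segment yields four markers with text matching but exactly one with positional insertion, which is the over-matching described in #8880.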

Fixes #8880

Why we need it and why it was done in this way

The Gemini API returns groundingSupports with a segment object containing both positional data (startIndex/endIndex) and the matched text. The previous implementation used text-based regex replacement, which is unreliable when the segment text is very short or common (e.g. `**`). Using the position indices directly is more precise and aligns with how the API intends the data to be consumed.
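Positional insertion has one more consequence when a response carries several grounding supports: each insertion shifts the offsets of all text after it. A common remedy is to apply supports in descending `endIndex` order. The sketch below rests on assumptions. The `GroundingSupport` interface is a simplified local shape. It assumes a `groundingChunkIndices` array as in the Gemini API response and treats `endIndex` as a character offset. `insertCitationMarks` is a hypothetical helper, not the PR's `normalizeCitationMarks`.

```typescript
// Assumed, simplified shapes (not the project's actual types).
interface GroundingSegment {
  startIndex?: number
  endIndex: number // treated as a JS character offset in this sketch
  text?: string
}

interface GroundingSupport {
  segment: GroundingSegment
  groundingChunkIndices: number[]
}

// Hypothetical helper: insert "[cite:N]" (N = chunk index + 1) at each segment's endIndex.
function insertCitationMarks(content: string, supports: GroundingSupport[]): string {
  // Process the last segment first, so inserting marks near the end
  // never shifts the offsets of segments that appear earlier.
  const ordered = [...supports].sort((a, b) => b.segment.endIndex - a.segment.endIndex)
  let result = content
  for (const { segment, groundingChunkIndices } of ordered) {
    const marks = groundingChunkIndices.map((i) => `[cite:${i + 1}]`).join('')
    result = result.slice(0, segment.endIndex) + marks + result.slice(segment.endIndex)
  }
  return result
}

const text = 'Paris is in France. Berlin is in Germany.'
const supports: GroundingSupport[] = [
  { segment: { startIndex: 0, endIndex: 19, text: 'Paris is in France.' }, groundingChunkIndices: [0] },
  { segment: { startIndex: 20, endIndex: 41, text: 'Berlin is in Germany.' }, groundingChunkIndices: [1, 2] },
]

console.log(insertCitationMarks(text, supports))
// → "Paris is in France.[cite:1] Berlin is in Germany.[cite:2][cite:3]"
```

Sorting a copy keeps the caller's array untouched, and the result does not depend on the order in which the API lists the supports.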

The following tradeoffs were made:

Position-based insertion means we no longer do fuzzy text matching for Gemini citations. This is intentional since the API provides exact positions.

The following alternatives were considered:

Filtering out very short segments — rejected because the positions are still valid and useful.

Breaking changes

None.

Special notes for your reviewer

Only the WEB_SEARCH_SOURCE.GEMINI case in normalizeCitationMarks is changed. All other source types remain unchanged.
A new test case reproduces the exact bug scenario from the issue (a short `**` segment matching every bold marker).

Checklist

PR: The PR description is expressive enough and will help future contributors
Code: Write code that humans can understand and Keep it simple
Refactor: You have left the code cleaner than you found it (Boy Scout Rule)
Upgrade: Impact of this change on upgrade flows was considered and addressed if required
Documentation: A user-guide update was considered and is present (link) or not required. Check this only when the PR introduces or changes a user-facing feature or behavior.
Self-review: I have reviewed my own code (e.g., via /gh-pr-review, gh pr diff, or GitHub UI) before requesting review from others

Release note

Fixed Gemini citation over-matching caused by short text segments (e.g. `**`) in groundingSupports, which previously added citation tags to every bold marker in the response.

…orts

Gemini API can return groundingSupports with very short text segments (e.g. "**") which caused all matching occurrences in the content to receive citation tags. Switch from text-based regex replacement to position-based insertion using startIndex/endIndex from the segment metadata to fix over-matching.

Fixes #8880

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: suyao <[email protected]>

Gemini API's groundingSupports segment endIndex is a UTF-8 byte offset, not a JS character offset. CJK characters are 3 bytes in UTF-8 but 1 character in JavaScript, causing citations to be inserted at wrong positions for non-ASCII content. Use TextEncoder/TextDecoder to convert byte offsets to character offsets before slicing the content string.

Fixes #8880

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: suyao <[email protected]>
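The byte-to-character conversion this commit describes can be sketched as follows. The sketch assumes `endIndex` is a UTF-8 byte offset that lands on a character boundary, and `byteOffsetToCharOffset` is a hypothetical name. Decoding the byte prefix and taking its `.length` gives a UTF-16 code-unit index, which is what `String.prototype.slice` expects. That also covers characters outside the Basic Multilingual Plane, which take 4 bytes in UTF-8 and 2 code units in JavaScript.

```typescript
// Convert a UTF-8 byte offset into a JavaScript string index (UTF-16 code units).
// Assumes byteOffset falls on a character boundary; a mid-character offset would
// decode to a U+FFFD replacement character and skew the result.
function byteOffsetToCharOffset(content: string, byteOffset: number): number {
  const bytes = new TextEncoder().encode(content)
  // Decode only the prefix up to the byte offset; its length is the JS index.
  return new TextDecoder().decode(bytes.subarray(0, byteOffset)).length
}

const content = '你好，世界。Hello'
// Suppose a segment covers "你好，" and reports endIndex = 9 (3 CJK chars × 3 bytes).
const byteEnd = 9
const charEnd = byteOffsetToCharOffset(content, byteEnd) // 3

// Slicing with the raw byte offset lands inside "Hello":
const naive = content.slice(0, byteEnd) + '[cite:1]' + content.slice(byteEnd)
// Slicing with the converted offset lands right after "你好，":
const fixed = content.slice(0, charEnd) + '[cite:1]' + content.slice(charEnd)

console.log(naive) // → "你好，世界。Hel[cite:1]lo"
console.log(fixed) // → "你好，[cite:1]世界。Hello"
```

For ASCII-only text the two offsets coincide, which is why this class of bug only shows up with non-ASCII content. When converting many offsets for the same response, encoding the content once and reusing the byte array avoids repeated work.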

cherry-ai-bot

I re-checked the fix and I think the direction is correct.

The old implementation was fundamentally unsafe for Gemini grounding because it treated segment.text as a global replacement key, which explodes on very short or repeated fragments like **. Switching to endIndex-based insertion matches the API contract much better and avoids the pathological “tag every matching token in the whole response” behavior that caused broken rendering and lag.

I also checked the updated tests: besides adapting the existing Gemini cases to positional metadata, the PR now includes a direct regression test for issue #8880 and a CJK/UTF-8 offset test, which are exactly the two places this kind of fix is most likely to go wrong. I did not find a new correctness blocker in the updated logic. The remaining risk is mostly whether Gemini always reports stable offsets in more exotic markdown/content mixes, but that feels like normal follow-up risk rather than a reason to block this patch.

Add a fixture file with real groundingChunks/groundingSupports data from a Gemini 3 Pro response, and snapshot tests that verify:

- normalizeCitationMarks inserts [cite:N] at correct positions
- withCitationTags produces correct final HTML output
- No over-matching occurs (exactly 15 total cite references)

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Signed-off-by: suyao <[email protected]>
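The "no over-matching" check can be stated as an invariant: the number of `[cite:N]` markers in the normalized text should equal the number of citations the supports request. The sketch below uses hypothetical helpers and assumes one marker per entry in each support's `groundingChunkIndices`. The project's exact counting rule may differ.

```typescript
// Simplified, assumed shape: only the fields this check needs.
interface GroundingSupport {
  segment: { endIndex: number }
  groundingChunkIndices: number[]
}

// Count every "[cite:N]" marker in the normalized content.
function countCiteMarks(content: string): number {
  return content.match(/\[cite:\d+\]/g)?.length ?? 0
}

// Expected marker count, assuming one marker per chunk index per support.
function expectedCiteCount(supports: GroundingSupport[]): number {
  return supports.reduce((total, s) => total + s.groundingChunkIndices.length, 0)
}

const supports: GroundingSupport[] = [
  { segment: { endIndex: 9 }, groundingChunkIndices: [0] },
  { segment: { endIndex: 39 }, groundingChunkIndices: [1, 2] },
]
// Hand-written example of normalized output for '**Paris** is the capital of **France**.'
const normalized = '**Paris**[cite:1] is the capital of **France**.[cite:2][cite:3]'

console.log(countCiteMarks(normalized) === expectedCiteCount(supports)) // true
```

A text-matching regression would inflate `countCiteMarks` well past the expected total, so a mismatch is a cheap early signal even before comparing snapshots.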

EurFelux

LGTM! The position-based insertion approach is correct and well-tested. Good coverage for the UTF-8 byte offset edge case with CJK characters.

DeJeune requested a review from EurFelux March 19, 2026 15:33

cherry-ai-bot bot approved these changes Mar 19, 2026


DeJeune requested a review from alephpiece March 22, 2026 10:38

EurFelux approved these changes Mar 22, 2026


DeJeune merged commit 6b7d72e into main Mar 22, 2026
7 checks passed

DeJeune deleted the DeJeune/fix-citation-tags branch March 22, 2026 15:31

DeJeune merged 3 commits into main from DeJeune/fix-citation-tags
