gemini prompt caching #3181

Merged

pashpashpash merged 14 commits into main from pashpashpash/gemini-prompt-caching on May 2, 2025

Conversation

@pashpashpash (Contributor) commented Apr 29, 2025

Description

Improved pricing calculation for Gemini and Vertex, plus more robust caching and cache tracking for both providers.

  • Cache Creation/Update: The logic correctly identifies when to create a new cache (first message for a task) or update it (subsequent messages with new content). It uses the updateCacheContent function, which creates a new cache with the full content (old + new) due to Gemini API limitations.
  • Cache Cleanup: The updateCacheContent function now attempts to delete the old cache in the background using setTimeout after successfully creating the new one. This deletion is non-blocking; if it fails (e.g., due to permissions causing a 403 error), the error is logged, but the main operation continues. The old cache will eventually expire based on its TTL, providing a fallback cleanup mechanism. This addresses the previous 403 error issue.
  • Robustness: The use of the isCacheBusy flag helps prevent race conditions during cache updates. The non-blocking deletion makes the process more resilient to potential API errors during cleanup.
  • Cost Accounting: The costs are correctly split into immediate (input, output, cache read) and ongoing (cache storage). The calculateCost method returns only immediate costs for per-message display, while getTaskOngoingCosts calculates the storage cost based on the total tokens tracked in taskCacheTokens.
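The create-then-retire flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual code: `CacheClient`, `TaskCache`, and the method signatures are hypothetical stand-ins for the real Gemini SDK calls in gemini.ts.

```typescript
// Hypothetical stand-in for the Gemini caching API used by the handler.
interface CacheClient {
  createCache(contents: string[]): Promise<string>; // resolves to the new cache name
  deleteCache(name: string): Promise<void>;
}

class TaskCache {
  private cacheName: string | null = null;
  private isCacheBusy = false; // guards against overlapping updates (race conditions)

  constructor(private client: CacheClient) {}

  // "Updating" a Gemini cache here means creating a new cache with the full
  // content (old + new) and retiring the old one, per the API limitation
  // described above.
  async updateCacheContent(fullContents: string[]): Promise<string | null> {
    if (this.isCacheBusy) return this.cacheName; // skip if an update is in flight
    this.isCacheBusy = true;
    try {
      const oldName = this.cacheName;
      this.cacheName = await this.client.createCache(fullContents);
      if (oldName) {
        // Non-blocking cleanup: failures (e.g. a 403) are logged, never thrown;
        // the old cache still expires via its TTL as a fallback.
        setTimeout(() => {
          this.client
            .deleteCache(oldName)
            .catch((err) => console.error("background cache delete failed:", err));
        }, 0);
      }
      return this.cacheName;
    } finally {
      this.isCacheBusy = false;
    }
  }
}
```

The busy flag is released in a `finally` block so that a failed create does not leave the cache permanently locked.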

Test Procedure

Type of Change

  • 🐛 Bug fix (non-breaking change which fixes an issue)
  • ✨ New feature (non-breaking change which adds functionality)
  • 💥 Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • ♻️ Refactor Changes
  • 💅 Cosmetic Changes
  • 📚 Documentation update
  • 🏃 Workflow Changes

Pre-flight Checklist

  • Changes are limited to a single feature, bugfix or chore (split larger changes into separate PRs)
  • Tests are passing (npm test) and code is formatted and linted (npm run format && npm run lint)
  • I have created a changeset using npm run changeset (required for user-facing changes)
  • I have reviewed contributor guidelines

Screenshots

Additional Notes


Important

Enhance Gemini and Vertex API integration with improved caching strategy, accurate cost calculation, and updated UI components.

  • Caching and Cost Calculation:
    • Implement task-based caching strategy in GeminiHandler in gemini.ts, creating a single cache per task and reusing it for subsequent turns.
    • Introduce updateCacheContent() to handle cache updates and deletions, ensuring non-blocking operations.
    • Calculate immediate and ongoing costs separately, using calculateCost() and getTaskOngoingCosts().
  • API Integration:
    • Add GeminiHandler to VertexHandler in vertex.ts for handling Gemini models.
    • Update ApiHandlerOptions in api.ts to include new configuration options for Gemini and Vertex.
  • UI Enhancements:
    • Update TaskHeader.tsx to display cache and cost information.
    • Modify ApiOptions.tsx to include new settings for Gemini and Vertex APIs.
  • Miscellaneous:
    • Add node-cache dependency in package.json for caching support.
    • Update model pricing and tier information in api.ts to reflect new cost structures.
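The immediate/ongoing cost split can be illustrated with a rough sketch. The prices below are placeholders for illustration only (the real figures live in api.ts), and the function names mirror but do not reproduce the actual implementation.

```typescript
// Placeholder per-million-token prices (NOT the real values from api.ts).
const PRICES = {
  input: 1.25,              // $ per 1M input tokens
  output: 5.0,              // $ per 1M output tokens
  cacheRead: 0.31,          // $ per 1M tokens read from cache
  cacheStoragePerHour: 4.5, // $ per 1M cached tokens per hour of storage
};
const MILLION = 1_000_000;

// Immediate, per-message cost: input + output + cache reads.
function calculateCost(
  inputTokens: number,
  outputTokens: number,
  cacheReadTokens: number,
): number {
  return (
    (inputTokens * PRICES.input +
      outputTokens * PRICES.output +
      cacheReadTokens * PRICES.cacheRead) /
    MILLION
  );
}

// Ongoing cost: storage for all tokens currently tracked for the task's cache.
function getTaskOngoingCosts(taskCacheTokens: number, hoursHeld: number): number {
  return (taskCacheTokens / MILLION) * PRICES.cacheStoragePerHour * hoursHeld;
}
```

At the assumed $4.50 per 1M tokens per hour storage rate, a million cached tokens held for 15 minutes costs $1.125, which matches the figure discussed in the review conversation.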

This description was created by Ellipsis for f3c07f3.

@changeset-bot commented Apr 29, 2025

🦋 Changeset detected

Latest commit: 8de8bf1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 1 package:

Name        Type
claude-dev  Minor


github-actions bot (Contributor) commented Apr 29, 2025

Coverage Report

Extension Coverage

Base branch: 45%

PR branch: 46%

✅ Coverage increased or remained the same

Webview Coverage

Base branch: 11%

PR branch: 11%

✅ Coverage increased or remained the same

Overall Assessment

Test coverage has been maintained or improved

Last updated: 2025-05-02T04:42:45.827765

@pashpashpash pashpashpash marked this pull request as ready for review May 1, 2025 02:12
@ellipsis-dev bot (Contributor) left a comment


Caution

Changes requested ❌

Reviewed everything up to e0db81c in 2 minutes and 46 seconds.
  • Reviewed 1190 lines of code in 7 files
  • Skipped 1 file when reviewing.
  • Skipped posting 8 draft comments. View those below.
1. src/api/providers/gemini.ts:96
  • Draft comment:
    Consider using async/await for the cache creation flow instead of the promise chain for improved readability.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% This is a code quality suggestion that could improve readability. However, this is background processing code that intentionally doesn't await completion. The promise chain actually makes the asynchronous, non-blocking nature of this code more explicit. Using async/await might make it less clear that this is intentionally not awaited. The comment assumes async/await is always more readable, but in this case the promise chain better expresses the intent of background processing. The current implementation clearly shows the flow of background operations. While async/await can improve readability in many cases, here the promise chain better communicates the non-blocking nature of the cache creation process. The comment should be deleted as the current promise chain implementation is actually more appropriate for this background processing code.
2. src/utils/cost.ts:14
  • Draft comment:
    Ensure the default prices and tiered adjustments are well documented; returning undefined when inputPrice or outputPrice is missing may lead to silent failures.
  • Reason this comment was not posted:
    Comment did not seem useful. Confidence is useful = 50% <= threshold 50% The comment is suggesting that returning undefined could lead to silent failures, which is a valid concern. However, it also asks to ensure documentation, which is not allowed. I need to determine if the comment is more about the potential issue or the documentation.
3. webview-ui/src/components/settings/ApiOptions.tsx:230
  • Draft comment:
    The createDropdown function is large and tightly coupled; consider extracting it into a separate component to improve readability and maintainability.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
4. webview-ui/src/components/settings/ApiOptions.tsx:1815
  • Draft comment:
    Ensure that callback URLs in getOpenRouterAuthUrl are URL-encoded to avoid issues with special characters in the uriScheme.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
5. src/api/providers/gemini.ts:67
  • Draft comment:
    The caching eligibility check uses a multiplier of 4 on CONTEXT_CACHE_TOKEN_MINIMUM. It isn’t immediately clear why 4 is chosen. Consider adding a brief comment explaining the rationale behind this multiplier.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 20% vs. threshold = 50% While the comment raises a valid point about documentation, this seems like a minor implementation detail. The code already has good comments explaining the high-level purpose of the cache eligibility check. The '4' multiplier appears to be a tuning parameter based on the character-per-token estimation used elsewhere in the code. Adding another comment here would add noise without significant value. I might be undervaluing the importance of documenting magic numbers. Future maintainers might waste time trying to understand why 4 was chosen. The relationship to the 4 characters-per-token estimation is clear from the codebase context, and the exact multiplier value isn't critical - it's just a rough heuristic for when caching might be worthwhile. The comment should be deleted. The multiplier is a tuning parameter whose exact value isn't critical, and its relationship to token estimation is clear from context.
6. src/api/providers/vertex.ts:28
  • Draft comment:
    For model IDs that include 'claude', the implementation is left as a comment. Consider adding a TODO or placeholder to explicitly handle or warn about this incomplete branch.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 0% vs. threshold = 50% The comment appears to be incorrect in its premise. The code shows clear signs of intentional refactoring where Gemini handling is extracted while Claude handling remains. The "..." comment is likely a placeholder in the diff to avoid showing unchanged code. This is not a case of missing implementation, but rather unchanged implementation. I could be wrong about the "..." - maybe it really does mean the implementation is missing. The comment could be pointing out a real issue. The surrounding context strongly suggests this is just a diff simplification - the comments on lines 29-30 explicitly state "Your existing Claude implementation remains unchanged" and "This preserves your current Claude-specific functionality". The comment should be deleted as it incorrectly assumes missing implementation where the code is actually just preserving existing functionality.
7. webview-ui/src/components/settings/ApiOptions.tsx:1944
  • Draft comment:
    The 'Supports browser use' item uses the same 'supportsImages' property as 'Supports Images'. If these represent distinct features, consider introducing a separate flag to avoid confusion.
  • Reason this comment was not posted:
    Comment was not on a location in the diff, so it can't be submitted as a review comment.
8. src/utils/cost.ts:26
  • Draft comment:
    The tiered pricing lookup in calculateApiCostInternal relies on tiers sorted by contextWindow. While the fallback uses the last tier, consider adding a brief comment to clarify that the highest tier should use Infinity as the upper limit for robust matching.
  • Reason this comment was not posted:
    Comment looked like it was already resolved.

Workflow ID: wflow_vhZhppHNlCHqwb5F


@pashpashpash changed the title from "gemini prompt caching (wip)" to "gemini prompt caching" on May 1, 2025
@ellipsis-dev bot (Contributor) left a comment


Caution

Changes requested ❌

Reviewed 6f1a41f in 2 minutes and 11 seconds.
  • Reviewed 235 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 4 draft comments. View those below.
1. src/api/providers/vertex.ts:45
  • Draft comment:
    Consider adding an explicit type annotation for the 'stream' variable to improve clarity and type safety.
  • Reason this comment was not posted:
    Decided after close inspection that this draft comment was likely wrong and/or not actionable: usefulness confidence = 10% vs. threshold = 50% Adding an explicit 'any' type would actually reduce type safety by opting out of TypeScript's type inference. The stream variable's type should be correctly inferred from the AnthropicVertex SDK's return type. Adding 'any' would hide potential type errors and make the code less safe. Perhaps there's a specific reason why type inference isn't working well here that I'm missing? Maybe the SDK types are incomplete? Even if there are SDK type issues, explicitly typing as 'any' would be worse than letting TypeScript infer what it can. If there are type issues, they should be fixed in the SDK or by using a more specific type. The comment should be deleted as it suggests adding an 'any' type which would reduce type safety rather than improve it.
2. src/api/providers/vertex.ts:74
  • Draft comment:
    The message mapping logic for ephemeral cache control (lines 74–112) is very similar to the one in the default case (lines 133–144). Consider refactoring this into a helper function to reduce duplication.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
3. src/api/providers/vertex.ts:43
  • Draft comment:
    Variable 'budget_tokens' is written in snake_case while most other variables use camelCase. Confirm if this naming is intentional (e.g. matching API expectations) or consider standardizing for consistency.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None
4. src/api/providers/vertex.ts:170
  • Draft comment:
    In the 'message_stop' case, only a break from the switch is used. If 'message_stop' signifies the end of the stream, consider exiting the loop to prevent processing any further chunks.
  • Reason this comment was not posted:
    Confidence changes required: 33% <= threshold 50% None

Workflow ID: wflow_lDunSZLwIJ6IXIsq


@pashpashpash pashpashpash requested a review from saoudrizwan May 1, 2025 03:16
@nischalsrinivas left a comment
This is a great improvement. I like how cache information is shown; it will be tremendously useful to see how the cache grows over time.

Enhanced caching system for GeminiHandler in gemini.ts using node-cache with a default TTL of 15 minutes

But I'm curious: can the default TTL be configured?

@arafatkatze (Contributor) left a comment


This algorithm is not the best way to go about caching.

@joaomj commented May 1, 2025

Cache information will be shown when using what providers? OpenRouter, Gemini?

@arafatkatze (Contributor) commented May 1, 2025

@pashpashpash Here's the problem and the fix.

At the moment every user turn triggers two separate network calls:

  • models.generateContentStream – cheap, because you only send the delta.
  • caches.create – expensive, because you re-upload the entire context.

Because caches.create is executed on every turn, the full prompt is written to the cache over and over again. With the default TTL of 15 minutes this costs roughly $4.50 per million tokens per hour, i.e. $1.125 for every million tokens kept for 15 minutes—a charge that the user never sees in the normal token accounting.

Recommended changes

  • One cache per task ID
    • When the task starts, call caches.create once if the cache doesn't exist.
    • On subsequent turns, if the cache exists, update that same cache instead of creating a new one.
    • If the cache no longer exists because its TTL expired, just create a new one.

Stable cache key

  • Make sure cacheKey is a stable conversation / task identifier (e.g. taskId). If you fall back to Date.now() the key changes every call and the cache is never reused.

Cost visibility

We may never get a perfect breakdown from the API, but we can:
  • Count cacheWriteTokens and cacheReadTokens separately.
  • Apply the published $/1M prices for cache writes and reads.
That at least prevents us from grossly misrepresenting the costs.

The net effect: we create the cache once, keep extending it, avoid dozens of unused caches, and eliminate the hidden double spend on every turn.
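The recommended flow reduces to a get-or-create keyed by a stable task ID. A minimal sketch under those assumptions follows; the function names and callbacks are illustrative, not the real handler API.

```typescript
// taskId -> cache name; a stable key means the cache survives across turns.
const taskCaches = new Map<string, string>();

async function getOrCreateCache(
  taskId: string, // must be stable per conversation; never Date.now()
  createCache: (taskId: string) => Promise<string>,
  updateCache: (cacheName: string) => Promise<void>,
): Promise<string> {
  const existing = taskCaches.get(taskId);
  if (existing) {
    // Subsequent turn: extend the existing cache instead of creating a new one.
    await updateCache(existing);
    return existing;
  }
  // First turn, or the previous cache expired via TTL: create once.
  const name = await createCache(taskId);
  taskCaches.set(taskId, name);
  return name;
}
```

With this shape, caches.create runs once per task instead of once per turn, which removes the repeated full-context upload described above.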

@arafatkatze force-pushed the pashpashpash/gemini-prompt-caching branch from d871446 to 2024be7 on May 2, 2025 03:44
@arafatkatze force-pushed the pashpashpash/gemini-prompt-caching branch from 2024be7 to f3c07f3 on May 2, 2025 03:44
arafatkatze previously approved these changes May 2, 2025

@arafatkatze (Contributor) left a comment


This PR works, but @pashpashpash and I would still like to be vigilant.

We have tried our best to surface the ongoing cost of cache usage and to make the best use of the available cache to save on costs; given how often we hit the same parts of the cache over and over, this PR makes sense.

This PR has been tested on Vertex and Gemini for caching.

@dcbartlett (Contributor) left a comment


I'm going to review this more when I have access to my computer. The changes commented on should happen for now; I might have more later.

Co-authored-by: ellipsis-dev[bot] <65095814+ellipsis-dev[bot]@users.noreply.github.com>
@pashpashpash pashpashpash merged commit 61d2f42 into main May 2, 2025
7 checks passed
@github-actions github-actions bot mentioned this pull request Dec 21, 2025
7 participants