feat(metrics): cache write cost accounting in /metrics #57

Merged: Siddhant-K-code merged 2 commits into main from feat/52-cache-cost-accounting on May 2, 2026
@Siddhant-K-code

Owner

Closes #52

What

Adds 9 new Prometheus metrics to /metrics covering Anthropic prompt cache token usage and cache boundary state. Two new helper methods let callers record data without touching Prometheus directly.

Changes

pkg/metrics/metrics.go

New metrics:

| Metric | Type | Description |
| --- | --- | --- |
| `distill_cache_creation_tokens_total{session_id}` | Counter | Tokens written to cache (charged at 1.25× input price) |
| `distill_cache_read_tokens_total{session_id}` | Counter | Tokens read from cache (charged at 0.10× input price) |
| `distill_uncached_input_tokens_total{session_id}` | Counter | Uncached input tokens (charged at 1.00×) |
| `distill_cache_hit_rate` | Gauge | Rolling hit rate: cache_read / (cache_read + cache_creation + input) |
| `distill_cache_write_efficiency` | Gauge | reads / writes ratio; values < 1.0 mean writes that expire before being read |
| `distill_cache_boundary_position_tokens{session_id}` | Gauge | Current boundary position in tokens |
| `distill_cache_boundary_advances_total{session_id}` | Counter | Times boundary moved forward |
| `distill_cache_boundary_retreats_total{session_id}` | Counter | Times boundary retreated (content changed) |
| `distill_cache_estimated_savings_tokens_total{session_id}` | Counter | Estimated tokens saved by caching |
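
The two derived gauges follow from the token counters by simple ratios. A rough sketch of that arithmetic is below; the `Usage` struct, the function names, and the choice to report 0 when a denominator is zero are assumptions made for illustration, not code taken from this PR.

```go
package main

import "fmt"

// Usage is a hypothetical container for the three token counters.
// The field names are illustrative only.
type Usage struct {
	CacheCreation int64 // tokens written to cache
	CacheRead     int64 // tokens read from cache
	Uncached      int64 // plain input tokens
}

// HitRate computes cache_read / (cache_read + cache_creation + input).
// Returning 0 for an empty total is an assumption of this sketch.
func HitRate(u Usage) float64 {
	total := u.CacheRead + u.CacheCreation + u.Uncached
	if total == 0 {
		return 0
	}
	return float64(u.CacheRead) / float64(total)
}

// WriteEfficiency computes reads / writes. A value below 1.0 means
// fewer tokens were read back than were written.
// Returning 0 when nothing was written is an assumption of this sketch.
func WriteEfficiency(u Usage) float64 {
	if u.CacheCreation == 0 {
		return 0
	}
	return float64(u.CacheRead) / float64(u.CacheCreation)
}

func main() {
	u := Usage{CacheCreation: 250, CacheRead: 500, Uncached: 250}
	fmt.Printf("hit rate: %.2f\n", HitRate(u))                 // hit rate: 0.50
	fmt.Printf("write efficiency: %.2f\n", WriteEfficiency(u)) // write efficiency: 2.00
}
```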

New helpers:

  • RecordCacheUsage(UsageRecord) — pass the usage block from any Anthropic API response; updates counters and derived gauges
  • RecordCacheBoundary(sessionID, tokens, advanced, retreated) — called by the session boundary manager after each evaluation
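
To make the helper semantics concrete, here is a minimal standard-library sketch of what the two calls might record. It does not use Prometheus and is not the PR's implementation: the `Recorder` type, the `UsageRecord` field names, and the parameter types of `RecordCacheBoundary` are all assumptions. The fallback to a "default" session ID mirrors the behavior described by TestRecordCacheUsage_DefaultSessionID.

```go
package main

import (
	"fmt"
	"sync"
)

// UsageRecord is a hypothetical shape; the PR does not show its fields.
type UsageRecord struct {
	SessionID           string
	CacheCreationTokens int64
	CacheReadTokens     int64
	InputTokens         int64
}

// sessionStats holds the per-session values that would back the
// session_id-labelled metrics.
type sessionStats struct {
	creation, read, uncached int64 // counters
	boundary                 int64 // gauge: current boundary position
	advances, retreats       int64 // counters
}

// Recorder is an in-memory stand-in for the metrics registry.
type Recorder struct {
	mu       sync.Mutex
	sessions map[string]*sessionStats
}

func NewRecorder() *Recorder {
	return &Recorder{sessions: make(map[string]*sessionStats)}
}

// statsFor returns the stats for a session, creating them on first use.
// An empty ID maps to "default". The caller must hold r.mu.
func (r *Recorder) statsFor(id string) *sessionStats {
	if id == "" {
		id = "default"
	}
	s, ok := r.sessions[id]
	if !ok {
		s = &sessionStats{}
		r.sessions[id] = s
	}
	return s
}

// RecordCacheUsage adds one API response's usage to the session counters.
func (r *Recorder) RecordCacheUsage(u UsageRecord) {
	r.mu.Lock()
	defer r.mu.Unlock()
	s := r.statsFor(u.SessionID)
	s.creation += u.CacheCreationTokens
	s.read += u.CacheReadTokens
	s.uncached += u.InputTokens
}

// RecordCacheBoundary stores the boundary position and counts movements.
func (r *Recorder) RecordCacheBoundary(sessionID string, tokens int64, advanced, retreated bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	s := r.statsFor(sessionID)
	s.boundary = tokens
	if advanced {
		s.advances++
	}
	if retreated {
		s.retreats++
	}
}

func main() {
	r := NewRecorder()
	r.RecordCacheUsage(UsageRecord{CacheCreationTokens: 200, CacheReadTokens: 800, InputTokens: 50})
	r.RecordCacheBoundary("s1", 4096, true, false)
	fmt.Println(r.sessions["default"].read, r.sessions["s1"].boundary, r.sessions["s1"].advances)
}
```

Keeping the counters behind two narrow methods is what lets callers stay unaware of the metrics backend, which matches the stated goal of recording data without touching Prometheus directly.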

pkg/metrics/metrics_test.go

  • TestRecordCacheUsage: verifies counter increments and hit rate gauge
  • TestRecordCacheUsage_DefaultSessionID: empty session ID falls back to "default"
  • TestRecordCacheBoundary: advances/retreats counters and position gauge
  • TestHandler_CacheMetrics: all new metric names appear in /metrics output

Why

Distill's existing /metrics only covers the dedup pipeline. Without visibility into cache_creation_input_tokens and cache_read_input_tokens, a 40% token reduction looks like a 40% cost reduction. If the cache hit rate is low, however, the actual saving is much smaller, because cache writes are charged at 1.25× the input price and only cache reads get the 0.10× rate. These metrics close that gap and also validate that the boundary manager (PR #56) is working correctly in production.
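
A small worked example shows why token counts alone can mislead. It weights each token class by the multipliers listed above (1.25×, 0.10×, 1.00×) and compares against a baseline. The token counts, the uncached baseline, and the function names are hypothetical; the multipliers are scaled by 100 so the arithmetic stays in integers.

```go
package main

import "fmt"

// Price multipliers from the metric descriptions, scaled by 100.
const (
	cacheWriteX100 = 125 // cache writes: 1.25x input price
	cacheReadX100  = 10  // cache reads: 0.10x input price
	uncachedX100   = 100 // uncached input: 1.00x input price
)

// CostX100 returns the cost in uncached-input-token equivalents, times 100.
func CostX100(creation, read, uncached int64) int64 {
	return creation*cacheWriteX100 + read*cacheReadX100 + uncached*uncachedX100
}

// SavingPercent compares an actual cost against a baseline cost.
func SavingPercent(baseline, actual int64) float64 {
	if baseline == 0 {
		return 0
	}
	return 100 * float64(baseline-actual) / float64(baseline)
}

func main() {
	// Hypothetical baseline: 1000 input tokens, none cached.
	baseline := CostX100(0, 0, 1000)

	// After a 40% token reduction, 600 tokens remain.
	// Case A: all 600 are cache writes that are never read back.
	allWrites := CostX100(600, 0, 0)
	// Case B: 100 are cache writes and 500 are cache reads.
	mostlyReads := CostX100(100, 500, 0)

	fmt.Printf("all writes:   %.1f%% saving\n", SavingPercent(baseline, allWrites))   // 25.0%
	fmt.Printf("mostly reads: %.1f%% saving\n", SavingPercent(baseline, mostlyReads)) // 82.5%
}
```

Under these assumed numbers the same 40% token reduction yields anywhere from a 25% to an 82.5% cost saving, depending entirely on the read/write mix, which is the gap the new counters are meant to expose.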

Siddhant-K-code and others added 2 commits May 2, 2026 13:05
Add Prometheus metrics for Anthropic prompt cache token usage and
cache boundary state. Callers record API response usage via
RecordCacheUsage; boundary state is recorded via RecordCacheBoundary.

New metrics:
  distill_cache_creation_tokens_total{session_id}   - tokens written (1.25x)
  distill_cache_read_tokens_total{session_id}        - tokens read (0.10x)
  distill_uncached_input_tokens_total{session_id}    - uncached tokens (1.00x)
  distill_cache_hit_rate                             - rolling hit rate gauge
  distill_cache_write_efficiency                     - reads/writes ratio
  distill_cache_boundary_position_tokens{session_id} - current boundary
  distill_cache_boundary_advances_total{session_id}  - boundary advances
  distill_cache_boundary_retreats_total{session_id}  - boundary retreats
  distill_cache_estimated_savings_tokens_total        - estimated savings

New helpers:
  RecordCacheUsage(UsageRecord)              - records Anthropic usage block
  RecordCacheBoundary(sid, tokens, adv, ret) - records boundary evaluation

Co-authored-by: Ona <[email protected]>
@Siddhant-K-code Siddhant-K-code force-pushed the feat/52-cache-cost-accounting branch from 9b66a9c to 39a2be7 Compare May 2, 2026 13:05
@Siddhant-K-code Siddhant-K-code merged commit 255658f into main May 2, 2026
2 checks passed
@Siddhant-K-code Siddhant-K-code deleted the feat/52-cache-cost-accounting branch May 2, 2026 13:05

Development

Successfully merging this pull request may close these issues.

[Feature] Cache write cost accounting — track cache_creation vs cache_read tokens in /metrics
