[Fix] Make _stcore/metrics OpenMetrics-compliant#14538
[Fix] Make _stcore/metrics OpenMetrics-compliant#14538sfc-gh-mbarnes merged 2 commits intodevelopfrom
_stcore/metrics OpenMetrics-compliant#14538Conversation
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
✅ PR preview is ready!
|
There was a problem hiding this comment.
Pull request overview
This PR updates Streamlit’s /_stcore/metrics OpenMetrics text output to be more spec-compliant by using canonical counter family names (without _total) in metadata while continuing to emit counter samples with the _total suffix, and by tightening comment serialization.
Changes:
- Canonicalize the
session_eventscounter family name (metadata usessession_events; sample remainssession_events_total{...}), and remove the countersample_nameoverride mechanism. - Update text serialization to omit
# UNITlines for unitless metrics and to emit# HELP <family_name> <help>. - Update unit, server, and e2e tests to reflect the family rename and the metadata/sample split behavior.
Reviewed changes
Copilot reviewed 9 out of 9 changed files in this pull request and generated no comments.
Show a summary per file
| File | Description |
|---|---|
| lib/streamlit/runtime/stats.py | Renames SESSION_EVENTS_FAMILY and makes counter samples consistently render as <family>_total. |
| lib/streamlit/runtime/websocket_session_manager.py | Aligns emitted session metrics with canonical counter family naming (no explicit sample name override). |
| lib/streamlit/web/server/stats_request_handler.py | Tightens OpenMetrics text comment rendering (# HELP includes family name; omits empty # UNIT). |
| lib/tests/streamlit/runtime/stats_test.py | Updates CounterStat expectations to match _total sample rendering. |
| lib/tests/streamlit/runtime/websocket_session_manager_test.py | Updates runtime stats assertions for session_events family and _total sample naming. |
| lib/tests/streamlit/web/server/stats_handler_test.py | Updates Tornado metrics endpoint tests for session_events family and # HELP/# UNIT formatting. |
| lib/tests/streamlit/web/server/starlette/starlette_app_test.py | Updates Starlette metrics endpoint tests for the same OpenMetrics output changes. |
| e2e_playwright/web_server_test.py | Updates e2e coverage for metrics filtering with canonical families=session_events. |
| e2e_playwright/websocket_reconnects_test.py | Updates e2e metrics scraping to request families=session_events while parsing session_events_total{...} samples. |
There was a problem hiding this comment.
Summary
This PR makes the /_stcore/metrics endpoint OpenMetrics-compliant by canonicalizing counter metric family names. The key changes:
- Family name fix:
SESSION_EVENTS_FAMILYchanges from"session_events_total"to"session_events"— the_totalsuffix is reserved for counter sample names per the OpenMetrics spec, not the metric family name. sample_namefield removed:CounterStat.sample_nameis eliminated;to_metric_str()now derives the sample name by appending_totalto the family name automatically.# HELPline fix: Now emits# HELP <family_name> <help>instead of# HELP <help>, matching the OpenMetrics specification.# UNITline fix: Empty unit lines are no longer emitted, matching the OpenMetrics specification.- Test updates: All unit and E2E tests are updated for the new family name and serialization format.
This is a follow-up to #14476 and introduces an intentional API-visible breaking change: consumers must now use families=session_events instead of families=session_events_total.
All three expected reviewers (claude-4.6-opus-high-thinking, gemini-3.1-pro, gpt-5.3-codex-high) completed their reviews.
Code Quality
Unanimous agreement across all reviewers. The code is clean, well-structured, and follows existing patterns. The removal of the sample_name field in favor of deriving the sample name from the family name (f"{self.family_name}_total") is a good simplification that eliminates a source of inconsistency where callers had to manually keep the sample name and family name in sync. The text serialization logic in stats_request_handler.py is improved — conditionally emitting # UNIT and including the family name in # HELP both align with the OpenMetrics exposition format.
Test Coverage
Unanimous agreement across all reviewers. Test coverage is thorough and well-updated across all affected layers:
- Unit tests (
stats_test.py): Counter stat serialization tests updated,sample_nametest replaced with_total-suffix convention test. - Unit tests (
websocket_session_manager_test.py): Family name references updated; newto_metric_str()assertion verifies sample name behavior. - Handler tests (
stats_handler_test.py,starlette_app_test.py): Text output assertions verify corrected# TYPE,# HELP, and# UNITformat across both Tornado and Starlette paths. - E2E tests (
web_server_test.py,websocket_reconnects_test.py): Query parameters updated; assertions still verifysession_events_totalappears in output (the sample name) while other families are excluded.
Backwards Compatibility
Minor disagreement between reviewers. All three reviewers identified the same breaking change: families=session_events_total no longer returns session event metrics — consumers must use families=session_events. The emitted sample lines remain session_events_total{...}, so scraping tools matching on sample names are unaffected.
- claude-4.6-opus-high-thinking and gemini-3.1-pro: Accepted this as an intentional, documented breaking change with limited blast radius given the endpoint's recent introduction and
_stcoreprefix. - gpt-5.3-codex-high: Flagged this as a blocking backwards-compatibility issue, recommending a deprecation alias accepting both names.
Resolution: The breaking change is intentional, clearly documented in the PR, and scoped to an internal endpoint (_stcore) introduced very recently (#14476). The change moves toward spec compliance, which benefits long-term compatibility with standard Prometheus/OpenMetrics tooling. A deprecation alias would add complexity for minimal benefit given the limited consumer base. This is non-blocking, but migration guidance in release notes is a reasonable suggestion.
Security & Risk
Unanimous agreement. No security concerns identified. Changes are limited to metric serialization formatting and don't touch authentication, authorization, CORS, WebSocket handling, file serving, or any security-sensitive codepath.
External test recommendation
- Recommend external_test: No
- Triggered categories: None
- Evidence:
- Changes are limited to metric serialization format in
lib/streamlit/runtime/stats.pyandlib/streamlit/web/server/stats_request_handler.py - No routing, auth, WebSocket, embedding, asset serving, CORS, or security header changes
- The
/_stcore/metricsendpoint URL and access pattern remain identical
- Changes are limited to metric serialization format in
- Confidence: High
Reviewer disagreement on this point: gpt-5.3-codex-high recommended external tests targeting routing/URL behavior (medium confidence), while claude-4.6-opus-high-thinking and gemini-3.1-pro found no external test need. The endpoint URL itself is unchanged — only the query parameter semantics for family filtering changed. This does not affect standard Prometheus scraping (which fetches all metrics without family filters), so external testing is not warranted.
Accessibility
No frontend changes are included in this PR. No accessibility considerations apply. All reviewers agreed on this point.
Recommendations
- Consider adding brief migration guidance in release notes for any consumers using
families=session_events_totalquery parameter (all reviewers noted the breaking change).
Verdict
APPROVED: Clean, well-tested OpenMetrics compliance fix that correctly canonicalizes counter family names, fixes HELP/UNIT metadata formatting, and thoroughly updates all tests. The intentional breaking change is acceptable given the endpoint's recent introduction and internal scope. Two of three reviewers approved; the single changes-requested concern (backwards compatibility deprecation window) is non-blocking given context.
Consolidated AI review by claude-4.6-opus-high-thinking from 3 individual reviews (claude-4.6-opus-high-thinking, gemini-3.1-pro, gpt-5.3-codex-high).
This review also includes 1 inline comment(s) on specific code lines.
|
|
||
| CACHE_MEMORY_FAMILY: Final = "cache_memory_bytes" | ||
| SESSION_EVENTS_FAMILY: Final = "session_events_total" | ||
| SESSION_EVENTS_FAMILY: Final = "session_events" |
There was a problem hiding this comment.
thought: Renaming the family key from session_events_total to session_events is an intentional breaking change for families= query consumers. If any internal tooling or dashboards rely on families=session_events_total, they will silently receive empty results. Consider whether a deprecation alias (accepting both names for one release cycle) is warranted, though the recent introduction of this endpoint (#14476) and its _stcore prefix significantly limit blast radius.
Describe your changes
/_stcore/metricsso metadata uses OpenMetrics-compliant family names while emitted counter samples still use the_totalsuffix.# UNITlines and emit# HELP <family_name> <help>.session_eventsfamily rename and the canonical family/sample split.Follow-up to #14476
API-visible change
families=session_events_totalno longer works.families=session_events.session_events_total{...}.Testing Plan