Skip to content

[Fix] OpenMetrics compliance for session_duration metric#14476

Merged
mayagbarnes merged 1 commit intodevelopfrom
fix-14432
Mar 26, 2026
Merged

[Fix] OpenMetrics compliance for session_duration metric#14476
mayagbarnes merged 1 commit intodevelopfrom
fix-14432

Conversation

@mayagbarnes
Copy link
Copy Markdown
Collaborator

@mayagbarnes mayagbarnes commented Mar 23, 2026

Describe your changes

Fixes the Prometheus scraping error:

Error scraping target: unit "seconds" not a suffix of metrics "session_duration_seconds_total"

The OpenMetrics spec requires the UNIT value to be a suffix of the MetricFamily name. The session duration counter was using session_duration_seconds_total as both the family name and the sample name, so the declared unit seconds wasn't a valid suffix (it's buried before _total).

What changed

  • Renamed the SESSION_DURATION_FAMILY constant from "session_duration_seconds_total" to "session_duration_seconds".
  • Added an optional sample_name field to CounterStat so counter samples can use a name distinct from the family metadata name. When unset, behavior is unchanged.
  • The session duration producer now sets sample_name="session_duration_seconds_total" so the emitted sample line keeps the required _total suffix while the family metadata uses the canonical name.
  • No changes to the serializers — they already separate metadata (family_name) from sample rendering (to_metric_str()).

Scope notes

  • Only the broken unitful counter (session_duration) is renamed. session_events_total is technically non-canonical per OpenMetrics but does not cause a Prometheus failure, so it is left unchanged to limit the API-visible break.
  • The pre-existing issue of emitting # UNIT lines for unitless metrics is tracked separately.

GitHub Issue Link (if applicable)

Closes #14432

Testing Plan

  • Unit Tests (JS and/or Python): ✅ Added/Updated
  • Manual testing: ✅

@mayagbarnes mayagbarnes added change:bugfix PR contains bug fix implementation impact:users PR changes affect end users labels Mar 23, 2026
@snyk-io
Copy link
Copy Markdown
Contributor

snyk-io bot commented Mar 23, 2026

Snyk checks have passed. No issues have been found so far.

Status Scan Engine Critical High Medium Low Total (0)
Open Source Security 0 0 0 0 0 issues
Licenses 0 0 0 0 0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 23, 2026

✅ PR preview is ready!

Name Link
📦 Wheel file https://core-previews.s3-us-west-2.amazonaws.com/pr-14476/streamlit-1.55.0-py3-none-any.whl
📦 @streamlit/component-v2-lib Download from artifacts
🕹️ Preview app pr-14476.streamlit.app (☁️ Deploy here if not accessible)

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes OpenMetrics unit compliance for the session_duration counter by separating the metric family name (used for # TYPE/# UNIT metadata and protobuf MetricFamily.name) from the emitted counter sample name (which must keep the _total suffix).

Changes:

  • Rename SESSION_DURATION_FAMILY from session_duration_seconds_totalsession_duration_seconds to make the declared unit (seconds) a valid suffix of the family name.
  • Extend CounterStat with an optional sample_name override so emitted samples can differ from family metadata while preserving existing behavior when unset.
  • Update runtime producer and tests to emit session_duration_seconds_total samples under the session_duration_seconds family for both text and protobuf outputs.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated no comments.

Show a summary per file
File Description
lib/streamlit/runtime/stats.py Renames SESSION_DURATION_FAMILY and adds CounterStat.sample_name to decouple family metadata name from emitted sample name.
lib/streamlit/runtime/websocket_session_manager.py Emits the session duration counter with canonical family name + _total sample name override.
lib/tests/streamlit/runtime/stats_test.py Adds coverage for CounterStat.sample_name and verifies default remains None.
lib/tests/streamlit/runtime/websocket_session_manager_test.py Updates assertions to the new family name and validates the _total sample name is still emitted.
lib/tests/streamlit/web/server/stats_handler_test.py Adds/updates endpoint tests to verify canonical family metadata vs _total sample name in both text and protobuf formats.
lib/tests/streamlit/web/server/starlette/starlette_app_test.py Mirrors the same assertions for the Starlette server implementation (text + protobuf).

@mayagbarnes mayagbarnes added the ai-review If applied to PR or issue will run AI review workflow label Mar 24, 2026
@github-actions github-actions bot removed the ai-review If applied to PR or issue will run AI review workflow label Mar 24, 2026
Copy link
Copy Markdown
Contributor

@github-actions github-actions bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Summary

This PR fixes an OpenMetrics compliance error where Prometheus scrapers reject the session_duration_seconds_total metric with: unit "seconds" not a suffix of metrics "session_duration_seconds_total". Per the OpenMetrics spec, the declared UNIT must be a suffix of the MetricFamily name.

The fix renames SESSION_DURATION_FAMILY from "session_duration_seconds_total" to "session_duration_seconds" and introduces an optional sample_name field on CounterStat so the emitted sample line can still use the required _total suffix while the family metadata uses the canonical (unit-suffixed) name.

Reviewer consensus: All three reviewers (claude-4.6-opus-high-thinking, gemini-3.1-pro, gpt-5.3-codex-high) approved the PR unanimously. There were no disagreements on the approach, correctness, or scope of the change.

Code Quality

All reviewers agree the implementation is clean, minimal, and well-structured:

  • Adding sample_name as an optional field on a NamedTuple with a None default preserves full backward compatibility. All existing call sites use keyword arguments, so the new field at the end is safe.
  • to_metric_str() cleanly falls back to family_name when sample_name is None, preserving existing behavior for all other counters.
  • marshall_metric_proto() correctly does not use sample_name — in protobuf format, the family name already identifies the metric family with no separate "sample name" concept.
  • The scope is well-contained: only the broken unitful counter is renamed, and the PR description explains why session_events_total is intentionally left unchanged.

Test Coverage

All reviewers rated test coverage as excellent/strong. The PR adds and updates tests across all layers:

  • stats_test.py: New unit tests for CounterStat.to_metric_str() with sample_name (with and without labels), plus assertion that sample_name is None by default.
  • websocket_session_manager_test.py: All references to the old family name updated; new assertion verifying sample_name is correctly set.
  • stats_handler_test.py: New tests for unitful counter protobuf output and text-format output verifying canonical # TYPE, # UNIT, and sample lines.
  • starlette_app_test.py: New protobuf test and text-format assertions added for the metrics endpoint.

No e2e tests were added, but all reviewers agree the existing unit/integration coverage is sufficient for this backend-only metrics formatting change.

Backwards Compatibility

All reviewers agree the change is backward compatible for normal usage. The observable changes are:

Aspect Before After
# TYPE line session_duration_seconds_total session_duration_seconds
# UNIT line session_duration_seconds_total seconds session_duration_seconds seconds
Sample line session_duration_seconds_total <value> session_duration_seconds_total <value> (unchanged)
Protobuf family name session_duration_seconds_total session_duration_seconds

The sample line is unchanged, so Prometheus scraping rules matching on session_duration_seconds_total continue to work. One reviewer (gpt-5.3-codex-high) noted a minor compatibility consideration: consumers that filter by metric family name (e.g., via families=session_duration_seconds_total) would need to update to families=session_duration_seconds. This is a valid but low-impact observation since the old output was non-compliant and caused Prometheus scraping errors. The SESSION_DURATION_FAMILY constant is in an internal module and not part of the public st.* API.

Security & Risk

All reviewers agree there are no security concerns. Changes are limited to internal metrics reporting plumbing — no new endpoints, no authentication changes, no external dependencies, and no user-facing input handling.

External test recommendation

  • Recommend external_test: No
  • Triggered categories: None
  • Evidence:
    • Changes are limited to internal metrics data model (stats.py) and its consumer (websocket_session_manager.py)
    • No routing, auth, websocket handshake, embedding, asset serving, CORS, or security header changes
    • The /_stcore/metrics endpoint behavior change is purely in metric name/format compliance
  • Suggested external_test focus areas: N/A
  • Confidence: High (unanimous across all three reviewers)
  • Assumptions and gaps: None — the changes are clearly scoped to internal metrics infrastructure with no impact on externally hosted or embedded behavior.

Accessibility

No frontend changes; accessibility is not applicable. All reviewers concur.

Recommendations

  1. Consider calling out the metric-family rename (session_duration_seconds_totalsession_duration_seconds) in release notes for operators using families= filters or protobuf consumers keyed by family name. (Raised by gpt-5.3-codex-high; agreed as good practice by consolidation review.)

Verdict

APPROVED: All three reviewers unanimously approved. The fix is well-scoped, correctly addresses the OpenMetrics unit-suffix compliance issue, has thorough test coverage across all serialization layers, and carries no backward compatibility concerns for normal usage.


This is a consolidated AI review by claude-4.6-opus-high-thinking, synthesizing reviews from: claude-4.6-opus-high-thinking, gemini-3.1-pro, gpt-5.3-codex-high. All expected models completed their reviews successfully.

@mayagbarnes mayagbarnes marked this pull request as ready for review March 24, 2026 04:26
@sfc-gh-lwilby
Copy link
Copy Markdown
Collaborator

Only the broken unitful counter (session_duration) is renamed. session_events_total is technically non-canonical per OpenMetrics but does not cause a Prometheus failure, so it is left unchanged to limit the API-visible break.

I'm wondering if we might as well just update this at the same time to keep the naming compliant with the spec? Non-blocking though.

@mayagbarnes
Copy link
Copy Markdown
Collaborator Author

Only the broken unitful counter (session_duration) is renamed. session_events_total is technically non-canonical per OpenMetrics but does not cause a Prometheus failure, so it is left unchanged to limit the API-visible break.

I'm wondering if we might as well just update this at the same time to keep the naming compliant with the spec? Non-blocking though.

I was also struggling with whether to just get this fully compliant 😅 so opting to compromise and do it as a separate PR

@mayagbarnes mayagbarnes merged commit c6f7e5e into develop Mar 26, 2026
84 checks passed
@mayagbarnes mayagbarnes deleted the fix-14432 branch March 26, 2026 18:48
sfc-gh-mbarnes pushed a commit that referenced this pull request Mar 27, 2026
- Canonicalize counter metric families in `/_stcore/metrics` so metadata
uses OpenMetrics-compliant family names while emitted counter samples
still use the `_total` suffix.
- Tighten text serialization to omit empty `# UNIT` lines and emit `#
HELP <family_name> <help>`.
- Update runtime, server, and e2e metrics tests for the `session_events`
family rename and the canonical family/sample split.

Follow-up to #14476 

### API-visible change
- `families=session_events_total` no longer works.
- Consumers should now request `families=session_events`.
- The emitted sample line remains `session_events_total{...}`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:bugfix PR contains bug fix implementation impact:users PR changes affect end users

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Error from prometheus when scraping /_stcore/metrics

3 participants