Add layer of abstraction to generalize the CacheStat type by vdonato · Pull Request #13265 · streamlit/streamlit

vdonato · 2025-12-08T23:22:10Z

Describe your changes

Currently, the _stcore/metrics endpoint only returns data related to caches, including st.session_state,
@st.cache_data, and @st.cache_resource. In the SiS vNext world, there are various use cases that where
it would be useful to be able to expose more metrics from this endpoint.

We'll be both adding additional metrics and making some changes to the endpoint itself in subsequent
PRs, but as a first step, we'll just add another layer of abstraction to generalize the existing CacheStatsProvider
class.

In this PR, we

add a new Stat protocol with required properties that give us enough information to describe an
OpenMetrics metric family
have the existing CacheStat class implement the new protocol
change the name of CacheStatsProvider to StatsProvider, and edit the signatures of StatsManager
methods to now return a dictionary mapping from metric family name to lists of metrics rather than only
returning lists of CacheStats.
make corresponding downstream changes throughout the codebase so that things play nicely with our
new StatsManager.register_provider() and StatsManager.get_stats() signatures.

We make one relatively small behavioral change to not include metrics comments if we don't have any
metrics to report, but aside from this, we don't make any behavioral changes in this PR.

Testing Plan

Unit Tests
Updated Python unit tests
Any manual testing needed?
curled the metrics endpoint locally to verify that the output looks the same as it did previously.

snyk-io · 2025-12-08T23:22:19Z

✅ Snyk checks have passed. No issues have been found so far.

Status	Scanner	Critical	High	Medium	Low	Total (0)
✅	Open Source Security	0	0	0	0	0 issues
✅	Licenses	0	0	0	0	0 issues

💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse.

github-actions · 2025-12-08T23:22:28Z

✅ PR preview is ready!

Name	Link
📦 Wheel file	https://core-previews.s3-us-west-2.amazonaws.com/pr-13265/streamlit-1.52.1-py3-none-any.whl
📦 `@streamlit/component-v2-lib`	Download from artifacts
🕹️ Preview app	pr-13265.streamlit.app (☁️ Deploy here if not accessible)

lib/streamlit/runtime/stats.py

github-actions · 2025-12-08T23:32:33Z

📉 Frontend coverage change detected

The frontend unit test (vitest) coverage has decreased by 0.0000%

Current PR: 85.9300% (12445 lines, 1751 missed)
Latest develop: 85.9300% (12445 lines, 1751 missed)

✅ Coverage change is within normal range.

📊 View detailed coverage comparison

Copilot

Pull request overview

This PR introduces a layer of abstraction to generalize the metrics system beyond cache-specific statistics. The Stat protocol is added as a general interface for metrics that can be serialized to OpenMetrics format, with CacheStat implementing this protocol. CacheStatsProvider is renamed to StatsProvider, and StatsManager is updated to support multiple metric families by changing its return type from list[CacheStat] to dict[str, Sequence[Stat]].

Key changes:

New Stat protocol defines the interface for metrics with properties for family name, type, unit, and help text
StatsManager.register_provider() now requires a family name parameter, allowing providers to be grouped by metric family
Serialization logic updated to handle multiple metric families and skip empty families

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.

Show a summary per file

File	Description
lib/streamlit/runtime/stats.py	Introduces `Stat` protocol, renames `CacheStatsProvider` to `StatsProvider`, adds `metric_type_string_to_proto()` helper, updates `StatsManager` to support family-based registration
lib/streamlit/runtime/state/session_state.py	Updates `SessionStateStatProvider` to inherit from `StatsProvider`, renames `group_stats` to `group_cache_stats`
lib/streamlit/runtime/runtime.py	Updates all `register_provider()` calls to include `"cache_memory_bytes"` family name parameter
lib/streamlit/runtime/uploaded_file_manager.py	Updates protocol inheritance from `CacheStatsProvider` to `StatsProvider`
lib/streamlit/runtime/memory_uploaded_file_manager.py	Renames `group_stats` to `group_cache_stats`
lib/streamlit/runtime/memory_media_file_storage.py	Updates class inheritance and renames function call
lib/streamlit/runtime/caching/cache_data_api.py	Updates `DataCaches` and helper function to use `StatsProvider` and `group_cache_stats`
lib/streamlit/runtime/caching/cache_resource_api.py	Updates `ResourceCaches` to inherit from `StatsProvider` and renames function call
lib/streamlit/web/server/server.py	Updates `register_provider()` call to include family name parameter
lib/streamlit/web/server/stats_request_handler.py	Updates serialization methods to handle dict of stats by family, with logic to skip empty families
lib/tests/streamlit/runtime/stats_test.py	Updates tests for new API, adds tests for multiple families and protocol implementation
lib/tests/streamlit/web/server/stats_handler_test.py	Updates test expectations to use dict format, changes empty stats behavior expectation
lib/tests/streamlit/runtime/uploaded_file_manager_test.py	Updates comment to reference `StatsProvider`
lib/tests/streamlit/runtime/memory_media_file_storage_test.py	Updates comment to reference `StatsProvider`

lib/streamlit/runtime/stats.py

lib/tests/streamlit/runtime/stats_test.py

lib/streamlit/runtime/caching/cache_data_api.py

lib/streamlit/runtime/stats.py

Copilot · 2025-12-09T01:18:45Z

lib/streamlit/web/server/stats_request_handler.py

    @staticmethod
-    def _stats_to_text(stats: list[CacheStat]) -> str:
-        metric_type = "# TYPE cache_memory_bytes gauge"
-        metric_unit = "# UNIT cache_memory_bytes bytes"
-        metric_help = "# HELP Total memory consumed by a cache."
-        openmetrics_eof = "# EOF\n"
-
-        # Format: header, stats, EOF
-        result = [metric_type, metric_unit, metric_help]
-        result.extend(stat.to_metric_str() for stat in stats)
-        result.append(openmetrics_eof)
-
+    def _stats_to_text(stats_by_family: dict[str, Sequence[Stat]]) -> str:
+        result: list[str] = []
+
+        for stats in stats_by_family.values():
+            if not stats:
+                continue
+
+            # All of the stats in a family will have the same family_name, type,
+            # unit, and help text, so we can just use the first one to construct
+            # our OpenMetrics comments.
+            first_stat = stats[0]
+            result.append(f"# TYPE {first_stat.family_name} {first_stat.type}")
+            result.append(f"# UNIT {first_stat.family_name} {first_stat.unit}")
+            result.append(f"# HELP {first_stat.help}")
+            result.extend(stat.to_metric_str() for stat in stats)
+
+        result.append("# EOF\n")
        return "\n".join(result)


The StatsManager.get_stats() method returns a dictionary where keys are the registration family names, but the actual serialization in _stats_to_text and _stats_to_proto uses first_stat.family_name from the stats themselves. This creates a potential inconsistency: if a provider's stats have a different family_name than the key they're registered under, the output will use the stat's family_name, not the registration key.

Consider either:

Adding validation in register_provider() or get_stats() to ensure all stats from a provider match their registered family name

Using the dictionary key (registration family name) instead of first_stat.family_name when serializing

Documenting this behavior clearly if it's intentional

Currently, the dictionary key from get_stats() is ignored during serialization, which seems like a code smell.

lib/streamlit/runtime/stats.py

lukasmasuch · 2025-12-09T01:21:36Z

@cursor review

cursor

✅ Bugbot reviewed your changes and found no bugs!

lukasmasuch · 2025-12-09T13:16:58Z

lib/streamlit/runtime/runtime.py

+        self._stats_mgr.register_provider("cache_memory_bytes", self._uploaded_file_mgr)
+        self._stats_mgr.register_provider(
+            "cache_memory_bytes", SessionStateStatProvider(self._session_mgr)
+        )


question: The stats from the upload file manager and session manager aren't really cache_memory_bytes or it could be a bit misleading to call it that way since it's just measuring the storage that the session state and memory files take on. Is it already named that way in the current version?

Also, we might want to consider adding the media file storage here, as well as a follow-up

Edit: Looks like this gets registered in server.py. Do you know the reason for this? Maybe it would be good to add a comment here explaining that the media file manager gets registered in a different location (and why)

The names are kept the same as in the current implementation, but we can look into changing them in some of the followups.

I think the only reason why the media file manager gets registered in server.py is because the manager gets created there. It's probably ok to leave it as-is because I think that we'll have more metrics registrations added in various other places as we continue to track additional metrics. I'll add a comment mentioning there may be other registrations that happen outside of runtime.py in a followup PR, though.

lukasmasuch · 2025-12-09T13:18:52Z

lib/streamlit/runtime/stats.py

+    def marshall_metric_proto(self, metric: MetricProto) -> None:
+        """Fill an OpenMetrics `Metric` protobuf object."""
+        pass


question: Do you know why we support protobuf here as well? Is this a common thing to do with OpenMetric endpoints? Do we actually want to support this with the "new" metrics endpoint?

We generally use gRPC to communicate between internal services in the SiS world. I can look into whether it might be worth not doing so if the space savings are totally negligible, but I'd imagine we'd prefer to have the option of doing so for consistency

But is the _stcore/metrics endpoint usable via gRPC in its current state? Isn't it just an HTTP REST endpoint returning protobuf bytes in its current form?

lukasmasuch

LGTM 👍 but check the copilot feedback for valid/relevant aspects.

Two aspects that might be nice to re-validate when redesigning/updating this endpoint (in follow ups):

Do we want/need to support returning protobuf in the endpoint? Is this a common aspect with OpenMetrics, do we want to use this in SiS? I suspect that the space efficiency of protobuf doesn't make a big difference in this case.
The current cache_memory_stats metric name seems a bit incorrect given that we also track media file storage, file upload storage, session state. Should we change to a more generic name, or use more specific names for each type of memory?

github-actions · 2025-12-09T22:19:51Z

Summary

This PR introduces a layer of abstraction to generalize the metrics/stats endpoint infrastructure. The main changes are:

New Stat protocol: A new protocol that defines the interface for stats that can be serialized to OpenMetrics format, with properties for family_name, type, unit, help, and methods for serialization.
Renamed classes and functions:
- CacheStatsProvider → StatsProvider
- group_stats → group_cache_stats
Enhanced StatsManager:
- register_provider() now requires a family_name parameter to group providers by metric family
- get_stats() returns a dictionary mapping family names to sequences of stats
Updated CacheStat: Now implements the Stat protocol with hardcoded values for cache metrics (family_name="cache_memory_bytes", type="gauge", etc.)
Minor behavioral change: When there are no stats, the endpoint now returns only # EOF\n instead of the full header with TYPE, UNIT, and HELP comments.

Code Quality

Strengths

Clean abstraction: The new Stat protocol provides a clear interface for future metric types, enabling extensibility without modifying existing code.
Consistent refactoring: All usages of the renamed classes/functions have been updated throughout the codebase.
Good separation of concerns: The CacheStat class now self-describes its OpenMetrics metadata through protocol properties.
Type safety: Appropriate use of cast() and type annotations throughout.

Minor Observations

Protocol body style (lines 40-66 in lib/streamlit/runtime/stats.py): The Stat protocol uses pass as the method body. While functionally correct, using ... (ellipsis) is more idiomatic for Protocol definitions. This is a minor stylistic point and doesn't affect functionality.
StatsProvider uses @AbstractMethod (line 173 in lib/streamlit/runtime/stats.py): The StatsProvider protocol uses @abstractmethod with raise NotImplementedError. For Protocol classes, @abstractmethod is typically unnecessary. However, this follows the existing code style from the original CacheStatsProvider, so it's consistent.
Type annotation in cache_data_api.py (line 636): The cast cast("Sequence[CacheStat]", self.storage.get_stats()) is needed because StatsProvider.get_stats() returns Sequence[Stat], but the DataCache.get_stats() method returns Sequence[CacheStat]. This is correct but worth noting for future maintenance.

Test Coverage

Unit Tests Updated

✅ lib/tests/streamlit/runtime/stats_test.py:

Updated MockStatsProvider to use new StatsProvider interface
Updated test_get_stats for new dictionary return format
Added test_get_stats_multiple_families to test multiple metric families
Renamed test_group_stats → test_group_cache_stats
Added CacheStatProtocolTest with tests for Stat protocol implementation

✅ lib/tests/streamlit/web/server/stats_handler_test.py:

Updated mock data structure to use dictionary format
Updated test_no_stats to expect only # EOF\n (behavioral change)
Updated test_has_stats and test_protobuf_stats with new data structure

✅ lib/tests/streamlit/runtime/memory_media_file_storage_test.py:

Updated docstring reference from CacheStatsProvider to StatsProvider

✅ lib/tests/streamlit/runtime/uploaded_file_manager_test.py:

Updated docstring reference from CacheStatsProvider to StatsProvider

Test Coverage Assessment

Good coverage: The test_protobuf_stats test indirectly exercises marshall_metric_proto and metric_type_string_to_proto by verifying the complete protobuf output.
Missing direct tests: The metric_type_string_to_proto function isn't directly unit tested for all type mappings (counter, histogram, etc.) and the UNKNOWN fallback case. However, the current usage only requires "gauge", which is covered.
Edge case covered: The test_no_stats test verifies the new behavior when there are no stats.

Backwards Compatibility

Internal API Changes (Low Risk)

The renamed symbols (CacheStatsProvider, group_stats) are internal APIs not exposed in the public st namespace. The risk of breaking external users is minimal because:

These are implementation details in streamlit.runtime.stats
No backward compatibility aliases are provided, but external usage is not expected
All internal usages have been updated

Behavioral Change (Very Low Risk)

The metrics endpoint output format changes when there are no stats:

Before: Returns header comments (# TYPE, # UNIT, # HELP) even with empty stats
After: Returns only # EOF\n when there are no stats

This is a minor improvement and is unlikely to break any metrics consumers since:

The output is still valid OpenMetrics format
Most metrics consumers handle empty metric sets gracefully

Security & Risk

✅ No security concerns identified:

No new endpoints or authentication changes
No changes to input validation or data handling
No exposure of new information in the metrics endpoint

✅ Low regression risk:

Changes are well-scoped to the stats/metrics infrastructure
Existing functionality is preserved with proper test coverage
No changes to protobuf definitions or frontend code

Recommendations

The PR is well-implemented. Minor suggestions for consideration:

Optional: Add direct unit tests for metric_type_string_to_proto - While currently covered indirectly, explicit tests for all type mappings (especially the UNKNOWN fallback) would improve test robustness for future metric types.
Optional: Consider using ... in Protocol bodies - This is more idiomatic Python for Protocol definitions, though pass works correctly.
Documentation note: The PR description mentions this is preparatory work for "SiS vNext". Consider adding a brief code comment in StatsManager or the module docstring explaining the intended extensibility pattern for future contributors.

Verdict

APPROVED: This PR introduces a clean abstraction layer that generalizes the metrics infrastructure without breaking existing functionality. The code is well-structured, properly tested, and ready for merge. The changes set up a solid foundation for adding additional metric types in future PRs.

This is an automated AI review. Please verify the feedback and use your judgment.

sfc-gh-lmasuch

LGTM 👍

) In #13265, we began the process of extending our metrics collection to be more general than just being able to collect cache stats. In this PR, we do a few things to continue the work done in the previous PR: * continue to iterate on the interface to support requesting only specific metrics families. We need to do this because cache stats are naturally slow to compute (we should try to optimize this, but naturally it'll take time to serialize an object to compute its size), and some of the new metrics we'll be adding will be polled for frequently * Add new metrics to track websocket connects, disconnects, reconnects, and the number of active user sessions. * Make some changes to the metrics endpoint itself to allow the caller to request specific metrics families.

vdonato added security-assessment-completed impact:internal PR changes only affect internal code change:feature PR contains new feature or enhancement implementation labels Dec 8, 2025

github-code-quality bot found potential problems Dec 8, 2025

View reviewed changes

Add layer of indirection in front of StatsProvider/StatsManager types

7fa6479

vdonato force-pushed the vdonato/generalize-metrics-endpoint branch from ae7ccff to 7fa6479 Compare December 8, 2025 23:44

vdonato changed the title ~~Add layer of indirection in front of StatsProvider/StatsManager types~~ Add layer of abstraction in front of StatsProvider/StatsManager types Dec 9, 2025

vdonato changed the title ~~Add layer of abstraction in front of StatsProvider/StatsManager types~~ Add layer of abstraction to generalize the CacheStat type Dec 9, 2025

vdonato marked this pull request as ready for review December 9, 2025 01:12

vdonato requested a review from a team as a code owner December 9, 2025 01:12

Copilot AI review requested due to automatic review settings December 9, 2025 01:12

Copilot started reviewing on behalf of vdonato December 9, 2025 01:12 View session

Copilot AI reviewed Dec 9, 2025

View reviewed changes

cursor bot reviewed Dec 9, 2025

View reviewed changes

lukasmasuch reviewed Dec 9, 2025

View reviewed changes

lukasmasuch approved these changes Dec 9, 2025

View reviewed changes

lukasmasuch added the ai-review If applied to PR or issue will run AI review workflow label Dec 9, 2025

github-actions bot removed the ai-review If applied to PR or issue will run AI review workflow label Dec 9, 2025

Address PR feedback

e4aeff2

sfc-gh-lmasuch approved these changes Dec 9, 2025

View reviewed changes

vdonato merged commit c8aca1a into develop Dec 9, 2025
42 checks passed

vdonato deleted the vdonato/generalize-metrics-endpoint branch December 9, 2025 22:53

sfc-gh-vdonato mentioned this pull request Dec 13, 2025

Collect websocket metrics and allow filtering by endpoint family #13324

Merged

Conversation

vdonato commented Dec 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Describe your changes

Testing Plan

Uh oh!

snyk-io bot commented Dec 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ Snyk checks have passed. No issues have been found so far.

Uh oh!

github-actions bot commented Dec 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

✅ PR preview is ready!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

github-actions bot commented Dec 8, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

📉 Frontend coverage change detected

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI Dec 9, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

lukasmasuch commented Dec 9, 2025

Uh oh!

cursor bot left a comment

Choose a reason for hiding this comment

✅ Bugbot reviewed your changes and found no bugs!

Uh oh!

lukasmasuch Dec 9, 2025

Choose a reason for hiding this comment

Uh oh!

lukasmasuch Dec 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vdonato Dec 9, 2025

Choose a reason for hiding this comment

Uh oh!

lukasmasuch Dec 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

vdonato Dec 9, 2025

Choose a reason for hiding this comment

Uh oh!

lukasmasuch Dec 9, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

lukasmasuch left a comment • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

github-actions bot commented Dec 9, 2025

Summary

Code Quality

Strengths

Minor Observations

Test Coverage

Unit Tests Updated

Test Coverage Assessment

Backwards Compatibility

vdonato commented Dec 8, 2025 •

edited

Loading

snyk-io bot commented Dec 8, 2025 •

edited

Loading

github-actions bot commented Dec 8, 2025 •

edited

Loading

github-actions bot commented Dec 8, 2025 •

edited

Loading

lukasmasuch Dec 9, 2025 •

edited

Loading

lukasmasuch Dec 9, 2025 •

edited

Loading

lukasmasuch Dec 9, 2025 •

edited

Loading

lukasmasuch left a comment •

edited

Loading