Add layer of abstraction to generalize the CacheStat type#13265
Add layer of abstraction to generalize the CacheStat type#13265
Conversation
✅ Snyk checks have passed. No issues have been found so far.
💻 Catch issues earlier using the plugins for VS Code, JetBrains IDEs, Visual Studio, and Eclipse. |
✅ PR preview is ready!
|
📉 Frontend coverage change detectedThe frontend unit test (vitest) coverage has decreased by 0.0000%
✅ Coverage change is within normal range. |
ae7ccff to
7fa6479
Compare
There was a problem hiding this comment.
Pull request overview
This PR introduces a layer of abstraction to generalize the metrics system beyond cache-specific statistics. The Stat protocol is added as a general interface for metrics that can be serialized to OpenMetrics format, with CacheStat implementing this protocol. CacheStatsProvider is renamed to StatsProvider, and StatsManager is updated to support multiple metric families by changing its return type from list[CacheStat] to dict[str, Sequence[Stat]].
Key changes:
- New
Statprotocol defines the interface for metrics with properties for family name, type, unit, and help text StatsManager.register_provider()now requires a family name parameter, allowing providers to be grouped by metric family- Serialization logic updated to handle multiple metric families and skip empty families
Reviewed changes
Copilot reviewed 14 out of 14 changed files in this pull request and generated 9 comments.
Show a summary per file
| File | Description |
|---|---|
| lib/streamlit/runtime/stats.py | Introduces Stat protocol, renames CacheStatsProvider to StatsProvider, adds metric_type_string_to_proto() helper, updates StatsManager to support family-based registration |
| lib/streamlit/runtime/state/session_state.py | Updates SessionStateStatProvider to inherit from StatsProvider, renames group_stats to group_cache_stats |
| lib/streamlit/runtime/runtime.py | Updates all register_provider() calls to include "cache_memory_bytes" family name parameter |
| lib/streamlit/runtime/uploaded_file_manager.py | Updates protocol inheritance from CacheStatsProvider to StatsProvider |
| lib/streamlit/runtime/memory_uploaded_file_manager.py | Renames group_stats to group_cache_stats |
| lib/streamlit/runtime/memory_media_file_storage.py | Updates class inheritance and renames function call |
| lib/streamlit/runtime/caching/cache_data_api.py | Updates DataCaches and helper function to use StatsProvider and group_cache_stats |
| lib/streamlit/runtime/caching/cache_resource_api.py | Updates ResourceCaches to inherit from StatsProvider and renames function call |
| lib/streamlit/web/server/server.py | Updates register_provider() call to include family name parameter |
| lib/streamlit/web/server/stats_request_handler.py | Updates serialization methods to handle dict of stats by family, with logic to skip empty families |
| lib/tests/streamlit/runtime/stats_test.py | Updates tests for new API, adds tests for multiple families and protocol implementation |
| lib/tests/streamlit/web/server/stats_handler_test.py | Updates test expectations to use dict format, changes empty stats behavior expectation |
| lib/tests/streamlit/runtime/uploaded_file_manager_test.py | Updates comment to reference StatsProvider |
| lib/tests/streamlit/runtime/memory_media_file_storage_test.py | Updates comment to reference StatsProvider |
| @staticmethod | ||
| def _stats_to_text(stats: list[CacheStat]) -> str: | ||
| metric_type = "# TYPE cache_memory_bytes gauge" | ||
| metric_unit = "# UNIT cache_memory_bytes bytes" | ||
| metric_help = "# HELP Total memory consumed by a cache." | ||
| openmetrics_eof = "# EOF\n" | ||
|
|
||
| # Format: header, stats, EOF | ||
| result = [metric_type, metric_unit, metric_help] | ||
| result.extend(stat.to_metric_str() for stat in stats) | ||
| result.append(openmetrics_eof) | ||
|
|
||
| def _stats_to_text(stats_by_family: dict[str, Sequence[Stat]]) -> str: | ||
| result: list[str] = [] | ||
|
|
||
| for stats in stats_by_family.values(): | ||
| if not stats: | ||
| continue | ||
|
|
||
| # All of the stats in a family will have the same family_name, type, | ||
| # unit, and help text, so we can just use the first one to construct | ||
| # our OpenMetrics comments. | ||
| first_stat = stats[0] | ||
| result.append(f"# TYPE {first_stat.family_name} {first_stat.type}") | ||
| result.append(f"# UNIT {first_stat.family_name} {first_stat.unit}") | ||
| result.append(f"# HELP {first_stat.help}") | ||
| result.extend(stat.to_metric_str() for stat in stats) | ||
|
|
||
| result.append("# EOF\n") | ||
| return "\n".join(result) |
There was a problem hiding this comment.
The StatsManager.get_stats() method returns a dictionary where keys are the registration family names, but the actual serialization in _stats_to_text and _stats_to_proto uses first_stat.family_name from the stats themselves. This creates a potential inconsistency: if a provider's stats have a different family_name than the key they're registered under, the output will use the stat's family_name, not the registration key.
Consider either:
- Adding validation in
register_provider()orget_stats()to ensure all stats from a provider match their registered family name - Using the dictionary key (registration family name) instead of
first_stat.family_namewhen serializing - Documenting this behavior clearly if it's intentional
Currently, the dictionary key from get_stats() is ignored during serialization, which seems like a code smell.
|
@cursor review |
| self._stats_mgr.register_provider("cache_memory_bytes", self._uploaded_file_mgr) | ||
| self._stats_mgr.register_provider( | ||
| "cache_memory_bytes", SessionStateStatProvider(self._session_mgr) | ||
| ) |
There was a problem hiding this comment.
question: The stats from the upload file manager and session manager aren't really cache_memory_bytes or it could be a bit misleading to call it that way since it's just measuring the storage that the session state and memory files take on. Is it already named that way in the current version?
There was a problem hiding this comment.
Also, we might want to consider adding the media file storage here, as well as a follow-up
Edit: Looks like this gets registered in server.py. Do you know the reason for this? Maybe it would be good to add a comment here explaining that the media file manager gets registered in a different location (and why)
There was a problem hiding this comment.
The names are kept the same as in the current implementation, but we can look into changing them in some of the followups.
I think the only reason why the media file manager gets registered in server.py is because the manager gets created there. It's probably ok to leave it as-is because I think that we'll have more metrics registrations added in various other places as we continue to track additional metrics. I'll add a comment mentioning there may be other registrations that happen outside of runtime.py in a followup PR, though.
| def marshall_metric_proto(self, metric: MetricProto) -> None: | ||
| """Fill an OpenMetrics `Metric` protobuf object.""" | ||
| pass |
There was a problem hiding this comment.
question: Do you know why we support protobuf here as well? Is this a common thing to do with OpenMetric endpoints? Do we actually want to support this with the "new" metrics endpoint?
There was a problem hiding this comment.
We generally use gRPC to communicate between internal services in the SiS world. I can look into whether it might be worth not doing so if the space savings are totally negligible, but I'd imagine we'd prefer to have the option of doing so for consistency
There was a problem hiding this comment.
But is the _stcore/metrics endpoint usable via gRPC in its current state? Isn't it just an HTTP REST endpoint returning protobuf bytes in its current form?
There was a problem hiding this comment.
LGTM 👍 but check the copilot feedback for valid/relevant aspects.
Two aspects that might be nice to re-validate when redesigning/updating this endpoint (in follow ups):
- Do we want/need to support returning protobuf in the endpoint? Is this a common aspect with OpenMetrics, do we want to use this in SiS? I suspect that the space efficiency of protobuf doesn't make a big difference in this case.
- The current
cache_memory_statsmetric name seems a bit incorrect given that we also track media file storage, file upload storage, session state. Should we change to a more generic name, or use more specific names for each type of memory?
SummaryThis PR introduces a layer of abstraction to generalize the metrics/stats endpoint infrastructure. The main changes are:
Code QualityStrengths
Minor Observations
Test CoverageUnit Tests Updated✅
✅
✅
✅
Test Coverage Assessment
Backwards CompatibilityInternal API Changes (Low Risk)The renamed symbols (
Behavioral Change (Very Low Risk)The metrics endpoint output format changes when there are no stats:
This is a minor improvement and is unlikely to break any metrics consumers since:
Security & Risk✅ No security concerns identified:
✅ Low regression risk:
RecommendationsThe PR is well-implemented. Minor suggestions for consideration:
VerdictAPPROVED: This PR introduces a clean abstraction layer that generalizes the metrics infrastructure without breaking existing functionality. The code is well-structured, properly tested, and ready for merge. The changes set up a solid foundation for adding additional metric types in future PRs. This is an automated AI review. Please verify the feedback and use your judgment. |
) In #13265, we began the process of extending our metrics collection to be more general than just being able to collect cache stats. In this PR, we do a few things to continue the work done in the previous PR: * continue to iterate on the interface to support requesting only specific metrics families. We need to do this because cache stats are naturally slow to compute (we should try to optimize this, but naturally it'll take time to serialize an object to compute its size), and some of the new metrics we'll be adding will be polled for frequently * Add new metrics to track websocket connects, disconnects, reconnects, and the number of active user sessions. * Make some changes to the metrics endpoint itself to allow the caller to request specific metrics families.
Describe your changes
Currently, the
_stcore/metricsendpoint only returns data related to caches, includingst.session_state,@st.cache_data, and@st.cache_resource. In the SiS vNext world, there are various use cases that whereit would be useful to be able to expose more metrics from this endpoint.
We'll be both adding additional metrics and making some changes to the endpoint itself in subsequent
PRs, but as a first step, we'll just add another layer of abstraction to generalize the existing
CacheStatsProviderclass.
In this PR, we
Statprotocol with required properties that give us enough information to describe anOpenMetrics metric family
CacheStatclass implement the new protocolCacheStatsProvidertoStatsProvider, and edit the signatures ofStatsManagermethods to now return a dictionary mapping from metric family name to lists of metrics rather than only
returning lists of
CacheStats.new
StatsManager.register_provider()andStatsManager.get_stats()signatures.We make one relatively small behavioral change to not include metrics comments if we don't have any
metrics to report, but aside from this, we don't make any behavioral changes in this PR.
Testing Plan
Updated Python unit tests
curled the metrics endpoint locally to verify that the output looks the same as it did previously.