feat: add super cluster support for metrics #9225
6c96941 to 5070245
Failed to generate code suggestions for PR
Greptile Overview
Greptile Summary
Implemented metrics super cluster v1 support, which enables cross-region query aggregation: each querier fetches data from both the local and remote regions, and the results are then computed in the leader region.
Key Changes
Issues Found
Confidence Score: 4/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant User
participant LeaderRegion as Leader Region (HTTP Handler)
participant Querier1 as Querier (Region 1)
participant Querier2 as Querier (Region 2)
participant LocalStorage as Local Storage
participant RemoteRegion as Remote Region (gRPC)
User->>LeaderRegion: POST /api/{org}/prometheus/api/v1/query_range
Note over User,LeaderRegion: Query params: query, start, end, step, regions, clusters
LeaderRegion->>LeaderRegion: Parse parameters and check super_cluster enabled
LeaderRegion->>LeaderRegion: Partition time range by querier count
par Parallel Querier Dispatch
LeaderRegion->>Querier1: gRPC Metrics.Query (time partition 1)
LeaderRegion->>Querier2: gRPC Metrics.Query (time partition 2)
end
par Each Querier Processing
Querier1->>LocalStorage: Load local region data
LocalStorage-->>Querier1: Local metrics data
alt Super Cluster Enabled
Querier1->>RemoteRegion: gRPC Metrics.Data (same time range)
RemoteRegion->>RemoteRegion: Query metrics from remote region
RemoteRegion-->>Querier1: Stream metrics data
Querier1->>Querier1: Merge local + remote data
end
Querier1->>Querier1: Execute PromQL computation
Querier1-->>LeaderRegion: Return computed results
Querier2->>LocalStorage: Load local region data
LocalStorage-->>Querier2: Local metrics data
alt Super Cluster Enabled
Querier2->>RemoteRegion: gRPC Metrics.Data (same time range)
RemoteRegion-->>Querier2: Stream metrics data
Querier2->>Querier2: Merge local + remote data
end
Querier2->>Querier2: Execute PromQL computation
Querier2-->>LeaderRegion: Return computed results
end
LeaderRegion->>LeaderRegion: Merge results from all queriers
LeaderRegion->>LeaderRegion: Apply final aggregations
LeaderRegion-->>User: Return final PromQL result
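The "Merge local + remote data" step in the diagram above could look roughly like the following. This is a minimal sketch, not the PR's code: the `Sample` type, the `Labels` alias, and `merge_series` are hypothetical names. It assumes a series is identified by its sorted label pairs and that the local sample is kept when both regions report the same timestamp.

```rust
use std::collections::BTreeMap;

/// One data point: a timestamp and a value (hypothetical type).
#[derive(Debug, Clone, PartialEq)]
pub struct Sample {
    pub timestamp: i64,
    pub value: f64,
}

/// Label pairs sorted by name, used as the identity of a series (assumption).
pub type Labels = Vec<(String, String)>;

/// Merge series loaded locally with series streamed from a remote region.
/// Samples of the same series are combined, sorted by timestamp, and
/// de-duplicated; on a timestamp collision the local sample is kept.
pub fn merge_series(
    local: Vec<(Labels, Vec<Sample>)>,
    remote: Vec<(Labels, Vec<Sample>)>,
) -> BTreeMap<Labels, Vec<Sample>> {
    let mut merged: BTreeMap<Labels, Vec<Sample>> = BTreeMap::new();
    // Local data is appended first, so the stable sort below keeps local
    // samples ahead of remote samples with the same timestamp.
    for (labels, samples) in local.into_iter().chain(remote) {
        merged.entry(labels).or_default().extend(samples);
    }
    for samples in merged.values_mut() {
        samples.sort_by_key(|s| s.timestamp); // stable sort
        samples.dedup_by_key(|s| s.timestamp); // keeps the first of each run
    }
    merged
}
```

Keying by the full label set means series that exist in only one region pass through unchanged, which is probably the common case across regions.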
18 files reviewed, 2 comments
Pull request overview
This PR implements super cluster support for metrics queries, enabling a leader region to fetch and aggregate data from multiple regions. The implementation follows a distributed query pattern where the leader region partitions requests by time range, dispatches them to queriers across all regions, and merges the results.
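As an illustration of the "partitions requests by time range" step, here is a minimal sketch of how a range query could be split across queriers. It is not taken from the PR: `partition_time_range` and its parameters are hypothetical, and it assumes `start`, `end`, and `step` share one time unit and that partition boundaries should fall on evaluation timestamps.

```rust
/// Split the range [start, end] into at most `queriers` contiguous sub-ranges.
/// Each sub-range is (first, last) evaluation timestamp, both inclusive, and
/// every timestamp is `start` plus a multiple of `step`.
/// Hypothetical helper; all values share one time unit (e.g. microseconds).
pub fn partition_time_range(start: i64, end: i64, step: i64, queriers: usize) -> Vec<(i64, i64)> {
    if queriers == 0 || step <= 0 || end < start {
        return Vec::new();
    }
    // Number of evaluation points in the range.
    let points = (end - start) / step + 1;
    // Never create more partitions than there are points.
    let parts = (queriers as i64).min(points);
    let base = points / parts;
    let extra = points % parts; // the first `extra` partitions get one more point
    let mut out = Vec::with_capacity(parts as usize);
    let mut cursor = start;
    for i in 0..parts {
        let n = base + if i < extra { 1 } else { 0 };
        let last = cursor + (n - 1) * step;
        out.push((cursor, last));
        cursor = last + step;
    }
    out
}
```

Aligning boundaries to the step keeps each evaluation timestamp in exactly one partition, so the leader can concatenate the per-querier results without de-duplicating points.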
Key changes:
- Added new gRPC streaming service metrics.data for cross-region data retrieval
- Introduced QueryContext struct to encapsulate query execution parameters
- Added API parameters: search_type, regions, and clusters for region/cluster selection
- Migrated from std::collections::HashSet to hashbrown::HashSet across promql modules
Reviewed changes
Copilot reviewed 21 out of 22 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| src/service/search/super_cluster/leader.rs | Renamed variables from "nodes" to "clusters" for clarity in super cluster context |
| src/service/promql/search/mod.rs | Refactored cache logic to use boolean use_cache instead of negated cache_disabled; added job ID generation from trace_id |
| src/service/promql/search/grpc/mod.rs | Added new data() function for streaming metrics responses with time-range partitioning |
| src/service/promql/search/grpc/wal.rs | Changed label_selector parameter from Option<HashSet> to HashSet |
| src/service/promql/engine.rs | Integrated super cluster data loading with local cluster data; refactored to use QueryContext; updated all tests |
| src/service/promql/exec.rs | Refactored PromqlContext to use new QueryContext struct for better parameter organization |
| src/service/promql/utils.rs | Simplified apply_label_selector to accept HashSet directly instead of Option<HashSet> |
| src/service/promql/mod.rs | Added new fields to MetricsQueryRequest for super cluster support |
| src/service/alerts/mod.rs | Updated alert evaluation to check for super cluster configuration |
| src/proto/proto/cluster/metrics.proto | Added new gRPC Data streaming method and new fields for super cluster configuration |
| src/proto/src/generated/cluster.rs | Generated code from proto changes |
| src/handler/grpc/request/metrics/querier.rs | Implemented new data() gRPC method for streaming metrics data |
| src/handler/grpc/mod.rs | Updated conversion logic for MetricsQueryRequest |
| src/handler/http/request/promql/mod.rs | Added super cluster detection and new API parameters handling |
| src/config/src/meta/promql/value.rs | Added QueryContext struct for query execution parameters |
| src/config/src/meta/promql/mod.rs | Added custom deserializer for comma-separated or array regions/clusters parameters |
| src/config/src/meta/cluster.rs | Added is_local() method to NodeInfo trait |
| src/config/src/cluster.rs | Refactored to extract get_local_http_addr() and get_local_grpc_addr() helper functions |
| src/common/infra/cluster/nats.rs | Used new helper functions for consistent address generation |
| src/infra/src/table/users.rs | Made user inserts idempotent by treating unique constraint violations as success |
| src/infra/src/table/organizations.rs | Made organization inserts idempotent by treating unique constraint violations as success |
| src/infra/src/table/org_users.rs | Made org_user inserts idempotent by treating unique constraint violations as success |
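
The last three rows describe making inserts idempotent by treating unique constraint violations as success. The pattern can be sketched as below. This is an assumption-laden illustration only: `DbError` and `make_idempotent` are hypothetical names, and the real code would match on the error type of its database driver instead.

```rust
/// Hypothetical database error type; real drivers expose their own.
#[derive(Debug, PartialEq)]
pub enum DbError {
    UniqueViolation,
    Other(String),
}

/// Treat "row already exists" as success so the insert can be retried safely.
/// Any other error is passed through unchanged.
pub fn make_idempotent(result: Result<(), DbError>) -> Result<(), DbError> {
    match result {
        Ok(()) | Err(DbError::UniqueViolation) => Ok(()),
        Err(other) => Err(other),
    }
}
```

Only the unique-violation case is swallowed, so connection failures and other errors still reach the caller.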
Metrics super cluster v1
This version implements a simple solution: the leader region fetches data from the other regions, and the result is computed only in the leader region.
The main logic is:
What we changed
metrics.data, which allows you to get metrics data from other regions.
API changes
Form parameters
Field descriptions:
- query
- start
- end
- step: duration (e.g. 15s, 1m) or float number of seconds
- timeout: duration (e.g. 30s, 1m)
- use_cache
- *use_streaming
- *search_type
- *regions: use , for multiple regions, e.g. c1,c2
- *clusters: use , for multiple clusters, e.g. c1,c2
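
For the regions and clusters fields, a comma-separated value such as c1,c2 has to be split into individual names. The helper below is a minimal sketch of that step. `parse_name_list` is a hypothetical name, and trimming whitespace and dropping empty entries are assumptions, not behavior confirmed by the PR, which handles this with a custom deserializer.

```rust
/// Split a comma-separated form value such as "c1,c2" into names,
/// trimming whitespace and dropping empty entries.
pub fn parse_name_list(raw: &str) -> Vec<String> {
    raw.split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(str::to_string)
        .collect()
}
```

Under these assumptions an empty value yields an empty list, which a caller could interpret as "no region filter".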