feat: sampling for search apis and patterns #9207
Failed to generate code suggestions for PR
@oasisk, can you add a description of the API change and how to use it?
Greptile Overview
Greptile Summary
This PR adds sampling support for search APIs and pattern extraction, implementing both a simplified user-facing API (
Key Changes
Backend (Rust):
Frontend (Vue):
Architecture
The implementation follows a clean separation: users specify
Confidence Score: 3/5
Important Files Changed
File Analysis
Sequence Diagram
sequenceDiagram
participant UI as Frontend UI
participant API as Pattern Extract API
participant Stream as Search Streaming
participant SQL as SQL Parser
participant Plan as Query Planner
participant Exec as Query Executor
participant File as File Storage
UI->>API: POST /patterns/extract<br/>(size=-1, sampling_ratio unset)
API->>API: Apply default sampling_ratio<br/>from O2 config if not provided
API->>Stream: process_search_stream_request()<br/>(extract_patterns=true)
Stream->>SQL: Sql::new_with_options()<br/>(extract_patterns flag)
alt Enterprise Feature Enabled
SQL->>SQL: parse_sampling_config()<br/>Convert ratio to SamplingConfig
Note over SQL: Defaults:<br/>- mode: "system" (row group)<br/>- strategy: "stratified"<br/>- strata: auto-calculated
else Community Edition
SQL->>SQL: Log warning about<br/>enterprise feature
end
SQL->>Plan: Build execution plan<br/>(includes sampling_config)
Plan->>Plan: Propagate sampling_config<br/>to RemoteScanNodes
Plan->>Exec: Execute distributed query<br/>(sampling in SearchInfo)
alt Sampling Enabled
Exec->>File: Apply sampling to file_list<br/>(stratified temporal sampling)
Note over File: Row-group-level sampling:<br/>BitVec size == row_group_count
File->>Exec: Sampled file access plans
else No Sampling
Exec->>File: Access all files
File->>Exec: Full access plans
end
Exec->>Exec: Read parquet files<br/>(with ParquetAccessPlan)
Note over Exec: WAL data NEVER sampled<br/>(always full)
Exec->>Stream: Return search results<br/>(scan_records count)
Stream->>Stream: Accumulate results<br/>Track total_scan_records
Stream->>Stream: Extract patterns<br/>from accumulated logs
Stream->>API: PatternExtractionResult
API->>UI: Return patterns + statistics
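The two sampling steps in the diagram (falling back to a default `sampling_ratio`, then building a row-group mask whose length equals `row_group_count`) can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `SamplingConfig` struct shown here, `resolve_ratio`, and `sample_row_groups` are hypothetical names, a plain `Vec<bool>` stands in for the BitVec, and the evenly spaced pick inside each stratum is an assumption made to keep the example deterministic.

```rust
/// Hypothetical stand-in for the sampling settings named in the diagram.
#[derive(Debug, Clone, PartialEq)]
struct SamplingConfig {
    /// Fraction of row groups to keep, in [0.0, 1.0].
    ratio: f64,
    /// Number of contiguous strata the row groups are split into.
    strata: usize,
}

/// Use the ratio from the request when present, otherwise the configured
/// default, and clamp the result into [0.0, 1.0].
fn resolve_ratio(requested: Option<f64>, default_ratio: f64) -> f64 {
    requested.unwrap_or(default_ratio).clamp(0.0, 1.0)
}

/// Build a keep/skip mask with exactly one entry per row group.
/// Row groups are split into contiguous strata and roughly the same
/// fraction is kept in each, so every part of the file stays represented.
fn sample_row_groups(row_group_count: usize, cfg: &SamplingConfig) -> Vec<bool> {
    let mut mask = vec![false; row_group_count];
    if row_group_count == 0 || cfg.ratio <= 0.0 {
        return mask;
    }
    // Never use more strata than there are row groups.
    let strata = cfg.strata.clamp(1, row_group_count);
    for s in 0..strata {
        // Contiguous bounds of stratum `s`.
        let start = s * row_group_count / strata;
        let end = (s + 1) * row_group_count / strata;
        let len = end - start;
        // Keep at least one row group per stratum (an assumption of this sketch).
        let keep = ((len as f64 * cfg.ratio).ceil() as usize).clamp(1, len);
        // Spread the kept row groups evenly across the stratum.
        for i in 0..keep {
            mask[start + i * len / keep] = true;
        }
    }
    mask
}

fn main() {
    let ratio = resolve_ratio(None, 0.5);
    let cfg = SamplingConfig { ratio, strata: 2 };
    let mask = sample_row_groups(8, &cfg);
    println!("{:?}", mask);
}
```

A real implementation would likely pick row groups at random within each stratum and, per the note in the diagram, leave WAL data unsampled.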
27 files reviewed, 2 comments
Auto-generated translation updates from English source file. 🤖 Generated with automated translation workflow
No description provided.