feat: support user level search work group #3790

hengfeiyang · 2024-06-22T16:33:56Z

Full text search default fields should can be always searched
Return error if the query wait in queue over TIMEOUT
Support user level search work group limit
Support cancel multiple search trace_id

New ENV

O2_SEARCH_GROUP_USER_LOG_MAX_CONCURRENCY = 1
O2_SEARCH_GROUP_USER_SHORT_MAX_CONCURRENCY = 2

Summary by CodeRabbit

New Features
- Added support for canceling multiple queries in the search functionality.
- Introduced a new user concurrency management feature for search requests.
Improvements
- Enhanced token extraction logic for better handling of access_token cases.
- Improved the handling and processing of full-text search fields in SQL queries.
Bug Fixes
- Fixed potential issues by removing unnecessary .unwrap() calls in several functions.
- Corrected the handling of JSON format in authorization string processing.
Refactor
- Refactored various functions to improve control flow and robustness.
- Updated log messages for better clarity.

coderabbitai · 2024-06-22T16:34:07Z

Walkthrough

The recent code revisions focus on improving token management, enhancing search functionalities, and refining concurrency controls in a multi-user context. Key changes include explicit token handling, support for query cancellation, addition of user identification in search requests, and removing potential panics with .unwrap() calls to increase robustness. These amendments collectively aim to make the system more reliable and scalable, particularly in handling complex search operations across different user scenarios.

Changes

File / Group	Summary
`src/common/utils/auth.rs`	Refactored the `extract_auth_str` function to explicitly handle different `access_token` cases, improving token extraction and control flow.
`src/config/src/meta/search.rs`	Added `user_id: None` field in `SearchRequest` initialization.
`src/handler/http/auth/validator.rs`	Modified `get_user_email_from_auth_str` to change parameter passing method for `from_str` function.
`src/handler/http/request/search/job.rs`	Introduced a `put` method import, added functions for handling multiple query cancellations, and refactored query cancellation logic.
`src/handler/http/request/search/mod.rs`	Updated log message to specify "local" queue.
`src/handler/http/router/mod.rs`	Added `cancel_multiple_query` service in `get_service_routes` function.
`src/infra/src/schema/mod.rs`	Added constants to imports, modified functions to return `Vec<String>` instead of `Result<Vec<String>>`, updated default field handling logic.
`src/ingester/src/partition.rs`	Removed `.unwrap()` calls after streaming settings functions to prevent potential panics.
`src/proto/proto/cluster/search.proto`	Added a new optional `user_id` field in the `SearchRequest` message.
`src/service/compact/merge.rs`	Removed `.unwrap()` calls in `merge_files` function to handle streaming settings, ensuring safer execution.
`src/service/search/cluster/mod.rs`	Enhanced `search` function to manage timeouts, user concurrency, and work group processing for search requests, adding user concurrency checks and a default timeout.
`src/service/search/mod.rs`	Added assignment for `req.user_id` in `search` function to include user identification.
`src/service/search/sql.rs`	Refactored handling of full-text search fields, simplifying logic and removing direct `SQL_FULL_TEXT_SEARCH_FIELDS` usage in query generation.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share

Tips

Chat

There are 3 ways to chat with CodeRabbit:

Review comments: Directly reply to a review comment made by CodeRabbit. Example:
- I pushed a fix in commit <commit_id>.
- Generate unit testing code for this file.
- Open a follow-up GitHub issue for this discussion.
Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
- @coderabbitai generate unit testing code for this file.
- @coderabbitai modularize this function.
PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
- @coderabbitai generate interesting stats about this repository and render them as a table.
- @coderabbitai show all the console.log statements in this repository.
- @coderabbitai read src/utils.ts and generate unit testing code.
- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

@coderabbitai pause to pause the reviews on a PR.
@coderabbitai resume to resume the paused reviews.
@coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
@coderabbitai full review to do a full review from scratch and review all the files again.
@coderabbitai summary to regenerate the summary of the PR.
@coderabbitai resolve resolve all the CodeRabbit review comments.
@coderabbitai configuration to show the current CodeRabbit configuration for the repository.
@coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (`.coderabbit.yaml`)

You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
Please see the configuration documentation for more information.
If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

Visit our Documentation for detailed information on how to use CodeRabbit.
Join our Discord Community to get help, request features, and share feedback.
Follow us on X/Twitter for updates and announcements.

coderabbitai

Actionable comments posted: 9

Outside diff range and nitpick comments (2)

src/service/search/cluster/mod.rs (1)
Line range hint 655-694: Review lock release and final merge task handling.

The changes properly handle the final merge task and ensure the lock is released. However, consider refactoring to avoid duplication in the lock release logic.
- if user_id.is_some() {
-     work_group.as_ref().unwrap().done(trace_id, user_id).await?;
- }
- work_group.as_ref().unwrap().done(trace_id, None).await?;
+ work_group.as_ref().unwrap().finalize_search(trace_id, user_id).await?;
src/service/search/sql.rs (1)

457-457: Handling of ignore_case logic for full text search.

The handling of ignore_case is done based on the configuration of inverted_index_enabled and whether any fulltext terms are present. This logic seems correct, but adding a comment explaining why this decision is made would improve code readability.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between da9041a and 5785816.

Files selected for processing (13)

src/common/utils/auth.rs (1 hunks)
src/config/src/meta/search.rs (1 hunks)
src/handler/http/auth/validator.rs (1 hunks)
src/handler/http/request/search/job.rs (3 hunks)
src/handler/http/request/search/mod.rs (1 hunks)
src/handler/http/router/mod.rs (1 hunks)
src/infra/src/schema/mod.rs (3 hunks)
src/ingester/src/partition.rs (1 hunks)
src/proto/proto/cluster/search.proto (1 hunks)
src/service/compact/merge.rs (1 hunks)
src/service/search/cluster/mod.rs (6 hunks)
src/service/search/mod.rs (1 hunks)
src/service/search/sql.rs (7 hunks)

Files skipped from review due to trivial changes (2)

src/config/src/meta/search.rs
src/handler/http/request/search/mod.rs

Additional context used

Path-based instructions (10)

src/handler/http/request/search/job.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/ingester/src/partition.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/handler/http/router/mod.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/common/utils/auth.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/infra/src/schema/mod.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/handler/http/auth/validator.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/search/mod.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/search/cluster/mod.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/compact/merge.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/search/sql.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

Gitleaks

src/common/utils/auth.rs

577-577: Identified a HashiCorp Terraform password field, risking unauthorized infrastructure configuration and security breaches. (hashicorp-tf-password)

Additional comments not posted (11)

src/proto/proto/cluster/search.proto (1)

47-47: Addition of user_id field to SearchRequest is a positive change.

This change supports user-level search work group limits, aligning with the PR's objectives to enhance user-specific functionalities.
src/handler/http/request/search/job.rs (3)
18-18: Importing put HTTP method is appropriate.

This is necessary for handling the new put route for canceling multiple queries.

39-48: Ensure proper handling and error messages in cancel_multiple_query.

The function correctly parses and handles multiple trace IDs for cancellation. However, consider adding more specific error messages for different types of failures to enhance debugging and user feedback.
[REFACTOR_SUGGESTion]
-            return Ok(MetaHttpResponse::bad_request(e));
+            return Ok(MetaHttpResponse::bad_request(format!("Failed to parse trace IDs: {}", e)));
75-87: Check for empty trace ID list is crucial for preventing unnecessary operations.

The logic to prevent operations on an empty list of trace IDs is a good safeguard. This prevents the system from proceeding with no actionable items, which is efficient.
src/ingester/src/partition.rs (1)

113-115: Direct use of get_stream_setting_bloom_filter_fields and get_stream_setting_fts_fields simplifies the code.

Removing the unwrap() calls and directly returning the fields as vectors simplifies error handling and makes the code more robust.

src/handler/http/router/mod.rs (1)

387-387: Inclusion of new search-related routes enhances functionality.

Adding routes for canceling multiple queries and checking query statuses aligns with the PR's objectives to improve search handling and concurrency management.
src/handler/http/auth/validator.rs (1)
592-592: Refactor to handle potential errors gracefully in JSON parsing.

The use of unwrap_or_default() in JSON parsing could mask underlying issues with the authentication string format, leading to unexpected behavior.
[REFACTOR_SUGGESTion]
- config::utils::json::from_str(auth_str).unwrap_or_default();
+ match config::utils::json::from_str(auth_str) {
+     Ok(tokens) => tokens,
+     Err(e) => {
+         log::error!("Failed to parse auth tokens: {:?}", e);
+         return None;
+     }
+ }
src/service/search/mod.rs (1)
126-126: Ensure proper handling of Option type for user_id.

Assigning user_id directly without checking if it's None could lead to unexpected behavior or errors in downstream processing where user_id is expected to be a non-null string.
[REFACTOR_SUGGESTion]
- req.user_id = user_id.clone();
+ req.user_id = user_id.clone().unwrap_or_default();
This change ensures that user_id is always a string, even if it's empty, which may be safer depending on how it's used downstream.
src/service/search/cluster/mod.rs (1)

73-75: Ensure default timeout is set correctly.

The code correctly sets a default timeout if none is specified. This aligns with the PR objectives to handle timeouts effectively.
src/service/search/sql.rs (2)
410-410: Optimization of SQL query string construction.

The construction of the select_fields string is done in a way that could be optimized. Using join directly on an array of strings is more efficient and cleaner than concatenating strings in a loop.
[REFACTOR_SUGGESTion]
- let select_fields = "SELECT ".to_string() + &fields.join(",");
+ let select_fields = format!("SELECT {}", fields.join(","));
496-496: Dynamic selection of SQL function for case sensitivity.

The dynamic selection of SQL function based on ignore_case is a clever use of runtime configuration to adjust query behavior. This is a good implementation but ensure that all possible edge cases are handled correctly.

src/service/compact/merge.rs

src/service/search/cluster/mod.rs

coderabbitai · 2024-06-22T16:42:10Z

src/common/utils/auth.rs

+        } else if access_token.starts_with("Basic") || access_token.starts_with("Bearer") {
+            access_token
+        } else if access_token.starts_with("session") {
+            let session_key = access_token.strip_prefix("session ").unwrap().to_string();
+            match USER_SESSIONS.get(&session_key) {
+                Some(token) => {
+                    format!("Bearer {}", *token)
                }
-            } else {
-                format!("Bearer {}", access_token)
+                None => access_token,
            }
+        } else {
+            format!("Bearer {}", access_token)


Refactor extract_auth_str for clarity and security.

The function extract_auth_str has been modified to handle different token types, including a session-based token. Consider using a match statement instead of multiple if-else for clarity and maintainability. Additionally, ensure that session tokens are validated securely.

pub fn extract_auth_str(req: &HttpRequest) -> String { let auth_token = req.cookie("auth_tokens") .map(|cookie| json::from_str::<AuthTokens>(cookie.value()).unwrap_or_default().access_token) .unwrap_or_default(); match auth_token.as_str() { "" => req.cookie("auth_ext").map_or_else(|| "".to_string(), |cookie| cookie.value().to_string()), "Basic" | "Bearer" => auth_token, token if token.starts_with("session ") => { let session_key = token.strip_prefix("session ").unwrap().to_string(); USER_SESSIONs.get(&session_key).map_or(token, |t| format!("Bearer {}", t)) }, _ => format!("Bearer {}", auth_token), } }

src/service/search/cluster/mod.rs

src/service/search/sql.rs

src/infra/src/schema/mod.rs

src/service/search/sql.rs

taimingl · 2024-06-22T19:56:39Z

src/handler/http/request/search/job.rs

+    for trace_id in trace_ids {
+        match crate::service::search::cancel_query(trace_id).await {
+            Ok(status) => res.push(status),
+            Err(e) => return Ok(MetaHttpResponse::bad_request(e)),


maybe we can skip failed trace_id and process the rest and return the result, instead of returning right away?

Yep. both are okay.

hengfeiyang · 2024-06-24T04:10:05Z

keep improving #2506

hengfeiyang added 4 commits June 21, 2024 16:15

feat: full text fields always include default fields

4a9d6fa

feat: return error if the query wait in queue over timeout

5c1c97d

feat: support user search work group

f6a72dc

fix: user quota on search group

96ba696

github-actions bot added the ✏️ Feature label Jun 22, 2024

hengfeiyang requested review from haohuaijin, oasisk and taimingl June 22, 2024 16:35

Merge branch 'main' into feat/enhance-query

5785816

coderabbitai bot reviewed Jun 22, 2024

View reviewed changes

hengfeiyang merged commit 6a17c42 into main Jun 22, 2024

hengfeiyang deleted the feat/enhance-query branch June 22, 2024 16:59

taimingl reviewed Jun 22, 2024

View reviewed changes

taimingl pushed a commit that referenced this pull request Jul 12, 2024

feat: support user level search work group (#3790)

1fda6ea

This was referenced Oct 1, 2024

perf: query file_list in parallels by day #4669

Merged

fix: search history should not contain the search related to usage org #4673

Merged

This was referenced Oct 9, 2024

fix: report appropriate stream type #4744

Merged

feat: show errors of alerts #4445

Closed

fix: query start with work group #4772

Merged

feat: support partital result if some node return error #4778

Merged

feat: add short and long to usage report #4784

Merged

coderabbitai bot mentioned this pull request Oct 25, 2024

chore: add search type to query manager #4912

Merged

coderabbitai bot mentioned this pull request Nov 1, 2024

feat: refactor merge file on compactor #4971

Merged

This was referenced Dec 20, 2024

feat: add grpc search endpoint #5425

Merged

feat: websocket search & result_cache_v2 #4966

Merged

This was referenced Jan 3, 2025

Release kms feature #5545

Closed

feat: batch read data to avoid metrics OOM #5584

Merged

coderabbitai bot mentioned this pull request Jan 13, 2025

feat: ignore time_range when join on enrichment_tables #5684

Merged

2 tasks

coderabbitai bot mentioned this pull request Jan 21, 2025

feat: add cipher keys support for encryption/decryption #5683

Merged

coderabbitai bot mentioned this pull request Jan 31, 2025

fix: search canceled for super cluster #5848

Merged

feat: support user level search work group #3790

feat: support user level search work group #3790

Uh oh!

Conversation

hengfeiyang commented Jun 22, 2024 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

New ENV

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Jun 22, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Chat

CodeRabbit Commands (invoked as PR comments)

CodeRabbit Configration File (.coderabbit.yaml)

Documentation and Community

Uh oh!

coderabbitai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai bot Jun 22, 2024

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

taimingl Jun 22, 2024

Choose a reason for hiding this comment

Uh oh!

hengfeiyang Jun 23, 2024

Choose a reason for hiding this comment

Uh oh!

hengfeiyang commented Jun 24, 2024

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

hengfeiyang commented Jun 22, 2024 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Jun 22, 2024 •

edited

Loading

CodeRabbit Configration File (`.coderabbit.yaml`)