Skip to content

Conversation

@hengfeiyang
Copy link
Contributor

@hengfeiyang hengfeiyang commented Jun 22, 2024

  • Full text search default fields should can be always searched
  • Return error if the query wait in queue over TIMEOUT
  • Support user level search work group limit
  • Support cancel multiple search trace_id

New ENV

  • O2_SEARCH_GROUP_USER_LOG_MAX_CONCURRENCY = 1
  • O2_SEARCH_GROUP_USER_SHORT_MAX_CONCURRENCY = 2

Summary by CodeRabbit

  • New Features

    • Added support for canceling multiple queries in the search functionality.
    • Introduced a new user concurrency management feature for search requests.
  • Improvements

    • Enhanced token extraction logic for better handling of access_token cases.
    • Improved the handling and processing of full-text search fields in SQL queries.
  • Bug Fixes

    • Fixed potential issues by removing unnecessary .unwrap() calls in several functions.
    • Corrected the handling of JSON format in authorization string processing.
  • Refactor

    • Refactored various functions to improve control flow and robustness.
    • Updated log messages for better clarity.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Jun 22, 2024

Walkthrough

The recent code revisions focus on improving token management, enhancing search functionalities, and refining concurrency controls in a multi-user context. Key changes include explicit token handling, support for query cancellation, addition of user identification in search requests, and removing potential panics with .unwrap() calls to increase robustness. These amendments collectively aim to make the system more reliable and scalable, particularly in handling complex search operations across different user scenarios.

Changes

File / Group Summary
src/common/utils/auth.rs Refactored the extract_auth_str function to explicitly handle different access_token cases, improving token extraction and control flow.
src/config/src/meta/search.rs Added user_id: None field in SearchRequest initialization.
src/handler/http/auth/validator.rs Modified get_user_email_from_auth_str to change parameter passing method for from_str function.
src/handler/http/request/search/job.rs Introduced a put method import, added functions for handling multiple query cancellations, and refactored query cancellation logic.
src/handler/http/request/search/mod.rs Updated log message to specify "local" queue.
src/handler/http/router/mod.rs Added cancel_multiple_query service in get_service_routes function.
src/infra/src/schema/mod.rs Added constants to imports, modified functions to return Vec<String> instead of Result<Vec<String>>, updated default field handling logic.
src/ingester/src/partition.rs Removed .unwrap() calls after streaming settings functions to prevent potential panics.
src/proto/proto/cluster/search.proto Added a new optional user_id field in the SearchRequest message.
src/service/compact/merge.rs Removed .unwrap() calls in merge_files function to handle streaming settings, ensuring safer execution.
src/service/search/cluster/mod.rs Enhanced search function to manage timeouts, user concurrency, and work group processing for search requests, adding user concurrency checks and a default timeout.
src/service/search/mod.rs Added assignment for req.user_id in search function to include user identification.
src/service/search/sql.rs Refactored handling of full-text search fields, simplifying logic and removing direct SQL_FULL_TEXT_SEARCH_FIELDS usage in query generation.

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 9

Outside diff range and nitpick comments (2)
src/service/search/cluster/mod.rs (1)

Line range hint 655-694: Review lock release and final merge task handling.

The changes properly handle the final merge task and ensure the lock is released. However, consider refactoring to avoid duplication in the lock release logic.

- if user_id.is_some() {
-     work_group.as_ref().unwrap().done(trace_id, user_id).await?;
- }
- work_group.as_ref().unwrap().done(trace_id, None).await?;
+ work_group.as_ref().unwrap().finalize_search(trace_id, user_id).await?;
src/service/search/sql.rs (1)

457-457: Handling of ignore_case logic for full text search.

The handling of ignore_case is done based on the configuration of inverted_index_enabled and whether any fulltext terms are present. This logic seems correct, but adding a comment explaining why this decision is made would improve code readability.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between da9041a and 5785816.

Files selected for processing (13)
  • src/common/utils/auth.rs (1 hunks)
  • src/config/src/meta/search.rs (1 hunks)
  • src/handler/http/auth/validator.rs (1 hunks)
  • src/handler/http/request/search/job.rs (3 hunks)
  • src/handler/http/request/search/mod.rs (1 hunks)
  • src/handler/http/router/mod.rs (1 hunks)
  • src/infra/src/schema/mod.rs (3 hunks)
  • src/ingester/src/partition.rs (1 hunks)
  • src/proto/proto/cluster/search.proto (1 hunks)
  • src/service/compact/merge.rs (1 hunks)
  • src/service/search/cluster/mod.rs (6 hunks)
  • src/service/search/mod.rs (1 hunks)
  • src/service/search/sql.rs (7 hunks)
Files skipped from review due to trivial changes (2)
  • src/config/src/meta/search.rs
  • src/handler/http/request/search/mod.rs
Additional context used
Path-based instructions (10)
src/handler/http/request/search/job.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/ingester/src/partition.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/handler/http/router/mod.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/common/utils/auth.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/infra/src/schema/mod.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/handler/http/auth/validator.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/search/mod.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/search/cluster/mod.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/compact/merge.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/search/sql.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

Gitleaks
src/common/utils/auth.rs

577-577: Identified a HashiCorp Terraform password field, risking unauthorized infrastructure configuration and security breaches. (hashicorp-tf-password)

Additional comments not posted (11)
src/proto/proto/cluster/search.proto (1)

47-47: Addition of user_id field to SearchRequest is a positive change.

This change supports user-level search work group limits, aligning with the PR's objectives to enhance user-specific functionalities.

src/handler/http/request/search/job.rs (3)

18-18: Importing put HTTP method is appropriate.

This is necessary for handling the new put route for canceling multiple queries.


39-48: Ensure proper handling and error messages in cancel_multiple_query.

The function correctly parses and handles multiple trace IDs for cancellation. However, consider adding more specific error messages for different types of failures to enhance debugging and user feedback.
[REFACTOR_SUGGESTion]

-            return Ok(MetaHttpResponse::bad_request(e));
+            return Ok(MetaHttpResponse::bad_request(format!("Failed to parse trace IDs: {}", e)));

75-87: Check for empty trace ID list is crucial for preventing unnecessary operations.

The logic to prevent operations on an empty list of trace IDs is a good safeguard. This prevents the system from proceeding with no actionable items, which is efficient.

src/ingester/src/partition.rs (1)

113-115: Direct use of get_stream_setting_bloom_filter_fields and get_stream_setting_fts_fields simplifies the code.

Removing the unwrap() calls and directly returning the fields as vectors simplifies error handling and makes the code more robust.

src/handler/http/router/mod.rs (1)

387-387: Inclusion of new search-related routes enhances functionality.

Adding routes for canceling multiple queries and checking query statuses aligns with the PR's objectives to improve search handling and concurrency management.

src/handler/http/auth/validator.rs (1)

592-592: Refactor to handle potential errors gracefully in JSON parsing.

The use of unwrap_or_default() in JSON parsing could mask underlying issues with the authentication string format, leading to unexpected behavior.
[REFACTOR_SUGGESTion]

- config::utils::json::from_str(auth_str).unwrap_or_default();
+ match config::utils::json::from_str(auth_str) {
+     Ok(tokens) => tokens,
+     Err(e) => {
+         log::error!("Failed to parse auth tokens: {:?}", e);
+         return None;
+     }
+ }
src/service/search/mod.rs (1)

126-126: Ensure proper handling of Option type for user_id.

Assigning user_id directly without checking if it's None could lead to unexpected behavior or errors in downstream processing where user_id is expected to be a non-null string.
[REFACTOR_SUGGESTion]

- req.user_id = user_id.clone();
+ req.user_id = user_id.clone().unwrap_or_default();

This change ensures that user_id is always a string, even if it's empty, which may be safer depending on how it's used downstream.

src/service/search/cluster/mod.rs (1)

73-75: Ensure default timeout is set correctly.

The code correctly sets a default timeout if none is specified. This aligns with the PR objectives to handle timeouts effectively.

src/service/search/sql.rs (2)

410-410: Optimization of SQL query string construction.

The construction of the select_fields string is done in a way that could be optimized. Using join directly on an array of strings is more efficient and cleaner than concatenating strings in a loop.
[REFACTOR_SUGGESTion]

- let select_fields = "SELECT ".to_string() + &fields.join(",");
+ let select_fields = format!("SELECT {}", fields.join(","));

496-496: Dynamic selection of SQL function for case sensitivity.

The dynamic selection of SQL function based on ignore_case is a clever use of runtime configuration to adjust query behavior. This is a good implementation but ensure that all possible edge cases are handled correctly.

Comment on lines +508 to +519
} else if access_token.starts_with("Basic") || access_token.starts_with("Bearer") {
access_token
} else if access_token.starts_with("session") {
let session_key = access_token.strip_prefix("session ").unwrap().to_string();
match USER_SESSIONS.get(&session_key) {
Some(token) => {
format!("Bearer {}", *token)
}
} else {
format!("Bearer {}", access_token)
None => access_token,
}
} else {
format!("Bearer {}", access_token)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor extract_auth_str for clarity and security.

The function extract_auth_str has been modified to handle different token types, including a session-based token. Consider using a match statement instead of multiple if-else for clarity and maintainability. Additionally, ensure that session tokens are validated securely.

pub fn extract_auth_str(req: &HttpRequest) -> String {
    let auth_token = req.cookie("auth_tokens")
        .map(|cookie| json::from_str::<AuthTokens>(cookie.value()).unwrap_or_default().access_token)
        .unwrap_or_default();

    match auth_token.as_str() {
        "" => req.cookie("auth_ext").map_or_else(|| "".to_string(), |cookie| cookie.value().to_string()),
        "Basic" | "Bearer" => auth_token,
        token if token.starts_with("session ") => {
            let session_key = token.strip_prefix("session ").unwrap().to_string();
            USER_SESSIONs.get(&session_key).map_or(token, |t| format!("Bearer {}", t))
        },
        _ => format!("Bearer {}", auth_token),
    }
}

@hengfeiyang hengfeiyang merged commit 6a17c42 into main Jun 22, 2024
@hengfeiyang hengfeiyang deleted the feat/enhance-query branch June 22, 2024 16:59
for trace_id in trace_ids {
match crate::service::search::cancel_query(trace_id).await {
Ok(status) => res.push(status),
Err(e) => return Ok(MetaHttpResponse::bad_request(e)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe we can skip failed trace_id and process the rest and return the result, instead of returning right away?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. both are okay.

@hengfeiyang
Copy link
Contributor Author

keep improving #2506

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants