
Conversation

hengfeiyang (Contributor) commented Aug 1, 2024

We now use SQL planning to execute the query.

We split the PhysicalPlan into two stages:

  1. Partial plan: it can be distributed, so it runs on all the nodes.
  2. Final plan: it can't be distributed, so it runs on the leader querier and merges the results (see the sketch after this list).
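
A toy sketch of the split (hypothetical node names; the real logic is the get_partial_plan/get_final_plan pair in src/service/search/datafusion/plan.rs, reviewed below). The rule assumed here is that the cut happens at the first operator that cannot run independently per node:

// Toy stand-ins for DataFusion's ExecutionPlan tree.
#[derive(Debug)]
enum Node {
    Scan,
    Filter(Box<Node>),
    PartialAgg(Box<Node>),
    FinalAgg(Box<Node>),
}

/// Strip the non-distributable operators off the top: whatever remains can
/// run independently on every querier (the "partial plan").
fn partial_plan(plan: &Node) -> &Node {
    match plan {
        Node::FinalAgg(child) => partial_plan(child),
        other => other,
    }
}

fn main() {
    let plan = Node::FinalAgg(Box::new(Node::PartialAgg(Box::new(Node::Filter(
        Box::new(Node::Scan),
    )))));
    // Every querier runs this subtree over its own data; the leader then
    // runs the stripped-off top over the merged partial results (elided).
    println!("partial stage: {:?}", partial_plan(&plan));
}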

Summary by CodeRabbit

  • New Features

    • Updated to version 0.10.9, introducing enhancements and new dependencies for improved data processing functionalities.
    • Added new Protocol Buffers schemas to define structures for cluster management and data processing frameworks, enhancing integration capabilities.
  • Bug Fixes

    • Improved error handling in configuration to prevent runtime panics.
    • Enhanced logic in caching mechanisms for more reliable query responses.
  • Documentation

    • Added module declarations to improve organization and clarity within the codebase.
  • Refactor

    • Streamlined execution plans and SQL processing for better performance and maintainability.
  • Tests

    • Introduced new functions for log ingestion in test suites to enhance reliability and coverage.
  • Style

    • Made formatting adjustments across various files to improve readability and consistency.

coderabbitai bot (Contributor) commented Aug 1, 2024

Warning

Rate limit exceeded

@hengfeiyang has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 0 minutes and 0 seconds before requesting another review.

How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

Commits

Files that changed from the base of the PR and between 4a92663 and 6a9849e.

Walkthrough

The recent changes span multiple modules in the project, focusing on enhancing functionality, improving error handling, and refining code structure. Notable updates include a version increment, the introduction of new dependencies, and significant changes to SQL processing and execution plan management. These modifications aim to enhance performance, maintainability, and the overall robustness of the application.

Changes

Files Change Summary
Cargo.toml Version updated from 0.10.8 to 0.10.9. Introduced datafusion-proto as a new workspace dependency.
src/config/src/config.rs Improved get_instance_id function to handle missing keys safely, preventing panics.
src/proto/Cargo.toml Added new dependencies: async-trait, datafusion, datafusion-proto, and tokio, all marked with workspace = true.
src/proto/build.rs Enhanced build process for protocol buffers, including improved file handling and writing operations.
src/proto/proto/cluster/plan.proto Defined a new Protocol Buffers schema for a cluster management system with NewEmptyExecNode message.
src/proto/proto/datafusion/datafusion.proto Created a comprehensive schema for DataFusion's logical and physical query planning.
src/proto/src/datafusion/codec.rs Introduced codecs for encoding and decoding execution plans, enhancing the DataFusion framework's capabilities (a sketch follows the table).
src/proto/src/datafusion/emptyexec.rs Implemented NewEmptyExec struct for handling empty datasets within query plans.
src/proto/src/datafusion/emptytable.rs Added NewEmptyTable struct for testing and plan generation without actual data.
src/proto/src/datafusion/mod.rs Organized module structure by declaring sub-modules for codec, emptyexec, and emptytable.
src/proto/src/generated/mod.rs Introduced module declarations for cluster and prometheus, centralizing related functionalities.
src/proto/src/generated/prometheus.rs Defined data structures for interacting with Prometheus using Protocol Buffers.
src/proto/src/lib.rs Streamlined module organization by removing old RPC modules and re-exporting them under a new generated module.
src/service/search/cache/cacher.rs Enhanced cache checking logic to improve handling of cached query responses.
src/service/search/cache/mod.rs Updated search function by removing &parsed_sql parameter, simplifying its interface.
src/service/search/sql.rs Removed static regex patterns and simplified subquery handling for enhanced maintainability.
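
A hedged sketch of what the codec.rs hooks might look like, assuming datafusion-proto's PhysicalExtensionCodec trait; the actual downcasting to NewEmptyExec and the prost round-trip are left as stubs:

use std::{fmt::Debug, sync::Arc};

use datafusion::{
    error::{DataFusionError, Result},
    execution::FunctionRegistry,
    physical_plan::ExecutionPlan,
};
use datafusion_proto::physical_plan::PhysicalExtensionCodec;

// Codec for custom nodes (e.g. NewEmptyExec) so physical plans can be
// shipped between queriers as protobuf.
#[derive(Debug)]
struct EmptyExecCodec;

impl PhysicalExtensionCodec for EmptyExecCodec {
    fn try_decode(
        &self,
        buf: &[u8],
        _inputs: &[Arc<dyn ExecutionPlan>],
        _registry: &dyn FunctionRegistry,
    ) -> Result<Arc<dyn ExecutionPlan>> {
        // Decode the prost-generated NewEmptyExecNode message and rebuild
        // the corresponding NewEmptyExec (omitted in this sketch).
        Err(DataFusionError::NotImplemented(format!(
            "decode {} bytes into NewEmptyExec",
            buf.len()
        )))
    }

    fn try_encode(&self, _node: Arc<dyn ExecutionPlan>, _buf: &mut Vec<u8>) -> Result<()> {
        // Downcast to NewEmptyExec and append its protobuf encoding to the
        // buffer (omitted in this sketch).
        Err(DataFusionError::NotImplemented("encode NewEmptyExec".to_string()))
    }
}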

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server
    participant Database

    Client->>Server: Send Search Request
    Server->>Database: Execute SQL Query
    Database-->>Server: Return Results
    Server->>Client: Send Search Results
sequenceDiagram
    participant Client
    participant Server
    participant DataFusion

    Client->>Server: Request Data
    Server->>DataFusion: Create Execution Plan
    DataFusion-->>Server: Return Plan
    Server->>Client: Send Data

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai generate interesting stats about this repository and render them as a table.
    • @coderabbitai show all the console.log statements in this repository.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (invoked as PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve to resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Additionally, you can add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 0

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between db1ca05 and e1ac886.

Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
Files selected for processing (20)
  • Cargo.toml (1 hunks)
  • src/config/src/meta/search.rs (1 hunks)
  • src/config/src/meta/sql.rs (8 hunks)
  • src/proto/proto/cluster/common.proto (1 hunks)
  • src/proto/proto/cluster/search.proto (2 hunks)
  • src/service/promql/search/grpc/storage.rs (4 hunks)
  • src/service/promql/search/grpc/wal.rs (3 hunks)
  • src/service/search/cluster/grpc.rs (1 hunks)
  • src/service/search/cluster/http.rs (1 hunks)
  • src/service/search/cluster/mod.rs (5 hunks)
  • src/service/search/datafusion/exec.rs (18 hunks)
  • src/service/search/datafusion/mod.rs (1 hunks)
  • src/service/search/datafusion/plan.rs (1 hunks)
  • src/service/search/datafusion/table_provider/memtable.rs (1 hunks)
  • src/service/search/datafusion/table_provider/mod.rs (6 hunks)
  • src/service/search/grpc/mod.rs (6 hunks)
  • src/service/search/grpc/storage.rs (8 hunks)
  • src/service/search/grpc/wal.rs (12 hunks)
  • src/service/search/mod.rs (4 hunks)
  • src/service/search/sql.rs (5 hunks)
Files skipped from review due to trivial changes (3)
  • Cargo.toml
  • src/proto/proto/cluster/common.proto
  • src/proto/proto/cluster/search.proto
Additional context used
Path-based instructions (17)

The same path-based instruction applies to each of the 17 Rust files under review (src/service/search/datafusion/mod.rs, src/service/search/cluster/grpc.rs, src/service/search/datafusion/table_provider/memtable.rs, src/service/search/datafusion/plan.rs, src/service/search/cluster/http.rs, src/service/search/grpc/mod.rs, src/service/promql/search/grpc/storage.rs, src/service/promql/search/grpc/wal.rs, src/service/search/datafusion/table_provider/mod.rs, src/service/search/grpc/storage.rs, src/config/src/meta/search.rs, src/service/search/grpc/wal.rs, src/service/search/datafusion/exec.rs, src/service/search/mod.rs, src/config/src/meta/sql.rs, src/service/search/cluster/mod.rs, src/service/search/sql.rs):

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

Additional comments not posted (63)
src/service/search/datafusion/mod.rs (1)

20-20: Module renaming appears appropriate.

The module rewrite has been renamed to plan. Ensure that this renaming aligns with the new functionality or focus of the module.

src/service/search/cluster/grpc.rs (5)

58-59: Improved control flow.

The check for non-empty merge_results simplifies the control flow.


64-66: Buffer initialization and writer instantiation.

The buffer buf is correctly initialized, and the FileWriter is instantiated with the buffer and schema.


70-75: Error handling for writing batches.

The error handling for writing batches is robust, logging errors instead of using unwrap().


78-83: Error handling for finishing the writer.

The error handling for finishing the writer is improved, logging errors instead of using unwrap().


84-86: Buffer assignment.

The buffer hits_buf is correctly assigned if writer.into_inner() is successful.

src/service/search/datafusion/table_provider/memtable.rs (5)

1-14: License and copyright information.

The file includes the correct license and copyright information.


16-30: Imports and dependencies.

The necessary imports and dependencies are correctly included.


34-38: Struct definition.

The NewMemTable struct is defined with mem_table and diff_rules fields.


40-52: Constructor implementation.

The try_new constructor correctly initializes the NewMemTable struct.


55-117: TableProvider implementation.

The TableProvider trait is correctly implemented for NewMemTable, including methods for schema, constraints, table type, scan, insert_into, and get_column_default.
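
A hedged sketch of the wrapper those comments describe (the field and constructor names come from the review; the TableProvider impl itself, which delegates to the inner MemTable and applies the cast rules during scan, is elided):

use std::collections::HashMap;

use datafusion::{
    arrow::{datatypes::{DataType, SchemaRef}, record_batch::RecordBatch},
    datasource::MemTable,
    error::Result,
};

// A MemTable plus the per-field cast rules that scan() later applies.
pub struct NewMemTable {
    mem_table: MemTable,
    diff_rules: HashMap<String, DataType>,
}

impl NewMemTable {
    // Mirrors MemTable::try_new, additionally capturing the cast rules.
    pub fn try_new(
        schema: SchemaRef,
        partitions: Vec<Vec<RecordBatch>>,
        rules: HashMap<String, DataType>,
    ) -> Result<Self> {
        Ok(Self {
            mem_table: MemTable::try_new(schema, partitions)?,
            diff_rules: rules,
        })
    }
}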

src/service/search/datafusion/plan.rs (5)

36-78: LGTM! Recursive logic is clear.

The recursive logic for processing the execution plan is clear and well-implemented.


80-137: LGTM! Logic for handling distributed plans and memory execution plans is clear.

The logic for handling distributed plans and memory execution plans is clear and well-implemented.


139-149: LGTM! Logic for handling memory execution plans and coalescing batches is clear.

The logic for handling memory execution plans and coalescing batches is clear and well-implemented.


36-78: Verify handling of specific plan names.

Ensure that the handling of specific plan names like HashJoinExec and CrossJoinExec is correct and that the error messages are clear.

Verification successful

Verified handling of specific plan names.

The function get_partial_plan correctly handles specific plan names like HashJoinExec and CrossJoinExec by returning a NotImplemented error when these names are encountered.

  • src/service/search/datafusion/plan.rs: Lines 36-78
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the handling of specific plan names in `get_partial_plan`.

# Test: Search for the function usage. Expect: Proper handling of plan names.
rg --type rust -A 5 $'get_partial_plan'

Length of output: 2270


Script:

#!/bin/bash
# Description: Extract and review the specific handling logic for `HashJoinExec` and `CrossJoinExec` within `get_partial_plan`.

# Extract the relevant portion of the `get_partial_plan` function to review the handling of specific plan names
rg --type rust -A 20 'fn get_partial_plan' src/service/search/datafusion/plan.rs

Length of output: 772
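
The guard that verification describes can be a short recursive walk; this is a sketch against a recent DataFusion where ExecutionPlan::name() and children() are available, as the quoted cplan.name() calls suggest:

use std::sync::Arc;

use datafusion::{error::DataFusionError, physical_plan::ExecutionPlan};

// Reject join nodes anywhere in the subtree: the partial (distributed)
// stage cannot execute them, so planning fails fast with NotImplemented.
fn reject_joins(plan: &Arc<dyn ExecutionPlan>) -> Result<(), DataFusionError> {
    for child in plan.children() {
        if matches!(child.name(), "HashJoinExec" | "CrossJoinExec") {
            return Err(DataFusionError::NotImplemented(format!(
                "{} is not supported",
                child.name()
            )));
        }
        reject_joins(child)?;
    }
    Ok(())
}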


80-137: Verify handling of specific plan names.

Ensure that the handling of specific plan names like HashJoinExec and CrossJoinExec is correct and that the error messages are clear.

src/service/search/cluster/http.rs (2)

65-67: LGTM! Simplified logic for handling merge_batches.

The new implementation simplifies the logic by directly checking if merge_batches is empty and accessing the schema of the first element.


65-67: Verify correctness of the new implementation.

Ensure that the new implementation correctly handles the merge_batches variable and that the functionality remains intact.

Verification successful

The new implementation correctly handles the merge_batches variable and maintains the expected functionality.

  • The merge_batches variable is populated and used consistently across the codebase.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the correctness of the new implementation for handling `merge_batches`.

# Test: Search for the function usage. Expect: Correct handling of `merge_batches`.
rg --type rust -A 5 $'merge_batches'

Length of output: 3065


Script:

#!/bin/bash
# Description: Verify the correctness of the new implementation for handling `merge_batches` by examining its population and usage.

# Search for the function that populates `merge_batches` in the relevant files.
rg --type rust -A 10 'let (merge_batches' src/service/search/cluster/http.rs src/service/search/cluster/mod.rs src/service/search/cluster/super_cluster.rs

# Search for the usage of `merge_batches` in the relevant files to understand its handling.
rg --type rust -A 10 'merge_batches' src/service/search/cluster/http.rs src/service/search/cluster/mod.rs src/service/search/cluster/super_cluster.rs

Length of output: 5272

src/service/search/grpc/mod.rs (3)

40-40: LGTM! Simplified structure for SearchResult.

The new implementation simplifies the structure of the results by removing the key-value pairing associated with the previous HashMap.


132-140: LGTM! Streamlined merging process for search results.

The new implementation streamlines the merging process by directly flattening and filtering the nested vector of RecordBatch objects.
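
The flatten-and-filter shape that comment refers to might look like this sketch (names assumed):

use datafusion::arrow::record_batch::RecordBatch;

// Collapse per-partition nested results into one flat vector, dropping
// empty batches along the way.
fn merge_results(results: Vec<Vec<RecordBatch>>) -> Vec<RecordBatch> {
    results
        .into_iter()
        .flatten()
        .filter(|batch| batch.num_rows() > 0)
        .collect()
}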


143-178: LGTM! Enhanced schema management for search results.

The new implementation enhances schema management by checking for wildcard selection in SQL queries and ensuring consistent output.

src/service/promql/search/grpc/storage.rs (3)

161-161: Good use of target_partitions in SearchSession.

The addition of target_partitions to the SearchSession struct is a good enhancement for flexibility.


142-148: Ensure configuration values are correctly set.

The logic for setting target_partitions based on cache_type looks correct. However, ensure that cfg.limit.query_thread_num and cfg.limit.cpu_num are appropriately configured in all environments.

Verification successful

Configuration values are correctly set.

The configuration settings query_thread_num and cpu_num are present and used across various files in the codebase. The logic for setting target_partitions based on cache_type in the create_context function is valid.

  • src/service/search/mod.rs
  • src/service/search/grpc/storage.rs
  • src/service/search/grpc/wal.rs
  • src/service/compact/file_list_deleted.rs
  • src/service/search/datafusion/exec.rs
  • src/service/search/datafusion/table_provider/mod.rs
  • src/service/promql/search/grpc/storage.rs
  • src/service/db/file_list/remote.rs
  • src/main.rs
  • src/infra/src/storage/mod.rs
  • src/infra/src/schema/mod.rs
  • src/infra/src/db/nats.rs
  • src/config/src/utils/cgroup.rs
  • src/config/src/config.rs
  • src/config/src/meta/cluster.rs
  • src/common/infra/wal.rs
  • src/common/infra/cluster/nats.rs
  • src/common/infra/cluster/mod.rs
  • src/common/infra/cluster/etcd.rs
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that `query_thread_num` and `cpu_num` are configured in the configuration files.

# Test: Search for the configuration settings. Expect: Both settings should be present.
rg --type toml 'query_thread_num|cpu_num'

Length of output: 43


Script:

#!/bin/bash
# Description: Verify that `query_thread_num` and `cpu_num` are configured in the configuration files.

# Test: Search for the configuration settings in all relevant files (not limited to TOML).
rg 'query_thread_num|cpu_num'

Length of output: 9443
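
A toy sketch of the selection logic under discussion (the cache variants and limit field names are taken from the comment; the exact mapping is an assumption):

// Hypothetical: allow more partitions when data is served from memory,
// fall back to the CPU count otherwise.
struct Limits {
    query_thread_num: usize,
    cpu_num: usize,
}

enum CacheType {
    Memory,
    Disk,
    None,
}

fn target_partitions(cache_type: &CacheType, limit: &Limits) -> usize {
    match cache_type {
        CacheType::Memory => limit.query_thread_num,
        CacheType::Disk | CacheType::None => limit.cpu_num,
    }
}

fn main() {
    let limit = Limits { query_thread_num: 16, cpu_num: 8 };
    assert_eq!(target_partitions(&CacheType::Memory, &limit), 16);
    assert_eq!(target_partitions(&CacheType::Disk, &limit), 8);
}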


170-170: Ensure register_table handles the new parameter correctly.

The register_table function now receives target_partitions. Verify that this function handles the new parameter correctly and that it doesn't introduce any side effects.

Verification successful

The register_table function handles the target_partitions parameter correctly through its usage of the prepare_datafusion_context function.

  • The target_partitions parameter is passed to prepare_datafusion_context within register_table.
  • The prepare_datafusion_context function correctly processes the target_partitions parameter.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that `register_table` handles the `target_partitions` parameter correctly.

# Test: Search for the `register_table` function definition and its usage. Expect: Correct handling of `target_partitions`.
rg --type rust 'fn register_table' -A 10

Length of output: 769


Script:

#!/bin/bash
# Description: Search for the usage of `target_partitions` in the `src/service/search/datafusion/exec.rs` file to verify its handling.

# Test: Search for the term `target_partitions` in the file.
rg 'target_partitions' src/service/search/datafusion/exec.rs -A 5 -B 5

Length of output: 4491


Script:

#!/bin/bash
# Description: Verify the handling of the `target_partitions` parameter in the `prepare_datafusion_context` function.

# Test: Search for the implementation of the `prepare_datafusion_context` function.
rg 'fn prepare_datafusion_context' -A 20

Length of output: 1796

src/service/promql/search/grpc/wal.rs (3)

184-184: Good use of target_partitions in SearchSession.

The addition of target_partitions to the SearchSession struct is a good enhancement for flexibility.


138-139: Verify the impact of the new parameter in prepare_datafusion_context.

The addition of a sixth parameter (0) to prepare_datafusion_context needs to be verified to ensure it doesn't introduce any side effects.

Verification successful

The new parameter in prepare_datafusion_context is handled correctly.

The target_partitions parameter is integrated into the function logic and is used appropriately in setting configuration values and handling dynamic resource allocation.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the impact of the new parameter in `prepare_datafusion_context`.

# Test: Search for the `prepare_datafusion_context` function definition and its usage. Expect: Correct handling of the new parameter.
rg --type rust 'fn prepare_datafusion_context' -A 10

Length of output: 911


Script:

#!/bin/bash
# Description: Retrieve the full implementation of the `prepare_datafusion_context` function to analyze the usage of the `target_partitions` parameter.

# Extract the full implementation of `prepare_datafusion_context` function.
ast-grep --lang rust --pattern $'pub async fn prepare_datafusion_context($_, $_, $_, $_, $_, $_) -> $_ {
  $$$
}'

Length of output: 4022


193-193: Ensure register_table handles the new parameter correctly.

The register_table function now receives hashbrown::HashMap::default(). Verify that this function handles the new parameter correctly and that it doesn't introduce any side effects.

src/service/search/datafusion/table_provider/mod.rs (6)

73-73: New field diff_rules added to NewListingTable.

The addition of diff_rules to store data type transformation rules is a good enhancement. Ensure that this field is utilized correctly in all relevant methods.


88-88: New parameter rules added to try_new.

The addition of the rules parameter to try_new allows for initializing diff_rules. Ensure that this parameter is passed correctly when creating a new instance of NewListingTable.


107-107: Initialization of diff_rules in try_new.

The diff_rules field is correctly initialized with the provided rules parameter.


334-337: Check for empty diff_rules.

The logic correctly checks if diff_rules is empty before proceeding with the execution plan creation.


338-348: Apply data type transformation rules.

The logic for applying data type casts based on diff_rules is well-implemented. Ensure that all necessary data type transformations are covered by the rules.


349-355: Creation of ProjectionExec with transformation rules.

The creation of ProjectionExec with the applied transformation rules is a good enhancement. Ensure that the ProjectionExec is correctly integrated into the execution plan.
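
A minimal sketch of that wrapping step, using DataFusion's expressions::cast helper (the diff_rules map shape comes from the review above; everything else is an assumption):

use std::{collections::HashMap, sync::Arc};

use datafusion::{
    arrow::datatypes::DataType,
    error::Result,
    physical_expr::expressions::{cast, Column},
    physical_plan::{projection::ProjectionExec, ExecutionPlan, PhysicalExpr},
};

// Project every input column through, casting the ones that have an entry
// in diff_rules to their target type.
fn apply_diff_rules(
    plan: Arc<dyn ExecutionPlan>,
    diff_rules: &HashMap<String, DataType>,
) -> Result<Arc<dyn ExecutionPlan>> {
    let schema = plan.schema();
    let exprs: Vec<(Arc<dyn PhysicalExpr>, String)> = schema
        .fields()
        .iter()
        .enumerate()
        .map(|(idx, field)| {
            let col: Arc<dyn PhysicalExpr> = Arc::new(Column::new(field.name(), idx));
            let expr = match diff_rules.get(field.name()) {
                Some(target) => cast(col, &schema, target.clone())?,
                None => col,
            };
            Ok((expr, field.name().to_string()))
        })
        .collect::<Result<_>>()?;
    Ok(Arc::new(ProjectionExec::try_new(exprs, plan)?))
}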

src/service/search/grpc/storage.rs (5)

88-88: LGTM! Simplified return type for empty schema versions.

The change from HashMap::new() to vec![] is a good simplification.


113-113: LGTM! Simplified return type for empty files.

The change from HashMap::new() to vec![] is a good simplification.


208-213: LGTM! Introduced target_partitions based on cache_type.

The introduction of target_partitions enhances the function's adaptability based on the caching strategy.


241-241: LGTM! Added target_partitions to the session struct.

Including target_partitions in the session struct is a necessary change for the search operation.


334-340: LGTM! Simplified results collection.

The change from HashMap to Vec for results collection reduces complexity and improves efficiency.

src/config/src/meta/search.rs (1)

40-40: LGTM! Added target_partitions to the Session struct.

The addition of target_partitions enhances the Session struct's capability to manage specific partitions.

src/service/search/grpc/wal.rs (5)

87-87: LGTM! Simplified return type for empty files.

The change from HashMap::new() to vec![] is a good simplification.


148-148: LGTM! Simplified return type for zero-length files.

The change from HashMap::new() to vec![] is a good simplification.


169-169: LGTM! Simplified return type for empty schema versions.

The change from HashMap::new() to vec![] is a good simplification.


258-258: LGTM! Added target_partitions to the session struct.

Including target_partitions in the session struct is a necessary change for the search operation.


346-353: LGTM! Simplified results collection.

The change from HashMap to Vec for results collection reduces complexity and improves efficiency.

src/service/search/datafusion/exec.rs (6)

110-114: LGTM!

The changes to use NewMemTable and register it as a table in the DataFusion context are appropriate and improve the logic.


140-183: LGTM!

The restructuring of the exec_query function improves readability and traceability. The additional logging for logical and physical plans is beneficial for debugging.
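
The plan logging mentioned there might look like this sketch (DataFusion display helpers plus the log macros already used in this codebase; not the PR's exact code):

use datafusion::{error::Result, physical_plan::displayable, prelude::SessionContext};

// Build both plans for a query and log them for tracing/debugging.
async fn log_plans(ctx: &SessionContext, sql: &str) -> Result<()> {
    let state = ctx.state();
    let logical = state.create_logical_plan(sql).await?;
    log::info!("logical plan:\n{}", logical.display_indent());
    let physical = state.create_physical_plan(&logical).await?;
    log::info!(
        "physical plan:\n{}",
        displayable(physical.as_ref()).indent(false)
    );
    Ok(())
}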


205-287: LGTM!

The renaming and signature changes improve clarity and explicit data handling. The use of EmptyTable ensures graceful handling of empty datasets.


79-81: Verify the function usage in the codebase.

Ensure that all function calls to sql match the new signature and handle the updated return type correctly.

Verification successful

Verify the function usage in the codebase.

All instances of the sql function in the codebase have been updated to match the new signature and handle the updated return type correctly.

  • src/service/search/grpc/wal.rs
  • src/service/search/grpc/storage.rs
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `sql` match the new signature and handle the updated return type correctly.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type rust -A 5 $'sql('

Length of output: 87


Script:

#!/bin/bash
# Description: Verify all function calls to `sql` match the new signature and handle the updated return type correctly.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type rust -A 5 $'sql\\('

Length of output: 14512


Script:

#!/bin/bash
# Description: Extract and inspect the relevant lines of code for each instance of the `sql` function call to verify the new signature and return type handling.

# Extract instances of the `sql` function call and display the surrounding lines for context.
rg --type rust -A 10 'exec::sql\('

Length of output: 2485


140-140: Verify the function usage in the codebase.

Ensure that all function calls to exec_query match the new signature and handle the updated return type correctly.

Verification successful

Verification successful.

The exec_query function call matches the new signature and handles the return type correctly.

  • src/service/search/datafusion/exec.rs: let result = exec_query(&ctx, session, sql).await?;
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `exec_query` match the new signature and handle the updated return type correctly.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type rust -A 5 $'exec_query('

Length of output: 108


Script:

#!/bin/bash
# Description: Verify all function calls to `exec_query` match the new signature and handle the updated return type correctly.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type rust -A 5 'exec_query\('

Length of output: 852


Script:

#!/bin/bash
# Description: Verify all function calls to `exec_query` match the new signature and handle the updated return type correctly.

# Test: Search for all occurrences of `exec_query` across the codebase.
rg 'exec_query\('

Length of output: 172


205-207: Verify the function usage in the codebase.

Ensure that all function calls to merge_partitions match the new signature and handle the updated parameters correctly.

Verification successful

Verified the function usage in the codebase.

The function call to merge_partitions in src/service/search/cluster/mod.rs matches the new signature and handles the updated parameters correctly.

  • src/service/search/cluster/mod.rs
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `merge_partitions` match the new signature and handle the updated parameters correctly.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type rust -A 5 $'merge_partitions('

Length of output: 126


Script:

#!/bin/bash
# Description: Verify all function calls to `merge_partitions` match the new signature and handle the updated parameters correctly.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type rust -A 5 'merge_partitions('

Length of output: 126


Script:

#!/bin/bash
# Description: Verify all function calls to `merge_partitions` match the new signature and handle the updated parameters correctly.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type rust -A 5 'merge_partitions\('

Length of output: 796

src/service/search/mod.rs (3)

741-785: LGTM!

The generate_search_schema function logic ensures the resulting schema is up-to-date and accurate. The automatic addition of a timestamp field if not present is a good practice.
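
The timestamp backfill noted there could be as small as this sketch (the _timestamp column name and Int64 type are assumptions):

use std::sync::Arc;

use datafusion::arrow::datatypes::{DataType, Field, Schema, SchemaRef};

// Ensure the schema carries a timestamp column, appending one when the
// group schema lacks it.
fn ensure_timestamp(schema: SchemaRef) -> SchemaRef {
    const TIMESTAMP_COL: &str = "_timestamp";
    if schema.field_with_name(TIMESTAMP_COL).is_ok() {
        return schema;
    }
    let mut fields: Vec<_> = schema.fields().iter().cloned().collect();
    fields.push(Arc::new(Field::new(TIMESTAMP_COL, DataType::Int64, false)));
    Arc::new(Schema::new(fields))
}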


789-847: LGTM!

The generate_select_start_search_schema function effectively manages field discrepancies and aliases, ensuring any new fields required are added to the schema.


849-864: LGTM!

The generate_used_fields_in_query function correctly identifies the fields used in the SQL query, excluding aliases.

src/config/src/meta/sql.rs (5)

Line range hint 51-51:
LGTM!

The removal of the subquery field simplifies the Sql struct and indicates a shift in how subqueries are handled.


129-129: LGTM!

The removal of subquery handling simplifies the conversion logic and aligns with the changes to the Sql struct.


Line range hint 221-240:
LGTM!

The changes reflect a more stringent requirement that only single tables are supported, leading to clearer error handling and more predictable behavior.


250-250: LGTM!

The change to use id.value.to_string() enhances clarity in how identifiers are represented.


263-263: LGTM!

The change to use id.value.to_string() enhances clarity in how identifiers are represented.

src/service/search/cluster/mod.rs (3)

721-763: Ensure efficiency in formatting record batches.

The new logic for formatting record batches by schema looks correct, but ensure it is efficient and does not introduce performance bottlenecks.


683-684: Verify the return type change.

Ensure that all calls to the merge_grpc_result function handle the new return type correctly.


71-71: Verify the return type change.

Ensure that all calls to the search function handle the new return type correctly.

src/service/search/sql.rs (2)

43-43: Verify the impact of removing regex declarations.

Ensure that the removal of RE_COUNT_DISTINCT and RE_FIELD_FN does not affect the processing of distinct counts and field functions in SQL queries.


225-225: Ensure correctness of simplified SQL handling logic.

The SQL handling logic has been simplified. Ensure that the new logic correctly handles subqueries and time range rewrites.

coderabbitai bot (Contributor) left a comment

Actionable comments posted: 18

Outside diff range, codebase verification and nitpick comments (17)
src/service/search/cluster/grpc.rs (2)

70-75: Improve error logging in the loop.

Consider adding more context to the error logs within the loop to help with debugging.

-  log::error!(
-      "[trace_id {trace_id}] write record batch to ipc error: {}",
-      e
-  );
+  log::error!(
+      "[trace_id {trace_id}] write record batch to ipc error: {}, batch: {:?}",
+      e,
+      batch
+  );

78-83: Enhance error logging for finish operation.

Consider adding more context to the error logs for the finish operation to help with debugging.

-  log::error!(
-      "[trace_id {trace_id}] convert record batch to ipc error: {}",
-      e
-  );
+  log::error!(
+      "[trace_id {trace_id}] convert record batch to ipc error: {}, writer: {:?}",
+      e,
+      writer
+  );
src/service/search/datafusion/plan.rs (4)

36-38: Enhance function documentation.

Consider adding a docstring to explain the purpose and usage of the get_partial_plan function.

/// Retrieves a partial execution plan if certain conditions are met.

50-56: Combine conditions for readability.

Combine the two conditions checking for HashJoinExec and CrossJoinExec for better readability.

for name in ["HashJoinExec", "CrossJoinExec"] {
    if cplan.name() == name {
        return Err(datafusion::error::DataFusionError::NotImplemented(
            format!("{} is not supported", name),
        ));
    }
}

80-82: Enhance function documentation.

Consider adding a docstring to explain the purpose and usage of the get_final_plan function.

/// Constructs the final execution plan based on provided data and the initial plan.

96-101: Combine conditions for readability.

Combine the two conditions checking for HashJoinExec and CrossJoinExec for better readability.

for name in ["HashJoinExec", "CrossJoinExec"] {
    if cplan.name() == name {
        return Err(datafusion::error::DataFusionError::NotImplemented(
            format!("{} is not supported", name),
        ));
    }
}
src/service/search/cluster/http.rs (3)

Line range hint 144-178: Optimize schema formatting logic.

The schema formatting logic can be optimized by reducing redundant checks and operations.

if !results.is_empty() && RE_SELECT_WILDCARD.is_match(sql.origin_sql.as_str()) {
    let mut schema = results[0].schema();
    let schema_fields: HashSet<String> =
        schema.fields().iter().map(|f| f.name().to_string()).collect();
    let mut new_fields = Vec::new();
    let mut need_format = false;

    for batch in &results {
        let batch_fields: HashSet<String> =
            batch.schema().fields().iter().map(|f| f.name().to_string()).collect();
        if batch_fields != schema_fields {
            need_format = true;
            new_fields.extend(batch.schema().fields().iter().cloned());
        }
    }

    if need_format {
        let new_schema = Schema::new(new_fields);
        schema = Arc::new(Schema::try_merge(vec![schema.as_ref().clone(), new_schema]).unwrap());

        results = results
            .into_iter()
            .map(|batch| format_recordbatch_by_schema(schema.clone(), batch))
            .collect();
    }
}

Line range hint 182-184: Remove debug print statement.

Remove the debug print statement to clean up the code.

println!("schema: {:?}", schema);

Line range hint 200-220: Handle potential errors in writer operations.

Ensure proper error handling for writer operations to avoid potential issues.

let buf = Vec::new();
let mut writer = ipc::writer::FileWriter::try_new_with_options(buf, &schema, ipc_options).unwrap();

for batch in results {
    hits_total += batch.num_rows();
    if let Err(e) = writer.write(&batch) {
        log::error!("[trace_id {trace_id}] write record batch to ipc error: {}", e);
    }
}

if let Err(e) = writer.finish() {
    log::error!("[trace_id {trace_id}] convert record batch to ipc error: {}", e);
}

if let Ok(v) = writer.into_inner() {
    hits_buf = v;
}
let buf = Vec::new();
if let Ok(mut writer) = ipc::writer::FileWriter::try_new_with_options(buf, &schema, ipc_options) {
    for batch in results {
        hits_total += batch.num_rows();
        if let Err(e) = writer.write(&batch) {
            log::error!("[trace_id {trace_id}] write record batch to ipc error: {}", e);
        }
    }

    if let Err(e) = writer.finish() {
        log::error!("[trace_id {trace_id}] convert record batch to ipc error: {}", e);
    } else if let Ok(v) = writer.into_inner() {
        hits_buf = v;
    }
} else {
    log::error!("[trace_id {trace_id}] failed to create IPC writer");
}
src/service/search/grpc/mod.rs (5)

40-40: Update type alias documentation.

Update the documentation for the SearchResult type alias to reflect the new structure.

/// SearchResult is a Result containing a vector of record batches and scan statistics.
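
For illustration, the simplified alias might take a shape like this sketch (the ScanStats stand-in and error type here are placeholders, not the crate's actual definitions):

use datafusion::arrow::record_batch::RecordBatch;

// Placeholder stand-in for the crate's ScanStats.
#[derive(Default)]
pub struct ScanStats {
    pub files: i64,
    pub original_size: i64,
}

// A flat batch list plus scan statistics, replacing the earlier
// file-keyed HashMap.
pub type SearchResult = Result<(Vec<RecordBatch>, ScanStats), Box<dyn std::error::Error>>;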

88-88: Add a comment explaining the default return value.

Add a comment to explain why an empty vector and default ScanStats are returned.

// Return an empty vector and default ScanStats if WAL is skipped or not an ingester node.

103-103: Add a comment explaining the default return value.

Add a comment to explain why an empty vector and default ScanStats are returned.

// Return an empty vector and default ScanStats if WAL is skipped or not an ingester node.

116-116: Add a comment explaining the default return value.

Add a comment to explain why an empty vector and default ScanStats are returned.

// Return an empty vector and default ScanStats if the search type is WAL only.

182-184: Remove debug print statement.

Remove the debug print statement to clean up the code.

println!("schema: {:?}", schema);
src/service/search/mod.rs (3)

746-746: Typo: "cacluate" should be "calculate".

Fix the typo in the comment.

-    // cacluate the diff between latest schema and group schema
+    // calculate the diff between the latest schema and group schema

800-800: Typo: "cacluate" should be "calculate".

Fix the typo in the comment.

-    // cacluate the diff between latest schema and group schema
+    // calculate the diff between the latest schema and group schema

850-850: Typo: "note field name maybe equal to alias name" should be "note field name may be equal to alias name".

Fix the typo in the comment.

-    // note field name maybe equal to alias name
+    // note field name may be equal to alias name

Additional comments not posted (38)
src/service/search/datafusion/mod.rs (1)

20-20: Verify all references to the renamed module.

Ensure that all references to the rewrite module have been updated to plan throughout the codebase.

Verification successful

All references to the renamed module have been updated.

The search results confirm that there are no remaining references to the rewrite module, and the plan module is correctly declared. No further action is needed.

  • src/service/search/datafusion/mod.rs: pub mod plan;
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all references to the renamed module `rewrite` have been updated to `plan`.

# Test: Search for the old module name. Expect: No occurrences of the old module name.
rg --type rust 'rewrite'

Length of output: 2034


Script:

#!/bin/bash
# Description: Verify all references to the renamed module `rewrite` have been updated to `plan`.

# Search for import statements and module usages related to `rewrite`.
rg --type rust 'mod rewrite' -A 2
rg --type rust 'use .*::rewrite' -A 2

# Search for import statements and module usages related to `plan`.
rg --type rust 'mod plan' -A 2
rg --type rust 'use .*::plan' -A 2

Length of output: 311

src/service/search/cluster/grpc.rs (2)

66-66: Verify FileWriter construction.

Ensure that the FileWriter is correctly constructed with the new buffer and schema.

Verification successful

FileWriter construction verified.

The FileWriter is correctly constructed with the new buffer and schema in the following locations:

  • src/service/search/grpc/mod.rs
  • src/service/search/cluster/grpc.rs
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the correct construction of the `FileWriter` with the new buffer and schema.

# Test: Search for the construction of the `FileWriter`. Expect: Correct construction with the new buffer and schema.
rg --type rust 'ipc::writer::FileWriter::try_new_with_options(buf, &schema, ipc_options)'

Length of output: 91


Script:

#!/bin/bash
# Description: Search for all instances of `FileWriter` construction in the codebase.

# Test: Search for any construction of `FileWriter`. Expect: Identify all instances to verify correct construction.
rg --type rust 'FileWriter::try_new_with_options'

Length of output: 305


58-58: Check buffer initialization.

Ensure that the new buffer buf is correctly initialized and used in the FileWriter.

Verification successful

Buffer initialization and usage verified.

The buffer buf is correctly initialized and used in the FileWriter in both src/service/search/cluster/grpc.rs and src/service/search/grpc/mod.rs.

  • src/service/search/cluster/grpc.rs: Buffer buf is initialized and passed to ipc::writer::FileWriter::try_new_with_options.
  • src/service/search/grpc/mod.rs: Buffer buf is initialized and passed to ipc::writer::FileWriter::try_new_with_options.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the correct initialization and usage of the buffer `buf`.

# Test: Search for the usage of the buffer `buf`. Expect: Correct initialization and usage in the `FileWriter`.
rg --type rust 'let buf = Vec::new()'

Length of output: 165


Script:

#!/bin/bash
# Description: Verify the correct usage of the buffer `buf` in the identified files.

# Test: Search for the usage of the buffer `buf` in the identified files to ensure it's correctly used in the `FileWriter`.
rg 'buf' src/service/search/cluster/grpc.rs -A 10
rg 'buf' src/service/search/grpc/mod.rs -A 10

Length of output: 2880

src/service/search/datafusion/table_provider/memtable.rs (3)

42-52: Check the creation of the in-memory table.

Ensure that the try_new function correctly initializes the NewMemTable with the provided schema, partitions, and rules.

Verification successful

Initialization of NewMemTable is correct.

The try_new function correctly initializes the NewMemTable with the provided schema, partitions, and rules.

  • The schema is cloned and passed.
  • The record_batches are wrapped in a vector and passed as partitions.
  • The rules are cloned and passed.
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the correct initialization of the `NewMemTable`.

# Test: Search for the usage of the `try_new` function. Expect: Correct initialization with the provided schema, partitions, and rules.
rg --type rust 'NewMemTable::try_new'

Length of output: 130


Script:

#!/bin/bash
# Extract context around the `NewMemTable::try_new` call in `src/service/search/datafusion/exec.rs`
rg --type rust -A 10 -B 10 'NewMemTable::try_new' src/service/search/datafusion/exec.rs

Length of output: 701


73-103: Verify the scanning function.

Ensure that the scan function correctly handles the projection, filters, and limit, and correctly adds the diff rules to the execution plan.

Verification successful

Verify the scanning function.

The scan function in memtable.rs correctly handles the projection, filters, and limit, and appropriately adds the diff rules to the execution plan. No direct references to NewMemTable::scan were found in the broader codebase, suggesting it might be newly added or not yet integrated.

  • File: src/service/search/datafusion/table_provider/memtable.rs
  • Lines: 73-103
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify the correct implementation of the `scan` function.

# Test: Search for the usage of the `scan` function. Expect: Correct handling of the projection, filters, and limit, and correct addition of the diff rules to the execution plan.
rg --type rust 'NewMemTable::scan'

Length of output: 34


Script:

#!/bin/bash
# Description: Verify the correct implementation of the `scan` function within memtable.rs and its usage in the entire codebase.

# Step 1: Search for the implementation of the `scan` function within memtable.rs.
echo "Searching for the implementation of the 'scan' function in memtable.rs..."
rg 'async fn scan' src/service/search/datafusion/table_provider/memtable.rs

# Step 2: Search for any references to the `scan` function within the entire codebase.
echo "Searching for references to the 'scan' function in the entire codebase..."
rg 'scan' --type rust

Length of output: 22594


105-112: Check the insert function.

Ensure that the insert_into function correctly handles the insertion of data into the in-memory table.

src/service/promql/search/grpc/storage.rs (1)

142-148: LGTM! But verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to create_context match the new signature and handle the target_partitions parameter correctly.

src/service/promql/search/grpc/wal.rs (2)

138-139: LGTM! But verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to prepare_datafusion_context match the new signature and handle the target_partitions parameter correctly.

Verification successful

Verified: All function calls to prepare_datafusion_context match the new signature and handle the target_partitions parameter correctly.

  • src/service/search/datafusion/exec.rs: Multiple instances
  • src/service/promql/search/grpc/wal.rs: Line 138
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `prepare_datafusion_context` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type rust -A 5 $'prepare_datafusion_context'

Length of output: 3639


193-193: LGTM! But verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to register_table match the new signature and handle the hashbrown::HashMap::default() parameter correctly.

src/service/search/datafusion/table_provider/mod.rs (3)

73-73: LGTM!

The introduction of the diff_rules field enhances the functionality of the NewListingTable struct by allowing dynamic casting of fields.


Line range hint 88-107:
LGTM!

The update to the try_new constructor to accept a rules parameter and initialize the diff_rules field aligns with the changes made to the NewListingTable struct.


334-355: LGTM!

The update to the scan method to incorporate diff_rules by creating a ProjectionExec with casting rules enhances the functionality of the method.

src/service/search/grpc/storage.rs (5)

88-89: LGTM! Simplified return type.

The return type change from HashMap to Vec simplifies the function and aligns with the overall design shift.


113-114: LGTM! Simplified return type.

The return type change from HashMap to Vec simplifies the function and aligns with the overall design shift.


208-213: LGTM! Dynamic partitioning strategy.

The new logic for determining target_partitions based on cache_type enhances performance and resource management.
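
A hypothetical sketch of that decision; the CacheType variants and the exact arithmetic here are assumptions, not the repo's code:

enum CacheType {
    Memory,
    Disk,
    None,
}

fn pick_target_partitions(cache_type: &CacheType, cpu_num: usize) -> usize {
    match cache_type {
        // fully cached in memory: fan out across every core
        CacheType::Memory => cpu_num,
        // disk-backed or uncached: halve the fan-out to limit IO pressure
        _ => std::cmp::max(1, cpu_num / 2),
    }
}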


241-241: LGTM! Enhanced session struct.

The addition of the target_partitions field to the Session struct aligns with the new logic for dynamic partitioning.


334-340: LGTM! Simplified results collection.

The results collection mechanism has been simplified from using a HashMap to a Vec, reducing complexity and improving readability.

src/config/src/meta/search.rs (1)

40-40: LGTM! Enhanced session struct.

The addition of the target_partitions field to the Session struct enhances its functionality by allowing it to store information about the number of target partitions.
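
The change amounts to something like this (the struct's other fields are elided):

pub struct Session {
    pub id: String,
    // ...existing fields elided...
    pub target_partitions: usize,
}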

src/service/search/grpc/wal.rs (6)

87-88: LGTM! Simplified return type.

The return type change from HashMap to Vec simplifies the function and aligns with the overall design shift.


148-149: LGTM! Simplified return type.

The return type change from HashMap to Vec simplifies the function and aligns with the overall design shift.


169-170: LGTM! Simplified return type.

The return type change from HashMap to Vec simplifies the function and aligns with the overall design shift.


258-258: LGTM! Enhanced session struct.

The addition of the target_partitions field to the Session struct aligns with the new logic for dynamic partitioning.


346-350: LGTM! Simplified results collection.

The results collection mechanism has been simplified from using a HashMap to a Vec, reducing complexity and improving readability.


412-413: LGTM! Simplified return type.

The return type change from HashMap to Vec simplifies the function and aligns with the overall design shift.

src/service/search/datafusion/exec.rs (4)

74-79: LGTM! But verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to sql match the new signature.


218-218: Use cfg.limit.cpu_num directly in the calculation.

Using cfg.limit.cpu_num directly in the calculation improves readability.

-    let chunk_size = std::cmp::max(1, (batches.len() + cfg.limit.cpu_num) / cfg.limit.cpu_num);
+    let chunk_size = std::cmp::max(1, (batches.len() + cfg.limit.cpu_num) / cfg.limit.cpu_num);

Likely invalid or redundant comment.


205-211: LGTM! But verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to merge_partitions match the new signature.

Verification successful

Function usage verified successfully.

The function merge_partitions has been correctly updated and all calls to it match the new signature.

  • src/service/search/cluster/mod.rs
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `merge_partitions` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type rust -A 5 $'merge_partitions('

Length of output: 126


Script:

#!/bin/bash
# Description: Verify all function calls to `merge_partitions` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type rust -A 5 'merge_partitions\('

Length of output: 796


620-620: LGTM! But verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to register_table match the new logic.

src/service/search/mod.rs (4)

741-745: LGTM!

The function generate_search_schema correctly calculates the differences between schemas and constructs a new schema.


789-794: LGTM!

The function generate_select_start_search_schema correctly manages the fields defined in the SQL query and compares them against the latest schema.


849-849: LGTM!

The function generate_used_fields_in_query correctly extracts the fields used in the SQL query.


779-779: Avoid cloning schema unnecessarily.

Cloning schema here may be unnecessary. Consider passing a reference instead.

-    if schema.field_with_name(timestamp).is_err() {
+    if schema.field_with_name(timestamp).is_err() {

Likely invalid or redundant comment.

src/config/src/meta/sql.rs (3)

Line range hint 221-240:
LGTM!

The changes to TryFrom<Source<'a>> for String simplify the conversion mechanism and are correct.


Line range hint 250-263:
LGTM!

The changes to TryFrom<Order<'a>> for (String, bool) ensure the correct representation of identifiers.


263-263: LGTM!

The changes to TryFrom<Group<'a>> for String ensure the correct representation of identifiers.

src/service/search/cluster/mod.rs (2)

683-684: LGTM! But verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to merge_grpc_result match the new signature.


71-71: LGTM! But verify the function usage in the codebase.

The code changes are approved.

However, ensure that all function calls to search match the new signature.

src/service/search/sql.rs (1)

225-225: Simplified SQL processing logic.

The direct assignment of rewrite_time_range_sql to origin_sql simplifies the code.

@hengfeiyang hengfeiyang marked this pull request as draft August 1, 2024 09:57
@hengfeiyang hengfeiyang self-assigned this Aug 5, 2024
@hengfeiyang hengfeiyang marked this pull request as ready for review August 5, 2024 08:52
Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 7

Outside diff range, codebase verification and nitpick comments (4)
src/proto/src/datafusion/codec.rs (2)

140-160: Enhance error handling for UDF decoding and encoding.

The try_decode_udf and try_encode_udf methods currently lack logging, which makes it difficult to trace errors. Adding logging statements to capture and log errors encountered during the UDF decoding and encoding processes would improve robustness.

  • File: src/proto/src/datafusion/codec.rs
  • Lines: 140-160
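
One way to add the suggested logging, sketched against datafusion-proto's PhysicalExtensionCodec trait and assuming the log crate is available; the struct shape is an assumption standing in for the real composed codec:

use std::sync::Arc;

use datafusion::{
    error::{DataFusionError, Result},
    logical_expr::ScalarUDF,
};
use datafusion_proto::physical_plan::PhysicalExtensionCodec;

struct ComposedCodec {
    codecs: Vec<Arc<dyn PhysicalExtensionCodec>>,
}

impl ComposedCodec {
    fn try_decode_udf(&self, name: &str, buf: &[u8]) -> Result<Arc<ScalarUDF>> {
        let mut last_err = None;
        for codec in &self.codecs {
            match codec.try_decode_udf(name, buf) {
                Ok(udf) => return Ok(udf),
                Err(e) => {
                    // surface the failure before falling through to the next codec
                    log::error!("try_decode_udf failed for {name}: {e}");
                    last_err = Some(e);
                }
            }
        }
        Err(last_err.unwrap_or_else(|| {
            DataFusionError::NotImplemented(format!("no codec registered for udf {name}"))
        }))
    }
}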
Analysis chain

Ensure robust error handling for UDF decoding and encoding.

The try_decode_udf and try_encode_udf methods attempt to use multiple codecs and handle errors. Ensure all possible error scenarios are covered.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all error scenarios are handled in try_decode_udf and try_encode_udf methods.

# Test: Search for error handling in try_decode_udf and try_encode_udf methods. Expect: Comprehensive error handling.
rg --type rust 'try_decode_udf' -A 10
rg --type rust 'try_encode_udf' -A 10

Length of output: 1919


Script:

#!/bin/bash
# Description: Search for error handling or logging related to try_decode_udf and try_encode_udf methods.

# Search for any logging or error handling in the codec.rs file
rg --type rust 'log' src/proto/src/datafusion/codec.rs
rg --type rust 'error' src/proto/src/datafusion/codec.rs

Length of output: 215


163-201: Ensure comprehensive test coverage.

The test_datafusion_codec function ensures encoding and decoding work correctly for a specific scenario. To ensure comprehensive test coverage, consider adding tests for:

  • Different data types (e.g., Float64, Utf8).
  • Multiple fields in the schema.
  • Various configurations of the NewEmptyExec plan (e.g., different projections, filters, and limits).
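
A possible extra roundtrip case along these lines, using DataFusion's built-in EmptyExec and default codec as stand-ins; the real test would construct NewEmptyExec through the crate's own codec with varied projections, filters, and limits:

use std::sync::Arc;

use datafusion::{
    arrow::datatypes::{DataType, Field, Schema},
    physical_plan::{displayable, empty::EmptyExec, ExecutionPlan},
    prelude::SessionContext,
};
use datafusion_proto::{
    bytes::{
        physical_plan_from_bytes_with_extension_codec,
        physical_plan_to_bytes_with_extension_codec,
    },
    physical_plan::DefaultPhysicalExtensionCodec,
};

#[test]
fn roundtrip_float64_utf8_schema() {
    let schema = Arc::new(Schema::new(vec![
        Field::new("value", DataType::Float64, false),
        Field::new("name", DataType::Utf8, true),
    ]));
    // stand-in plan; swap in NewEmptyExec to exercise the custom codec
    let plan: Arc<dyn ExecutionPlan> = Arc::new(EmptyExec::new(schema));
    let codec = DefaultPhysicalExtensionCodec {};
    let ctx = SessionContext::new();
    let bytes = physical_plan_to_bytes_with_extension_codec(plan.clone(), &codec).unwrap();
    let decoded = physical_plan_from_bytes_with_extension_codec(&bytes, &ctx, &codec).unwrap();
    assert_eq!(
        displayable(plan.as_ref()).indent(true).to_string(),
        displayable(decoded.as_ref()).indent(true).to_string()
    );
}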
Analysis chain

Ensure comprehensive test coverage.

The test for datafusion_codec ensures encoding and decoding work correctly. Ensure all possible scenarios are covered.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all test scenarios are covered in datafusion_codec test.

# Test: Search for test coverage in datafusion_codec test. Expect: Comprehensive test coverage.
rg --type rust 'test_datafusion_codec' -A 10

Length of output: 804


Script:

#!/bin/bash
# Description: Verify all test scenarios are covered in datafusion_codec test.

# Search for the complete `test_datafusion_codec` function definition.
rg --type rust 'async fn test_datafusion_codec' -A 50

# Search for other tests within the `#[cfg(test)]` module in `src/proto/src/datafusion/codec.rs`.
rg --type rust '#[tokio::test]' -A 10

Length of output: 16571

src/service/search/grpc/mod.rs (1)

Line range hint 88-140:
Simplify results merging logic.

Simplify the logic for merging results into a single pass over the joined tasks. Note that a flat_map closure returning Result does not flatten the way the original suggestion assumes (it will not collect into Result<Vec<_>, Error>), so a plain loop is both correct and clearer:

let tasks = try_join_all(vec![task1, task2, task3])
    .await
    .map_err(|e| Error::ErrorCode(ErrorCodes::ServerInternalError(e.to_string())))?;
for task in tasks {
    let (batches, stats) = task?;
    scan_stats.add(&stats);
    results.extend(batches.into_iter().flatten().filter(|v| v.num_rows() > 0));
}
tests/ui-testing/playwright-tests/logsqueries.spec.js (1)

11-11: Remove commented-out line if not needed.

The commented-out line // await page.getByText('Login as internal user').click(); should be removed if it is no longer needed to keep the code clean.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between e1ac886 and 721a03c.

Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
Files selected for processing (30)
  • .github/workflows/playwright.yml (2 hunks)
  • Cargo.toml (3 hunks)
  • src/config/src/config.rs (1 hunks)
  • src/proto/Cargo.toml (1 hunks)
  • src/proto/build.rs (4 hunks)
  • src/proto/proto/cluster/plan.proto (1 hunks)
  • src/proto/proto/datafusion/datafusion.proto (1 hunks)
  • src/proto/proto/datafusion_common/datafusion_common.proto (1 hunks)
  • src/proto/src/datafusion/codec.rs (1 hunks)
  • src/proto/src/datafusion/emptyexec.rs (1 hunks)
  • src/proto/src/datafusion/emptytable.rs (1 hunks)
  • src/proto/src/datafusion/mod.rs (1 hunks)
  • src/proto/src/generated/mod.rs (1 hunks)
  • src/proto/src/generated/prometheus.rs (1 hunks)
  • src/proto/src/lib.rs (1 hunks)
  • src/service/search/cache/cacher.rs (2 hunks)
  • src/service/search/cluster/mod.rs (5 hunks)
  • src/service/search/cluster/super_cluster.rs (1 hunks)
  • src/service/search/datafusion/exec.rs (18 hunks)
  • src/service/search/datafusion/plan.rs (1 hunks)
  • src/service/search/datafusion/table_provider/mod.rs (6 hunks)
  • src/service/search/grpc/mod.rs (6 hunks)
  • src/service/search/mod.rs (3 hunks)
  • src/service/search/sql.rs (5 hunks)
  • tests/ui-testing/playwright-tests/alerts.spec.js (3 hunks)
  • tests/ui-testing/playwright-tests/dashboardtype.spec.js (1 hunks)
  • tests/ui-testing/playwright-tests/logsqueries.spec.js (5 hunks)
  • tests/ui-testing/playwright-tests/logsquickmode.spec.js (11 hunks)
  • tests/ui-testing/playwright-tests/schema.spec.js (7 hunks)
  • tests/ui-testing/playwright.config.js (1 hunks)
Files skipped from review due to trivial changes (5)
  • src/proto/src/datafusion/mod.rs
  • src/proto/src/generated/mod.rs
  • src/proto/src/lib.rs
  • tests/ui-testing/playwright-tests/alerts.spec.js
  • tests/ui-testing/playwright.config.js
Files skipped from review as they are similar to previous changes (5)
  • Cargo.toml
  • src/service/search/cluster/mod.rs
  • src/service/search/datafusion/plan.rs
  • src/service/search/mod.rs
  • src/service/search/sql.rs
Additional context used
Path-based instructions (15)
src/proto/src/datafusion/emptytable.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/proto/build.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/proto/src/datafusion/emptyexec.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/proto/src/datafusion/codec.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/search/cluster/super_cluster.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

tests/ui-testing/playwright-tests/dashboardtype.spec.js (1)

Pattern **/*.js: You are a smart javascript/typescript pull request reviewer.
You are going to review all the javascript/typescript files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/search/grpc/mod.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

tests/ui-testing/playwright-tests/logsquickmode.spec.js (1)

Pattern **/*.js: You are a smart javascript/typescript pull request reviewer.
You are going to review all the javascript/typescript files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

tests/ui-testing/playwright-tests/schema.spec.js (1)

Pattern **/*.js: You are a smart javascript/typescript pull request reviewer.
You are going to review all the javascript/typescript files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

tests/ui-testing/playwright-tests/logsqueries.spec.js (1)

Pattern **/*.js: You are a smart javascript/typescript pull request reviewer.
You are going to review all the javascript/typescript files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/search/datafusion/table_provider/mod.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/search/cache/cacher.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/proto/src/generated/prometheus.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/search/datafusion/exec.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/config/src/config.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

Biome
tests/ui-testing/playwright-tests/dashboardtype.spec.js

[error] 72-72: Unexpected control character(s) in regular expression: \x00

Control characters are unusual and potentially incorrect inputs, so they are disallowed.

(lint/suspicious/noControlCharactersInRegex)

Additional comments not posted (87)
src/proto/Cargo.toml (4)

8-8: Dependency Addition: async-trait

The async-trait crate is added, which is useful for defining async functions in traits. Ensure that this addition is necessary for the current implementation.


9-9: Dependency Addition: datafusion

The datafusion crate is added, which is a data processing library. This addition suggests enhancements in data handling capabilities.


10-10: Dependency Addition: datafusion-proto

The datafusion-proto crate is added, likely to support protocol buffer definitions for DataFusion. Ensure that this is required for the new functionality.


15-15: Dependency Addition: tokio

The tokio crate is added, which is essential for asynchronous programming in Rust. This addition indicates a move towards asynchronous operations.

src/proto/proto/cluster/plan.proto (5)

1-1: Proto Syntax Declaration

The proto syntax is correctly set to proto3.


3-5: Java Code Generation Options

The options for Java code generation are set, which is useful for integrating with Java-based systems.


7-7: Package Declaration

The package is declared as cluster, which is appropriate for the context.


9-10: Proto Imports

The necessary imports from DataFusion proto files are included, which is essential for defining the schema.


12-17: Message Definition: NewEmptyExecNode

The message NewEmptyExecNode is defined with fields for schema, projection, filters, and an optional limit. This structure is appropriate for the intended use case.

src/proto/src/datafusion/emptytable.rs (6)

1-14: License Information

The file includes the GNU Affero General Public License information, which is necessary for compliance.


16-27: Imports

The necessary imports are included, covering standard libraries, async traits, and DataFusion components.


29-34: Struct Definition: NewEmptyTable

The struct NewEmptyTable is defined with fields for schema and partitions, which is appropriate for its intended use.


36-43: Constructor: new

The constructor initializes the NewEmptyTable with a schema and sets the default partitions to 1. This is a standard and clear implementation.


45-49: Method: with_partitions

The method allows setting the number of partitions, providing flexibility in the table configuration.
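
The constructor and builder described above boil down to this shape (a sketch, not the actual file):

use datafusion::arrow::datatypes::SchemaRef;

pub struct NewEmptyTable {
    schema: SchemaRef,
    partitions: usize,
}

impl NewEmptyTable {
    pub fn new(schema: SchemaRef) -> Self {
        Self { schema, partitions: 1 } // default: a single partition
    }

    /// Builder-style override for the partition count.
    pub fn with_partitions(mut self, partitions: usize) -> Self {
        self.partitions = partitions;
        self
    }
}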


52-86: Trait Implementation: TableProvider

The TableProvider trait is implemented for NewEmptyTable, including methods for schema, table type, scan, and filter pushdown support. The implementation is thorough and follows best practices for async operations and trait usage.

.github/workflows/playwright.yml (6)

66-66: Verify the removal of --features mimalloc.

The --features mimalloc option has been removed from the cargo build command. Ensure that this change aligns with the project's dependency management strategy.


87-87: LGTM! Improved readability.

The reordering of test filenames enhances the readability of the matrix configuration.


90-91: LGTM! Streamlined workflow.

Cloning the repository only once improves workflow efficiency.


103-103: LGTM! Increased sleep duration for robustness.

Increasing the sleep duration from 10 seconds to 15 seconds ensures the server is fully operational before proceeding.


115-118: LGTM! Improved consistency.

The reformatting of environment variable echo commands enhances the readability and maintainability of the workflow configuration.


122-123: LGTM! Enhanced debugging capabilities.

The addition of a new step to check OpenObserve logs improves the debugging capabilities of the workflow.

src/proto/build.rs (5)

16-16: LGTM! Enabled write operations.

The import statement for std::io has been expanded to include Write, enabling file manipulation operations.


69-79: LGTM! Improved file handling for cluster.rs.

The introduction of new variables and the logic for reading and writing generated files enhance the organization and management of generated files.


133-142: LGTM! Improved file handling for prometheus.rs.

The introduction of new variables and the logic for reading and writing generated files enhance the organization and management of generated files.


22-23: LGTM! Streamlined configuration setup.

The new approach simplifies the configuration setup and includes the writing operations directly after the compilation of protocol definitions.


52-53: LGTM! Improved integration with external libraries.

The addition of .extern_path method calls enhances the integration with the datafusion_proto library and improves the accessibility of external types within the generated code.

src/proto/src/datafusion/emptyexec.rs (6)

32-41: LGTM! Well-defined struct.

The NewEmptyExec struct is well-defined and includes necessary fields for the execution plan.


43-104: LGTM! Well-implemented methods.

The methods for creating instances, handling partitions, and computing properties are well-implemented and provide necessary functionality for the execution plan.


106-144: LGTM! Clear DisplayAs implementation.

The DisplayAs implementation provides a clear and formatted string representation of the execution plan, useful for debugging and logging.


146-201: LGTM! Comprehensive ExecutionPlan implementation.

The ExecutionPlan implementation includes methods for execution, statistics, and handling children, providing a comprehensive execution plan.


176-181: LGTM! Appropriate error handling.

The execute method includes appropriate error handling for invalid partitions, ensuring robustness in the execution plan.


71-73: LGTM! Simple and appropriate data method.

The data method returns an empty vector of RecordBatch, which is appropriate for the NewEmptyExec execution plan.

src/proto/src/datafusion/codec.rs (4)

112-137: Ensure robust error handling for decoding with multiple codecs.

The try_decode method attempts to use multiple codecs and handles errors. Ensure all possible error scenarios are covered.


42-78: Ensure robust error handling for decoding.

The try_decode method decodes a NewEmptyExecNode and handles errors. Ensure all possible error scenarios are covered.


129-138: Ensure robust error handling for encoding with multiple codecs.

The try_encode method attempts to use multiple codecs and handles errors. Ensure all possible error scenarios are covered.

Verification successful

The try_encode method does handle errors by capturing the last error encountered and returning it if all codecs fail. However, to ensure robust error handling, we should verify if there are any edge cases or specific error scenarios that are not covered. This includes understanding the errors generated by codec.try_encode.

Let's gather more information about the try_encode method implementations in the codecs to see if there are any specific error types or conditions that might be missed.


Let's try a different approach to locate the try_encode method implementations in the codecs.


The previous script had a regex parse error. Let's correct the regex and try again to locate the try_encode method implementations in the codecs.


Let's search for the try_encode function definitions directly and include more context lines to capture the full implementation.


Error handling in try_encode method is robust.

The try_encode method in ComposedPhysicalExtensionCodec handles errors by capturing the last encountered error and returning it if all codecs fail. This ensures that the method does not fail silently and provides useful error information for debugging.

  • The method iterates through the codecs and captures errors.
  • No additional error handling mechanisms are necessary.
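
The pattern the analysis describes looks roughly like this sketch against datafusion-proto's PhysicalExtensionCodec; the struct shape is assumed:

use std::sync::Arc;

use datafusion::{
    error::{DataFusionError, Result},
    physical_plan::ExecutionPlan,
};
use datafusion_proto::physical_plan::PhysicalExtensionCodec;

struct ComposedCodec {
    codecs: Vec<Arc<dyn PhysicalExtensionCodec>>,
}

impl ComposedCodec {
    fn try_encode(&self, node: Arc<dyn ExecutionPlan>, buf: &mut Vec<u8>) -> Result<()> {
        let mut last_err = None;
        for codec in &self.codecs {
            match codec.try_encode(node.clone(), buf) {
                Ok(()) => return Ok(()),
                // remember the most recent failure so the caller sees a real cause
                Err(e) => last_err = Some(e),
            }
        }
        Err(last_err.unwrap_or_else(|| {
            DataFusionError::NotImplemented("no codec could encode this node".to_string())
        }))
    }
}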
Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all error scenarios are handled in try_encode method with multiple codecs.

# Test: Search for error handling in try_encode method. Expect: Comprehensive error handling.
rg --type rust 'try_encode' -A 10

Length of output: 2889


Script:

#!/bin/bash
# Find all implementations of the `try_encode` method in the codecs to understand the errors they might generate.
ast-grep --lang rust --pattern 'impl $_ for $_ { fn try_encode(&self, $_, $_) -> Result<$$_> { $$$ } $$$ }'

Length of output: 109


Script:

#!/bin/bash
# Search for `impl` blocks containing `try_encode` methods to understand the errors they might generate.
rg 'impl .* {.*fn try_encode' -A 20

Length of output: 153


Script:

#!/bin/bash
# Corrected search for `impl` blocks containing `try_encode` methods to understand the errors they might generate.
rg 'impl .* \{.*fn try_encode' -A 20

Length of output: 38


Script:

#!/bin/bash
# Search for `try_encode` function definitions directly and include more context lines to capture the full implementation.
rg 'fn try_encode' -A 50

Length of output: 8362


80-101: Ensure robust error handling for encoding.

The try_encode method encodes a NewEmptyExecNode and handles errors. Ensure all possible error scenarios are covered.

src/service/search/cluster/super_cluster.rs (1)

72-77: Simplified logic for handling merge_batches.

The changes improve efficiency by directly checking if merge_batches is empty and using it for subsequent operations. Ensure correctness and consider edge cases.

tests/ui-testing/playwright-tests/dashboardtype.spec.js (3)

14-28: Consistent formatting for login function.

The login function is now consistently formatted. Ensure it performs as expected.


30-55: New function ingestion for data ingestion.

The ingestion function handles data ingestion via a POST request. Ensure it performs as expected and handles errors appropriately.


103-193: Streamlined test case for dashboard functionality.

The test case for creating and comparing area type chart images has been streamlined. Ensure it performs as expected.

src/service/search/grpc/mod.rs (2)

16-35: Imports look good.

The import statements are necessary and correctly used in the code.


40-40: Updated SearchResult type looks good.

The new vector-based structure simplifies the management of results and potentially enhances performance.

tests/ui-testing/playwright-tests/logsquickmode.spec.js (5)

9-21: Login function looks good.

The minor adjustments in spacing and comment formatting do not affect the functionality.


25-51: Ingestion function looks good.

The ingestion function is correctly implemented and improves the test setup process by handling log ingestion via a POST request.


53-57: Function selectStreamAndStreamTypeForLogs looks good.

The minor adjustments in spacing and comment formatting do not affect the functionality.


Line range hint 82-99:
Updated test.beforeEach function looks good.

The inclusion of the ingestion function call and adjusted wait times improve the test setup process.


Line range hint 108-311:
Updated test cases look good.

The changes align with the new focus on testing specific Kubernetes log fields and do not affect the functionality of the tests.

tests/ui-testing/playwright-tests/schema.spec.js (5)

10-21: Login function looks good.

The minor adjustments in spacing and comment formatting do not affect the functionality.


25-50: Ingestion function looks good.

The minor adjustments in spacing and comment formatting do not affect the functionality.


53-57: Function selectStreamAndStreamTypeForLogs looks good.

The minor adjustments in spacing and comment formatting do not affect the functionality.


80-91: Updated test.beforeEach function looks good.

The inclusion of the ingestion function call and adjusted wait times improve the test setup process.


Line range hint 93-215:
Test case for updating stream schema settings looks good.

The minor adjustments in spacing and comment formatting do not affect the functionality of the test case.

tests/ui-testing/playwright-tests/logsqueries.spec.js (4)

24-50: Ensure secure handling of sensitive information.

The function ingestion handles sensitive information such as credentials. Ensure that these are handled securely and not logged or exposed inadvertently.


52-56: LGTM!

The formatting changes are acceptable.


80-82: LGTM!

The inclusion of the ingestion function in the test.beforeEach hook is appropriate.


99-123: LGTM!

The formatting changes are acceptable.

src/proto/proto/datafusion_common/datafusion_common.proto (10)

23-25: LGTM!

The message ColumnRelation is correctly defined.


27-30: LGTM!

The message Column is correctly defined.


32-35: LGTM!

The message DfField is correctly defined.


37-40: LGTM!

The message DfSchema is correctly defined.


42-44: LGTM!

The message CsvFormat is correctly defined.


46-50: LGTM!

The message ParquetFormat is correctly defined.


52-52: LGTM!

The message AvroFormat is correctly defined.


54-56: LGTM!

The message PrimaryKeyConstraint is correctly defined.


58-60: LGTM!

The message UniqueConstraint is correctly defined.


62-67: LGTM!

The message Constraint is correctly defined.

src/service/search/datafusion/table_provider/mod.rs (3)

73-73: LGTM!

The addition of the diff_rules field is appropriate for storing casting rules.


Line range hint 88-107: LGTM!

The changes to the try_new method are appropriate and ensure that the diff_rules field is properly initialized.


Line range hint 318-350: LGTM!

The changes to the scan method are appropriate and ensure that the diff_rules are applied correctly.

src/service/search/cache/cacher.rs (2)

65-65: Verify the correctness of the broadened criteria for returning default cached query responses.

Ensure that the logic correctly handles cases where the result timestamp column is None and either the query is an aggregate or the group_by clause is not empty.


82-86: Verify the refined conditions for the selection hack related to _timestamp.

Ensure that the logic correctly applies the selection hack for non-aggregate queries with empty group_by and order_by clauses, and the absence of a wildcard (*) in the original SQL.
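
The refined gate can be read as a boolean of this shape; all parameter names here are assumed from the review text, not taken from the repo:

fn apply_timestamp_selection_hack(
    is_aggregate: bool,
    group_by_is_empty: bool,
    order_by_is_empty: bool,
    origin_sql: &str,
) -> bool {
    // only non-aggregate queries with no grouping/ordering and no wildcard
    !is_aggregate && group_by_is_empty && order_by_is_empty && !origin_sql.contains('*')
}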

src/proto/src/generated/prometheus.rs (1)

1-548: LGTM! Auto-generated code by prost-build.

The auto-generated code adheres to Rust conventions and is free of any obvious issues.

src/service/search/datafusion/exec.rs (4)

93-93: Avoid cloning rules unnecessarily.

Cloning rules here may be unnecessary. Consider passing a reference instead.


110-114: Avoid cloning schema and rules unnecessarily.

Cloning schema and rules here may be unnecessary. Consider passing references instead.


249-249: Avoid cloning schema unnecessarily.

Cloning schema here may be unnecessary. Consider passing a reference instead.


731-731: Avoid cloning rules unnecessarily.

Cloning rules here may be unnecessary. Consider passing a reference instead.

src/proto/proto/datafusion/datafusion.proto (10)

1-26: LGTM!

The file header, syntax, package, and options are correct.


27-63: LGTM!

The import statements and LogicalPlanNode message are correct.


65-95: LGTM!

The LogicalExtensionNode, ProjectionColumns, LogicalExprNodeCollection, and ListingTableScanNode messages are well-defined.


97-127: LGTM!

The ViewTableScanNode, CustomTableScanNode, ProjectionNode, and SelectionNode messages are well-defined.


129-151: LGTM!

The SortNode, RepartitionNode, HashRepartition, and EmptyRelationNode messages are well-defined.


153-185: LGTM!

The CreateExternalTableNode, PrepareNode, CreateCatalogSchemaNode, and CreateCatalogNode messages are well-defined.


187-211: LGTM!

The DropViewNode, CreateViewNode, ValuesNode, and AnalyzeNode messages are well-defined.


213-238: LGTM!

The ExplainNode, AggregateNode, WindowNode, and JoinNode messages are well-defined.


240-266: LGTM!

The DistinctNode, DistinctOnNode, CopyToNode, and UnnestNode messages are well-defined.


268-1237: LGTM!

The remaining messages and enums for logical and physical plans are well-defined.

src/config/src/config.rs (1)

213-216: Improved error handling.

The match statement in get_instance_id improves error handling by avoiding potential panics when the "instance_id" key is missing.

However, ensure that all function calls to get_instance_id are compatible with the new behavior.

Verification successful

Improved error handling confirmed.

The get_instance_id function's new behavior of returning an empty string when the "instance_id" key is missing is compatible with its usage in the codebase.

  • src/common/meta/telemetry.rs: Used for telemetry purposes.
  • src/common/infra/cluster/mod.rs: Used for configuration.
  • src/handler/http/request/status/mod.rs: Used for status reporting.

These contexts handle the empty string return value appropriately.
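
The safer lookup amounts to this pattern (the map type and surrounding statics are assumptions):

use std::collections::HashMap;

// `ids` stands in for whatever map backs the instance metadata.
fn get_instance_id(ids: &HashMap<String, String>) -> String {
    match ids.get("instance_id") {
        Some(id) => id.clone(),
        // the old code unwrapped here and could panic on a missing key
        None => String::new(),
    }
}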

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `get_instance_id` to ensure compatibility with the new behavior.

# Test: Search for the function usage. Expect: Only occurrences of the new behavior.
rg --type rust -A 5 $'get_instance_id'

Length of output: 3150


Script:

#!/bin/bash
# Extract relevant code segments from identified files to review how `get_instance_id` is used.

# Extract usage in src/common/meta/telemetry.rs
rg --type rust -A 10 'get_instance_id' src/common/meta/telemetry.rs

# Extract usage in src/common/infra/cluster/mod.rs
rg --type rust -A 10 'get_instance_id' src/common/infra/cluster/mod.rs

# Extract usage in src/handler/http/request/status/mod.rs
rg --type rust -A 10 'get_instance_id' src/handler/http/request/status/mod.rs

Length of output: 2755

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

Outside diff range, codebase verification and nitpick comments (1)
src/service/search/cache/mod.rs (1)

Update function calls to match the new search signature.

The search function signature has been changed, but several instances in the codebase still use the old signature with the &parsed_sql parameter. Ensure all calls to the search function are updated accordingly.

  • src/handler/http/request/traces/mod.rs
  • src/handler/http/request/search/multi_streams.rs
  • src/handler/http/request/search/mod.rs
  • src/handler/http/request/prom/mod.rs
  • src/handler/grpc/request/search/mod.rs
  • src/service/usage/stats.rs
  • src/service/metrics/prom.rs
  • src/service/search/cache/mod.rs
  • src/service/search/cluster/mod.rs
  • src/service/search/cluster/http.rs
  • src/service/alerts/mod.rs
  • src/service/db/enrichment_table.rs
Analysis chain

Line range hint 15-15:
Verify the function usage in the codebase.

The &parsed_sql parameter has been removed from the function signature. Ensure that all function calls to search are updated accordingly.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify all function calls to `search` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type rust -A 5 $'search('

Length of output: 96


Script:

#!/bin/bash
# Description: Verify all function calls to `search` match the new signature.

# Test: Search for the function usage. Expect: Only occurrences of the new signature.
rg --type rust -A 5 $'search\\('

Length of output: 24084

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 721a03c and 4a92663.

Files selected for processing (4)
  • .github/workflows/playwright.yml (2 hunks)
  • src/service/search/cache/cacher.rs (4 hunks)
  • src/service/search/cache/mod.rs (1 hunks)
  • src/service/search/sql.rs (5 hunks)
Files skipped from review as they are similar to previous changes (2)
  • .github/workflows/playwright.yml
  • src/service/search/cache/cacher.rs
Additional context used
Path-based instructions (2)
src/service/search/cache/mod.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/search/sql.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

Additional comments not posted (3)
src/service/search/cache/mod.rs (1)

Line range hint 15-15:
LGTM!

The code changes are approved.

src/service/search/sql.rs (2)

137-141: Simplified subquery handling.

The handling of subqueries has been simplified, which improves readability and maintainability.


230-230: Refined SQL transformation.

The management of the rewrite_time_range_sql variable has been refined, promoting a more consistent methodology for SQL transformations.

@hengfeiyang hengfeiyang merged commit df2488f into main Aug 5, 2024
@hengfeiyang hengfeiyang deleted the feat/query-plan branch August 5, 2024 10:38
abvarun226 pushed a commit to abvarun226/openobserve that referenced this pull request Aug 8, 2024
Now we use the SQL plan to execute the query.

We split the PhysicalPlan into two stages:
1. Partial plan: it can be distributed, so it runs on all the nodes.
2. Final plan: it cannot be distributed, so it runs on the leader querier
and merges the result.


---------

Co-authored-by: Neha P <[email protected]>
Co-authored-by: oasisk <[email protected]>
@coderabbitai coderabbitai bot mentioned this pull request Oct 30, 2024