
Conversation

@hengfeiyang (Contributor) commented Aug 14, 2024

merge plan panic

The root cause is that in some cases the plan differs between nodes. If a node has only one partition and the query is simple, there is no sort step and we simply add a new Projection plan; this differs from the plan on nodes with multiple partitions, so the merge step hits a RecordBatch schema mismatch and panics.

There are two possible solutions:

  • Format all the RecordBatches before merging, using the Schema from the partial plan (a minimal sketch of this idea follows the list).
    • This is the approach used in this PR.
  • Use an EmptyTable to generate the plan, so the plan is guaranteed to be the same on every node, then replace the EmptyTable with a MemoryTable or ParquetTable on each node depending on its data.
    • We will use this approach in the next big release.
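
A minimal sketch of the first option, written against the arrow crate; the function name matches format_recordbatch_by_schema from the change summary below, but the body is an illustrative reconstruction under assumed types, not the exact code from this PR:

```rust
use arrow::array::{new_null_array, ArrayRef};
use arrow::compute::cast;
use arrow::datatypes::SchemaRef;
use arrow::error::ArrowError;
use arrow::record_batch::RecordBatch;

/// Rebuild a RecordBatch so its columns match `target_schema`: existing
/// columns are cast to the target type, missing columns are padded with
/// nulls so batches from every node line up before the merge.
pub fn format_recordbatch_by_schema(
    target_schema: SchemaRef,
    batch: &RecordBatch,
) -> Result<RecordBatch, ArrowError> {
    let batch_schema = batch.schema();
    let num_rows = batch.num_rows();
    let mut columns: Vec<ArrayRef> = Vec::with_capacity(target_schema.fields().len());
    for field in target_schema.fields().iter() {
        match batch_schema.column_with_name(field.name()) {
            // Column exists on this node: cast it to the expected data type.
            Some((idx, _)) => columns.push(cast(batch.column(idx), field.data_type())?),
            // Column missing on this node: fill with nulls of the right type.
            None => columns.push(new_null_array(field.data_type(), num_rows)),
        }
    }
    RecordBatch::try_new(target_schema, columns)
}
```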

By the way:

I also updated some logic around the derived schema:

  • If the schema evolution has no records, we no longer fire the ingestion gRPC request (it would otherwise return an error and trigger a retry). I also updated the log to print the error information.

coderabbitai bot commented Aug 14, 2024

Walkthrough

The recent changes enhance error handling, response validation, and schema management across multiple components of the system. Key updates include improved logging for response statuses, restructuring ingestion logic to avoid unnecessary processing, and transitioning to Arc for better memory management in concurrent contexts. These modifications promote clarity, robustness, and maintainability, ensuring better performance and error insight throughout the application's ingestion and search functionalities.

Changes

  • src/handler/grpc/request/ingest.rs: Improved error handling and logging in response validation using match for clarity; introduced StatusCode for better status checking.
  • src/service/alerts/scheduler.rs: Restructured control flow in handle_derived_stream_triggers to avoid ingestion on an empty local_val, enhancing logic clarity and efficiency.
  • src/service/enrichment_table/mod.rs: Altered the response for empty records from an error to a successful OK, changing the function's previous behavior for record ingestion.
  • src/service/search/cluster/mod.rs: Simplified merge_grpc_result by removing schema conversion logic; introduced dedicated functions for schema generation, improving maintainability.
  • src/service/search/datafusion/exec.rs: Added format_recordbatch_by_schema to handle schema mismatches; restructured batch processing flow for improved robustness.
  • src/service/search/grpc/mod.rs: Removed batch schema formatting logic in search, potentially leading to issues with differing schemas.
  • src/service/search/grpc/storage.rs: Wrapped schema in Arc for better memory management, allowing safe concurrent access in asynchronous contexts.
  • src/service/search/grpc/wal.rs: Updated search_parquet and search_memtable to use Arc for shared schema ownership, enhancing thread safety.
  • src/service/search/mod.rs: Changed the schema parameter type in generate_search_schema and generate_select_start_search_schema from a reference to Arc, improving memory management in concurrent scenarios.

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant Server
    participant Database

    Client->>Server: Send Ingestion Request
    Server->>Database: Save Enrichment Data
    alt Success
        Database-->>Server: OK Response
        Server-->>Client: Success Response
    else Failure
        Database-->>Server: Error Response
        Server-->>Client: Error Response
    end
sequenceDiagram
    participant Scheduler
    participant Ingestion Service

    Scheduler->>Ingestion Service: Check local_val
    alt Empty local_val
        Scheduler-->>Scheduler: Log "No records to ingest"
    else Non-empty local_val
        Scheduler->>Ingestion Service: Ingest Data
        Ingestion Service-->>Scheduler: Ingestion Result
    end

github-actions bot added the ☢️ Bug (Something isn't working) label on Aug 14, 2024

coderabbitai bot left a comment

Actionable comments posted: 1

Outside diff range, codebase verification and nitpick comments (2)
src/service/alerts/scheduler.rs (1)

577-631: Ensure robust error handling during ingestion.

The ingestion process is well-structured with appropriate logging and error handling. Ensure that the error messages provide sufficient context for debugging.

src/service/search/datafusion/exec.rs (1)

258-289: Handle potential errors in try_merge operation.

In the merge_partitions function, the use of Schema::try_merge is followed by an unwrap() call, which can lead to runtime panics if schema conflicts occur. It is advisable to handle these potential errors gracefully to avoid unexpected crashes.

  • File: src/service/search/datafusion/exec.rs
  • Line: 289

Consider implementing error handling to manage schema conflicts more robustly.
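
A minimal sketch of the suggested fix, assuming the surrounding code can propagate an ArrowError; the function name and error message are illustrative:

```rust
use arrow::datatypes::Schema;
use arrow::error::ArrowError;

fn merged_schema(schemas: Vec<Schema>) -> Result<Schema, ArrowError> {
    // Propagate schema conflicts to the caller instead of panicking via unwrap().
    Schema::try_merge(schemas).map_err(|e| {
        ArrowError::SchemaError(format!("merge_partitions: schema merge failed: {e}"))
    })
}
```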

Analysis chain

Ensure schema consistency before formatting.

The logic for checking schema consistency and determining the need for formatting is well-structured. However, ensure that the try_merge operation handles all potential schema conflicts gracefully.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that the `try_merge` operation in `merge_partitions` handles schema conflicts gracefully.

# Test: Search for `try_merge` usage. Expect: Proper error handling after `try_merge`.
rg --type rust --context 5 'Schema::try_merge'

Length of output: 3096

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between e8dd2e8 and bdd9a05.

Files selected for processing (9)
  • src/handler/grpc/request/ingest.rs (2 hunks)
  • src/service/alerts/scheduler.rs (1 hunks)
  • src/service/enrichment_table/mod.rs (2 hunks)
  • src/service/search/cluster/mod.rs (3 hunks)
  • src/service/search/datafusion/exec.rs (4 hunks)
  • src/service/search/grpc/mod.rs (2 hunks)
  • src/service/search/grpc/storage.rs (2 hunks)
  • src/service/search/grpc/wal.rs (3 hunks)
  • src/service/search/mod.rs (3 hunks)
Additional context used
Path-based instructions (9)

The same path-based instruction applies to all nine changed Rust files (src/handler/grpc/request/ingest.rs, src/service/search/grpc/mod.rs, src/service/enrichment_table/mod.rs, src/service/search/grpc/storage.rs, src/service/search/grpc/wal.rs, src/service/alerts/scheduler.rs, src/service/search/datafusion/exec.rs, src/service/search/mod.rs, src/service/search/cluster/mod.rs):

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

Additional comments not posted (13)
src/handler/grpc/request/ingest.rs (2)

16-16: Good use of StatusCode for HTTP response handling.

The addition of StatusCode improves the clarity and maintainability of the code by making HTTP response handling more explicit.


68-96: Enhanced error logging is beneficial.

The detailed error logging with status codes and response bodies provides better insights during debugging. This is a positive change that enhances the robustness of the error handling mechanism.
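
A hedged sketch of the pattern being described (explicit StatusCode checking plus detailed error logging); the function shape, parameters, and log text are assumptions for illustration, not the actual handler code:

```rust
use http::StatusCode;

// The status/body parameters stand in for whatever the real handler receives.
fn check_ingest_response(status: u16, body: &str) -> Result<(), anyhow::Error> {
    match StatusCode::from_u16(status) {
        Ok(code) if code.is_success() => Ok(()),
        Ok(code) => {
            // Log both the status code and the response body for debugging.
            log::error!("ingest request failed, status: {code}, body: {body}");
            Err(anyhow::anyhow!("ingest failed with status {code}"))
        }
        Err(e) => {
            log::error!("ingest returned an invalid status code {status}: {e}");
            Err(e.into())
        }
    }
}
```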

src/service/enrichment_table/mod.rs (1)

175-178: Verify the new behavior for empty records.

The change from a BadRequest to an Ok response for empty records suggests a shift in logic. Ensure that this new behavior aligns with the overall system requirements and does not introduce unintended side effects.

src/service/search/grpc/storage.rs (2)

231-231: Good use of Arc for concurrent access.

The wrapping of schema in an Arc ensures safe shared ownership across threads, which is crucial for concurrent environments. This change is well-implemented.
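
For reference, a minimal sketch of the Arc pattern under discussion, with hypothetical task code; only the wrap-once, clone-per-task idea is the point:

```rust
use std::sync::Arc;

use arrow::datatypes::{DataType, Field, Schema};

#[tokio::main]
async fn main() {
    // Wrap the schema once...
    let schema = Arc::new(Schema::new(vec![Field::new("log", DataType::Utf8, true)]));

    // ...then hand a cheap Arc clone to each concurrent task instead of
    // borrowing the schema across await points or deep-cloning it.
    let mut tasks = Vec::new();
    for i in 0..4 {
        let schema = schema.clone();
        tasks.push(tokio::spawn(async move {
            println!("task {i} sees {} fields", schema.fields().len());
        }));
    }
    for task in tasks {
        task.await.unwrap();
    }
}
```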


249-254: Ensure consistent use of Arc when passing schema.

The schema is cloned and passed to functions, which is the correct approach when using Arc. Ensure that this pattern is consistently applied wherever schema is used in similar contexts.

Also applies to: 266-271

Verification successful

Consistent use of Arc when passing schema verified.

The schema is consistently passed using schema.clone() in function calls to generate_search_schema and generate_select_start_search_schema, which aligns with the expected usage of Arc. No issues found.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify consistent use of `Arc` when passing `schema`.

# Test: Search for `generate_select_start_search_schema` and `generate_search_schema` calls.
# Expect: `schema` should be passed as `Arc::clone(&schema)`.

rg --type rust 'generate_(select_start_search_schema|search_schema)\s*\(' -A 5

Length of output: 4110

src/service/search/grpc/wal.rs (3)

248-248: Good use of Arc for concurrent access in search_parquet.

The wrapping of schema in an Arc ensures safe shared ownership across threads, which is crucial for concurrent environments. This change is well-implemented.


266-271: Ensure consistent use of Arc when passing schema in search_parquet.

The schema is cloned and passed to functions, which is the correct approach when using Arc. Ensure that this pattern is consistently applied wherever schema is used in similar contexts.


473-478: Good use of Arc for concurrent access in search_memtable.

The wrapping of schema in an Arc ensures safe shared ownership across threads, which is crucial for concurrent environments. This change is well-implemented.

src/service/alerts/scheduler.rs (1)

568-576: Efficient handling of empty local_val.

The added check for an empty local_val before attempting ingestion enhances efficiency by avoiding unnecessary operations. The logging and status update are appropriately handled.
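
A small sketch of the described check, with hypothetical names and types; the actual scheduler code differs:

```rust
/// Returns true when there is nothing to ingest, so the caller can log,
/// update the trigger status, and skip the ingestion gRPC call entirely.
fn should_skip_ingestion(local_val: &[serde_json::Value], org_id: &str, stream: &str) -> bool {
    if local_val.is_empty() {
        log::info!("derived stream [{org_id}/{stream}]: no records to ingest, skipping");
        return true;
    }
    false
}
```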

src/service/search/datafusion/exec.rs (1)

304-316: Repartitioning logic is clear and efficient.

The repartitioning logic is straightforward and efficiently manages batch distribution based on chunk size.
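
A minimal sketch of chunk-based repartitioning of record batches; the sizing heuristic here is an assumption for illustration:

```rust
use arrow::record_batch::RecordBatch;

/// Redistribute a flat list of batches into `num_partitions` roughly even
/// chunks, preserving batch order.
fn repartition(batches: Vec<RecordBatch>, num_partitions: usize) -> Vec<Vec<RecordBatch>> {
    let chunk_size = batches.len().div_ceil(num_partitions.max(1)).max(1);
    batches.chunks(chunk_size).map(|c| c.to_vec()).collect()
}
```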

src/service/search/mod.rs (2)

743-744: Adopt Arc<Schema> for better memory management.

The change to use Arc<Schema> enhances memory management by allowing shared ownership, which is beneficial in concurrent contexts.


791-792: Utilize Arc<Schema> for improved concurrency.

The change to use Arc<Schema> is appropriate and improves efficiency in concurrent scenarios by enabling shared ownership of the schema.

src/service/search/cluster/mod.rs (1)

713-731: Refactor schema generation for modularity.

The refactoring to use generate_select_start_search_schema and generate_search_schema improves modularity and maintainability by encapsulating schema logic.
