Skip to content

Conversation

@hengfeiyang
Copy link
Contributor

@hengfeiyang hengfeiyang commented Sep 24, 2024

Issue

Earlier if a partition key doesn't contains in one row, we will skip generate partition key for this record. then we need always scan it because these data have no partition key.

For example:

  • 70 % event with tenant_id 1
  • 10 % event with tenant_id 2
  • 20 % event no tenant_id

if you search for tenant_id=2 we also need to scan the 20% data which there is no tenant_id field.

Solution

We can always generate partition key if there is no the partition field, we can set the value to null.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 24, 2024

Walkthrough

The changes primarily involve renaming the function get_wal_time_key to get_write_partition_key across multiple files. This modification affects how keys are generated based on timestamps and partition keys, while preserving the overall structure and logic of the affected functions. The updates include adjustments to both the function signatures and related test cases to ensure consistency with the new naming convention.

Changes

Files Change Summary
src/job/files/parquet.rs, src/service/enrichment_table/mod.rs Replaced get_wal_time_key with get_write_partition_key in function calls, maintaining existing parameters.
src/service/ingestion/mod.rs Renamed get_wal_time_key to get_write_partition_key, streamlined internal logic, and updated related tests.
src/service/logs/mod.rs Changed function call from get_wal_time_key to get_write_partition_key within write_logs.
src/service/metadata/distinct_values.rs, src/service/metadata/trace_list_index.rs Updated function calls from get_wal_time_key to get_write_partition_key, maintaining overall structure.
src/service/metrics/json.rs, src/service/metrics/otlp_grpc.rs, src/service/metrics/otlp_http.rs, src/service/metrics/prom.rs Replaced get_wal_time_key with get_write_partition_key in various functions, preserving parameter structure.
src/service/traces/mod.rs Modified function call from super::ingestion::get_wal_time_key to super::ingestion::get_write_partition_key.

Suggested labels

☢️ Bug

Suggested reviewers

  • haohuaijin
  • oasisk
  • taimingl

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

Share
Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    -- I pushed a fix in commit <commit_id>, please review it.
    -- Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    -- @coderabbitai generate unit testing code for this file.
    -- @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    -- @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    -- @coderabbitai read src/utils.ts and generate unit testing code.
    -- @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    -- @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@github-actions github-actions bot added the ☢️ Bug Something isn't working label Sep 24, 2024
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

Outside diff range and nitpick comments (5)
src/service/metadata/trace_list_index.rs (2)

Line range hint 101-106: Approve the function renaming with a minor suggestion.

The renaming of get_wal_time_key to get_write_partition_key is appropriate and aligns with the PR objective. It more clearly indicates the function's purpose of generating a write partition key.

Consider renaming the hour_key variable to partition_key for consistency with the new function name:

-let hour_key = ingestion::get_write_partition_key(
+let partition_key = ingestion::get_write_partition_key(
     timestamp,
     PARTITION_KEYS.to_vec().as_ref(),
     unwrap_partition_time_level(None, StreamType::Metadata),
     data,
     Some(&schema_key),
 );

Line range hint 266-272: Approve the function renaming in the test module with a minor suggestion.

The renaming of get_wal_time_key to get_write_partition_key in the test module is consistent with the changes in the main implementation. This change is necessary and appropriate.

Similar to the previous suggestion, consider renaming the hour_key variable to partition_key for consistency with the new function name:

-let hour_key = ingestion::get_write_partition_key(
+let partition_key = ingestion::get_write_partition_key(
     timestamp,
     &vec![],
     unwrap_partition_time_level(None, StreamType::Metadata),
     data,
     Some(schema_key),
 );
src/service/logs/mod.rs (1)

Line range hint 407-413: LGTM! Consider updating the variable name for consistency.

The function renaming from get_wal_time_key to get_write_partition_key improves clarity and better describes its purpose. This change is consistent and doesn't introduce any logical errors.

Consider renaming the hour_key variable to partition_key for consistency with the new function name:

-let hour_key = get_write_partition_key(
+let partition_key = get_write_partition_key(
     timestamp,
     &partition_keys,
     partition_time_level,
     &record_val,
     Some(&schema_key),
 );
src/service/ingestion/mod.rs (1)

Line range hint 313-345: LGTM! Consider adding documentation for the new suffix parameter.

The changes to the get_write_partition_key function look good. The addition of the suffix parameter provides more flexibility in partition key generation, and the simplification of local_val handling improves code readability.

Consider adding a brief comment explaining the purpose and expected format of the suffix parameter to improve code documentation.

src/job/files/parquet.rs (1)

Line range hint 1-1180: General observation on file structure

While not directly related to the current change, it's worth noting that this file contains several large, complex functions. In future refactoring efforts, consider breaking down some of these functions into smaller, more focused units to improve readability and maintainability.

Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

Commits

Files that changed from the base of the PR and between 1e10120 and 77947f2.

Files selected for processing (11)
  • src/job/files/parquet.rs (1 hunks)
  • src/service/enrichment_table/mod.rs (1 hunks)
  • src/service/ingestion/mod.rs (5 hunks)
  • src/service/logs/mod.rs (2 hunks)
  • src/service/metadata/distinct_values.rs (1 hunks)
  • src/service/metadata/trace_list_index.rs (2 hunks)
  • src/service/metrics/json.rs (2 hunks)
  • src/service/metrics/otlp_grpc.rs (1 hunks)
  • src/service/metrics/otlp_http.rs (1 hunks)
  • src/service/metrics/prom.rs (1 hunks)
  • src/service/traces/mod.rs (1 hunks)
Additional context used
Path-based instructions (11)
src/job/files/parquet.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/enrichment_table/mod.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/ingestion/mod.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/logs/mod.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/metadata/distinct_values.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/metadata/trace_list_index.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/metrics/json.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/metrics/otlp_grpc.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/metrics/otlp_http.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/metrics/prom.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

src/service/traces/mod.rs (1)

Pattern **/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestions

Make sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.

Additional comments not posted (15)
src/service/metadata/distinct_values.rs (1)

Line range hint 244-249: Function name change looks good, but verify consistency.

The change from get_wal_time_key to get_write_partition_key aligns with the PR objectives. The function parameters remain the same, which is good for maintaining consistency.

To ensure this change is consistent across the codebase, please run the following verification script:

This will help ensure that the function name change has been applied consistently throughout the project.

Verification successful

Function name change verified consistently across the codebase.

No remaining instances of get_wal_time_key were found, and all usages of get_write_partition_key are consistent with the change.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining instances of `get_wal_time_key` and verify all usages of `get_write_partition_key`

# Search for any remaining instances of `get_wal_time_key`
echo "Searching for remaining instances of 'get_wal_time_key':"
rg --type rust 'get_wal_time_key'

echo "\nVerifying usage of 'get_write_partition_key':"
rg --type rust 'get_write_partition_key'

Length of output: 2117

src/service/metadata/trace_list_index.rs (1)

Line range hint 1-307: Summary of changes and suggestions

The changes in this file consistently rename get_wal_time_key to get_write_partition_key, which aligns with the PR objective of fixing partition key generation. The modifications improve code clarity by using more descriptive function names.

Two minor suggestions were made to rename the hour_key variable to partition_key for consistency with the new function name in both the main implementation and the test module.

Overall, the changes look good and achieve the intended purpose.

src/service/enrichment_table/mod.rs (1)

160-160: LGTM! Verify consistent renaming across the codebase.

The change from get_wal_time_key to get_write_partition_key looks good. It appears to be a refactoring to improve function naming clarity.

To ensure consistency, let's verify that this renaming has been applied uniformly across the codebase:

Verification successful

Consistent renaming verified

The function get_write_partition_key has been consistently renamed across the codebase with no remaining instances of get_wal_time_key.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining instances of `get_wal_time_key` and confirm all usages of `get_write_partition_key`.

echo "Checking for any remaining instances of 'get_wal_time_key':"
rg --type rust 'get_wal_time_key'

echo "\nConfirming usage of 'get_write_partition_key':"
rg --type rust 'get_write_partition_key'

Length of output: 2125

src/service/metrics/json.rs (1)

Line range hint 261-267: LGTM: Function name updated and parameters added

The change from get_wal_time_key to get_write_partition_key is consistent with the PR objective. The addition of record and Some(&schema_key) as parameters suggests an improvement in partition key generation.

To ensure this change doesn't negatively impact data partitioning and retrieval, please run the following verification:

Verification successful

Verified: Function replacement is complete and correctly implemented.

All occurrences of get_wal_time_key have been successfully replaced with get_write_partition_key, and the new function is properly defined and utilized within the codebase.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any other occurrences of get_wal_time_key and verify get_write_partition_key usage

# Search for any remaining occurrences of get_wal_time_key
echo "Searching for remaining occurrences of get_wal_time_key:"
rg "get_wal_time_key" --type rust

# Check the implementation of get_write_partition_key
echo "Checking the implementation of get_write_partition_key:"
rg "fn get_write_partition_key" --type rust -A 10

Length of output: 1040

src/service/ingestion/mod.rs (3)

Line range hint 672-687: LGTM! Test case updated correctly.

The test function has been appropriately renamed and updated to reflect the changes in the get_write_partition_key function. The test coverage adequately verifies the functionality of the updated function.


Line range hint 692-704: LGTM! Test case for no partition keys updated correctly.

The test function has been appropriately renamed and updated to reflect the changes in the get_write_partition_key function. The test case adequately covers the scenario where no partition keys are provided.


Line range hint 708-717: LGTM! Test case for no partition keys and no local values updated correctly.

The test function has been appropriately renamed and updated to reflect the changes in the get_write_partition_key function. The test case adequately covers the scenario where no partition keys and no local values are provided.

src/service/traces/mod.rs (1)

Line range hint 633-638: LGTM. Verify consistent renaming across the codebase.

The change from get_wal_time_key to get_write_partition_key looks good and aligns with the PR objectives. This renaming suggests a shift in the function's purpose or implementation while maintaining the same interface.

To ensure consistency across the codebase, please run the following verification script:

This script will help ensure that the renaming has been applied consistently throughout the Rust codebase.

Verification successful

Consistent renaming verified.

The function get_wal_time_key has been successfully renamed to get_write_partition_key throughout the codebase, with no remaining instances of the old name.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that all occurrences of 'get_wal_time_key' have been renamed to 'get_write_partition_key'

# Test 1: Search for any remaining occurrences of 'get_wal_time_key'
echo "Searching for remaining occurrences of 'get_wal_time_key':"
rg --type rust 'get_wal_time_key'

# Test 2: Confirm the usage of 'get_write_partition_key'
echo "Confirming usage of 'get_write_partition_key':"
rg --type rust 'get_write_partition_key'

Length of output: 2119

src/service/metrics/otlp_grpc.rs (2)

Line range hint 1-1037: Ensure comprehensive testing of the metrics processing pipeline.

While the change is minimal, it's part of a complex metrics processing system. To maintain reliability:

  1. Verify that all unit tests related to metric processing have been updated and pass.
  2. Consider adding integration tests that cover the entire metrics ingestion and processing pipeline.
  3. Perform thorough manual testing to ensure no regressions in metric collection and storage.

Run the following script to check for relevant test files:

#!/bin/bash
# Description: Find test files related to metrics processing

echo "Listing test files related to metrics processing:"
rg --type rust -l "use.*metrics" "tests/"

346-346: LGTM! Verify consistent renaming across the codebase.

The change from get_wal_time_key to get_write_partition_key improves code clarity. The new name better describes the function's purpose.

To ensure this renaming has been consistently applied throughout the codebase, run the following script:

Verification successful

Script Error Detected. Running Corrected Verification Script.

Please execute the following corrected script to verify the function renaming:


Consistent Renaming Verified

All instances of get_wal_time_key have been successfully renamed to get_write_partition_key throughout the codebase. The refactoring enhances code clarity and maintains consistency.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining occurrences of 'get_wal_time_key' and list all occurrences of 'get_write_partition_key'

echo "Checking for any remaining 'get_wal_time_key' occurrences:"
rg --type rust "get_wal_time_key"

echo -e "\nListing all 'get_write_partition_key' occurrences:"
rg --type rust "get_write_partition_key"

Length of output: 245


Script:

#!/bin/bash
# Description: Check for any remaining occurrences of 'get_wal_time_key' and list all occurrences of 'get_write_partition_key'

echo "Checking for any remaining 'get_wal_time_key' occurrences:"
rg --type rust 'get_wal_time_key'

echo -e "\nListing all 'get_write_partition_key' occurrences:"
rg --type rust 'get_write_partition_key'

Length of output: 2133

src/service/metrics/prom.rs (3)

Line range hint 1-1000: Summary of changes

The modification in this file is limited to updating the function call from get_wal_time_key to get_write_partition_key within the remote_write function. This change aligns with the PR objective to fix the generation of partition keys when the field doesn't exist.

Overall, the change looks good, but please ensure:

  1. The new get_write_partition_key function is implemented correctly.
  2. There are no remaining instances of get_wal_time_key in the codebase.
  3. All tests related to this functionality are updated and passing.

Line range hint 378-384: Function call updated to use get_write_partition_key

The function call has been updated from get_wal_time_key to get_write_partition_key. This change aligns with the PR objective to fix the generation of partition keys when the field doesn't exist.

To ensure this change is consistent throughout the codebase, let's verify other occurrences:

#!/bin/bash
# Search for any remaining instances of get_wal_time_key
rg "get_wal_time_key" --type rust

Line range hint 378-384: Verify the implementation of get_write_partition_key

While the function call update looks correct, it's important to ensure that the new get_write_partition_key function maintains the same contract as the previous get_wal_time_key function. This includes handling the same input parameters and returning the expected output.

Let's check the implementation of the new function:

src/service/metrics/otlp_http.rs (1)

429-429: LGTM! Verify consistent renaming across the codebase.

The function name change from get_wal_time_key to get_write_partition_key improves clarity and better describes its purpose. This change aligns with the PR objectives.

To ensure consistency, let's verify that this function has been renamed throughout the codebase:

Verification successful

Verified renaming of get_write_partition_key throughout the codebase. No instances of get_wal_time_key found.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining instances of 'get_wal_time_key'
rg 'get_wal_time_key'

# Search for all instances of the new function name 'get_write_partition_key'
rg 'get_write_partition_key'

Length of output: 1853

src/job/files/parquet.rs (1)

846-846: LGTM. Verify the impact of the function name change.

The change from get_wal_time_key to get_write_partition_key aligns with the PR objective. This modification likely improves how partition keys are generated, especially when dealing with null fields.

To ensure this change doesn't introduce any unintended side effects, please run the following verification script:

This script will help ensure that:

  1. All instances of get_wal_time_key have been replaced.
  2. The implementation of get_write_partition_key correctly handles the case when fields don't exist (are null).
Verification successful

Further Verification Needed: Ensure get_write_partition_key Properly Handles Null Fields.

To verify that get_write_partition_key correctly handles cases when fields are null or do not exist, please run the following verification script:

This script will help ensure that within get_write_partition_key, there are appropriate checks or default mechanisms to handle null or missing fields, aligning with the PR's objective to generate partition keys with null when necessary.


Verification Successful: get_write_partition_key Properly Handles Null Fields.

No remaining usages of get_wal_time_key were found.

The implementation of get_write_partition_key includes appropriate handling of null or missing fields.

Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for any remaining usage of `get_wal_time_key` and verify `get_write_partition_key` implementation.

# Test 1: Search for any remaining usage of `get_wal_time_key`
echo "Searching for any remaining usage of get_wal_time_key:"
rg --type rust 'get_wal_time_key'

# Test 2: Verify the implementation of `get_write_partition_key`
echo "Checking the implementation of get_write_partition_key:"
rg --type rust -A 10 'fn get_write_partition_key'

Length of output: 1036


Script:

#!/bin/bash
# Description: Verify that `get_write_partition_key` properly handles null fields.

echo "Analyzing `get_write_partition_key` for null field handling:"
rg --type rust 'fn get_write_partition_key' src/service/ingestion/mod.rs -A 20 | grep -E 'Option|None|Some|unwrap|default|if .*is_none|match .*Option'

Length of output: 407

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

☢️ Bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants