-
Notifications
You must be signed in to change notification settings - Fork 715
fix: reduce CPU const for merging #3673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThe recent updates introduce memory management enhancements and adjustments to function signatures across several files. Key changes include adding parameters to control memory modes in batch concatenation and file merging functions, improving concurrency handling in the writer implementation, and refining lock ID calculations in the Postgres database implementation. Changes
Sequence Diagram(s) (Beta)sequenceDiagram
participant Client
participant Service
participant DB
Client->>Service: Request to merge files
Service->>Service: Determine memory mode
alt strict_memory_mode
Service->>Service: Call merge_parquet_files_v1
else less_memory_mode
Service->>Service: Call merge_parquet_files_v2
else
Service->>Service: Call default merge_parquet_files
end
Service->>DB: Write merged data
DB->>Service: Acknowledge write
Service->>Client: Respond with success
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (2)
- src/config/src/utils/record_batch_ext.rs (2 hunks)
- src/service/search/mod.rs (5 hunks)
Additional context used
Path-based instructions (2)
src/config/src/utils/record_batch_ext.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
src/service/search/mod.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
Additional comments not posted (2)
src/config/src/utils/record_batch_ext.rs (1)
486-486: Simplified parameter declaration enhances function safety by avoiding unintended mutations.src/service/search/mod.rs (1)
47-47: Reorganized import enhances code readability and maintainability.
fix: rollback the change on ingester
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 3
Outside diff range and nitpick comments (1)
src/config/src/utils/record_batch_ext.rs (1)
[!TIP]
Codebase VerificationThe output confirms that there are still calls to
concat_batchesusing the old signature in the following files:
src/handler/grpc/request/event.rssrc/job/files/parquet.rssrc/service/compact/flatten.rsThese calls need to be updated to include the new
strict_memory_modeparameter.
- src/handler/grpc/request/event.rs
- src/job/files/parquet.rs
- src/service/compact/flatten.rs
Analysis chain
Line range hint
487-518: The addition ofstrict_memory_modetoconcat_batchesintroduces conditional memory management which aligns with the PR's objective to optimize resource usage. Ensure that all calls to this function have been updated to pass this new parameter.
The script output indicates that there are still calls to
concat_batchesusing the old signature. We need to identify the exact lines and contexts where these calls occur to ensure they are updated to include the newstrict_memory_modeparameter.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify all function calls to `concat_batches` match the new signature. # Test: Search for the function usage. Expect: Only occurances of the new signature. rg --type rust $'concat_batches'Length of output: 614
Script:
#!/bin/bash # Description: Identify the exact lines and contexts where the old signature of `concat_batches` is used. # Search for the old signature usage in the identified files. rg --type rust 'arrow::compute::concat_batches' src/service/compact/flatten.rs src/job/files/parquet.rs src/handler/grpc/request/event.rsLength of output: 447
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- src/config/src/utils/record_batch_ext.rs (2 hunks)
- src/job/files/parquet.rs (1 hunks)
- src/service/compact/merge.rs (4 hunks)
Additional context used
Path-based instructions (3)
src/config/src/utils/record_batch_ext.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
src/job/files/parquet.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
src/service/compact/merge.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
Additional comments not posted (2)
src/service/compact/merge.rs (2)
688-688: Ensure thestrict_memory_modeparameter is consistently used across all relevant function calls.
1035-1035: The addition ofstrict_memory_modetomerge_parquet_filesfunction is consistent with the PR's objective to manage memory more efficiently.
src/service/compact/merge.rs
Outdated
| if strict_memory_mode { | ||
| for i in 0..batch_columns_len { | ||
| let i = i - sorted_columns.len(); | ||
| let sorted_column = | ||
| arrow::compute::take(&concated_record_batch.remove_column(i), &sort_indices, None)?; | ||
| sorted_columns.push(sorted_column); | ||
| } | ||
| } else { | ||
| sorted_columns = concated_record_batch | ||
| .columns() | ||
| .iter() | ||
| .map(|c| arrow::compute::take(c, &sort_indices, None)) | ||
| .collect::<std::result::Result<Vec<_>, _>>()?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The sorting logic in strict_memory_mode seems overly complex and might be error-prone due to manual index adjustments. Consider simplifying this logic or using a library function if available.
// Simplify the sorting logic to avoid manual index calculations
src/job/files/parquet.rs
Outdated
| .await | ||
| } else if stream_type == StreamType::Logs { | ||
| merge_parquet_files(thread_id, tmp_dir.name(), schema.clone()).await | ||
| merge_parquet_files(thread_id, tmp_dir.name(), schema.clone(), false).await |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tip
Codebase Verification
Inconsistent handling of strict_memory_mode for StreamType::Logs in merge_parquet_files.
src/service/compact/merge.rs:merge_parquet_filesis called withtrueforStreamType::Logs.src/job/files/parquet.rs:merge_parquet_filesis called withfalseforStreamType::Logs.
Please review the memory management strategy for StreamType::Logs to ensure consistent behavior.
Analysis chain
The update to pass strict_memory_mode as false in merge_parquet_files when handling StreamType::Logs is a targeted optimization. Ensure that this behavior is consistent with the expected memory management strategies for logs.
Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify the correct handling of `strict_memory_mode` in `merge_parquet_files`.
# Test: Search for the function usage with `StreamType::Logs`. Expect: Only occurances with `strict_memory_mode` as false.
rg --type rust $'merge_parquet_files\([^,]*,[^,]*,[^,]*,[^,]*, false\)'
Length of output: 72
Script:
#!/bin/bash
# Description: Verify the correct handling of `strict_memory_mode` in `merge_parquet_files`.
# Test: Search for all instances of `merge_parquet_files` and inspect the parameters passed to it.
rg --type rust 'merge_parquet_files\('
Length of output: 479
Script:
#!/bin/bash
# Description: Extract lines around the instances of `merge_parquet_files` to inspect the parameters passed.
# Extract lines around the instance in `src/service/search/datafusion/exec.rs`
rg --type rust -A 3 -B 3 'merge_parquet_files\(' src/service/search/datafusion/exec.rs
# Extract lines around the instances in `src/service/compact/merge.rs`
rg --type rust -A 3 -B 3 'merge_parquet_files\(' src/service/compact/merge.rs
# Extract lines around the instance in `src/job/files/parquet.rs`
rg --type rust -A 3 -B 3 'merge_parquet_files\(' src/job/files/parquet.rs
Length of output: 1122
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- src/ingester/src/writer.rs (1 hunks)
Additional context used
Path-based instructions (1)
src/ingester/src/writer.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
Additional comments not posted (1)
src/ingester/src/writer.rs (1)
246-246: Switching from read to write lock onmemtableenhances data integrity during write operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (1)
- src/infra/src/db/postgres.rs (1 hunks)
Additional context used
Path-based instructions (1)
src/infra/src/db/postgres.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
Outside diff range and nitpick comments (1)
src/service/compact/merge.rs (1)
Line range hint
1115-1161: Review the implementation ofmerge_parquet_files_v2.The function
merge_parquet_files_v2is designed to use less memory but is slower. Consider the following improvements:
- The deletion of null columns (lines 1161-1176) could potentially be optimized by using built-in functions or libraries that handle such operations more efficiently.
- Ensure that the sorting logic is not overly complex and error-prone due to manual index adjustments.
// Use a more efficient method to drop null columns if available in the used libraries
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- src/config/src/utils/record_batch_ext.rs (2 hunks)
- src/job/files/parquet.rs (2 hunks)
- src/service/compact/merge.rs (4 hunks)
Files skipped from review as they are similar to previous changes (2)
- src/config/src/utils/record_batch_ext.rs
- src/job/files/parquet.rs
Additional context used
Path-based instructions (1)
src/service/compact/merge.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
Additional comments not posted (1)
src/service/compact/merge.rs (1)
688-688: Ensure the correct function is called based on the stream type.This change correctly uses
merge_parquet_files_v2forStreamType::Logs, which is likely optimized for log data.
Summary by CodeRabbit
New Features
strict_memory_modeandless_memory_mode) for more efficient data processing and enhanced performance control.Bug Fixes
Performance Improvements