-
Notifications
You must be signed in to change notification settings - Fork 715
feat: add custom table provider #3846
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThese changes primarily focus on restructuring and enhancing the DataFusion integration within the project. Key updates include reorganizing user-defined functions (UDFs) under a new Changes
Tip Early access features
Note:
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (invoked as PR comments)
Additionally, you can add CodeRabbit Configration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 7
Outside diff range and nitpick comments (1)
src/service/search/datafusion/udf/arrzip_udf.rs (1)
Line range hint
161-161: Consider handling errors gracefully inarr_zip_impl.The function uses
.expectwhich could cause panics if the assumptions are violated. It would be safer to handle these potential errors gracefully and return aDataFusionErrorif necessary.- json::from_str(arr_field1).expect("Failed to deserialize arrzip field1"); + json::from_str(arr_field1).map_err(|e| DataFusionError::Execution(format!("Failed to deserialize arrzip field1: {}", e)))?;
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (13)
- src/handler/http/request/status/mod.rs (1 hunks)
- src/service/promql/engine.rs (1 hunks)
- src/service/search/datafusion/exec.rs (7 hunks)
- src/service/search/datafusion/mod.rs (2 hunks)
- src/service/search/datafusion/table_provider/helpers.rs (1 hunks)
- src/service/search/datafusion/table_provider/mod.rs (1 hunks)
- src/service/search/datafusion/table_provider/test_util.rs (1 hunks)
- src/service/search/datafusion/udf/arrjoin_udf.rs (2 hunks)
- src/service/search/datafusion/udf/arrzip_udf.rs (2 hunks)
- src/service/search/datafusion/udf/cast_to_arr_udf.rs (2 hunks)
- src/service/search/datafusion/udf/mod.rs (1 hunks)
- src/service/search/datafusion/udf/spath_udf.rs (2 hunks)
- src/service/search/datafusion/udf/transform_udf.rs (1 hunks)
Files skipped from review due to trivial changes (3)
- src/handler/http/request/status/mod.rs
- src/service/search/datafusion/mod.rs
- src/service/search/datafusion/udf/transform_udf.rs
Additional context used
Path-based instructions (10)
src/service/search/datafusion/table_provider/test_util.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
src/service/search/datafusion/udf/mod.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
src/service/search/datafusion/udf/cast_to_arr_udf.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
src/service/search/datafusion/udf/spath_udf.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
src/service/search/datafusion/udf/arrjoin_udf.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
src/service/search/datafusion/udf/arrzip_udf.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
src/service/search/datafusion/table_provider/mod.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
src/service/search/datafusion/table_provider/helpers.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
src/service/promql/engine.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
src/service/search/datafusion/exec.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
Additional comments not posted (12)
src/service/search/datafusion/table_provider/test_util.rs (1)
32-37: Consider handling errors more gracefully in test environments.While using
unwrap()in test environments is common, it's generally a good practice to handle errors gracefully even in tests to avoid panics and to make test failures more informative.src/service/search/datafusion/udf/spath_udf.rs (1)
89-89: Optimize JSON path processing inspath_impl.The function currently re-parses the JSON object for each path segment. Consider maintaining the parsed JSON object and traversing it according to the path, which could improve performance.
[REFACTOR_SUGGESTion]- let mut field: json::Value = json::from_str(field).expect("Failed to deserialize arrzip field1"); + let mut field: json::Value = json::from_str(field).map_err(|e| DataFusionError::Execution(format!("JSON parsing error: {}", e)))?;src/service/search/datafusion/table_provider/mod.rs (1)
83-106: Ensure robust error handling intry_new.The function uses
?for error propagation, which is good. However, consider adding more descriptive error messages or handling specific cases differently to improve the clarity and maintainability of error handling.src/service/search/datafusion/table_provider/helpers.rs (1)
129-149: Optimize thesplit_filesfunction.The function handles file partitioning well. Consider adding more comments to explain the logic, especially around the calculation of
chunk_sizewhich might not be immediately clear to other developers.src/service/promql/engine.rs (2)
1029-1029: Updated import path for REGEX_MATCH_UDF is correct.The updated import path aligns with the restructuring of the
udfmodule as described in the AI-generated summary.
1036-1037: Updated import path for REGEX_NOT_MATCH_UDF is correct.The updated import path correctly points to the new location of
REGEX_NOT_MATCH_UDFwithin theregexp_udfsub-module ofudf, consistent with the restructuring described.src/service/search/datafusion/exec.rs (6)
63-65: Review of imports and module usage.The imports from
superhave been updated to includeNewListingTableandget_all_transformfunctions, which aligns with the restructuring mentioned in the PR summary. This is a positive change as it helps in keeping the code modular and organized.
620-620: Review ofNewListingTableinstantiation.The instantiation of
NewListingTableis done correctly according to the newtable_providermodule's API. It's good to see that error handling is properly implemented with the?operator, which will propagate any errors upwards.
987-987: Review ofNewListingTableinstantiation in a different context.Similar to the previous comment, the instantiation here is done correctly. This consistency in using the
NewListingTableacross different parts of the codebase is good for maintainability.
1093-1093: Review ofNewListingTableregistration in a session context.The registration of
NewListingTablein the session context is handled properly. This part of the code also demonstrates good practice by cloning the table before registering, which ensures that the original object isn't modified unexpectedly elsewhere.
1214-1230: Review of UDF registrations.The registration of multiple UDFs is done in a clean and concise manner. It's important to ensure that all these UDFs are indeed implemented and tested as they play a critical role in the functionality of the system. The use of
clone()here is necessary due to the ownership rules in Rust, which is correctly applied.
1311-1311: Review of conditional cache handling inNewListingTable.This line demonstrates an important aspect of performance optimization by conditionally adding a cache based on the storage type. This is a smart use of resources and helps in optimizing performance for different deployment configurations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 0
Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Files selected for processing (3)
- src/infra/src/file_list/mysql.rs (1 hunks)
- src/infra/src/file_list/postgres.rs (1 hunks)
- src/infra/src/file_list/sqlite.rs (1 hunks)
Additional context used
Path-based instructions (3)
src/infra/src/file_list/postgres.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
src/infra/src/file_list/sqlite.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
src/infra/src/file_list/mysql.rs (1)
Pattern
**/*.rs: You are a smart rustlang pull request reviewer.
You are going to review all the rustlang files.
Be concise, and add a brief explanation to your suggestionsMake sure the responses are not too verbose and keep the suggestions to the point i.e. actionable.
Additional comments not posted (3)
src/infra/src/file_list/postgres.rs (1)
1001-1001: Ensure compatibility with existing data when changing default column values.Changing the default value of the
started_atcolumn to0might affect existing operations expecting aNULLor different default. It's crucial to verify that this change aligns with all uses of this column in the application to prevent unexpected behaviors.Verification successful
Changing the default value of the
started_atcolumn to0is compatible with existing operations.The
INSERTstatement already uses0forstarted_at, and theUPDATEstatement explicitly sets this column, ensuring no conflicts with the default value change.
src/infra/src/file_list/postgres.rs, Line 1001Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Search for usages of the `started_at` column in the PostgreSQL-related files. rg --type rust --glob '*postgres.rs' 'started_at'Length of output: 627
src/infra/src/file_list/sqlite.rs (1)
1009-1009: Check implications of the default value change on existing SQLite data.The addition of a default value of
0forstarted_atin SQLite mirrors the change in the PostgreSQL file. It's important to assess how this default value impacts existing records and queries, especially in cases whereNULLmight have been expected.Verification successful
Default value change for
started_atverifiedThe change to set the default value of the
started_atcolumn to0in SQLite does not impact existing records or queries, as there are no instances ofNULLhandling for this column in the codebase.
- Instances of
started_atare consistently set to the current timestamp, and noNULLvalues are expected.Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Search for usages of the `started_at` column in the SQLite-related files. rg --type rust --glob '*sqlite.rs' 'started_at'Length of output: 592
Script:
#!/bin/bash # Description: Search for instances where the `started_at` column is set to NULL or queries that handle NULL values for this column. rg --type rust --glob '*sqlite.rs' 'started_at.*NULL'Length of output: 55
Script:
#!/bin/bash # Description: Search for usages of the `started_at` column in the entire codebase to identify any potential handling of NULL values or implications of the default value change. rg --type rust 'started_at'Length of output: 4667
src/infra/src/file_list/mysql.rs (1)
1044-1044: Good practice to ensure data consistency.Setting a default value of
0for thestarted_atcolumn in thefile_list_jobstable is a good practice. It ensures that every record has a predictable initial state, which is crucial for processes that depend on this timestamp. This change enhances the robustness of the database schema by preventing null values in this column.
Summary by CodeRabbit
New Features
Improvements
NewListingTablestruct for efficient file listing and scanning in file systems.Bug Fixes
started_atcolumn in thefile_list_jobstable across multiple databases (MySQL, PostgreSQL, SQLite) to ensure consistent default behavior.