clp-s: Add support for projecting of a subset of columns during search.#510
clp-s: Add support for projecting of a subset of columns during search.#510wraymo merged 16 commits intoy-scope:mainfrom
Conversation
|
Seems like |
Issue being tracked in #545. You can merge main to temporarily resolve the (new) violations. |
wraymo
left a comment
There was a problem hiding this comment.
Nice work! Do you want to point out --projection can only be used after all positional options or do you want to support something like ./clp-s s --projection a b c -- archive query? Maybe we should also add a projection case in the doc later.
| * rewrite a simpler version. Probably best to write the simpler version for now since types are | ||
| * leaf-only. The type-per-token idea solves this problem (in the absence of wildcards). | ||
| */ | ||
| void resolve_columns(std::shared_ptr<SchemaTree> tree); |
There was a problem hiding this comment.
Can you add a description for this method?
| } | ||
|
|
||
| private: | ||
| void resolve_column(std::shared_ptr<SchemaTree> tree, std::shared_ptr<ColumnDescriptor> column); |
| std::shared_ptr<search::Projection> m_projection{ | ||
| new search::Projection{search::ProjectionMode::ReturnAllColumns} | ||
| }; |
There was a problem hiding this comment.
What about using make_shared?
| /** | ||
| * Add a column to the set of columns that should be included in the projected results | ||
| * @param column | ||
| * @throws OperationFailed if column contains a wildcard |
There was a problem hiding this comment.
Can you add more throw cases?
| )( | ||
| "projection", | ||
| po::value<std::vector<std::string>>(&m_projection_columns) | ||
| ->multitoken() | ||
| ->value_name("COLUMN_A COLUMN_B ..."), | ||
| "The set of projected columns that will be marshalled for matching results" |
There was a problem hiding this comment.
Can you reduce the indent of this code block?
Co-authored-by: wraymo <[email protected]>
WalkthroughThe changes introduce enhancements to the Changes
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Outside diff range and nitpick comments (13)
components/core/src/clp_s/search/Projection.hpp (2)
13-16: LGTM: ProjectionMode enum is well-defined.The enum definition is clear and concise. Using
uint8_tas the underlying type is appropriate for this small enum.Consider adding a brief comment explaining the purpose of each enum value for improved clarity:
enum ProjectionMode : uint8_t { ReturnAllColumns, // Return all columns regardless of selection ReturnSelectedColumns // Return only the columns specified in the projection };
62-64: LGTM: Private members are well-chosen and initialized.The use of
std::vectorfor selected columns andabsl::flat_hash_setfor matching nodes is appropriate and efficient. The default projection mode ofReturnAllColumnsaligns with the expected behavior.Consider making the
m_projection_modemember const since it's initialized in the constructor and doesn't appear to change:const ProjectionMode m_projection_mode;components/core/src/clp_s/JsonSerializer.hpp (1)
78-84: Improved robustness for closing JSON documentsThe changes to the
end_document()method enhance its ability to handle various scenarios, including empty documents or improperly started ones. This is a good improvement in terms of robustness.However, we could optimize this slightly by using a ternary operator:
void end_document() { m_json_string.back() != '{' ? m_json_string.back() = '}' : m_json_string += '}'; }This achieves the same result with a more concise syntax, potentially improving readability and performance.
components/core/src/clp_s/ArchiveReader.hpp (1)
190-192: Consider usingstd::make_sharedfor initialization.While the current initialization is correct, using
std::make_sharedcould potentially be more efficient:std::shared_ptr<search::Projection> m_projection{ std::make_shared<search::Projection>(search::ProjectionMode::ReturnAllColumns) };This approach performs a single allocation for both the
Projectionobject and the control block, which can be more efficient than separate allocations.components/core/src/clp_s/CommandLineArguments.hpp (1)
197-197: Consider initializing m_projection_columns in the constructor.The new
m_projection_columnsmember variable is appropriately typed and named, following the class's conventions. However, it's not initialized in the constructor. While this might be intentional, consider adding an empty initialization in the constructor to ensure consistent behaviour across all instances of the class.Example:
CommandLineArguments(std::string const& program_name) : m_program_name(program_name), m_projection_columns() {}This would guarantee that
m_projection_columnsstarts as an empty vector, preventing any potential issues with uninitialized data.components/core/src/clp_s/SchemaReader.hpp (1)
75-83: LGTM: Updated reset method signatureThe addition of the
projectionparameter to theresetmethod signature is appropriate and aligns with the PR objectives to support column projection.Suggestion: Consider updating the method documentation to include a brief description of the
projectionparameter.components/core/src/clp_s/SchemaReader.cpp (2)
560-562: Good extension of projection to unordered objectsThis change correctly extends the projection feature to unordered objects in the schema, maintaining consistency in applying user-specified column projections. Well done!
For improved code consistency, consider extracting the
matches_nodecheck into a separate method, as it's used in multiple places.Here's a suggested refactor:
+bool should_include_in_local_tree(int32_t node_id) { + return m_projection->matches_node(node_id); +} void SchemaReader::initialize_serializer() { // ... for (int32_t global_column_id : m_ordered_schema) { - if (m_projection->matches_node(global_column_id)) { + if (should_include_in_local_tree(global_column_id)) { generate_local_tree(global_column_id); } } for (auto it = m_global_id_to_unordered_object.begin(); it != m_global_id_to_unordered_object.end(); ++it) { - if (m_projection->matches_node(it->first)) { + if (should_include_in_local_tree(it->first)) { generate_local_tree(it->first); } } // ... }This refactoring improves readability and makes it easier to modify the inclusion logic in the future if needed.
567-569: Good defensive programming practiceThis change prevents attempting to generate a JSON template for an empty schema tree, which is an excellent defensive programming practice. It avoids potential errors and improves the robustness of the code.
To improve clarity, consider adding a brief comment explaining why this check is necessary:
+ // Only generate JSON template if we have nodes in the local schema tree + // This can happen if no columns match the projection if (false == m_local_schema_tree.get_nodes().empty()) { generate_json_template(m_local_schema_tree.get_root_node_id()); }This comment would help future maintainers understand the purpose of this check quickly.
components/core/src/clp_s/CommandLineArguments.cpp (1)
428-433: LGTM! Consider clarifying the description.The implementation of the new
--projectionoption looks good. It correctly allows for multiple column names and stores them appropriately.Consider slightly modifying the description to be more explicit about its purpose:
- "The set of projected columns that will be marshalled for matching results" + "Specify columns to include in the output for matching results"This change makes it clearer that the option affects the output content rather than the matching process itself.
components/core/src/clp_s/search/Projection.cpp (2)
38-53: Consider moving extensive comments to documentationThe block comment from lines 38 to 53 provides valuable context but is quite lengthy. Consider moving this explanation to external documentation or a detailed comment in the header file to keep the codebase clean and maintainable.
66-66: Simplify boolean expression for enhanced readabilityUsing
!last_tokeninstead offalse == last_tokenimproves clarity.Apply this diff:
- if (false == last_token && child_node.get_type() != NodeType::Object) { + if (!last_token && child_node.get_type() != NodeType::Object) {components/core/src/clp_s/clp-s.cpp (2)
198-198: Enhance Error Logging for ClarityThe current error message may lack context, making it harder to diagnose issues.
Consider adding more descriptive information to the error log to assist with debugging:
-SPDLOG_ERROR("{}", e.what()); +SPDLOG_ERROR("Failed to add projection columns: {}", e.what());This change provides clearer insight into what operation was being performed when the error occurred.
185-189: Simplify the Initialization ofprojectionObjectThe ternary operator used for determining the
ProjectionModecan be simplified for better readability.Consider simplifying the initialization as follows:
auto projection = std::make_shared<Projection>( - command_line_arguments.get_projection_columns().empty() - ? ProjectionMode::ReturnAllColumns - : ProjectionMode::ReturnSelectedColumns + command_line_arguments.get_projection_columns().empty() ? + ProjectionMode::ReturnAllColumns : + ProjectionMode::ReturnSelectedColumns );Alternatively, assign the mode to a variable for clarity:
ProjectionMode mode = command_line_arguments.get_projection_columns().empty() ? ProjectionMode::ReturnAllColumns : ProjectionMode::ReturnSelectedColumns; auto projection = std::make_shared<Projection>(mode);This enhances readability by breaking down the logic into manageable parts.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (11)
- components/core/src/clp_s/ArchiveReader.cpp (1 hunks)
- components/core/src/clp_s/ArchiveReader.hpp (3 hunks)
- components/core/src/clp_s/CMakeLists.txt (1 hunks)
- components/core/src/clp_s/CommandLineArguments.cpp (1 hunks)
- components/core/src/clp_s/CommandLineArguments.hpp (2 hunks)
- components/core/src/clp_s/JsonSerializer.hpp (1 hunks)
- components/core/src/clp_s/SchemaReader.cpp (1 hunks)
- components/core/src/clp_s/SchemaReader.hpp (4 hunks)
- components/core/src/clp_s/clp-s.cpp (3 hunks)
- components/core/src/clp_s/search/Projection.cpp (1 hunks)
- components/core/src/clp_s/search/Projection.hpp (1 hunks)
🧰 Additional context used
cppcheck
components/core/src/clp_s/clp-s.cpp
[error] 191-191: Exception thrown in function declared not to throw exceptions.
(throwInNoexceptFunction)
🔇 Additional comments not posted (20)
components/core/src/clp_s/search/Projection.hpp (5)
1-12: LGTM: File structure and includes are well-organized.The file header, include guards, and necessary includes are properly set up. The namespace is correctly defined, following best practices for C++ code organization.
24-39: Enhance exception documentation and consider additional throw cases.The
OperationFailedexception is well-defined, but consider expanding on potential throw cases in theadd_columnmethod documentation.Update the method documentation to include more specific throw cases:
/** * Add a column to the set of columns that should be included in the projected results * @param column The column descriptor to add * @throws OperationFailed if: * - column contains a wildcard * - column is null * - column name is empty * - column name is duplicated */ void add_column(std::shared_ptr<ColumnDescriptor> column);
40-47: Add method description for resolve_columns.As per the previous review comment, please add a description for this method to improve code documentation.
Add a method description:
/** * Resolve the added columns against the provided schema tree. * This method should be called after adding all desired columns and before calling matches_node. * @param tree The schema tree to resolve columns against * @throws OperationFailed if resolution fails for any column */ void resolve_columns(std::shared_ptr<SchemaTree> tree);
59-60: Add method description for resolve_column.As per the previous review comment, please add a description for this private method to improve code maintainability.
Add a method description:
/** * Resolve a single column against the provided schema tree. * @param tree The schema tree to resolve the column against * @param column The column descriptor to resolve * @throws OperationFailed if resolution fails for the column */ void resolve_column(std::shared_ptr<SchemaTree> tree, std::shared_ptr<ColumnDescriptor> column);
49-57: LGTM: matches_node method is well-implemented.The
matches_nodemethod is efficiently implemented as an inline function. It correctly checks the projection mode and uses the hash set for fast matching, which aligns well with the PR objectives of allowing users to specify which columns to include in the search results.components/core/src/clp_s/ArchiveReader.hpp (2)
15-15: LGTM: Appropriate include added for new functionality.The inclusion of "search/Projection.hpp" is necessary and correctly placed for the new projection functionality being introduced.
137-139: LGTM: Well-designed setter for projection.The
set_projectionmethod is appropriately implemented:
- It uses
std::shared_ptrfor managing shared ownership of the projection object.- The method is concise and follows the Single Responsibility Principle.
components/core/src/clp_s/CMakeLists.txt (1)
148-149: LGTM! The new Projection files are correctly added.The addition of
Projection.cppandProjection.hppto theCLP_S_SEARCH_SOURCESvariable is correct and aligns with the PR objectives of adding support for projecting a subset of columns during search.A few considerations:
- Ensure that any necessary include directives for these new files are added in other source files that will use the Projection functionality.
- If the Projection feature requires any new dependencies, make sure they are added to the
target_link_librariescommand for theclp-starget.- Consider updating any relevant documentation or README files to reflect the new Projection feature.
components/core/src/clp_s/CommandLineArguments.hpp (2)
111-112: LGTM: Well-designed getter method for projection columns.The new
get_projection_columns()method is a well-implemented getter. It provides efficient, read-only access to the projection columns using a const reference return type. The method is appropriately marked as const, ensuring it doesn't modify the object's state.
111-112: Verify implementation of parsing and setting projection columns.The changes to
CommandLineArgumentsclass provide a solid foundation for handling projection columns. However, the actual parsing of the--projectioncommand-line option and setting of them_projection_columnsvector is not visible in this file.To ensure complete implementation:
- Verify that the
--projectionoption is properly parsed in the command-line parsing logic (likely in a separate file or method).- Confirm that
m_projection_columnsis correctly populated with the parsed values.- Check for any validation of the projection column names (e.g., disallowing wildcards, handling case sensitivity) as mentioned in the PR objectives.
Also applies to: 197-197
components/core/src/clp_s/SchemaReader.hpp (3)
14-14: LGTM: Include statement for Projection headerThe addition of the include statement for "search/Projection.hpp" is appropriate and necessary for using the
search::Projectiontype in theSchemaReaderclass.
298-298: LGTM: New member variable for projectionThe addition of the
m_projectionmember variable is consistent with the PR objectives and matches the type used in theresetmethod signature.
Line range hint
1-303: Overall assessment: Changes align with PR objectivesThe modifications to the
SchemaReaderclass, including the new include statement, updatedresetmethod signature, and newm_projectionmember variable, are well-implemented and consistent with the PR objectives to support column projection.These changes lay the groundwork for implementing the projection feature in the
SchemaReaderclass, allowing for more efficient data retrieval based on user-specified columns.Ensure that the implementation of the projection functionality in other parts of the codebase (e.g., in the
resetmethod body and other methods that will usem_projection) is consistent with the design introduced here.components/core/src/clp_s/ArchiveReader.cpp (1)
238-238: LGTM! Verify SchemaReader class update.The addition of the
m_projectionparameter toreader.reset()aligns well with the PR objective of supporting column projection during search. This change appears to be a key component in implementing the new--projectioncommand line option.To ensure consistency across the codebase, please run the following script to verify the
SchemaReaderclass update:✅ Verification successful
Verified SchemaReader class updates. No issues found.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Verify SchemaReader class update to handle projections # Test 1: Check if SchemaReader class has been updated to handle projections rg --type cpp -g 'SchemaReader.*' 'class SchemaReader' -A 10 # Test 2: Verify if reader.reset() method signature has been updated rg --type cpp -g 'SchemaReader.*' 'void reset\(' -A 5 # Test 3: Check for any new methods related to projection handling rg --type cpp -g 'SchemaReader.*' 'projection'Length of output: 2715
components/core/src/clp_s/SchemaReader.cpp (2)
551-553: Excellent implementation of column projection!This change effectively implements the projection feature, ensuring that only the columns specified by the user are included in the local tree. This aligns perfectly with the PR objective of allowing users to specify which columns to include in the search results using the
--projectionoption.
551-569: Overall excellent implementation of the projection featureThe changes in this file successfully implement the projection feature as described in the PR objectives. The modifications allow users to specify which columns to include in the search results, and they are well-integrated into the existing code structure.
Key points:
- The projection is correctly applied to both ordered and unordered schema elements.
- Defensive programming practices have been added to handle potential edge cases.
- The changes are consistent with the existing codebase style and structure.
Minor suggestions for improvement have been provided in the individual comments, but these are not critical for the functionality of the feature. Great job on this implementation!
components/core/src/clp_s/CommandLineArguments.cpp (1)
Line range hint
1-1: Verify related changes in other files.While the addition of the
--projectionoption is implemented correctly here, there are a couple of points to consider:
- Ensure that the
m_projection_columnsmember variable is properly declared in the corresponding header file (likelyCommandLineArguments.hpp).- Verify that the actual implementation of the projection functionality is present in the appropriate file(s) where the search results are processed and output.
To check for the member variable declaration and related implementations, you can run the following script:
This script will help identify where the member variable is declared and where the projection functionality is implemented.
✅ Verification successful
Verification Successful
The
m_projection_columnsmember variable is properly declared inCommandLineArguments.hpp, and the projection functionality is correctly implemented across the relevant files.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Check for m_projection_columns declaration echo "Checking for m_projection_columns declaration:" rg "std::vector<std::string>\s+m_projection_columns;" --type cpp echo -e "\nChecking for projection-related implementations:" rg "projection|m_projection_columns" --type cpp -g '!CommandLineArguments.cpp'Length of output: 3175
components/core/src/clp_s/clp-s.cpp (3)
32-32: IncludeProjection.hppto Use Projection FunctionalityThe inclusion of
"search/Projection.hpp"is necessary to utilize theProjectionclass for handling column projections.
43-43: AddStringUtilsto Namespace ScopeThe using directive
using clp_s::StringUtils;correctly bringsStringUtilsinto scope, allowing for cleaner code when calling its methods.
194-194: Check for Exceptions fromColumnDescriptor::createEnsure that
ColumnDescriptor::create(descriptor_tokens)handles invalid input gracefully and does not throw unexpected exceptions.Run the following script to verify how
ColumnDescriptor::createhandles invalid descriptors:Ensure that any exceptions are properly documented and managed.
✅ Verification successful
Verification Complete: Exceptions Handling in
ColumnDescriptor::createConfirmedThe
ColumnDescriptor::create(descriptor_tokens)implementation does not contain anythrowstatements, indicating that it handles invalid input without raising exceptions directly. Additionally, the surrounding code inclp-s.cppappropriately manages potential exceptions.🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash # Description: Inspect ColumnDescriptor::create for exception safety. # Find the implementation of ColumnDescriptor::create rg --type cpp 'ColumnDescriptor::create\(' -A 10 # Look for any throw statements or exception handling mechanismsLength of output: 7727
Script:
#!/bin/bash # Description: Inspect ColumnDescriptor::create for exception safety. # Find the implementation files of ColumnDescriptor::create rg --type cpp 'ColumnDescriptor::create\s*\(' > create_implementations.txt # Extract file paths from the previous command FILES=$(rg --type cpp --files-with-matches 'ColumnDescriptor::create\s*\(') # Search for throw statements within these implementations for file in $FILES; do echo "Checking $file for throw statements..." rg 'throw\s+' "$file" -C 2 doneLength of output: 1688
Co-authored-by: wraymo <[email protected]>
There was a problem hiding this comment.
Actionable comments posted: 2
🧹 Outside diff range and nitpick comments (1)
components/core/src/clp_s/CommandLineArguments.cpp (1)
428-435: LGTM! Consider clarifying the projection documentation.The new options "archive-id" and "projection" are well-implemented and properly integrated into the existing command-line argument structure. The code follows the established patterns and is consistent with the rest of the file.
A minor suggestion for improvement:
Consider adding a brief example of how to use the "projection" option in the documentation string. This could help users understand the feature more quickly. For instance:
"Project only the given set of columns for matching results. This option must be" " specified after all positional options. Values that are objects or structured" -" arrays are currently unsupported." +" arrays are currently unsupported. Example: --projection column1 column2 column3"
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
📒 Files selected for processing (2)
- components/core/src/clp_s/CommandLineArguments.cpp (1 hunks)
- components/core/src/clp_s/search/Projection.hpp (1 hunks)
🔇 Additional comments (1)
components/core/src/clp_s/search/Projection.hpp (1)
63-66: Functionmatches_nodeimplementation is correct.The function correctly determines whether a node should be included based on the projection mode and matching nodes.
|
|
||
| std::vector<std::shared_ptr<ColumnDescriptor>> m_selected_columns; | ||
| absl::flat_hash_set<int32_t> m_matching_nodes; | ||
| ProjectionMode m_projection_mode{ProjectionMode::ReturnAllColumns}; |
There was a problem hiding this comment.
Remove redundant in-class initialization of m_projection_mode.
Since m_projection_mode is always initialized by the constructor, the in-class initializer is redundant and can be removed to avoid potential confusion.
Apply this diff to remove the redundant initialization:
- ProjectionMode m_projection_mode{ProjectionMode::ReturnAllColumns};
+ ProjectionMode m_projection_mode;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ProjectionMode m_projection_mode{ProjectionMode::ReturnAllColumns}; | |
| ProjectionMode m_projection_mode; |
| class OperationFailed : public TraceableException { | ||
| public: | ||
| // Constructors | ||
| OperationFailed(ErrorCode error_code, char const* const filename, int line_number) | ||
| : TraceableException(error_code, filename, line_number) {} | ||
| }; |
There was a problem hiding this comment.
🛠️ Refactor suggestion
Consider renaming OperationFailed to ProjectionOperationFailed for clarity.
The exception class OperationFailed has a generic name that may lead to confusion if similar exceptions exist elsewhere in the codebase. Renaming it to ProjectionOperationFailed will make its purpose more explicit and improve readability.
Apply this diff to rename the exception class:
- class OperationFailed : public TraceableException {
+ class ProjectionOperationFailed : public TraceableException {Remember to update all references to OperationFailed accordingly.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| class OperationFailed : public TraceableException { | |
| public: | |
| // Constructors | |
| OperationFailed(ErrorCode error_code, char const* const filename, int line_number) | |
| : TraceableException(error_code, filename, line_number) {} | |
| }; | |
| class ProjectionOperationFailed : public TraceableException { | |
| public: | |
| // Constructors | |
| ProjectionOperationFailed(ErrorCode error_code, char const* const filename, int line_number) | |
| : TraceableException(error_code, filename, line_number) {} | |
| }; |
…h. (y-scope#510) Co-authored-by: wraymo <[email protected]> Co-authored-by: Kirk Rodrigues <[email protected]>
…h. (y-scope#510) Co-authored-by: wraymo <[email protected]> Co-authored-by: Kirk Rodrigues <[email protected]>
Description
This PR adds support for basic projection during search. We allow the user to specify which columns they want to be marshalled for matching search results using the --projection command line option.
For example
clp-s s archives '*' --archive-id XYZ --projection @timestamp level msg.errorWould return all records in the archive XYZ, showing only the @timestamp, level, and msg.error fields.
Note that if a record doesn't contain all of columns we are asked to project we simply output the record without doing anything about the missing columns. In our example above if a record were missing all three of the fields we asked for it would simply be marshalled as
{}.Other things to note are that we allow projected columns to be polymorphic, we do not allow the set of projected columns to contain wildcards, and we do not allow projection to travel inside of arrays. Specifically if you have an array a containing objects with the field b then asking to project "a" is legal (leading to the whole array being marshalled), but asking for "a.b" is illegal.
Also note that unlike languages like SQL the names of columns being projected are case sensitive.
We can add support for things like case-insensitive column matching, advanced handling for records that don't contain a column we're asking for, polymorphism, and more in future PRs.
The implementation is quite straightforward. We take in a set of columns, expand them against the MPT to find all matching nodes, and then check if nodes in the schema of an ERT belong to that set when deciding whether to marshal the corresponding column.
Validation performed
Summary by CodeRabbit
Release Notes
New Features
--projectionfor specifying projected columns.Projectionclass for managing and resolving column projections.Bug Fixes
Enhancements
ArchiveReaderandSchemaReaderto utilize projection information for schema processing.