Conversation
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
- Implemented GetOrcColumnIndex helper function that:
  - Resolves FieldRef to ORC column index using manifest
  - Uses FieldRef.FindOne() to locate field in schema
  - Traverses manifest tree following field path indices
  - Handles both top-level and nested fields
  - Returns column_index for leaf nodes (primitives with statistics)
  - Returns std::nullopt for containers or not found
- Added necessary includes:
  - <optional> for std::optional return type
  - arrow/compute/api_scalar.h for FieldRef and FieldPath

Implementation details:
- Top-level fields accessed via manifest.schema_fields[index]
- Nested fields traversed via current_field->children[index]
- Validates indices at each level to prevent out-of-bounds
- Only returns column_index if field is leaf (has statistics)
- Containers (struct/list/map) return nullopt

Verified: Manual code review - follows FieldRef resolution pattern

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
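The traversal described in this commit message can be sketched roughly as follows. This is not the code from the commit: `SchemaFieldSketch`, `SchemaManifestSketch`, `GetOrcColumnIndexSketch`, and `MakeExampleManifest` are hypothetical stand-ins, and the field path is assumed to be already resolved to one index per nesting level (the step the message attributes to `FieldRef.FindOne()`).

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <vector>

// Hypothetical stand-in for a manifest node; not the actual Arrow type.
struct SchemaFieldSketch {
  int column_index = -1;                    // ORC column id, meaningful for leaves only
  bool is_leaf = false;                     // true for primitives that carry statistics
  std::vector<SchemaFieldSketch> children;  // nested fields of a struct/list/map
};

// Hypothetical stand-in for the manifest: just the top-level fields.
struct SchemaManifestSketch {
  std::vector<SchemaFieldSketch> schema_fields;
};

// Follows a field path (one index per nesting level) through the manifest.
// Returns the column index of a leaf, or std::nullopt for an empty path,
// an out-of-range index, or a container field.
std::optional<int> GetOrcColumnIndexSketch(const SchemaManifestSketch& manifest,
                                           const std::vector<int>& path) {
  if (path.empty()) return std::nullopt;
  const std::vector<SchemaFieldSketch>* level = &manifest.schema_fields;
  const SchemaFieldSketch* current = nullptr;
  for (int index : path) {
    // Validate the index at every level before dereferencing.
    if (index < 0 || static_cast<std::size_t>(index) >= level->size()) {
      return std::nullopt;
    }
    current = &(*level)[static_cast<std::size_t>(index)];
    level = &current->children;
  }
  assert(current != nullptr);  // holds because the path is non-empty
  if (!current->is_leaf) return std::nullopt;
  return current->column_index;
}

// Example manifest for struct<a: int, b: struct<c: int, d: string>>,
// with column ids assigned in pre-order (the root struct would be 0).
inline SchemaManifestSketch MakeExampleManifest() {
  SchemaFieldSketch a{1, true, {}};
  SchemaFieldSketch c{3, true, {}};
  SchemaFieldSketch d{4, true, {}};
  SchemaFieldSketch b{2, false, {c, d}};
  return SchemaManifestSketch{{a, b}};
}
```

With this example manifest, the path `{1, 0}` reaches the nested leaf `c`, while `{1}` stops at the struct `b` and yields no column index.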
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
Merged
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
Merged
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
Adds comprehensive task tracking and progress documentation for the ongoing ORC predicate pushdown implementation project.

## Changes
- task_list.json: Complete 35-task breakdown with dependencies
  - Tasks #0, #0.5, #1, #2 marked as complete (on feature branches)
  - Tasks #3-#35 pending implementation
  - Organized by phase: Prerequisites, Core, Metadata, Predicate, Scan, Testing, Future
- claude-progress.txt: Comprehensive project status document
  - Codebase structure and build instructions
  - Work completed on feature branches (not yet merged)
  - Current main branch state
  - Next steps and implementation strategy
  - Parquet mirroring patterns and Allium spec alignment

## Context
This is an initialization session to establish baseline tracking for the ORC predicate pushdown project. Previous sessions (1-4) completed initial tasks on feature branches. This consolidates that progress and provides a clear roadmap for future implementation sessions.

## Related Work
- Allium spec: orc-predicate-pushdown.allium (already on main)
- Feature branches: task-0-statistics-api-v2, task-0.5-stripe-selective-reading, task-1-orc-schema-manifest, task-2-build-orc-schema-manifest (not yet merged)

## Next Steps
Future sessions will implement tasks #3+ via individual feature branch PRs.
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
- Added GetOrcColumnIndex function to resolve FieldRef to ORC column index
- Handles top-level fields via direct lookup
- Handles nested fields via manifest tree traversal
- Returns nullopt if field not found or not a leaf field
- Only leaf fields have statistics and valid column indices
- Added <optional> include for std::optional support

Verified: Code structure follows Parquet pattern
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
Implemented the GetOrcColumnIndex function that resolves field references to ORC physical column indices using the schema manifest. This is a critical component for predicate pushdown to map Arrow field references to ORC columns for statistics lookup.

Implementation details:
- Added GetOrcColumnIndex() in internal namespace (file_orc.cc)
- Handles top-level field resolution (simple name references)
- Handles nested field resolution (traverses manifest tree)
- Returns std::nullopt for:
  * Fields not found in manifest
  * Container types (struct, list, map) with no single column index
  * Non-name field references (positional, etc.)

Testing:
- Added GetOrcColumnIndex_TopLevelFields test
  * Verifies resolution of simple top-level fields
  * Tests non-existent field returns nullopt
- Added GetOrcColumnIndex_NestedFields test
  * Verifies nested field traversal through struct
  * Tests container field returns nullopt (no single column)
  * Tests invalid nested paths return nullopt

Design follows Parquet's ResolveOneFieldRef pattern adapted for ORC's manifest structure.

VERIFICATION STATUS: Build/test verification pending due to network restrictions preventing CMake from downloading dependencies. Previous session (Task #2) verified all code compiles and tests pass. This implementation follows established patterns exactly and includes the necessary <optional> header.

Co-authored-by: Claude Sonnet 4.5 <[email protected]>
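The name-based behaviour listed in this message (name references resolve, positional references and containers do not) might look roughly like the sketch below. Everything here is an assumption made for illustration: `NamedFieldSketch`, `FieldRefSketch`, `FindByName`, and `ResolveByName` are hypothetical names, and the `std::variant` only imitates the idea of a reference that is a position, a single name, or a nested list of names. It is not Arrow's `FieldRef`.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <variant>
#include <vector>

// Hypothetical manifest node keyed by name; not the actual Arrow type.
struct NamedFieldSketch {
  std::string name;
  std::optional<int> column_index;         // set only for leaf fields
  std::vector<NamedFieldSketch> children;  // nested fields of a container
};

// Hypothetical reference: a position, a single name, or nested names.
using FieldRefSketch = std::variant<int, std::string, std::vector<std::string>>;

// Linear search for a field with the given name at one nesting level.
const NamedFieldSketch* FindByName(const std::vector<NamedFieldSketch>& fields,
                                   const std::string& name) {
  for (const auto& field : fields) {
    if (field.name == name) return &field;
  }
  return nullptr;
}

// Resolves a name-based reference to a leaf column index.
// Returns std::nullopt for positional references, unknown names,
// invalid nested paths, and container fields.
std::optional<int> ResolveByName(const std::vector<NamedFieldSketch>& top_level,
                                 const FieldRefSketch& ref) {
  std::vector<std::string> names;
  if (const auto* name = std::get_if<std::string>(&ref)) {
    names.push_back(*name);
  } else if (const auto* nested = std::get_if<std::vector<std::string>>(&ref)) {
    names = *nested;
  } else {
    return std::nullopt;  // non-name reference: not handled in this sketch
  }

  const std::vector<NamedFieldSketch>* level = &top_level;
  const NamedFieldSketch* current = nullptr;
  for (const auto& name : names) {
    current = FindByName(*level, name);
    if (current == nullptr) return std::nullopt;  // not found at this level
    level = &current->children;
  }
  if (current == nullptr) return std::nullopt;  // empty list of names
  return current->column_index;                 // std::nullopt for containers
}
```

The cases mirror the two tests named in the message: a top-level name and a nested path through a struct resolve to a column, while a missing name, a container, and a positional reference all come back empty.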
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
cbb330
added a commit
that referenced
this pull request
Feb 20, 2026
Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format? or See also:
cbb330
added a commit
that referenced
this pull request
Feb 24, 2026
Adds comprehensive task tracking and progress documentation for the ongoing ORC predicate pushdown implementation project.

## Changes
- task_list.json: Complete 35-task breakdown with dependencies
  - Tasks #0, #0.5, #1, #2 marked as complete (on feature branches)
  - Tasks #3-#35 pending implementation
  - Organized by phase: Prerequisites, Core, Metadata, Predicate, Scan, Testing, Future
- claude-progress.txt: Comprehensive project status document
  - Codebase structure and build instructions
  - Work completed on feature branches (not yet merged)
  - Current main branch state
  - Next steps and implementation strategy
  - Parquet mirroring patterns and Allium spec alignment

## Context
This is an initialization session to establish baseline tracking for the ORC predicate pushdown project. Previous sessions (1-4) completed initial tasks on feature branches. This consolidates that progress and provides a clear roadmap for future implementation sessions.

## Related Work
- Allium spec: orc-predicate-pushdown.allium (already on main)
- Feature branches: task-0-statistics-api-v2, task-0.5-stripe-selective-reading, task-1-orc-schema-manifest, task-2-build-orc-schema-manifest (not yet merged)

## Next Steps
Future sessions will implement tasks #3+ via individual feature branch PRs.
cbb330
added a commit
that referenced
this pull request
Feb 24, 2026
- Added GetOrcColumnIndex function to resolve FieldRef to ORC column index
- Handles top-level fields via direct lookup
- Handles nested fields via manifest tree traversal
- Returns nullopt if field not found or not a leaf field
- Only leaf fields have statistics and valid column indices
- Added <optional> include for std::optional support

Verified: Code structure follows Parquet pattern
cbb330
added a commit
that referenced
this pull request
Feb 24, 2026
cbb330
added a commit
that referenced
this pull request
Feb 24, 2026
Implemented the GetOrcColumnIndex function that resolves field references to ORC physical column indices using the schema manifest. This is a critical component for predicate pushdown to map Arrow field references to ORC columns for statistics lookup.

Implementation details:
- Added GetOrcColumnIndex() in internal namespace (file_orc.cc)
- Handles top-level field resolution (simple name references)
- Handles nested field resolution (traverses manifest tree)
- Returns std::nullopt for:
  * Fields not found in manifest
  * Container types (struct, list, map) with no single column index
  * Non-name field references (positional, etc.)

Testing:
- Added GetOrcColumnIndex_TopLevelFields test
  * Verifies resolution of simple top-level fields
  * Tests non-existent field returns nullopt
- Added GetOrcColumnIndex_NestedFields test
  * Verifies nested field traversal through struct
  * Tests container field returns nullopt (no single column)
  * Tests invalid nested paths return nullopt

Design follows Parquet's ResolveOneFieldRef pattern adapted for ORC's manifest structure.

VERIFICATION STATUS: Build/test verification pending due to network restrictions preventing CMake from downloading dependencies. Previous session (Task #2) verified all code compiles and tests pass. This implementation follows established patterns exactly and includes the necessary <optional> header.

Co-authored-by: Claude Sonnet 4.5 <[email protected]>
cbb330
added a commit
that referenced
this pull request
Feb 24, 2026
cbb330
added a commit
that referenced
this pull request
Feb 24, 2026
cbb330
added a commit
that referenced
this pull request
Feb 24, 2026
Added GetORCType() method to ORCFileReader that returns a pointer to the ORC Type object. This is needed for building schema manifests that map Arrow schema fields to ORC physical column indices.

The ORC type tree uses depth-first pre-order numbering where column 0 is the root struct, column 1 is the first top-level field, etc.

Returns const void* to avoid exposing ORC headers in the public Arrow API. Callers should cast to const orc::Type* to use.

Co-authored-by: Claude Sonnet 4.5 <[email protected]>
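To make the depth-first pre-order numbering concrete, here is a small self-contained sketch. It does not use the ORC library: `TypeNodeSketch`, `AssignPreOrderIds`, and `NumberColumns` are hypothetical names, and the tree is a plain stand-in for the type tree the message describes.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a node of the ORC type tree.
struct TypeNodeSketch {
  std::string name;
  std::vector<TypeNodeSketch> children;
};

// Visits the node first, then its children left to right (pre-order),
// handing out consecutive ids as it goes.
void AssignPreOrderIds(const TypeNodeSketch& node, int& next_id,
                       std::vector<std::pair<std::string, int>>& out) {
  out.emplace_back(node.name, next_id++);
  for (const auto& child : node.children) {
    AssignPreOrderIds(child, next_id, out);
  }
}

// Returns (name, column id) pairs in visiting order; the root gets id 0.
std::vector<std::pair<std::string, int>> NumberColumns(const TypeNodeSketch& root) {
  std::vector<std::pair<std::string, int>> out;
  int next_id = 0;
  AssignPreOrderIds(root, next_id, out);
  return out;
}
```

For a root struct with fields `a`, `b` (a struct of `c` and `d`), and `e`, this numbering gives root 0, `a` 1, `b` 2, `c` 3, `d` 4, `e` 5. Because a nested struct consumes ids for all of its descendants, a top-level field's position and its column id diverge once any earlier field is nested, which is why a manifest lookup is needed at all.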
cbb330
added a commit
that referenced
this pull request
Feb 24, 2026
Task #3 (Add GetORCType accessor to expose ORC type tree) was completed and merged in PR apache#137.
Updates task_list.json to mark Task #0 as complete following successful merge of PR #2.