Task #9: Implement ResolvePredicateFields function #20

Merged
cbb330 merged 1 commit into main from task-9-resolve-predicate-fields
Feb 20, 2026

cbb330 (Owner) commented Feb 20, 2026

Summary

Implements field resolution for predicate pushdown, mapping Arrow fields to ORC columns.

Changes

  • Added PredicateField struct for resolved field information
  • Implemented ResolvePredicateFields() helper function
  • Resolves field references using OrcSchemaManifest
  • Supports nested field paths (struct traversal)
  • Filters to leaf nodes with statistics support
  • Initial type support: int32, int64
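To make the shape of the resolved-field record concrete, here is a minimal, self-contained sketch. The member names follow the list in the commit message (field_ref, arrow_field_index, orc_column_index, data_type, supports_statistics); the member types, the `TypeId` enum, and the helpers `TypeSupportsStatistics` and `MakePredicateField` are assumptions made for illustration and are not taken from the PR or from Arrow.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for an Arrow type id; only what the sketch needs.
enum class TypeId { kInt32, kInt64, kDouble, kString, kStruct };

// Member names mirror the commit message; the concrete types are guesses.
struct PredicateField {
  std::string field_ref;        // field reference, written here as a dotted path
  int arrow_field_index = -1;   // position of the field in the Arrow schema
  int orc_column_index = -1;    // ORC column id that the field maps to
  TypeId data_type = TypeId::kInt32;
  bool supports_statistics = false;
};

// Initial type support as described in the PR: int32 and int64 only.
bool TypeSupportsStatistics(TypeId type) {
  return type == TypeId::kInt32 || type == TypeId::kInt64;
}

// Hypothetical helper that fills in a record for an already-resolved field.
PredicateField MakePredicateField(std::string field_ref, int arrow_field_index,
                                  int orc_column_index, TypeId data_type) {
  assert(arrow_field_index >= 0 && orc_column_index >= 0);
  PredicateField field;
  field.field_ref = std::move(field_ref);
  field.arrow_field_index = arrow_field_index;
  field.orc_column_index = orc_column_index;
  field.data_type = data_type;
  field.supports_statistics = TypeSupportsStatistics(data_type);
  return field;
}
```

Under these assumptions, `MakePredicateField("x", 0, 1, TypeId::kInt64)` yields a record with `supports_statistics` set, while a string column would be recorded with the flag cleared so that a caller can drop it from pushdown.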

Implementation Details

  • Extracted from Parquet's TestRowGroups pattern
  • Uses compute::FieldsInExpression() to find fields
  • Traverses OrcSchemaField tree for nested access
  • Returns only fields that support statistics
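The nested traversal and leaf filtering described above can be sketched roughly as follows. `SchemaNode` is a simplified, hypothetical stand-in for OrcSchemaField, and `ResolveLeafColumn`, `kNotResolved`, and `ExampleSchema` are illustrative names that do not come from the PR. The sketch assumes ORC's pre-order column numbering, with the root struct as column 0.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Simplified, hypothetical stand-in for OrcSchemaField.
struct SchemaNode {
  std::string name;
  int orc_column_index;
  bool is_struct;
  std::vector<SchemaNode> children;
};

const int kNotResolved = -1;

// Follows a path of child indices through the schema tree. Returns the ORC
// column index of the leaf the path ends on, or kNotResolved when an index is
// out of range, the path descends through a non-struct node, or the path ends
// on a container (containers carry no usable statistics).
int ResolveLeafColumn(const std::vector<SchemaNode>& top_level,
                      const std::vector<int>& path) {
  assert(!path.empty());
  const std::vector<SchemaNode>* level = &top_level;
  const SchemaNode* node = nullptr;
  for (int index : path) {
    // Only structs may be traversed.
    if (node != nullptr && !node->is_struct) return kNotResolved;
    if (index < 0 || static_cast<std::size_t>(index) >= level->size()) {
      return kNotResolved;
    }
    node = &(*level)[index];
    level = &node->children;
  }
  if (node->is_struct || !node->children.empty()) return kNotResolved;
  return node->orc_column_index;
}

// Example schema: struct<id:int, point:struct<x:int, y:int>>.
// Pre-order ORC column ids: root=0, id=1, point=2, x=3, y=4.
std::vector<SchemaNode> ExampleSchema() {
  SchemaNode x{"x", 3, false, {}};
  SchemaNode y{"y", 4, false, {}};
  SchemaNode point{"point", 2, true, {x, y}};
  SchemaNode id{"id", 1, false, {}};
  return {id, point};
}
```

With this schema, the path {1, 0} (point.x) resolves to ORC column 3, while {1} (point itself) is rejected because it names a struct rather than a leaf.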

Testing

Verified by manual code review against the Parquet TestRowGroups reference (lines 945-960).

Task Reference

Completes Task #9 from task_list.json
Depends on: Tasks #3, #7 (both complete)
Enables: Task #10 (DeriveFieldGuarantee)

- Added PredicateField struct to hold resolved field information
- Implemented ResolvePredicateFields() helper function
- Resolves field references in predicates to ORC column indices
- Uses OrcSchemaManifest for Arrow-to-ORC column mapping
- Traverses nested field paths (structs only)
- Filters to leaf nodes only (containers don't have statistics)
- Type support check (currently int32/int64 only)
- Returns a vector of PredicateField entries

Implementation details:
- Uses compute::FieldsInExpression() to extract field refs
- Uses FieldRef::FindOneOrNone() for schema matching
- Traverses OrcSchemaField tree for nested paths
- Validates field indices and struct types
- PredicateField includes: field_ref, arrow_field_index, orc_column_index, data_type, supports_statistics

Verified: Manual code review following Parquet TestRowGroups pattern (lines 945-960)

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>
cbb330 merged commit 9fa8370 into main on Feb 20, 2026
20 of 36 checks passed
cbb330 deleted the task-9-resolve-predicate-fields branch on February 20, 2026, 22:37
cbb330 added a commit that referenced this pull request Feb 20, 2026
Implemented comprehensive test suite for ORC predicate pushdown covering:
- Equality predicates (=, !=)
- Comparison predicates (<, <=, >, >=)
- Compound predicates (AND, OR)
- Special cases (literal true/false)
- Out-of-bounds filters
- Both int32 and int64 types

Tests verify that FilterStripes correctly evaluates predicates against
stripe statistics and skips irrelevant stripes. Each test validates
that the correct number of rows is returned after filtering.

Uses OrcTestFileGenerator to create test files with controlled
value ranges per stripe, enabling precise verification of stripe
filtering behavior.

Verified: All predicates tested against 5-stripe files where each
stripe contains distinct value ranges ([0-99], [100-199], etc.)

Co-authored-by: Claude Sonnet 4.5 <[email protected]>
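
As a rough illustration of the stripe-skipping behavior these tests exercise, the sketch below checks a single comparison predicate against per-stripe min/max statistics. It is not the FilterStripes implementation. `StripeStats`, `CompareOp`, `MayMatch`, `SelectStripes`, and `FiveStripes` are hypothetical names, nulls are ignored, and the five ranges are assumed to continue the stated pattern up to [400-499].

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical per-stripe integer statistics (inclusive bounds).
struct StripeStats {
  std::int64_t min;
  std::int64_t max;
};

enum class CompareOp { kEqual, kLess, kLessEqual, kGreater, kGreaterEqual };

// Returns true when some value in [min, max] could satisfy `value op literal`.
// A false result means the stripe can be skipped safely.
bool MayMatch(const StripeStats& stats, CompareOp op, std::int64_t literal) {
  assert(stats.min <= stats.max);
  switch (op) {
    case CompareOp::kEqual:
      return stats.min <= literal && literal <= stats.max;
    case CompareOp::kLess:
      return stats.min < literal;
    case CompareOp::kLessEqual:
      return stats.min <= literal;
    case CompareOp::kGreater:
      return stats.max > literal;
    case CompareOp::kGreaterEqual:
      return stats.max >= literal;
  }
  return true;  // unknown operator: be conservative and keep the stripe
}

// Returns the indices of the stripes that must still be read.
std::vector<int> SelectStripes(const std::vector<StripeStats>& stripes,
                               CompareOp op, std::int64_t literal) {
  std::vector<int> selected;
  for (std::size_t i = 0; i < stripes.size(); ++i) {
    if (MayMatch(stripes[i], op, literal)) {
      selected.push_back(static_cast<int>(i));
    }
  }
  return selected;
}

// Five stripes with distinct ranges: [0-99], [100-199], ..., [400-499].
std::vector<StripeStats> FiveStripes() {
  return {{0, 99}, {100, 199}, {200, 299}, {300, 399}, {400, 499}};
}
```

With this layout, a predicate such as `x > 250` keeps stripes 2, 3, and 4, `x == 150` keeps only stripe 1, and an out-of-bounds predicate such as `x >= 500` keeps none.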
github-actions commented

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

cbb330 added a commit that referenced this pull request Feb 24, 2026