Mark Task #9 as complete (#23)

Merged: cbb330 merged 1 commit into main from mark-task-9-complete-final on Feb 20, 2026

Conversation

@cbb330 (Owner) commented on Feb 20, 2026

Final update marking Task #9 complete.

@cbb330 merged commit eeb0772 into main on Feb 20, 2026
2 of 4 checks passed
@cbb330 deleted the mark-task-9-complete-final branch on February 20, 2026 22:40
@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

cbb330 added a commit that referenced this pull request Feb 24, 2026
- Added SupportsStatistics helper to check if type supports statistics pushdown
- Created PredicateField struct to hold field resolution information
- Implemented ResolvePredicateFields to extract and resolve field references from predicates
- Currently supports int32 and int64 types
- Skips non-leaf fields and unsupported types
- Handles nested field resolution correctly

Verified:
- Resolves field references using FieldsInExpression
- Uses GetOrcColumnIndex for ORC column mapping
- Handles nested structs by traversing match indices
- Returns comprehensive field information for statistics evaluation
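
The filtering rules listed in this commit message (resolve each referenced field, skip non-leaf fields, skip types other than int32 and int64) can be illustrated with a small standalone sketch. This is not the code from the commit: `TypeKind`, `Field`, `Resolve`, and the signatures of `SupportsStatistics`, `PredicateField`, and `ResolvePredicateFields` shown here are assumptions made for illustration, and the sketch does not use Arrow's `FieldsInExpression` or `GetOrcColumnIndex`. Field references are modelled as paths of child indices.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-ins for schema types; these are not Arrow's classes.
enum class TypeKind { kInt32, kInt64, kString, kStruct };

struct Field {
  std::string name;
  TypeKind kind;
  std::vector<Field> children;  // non-empty only for struct fields
};

// Assumed shape: only int32 and int64 count as supported, per the commit message.
bool SupportsStatistics(TypeKind kind) {
  return kind == TypeKind::kInt32 || kind == TypeKind::kInt64;
}

// Assumed shape of the per-field resolution result.
struct PredicateField {
  std::vector<int> path;  // child indices from the schema root to the field
  std::string name;
  TypeKind kind;
};

// Walks a path of child indices; returns nullptr for an empty or invalid path.
const Field* Resolve(const std::vector<Field>& schema,
                     const std::vector<int>& path) {
  const std::vector<Field>* level = &schema;
  const Field* current = nullptr;
  for (int index : path) {
    if (index < 0 || static_cast<std::size_t>(index) >= level->size()) {
      return nullptr;
    }
    current = &(*level)[index];
    level = &current->children;
  }
  return current;
}

// Keeps only references that resolve to a leaf field of a supported type.
std::vector<PredicateField> ResolvePredicateFields(
    const std::vector<Field>& schema,
    const std::vector<std::vector<int>>& referenced_paths) {
  std::vector<PredicateField> resolved;
  for (const auto& path : referenced_paths) {
    const Field* field = Resolve(schema, path);
    if (field == nullptr) continue;                  // unresolvable reference
    if (!field->children.empty()) continue;          // non-leaf (struct) field
    if (!SupportsStatistics(field->kind)) continue;  // unsupported type
    resolved.push_back({path, field->name, field->kind});
  }
  return resolved;
}

// Usage check: only the int64 leaf and the nested int32 leaf survive.
void ExampleUsage() {
  std::vector<Field> schema = {
      {"id", TypeKind::kInt64, {}},
      {"name", TypeKind::kString, {}},
      {"point", TypeKind::kStruct, {{"x", TypeKind::kInt32, {}}}},
  };
  std::vector<std::vector<int>> paths = {{0}, {1}, {2}, {2, 0}};
  std::vector<PredicateField> fields = ResolvePredicateFields(schema, paths);
  assert(fields.size() == 2);
  assert(fields[0].name == "id");
  assert(fields[1].name == "x");
}
```

Modelling references as index paths keeps the nested-struct case explicit: `{2, 0}` reaches the `x` child of `point`, while `{2}` alone is rejected because it names the struct itself.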
cbb330 added a commit that referenced this pull request Feb 24, 2026
Implemented test to verify ORC metadata caching behavior.

Tests verify that:
1. Metadata is loaded on first scan
2. Subsequent scans reuse cached metadata (read fewer bytes)
3. ClearCachedMetadata() invalidates the cache
4. Scans after cache clear reload metadata

Uses io::TrackedRandomAccessFile to monitor bytes read and verify
that cached metadata reduces I/O on subsequent operations.

The test validates the metadata caching implementation in
OrcFileFragment::EnsureFileMetadataCached(), which lazily loads
and caches ORC file metadata to avoid redundant I/O.

Verified: Metadata caching works correctly, reducing I/O overhead
on repeated scans.

Co-authored-by: Claude Sonnet 4.5 <[email protected]>
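
The four behaviours this commit message says the test verifies can be shown with a toy model of lazy metadata caching. This is a sketch under stated assumptions, not the test from the commit: `CountingSource` is a hypothetical stand-in for the byte tracking that the message attributes to `io::TrackedRandomAccessFile`, and `Fragment` with `EnsureMetadataCached` is a simplified stand-in for `OrcFileFragment::EnsureFileMetadataCached()`. No Arrow APIs are used.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>

// Hypothetical source that counts how many metadata bytes have been read.
class CountingSource {
 public:
  explicit CountingSource(std::string footer) : footer_(std::move(footer)) {}

  // Simulates reading the file footer and records the bytes consumed.
  std::string ReadFooter() {
    bytes_read_ += static_cast<std::int64_t>(footer_.size());
    return footer_;
  }

  std::int64_t bytes_read() const { return bytes_read_; }

 private:
  std::string footer_;
  std::int64_t bytes_read_ = 0;
};

// Hypothetical fragment that loads metadata lazily and keeps it until cleared.
class Fragment {
 public:
  explicit Fragment(CountingSource* source) : source_(source) {}

  // Reads the footer only when no cached copy exists.
  const std::string& EnsureMetadataCached() {
    if (!metadata_) {
      metadata_ = std::make_unique<std::string>(source_->ReadFooter());
    }
    return *metadata_;
  }

  // Drops the cached copy so the next scan has to reload it.
  void ClearCachedMetadata() { metadata_.reset(); }

  // A scan needs metadata first; reading row data is omitted in this sketch.
  void Scan() { EnsureMetadataCached(); }

 private:
  CountingSource* source_;  // not owned
  std::unique_ptr<std::string> metadata_;
};

// Mirrors the four checks listed in the commit message.
void CheckCachingBehavior() {
  CountingSource source("0123456789");  // a 10-byte fake footer
  Fragment fragment(&source);

  fragment.Scan();
  assert(source.bytes_read() == 10);  // 1. loaded on first scan

  fragment.Scan();
  assert(source.bytes_read() == 10);  // 2. second scan reuses the cache

  fragment.ClearCachedMetadata();     // 3. cache invalidated
  fragment.Scan();
  assert(source.bytes_read() == 20);  // 4. metadata reloaded after the clear
}
```

Counting bytes at the source, rather than inspecting the cache directly, is what lets a test like this observe caching from the outside: an unchanged byte count after the second scan is the evidence that no extra I/O occurred.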
cbb330 added a commit that referenced this pull request Feb 24, 2026