Mark Task #5 as complete #15

Merged
cbb330 merged 1 commit into main from mark-task-5-complete on Feb 20, 2026

Conversation

cbb330 (Owner) commented Feb 20, 2026

Task #5 was already implemented as part of Task #4. The StripeStatisticsCache structure includes all required fields, and thread safety is provided by the physical_schema_mutex_ inherited from the Fragment base class.

cbb330 merged commit c04af84 into main Feb 20, 2026
7 of 10 checks passed
cbb330 deleted the mark-task-5-complete branch February 20, 2026 22:33
cbb330 added a commit that referenced this pull request Feb 20, 2026
- Added GetOrcColumnIndex function to resolve FieldRef to ORC column index
- Handles top-level fields via direct lookup
- Handles nested fields via manifest tree traversal
- Returns nullopt if field not found or not a leaf field
- Only leaf fields have statistics and valid column indices
- Added <optional> include for std::optional support

Verified: Code structure follows Parquet pattern
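
The lookup described in this commit message can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `SchemaNode` and `MakeExampleSchema` are hypothetical stand-ins for the schema manifest, the field path is modeled as a plain vector of names rather than an Arrow `FieldRef`, and top-level and nested lookups share one traversal loop. Column indices are assumed to follow ORC's pre-order numbering, with the root struct as column 0.

```cpp
#include <optional>
#include <string>
#include <vector>

// Hypothetical stand-in for a schema manifest node (not Arrow's actual type).
struct SchemaNode {
  std::string name;
  int orc_column_index;              // assumed pre-order ORC column id
  std::vector<SchemaNode> children;  // empty for leaf fields
  bool is_leaf() const { return children.empty(); }
};

// Resolve a field path such as {"point", "x"} to an ORC column index.
// Returns nullopt if the path is empty, a component is not found,
// or the resolved node is not a leaf (only leaves carry statistics).
std::optional<int> GetOrcColumnIndex(const SchemaNode& root,
                                     const std::vector<std::string>& path) {
  if (path.empty()) return std::nullopt;
  const SchemaNode* node = &root;
  for (const auto& name : path) {
    const SchemaNode* next = nullptr;
    for (const auto& child : node->children) {
      if (child.name == name) {
        next = &child;
        break;
      }
    }
    if (next == nullptr) return std::nullopt;  // field not found
    node = next;
  }
  if (!node->is_leaf()) return std::nullopt;  // struct columns are rejected
  return node->orc_column_index;
}

// Example schema: struct<id:int, point:struct<x:double, y:double>>
SchemaNode MakeExampleSchema() {
  return SchemaNode{"", 0, {
      SchemaNode{"id", 1, {}},
      SchemaNode{"point", 2, {SchemaNode{"x", 3, {}}, SchemaNode{"y", 4, {}}}},
  }};
}
```

With this example schema, the path `{"point", "x"}` resolves to a leaf column, while `{"point"}` alone yields nullopt because it names a struct rather than a leaf.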
cbb330 added a commit that referenced this pull request Feb 20, 2026
- Implements fast-path row counting using stripe statistics
- Returns count directly if predicate can be fully evaluated from metadata
- Falls back to full scan (returns nullopt) if statistics insufficient

Algorithm:
1. No field refs (e.g., WHERE 1=1): sum all stripe row counts
2. With predicates: evaluate against each stripe via TestStripes
   - literal(false): exclude stripe from count
   - literal(true): include stripe row count
   - partial match: return nullopt (need full scan)

Mirrors Parquet's TryCountRows (file_parquet.cc:986-1004)

Use case: COUNT(*) WHERE x > 1000 on file with strong statistics
can avoid reading data entirely if statistics prove stripes match/exclude.

Verified: Logic matches Parquet pattern, uses correct ORC APIs

Co-authored-by: Claude Sonnet 4.5 <[email protected]>
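
The counting algorithm above can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from the commit: `StripeMatch` and `StripeInfo` are hypothetical types, and the per-stripe result that the commit obtains from TestStripes is modeled here as a precomputed three-way value (`kNone` for literal(false), `kAll` for literal(true), `kPartial` for anything else).

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical three-way outcome of testing a predicate against the
// statistics of one stripe.
enum class StripeMatch { kNone, kAll, kPartial };

// Hypothetical per-stripe summary: row count plus the predicate outcome.
struct StripeInfo {
  int64_t num_rows;
  StripeMatch match;
};

// Returns the row count if it can be derived from metadata alone,
// or nullopt when at least one stripe needs a full scan.
std::optional<int64_t> TryCountRows(const std::vector<StripeInfo>& stripes,
                                    bool predicate_has_field_refs) {
  int64_t total = 0;
  for (const auto& stripe : stripes) {
    if (!predicate_has_field_refs) {
      // No field references (e.g. WHERE 1=1): every row counts.
      total += stripe.num_rows;
      continue;
    }
    switch (stripe.match) {
      case StripeMatch::kNone:
        break;  // statistics prove no row matches: exclude the stripe
      case StripeMatch::kAll:
        total += stripe.num_rows;  // statistics prove every row matches
        break;
      case StripeMatch::kPartial:
        return std::nullopt;  // statistics are inconclusive: fall back to a scan
    }
  }
  return total;
}
```

A single inconclusive stripe is enough to abandon the fast path, since the exact count for that stripe is only known after reading its data.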
cbb330 added a commit that referenced this pull request Feb 20, 2026
@github-actions

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

See also:

cbb330 added a commit that referenced this pull request Feb 24, 2026
cbb330 added a commit that referenced this pull request Feb 24, 2026
cbb330 added a commit that referenced this pull request Feb 24, 2026