Skip to content

Conversation

@dependabot
Copy link

@dependabot dependabot bot commented on behalf of github Oct 21, 2022

Updates the requirements on cranelift to permit the latest version.

Commits

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot merge will merge this PR after your CI passes on it
  • @dependabot squash and merge will squash and merge this PR after your CI passes on it
  • @dependabot cancel merge will cancel a previously requested merge and block automerging
  • @dependabot reopen will reopen this PR if it is closed
  • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

@dependabot @github
Copy link
Author

dependabot bot commented on behalf of github Oct 21, 2022

The following labels could not be found: auto-dependencies.

@dependabot @github
Copy link
Author

dependabot bot commented on behalf of github Oct 26, 2022

Looks like cranelift is up-to-date now, so this is no longer needed.

@dependabot dependabot bot closed this Oct 26, 2022
@dependabot dependabot bot deleted the dependabot/cargo/master/cranelift-0.89.0 branch October 26, 2022 01:10
comphead added a commit that referenced this pull request Oct 16, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.
Related to apache#18084

## Rationale for this change

Run extended suite on PRs for critical areas, to avoid post merge
bugfixing

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Andrew Lamb <[email protected]>
comphead added a commit that referenced this pull request Oct 16, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

Followup on
apache#18063 (review)

## Rationale for this change

Use cheaper `NullBuffer::union` to apply null mask instead of iterator
approach

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead added a commit that referenced this pull request Oct 17, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

Followup on
apache#18063 (review)

## Rationale for this change

Use cheaper `NullBuffer::union` to apply null mask instead of iterator
approach

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

(cherry picked from commit 337378a)
comphead added a commit that referenced this pull request Oct 17, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

Followup on
apache#18063 (review)

## Rationale for this change

Use cheaper `NullBuffer::union` to apply null mask instead of iterator
approach

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

(cherry picked from commit 337378a)
comphead pushed a commit that referenced this pull request Oct 17, 2025
…unctions in proto (apache#18024)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#17417.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

- Support `null_treatment`, `distinct`, and `filter` for window function
in proto.
- Support `null_treatment` for aggregate udf in proto.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

- [x] Add `null_treatment`, `distinct`, `filter` fields to
`WindowExprNode` message and handle them in `to/from_proto.rs`.
- [x] Add `null_treatment` field to `AggregateUDFExprNode` message and
handle them in `to/from_proto.rs`.
- [ ] Docs update: I'm not sure where to add docs as declared in the
issue description.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

- Add tests to `roundtrip_window` for respectnulls, ignorenulls,
distinct, filter.
- Add tests to `roundtrip_aggregate_udf` for respectnulls, ignorenulls.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
N/A

---------

Co-authored-by: Jeffrey Vo <[email protected]>
comphead pushed a commit that referenced this pull request Oct 17, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Doesn't close an issue.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Hi we are hiop, a Serverless Data Logistic Platform.
We use DataFusion as a core part of our backend engine, and it plays a
crucial role in our data infrastructure. Our team members are passionate
about the project and actively try contribute to its development
(@dariocurr).

We’d love to have Hiop listed among the Known Users to show our support
and help the DataFusion community continue to grow.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Just adding hiop as known user

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Oct 17, 2025
…8117)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#3695
- Closes apache#3797

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Was looking at above issues and I don't believe we skip the failed rules
for any tests anymore (default for the config is also `false`), apart
from this cleanup, so filing this PR so we can close the issues. Seems
we only do in this `window.slt` test after this fix:


https://github.com/apache/datafusion/blob/621a24978a7a9c6d2b27973d1853dbc8776a56b5/datafusion/sqllogictest/test_files/window.slt#L2587-L2611

Which seems intentional.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Remove unnecessary `skip_failed_rules` config.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Existing tests.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Oct 17, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
`EXPLAIN ANALYZE` can be used for profiling and displays the results
alongside the EXPLAIN plan. The issue is that it currently shows too
many low-level details. It would provide a better user experience if
only the most commonly used metrics were shown by default, with more
detailed metrics available through specific configuration options.

### Example
In `datafusion-cli`:
```
> CREATE EXTERNAL TABLE IF NOT EXISTS lineitem
STORED AS parquet
LOCATION '/Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem';
0 row(s) fetched.
Elapsed 0.000 seconds.

explain analyze select *
from lineitem
where l_orderkey = 3000000;
```
The parquet reader includes a large number of low-level details:
```
metrics=[output_rows=19813, elapsed_compute=14ns, batches_split=0, bytes_scanned=2147308, file_open_errors=0, file_scan_errors=0, files_ranges_pruned_statistics=18, num_predicate_creation_errors=0, page_index_rows_matched=19813, page_index_rows_pruned=729088, predicate_cache_inner_records=0, predicate_cache_records=0, predicate_evaluation_errors=0, pushdown_rows_matched=0, pushdown_rows_pruned=0, row_groups_matched_bloom_filter=0, row_groups_matched_statistics=1, row_groups_pruned_bloom_filter=0, row_groups_pruned_statistics=0, bloom_filter_eval_time=21.997µs, metadata_load_time=273.83µs, page_index_eval_time=29.915µs, row_pushdown_eval_time=42ns, statistics_eval_time=76.248µs, time_elapsed_opening=4.02146ms, time_elapsed_processing=24.787461ms, time_elapsed_scanning_total=24.17671ms, time_elapsed_scanning_until_data=23.103665ms]
```

I believe only a subset of it is commonly used, for example
`output_rows`, `metadata_load_time`, and how many file/row-group/pages
are pruned, and it would better to only display the most common ones by
default.

### Existing `VERBOSE` keyword
There is a existing verbose keyword in `EXPLAIN ANALYZE VERBOSE`,
however it's turning on per-partition metrics instead of controlling
detail level. I think it would be hard to mix this partition control and
the detail level introduced in this PR, so they're separated: the
following config will be used for detail level and the semantics of
`EXPLAIN ANALYZE VERBOSE` keep unchanged.

### This PR: configurable explain analyze level
1. Introduced a new config option `datafusion.explain.analyze_level`.
When set to `dev` (default value), all existing metrics will be shown.
If set to `summary`, only `BaselineMetrics` will be displayed (i.e.
`output_rows` and `elapsed_compute`).
Note now we only include `BaselineMetrics` for simplicity, in the
follow-up PRs we can figure out what's the commonly used metrics for
each operator, and add them to `summary` analyze level, finally set the
`summary` analyze level to default.
2. Add a `MetricType` field associated with `Metric` for detail level or
potentially category in the future. For different configurations, a
certain `MetricType` set will be shown accordingly.

#### Demo
```
-- continuing the above example
> set datafusion.explain.analyze_level = summary;
0 row(s) fetched.
Elapsed 0.000 seconds.

> explain analyze select *
from lineitem
where l_orderkey = 3000000;
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type         | plan                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=5, elapsed_compute=25.339µs]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
|                   |   FilterExec: l_orderkey@0 = 3000000, metrics=[output_rows=5, elapsed_compute=81.221µs]                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |
|                   |     DataSourceExec: file_groups={14 groups: [[Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:0..11525426], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-0.parquet:11525426..20311205, Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:0..2739647], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:2739647..14265073], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-1.parquet:14265073..20193593, Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-2.parquet:0..5596906], [Users/yongting/Code/datafusion/benchmarks/data/tpch_sf1/lineitem/part-2.parquet:5596906..17122332], ...]}, projection=[l_orderkey, l_partkey, l_suppkey, l_linenumber, l_quantity, l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate, l_commitdate, l_receiptdate, l_shipinstruct, l_shipmode, l_comment], file_type=parquet, predicate=l_orderkey@0 = 3000000, pruning_predicate=l_orderkey_null_count@2 != row_count@3 AND l_orderkey_min@0 <= 3000000 AND 3000000 <= l_orderkey_max@1, required_guarantees=[l_orderkey in (3000000)], metrics=[output_rows=19813, elapsed_compute=14ns] |
|                   |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
+-------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched.
Elapsed 0.025 seconds.
```
Only `BaselineMetrics` are shown.


## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
4. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
UT

## Are there any user-facing changes?

No
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Andrew Lamb <[email protected]>
comphead pushed a commit that referenced this pull request Oct 17, 2025
…e#18091)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

N/A

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

There's a few functions in
`datafusion/expr-common/src/type_coercion/aggregates.rs` that are unused
elsewhere in the codebase, likely a remnant before the refactor to UDF,
so removing them. Some are still used (`coerce_avg_type()` and
`avg_return_type()`) so these are inlined into the Avg aggregate
function (similar to Sum). Also refactor some window functions to use
already available macros.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

- Remove some unused functions
- Inline avg coerce & return type logic
- Refactor Spark Avg a bit to remove unnecessary code
- Refactor ntile & nth window functions to use available macros

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Existing tests.

## Are there any user-facing changes?

Yes as these functions were publicly exported; however I'm not sure they
were meant to be used by users anyway, given what they do.

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Oct 17, 2025
apache#18099)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

Fixes comparison errors when using dictionary-encoded types with
comparison functions like NULLIF.

## Rationale for this change

When using dictionary-encoded columns (e.g., Dictionary(Int32, Utf8)) in
comparison operations with literals or other types, DataFusion would
throw an error stating the types are not comparable. This was
particularly problematic for functions like NULLIF which rely on
comparison coercion.

The issue was that comparison_coercion_numeric didn't handle dictionary
types, even though the general comparison_coercion function did have
dictionary support.

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

1. Refactored dictionary comparison logic: Extracted common dictionary
coercion logic into dictionary_comparison_coercion_generic to avoid code
duplication.
2. Added numeric-specific dictionary coercion: Introduced
dictionary_comparison_coercion_numeric that uses numeric-preferring
comparison rules when dealing with dictionary value types.
3. Updated comparison_coercion_numeric: Added a call to
dictionary_comparison_coercion_numeric in the coercion chain to properly
handle dictionary types.
4. Added sqllogictest cases demonstrating the fix works for various
dictionary comparison scenarios.

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

Yes, added tests in datafusion/sqllogictest/test_files/nullif.slt
covering:
  - Dictionary type compared with string literal
  - String compared with dictionary type
  - Dictionary compared with dictionary

All tests pass with the fix and would fail without it.

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

This is a bug fix that enables previously failing queries to work
correctly. No breaking changes or API modifications.

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
alamb added a commit that referenced this pull request Oct 18, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#18135 
## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Andrew Lamb <[email protected]>
comphead added a commit that referenced this pull request Oct 24, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Andrew Lamb <[email protected]>
comphead pushed a commit that referenced this pull request Oct 24, 2025
…pache#17478) (apache#18130)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Related to apache#17405
- Related to apache#18072

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

See apache#17478

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->
 
See apache#17478

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

Co-authored-by: Filippo Rossi <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
comphead pushed a commit that referenced this pull request Oct 24, 2025
…nics (apache#18013) (apache#18131)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Related to apache/datafusion-comet#2539
- Related to apache#18072

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Return errors rather than panicking.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

Co-authored-by: Andy Grove <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
comphead pushed a commit that referenced this pull request Oct 24, 2025
…amps (apache#17777) (apache#18129)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Related to apache#17776
- Related to apache#18072

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

If I've understood semantic equality correctly, any two timestamps
should meet the bar for equality regardless of time units and timezones,
but the current code doesn't reflect that.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Adds a branch to this method for timestamps.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Yes

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

Yes

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Shiv Bhatia <[email protected]>
Co-authored-by: Shiv Bhatia <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
comphead pushed a commit that referenced this pull request Oct 24, 2025
…e#18161) (apache#18179)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123. -->

- Related to apache#18070
- Part of apache#18072

## Rationale for this change

Fix performance regression in Datafusion 50

## What changes are included in this PR?

Backport apache#18161 to `branch-50`

## Are these changes tested?

Yes
## Are there any user-facing changes?

Fix performance regression

Co-authored-by: Yongting You <[email protected]>
comphead pushed a commit that referenced this pull request Oct 27, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#16678.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

The issue has been fixed in apache#16639, this PR just adds a testcase for it.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Add a test case for `to_timestamp(double)` with vectorized input.
Similar to the one presented in the issue.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Yes

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

No
comphead pushed a commit that referenced this pull request Oct 27, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#18070

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
See the above issue and its comment
apache#18070 (comment)

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->
In nested loop join, when the join column includes `List(Utf8View)`, use
`take()` instead of `to_array_of_size()` to avoid deep copying the utf8
buffers inside `Utf8View` array.

This is the quick fix, avoiding deep copy inside `to_array_of_size()` is
a bit tricky.
Here is `ListArray`'s physical layout:
https://arrow.apache.org/rust/arrow/array/struct.GenericListArray.html
If multiple elements is pointing to the same list range, the underlying
payload can't be reused.So the potential fix in `to_array_of_size` can
only avoids copying the inner-inner utf8view array buffers, but can't
avoid copying the inner array (i.e. views are still copied), and deep
copying for other primitive types also can't be avoided. Seems this can
be better solved when `ListView` type is ready 🤔

### Benchmark
I tried query 1 in apache#18070,
but only used 3 randomly sampled `places` parquet file.

49.0.0: 4s
50.0.0: stuck > 1 minute
PR: 4s

Now the performance are similar, I suspect the most time is spend
evaluating the expensive `array_has` so the optimization in
apache#16996 can't help much.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
Existing tests
## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->
No
<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Andrew Lamb <[email protected]>
comphead pushed a commit that referenced this pull request Oct 27, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#17913 .

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
- Improve SQL code block rendering by upgrading `pydata-sphinx-theme`
- fix sidebar layout 

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
4. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->
yes

## Are there any user-facing changes?

documentation ui
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Oct 27, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- part of apache#17427 

## Rationale for this change
Adds regular joins (left, right, full, inner) for PWMJ as they behave
differently in the code path.

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?
Adds classic join + physical planner

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?
Yes SLT tests  + unit tests

## Follow up work to this pull request
- Handling partitioned queries and multiple record batches (fuzz testing
will be handled with this)
- Simplify physical planning
- Add more unit tests for different types (another pr as the LOC in this
pr is getting a little daunting)

next would be to implement the existence joins

---------

Co-authored-by: Yongting You <[email protected]>
comphead pushed a commit that referenced this pull request Oct 27, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#17854 .

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Oct 27, 2025
…pache#18017)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#17993

## Rationale for this change

```
DataFusion CLI v50.1.0
> SET TIME ZONE = '+08:00';
0 row(s) fetched. 
Elapsed 0.011 seconds.

> SELECT arrow_typeof(now());
+---------------------------------------+
| arrow_typeof(now())                   |
+---------------------------------------+
| Timestamp(Nanosecond, Some("+08:00")) |
+---------------------------------------+
1 row(s) fetched. 
Elapsed 0.015 seconds.

> SELECT count(1) result FROM (SELECT now() as n) a WHERE n > '2000-01-01'::date;
+--------+
| result |
+--------+
| 1      |
+--------+
1 row(s) fetched. 
Elapsed 0.029 seconds.
```

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?
When the timezone changes, re-register `now()` function

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Andrew Lamb <[email protected]>
comphead pushed a commit that referenced this pull request Oct 27, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

According to three-valued logic we should return `null` and that's also
what happens when the argument is not a constant as seen in the test.

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

Updated `ArrayHas::simplify` to explicitly handle `null`

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

Updated the `array_has` SQL test and added unit tests

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

Yes, a minor change in behaviour wrt `null`

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Oct 27, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#16820 .

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Oct 27, 2025
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Part of apache#16602

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Now we have to search in the code comment (or even implementation) to
find the documentation of certain metrics, it would be better to open a
page in the `user-guide` for metrics.

The doc has to be manually updated, the metrics construction is
scattered in the codebase, so it's hard to make it auto-generated.

This PR only includes 2 common metrics, I plan to add more
operator-specific metrics while working on
apache#18116

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 6, 2026
…ition is provided (apache#19553)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->


Replace `.chars().skip().collect::<String>()` with zero-copy string
slicing using `char_indices()` to find the byte offset, then slice with
`&value[byte_offset..]`.

This eliminates unnecessary String allocation per row when a start
position is specified.

Changes:
- Use char_indices().nth() to find byte offset for start position
(1-based)
  - Use string slicing &value[byte_offset..] instead of collecting chars
  - Added benchmark to measure performance improvements

Optimization:
- Before: Allocated new String via .collect() for each row with start
position
  - After: Uses zero-copy string slice

Benchmark results:
- size=1024, str_len=32: 96.361 µs -> 41.458 µs (57.0% faster, 2.3x
speedup)
- size=1024, str_len=128: 210.16 µs -> 56.064 µs (73.3% faster, 3.7x
speedup)
- size=4096, str_len=32: 376.90 µs -> 162.98 µs (56.8% faster, 2.3x
speedup)
- size=4096, str_len=128: 855.68 µs -> 263.61 µs (69.2% faster, 3.2x
speedup)

The optimization shows greater improvements for longer strings (up to
73% faster) since string slicing is O(1) regardless of length, while the
previous approach had allocation costs that grew with string length.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 6, 2026
)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

Closes apache#17527

## Rationale for this change

Currently, DataFusion computes bounds for all queries that contain a
HashJoinExec node whenever the option enable_dynamic_filter_pushdown is
set to true (default). It might make sense to compute these bounds only
when we explicitly know there is a consumer that will use them.

## What changes are included in this PR?

As suggested in
apache#17527 (comment),
this PR adds an is_used() method to DynamicFilterPhysicalExpr that
checks if any consumers are holding a reference to the filter using
Arc::strong_count().

During filter pushdown, consumers that accept the filter and use it
later in execution have to retain a reference to Arc. For example, scan
nodes like ParquetSource.

## Are these changes tested?

I added a unit test in dynamic_filters.rs (test_is_used) that verifies
the Arc reference counting behavior.
Existing integration tests in
datafusion/core/tests/physical_optimizer/filter_pushdown/mod.rs validate
the end-to-end behavior. These tests verify that dynamic filters are
computed and filled when consumers are present.



## Are there any user-facing changes?

new is_used() function
comphead pushed a commit that referenced this pull request Jan 6, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Ref apache#15873

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

Implemented partition_statistics for WindowAggExec.

## Are these changes tested?

Check the unit tests.

## Are there any user-facing changes?

No
<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 6, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Part of apache#19569

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

I ran microbenchmarks comparing DataFusion with DuckDB for string
functions (see apache/datafusion-benchmarks#26)
and noticed that DF was very slow for `md5`.

This PR improves performance:

| Benchmark                  | Before | After  | Speedup     |
|----------------------------|--------|--------|-------------|
| md5_array (1024 strings)   | 206 µs | 100 µs | 2.1x faster |
| md5_scalar (single string) | 337 ns | 221 ns | 1.5x faster |

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Avoid using `write!` with a format string and use a more efficient
approach

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 6, 2026
…#19468)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Part of apache#15873

## Rationale for this change

After the introduction of the `partition_statistics` API there are some
operators which don't support this new API, the goal is to update the
operators with new API.

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

- Updated `NestedLoopJoinExec` with the new `partition_statistics`

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

- Added relevant tests 

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 6, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Part of apache#19025.

## Rationale for this change

Previously we needed to create timestamp and then extract the time
component from that.

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

- Added the function for `to_time`
- Relevant slt test
- Relevant unit tests
- Updated docs

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

- All tests pass

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Jeffrey Vo <[email protected]>
comphead added a commit that referenced this pull request Jan 6, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Initial work before tackling apache#19004

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Found lots of code duplicated here, so unifying them.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Introduce new trait `UDFCoercionExt` to unify functions across
scalar/aggregate/window UDFs for use in a single generic implementation.
New unified functions `fields_with_udf` and `get_valid_types_with_udf`
which are generic across this new trait.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Existing tests.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

Yes, deprecating functions.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Oleks V <[email protected]>
comphead pushed a commit that referenced this pull request Jan 6, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- N/A

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Add new functions: hour, minute, and second.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

- Implementation
- Unit Tests
- slt tests

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Yes, tests added as part of this PR.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No, these are new functions.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Jefffrey <[email protected]>
comphead pushed a commit that referenced this pull request Jan 6, 2026
…he#19572)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

For non-ASCII strings, the original implementation used string.find() to
get the byte index, then counted characters up to that byte index. This
required two passes through the string.

This optimization uses char_indices() to find the substring while
simultaneously tracking character positions, completing the search in a
single pass.

Benchmark results (UTF-8 strings):
- str_len_8:    188.98 µs → 140.54 µs  (25.4% faster)
- str_len_32:   615.69 µs → 294.15 µs  (52.2% faster)
- str_len_128:  2.2707 ms → 1.2462 ms  (45.1% faster)
- str_len_4096: 74.328 ms → 36.538 ms  (50.9% faster)

ASCII performance unchanged (already optimized with fast path).

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 6, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead added a commit that referenced this pull request Jan 6, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

- Adding documentation to run TPCDS benchmarks for the existing PRs
- Fixing some nits

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 6, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change
benchmark
```
factorial_array         time:   [778.63 ns 783.03 ns 787.52 ns]
                        change: [−90.314% −90.211% −90.055%] (p = 0.00 < 0.05)
                        Performance has improved.

factorial_scalar        time:   [317.32 ns 319.36 ns 321.41 ns]
                        change: [−28.219% −27.508% −26.683%] (p = 0.00 < 0.05)
                        Performance has improved.
```
<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?
1. factorial implementation from using loop to looking up precomputed
values.
2. using the `try_unary` kernel function on `PrimitiveArray`. (However,
since this isn’t `unary` function, it does not get vectorized execution)

and add benchmark
<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?
Yes.
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?
No.
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 6, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#12576

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Improve performance.

### ltrim

| Scenario                   | Before  | After   | Improvement |
|----------------------------|---------|---------|-------------|
| short strings, string_view | 77.0 µs | 51.2 µs | +33.6%      |
| short strings, string      | 77.9 µs | 47.9 µs | +38.5%      |
| long strings, short trim   | 77.3 µs | 49.1 µs | +36.4%      |

### rtrim

| Scenario                   | Before  | After   | Improvement |
|----------------------------|---------|---------|-------------|
| short strings, string_view | 80.6 µs | 45.6 µs | +43.4%      |
| short strings, string      | 80.6 µs | 47.4 µs | +41.3%      |
| long strings, short trim   | 78.0 µs | 44.9 µs | +42.5%      |

### btrim

| Scenario                   | Before   | After   | Improvement |
|----------------------------|----------|---------|-------------|
| short strings, string_view | 106.4 µs | 77.3 µs | +27.4%      |
| short strings, string      | 106.3 µs | 75.1 µs | +29.4%      |
| long strings, short trim   | 109.7 µs | 74.5 µs | +32.1%      |

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

- Optimizations - precompute pattern instead of computing per row
- Improve benchmark to cover ltrim, rtrim, btrim

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Existing tests

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 6, 2026
…dep (apache#19620)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Supersedes apache#19010

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

We don't actually need the dependency on `testcontainers` per their
README:

> **Note**: you don't need to explicitly depend on `testcontainers` as
it's re-exported dependency of `testcontainers-modules` with aligned
version between these crates.

-
https://github.com/testcontainers/testcontainers-rs-modules-community/blob/331abcc6e61d9d76e5f8e6ec91566ce874d8fc32/README.md

It was causing some version conflict issues when we tried to bump
`testcontainers-modules` so remove it and keep only modules, fixing the
code to use the re-exports.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Remove `testcontainers` dependency, amend imports to use re-exports of
`testcontainers-modules`.

Bump `testcontainers-modules` from 0.13 to 0.14.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

CLI test suite compiles and runs. Ran the SLT postgres compat mode
locally successfully:

```sh
datafusion (testcontainers-0.14)$ PG_COMPAT=true PG_URI="postgresql://[email protected]/postgres" cargo test --features=postgres --test sqllogictests
    Finished `test` profile [unoptimized + debuginfo] target(s) in 0.19s
     Running bin/sqllogictests.rs (/Users/jeffrey/.cargo_target_cache/debug/deps/sqllogictests-223c346f75c524e2)
Completed 6 test files in 0 seconds 
```

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 6, 2026
…les (apache#19607)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#19605.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

The `calculate_range` function creates invalid byte ranges (where `start
> end`) when reading single-line CSV/JSON files that are split into
multiple partitions. This causes an error like:

```
ObjectStore(Generic { store: "S3", source: Inconsistent { start: 1149247, end: 1149246 } })
```

When `find_first_newline` doesn't find a newline (single-line file), it
returns the remaining file length. This causes `start + start_delta` to
exceed `end + end_delta`, creating an invalid range. The current check
only handles `range.start == range.end`, not `range.start > range.end`.


## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

1. Added an early termination check after computing `start_delta`: if
the first newline after `start` is beyond the partition boundary (`start
+ start_delta > end`), return `TerminateEarly` since no complete records
exist in this partition.

2. Changed the final range validation from `==` to `>=` as a safety net
for edge cases.

3. Added a regression test that reproduces the bug with a single-line
file split into partitions.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Yes, added `test_calculate_range_single_line_file` which:
- Creates a single-line JSON file without new-lines
- Simulates partition 2 (middle to end of file)
- Verifies that `calculate_range` returns `TerminateEarly` instead of an
invalid range

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

No
comphead pushed a commit that referenced this pull request Jan 6, 2026
…pache#19567)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Part of apache#15873.

## Rationale for this change

- The goal is to replace the old `statistics` API with the new
`partition_statistics`


<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?


- implemented the partition_statistics API for SortMergeJoinExec
- Removed the special case that returned unknown statistics for specific
partitions.
- Added `test_partition_statistics` to verify both aggregate and
specific partition statistics work correctly

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

- Addes new tests

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 6, 2026
…che#19648)

Change-Id: Id62d32d14fa19b34b592e186e7962bb96a6a6964

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#19393 

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
Make datafusion doc great
## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->
Move “Refactoring of `FileSource` constructors and
`FileScanConfigBuilder` to accept schemas upfront" section to right
place.
## Are these changes tested?
No need for code test.
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->
No
<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

Signed-off-by: mag1cian <[email protected]>
comphead pushed a commit that referenced this pull request Jan 6, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Related to apache#18092

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

Current signature for `percentile_cont` is quite confusing in what types
it accepts. Currently the code suggests it accepts all numeric types,
where floats & decimals are maintained as is, integers are cast to
float64 internally. However this is misleading as `NUMERICS` does not
contain decimals, so currently everything is cast to float.

Clean up the code to make this more clear and do various other
refactors.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

Use coercion signature to have type coercion coerce types to float so we
can remove internal code related to casting to float or handling
non-float types.

Various other refactors.

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Existing tests.

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

No.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 6, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
The previous implementation of `HashTableLookupExpr::evaluate` relied on
per-row calls to `get_matched_indices`, which incurred unnecessary
performance overhead:
1. **Memory Overhead**: Each per-row call triggered small `Vec`
allocations and potential resizes, leading to pressure on the memory
allocator.
2. **Redundant Computation**: `get_matched_indices` traverses the entire
hash chain to find all matches, which is unnecessary when we only need
to verify the existence of a key.




### Performance Results (TPC-H)
The following TPC-H results were obtained with
**`DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true`:**

```
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Query        ┃ baseline@9a9ff ┃  optimized ┃        Change ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
│ QQuery 1     │      679.51 ms │  728.06 ms │  1.07x slower │
│ QQuery 2     │      388.33 ms │  384.11 ms │     no change │
│ QQuery 3     │      864.38 ms │  856.27 ms │     no change │
│ QQuery 4     │      458.46 ms │  468.26 ms │     no change │
│ QQuery 5     │     1614.26 ms │ 1525.65 ms │ +1.06x faster │
│ QQuery 6     │      611.20 ms │  610.06 ms │     no change │
│ QQuery 7     │      950.39 ms │  940.13 ms │     no change │
│ QQuery 8     │     1214.86 ms │ 1218.21 ms │     no change │
│ QQuery 9     │     2657.61 ms │ 2482.09 ms │ +1.07x faster │
│ QQuery 10    │     1050.70 ms │ 1001.96 ms │     no change │
│ QQuery 11    │      383.92 ms │  347.27 ms │ +1.11x faster │
│ QQuery 12    │      963.14 ms │  920.78 ms │     no change │
│ QQuery 13    │      473.68 ms │  480.97 ms │     no change │
│ QQuery 14    │      363.36 ms │  345.27 ms │     no change │
│ QQuery 15    │      960.56 ms │  955.05 ms │     no change │
│ QQuery 16    │      281.95 ms │  267.34 ms │ +1.05x faster │
│ QQuery 17    │     5306.43 ms │ 4983.21 ms │ +1.06x faster │
│ QQuery 18    │     3415.11 ms │ 3016.52 ms │ +1.13x faster │
│ QQuery 19    │      761.67 ms │  759.49 ms │     no change │
│ QQuery 20    │      650.20 ms │  642.40 ms │     no change │
│ QQuery 21    │     3111.85 ms │ 2833.05 ms │ +1.10x faster │
│ QQuery 22    │      141.75 ms │  143.06 ms │     no change │
└──────────────┴────────────────┴────────────┴───────────────┘
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┓
┃ Benchmark Summary             ┃            ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━┩
│ Total Time (baseline@9a9ff)   │ 27303.30ms │
│ Total Time (optimized)        │ 25909.21ms │
│ Average Time (baseline@9a9ff) │  1241.06ms │
│ Average Time (optimized)      │  1177.69ms │
│ Queries Faster                │          7 │
│ Queries Slower                │          1 │
│ Queries with No Change        │         14 │
│ Queries with Failure          │          0 │
└───────────────────────────────┴────────────┘
```

Note that Q1 does not involve `HashJoin`.


#### Note on Configuration
Benchmarks were conducted with
`DATAFUSION_EXECUTION_PARQUET_PUSHDOWN_FILTERS=true` because
`HashTableLookupExpr::evaluate` is **NOT** invoked under default
settings.

I manually added `dbg!(&num_rows)` at [L335 in
`partitioned_hash_eval.rs`](https://github.com/apache/datafusion/blob/9a9ff8d6162b7391736b0b7c82c00cb35b0652a1/datafusion/physical-plan/src/joins/hash_join/partitioned_hash_eval.rs#L335)
and confirmed that the logic path is only triggered when this flag is
enabled. Under default settings, `HashTableLookupExpr::evaluate` is not
called; . I am uncertain if this current behavior is intentional.


## What changes are included in this PR?
- Added `JoinHashMapType::contain_hashes`: A new trait method that
processes
  a batch of hashes and updates a bitmask for existing keys.
- Refactored `HashTableLookupExpr::evaluate`: Switched from per-row
lookups to
  the new batch API.

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?
Yes
<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?
NO
<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 6, 2026
…026-0001 to get clean CI (apache#19657)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#19656

## Rationale for this change

CI is failing I think because aws-smithy-runtime was yanked

## What changes are included in this PR?

ran `cargo update` for this crate and then checked in the results:

```shell
andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ cargo update -p aws-smithy-runtime
    Updating crates.io index
     Locking 1 package to latest compatible version
 Downgrading aws-smithy-runtime v1.9.6 -> v1.9.5
note: pass `--verbose` to see 149 unchanged dependencies behind latest
```

## Are these changes tested?

I tested locally

```shell
cargo audit
```

## Are there any user-facing changes?
No this is a developmnt process only

---------

Co-authored-by: Jefffrey <[email protected]>
comphead pushed a commit that referenced this pull request Jan 6, 2026
…e#19388)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#19055.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

```
> CREATE EXTERNAL TABLE nyc_taxi_rides
STORED AS PARQUET LOCATION 's3://altinity-clickhouse-data/nyc_taxi_rides/data/tripdata_parquet/'
;
0 row(s) fetched. 
Elapsed 10.061 seconds.

> SELECT metadata_size_bytes, expires_in, unnest(metadata_list) FROM list_files_cache();
+---------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| metadata_size_bytes | expires_in | UNNEST(list_files_cache().metadata_list)                                                                                                                                                           |
+---------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-200901.parquet, file_modified: 2025-05-30T09:44:23, file_size_bytes: 222192983, e_tag: "e8d016c3c7af80bf911d96387febe2c1-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-200902.parquet, file_modified: 2025-05-30T09:46:00, file_size_bytes: 211023080, e_tag: "1021626ff5ef606422aa7121edd69f3b-12", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-200903.parquet, file_modified: 2025-05-30T09:47:20, file_size_bytes: 229202874, e_tag: "96e7494b217099c6a07e9c4298cbe783-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-200904.parquet, file_modified: 2025-05-30T09:44:37, file_size_bytes: 225659965, e_tag: "728c45fabdcd8e40bdef4dfc28df9b0f-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-200905.parquet, file_modified: 2025-05-30T09:46:12, file_size_bytes: 232847306, e_tag: "f59e45bd8bd1d77cd7ae8ab6ab468bcc-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-200906.parquet, file_modified: 2025-05-30T09:47:26, file_size_bytes: 224226575, e_tag: "8ebb698eea85f9af87065ac333efc449-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-200907.parquet, file_modified: 2025-05-30T09:44:52, file_size_bytes: 217168413, e_tag: "7d7ee77f6cac4adc18aa3a9e74600dd3-12", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-200908.parquet, file_modified: 2025-05-30T09:46:23, file_size_bytes: 217303109, e_tag: "e9883055d92a33b941aab971423e681b-12", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-200909.parquet, file_modified: 2025-05-30T09:47:28, file_size_bytes: 223333499, e_tag: "6f0917e6003b38df9060d71c004eb961-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-200910.parquet, file_modified: 2025-05-30T09:44:54, file_size_bytes: 246300471, e_tag: "8928b29da44e041021e10077683b7817-14", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-200911.parquet, file_modified: 2025-05-30T09:46:37, file_size_bytes: 227920860, e_tag: "4cd26a1a7f82af080c33e890dc1fef27-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-200912.parquet, file_modified: 2025-05-30T09:44:24, file_size_bytes: 233873308, e_tag: "23f4584e494e3c065c777c270c9eedbc-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201001.parquet, file_modified: 2025-05-30T09:45:18, file_size_bytes: 235166925, e_tag: "effcc8cc41b40cf7ac466f911d7b9459-14", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201002.parquet, file_modified: 2025-05-30T09:46:59, file_size_bytes: 177367931, e_tag: "ce8b7817ecc47da86ccbfa6b51ffa06b-10", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201003.parquet, file_modified: 2025-05-30T09:44:26, file_size_bytes: 205857224, e_tag: "94a078b61e3b652387e6f2a673dc3f4e-12", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201004.parquet, file_modified: 2025-05-30T09:45:04, file_size_bytes: 243024246, e_tag: "a1efbebfdabc204e0041d8714aaec01a-14", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201005.parquet, file_modified: 2025-05-30T09:46:47, file_size_bytes: 248130090, e_tag: "d3cf585e00ce627a807348c84a42d0a6-14", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201006.parquet, file_modified: 2025-05-30T09:44:25, file_size_bytes: 237068130, e_tag: "831db33281a5c017f8ffc466bd47546b-14", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201007.parquet, file_modified: 2025-05-30T09:45:35, file_size_bytes: 234826090, e_tag: "790e05983e6592e4920c88fbd2bfe774-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201008.parquet, file_modified: 2025-05-30T09:47:14, file_size_bytes: 197990272, e_tag: "d87ddb446e5cbc0f6831fafd95cfd027-11", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201009.parquet, file_modified: 2025-05-30T09:44:27, file_size_bytes: 243408943, e_tag: "abfbe3b29942bcd68d131d95540278d3-14", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201010.parquet, file_modified: 2025-05-30T09:45:47, file_size_bytes: 225277041, e_tag: "f768c7b77497b2bf3efd5cb2a4362977-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201011.parquet, file_modified: 2025-05-30T09:47:23, file_size_bytes: 220010577, e_tag: "c6830cbe1f3ae918f9280db3aa847b03-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201012.parquet, file_modified: 2025-05-30T09:44:24, file_size_bytes: 219773352, e_tag: "264f7ea433076690a3bbe5566168e5c5-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201101.parquet, file_modified: 2025-05-30T09:45:52, file_size_bytes: 212535107, e_tag: "ca3bdc2707b29667c78c39517781eac4-12", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201102.parquet, file_modified: 2025-05-30T09:47:23, file_size_bytes: 223138164, e_tag: "e2b3c0fd0c0d66ac6363600de0c8b2ad-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201103.parquet, file_modified: 2025-05-30T09:44:26, file_size_bytes: 252843261, e_tag: "fd5d4e01568cd6e7ef1e00de76441e5b-15", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201104.parquet, file_modified: 2025-05-30T09:46:10, file_size_bytes: 233123935, e_tag: "2b510cc2c0c73d9ec7374c9e6d56c388-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201105.parquet, file_modified: 2025-05-30T09:44:24, file_size_bytes: 246843111, e_tag: "abc2f58bd520b2013aa1a333d317c70c-14", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201106.parquet, file_modified: 2025-05-30T09:44:58, file_size_bytes: 238786647, e_tag: "0e456698dc42a850ff7b764506cb511d-14", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201107.parquet, file_modified: 2025-05-30T09:46:40, file_size_bytes: 233249259, e_tag: "28177227cbff94a6a819a0568a14e9b2-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201108.parquet, file_modified: 2025-05-30T09:44:25, file_size_bytes: 212681184, e_tag: "fdcb442e1010630c0553a7018762a8ba-12", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201109.parquet, file_modified: 2025-05-30T09:45:13, file_size_bytes: 232399266, e_tag: "ccca37be5a3579a8bc644490226ed29a-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201110.parquet, file_modified: 2025-05-30T09:46:52, file_size_bytes: 248471033, e_tag: "eebe34c1bb74f63433eb607810969553-14", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201111.parquet, file_modified: 2025-05-30T09:44:26, file_size_bytes: 231103826, e_tag: "7c76b9fc111462b76336d63bce3253c7-13", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201112.parquet, file_modified: 2025-05-30T09:45:40, file_size_bytes: 236102882, e_tag: "26c10d1d85c4565cbb9e8fc6a7bc745c-14", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201201.parquet, file_modified: 2025-05-30T09:47:21, file_size_bytes: 236184052, e_tag: "8cdc15a22462579dcf90d669cea0f04b-14", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201202.parquet, file_modified: 2025-05-30T09:44:27, file_size_bytes: 238377570, e_tag: "4e6734c5c2e77c68dde5155a45dac81c-14", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201203.parquet, file_modified: 2025-05-30T09:46:06, file_size_bytes: 258226172, e_tag: "b7b07fa0f4fefcf0ba0fc69ba344b5c8-15", version: NULL} |
| 18138               | NULL       | {file_path: nyc_taxi_rides/data/tripdata_parquet/data-201204.parquet, file_modified: 2025-05-30T09:44:25, file_size_bytes: 248190698, e_tag: "968c13850fa9a7cb46337bc8fc9d13fa-14", version: NULL} |
| .                                                                                                                                                                                                                                     |
| .                                                                                                                                                                                                                                     |
| .                                                                                                                                                                                                                                     |
+---------------------+------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
96 row(s) fetched. (First 40 displayed. Use --maxrows to adjust)
Elapsed 0.057 seconds.
```

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

Yes

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

This will enable a new user-facing table function to datafusion cli.

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 8, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

I ran microbenchmarks comparing DataFusion with DuckDB for string
functions (see apache/datafusion-benchmarks#26)
and noticed that DF was very slow for `split_part`.

This PR fixes some obvious performance issues. Speedups are:

| Benchmark                         | Before | After | Speedup      |
|-----------------------------------|--------|-------|--------------|
| single_char_delim/pos_first       | 1.27ms | 140µs | 9.1x faster  |
| single_char_delim/pos_middle      | 1.39ms | 396µs | 3.5x faster  |
| single_char_delim/pos_last        | 1.47ms | 738µs | 2.0x faster  |
| single_char_delim/pos_negative    | 1.35ms | 148µs | 9.1x faster  |
| multi_char_delim/pos_first        | 1.22ms | 174µs | 7.0x faster  |
| multi_char_delim/pos_middle       | 1.22ms | 407µs | 3.0x faster  |
| string_view_single_char/pos_first | 1.42ms | 139µs | 10.2x faster |
| many_parts_20/pos_second          | 2.48ms | 201µs | 12.3x faster |
| long_strings_50_parts/pos_first   | 8.18ms | 178µs | 46x faster   |

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Martin Grigorov <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
comphead pushed a commit that referenced this pull request Jan 8, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes apache#19528.

## Rationale for this change

The Date - Date currently returns duration which is not consistent with
other databases.

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

- Made changes to so it return Int instead of the days duration
- Added slt test

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

Yes

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Jeffrey Vo <[email protected]>
comphead pushed a commit that referenced this pull request Jan 8, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Follow on to apache#19666

## Rationale for this change

While working on apache#19666 I
noticed there were `149` dependencies behind the latest

```shell
andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion2$ cargo update -p arrow
    Updating crates.io index
     Locking 0 packages to latest compatible versions
note: pass `--verbose` to see 149 unchanged dependencies behind latest
```

## What changes are included in this PR?

Run `cargo update` and check in the result

## Are these changes tested?

By CI
## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 8, 2026
… indexing (apache#19590)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

This PR improves the performance of the `substring_index` function by
optimizing
delimiter search and substring extraction: 

- Single-byte fast path: introduces a specialized byte-based search for
single-byte delimiters (e.g. `.`, `,`), avoiding UTF-8 pattern matching
overhead.
- Efficient index discovery: replaces the split-and-sum-length approach
with direct index location using `match_indices` / `rmatch_indices`.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

- Added a fast path for `delimiter.len() == 1` using byte-based search.
- Refactored the general path to use `match_indices` and
`rmatch_indices` for more efficient positioning.

### Benchmarks
- Single-byte delimiter benchmarks show ~2–3× speedup across batch
sizes.
- Multi-byte delimiters see a consistent ~10–15% improvement.
```
group                                               main_substrindex                       perf_substrindex
-----                                               ----------------                       ----------------
substr_index/substr_index_10000_long_delimiter      1.12   548.2±18.39µs        ? ?/sec    1.00   488.4±15.62µs        ? ?/sec
substr_index/substr_index_10000_single_delimiter    2.14   543.4±15.12µs        ? ?/sec    1.00    254.0±7.80µs        ? ?/sec
substr_index/substr_index_1000_long_delimiter       1.12     43.1±1.63µs        ? ?/sec    1.00     38.6±2.03µs        ? ?/sec
substr_index/substr_index_1000_single_delimiter     3.51     46.4±2.21µs        ? ?/sec    1.00     13.2±0.99µs        ? ?/sec
substr_index/substr_index_100_long_delimiter        1.01      3.7±0.18µs        ? ?/sec    1.00      3.7±0.20µs        ? ?/sec
substr_index/substr_index_100_single_delimiter      2.15      3.6±0.14µs        ? ?/sec    1.00  1675.9±79.15ns        ? ?/sec
```

## Are these changes tested?

- Yes, Existing unit tests pass.
- New benchmarks added to verify performance improvement.

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

No.

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 8, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Part of apache#14763.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

Replace `TypeSignature::Exact` patterns with cleaner APIs:
- `isnan/iszero`: `Signature::coercible` with
`TypeSignatureClass::Float`
- `nanvl`: `Signature::uniform(2, [Float32, Float64])`

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

Yes

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Jeffrey Vo <[email protected]>
comphead pushed a commit that referenced this pull request Jan 8, 2026
) (apache#19319)

## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #apache#19141.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Sergey Zhukov <[email protected]>
Co-authored-by: Andrew Lamb <[email protected]>
comphead pushed a commit that referenced this pull request Jan 8, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Part of apache#19025.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

## What changes are included in this PR?

- Added Time64/Time32 signatures to date_trunc
- Added time truncation logic (hour, minute, second, millisecond,
microsecond)
- Error for invalid granularities (day, week, month, quarter, year)

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

Yes

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->

---------

Co-authored-by: Jeffrey Vo <[email protected]>
comphead pushed a commit that referenced this pull request Jan 8, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->

- Closes #.

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->
`ParquetOpener::open()` is a critical function for parquet planning,
it's the entry point for many major steps like row-group/file pruning.

It has almost 400 lines of code now, this PR adds some markers to the
code blocks/important steps, to make this function easier to navigate.
(though I may have overlooked some critical steps)

Ideally, we should break these blocks into utilities. I tried extracting
some of them with AI, but the resulting utilities still have unclear
semantics, with many input arguments and output items. Overall, the
complexity doesn’t seem reduced after the change. I think it’s possible
to factor them into helper functions with clear semantics, but that
likely requires someone who understands the implementation details very
well.

## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
comphead pushed a commit that referenced this pull request Jan 8, 2026
## Which issue does this PR close?

<!--
We generally require a GitHub issue to be filed for all bug fixes and
enhancements and this helps us generate change logs for our releases.
You can link an issue to this PR using the GitHub syntax. For example
`Closes #123` indicates that this PR will close issue #123.
-->
- Builds on apache#19388
- Closes apache#19573

## Rationale for this change

<!--
Why are you proposing this change? If this is already explained clearly
in the issue then this section is not needed.
Explaining clearly why changes are proposed helps reviewers understand
your changes and offer better suggestions for fixes.
-->

This PR explores one way to make `ListFilesCache` table scoped. A
session level cache is still used, but the cache key is made a
"table-scoped" path, for which a new struct
```
pub struct TableScopedPath(pub Option<TableReference>, pub Path);
```
is defined. `TableReference` comes from `CreateExternalTable` passed to
`ListingTableFactory::create`.

Additionally, when a table is dropped, all entries related to a table is
dropped by modifying `SessionContext::find_and_deregister` method.

Testing (change on adding `list_files_cache()` for cli is included for
easier testing).
- Testing cache reuse on a single table.
```
> \object_store_profiling summary
ObjectStore Profile mode set to Summary
> create external table test
stored as parquet
location 's3://overturemaps-us-west-2/release/2025-12-17.0/theme=base/';
0 row(s) fetched. 
Elapsed 14.878 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Summary, inner: AmazonS3(overturemaps-us-west-2)
Summaries:
+-----------+----------+-----------+-----------+-------------+-------------+-------+
| Operation | Metric   | min       | max       | avg         | sum         | count |
+-----------+----------+-----------+-----------+-------------+-------------+-------+
| Get       | duration | 0.030597s | 0.209235s | 0.082396s   | 36.254189s  | 440   |
| Get       | size     | 204782 B  | 857230 B  | 497304.88 B | 218814144 B | 440   |
| List      | duration | 0.192037s | 0.192037s | 0.192037s   | 0.192037s   | 1     |
| List      | size     |           |           |             |             | 1     |
+-----------+----------+-----------+-----------+-------------+-------------+-------+
> select table, path, unnest(metadata_list) from list_files_cache() limit 1;
+-------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| table | path                            | UNNEST(list_files_cache().metadata_list)                                                                                                                                                                                                                  |
+-------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| test  | release/2025-12-17.0/theme=base | {file_path: release/2025-12-17.0/theme=base/type=bathymetry/part-00000-dd0f2f50-b436-4710-996f-f1b06181a3a1-c000.zstd.parquet, file_modified: 2025-12-17T22:32:50, file_size_bytes: 40280159, e_tag: "15090401f8f936c3f83bb498cb99a41d-3", version: NULL} |
+-------+---------------------------------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row(s) fetched. 
Elapsed 0.058 seconds.

Object Store Profiling
> select count(*) from test where type = 'infrastructure';
+-----------+
| count(*)  |
+-----------+
| 142969564 |
+-----------+
1 row(s) fetched. 
Elapsed 0.028 seconds.

Object Store Profiling
```
- Test separate cache entries are created for two tables with same path
```
> create external table test2
stored as parquet
location 's3://overturemaps-us-west-2/release/2025-12-17.0/theme=base/';
0 row(s) fetched. 
Elapsed 14.798 seconds.

Object Store Profiling
Instrumented Object Store: instrument_mode: Summary, inner: AmazonS3(overturemaps-us-west-2)
Summaries:
+-----------+----------+-----------+-----------+-------------+-------------+-------+
| Operation | Metric   | min       | max       | avg         | sum         | count |
+-----------+----------+-----------+-----------+-------------+-------------+-------+
| Get       | duration | 0.030238s | 0.350465s | 0.073670s   | 32.414654s  | 440   |
| Get       | size     | 204782 B  | 857230 B  | 497304.88 B | 218814144 B | 440   |
| List      | duration | 0.133334s | 0.133334s | 0.133334s   | 0.133334s   | 1     |
| List      | size     |           |           |             |             | 1     |
+-----------+----------+-----------+-----------+-------------+-------------+-------+
> select count(*) from test2 where type = 'bathymetry';
+----------+
| count(*) |
+----------+
| 59963    |
+----------+
1 row(s) fetched. 
Elapsed 0.009 seconds.

Object Store Profiling
> select table, path from list_files_cache();
+-------+---------------------------------+
| table | path                            |
+-------+---------------------------------+
| test  | release/2025-12-17.0/theme=base |
| test2 | release/2025-12-17.0/theme=base |
+-------+---------------------------------+
2 row(s) fetched. 
Elapsed 0.004 seconds.
```
- Test cache associated with a table is dropped when table is dropped,
and the other table with same path is unaffected.
```
> drop table test;
0 row(s) fetched. 
Elapsed 0.015 seconds.

Object Store Profiling
> select table, path from list_files_cache();
+-------+---------------------------------+
| table | path                            |
+-------+---------------------------------+
| test2 | release/2025-12-17.0/theme=base |
+-------+---------------------------------+
1 row(s) fetched. 
Elapsed 0.005 seconds.

Object Store Profiling
> select count(*) from list_files_cache() where table = 'test';
+----------+
| count(*) |
+----------+
| 0        |
+----------+
1 row(s) fetched. 
Elapsed 0.014 seconds.
> select count(*) from test2 where type = 'infrastructure';
+-----------+
| count(*)  |
+-----------+
| 142969564 |
+-----------+
1 row(s) fetched. 
Elapsed 0.013 seconds.

Object Store Profiling
```
- Test that dropping a view does not remove cache
```
> create view test2_view as (select * from test2 where type = 'infrastructure');
0 row(s) fetched. 
Elapsed 0.103 seconds.

Object Store Profiling
> select count(*) from test2_view;
+-----------+
| count(*)  |
+-----------+
| 142969564 |
+-----------+
1 row(s) fetched. 
Elapsed 0.094 seconds.

Object Store Profiling
> drop view test2_view;
0 row(s) fetched. 
Elapsed 0.002 seconds.

Object Store Profiling
> select table, path from list_files_cache();
+-------+---------------------------------+
| table | path                            |
+-------+---------------------------------+
| test2 | release/2025-12-17.0/theme=base |
+-------+---------------------------------+
1 row(s) fetched. 
Elapsed 0.007 seconds.
```
## What changes are included in this PR?

<!--
There is no need to duplicate the description in the issue here but it
is sometimes worth providing a summary of the individual changes in this
PR.
-->

## Are these changes tested?

<!--
We typically require tests for all PRs in order to:
1. Prevent the code from being accidentally broken by subsequent changes
2. Serve as another way to document the expected behavior of the code

If tests are not included in your PR, please explain why (for example,
are they covered by existing tests)?
-->

## Are there any user-facing changes?

<!--
If there are user-facing changes then we may require documentation to be
updated before approving the PR.
-->

<!--
If there are any breaking changes to public APIs, please add the `api
change` label.
-->
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant