[coalesce] Special case `BatchCoalescer` / `GenericInProgressArray` when multiple batches are pushed in with the same buffer

**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
- Part of https://github.com/apache/arrow-rs/issues/7761

The runtime of [`GenericInProgressArray`](https://github.com/apache/arrow-rs/blob/2b40d1dfc35862ff350a40dfbc66f8a14f4eea31/arrow-select/src/coalesce/generic.rs#L31) is almost entirely dominated by copying strings to new buffers. It copies to new buffers to avoid accumulating a large number of buffers that each have only a few rows pointing into them.

The coalesce kernel already has several optimizations that avoid this copy, or make it cheaper, when possible. For example, it has special logic to recycle string view buffers when they are not used much (TODO link).

I have an as-yet-unproven hypothesis that we could speed up the coalesce kernel by special casing the situation where the underlying buffer is the same across batches.

The high-level idea is that when reading from Parquet, the same string buffer will be used for several batches, so if the coalesce kernel detected this, maybe we could avoid some copies. I intend to use the `coalesce` kernel to make Parquet reading faster.


**Describe the solution you'd like**

Make the `coalesce` kernel benchmarks faster.


**Describe alternatives you've considered**

The first thing I would do is check, in an actual Parquet benchmark, whether the same `Buffer`s are used for multiple `RecordBatch`es that come out of the reader:
```shell
cargo bench --features=arrow,async --bench arrow_reader_clickbench
```

If that is the case, I would then make a benchmark that replicates the pattern (e.g. create a record batch with 32K rows, then slice it up and send it in 8K-row chunks).
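To make the slicing pattern concrete, here is a rough std-only sketch. `Chunk`, `slice_into_chunks` and `count_buffer_changes` are hypothetical names invented for illustration, and `Arc<Vec<u8>>` is only a stand-in for an Arrow `Buffer`; a real benchmark would slice an actual `RecordBatch` instead.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a sliced array: a shared data buffer plus the
/// row range this chunk covers. These are NOT the arrow-rs types.
#[derive(Clone)]
struct Chunk {
    data: Arc<Vec<u8>>,
    offset: usize,
    len: usize,
}

/// Split `total_rows` rows backed by one buffer into zero-copy chunks of at
/// most `chunk_rows` rows each. Every chunk shares the same allocation.
fn slice_into_chunks(data: Arc<Vec<u8>>, total_rows: usize, chunk_rows: usize) -> Vec<Chunk> {
    assert!(chunk_rows > 0, "chunk_rows must be non-zero");
    (0..total_rows)
        .step_by(chunk_rows)
        .map(|offset| Chunk {
            data: Arc::clone(&data),
            offset,
            len: chunk_rows.min(total_rows - offset),
        })
        .collect()
}

/// Count how often consecutive chunks point at a different underlying buffer.
/// Zero means every chunk in the stream shares one buffer.
fn count_buffer_changes(chunks: &[Chunk]) -> usize {
    chunks
        .windows(2)
        .filter(|w| !Arc::ptr_eq(&w[0].data, &w[1].data))
        .count()
}

fn main() {
    let rows = 32 * 1024;
    // Assumption for the sketch: one byte of string data per row.
    let data = Arc::new(vec![0u8; rows]);
    let chunks = slice_into_chunks(data, rows, 8 * 1024);
    for c in &chunks {
        println!("chunk at offset {} with {} rows", c.offset, c.len);
    }
    println!("buffer changes: {}", count_buffer_changes(&chunks));
}
```

The same kind of consecutive-pointer comparison could also be used for the first step, to see how often the buffers coming out of the Parquet reader actually change between batches.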

Then I would try to optimize it. For example, check pointer equality and delay the string copies until the kernel sees a new buffer pointer.
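A minimal sketch of what "delay the copies until a new buffer pointer shows up" could look like, again using only std. `Slice` and `DeferredCopier` are hypothetical names, `Arc<Vec<u8>>` stands in for an Arrow `Buffer`, and the sketch assumes the simplest case where consecutive slices are contiguous ranges of the same buffer. It is not how `GenericInProgressArray` works today.

```rust
use std::sync::Arc;

/// Hypothetical stand-in for a byte range inside a shared string buffer.
struct Slice {
    data: Arc<Vec<u8>>,
    start: usize,
    end: usize,
}

/// Toy coalescer: merges adjacent ranges that come from the same buffer and
/// performs the copy only when the buffer pointer changes (or on finish).
#[derive(Default)]
struct DeferredCopier {
    out: Vec<u8>,
    pending: Option<Slice>,
    copies: usize,
}

impl DeferredCopier {
    fn push(&mut self, s: Slice) {
        if let Some(p) = self.pending.as_mut() {
            // Same allocation and contiguous range: extend, do not copy yet.
            if Arc::ptr_eq(&p.data, &s.data) && p.end == s.start {
                p.end = s.end;
                return;
            }
        }
        // New buffer (or a gap): copy whatever was pending, then start over.
        self.flush();
        self.pending = Some(s);
    }

    fn flush(&mut self) {
        if let Some(p) = self.pending.take() {
            self.out.extend_from_slice(&p.data[p.start..p.end]);
            self.copies += 1;
        }
    }

    fn finish(mut self) -> (Vec<u8>, usize) {
        self.flush();
        (self.out, self.copies)
    }
}

fn main() {
    let a = Arc::new((0u8..32).collect::<Vec<u8>>());
    let b = Arc::new(vec![9u8; 8]);
    let mut copier = DeferredCopier::default();
    // Four 8-byte slices of the same buffer, then one slice of another buffer.
    for start in (0..32).step_by(8) {
        copier.push(Slice { data: Arc::clone(&a), start, end: start + 8 });
    }
    copier.push(Slice { data: b, start: 0, end: 8 });
    let (out, copies) = copier.finish();
    println!("copied {} bytes in {} copies", out.len(), copies);
}
```

In this toy run the four slices of the first buffer are merged into a single copy, so five pushes result in two copies. Whether this translates into a measurable win in the real kernel is exactly what the benchmark would need to show.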

**Additional context**
