Commit 9baefea
This PR implements "Segmented Aggregation" to the existing aggregation
node to improve aggregation on ordered data.
A segment group is defined as "a continuous chunk of data that have the
same segment key value. e.g, if the input data looks like
```
[0, 0, 0, 1, 2, 2]
```
Then there are three segments `[0, 0, 0]` `[1]` `[2, 2]`
(Note the "group" in "segment group" here is added to differentiate from
"segment", which is defined as "a continuous chunk of data with in a
ExecBatch")
Segment aggregation can be used to replace existing hash aggregation in
the case that data are ordered. The benefit of this is
(1) We can output aggregation result earlier (as soon as a segment group
is fully consumed).
(2) We only need to hold partial aggregation for one segment group to
reduce memory usage.
See https://issues.apache.org/jira/browse/ARROW-17642
Replaces #14352
* Closes: #32884
Follow ups
=======
* #34475
* #34529
---------
Co-authored-by: Li Jin <[email protected]>
1 parent 4c05a3b commit 9baefea
File tree
11 files changed
+1731
-195
lines changed- cpp/src/arrow
- compute
- exec
- kernels
- row
11 files changed
+1731
-195
lines changed| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
147 | 147 | | |
148 | 148 | | |
149 | 149 | | |
| 150 | + | |
| 151 | + | |
| 152 | + | |
| 153 | + | |
| 154 | + | |
| 155 | + | |
| 156 | + | |
| 157 | + | |
| 158 | + | |
| 159 | + | |
| 160 | + | |
| 161 | + | |
150 | 162 | | |
151 | 163 | | |
152 | 164 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
181 | 181 | | |
182 | 182 | | |
183 | 183 | | |
| 184 | + | |
| 185 | + | |
| 186 | + | |
| 187 | + | |
| 188 | + | |
| 189 | + | |
184 | 190 | | |
185 | 191 | | |
186 | 192 | | |
| |||
240 | 246 | | |
241 | 247 | | |
242 | 248 | | |
| 249 | + | |
| 250 | + | |
243 | 251 | | |
244 | 252 | | |
245 | 253 | | |
| |||
Large diffs are not rendered by default.
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
241 | 241 | | |
242 | 242 | | |
243 | 243 | | |
244 | | - | |
245 | | - | |
| 244 | + | |
246 | 245 | | |
247 | 246 | | |
248 | 247 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
221 | 221 | | |
222 | 222 | | |
223 | 223 | | |
224 | | - | |
| 224 | + | |
| 225 | + | |
| 226 | + | |
| 227 | + | |
| 228 | + | |
| 229 | + | |
| 230 | + | |
225 | 231 | | |
226 | 232 | | |
227 | 233 | | |
228 | 234 | | |
| 235 | + | |
| 236 | + | |
| 237 | + | |
| 238 | + | |
| 239 | + | |
229 | 240 | | |
230 | 241 | | |
231 | 242 | | |
232 | | - | |
233 | | - | |
| 243 | + | |
| 244 | + | |
| 245 | + | |
| 246 | + | |
| 247 | + | |
234 | 248 | | |
235 | 249 | | |
236 | 250 | | |
237 | | - | |
| 251 | + | |
238 | 252 | | |
| 253 | + | |
| 254 | + | |
239 | 255 | | |
240 | 256 | | |
241 | 257 | | |
| |||
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
1578 | 1578 | | |
1579 | 1579 | | |
1580 | 1580 | | |
| 1581 | + | |
| 1582 | + | |
| 1583 | + | |
| 1584 | + | |
| 1585 | + | |
| 1586 | + | |
| 1587 | + | |
| 1588 | + | |
| 1589 | + | |
| 1590 | + | |
| 1591 | + | |
| 1592 | + | |
| 1593 | + | |
| 1594 | + | |
| 1595 | + | |
| 1596 | + | |
| 1597 | + | |
| 1598 | + | |
| 1599 | + | |
| 1600 | + | |
| 1601 | + | |
| 1602 | + | |
| 1603 | + | |
| 1604 | + | |
| 1605 | + | |
| 1606 | + | |
| 1607 | + | |
| 1608 | + | |
| 1609 | + | |
| 1610 | + | |
| 1611 | + | |
| 1612 | + | |
| 1613 | + | |
| 1614 | + | |
| 1615 | + | |
| 1616 | + | |
| 1617 | + | |
| 1618 | + | |
| 1619 | + | |
| 1620 | + | |
| 1621 | + | |
| 1622 | + | |
| 1623 | + | |
| 1624 | + | |
| 1625 | + | |
| 1626 | + | |
| 1627 | + | |
| 1628 | + | |
| 1629 | + | |
| 1630 | + | |
| 1631 | + | |
| 1632 | + | |
| 1633 | + | |
| 1634 | + | |
| 1635 | + | |
| 1636 | + | |
| 1637 | + | |
| 1638 | + | |
| 1639 | + | |
| 1640 | + | |
| 1641 | + | |
| 1642 | + | |
| 1643 | + | |
| 1644 | + | |
| 1645 | + | |
| 1646 | + | |
| 1647 | + | |
| 1648 | + | |
| 1649 | + | |
| 1650 | + | |
| 1651 | + | |
| 1652 | + | |
| 1653 | + | |
| 1654 | + | |
| 1655 | + | |
| 1656 | + | |
| 1657 | + | |
| 1658 | + | |
| 1659 | + | |
| 1660 | + | |
| 1661 | + | |
| 1662 | + | |
| 1663 | + | |
| 1664 | + | |
| 1665 | + | |
| 1666 | + | |
| 1667 | + | |
| 1668 | + | |
| 1669 | + | |
| 1670 | + | |
| 1671 | + | |
| 1672 | + | |
| 1673 | + | |
| 1674 | + | |
| 1675 | + | |
| 1676 | + | |
| 1677 | + | |
| 1678 | + | |
| 1679 | + | |
| 1680 | + | |
| 1681 | + | |
| 1682 | + | |
| 1683 | + | |
1581 | 1684 | | |
1582 | 1685 | | |
0 commit comments