ARROW-10070: [C++][Compute] Implement var and std aggregate kernel#8269
ARROW-10070: [C++][Compute] Implement var and std aggregate kernel#8269cyb70289 wants to merge 6 commits intoapache:masterfrom cyb70289:agg-stdev
Conversation
|
Dev / Lint ci failure looks not related. missing license header? |
Yes, #8273 should fix that. |
|
We might want to have an option to specify the denominator? (whether it is |
Thank you, will do. |
|
Added option |
|
Two notes:
|
|
Thanks @nealrichardson
Naming is always the hardest thing :) Looks
I also thought about the |
|
Added variance kernel |
docs/source/cpp/compute.rst
Outdated
There was a problem hiding this comment.
Would rather "stddev" and "variance" respectively. "std" and "var" can easily be misunderstood, IMHO.
There was a problem hiding this comment.
Ironically, the fact that this is random identically-distributed data means that the slices will have similar means and variances. Perhaps making the array smaller would make the test more significant.
There was a problem hiding this comment.
Can we also test with chunks of very different sizes? For example [1, 2, 3, 4, 5, 6, 7] and [8].
There was a problem hiding this comment.
Can we also test with ddof = 1? Though with such a large array size, the change may be rather small.
There was a problem hiding this comment.
Naming suggestion: how about DdofOptions?
There was a problem hiding this comment.
I'm not sure if we will add options later, e.g., to ignore NaN
There was a problem hiding this comment.
I agree there might be other options in the future. But with the renaming of the functions, maybe call it VarianceOptions instead?
There was a problem hiding this comment.
Kind of struggling as stddev kernel uses this same option.
Maybe export both VarianceOptions and StddevOptions and alias them as same type internally?
There was a problem hiding this comment.
Only one is useful. Just VarianceOptions IMHO.
|
I suppose the behaviour with NaN is that any NaN in the input gives NaN as result? That might be worth adding a test for? |
|
Since we're not doing anything special, it certainly will. I think that can be tackled in a separate JIRA, when adding a nan handling option perhaps. |
No description provided.