wrap behavior and normalizing factor
Two of the essential vectorization components of the image convolution process are the im2col and col2im kernels, which ArrayFire implements as af::unwrap and af::wrap. However, this snippet from the documentation makes me wonder about something crucially important that I cannot find mentioned anywhere:
However, in the case of the sliding window configuration (the stride is smaller than the window size), some of the columns will map to overlapping sections in the output array, and so elements from multiple columns will map to the same position on the output array. Recomposing the array then requires some way to choose between competing elements to place in that position. To address this contention, wrap simply sums all of the competing elements and places the sum in that position. The figure below illustrates this behavior: the fourth element of the first column and the third element of the second column in the input array both map to the same position on the output array, and thus their sum is placed on that position (this happens on the second and third column of the input as well - they both map to the third element of the second column in the output). Given this behavior, it is up to the user to pre-process the input (unwrapped) array (or post-process the output (wrapped) array) in a way that somehow takes all of the competing elements into consideration.
The problem I don't see mentioned in the documentation is that the result should be a mean, not a sum: only a mean would give back the original values at the positions where sliding windows overlap. But computing a mean requires a cardinality, i.e. the number of items summed at each output position. In the documentation image, the brightness of the blue entry hints at this, but it is never stated.
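To make the cardinality concrete, here is how I would expect the count to look along a single axis. This is my own derivation, not something taken from the ArrayFire documentation, and it assumes no padding: an axis of length n, window size k, stride s, and N windows where window j covers positions js through js + k - 1. For a 2D image the count would then be the product of the per-axis counts.

```latex
N = \left\lfloor \frac{n - k}{s} \right\rfloor + 1,
\qquad
c(i) = \max\!\left(0,\;
  \min\!\left(\left\lfloor \frac{i}{s} \right\rfloor,\, N - 1\right)
  - \max\!\left(\left\lceil \frac{i - k + 1}{s} \right\rceil,\, 0\right)
  + 1 \right),
\qquad
C(y, x) = c_y(y)\, c_x(x)
```

For example, with n = 4, k = 2, s = 1 this gives c = (1, 2, 2, 1), so in 2D a corner pixel is counted once, an edge pixel twice, and an interior pixel four times.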
One could easily compute the normalizing tensor by unwrapping and then wrapping an input image filled with ones, but there should be a way to incorporate this directly into the kernel and avoid a separate division. Could batching avoid this overhead?
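To show what I mean by the image-of-ones trick, here is a minimal sketch in plain C++ (standard library only, deliberately not using ArrayFire). The names Image, unwrap_ref, wrap_ref, overlap_count and wrap_mean are hypothetical stand-ins I made up for illustration; they mimic the summing behavior described in the quoted documentation for square k x k windows with stride s and no padding, and are not the library's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major H x W image (hypothetical helper type, not an ArrayFire type).
struct Image {
    std::size_t h, w;
    std::vector<double> data;
    Image(std::size_t h_, std::size_t w_, double fill = 0.0)
        : h(h_), w(w_), data(h_ * w_, fill) {}
    double& at(std::size_t y, std::size_t x) { return data[y * w + x]; }
    double at(std::size_t y, std::size_t x) const { return data[y * w + x]; }
};

using Columns = std::vector<std::vector<double>>;

// im2col stand-in: every k x k window (stride s, no padding) becomes one column.
Columns unwrap_ref(const Image& img, std::size_t k, std::size_t s) {
    assert(k > 0 && s > 0 && img.h >= k && img.w >= k);
    Columns cols;
    for (std::size_t y = 0; y + k <= img.h; y += s)
        for (std::size_t x = 0; x + k <= img.w; x += s) {
            std::vector<double> col;
            for (std::size_t dy = 0; dy < k; ++dy)
                for (std::size_t dx = 0; dx < k; ++dx)
                    col.push_back(img.at(y + dy, x + dx));
            cols.push_back(col);
        }
    return cols;
}

// col2im stand-in: competing elements are summed, as the quoted docs describe.
Image wrap_ref(const Columns& cols, std::size_t h, std::size_t w,
               std::size_t k, std::size_t s) {
    assert(k > 0 && s > 0 && h >= k && w >= k);
    Image out(h, w, 0.0);
    std::size_t c = 0;
    for (std::size_t y = 0; y + k <= h; y += s)
        for (std::size_t x = 0; x + k <= w; x += s) {
            assert(c < cols.size() && cols[c].size() == k * k);
            const std::vector<double>& col = cols[c++];
            std::size_t i = 0;
            for (std::size_t dy = 0; dy < k; ++dy)
                for (std::size_t dx = 0; dx < k; ++dx)
                    out.at(y + dy, x + dx) += col[i++];
        }
    assert(c == cols.size());
    return out;
}

// Cardinality map: wrap(unwrap(ones)) counts how many windows hit each pixel.
Image overlap_count(std::size_t h, std::size_t w, std::size_t k, std::size_t s) {
    Image ones(h, w, 1.0);
    return wrap_ref(unwrap_ref(ones, k, s), h, w, k, s);
}

// Summing wrap followed by the separate division I would like to avoid.
Image wrap_mean(const Columns& cols, std::size_t h, std::size_t w,
                std::size_t k, std::size_t s) {
    Image sum = wrap_ref(cols, h, w, k, s);
    const Image cnt = overlap_count(h, w, k, s);
    for (std::size_t i = 0; i < sum.data.size(); ++i)
        if (cnt.data[i] > 0.0) sum.data[i] /= cnt.data[i];  // uncovered pixels stay 0
    return sum;
}
```

The count map depends only on the image size, window size and stride, so it can be computed once and reused for every image of the same shape. With this, wrap_mean(unwrap_ref(img, k, s), ...) gives back img wherever at least one window covers the pixel, which is the inverse behavior I am after.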
So my question is: how can I compute the normalizing cardinality required to convert the sum into a proper mean, and thereby make af::wrap a proper inverse of af::unwrap?