At least on the bdf benchmark, capping the downsample scale at 0.5 (the smallest scale at which linear filtering still reads every source texel, so no data is dropped) and removing mip computation is 1) the same speed on iOS and 2) ~20ms faster on a Pixel 7.
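
To illustrate why 0.5 is the floor, here is a minimal 1D sketch of a linear filter, not the engine's actual code. The helper names `linear_taps` and `dropped_texels` are hypothetical, and the model assumes texel centers at `i + 0.5` and one linear sample per output pixel. Under those assumptions, each output sample blends two adjacent texels, so a scale of 0.5 covers every source texel while a smaller scale skips some.

```python
import math


def linear_taps(out_index, scale):
    """Return {source_texel_index: weight} for one output sample (1D linear filter)."""
    # Map the center of the output pixel into source texel space.
    src = (out_index + 0.5) / scale
    # Texel i has its center at i + 0.5; find the texel at or to the left of src.
    left = math.floor(src - 0.5)
    frac = (src - 0.5) - left
    taps = {left: 1.0 - frac, left + 1: frac}
    # Keep only texels that actually contribute to the result.
    return {i: w for i, w in taps.items() if w > 0.0}


def dropped_texels(src_width, scale):
    """List the source texels that no output sample reads at the given scale."""
    out_width = int(src_width * scale)
    used = set()
    for o in range(out_width):
        used.update(linear_taps(o, scale))
    return sorted(set(range(src_width)) - used)


print(dropped_texels(8, 0.5))   # no texels skipped at a 0.5 scale
print(dropped_texels(8, 0.25))  # half of the texels are never read
```

In this model, going below 0.5 in a single pass is what would normally call for mips or multiple passes; capping at 0.5 avoids needing either.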
This is an alternative to #150722.
I'll investigate further on some less artificial examples.