Tighten alignment promises for halide_malloc() + use aligned_alloc() #7206
This makes several hand-in-hand changes to the behavior of `halide_malloc()`:

- Currently, `halide_malloc()` must return a pointer aligned to the maximum meaningful alignment for the platform for the purpose of vector loads and stores. This PR also adds the requirement that the memory returned must be legal to access in an integral multiple of alignment >= the requested size (in other words: you should be able to do vector loads/stores "off the end" without causing any faults).
- Currently, the `halide_malloc_alignment()` function is used to determine the default alignment; this cannot be overridden by user code (well, it can be, but the override will have no useful effect). It is intended to be "internal only" but is used in at least one place outside the runtime (apps/hannk). This change removes the call entirely, in favor of a call that is harder to access from outside the runtime and much less likely for end users to attempt to call. (It also changes apps/hannk to stop using it.)
- Currently, all our `halide_malloc()` implementations just use `malloc()`/`free()`, using overallocation tricks to ensure the right alignment. This PR adds a new implementation, which uses the C11/C++17 `aligned_alloc()` call instead. By default, we use this implementation on all Unixy platforms, with a new Feature, `no_aligned_alloc`, to allow forcing the use of `malloc()` instead. This is necessary because while ~all modern Linux versions support this, Android doesn't support it till API >= 28, and OSX doesn't support it till >= 10.15. (The QuRT allocator will continue to use `malloc()` for now, pending some post-holiday investigation by QC.)
- We also add a Windows-specific variant that uses their `_aligned_malloc()`/`_aligned_free()` calls; IIRC, the MSVC team has stated that they are unlikely to ever support the standard `aligned_alloc()` calls, for reasons that aren't important here, but do support these as a partial workaround.

This will likely need some torture testing, since it's possible that some platforms offer `aligned_alloc()` implementations that have inferior performance to `malloc()`.
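
For illustration only, here is a minimal sketch of the allocation strategy described above. It is not the code in this PR. The names `kSketchAlignment`, `sketch_round_up`, `sketch_aligned_malloc`, and `sketch_aligned_free` are hypothetical, and the 64-byte alignment is an assumed value rather than what the runtime picks for any given target. The sketch rounds the requested size up to a multiple of the alignment, which gives the "off the end" guarantee and also satisfies the original C11 requirement that the size passed to `aligned_alloc()` be a multiple of the alignment.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#ifdef _WIN32
#include <malloc.h>  // _aligned_malloc / _aligned_free
#endif

// Assumed alignment for this sketch; the real runtime chooses a
// platform-specific value.
constexpr std::size_t kSketchAlignment = 64;

// Round size up to the next multiple of alignment.
// alignment must be a nonzero power of two.
inline std::size_t sketch_round_up(std::size_t size, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (size + alignment - 1) & ~(alignment - 1);
}

// Hypothetical stand-in for an aligned halide_malloc() implementation.
// The returned block is aligned to kSketchAlignment and is legal to access
// up to the padded size, so full-width vector accesses past the requested
// size stay inside the allocation.
inline void *sketch_aligned_malloc(std::size_t size) {
    const std::size_t padded = sketch_round_up(size, kSketchAlignment);
    if (padded < size) {
        return nullptr;  // size + padding overflowed size_t
    }
#ifdef _WIN32
    // Note the argument order: size first, then alignment.
    return _aligned_malloc(padded, kSketchAlignment);
#else
    // C11/C++17: alignment first, then size.
    return std::aligned_alloc(kSketchAlignment, padded);
#endif
}

// Memory from _aligned_malloc() must be released with _aligned_free();
// memory from aligned_alloc() is released with plain free().
inline void sketch_aligned_free(void *ptr) {
#ifdef _WIN32
    _aligned_free(ptr);
#else
    std::free(ptr);
#endif
}
```

With the assumed 64-byte alignment, a request for 100 bytes is padded to 128, so a 64-byte vector store starting at offset 64 stays within the block. A zero-byte request is passed through as size 0, and whether that returns a null pointer is implementation-defined.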