Skip to content

Conversation

@steven-johnson
Copy link
Contributor

This makes several hand-in-hand changes to the behavior of halide_malloc():

  • Currently, halide_malloc must return a pointer aligned to the maximum meaningful alignment for the platform for the purpose of vector loads and stores. This PR also adds the requirement that the memory returned must be legal to access in an integral multple of alignment >= the requested size (in other words: you should be able to do vector load/stores "off the end" without causing any faults).

  • Currently, the halide_malloc_alignment() function is used to determine the default alignment; this cannot be overridden by user code (well, it can be, but the override will have no useful effect). It is intended to be "internal only" but is used in at least one place outside the runtime (apps/hannk). This change removes the call entirely, in favor of a call that is harder to access from outside the runtime and much less likely for end users to attempt to call. (It also changes apps/hannk to stop using it.)

  • Currently, all our halide_malloc() implementations just use malloc()/free(), user overallocation tricks to ensure the right alignment. This PR adds a new implementation, which uses the C11/C++17 aligned_alloc() call instead. By default, we use this implementation on all Unixy platforms, with a new Feature, no_aligned_alloc, to allow forcing the use of malloc() instead. This is necessary because while ~all modern Linux versions support this, Android doesn't support it till API >= 28, and OSX doesn't support it till >= 10.15. (The QuRT allocator will continue to use malloc() for now, pending some post-holiday investigation by QC.)

  • We also add a Windows-specific variant that uses their _aligned_malloc()/_aligned_free() calls; IIRC, the MSVC team has stated that they are unlikely to ever support the standard aligned_alloc() calls, for reasons that aren't important here, but do support these as a partial workaround.

This will likely need some torture testing, since it's possible that some platforms offer aligned_alloc() implementations that have inferior performance to malloc().

This makes several hand-in-hand changes to the behavior of `halide_malloc()`:

* Currently, halide_malloc must return a pointer aligned to the maximum meaningful alignment for the platform for the purpose of vector loads and stores. This PR also adds the requirement that the memory returned must be legal to access in an integral multple of alignment >= the requested size (in other words: you should be able to do vector load/stores "off the end" without causing any faults).

* Currently, the `halide_malloc_alignment()` function is used to determine the default alignment; this cannot be overridden by user code (well, it can be, but the override will have no useful effect). It is intended to be "internal only" but is used in at least one place outside the runtime (apps/hannk). This change removes the call entirely, in favor of a call that is harder to access from outside the runtime and much less likely for end users to attempt to call. (It also changes apps/hannk to stop using it.)

* Currently, all our `halide_malloc()` implementations just use `malloc()`/`free()`, user overallocation tricks to ensure the right alignment. This PR adds a new implementation, which uses the C11/C++17 `aligned_alloc()` call instead. By default, we use this implementation on all Unixy platforms, with a new Feature, `no_aligned_alloc`, to allow forcing the use of `malloc()` instead. This is necessary because while ~all modern Linux versions support this, Android doesn't support it till API >= 28, and OSX doesn't support it till >= 10.15. (The QuRT allocator will continue to use `malloc()` for now, pending some post-holiday investigation by QC.)

* We also add a Windows-specific variant that uses their `_aligned_malloc()`/`_aligned_free()` calls; IIRC, the MSVC team has stated that they are unlikely to ever support the standard `aligned_alloc()` calls, for reasons that aren't important here, but do support these as a partial workaround.

This will likely need some torture testing, since it's possible that some platforms offer `aligned_alloc()` implementations that have inferior performance to `malloc()`.
@steven-johnson
Copy link
Contributor Author

I'm going to split this PR in two, so I'm closing this one.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants