Description
From #6846, a number of issues:
- The `.py.cpp` code that we produce for AOT-compiled Python extensions operates solely on the Python Buffer protocol, which is CPU-memory only (it has no provision for a buffer's memory to live on device). As such, it really should always call `copy_to_host()` on all output buffers to ensure results are flushed properly, but it does not. (This should be an easy fix.)
- More problematically, there is currently no way to avoid needless copy-to-host calls, since Python Buffer doesn't support anything but host, and we don't have the equivalent of `Halide::Runtime::Buffer` in our Python bindings. We could add such an equivalent, but it would be vastly preferable to adopt an existing solution already in use by other GPU-accelerated Python libraries. dlpack (https://dmlc.github.io/dlpack/latest/index.html) appears to be a likely candidate, but investigation is needed.
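To make the gap concrete, here is a rough sketch (not Halide code) of what a device-aware exchange protocol adds over the Buffer protocol. It assumes the DLPack Python protocol's `__dlpack_device__()` method, which returns a `(device_type, device_id)` tuple, and the `kDLCPU = 1` / `kDLCUDA = 2` codes from `dlpack.h`. `FakeDeviceTensor` and `is_device_resident` are hypothetical names used only for illustration.

```python
import array

# Device-type codes as defined by the DLDeviceType enum in dlpack.h.
K_DL_CPU = 1
K_DL_CUDA = 2


class FakeDeviceTensor:
    """Hypothetical stand-in for a GPU-backed array; not a Halide type."""

    def __init__(self, device_type, device_id=0):
        self._device = (device_type, device_id)

    def __dlpack_device__(self):
        # The DLPack Python protocol reports (device_type, device_id).
        return self._device


def is_device_resident(obj):
    """Return True if obj reports a non-CPU device via __dlpack_device__.

    Objects that only implement the Buffer protocol carry no device
    information at all, so they can only be treated as host memory.
    """
    query = getattr(obj, "__dlpack_device__", None)
    if query is None:
        memoryview(obj)  # raises TypeError if obj is not a buffer either
        return False
    device_type, _device_id = query()
    return device_type != K_DL_CPU


host = array.array("f", [1.0, 2.0, 3.0])
view = memoryview(host)

# The Buffer protocol exposes layout (format, shape, strides) but no device.
print(view.format, view.shape, view.strides)
print(hasattr(view, "device"))

print(is_device_resident(host))
print(is_device_resident(FakeDeviceTensor(K_DL_CPU)))
print(is_device_resident(FakeDeviceTensor(K_DL_CUDA)))
```

With a query like this, generated glue code could in principle skip the copy-to-host when the caller's buffer already lives on the device. Whether dlpack fits Halide's buffer model well enough for that is the open question.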