🐛 Describe the bug
From an internal report: some_tensor.to("cpu", non_blocking=True) becomes synchronous under PT2, while it is asynchronous in eager mode. In eager mode, the profiler trace shows Memcpy DtoH (Device -> Pinned); under PT2, it shows Memcpy DtoH (Device -> Pageable). This suggests that the destination CPU tensor is not allocated in pinned memory under PT2.
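A minimal repro sketch, not taken from the internal report. It assumes that "PT2" here means wrapping the call in torch.compile with default settings, and that the memcpy kind is visible in the torch.profiler summary table. The helper names to_cpu_async and run_repro, and the tensor shape, are hypothetical.

```python
import torch
from torch.profiler import ProfilerActivity, profile


def to_cpu_async(x: torch.Tensor) -> torch.Tensor:
    # The call from the report: a device-to-host copy requested as non-blocking.
    return x.to("cpu", non_blocking=True)


def run_repro() -> dict:
    """Profile the copy in eager mode and under torch.compile.

    Returns a dict mapping "eager" / "compiled" to the profiler summary table,
    or an empty dict when no CUDA device is available.
    """
    if not torch.cuda.is_available():
        # The report is about CUDA DtoH copies; there is nothing to compare on CPU.
        return {}

    x = torch.randn(1024, 1024, device="cuda")  # hypothetical shape
    compiled = torch.compile(to_cpu_async)  # assumption: default torch.compile settings
    compiled(x)  # warm up so compilation is not part of the trace
    torch.cuda.synchronize()

    tables = {}
    for name, fn in (("eager", to_cpu_async), ("compiled", compiled)):
        with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
            fn(x)
            torch.cuda.synchronize()  # make sure the copy has finished before the trace ends
        tables[name] = prof.key_averages().table(row_limit=20)
    return tables


if __name__ == "__main__":
    for mode, table in run_repro().items():
        print(f"=== {mode} ===")
        print(table)
```

If the behavior matches the report, the eager table should list a Memcpy DtoH (Device -> Pinned) entry and the compiled table a Memcpy DtoH (Device -> Pageable) entry. Exporting a Chrome trace with prof.export_chrome_trace may be easier to read if the table truncates event names.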
Versions
main
cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @aakhundov