some_tensor.to("cpu", non_blocking=True) becomes sync under PT2 while async in eager mode #155121

@masnesral

🐛 Describe the bug

From an internal report: some_tensor.to("cpu", non_blocking=True) becomes synchronous under PT2, while it is asynchronous in eager mode. Under eager, the profiler trace shows Memcpy DtoH (Device -> Pinned), while under PT2 it shows Memcpy DtoH (Device -> Pageable).
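A minimal sketch of how the difference might be observed, assuming a CUDA-capable machine. It is not the original internal repro: the helper names copy_to_cpu and memcpy_event_names are hypothetical, and the tensor shape is arbitrary. It runs the same copy in eager mode and under torch.compile, then collects the names of the Memcpy events that the profiler reports for each.

```python
import torch
from torch.profiler import profile, ProfilerActivity


def copy_to_cpu(x):
    # The call from the report: a device-to-host copy requested as non-blocking.
    return x.to("cpu", non_blocking=True)


def memcpy_event_names(fn, x):
    """Run fn(x) under the profiler and return the Memcpy event names seen."""
    fn(x)  # warm-up, so that compilation stays out of the profiled region
    torch.cuda.synchronize()
    with profile(activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA]) as prof:
        fn(x)
        torch.cuda.synchronize()
    return sorted({evt.name for evt in prof.events() if "Memcpy" in evt.name})


def main():
    if not torch.cuda.is_available():
        print("CUDA not available; skipping comparison")
        return None
    x = torch.randn(1024, 1024, device="cuda")  # arbitrary size, an assumption
    eager = memcpy_event_names(copy_to_cpu, x)
    compiled = memcpy_event_names(torch.compile(copy_to_cpu), x)
    print("eager:   ", eager)
    print("compiled:", compiled)
    return eager, compiled


if __name__ == "__main__":
    main()
```

If the behavior matches the report, the eager list would be expected to contain a Device -> Pinned entry and the compiled list a Device -> Pageable entry. The exact event names can vary with the PyTorch and CUDA versions.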

Versions

main

cc @chauhang @penguinwu @voznesenskym @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @ipiszy @chenyang78 @kadeng @muchulee8 @amjames @aakhundov

Metadata

Labels

module: inductor, oncall: pt2, triaged
