
Force PyTorch to clear CUDA cache #72117

@twsl

Description

🚀 The feature, motivation and pitch

Especially during hyperparameter optimization (HPO), exceptions like CUDA out-of-memory (OOM) errors can occur.
I'm looking for a way to recover from OOM exceptions, and I would like to propose an additional force parameter for torch.cuda.empty_cache() that forces PyTorch to release the entire cache, even if some elements remain held due to a memory leak.
Optionally, a function like torch.cuda.reset() would obviously work as well.

The commonly suggested combination of gc.collect() and torch.cuda.empty_cache() is not reliable enough to restore the initial state.
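
For reference, a minimal sketch of that recovery pattern (run_with_recovery and trial_fn are hypothetical stand-ins for the HPO loop and a single trial):

```python
import gc
import torch

def run_with_recovery(trial_fn, *args):
    # trial_fn is a hypothetical stand-in for a single HPO trial.
    try:
        return trial_fn(*args)
    except RuntimeError as e:
        if "out of memory" not in str(e):
            raise
        # Drop Python references that may still pin CUDA tensors...
        gc.collect()
        # ...then return cached-but-free blocks to the driver. Blocks that
        # are still referenced by leaked tensors stay allocated, which is
        # why this does not reliably restore the initial state.
        torch.cuda.empty_cache()
        return None
```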

Alternatives

Completely restarting the Python kernel releases all CUDA memory, but that is not workable in the middle of an HPO run; a subprocess-per-trial variant of the same idea is sketched below.
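
A minimal sketch of that workaround, assuming a hypothetical train() trial function: each trial runs in a fresh subprocess, so the CUDA context (and any leaked memory) is torn down when the process exits.

```python
import multiprocessing as mp

def train(config):
    # Hypothetical trial body; replace with the real training loop.
    import torch
    return float(torch.randn(1).item())

def _trial_worker(config, queue):
    # The CUDA context is created inside this process, so every byte of
    # GPU memory is released when it exits, leaks included.
    try:
        queue.put(train(config))
    except RuntimeError:
        queue.put(None)  # e.g. the trial itself hit OOM

def run_trial_isolated(config):
    ctx = mp.get_context("spawn")  # CUDA requires "spawn", not "fork"
    queue = ctx.Queue()
    p = ctx.Process(target=_trial_worker, args=(config, queue))
    p.start()
    result = queue.get()  # small result object, safe to read before join
    p.join()
    return result
```

This trades the per-trial process startup cost for a guaranteed clean slate, which is usually acceptable when trials run for minutes rather than seconds.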

Additional context

Suggestions on how to properly track down memory leaks and solve my core problem are appreciated.
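
For anyone debugging the same thing, a minimal sketch of per-trial bookkeeping using the allocator's existing statistics (report_cuda_memory is a hypothetical helper):

```python
import torch

def report_cuda_memory(tag: str) -> None:
    # "allocated" counts live tensors; "reserved" counts blocks the caching
    # allocator holds. If allocated keeps growing across trials, Python
    # references are leaking; if only reserved stays high, it is just cache.
    allocated = torch.cuda.memory_allocated() / 2**20
    reserved = torch.cuda.memory_reserved() / 2**20
    print(f"[{tag}] allocated={allocated:.1f} MiB, reserved={reserved:.1f} MiB")
```

torch.cuda.memory_summary() prints a more detailed per-pool breakdown when these two numbers are not enough.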

cc @ngimel

Labels

module: cuda (Related to torch.cuda, and CUDA support in general), triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)
