"Parameters of a model after .cuda() will be different objects with those before the call." is wrong. #7844

@freud14

Hi,

In the documentation, it is written:

If you need to move a model to GPU via .cuda(), please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects with those before the call.

In general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used.

However, calling .cuda() after initializing the optimizer still works. This is because the Module class applies .cuda() in this way:

param.data = fn(param.data)
if param._grad is not None:
    param._grad.data = fn(param._grad.data)

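Thus, because only the .data attribute is reassigned, each parameter is modified in place: the parameter object that the optimizer holds stays the same object, and only the tensor it wraps is replaced by the GPU copy.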
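
To make the mechanism concrete, here is a minimal sketch in plain Python. It is only a toy model of the snippet above, not PyTorch's actual code: ToyParam, ToyModule, ToyOptimizer and to_fake_gpu are hypothetical names invented for this illustration, and the "device transfer" is simulated by converting a list to a tuple.

```python
class ToyParam:
    """Hypothetical stand-in for a parameter: a stable wrapper around swappable data."""

    def __init__(self, data):
        self.data = data
        self._grad = None


class ToyModule:
    """Hypothetical stand-in for a module that owns a list of parameters."""

    def __init__(self, params):
        self.params = list(params)

    def _apply(self, fn):
        # Same shape as the quoted snippet: reassign .data, keep the wrapper object.
        for param in self.params:
            param.data = fn(param.data)
            if param._grad is not None:
                param._grad.data = fn(param._grad.data)
        return self


class ToyOptimizer:
    """Hypothetical optimizer that only stores references to the parameters."""

    def __init__(self, params):
        self.params = list(params)


def to_fake_gpu(values):
    # Simulated transfer (an assumption of this sketch): a new object, same values.
    return tuple(values)


module = ToyModule([ToyParam([1.0, 2.0]), ToyParam([3.0])])
optimizer = ToyOptimizer(module.params)  # built BEFORE the "move"

module._apply(to_fake_gpu)

# The optimizer still references the very same wrapper objects...
same_wrappers = all(a is b for a, b in zip(optimizer.params, module.params))
# ...and through them it sees the moved data.
optimizer_sees_new_data = all(isinstance(p.data, tuple) for p in optimizer.params)

print(same_wrappers, optimizer_sees_new_data)
```

The point of the toy is that the optimizer never stores the data itself, only the wrapper, so swapping what the wrapper points to is invisible to it.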

I therefore suggest removing this "warning" from the documentation, since I find it quite useful to be able to initialize the optimizer before calling .cuda().
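
A quick way to check this claim is sketched below. It assumes a default PyTorch configuration in which module conversions reassign .data in place; the nn.Linear model, the SGD optimizer and the learning rate are arbitrary choices for illustration. On a machine without a GPU, the sketch falls back to .double(), which I believe goes through the same conversion path as .cuda().

```python
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
# Optimizer is constructed BEFORE the model is moved or converted.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

ids_before = [id(p) for p in model.parameters()]

if torch.cuda.is_available():
    model.cuda()
else:
    # No GPU available: use a dtype conversion instead (assumed to use the same path).
    model.double()

ids_after = [id(p) for p in model.parameters()]
same_objects = ids_before == ids_after

# The optimizer's parameter groups should reference the model's current parameters.
optimizer_params = [p for group in optimizer.param_groups for p in group["params"]]
optimizer_sees_model = all(
    a is b for a, b in zip(optimizer_params, model.parameters())
)

# Run one training step to confirm the optimizer actually updates the converted model.
ref = next(model.parameters())
x = torch.randn(8, 4, device=ref.device, dtype=ref.dtype)
weight_before = model.weight.detach().clone()

optimizer.zero_grad()
loss = model(x).pow(2).sum()
loss.backward()
optimizer.step()

weight_changed = not torch.equal(weight_before, model.weight.detach())
print(same_objects, optimizer_sees_model, weight_changed)
```

This only covers a stateless SGD step. Optimizers that already hold per-parameter state (for example momentum buffers created by earlier steps) are a separate question that this sketch does not test.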

Thank you.

Frédérik

Labels

module: optimizer (Related to torch.optim)
triaged (This issue has been looked at a team member, and triaged and prioritized into an appropriate module)
