"Parameters of a model after .cuda() will be different objects with those before the call." is wrong.

Hi,

In the documentation, it is written:

> If you need to move a model to GPU via .cuda(), please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects with those before the call.
>
> In general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used.

However, calling .cuda() after initializing the optimizer still works. This is because the Module class applies .cuda() as follows:
```python
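# fn is the conversion applied to each tensor (here, the move to the GPU)
param.data = fn(param.data)
if param._grad is not None:
    # the gradient, if present, is converted the same way
    param._grad.data = fn(param._grad.data)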
```
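Thus, by reassigning the `.data` attribute, it modifies the parameter tensors in place: the parameter objects themselves stay the same, so the references held by the optimizer remain valid.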
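To illustrate the point about object identity, here is a minimal pure-Python sketch. `ToyTensor`, `ToyParam`, `ToyOptimizer` and `apply_in_place` are hypothetical stand-ins I made up for this example, not PyTorch classes; they only mimic the `param.data = fn(param.data)` pattern quoted above.

```python
# Toy stand-ins (NOT PyTorch classes): they only mimic the aliasing behaviour.
class ToyTensor:
    def __init__(self, values, device="cpu"):
        self.values = list(values)
        self.device = device

    def cuda(self):
        # Returns a NEW tensor object, like an out-of-place device copy.
        return ToyTensor(self.values, device="cuda")


class ToyParam:
    def __init__(self, tensor):
        self.data = tensor


class ToyOptimizer:
    def __init__(self, params):
        # Keeps references to the parameter objects, not to their tensors.
        self.params = list(params)

    def step(self, lr=0.5):
        for p in self.params:
            p.data.values = [v - lr for v in p.data.values]


def apply_in_place(params, fn):
    # Same pattern as the snippet above: only the .data attribute is rebound.
    for p in params:
        p.data = fn(p.data)


params = [ToyParam(ToyTensor([1.0, 2.0]))]
opt = ToyOptimizer(params)          # optimizer built BEFORE the "move"
ids_before = [id(p) for p in params]
old_tensor = params[0].data

apply_in_place(params, lambda t: t.cuda())
same_objects = ids_before == [id(p) for p in params]

opt.step()  # updates the tensor that now lives on "cuda"
```

In this sketch the optimizer keeps working after the move because it holds the parameter objects, and only their `.data` was replaced; the old CPU tensor is simply no longer referenced by the parameter.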

I therefore suggest removing this "warning" from the documentation, since I find it quite useful to be able to initialize the optimizer before calling .cuda().

Thank you.

Frédérik


"Parameters of a model after .cuda() will be different objects with those before the call." is wrong. #7844
