Hi,
In the documentation, it is written:
If you need to move a model to GPU via .cuda(), please do so before constructing optimizers for it. Parameters of a model after .cuda() will be different objects with those before the call.
In general, you should make sure that optimized parameters live in consistent locations when optimizers are constructed and used.
However, calling .cuda() after initializing the optimizer still works. This is because the Module class applies .cuda() in this way:
    param.data = fn(param.data)
    if param._grad is not None:
        param._grad.data = fn(param._grad.data)

Thus, by reassigning the .data attribute, it updates the existing parameter objects in place instead of creating new ones, so the optimizer keeps referring to the right tensors.
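Here is a minimal sketch to check this, not a definitive test. It builds the optimizer first, converts the model afterwards, and then verifies that the parameter objects are unchanged and that an optimizer step still updates the model. The `convert` helper and the variable names are mine, and the fallback to `.double()` on machines without a GPU is an assumption: it goes through the same `Module._apply` path but is not the `.cuda()` case itself. The sketch uses plain SGD without momentum, so it says nothing about optimizer state created before the move.

```python
import torch
from torch import nn


def convert(model):
    # Hypothetical helper: use .cuda() when a GPU is available; otherwise
    # fall back to .double(), which also goes through Module._apply
    # (assumption made so the sketch runs on CPU-only machines).
    return model.cuda() if torch.cuda.is_available() else model.double()


model = nn.Linear(4, 2)
# The optimizer is constructed BEFORE the model is converted.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
params_before = list(model.parameters())

convert(model)
params_after = list(model.parameters())

# Are the Parameter objects the same Python objects as before the call?
same_objects = all(a is b for a, b in zip(params_before, params_after))

# Run one training step with an input matching the converted parameters.
ref = params_after[0]
x = torch.ones(3, 4, dtype=ref.dtype, device=ref.device)
weight_before = model.weight.detach().clone()
model(x).sum().backward()
optimizer.step()

# Did the optimizer created before the conversion update the model?
weights_updated = not torch.equal(weight_before, model.weight.detach())

print("same parameter objects:", same_objects)
print("weights updated by optimizer:", weights_updated)
```

If both flags are true on a given PyTorch version, the optimizer created before the conversion is still tied to the model's live parameters.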
I therefore suggest removing this "warning" from the documentation, since I find it quite useful to be able to initialize the optimizer before calling .cuda().
Thank you.
Frédérik