Closed
Labels
- module: docs – Related to our documentation, both in docs/ and docblocks
- module: optimizer – Related to torch.optim
- triaged – This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Description
📚 The doc issue
In the documentation of torch.optim.RMSprop(), the momentum, centered, and capturable parameters appear in the signature in the following order:
class torch.optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False, capturable=False, foreach=None, maximize=False, differentiable=False)
But in the Parameters section, the momentum, centered, and capturable parameters are explained in a different order, as shown below:
Parameters
- ...
- lr (float, optional) – learning rate (default: 1e-2)
- momentum (float, optional) – momentum factor (default: 0) <- Here
- alpha (float, optional) – smoothing constant (default: 0.99)
- eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
- centered (bool, optional) – if True, compute the centered RMSProp, the gradient is normalized by an estimation of its variance <- Here
- weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)
- foreach (bool, optional) – whether foreach implementation of optimizer is used. If unspecified by the user (so foreach is None), we will try to use foreach over the for-loop implementation on CUDA, since it is usually significantly more performant. Note that the foreach implementation uses ~ sizeof(params) more peak memory than the for-loop version due to the intermediates being a tensorlist vs just one tensor. If memory is prohibitive, batch fewer parameters through the optimizer at a time or switch this flag to False (default: None)
- maximize (bool, optional) – maximize the objective with respect to the params, instead of minimizing (default: False)
- capturable (bool, optional) – whether this instance is safe to capture in a CUDA graph. Passing True can impair ungraphed performance, so if you don’t intend to graph capture this instance, leave it False (default: False) <- Here
- differentiable (bool, optional) – whether autograd should occur through the optimizer step in training. Otherwise, the step() function runs in a torch.no_grad() context. Setting to True can impair performance, so leave it False if you don’t intend to run autograd through this instance (default: False)
Suggest a potential alternative/fix
So in the Parameters section, the momentum, centered, and capturable parameters should be explained in the same order as in the signature, as shown below (a short usage sketch after the list illustrates why the signature order is what matters):
Parameters
- ...
- lr (float, optional) – learning rate (default: 1e-2)
- alpha (float, optional) – smoothing constant (default: 0.99)
- eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
- weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)
- momentum (float, optional) – momentum factor (default: 0) <- Here
- centered (bool, optional) – if True, compute the centered RMSProp, the gradient is normalized by an estimation of its variance <- Here
- capturable (bool, optional) – whether this instance is safe to capture in a CUDA graph. Passing True can impair ungraphed performance, so if you don’t intend to graph capture this instance, leave it False (default: False) <- Here
- foreach (bool, optional) – whether foreach implementation of optimizer is used. If unspecified by the user (so foreach is None), we will try to use foreach over the for-loop implementation on CUDA, since it is usually significantly more performant. Note that the foreach implementation uses ~ sizeof(params) more peak memory than the for-loop version due to the intermediates being a tensorlist vs just one tensor. If memory is prohibitive, batch fewer parameters through the optimizer at a time or switch this flag to False (default: None)
- maximize (bool, optional) – maximize the objective with respect to the params, instead of minimizing (default: False)
- differentiable (bool, optional) – whether autograd should occur through the optimizer step in training. Otherwise, the step() function runs in a torch.no_grad() context. Setting to True can impair performance, so leave it False if you don’t intend to run autograd through this instance (default: False)
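For context, here is a minimal, illustrative sketch (the parameter values are arbitrary, not from the docs) showing that positional calls rely on the signature order above, which is why the Parameters section should list arguments in the same order:

```python
import torch

# A toy parameter list, for illustration only.
params = [torch.nn.Parameter(torch.randn(2, 2))]

# Positional arguments must follow the documented signature order:
# params, lr, alpha, eps, weight_decay, momentum, centered, ...
opt_positional = torch.optim.RMSprop(params, 0.01, 0.99, 1e-8, 0, 0.9, True)

# The equivalent keyword-argument call, which is order-independent:
opt_keyword = torch.optim.RMSprop(
    params, lr=0.01, alpha=0.99, eps=1e-8,
    weight_decay=0, momentum=0.9, centered=True,
)
```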
cc @svekars @brycebortree @sekyondaMeta @vincentqb @jbschlosser @albanD @janeyx99 @crcrpar