Closed
Labels
- module: docs – Related to our documentation, both in docs/ and docblocks
- module: optimizer – Related to torch.optim
- triaged – This issue has been looked at by a team member, and triaged and prioritized into an appropriate module
Description
📚 The doc issue
In the documentation of torch.optim.RMSprop(), the momentum, centered, and capturable parameters appear in the signature in the following order:
class torch.optim.RMSprop(params, lr=0.01, alpha=0.99, eps=1e-08, weight_decay=0, momentum=0, centered=False, capturable=False, foreach=None, maximize=False, differentiable=False)
But in the Parameters section, the momentum, centered, and capturable parameters are explained in a different order, as shown below:
Parameters
- ...
- lr (float, optional) – learning rate (default: 1e-2)
- momentum (float, optional) – momentum factor (default: 0) <- Here
- alpha (float, optional) – smoothing constant (default: 0.99)
- eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
- centered (bool, optional) – if True, compute the centered RMSProp, the gradient is normalized by an estimation of its variance <- Here
- weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)
- foreach (bool, optional) – whether foreach implementation of optimizer is used. If unspecified by the user (so foreach is None), we will try to use foreach over the for-loop implementation on CUDA, since it is usually significantly more performant. Note that the foreach implementation uses ~ sizeof(params) more peak memory than the for-loop version due to the intermediates being a tensorlist vs just one tensor. If memory is prohibitive, batch fewer parameters through the optimizer at a time or switch this flag to False (default: None)
- maximize (bool, optional) – maximize the objective with respect to the params, instead of minimizing (default: False)
- capturable (bool, optional) – whether this instance is safe to capture in a CUDA graph. Passing True can impair ungraphed performance, so if you don’t intend to graph capture this instance, leave it False (default: False) <- Here
- differentiable (bool, optional) – whether autograd should occur through the optimizer step in training. Otherwise, the step() function runs in a torch.no_grad() context. Setting to True can impair performance, so leave it False if you don’t intend to run autograd through this instance (default: False)
Suggest a potential alternative/fix
So in the Parameters section, the momentum, centered, and capturable parameters should be explained in the same order as in the signature, as shown below (a short usage sketch after the list illustrates why the signature order is what matters):
Parameters
- ...
- lr (float, optional) – learning rate (default: 1e-2)
- alpha (float, optional) – smoothing constant (default: 0.99)
- eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
- weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)
- momentum (float, optional) – momentum factor (default: 0) <- Here
- centered (bool, optional) – if True, compute the centered RMSProp, the gradient is normalized by an estimation of its variance <- Here
- capturable (bool, optional) – whether this instance is safe to capture in a CUDA graph. Passing True can impair ungraphed performance, so if you don’t intend to graph capture this instance, leave it False (default: False) <- Here
- foreach (bool, optional) – whether foreach implementation of optimizer is used. If unspecified by the user (so foreach is None), we will try to use foreach over the for-loop implementation on CUDA, since it is usually significantly more performant. Note that the foreach implementation uses ~ sizeof(params) more peak memory than the for-loop version due to the intermediates being a tensorlist vs just one tensor. If memory is prohibitive, batch fewer parameters through the optimizer at a time or switch this flag to False (default: None)
- maximize (bool, optional) – maximize the objective with respect to the params, instead of minimizing (default: False)
- differentiable (bool, optional) – whether autograd should occur through the optimizer step in training. Otherwise, the step() function runs in a torch.no_grad() context. Setting to True can impair performance, so leave it False if you don’t intend to run autograd through this instance (default: False)
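For context, here is a minimal, illustrative sketch (the parameter values are arbitrary, not from the docs) showing that positional calls rely on the signature order above, which is why the Parameters section should list arguments in the same order:

```python
import torch

# A toy parameter list, for illustration only.
params = [torch.nn.Parameter(torch.randn(2, 2))]

# Positional arguments must follow the documented signature order:
# params, lr, alpha, eps, weight_decay, momentum, centered, ...
opt_positional = torch.optim.RMSprop(params, 0.01, 0.99, 1e-8, 0, 0.9, True)

# The equivalent keyword-argument call, which is order-independent:
opt_keyword = torch.optim.RMSprop(
    params, lr=0.01, alpha=0.99, eps=1e-8,
    weight_decay=0, momentum=0.9, centered=True,
)
```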
cc @svekars @brycebortree @sekyondaMeta @vincentqb @jbschlosser @albanD @janeyx99 @crcrpar