Description
The problem here is what the returned value of a cpp Function should be when the output was independent of the inputs, or when no gradient was backpropagated for this specific output (it was unused later in the graph).
For python Functions, the wrapper takes care of this: when no gradient exists for an output, it creates a full dense tensor of 0s with the size of that output. The same happens when a python Function returns None for a given parameter: the next Function will actually see a dense tensor of 0s.
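As a sketch of that python-side behavior (the class and variable names here are illustrative, not from the source): a Function with two outputs where only one is used downstream still receives a materialized tensor of 0s for the unused output's gradient.

```python
import torch

class TwoOutputs(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x):
        return x * 2, x * 3

    @staticmethod
    def backward(ctx, g1, g2):
        # Record what the wrapper delivered for each output's gradient.
        TwoOutputs.seen = (g1, g2)
        # Chain rule for both outputs: d(2x)/dx * g1 + d(3x)/dx * g2.
        return 2 * g1 + 3 * g2

x = torch.ones(3, requires_grad=True)
a, b = TwoOutputs.apply(x)
a.sum().backward()  # only `a` is used; no gradient flows to `b`

g1, g2 = TwoOutputs.seen
print(g2)  # a dense tensor of 0s, materialized by the python wrapper
```

This zero-filling is why python Functions never have to special-case a missing gradient, in contrast to the cpp side described next.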
In cpp, no such wrapper exists, so if an output was not used, the corresponding element in the input variable_list will be an empty shared_ptr. The cpp Function should handle an empty shared_ptr as if it were a dense Tensor of 0s (possibly skipping a lot of computation, as in ConvBackwardBackward).
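A minimal sketch of the convention this implies, written in Python for brevity with None standing in for the empty shared_ptr (the helper name and the squaring example are hypothetical, not from the source):

```python
import torch

def square_backward(grad_output, inp):
    """Backward of y = x**2, following the proposed cpp convention."""
    if grad_output is None:         # analogue of an empty shared_ptr in cpp
        return None                 # gradient is all 0s: skip the work entirely
    return 2 * inp * grad_output    # d(x**2)/dx = 2x

x = torch.tensor([1.0, 2.0, 3.0])
print(square_backward(None, x))           # None: computation skipped
print(square_backward(torch.ones(3), x))  # tensor([2., 4., 6.])
```

The point of the convention is exactly this short-circuit: since a gradient of all 0s propagates to 0s through any linear chain-rule term, a Function that checks for the empty case can avoid the computation altogether rather than materializing and multiplying by zeros.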
Right now, some Functions do handle this case properly, but not all of them (leading, for example, to this issue on the forum).
The question is: are we OK with the above design, which says that any input that is an empty shared_ptr is equivalent to a tensor of all 0s? And should all cpp Functions handle an empty shared_ptr as input?