
Handling of no/zero gradients in cpp Function #2003

@albanD

Description


The problem here is what a cpp Function should return when an output was independent of the inputs, or when no gradient was backpropagated for that output (because it was unused later in the graph).

For a python Function, the wrapper takes care of creating a dense tensor filled with 0s, of the size of the output, whenever no gradient exists for that output. The same happens when a python Function returns None for a given parameter: the next Function will actually see a dense tensor of 0s.
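To illustrate the python side, here is a minimal sketch (the `Scale` Function is hypothetical, written against the current `torch.autograd.Function` API): the backward may return None for an input that gets no gradient, and the wrapper handles that absence transparently.

```python
import torch

class Scale(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, factor):
        # `factor` is a plain float; stash it for backward.
        ctx.factor = factor
        return x * factor

    @staticmethod
    def backward(ctx, grad_output):
        # Returning None for `factor`: no gradient flows to it, and the
        # wrapper takes care of the missing slot for us.
        return grad_output * ctx.factor, None

x = torch.ones(3, requires_grad=True)
out = Scale.apply(x, 2.0)
out.sum().backward()
print(x.grad)  # tensor([2., 2., 2.])
```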

In cpp, no such wrapper exists, so if an output was not used, the corresponding element in the input variable_list will be an empty shared_ptr. The cpp Function should handle an empty shared_ptr as if it were a dense tensor of 0s (possibly skipping a lot of computation, as in ConvBackwardBackward).
Right now, some Functions handle this case properly, but not all of them (leading, for example, to this issue in the forum).

The question is: are we ok with the above design, which says that any input that is an empty shared_ptr is equivalent to a tensor of all 0s? And that all cpp Functions should handle an empty shared_ptr as input?
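At the Python API level, `torch.autograd.grad` with `allow_unused=True` exposes the same situation the cpp Functions face: an absent gradient for an input the output did not depend on. A hedged sketch of the "absent means all 0s" convention the design proposes:

```python
import torch

x = torch.ones(2, requires_grad=True)
y = torch.ones(2, requires_grad=True)
out = (x * 2).sum()  # out is independent of y

# With allow_unused=True, the gradient for y comes back as None
# instead of raising an error.
gx, gy = torch.autograd.grad(out, (x, y), allow_unused=True)

# The proposed convention: treat the missing gradient as a dense
# tensor of zeros of the input's size.
gy = torch.zeros_like(y) if gy is None else gy
print(gx)  # tensor([2., 2.])
print(gy)  # tensor([0., 0.])
```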
