Inconsistent use of the "input" parameter to the Backward method in ANNs #3551

@akropp

Issue description

There is an inconsistency in how the input parameter of the Backward method is interpreted across ANN layers. The name input implies that it contains the input values to the layer, and in some cases that is exactly what is assumed (for example in the loss functions, and in the top-level call to the network in ffn_impl). Many layers do not use input at all. Some layers, however, assume it is the output of the layer: this is true of softmax_impl and logsoftmax_impl, and of some (but seemingly not all) of the activation functions.

The implementation of MultiLayer makes this assumption as well: when backpropagating the errors, it passes each layer's output values to that layer's Backward method as the input parameter. Layers that interpret the parameter as the output therefore get the values they expect, and layers that ignore it are unaffected. However, a layer may genuinely need its input to implement Backward, so this inconsistency can cause bugs (and may already be causing some that have gone undetected).
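To make the failure mode concrete, here is a minimal standalone sketch. It does not use mlpack's Layer API; the scalar functions SigmoidForward, SigmoidBackward, SquareForward, SquareBackward, NumericalGradient and RunDemo are hypothetical names made up for illustration. A sigmoid can compute its derivative from its output alone, while a layer such as y = x^2 needs the real input and silently produces a wrong gradient if it is handed the output instead.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical scalar "layers" for illustration only; this is not mlpack's API.

// Sigmoid: dy/dx = y * (1 - y), so Backward can work from the output y alone.
double SigmoidForward(double x) { return 1.0 / (1.0 + std::exp(-x)); }
double SigmoidBackward(double y, double gy) { return gy * y * (1.0 - y); }

// Square: y = x^2 and dy/dx = 2x. The sign of x is lost in y, so Backward
// needs the true input x.
double SquareForward(double x) { return x * x; }
double SquareBackward(double x, double gy) { return gy * 2.0 * x; }

// Central finite-difference estimate of df/dx, used as a reference value.
template<typename F>
double NumericalGradient(F f, double x, double h = 1e-6)
{
  return (f(x + h) - f(x - h)) / (2.0 * h);
}

void RunDemo()
{
  const double x = -1.5;
  const double y = SquareForward(x);
  const double reference = NumericalGradient(SquareForward, x);

  // Given the real input, the square layer matches the numerical gradient.
  assert(std::abs(SquareBackward(x, 1.0) - reference) < 1e-4);

  // Given the output in place of the input (what a container passing outputs
  // would do), the result is far from the reference and no error is raised.
  assert(std::abs(SquareBackward(y, 1.0) - reference) > 1.0);

  // The sigmoid is unaffected, because it only ever needed its output.
  const double s = SigmoidForward(0.3);
  assert(std::abs(SigmoidBackward(s, 1.0) -
      NumericalGradient(SigmoidForward, 0.3)) < 1e-6);
}
```

Calling RunDemo() from main exercises all three checks. The point is only that the second case fails quietly, which is why such bugs could go undetected.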

I propose that the parameter either be formally renamed to output (with every method that uses it confirmed to make that assumption), or be correctly implemented as input (meaning that MultiLayer and any classes that currently assume it is the output need to be changed). I have already gone partway down this path in PR #3546, but realized the impact is wider than I thought. I am posting here to get clarity on the desired interface and behavior.
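As a rough sketch of what the second option would mean for a container, assuming a toy scalar setting rather than mlpack's actual MultiLayer: the container records what each layer received during Forward and hands that same value back during Backward. The types ScalarLayer and Chain and the helper MakeSquareThenExp are hypothetical and exist only for this illustration.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical scalar layer; not mlpack's Layer class.
struct ScalarLayer
{
  std::function<double(double)> forward;
  // Takes the layer's *input* x and the upstream gradient gy,
  // and returns gy * f'(x).
  std::function<double(double, double)> backward;
};

// Hypothetical container that caches per-layer inputs during Forward.
struct Chain
{
  std::vector<ScalarLayer> layers;
  std::vector<double> inputs;  // inputs[i] is what layer i saw in Forward.

  double Forward(double x)
  {
    inputs.clear();
    for (const ScalarLayer& layer : layers)
    {
      inputs.push_back(x);
      x = layer.forward(x);
    }
    return x;
  }

  double Backward(double gy)
  {
    // Forward must have been called first so that the cache is populated.
    assert(inputs.size() == layers.size());
    for (std::size_t i = layers.size(); i-- > 0; )
      gy = layers[i].backward(inputs[i], gy);
    return gy;
  }
};

// Builds f(x) = exp(x^2), whose derivative is 2x * exp(x^2).
Chain MakeSquareThenExp()
{
  Chain c;
  c.layers.push_back(ScalarLayer{
      [](double x) { return x * x; },
      [](double x, double gy) { return gy * 2.0 * x; } });
  c.layers.push_back(ScalarLayer{
      [](double x) { return std::exp(x); },
      [](double x, double gy) { return gy * std::exp(x); } });
  return c;
}
```

With this arrangement, calling Forward(-0.5) followed by Backward(1.0) on the chain from MakeSquareThenExp should agree with the analytic derivative 2x * exp(x^2) at x = -0.5. Layers that prefer to work from their output would need to recompute or cache it themselves under this option, which is part of the trade-off between the two proposals.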

Your environment

  • version of mlpack: 4.2.1
  • operating system: any (linux)
  • compiler: any (gcc)
  • version of dependencies (Boost/Armadillo):
  • any other environment information you think is relevant:

Steps to reproduce

Expected behavior

Actual behavior
