Gradient accumulation is untested #3036

Description

@seanbell

I often run into the same problem as #2532, where it is necessary to zero-initialize the bottom diff, but not the parameter diffs. This is unintuitive, but acceptable as long as it's both documented and tested. The problem is that there are no tests to check each layer for this behavior.

If you are opposed to fixing the test as per tnarihi@7d45526 (the "+1, -1" trick), then an alternative would be to add a function to every layer that says whether or not it supports gradient accumulation (default false). Then, the gradient checker would apply the "+1, -1" trick only to those that claim to support it. In the case that someone uses iter_size > 1, the net would check all its layers and raise an exception if any layer has parameters but doesn't support gradient accumulation.

I'm opening this issue because I think it needs to be addressed in one way or another.
