Treat bottoms and params more uniformly? #1474

@longjon

Description

#1471 reminded me of something that's been weighing on my mind lately.

Caffe is designed with a hard distinction between "data blobs", the intermediate results of computation, and "parameter blobs", the variables on which gradient descent is performed.

This distinction is artificially imposed on the inputs to layers, which are just functions (with derivatives). For example, the inner product layer performs a matrix multiply, C = AB, where B must be a bottom, C must be a top, and A must be a parameter blob. Meanwhile, @mcheshkov wants to compute the same function, except with A and B both as bottoms.
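To make the point concrete, here is a minimal sketch in NumPy (names and shapes are illustrative, not Caffe's API): the computation is identical whether A arrives as a learned parameter or as a second bottom; only the framework's bookkeeping about who owns and updates A differs.

```python
import numpy as np

def inner_product(A, B):
    # The layer's function is just C = A @ B, regardless of which
    # operand the framework labels a "parameter" vs. a "bottom".
    return A @ B

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 3))        # bottom blob
A_param = rng.standard_normal((5, 4))  # parameter blob (Caffe's current view)
A_input = A_param.copy()               # the same values fed as a second bottom

C = inner_product(A_param, B)
# Either way, the result is the same matrix.
assert np.allclose(C, inner_product(A_input, B))
```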

A layer should really just be a function with its first derivative, leaving you free to compose layers into functions (i.e., nets) which have free variables (parameters) and bound variables (intermediate data). The distinction between bottoms and parameters is really the business of Net and Solver rather than Layer, as it's only after composition that one learns which variables are free.
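The free/bound distinction can be sketched with ordinary closures (a hypothetical illustration, not a proposed API): the layer function takes all of its inputs uniformly, and it is the act of composing a net that binds some of them as parameters.

```python
import numpy as np

def affine(x, W, b):
    # A "layer" is just a function of all its inputs; nothing here
    # says which of x, W, b is data and which is a parameter.
    return W @ x + b

def make_net(params):
    # Composition is where W and b become the net's free variables
    # (the ones the solver would learn); x stays bound per example.
    W, b = params
    def net(x):
        return np.tanh(affine(x, W, b))
    return net

W = np.eye(2)
b = np.zeros(2)
net = make_net((W, b))
y = net(np.array([0.5, -0.5]))
```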

You can find some symptoms of this imposition in the code, e.g., we have both propagate_down and param_propagate_down. It'll also come up when generating nets from Python.
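A sketch of what a unified backward might look like (hypothetical, not Caffe code): one propagate flag per input, rather than separate propagate_down for bottoms and param_propagate_down for parameter blobs.

```python
import numpy as np

def matmul_forward(A, B):
    return A @ B

def matmul_backward(top_grad, A, B, propagate=(True, True)):
    # For C = A @ B: dL/dA = G @ B^T and dL/dB = A^T @ G.
    # The net (not the layer) decides which inputs need gradients,
    # via a single per-input flag instead of two parallel mechanisms.
    grad_A = top_grad @ B.T if propagate[0] else None
    grad_B = A.T @ top_grad if propagate[1] else None
    return grad_A, grad_B

A = np.ones((2, 3))
B = np.ones((3, 2))
G = np.ones((2, 2))       # gradient flowing in from the top
gA, gB = matmul_backward(G, A, B)
```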

Obviously changing this would be rather major. I don't want to propose any specific course of action at this time, but just to make a note and let other minds compute.
