Treat bottoms and params more uniformly? #1474

@longjon

Description

#1471 reminded me of something that's been weighing on my mind lately.

Caffe is designed with a hard distinction between "data blobs", the intermediate results of computation, and "parameter blobs", the variables on which gradient descent is performed.

This distinction is artificially imposed on the inputs to layers, which are just functions (with derivatives). For example, the inner product layer performs a matrix multiply, C = AB, where B must be a bottom, C must be a top, and A must be a parameter blob. Meanwhile, @mcheshkov wants to compute the same function, except with A and B both as bottoms.
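To make the point concrete, here is a minimal sketch in NumPy (names and shapes are illustrative, not Caffe's API): the computation is identical whether A arrives as a learned parameter or as a second bottom; only the framework's bookkeeping about who owns and updates A differs.

```python
import numpy as np

def inner_product(A, B):
    # The layer's function is just C = A @ B, regardless of which
    # operand the framework labels a "parameter" vs. a "bottom".
    return A @ B

rng = np.random.default_rng(0)
B = rng.standard_normal((4, 3))        # bottom blob
A_param = rng.standard_normal((5, 4))  # parameter blob (Caffe's current view)
A_input = A_param.copy()               # the same values fed as a second bottom

C = inner_product(A_param, B)
# Either way, the result is the same matrix.
assert np.allclose(C, inner_product(A_input, B))
```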

A layer should really just be a function with its first derivative, leaving you free to compose layers into functions (i.e., nets) which have free variables (parameters) and bound variables (intermediate data). The distinction between bottoms and parameters is really the business of Net and Solver rather than Layer, as it's only after composition that one learns which variables are free.
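The free/bound distinction can be sketched with ordinary closures (a hypothetical illustration, not a proposed API): the layer function takes all of its inputs uniformly, and it is the act of composing a net that binds some of them as parameters.

```python
import numpy as np

def affine(x, W, b):
    # A "layer" is just a function of all its inputs; nothing here
    # says which of x, W, b is data and which is a parameter.
    return W @ x + b

def make_net(params):
    # Composition is where W and b become the net's free variables
    # (the ones the solver would learn); x stays bound per example.
    W, b = params
    def net(x):
        return np.tanh(affine(x, W, b))
    return net

W = np.eye(2)
b = np.zeros(2)
net = make_net((W, b))
y = net(np.array([0.5, -0.5]))
```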

You can find some symptoms of this imposition in the code, e.g., we have both propagate_down and param_propagate_down. It'll also come up when generating nets from Python.
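A sketch of what a unified backward might look like (hypothetical, not Caffe code): one propagate flag per input, rather than separate propagate_down for bottoms and param_propagate_down for parameter blobs.

```python
import numpy as np

def matmul_forward(A, B):
    return A @ B

def matmul_backward(top_grad, A, B, propagate=(True, True)):
    # For C = A @ B: dL/dA = G @ B^T and dL/dB = A^T @ G.
    # The net (not the layer) decides which inputs need gradients,
    # via a single per-input flag instead of two parallel mechanisms.
    grad_A = top_grad @ B.T if propagate[0] else None
    grad_B = A.T @ top_grad if propagate[1] else None
    return grad_A, grad_B

A = np.ones((2, 3))
B = np.ones((3, 2))
G = np.ones((2, 2))       # gradient flowing in from the top
gA, gB = matmul_backward(G, A, B)
```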

Obviously changing this would be rather major. I don't want to propose any specific course of action at this time, but just to make a note and let other minds compute.
