Compute loss in the forward pass #209
Conversation
This now implements the PR as described above with all tests passing (and no additional lint errors -- I haven't touched the current list because I believe @kloudkl fixed them in another PR). @shelhamer @Yangqing @sergeyk and anyone else, feel free to discuss and merge if deemed appropriate. I did not end up using any extra memory to implement this. The only penalty ended up being a small amount of computational overhead in some of the loss layers, e.g., calls to blob data accessors and some loops are now repeated in both the forward and backward passes. One minor but immediate benefit of this PR is that the gradient check unit tests (by far the slowest ones) are sped up slightly, since 2 of the 3 of the …
Sweet! This looks good to me. (Trivial history polishing: could you remove the placeholders fe9c28e and 21a9685?)
Thanks for taking a look! Removed those commits.
include/caffe/net.hpp
Outdated
How about using a default parameter value to keep backward compatibility of the public API?
`const vector<Blob<Dtype>*>& ForwardPrefilled(Dtype* loss = NULL);`
Thanks for the suggestions @kloudkl - I already had methods that didn't take a loss pointer for backwards compatibility, but the NULL defaults are a cleaner solution.
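The pattern @kloudkl suggested can be sketched as follows. This is a minimal standalone illustration, not Caffe's actual `Net` class: a single method takes an optional output pointer defaulting to `NULL`, so old call sites that ignore the loss keep compiling unchanged.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the default-argument compatibility pattern.
// `Net`, the member names, and the loss value below are illustrative only.
class Net {
 public:
  // If `loss` is non-null, the scalar loss computed during the forward
  // pass is written through the pointer; otherwise it is simply dropped.
  const std::vector<float>& ForwardPrefilled(float* loss = NULL) {
    float total = 1.25f;  // stand-in for the real accumulated loss
    if (loss != NULL) {
      *loss = total;
    }
    return outputs_;
  }

 private:
  std::vector<float> outputs_;
};
```

New callers pass `&loss` to receive the value; existing callers invoke `ForwardPrefilled()` with no arguments, exactly as before, which is why the default argument preserves the public API.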
No longer ready to merge, some tests may be failing. |
Never mind - tests pass (must have run them in an inconsistent build state or something...)
I agree with @jeffdonahue that computing the loss in the forward pass makes more sense, even if there is a bit of overhead. Actually, it could pay off, as other methods could benefit from that info.
Merging this now. In case anyone was curious about the actual speedup of the gradient check unit tests: comparing the results @ loss-in-forward-pass against the results @ dev, this is a speedup of ~10%.
Caffe currently computes loss in the backward pass. From looking at the loss layer code and talking to @Yangqing, this was done to save a bit of time in the loss layers, as intermediate computations are often shared between the gradient and loss computations. However, this could be remedied (at the cost of a bit of memory - probably pretty insignificant in the loss layers) by storing these intermediate computations as needed in the forward pass and reusing them in the backward pass.
The main motivating factors are:
1. Sometimes it may be useful to compute the loss at training time without actually doing an entire backward pass (e.g., if one wanted to perform backtracking line search to set the step size/learning rate).
2. The forward pass seems to me the natural place to perform loss computation, as it is where the "inference" is supposed to happen.
I think the only cost to this change would be the aforementioned bit of memory in the loss layers, but feel free to discuss - maybe there are issues I've overlooked.
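The line-search motivation can be made concrete with a toy sketch. Everything here is hypothetical (a scalar objective standing in for a network's loss): once the forward pass reports the loss, each trial step of a backtracking search needs only a forward-style evaluation, with no backward pass per trial.

```cpp
// Toy illustration of backtracking line search (Armijo condition) where
// shrinking the step requires only loss evaluations, i.e., forward passes.
float Loss(float w) { return (w - 3.f) * (w - 3.f); }  // stand-in objective
float Grad(float w) { return 2.f * (w - 3.f); }        // one "backward" pass

float BacktrackingStep(float w, float step) {
  const float grad = Grad(w);  // gradient computed once, up front
  // Each iteration below re-evaluates only Loss() -- forward pass only.
  while (Loss(w - step * grad) > Loss(w) - 0.5f * step * grad * grad) {
    step *= 0.5f;  // shrink until sufficient decrease is achieved
  }
  return w - step * grad;
}
```

With loss computed in the backward pass, every trial step would pay for a full backward pass it does not need; computing it in the forward pass makes this loop cheap.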