Conversation

@jeffdonahue
Contributor

Caffe currently computes loss in the backward pass. From looking at the loss layer code and talking to @Yangqing, this was done to save a bit of time in the loss layers, as intermediate computations are often shared between the gradient and loss computations. However, this could be remedied (at the cost of a bit of memory, probably insignificant in the loss layers) by storing these intermediate computations in the forward pass and reusing them in the backward pass.
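To make the caching idea concrete, here is a hedged sketch (a simplified, hypothetical Euclidean-style loss layer, not Caffe's actual Layer API): Forward computes the loss and caches the element-wise difference, and Backward reuses the cached values instead of recomputing them.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical, simplified Euclidean loss "layer". Forward caches the
// element-wise difference so Backward can reuse it; no recomputation needed.
struct EuclideanLoss {
  std::vector<double> diff_;  // cached in Forward, reused in Backward

  // Forward pass: loss = 1/(2N) * sum((pred - label)^2), caching pred - label.
  double Forward(const std::vector<double>& pred,
                 const std::vector<double>& label) {
    const size_t n = pred.size();
    diff_.resize(n);
    double loss = 0.0;
    for (size_t i = 0; i < n; ++i) {
      diff_[i] = pred[i] - label[i];
      loss += diff_[i] * diff_[i];
    }
    return loss / (2.0 * n);
  }

  // Backward pass: gradient w.r.t. pred is diff / N, read from the cache.
  std::vector<double> Backward() const {
    std::vector<double> grad(diff_.size());
    for (size_t i = 0; i < diff_.size(); ++i) {
      grad[i] = diff_[i] / diff_.size();
    }
    return grad;
  }
};
```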

The main motivating factors are:

  1. Sometimes it may be useful to be able to compute the loss at training time without actually doing an entire backwards pass (e.g., if one wanted to perform backtracking line search to set the step size/learning rate).

  2. The forward pass seems to me the natural place to perform loss computation as it is where the "inference" is supposed to happen.

I think the only cost to this change would be the aforementioned bit of memory in the loss layers, but feel free to discuss - maybe there are issues I've overlooked.
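As a concrete illustration of motivation 1: a backtracking line search only needs repeated loss evaluations, i.e. forward passes. This toy sketch (hypothetical names, one scalar parameter for simplicity) shrinks the step until the Armijo sufficient-decrease condition holds, costing one forward evaluation per trial and no extra backward passes.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Hypothetical backtracking line search sketch: given the current loss and
// gradient, halve the step until the Armijo condition
//   f(x - step*grad) <= f(x) - c * step * grad^2
// is satisfied. Each trial needs only a loss (forward-pass) evaluation.
double BacktrackingStep(const std::function<double(double)>& loss,
                        double x, double grad, double step,
                        double shrink = 0.5, double c = 1e-4) {
  const double fx = loss(x);
  while (loss(x - step * grad) > fx - c * step * grad * grad) {
    step *= shrink;  // shrink and retry with another forward pass
  }
  return step;
}
```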

@jeffdonahue
Contributor Author

This now implements the PR as described above with all tests passing (and no additional lint errors -- haven't touched the current list because I believe @kloudkl fixed them in another PR). @shelhamer @Yangqing @sergeyk and anyone else feel free to discuss and merge if deemed appropriate.

I did not end up using any extra memory to implement this. The only penalty ended up being a small amount of computational overhead in some of the loss layers, e.g., calls to blob data accessors and some loops are now repeated in both the forward and backward passes.

One minor but immediate benefit of this PR is that the gradient check unit tests (by far the slowest ones) are sped up slightly, since two of the three Backward calls the code made before are eliminated -- the ones used to compute numeric gradients from the losses f(x+eps) and f(x-eps). (I'm running multiple Caffe training processes on this machine while running the tests, so my numbers are likely unreliable, but the gradient unit tests seem to have sped up by 10% or more.)
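For reference, the numeric gradient mentioned above is a central difference; a minimal sketch (illustrative names, one scalar parameter) shows why each perturbation now needs only a forward-pass loss evaluation rather than a full Backward call.

```cpp
#include <cassert>
#include <cmath>
#include <functional>

// Central-difference numeric gradient sketch: approximate df/dx as
//   (f(x + eps) - f(x - eps)) / (2 * eps).
// With loss returned by the forward pass, each term costs one Forward call.
double NumericGradient(const std::function<double(double)>& forward_loss,
                       double x, double eps = 1e-5) {
  return (forward_loss(x + eps) - forward_loss(x - eps)) / (2.0 * eps);
}
```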

@shelhamer
Member

Sweet! This looks good to me.

(Trivial history polishing: could you remove the placeholders fe9c28e and 21a9685?)

@jeffdonahue
Contributor Author

Thanks for taking a look! Removed those commits.

@kloudkl
Contributor


How about using a default parameter value to keep backward compatibility of the public API?

const vector<Blob<Dtype>*>& ForwardPrefilled(Dtype* loss = NULL);
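To illustrate the suggestion with a stand-in class (not the real Net): a NULL-defaulted pointer lets old call sites compile unchanged, while new callers can pass a pointer to retrieve the loss computed during the forward pass.

```cpp
#include <cassert>
#include <cstddef>

// Illustrative stand-in for a net with a NULL-defaulted loss out-parameter.
struct FakeNet {
  // Old callers: net.Forward();  New callers: net.Forward(&loss);
  double Forward(double* loss = NULL) {
    const double computed_loss = 0.25;  // pretend forward-pass loss
    if (loss != NULL) {
      *loss = computed_loss;  // hand the loss back only if requested
    }
    return computed_loss;
  }
};
```

Both call styles hit the same implementation, so nothing needs to be duplicated for compatibility.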

@jeffdonahue
Contributor Author

Thanks for the suggestions @kloudkl - I already had methods that didn't take a loss pointer for backward compatibility, but the NULL defaults are a cleaner solution.

@jeffdonahue
Contributor Author

No longer ready to merge, some tests may be failing.

@jeffdonahue
Contributor Author

Never mind - tests pass (must have run them in an inconsistent build state or something...)

@sguada
Contributor

sguada commented Mar 17, 2014

I agree with @jeffdonahue that computing the loss in the forward pass makes more sense, even if there is a bit of overhead. It could actually pay off, since other methods could benefit from that info.
It also seems to me that tests involving gradient checks are 10-15% faster, at least on my MacBook Pro.
Since this change affects many files and will affect other PRs along the way, we (@shelhamer, @sergeyk) should think about the best moment to merge it.

@jeffdonahue
Contributor Author

Merging this now. In case anyone was curious about the actual speedup of the gradient check unit tests, I'm posting the results of running ./build/src/caffe/test/test_all.testbin --gtest_filter="*Gradient*" below:

results @ loss-in-forward-pass:

[==========] 76 tests from 30 test cases ran. (244648 ms total)
[  PASSED  ] 76 tests.

results @ dev:

[==========] 76 tests from 30 test cases ran. (272185 ms total)
[  PASSED  ] 76 tests.

This is a speedup of ~10%.

jeffdonahue added a commit that referenced this pull request Mar 21, 2014
@jeffdonahue jeffdonahue merged commit e6ef9ca into BVLC:dev Mar 21, 2014
@jeffdonahue jeffdonahue deleted the loss-in-forward-pass branch April 2, 2014 17:31
@shelhamer shelhamer mentioned this pull request May 20, 2014
mitmul pushed a commit to mitmul/caffe that referenced this pull request Sep 30, 2014
lukeyeager added a commit to lukeyeager/caffe that referenced this pull request Aug 18, 2016
…nd-value

[DetectNetTransformation] Use 0 for background value instead of -0.5