Compute loss in the forward pass #209
Conversation
This now implements the PR as described above with all tests passing (and no additional lint errors -- I haven't touched the current list because I believe @kloudkl fixed them in another PR). @shelhamer @Yangqing @sergeyk and anyone else, feel free to discuss and merge if deemed appropriate. I did not end up using any extra memory to implement this. The only penalty ended up being a small amount of computational overhead in some of the loss layers, e.g., calls to blob data accessors and some loops are now repeated in both the forward and backward passes. One minor but immediate benefit of this PR is that the gradient check unit tests (by far the slowest ones) are sped up slightly, since 2 of the 3 of the …
Sweet! This looks good to me. (Trivial history polishing: could you remove the placeholders fe9c28e and 21a9685?)
Thanks for taking a look! Removed those commits.
include/caffe/net.hpp
Outdated
How about using a default parameter value to keep backward compatibility of the public API?
`const vector<Blob<Dtype>*>& ForwardPrefilled(Dtype* loss = NULL);`
Thanks for the suggestions @kloudkl - I already had methods that didn't take a loss pointer for backwards compatibility, but the NULL defaults are a cleaner solution.
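The pattern @kloudkl suggested can be sketched as follows. This is a minimal standalone illustration, not Caffe's actual `Net` class: a single method takes an optional output pointer defaulting to `NULL`, so old call sites that ignore the loss keep compiling unchanged.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical sketch of the default-argument compatibility pattern.
// `Net`, the member names, and the loss value below are illustrative only.
class Net {
 public:
  // If `loss` is non-null, the scalar loss computed during the forward
  // pass is written through the pointer; otherwise it is simply dropped.
  const std::vector<float>& ForwardPrefilled(float* loss = NULL) {
    float total = 1.25f;  // stand-in for the real accumulated loss
    if (loss != NULL) {
      *loss = total;
    }
    return outputs_;
  }

 private:
  std::vector<float> outputs_;
};
```

New callers pass `&loss` to receive the value; existing callers invoke `ForwardPrefilled()` with no arguments, exactly as before, which is why the default argument preserves the public API.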
No longer ready to merge, some tests may be failing. |
Never mind - tests pass (must have run them in an inconsistent build state or something...)
I agree with @jeffdonahue that computing the loss in the forward pass makes more sense, even if there is a bit of overhead. Actually, it could pay off, as other methods could benefit from that info.
Merging this now. In case anyone was curious about the actual speedup of the gradient check unit tests: comparing the results @ loss-in-forward-pass against the results @ dev, this is a speedup of ~10%.
Caffe currently computes loss in the backward pass. From looking at the loss layer code and talking to @Yangqing, this was done to save a bit of time in the loss layers, as intermediate computations are often shared between the gradient and loss computations. However, this could be remedied (at the cost of a bit of memory - probably pretty insignificant in the loss layers) by storing these intermediate computations as needed in the forward pass and reusing them in the backward pass.
The main motivating factors are:
1. Sometimes it may be useful to compute the loss at training time without actually doing an entire backward pass (e.g., if one wanted to perform backtracking line search to set the step size/learning rate).
2. The forward pass seems to me the natural place to perform loss computation, as it is where the "inference" is supposed to happen.
I think the only cost to this change would be the aforementioned bit of memory in the loss layers, but feel free to discuss - maybe there are issues I've overlooked.
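The line-search motivation can be made concrete with a toy sketch. Everything here is hypothetical (a scalar objective standing in for a network's loss): once the forward pass reports the loss, each trial step of a backtracking search needs only a forward-style evaluation, with no backward pass per trial.

```cpp
// Toy illustration of backtracking line search (Armijo condition) where
// shrinking the step requires only loss evaluations, i.e., forward passes.
float Loss(float w) { return (w - 3.f) * (w - 3.f); }  // stand-in objective
float Grad(float w) { return 2.f * (w - 3.f); }        // one "backward" pass

float BacktrackingStep(float w, float step) {
  const float grad = Grad(w);  // gradient computed once, up front
  // Each iteration below re-evaluates only Loss() -- forward pass only.
  while (Loss(w - step * grad) > Loss(w) - 0.5f * step * grad * grad) {
    step *= 0.5f;  // shrink until sufficient decrease is achieved
  }
  return w - step * grad;
}
```

With loss computed in the backward pass, every trial step would pay for a full backward pass it does not need; computing it in the forward pass makes this loop cheap.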