Adam solver by PatWie · Pull Request #2856 · BVLC/caffe

PatWie · 2015-08-04T09:52:26Z

This commit implements the Adam solver by Kingma et al. for CPU and
GPU. All solver parameters are defined in caffe.proto. It also
adds an example for the MNIST dataset.

See issue #2827.

Before merging, please review the code. I will add changes to this branch (and rebase) if anything needs to change.

shelhamer · 2015-08-04T18:31:48Z

@philkr could you review this if you have a chance?

philkr · 2015-08-04T18:50:39Z

src/caffe/proto/caffe.proto

Why not use momentum here, to be consistent with other solvers?

shelhamer · 2015-08-04T19:09:09Z

@PatWie thanks for the solver! All SGD solvers need gradient checks. See for instance the AdaGrad tests https://github.com/BVLC/caffe/blob/master/src/caffe/test/test_gradient_based_solver.cpp#L431-L483

philkr · 2015-08-04T19:09:39Z

src/caffe/solver.cpp

Why divide by stepsize here?

PatWie · 2015-08-04

I think t is the epoch rather than the iteration, going by Caffe's definitions.

PatWie · 2015-08-04T19:14:26Z

@shelhamer Ah, I didn't realize that there is already a unit test. I will add one, of course.

philkr · 2015-08-04T19:21:14Z

src/caffe/solver.cpp

The three commands above can be written as a single caffe_cpu_axpby using beta1 instead of 0 and val_m_ instead of val_t_.

PatWie · 2015-08-04

Ah, I see. BLAS is completely new to me. I will change this tomorrow in all places in my code.
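For readers new to BLAS-style calls, here is a minimal sketch of the suggestion above. It is plain Python rather than the PR's C++. It assumes the axpby convention `y <- alpha * x + beta * y`, which is what `caffe_cpu_axpby(N, alpha, X, beta, Y)` computes. The helper names `axpby` and `update_first_moment` are hypothetical and are not taken from the PR.

```python
def axpby(alpha, x, beta, y):
    """In-place y <- alpha * x + beta * y (BLAS-style axpby semantics)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    for i, xi in enumerate(x):
        y[i] = alpha * xi + beta * y[i]


def update_first_moment(m, grad, beta1):
    """Adam first-moment update in one fused call.

    Computes m <- (1 - beta1) * grad + beta1 * m directly on the history
    buffer, so no zeroed temporary buffer is needed.
    """
    axpby(1.0 - beta1, grad, beta1, m)


# Small illustration with beta1 = 0.5 so the arithmetic is easy to follow.
m = [0.0, 1.0]
update_first_moment(m, [2.0, 4.0], beta1=0.5)
print(m)  # [1.0, 2.5]
```

Passing `beta1` as the `beta` argument and the moment buffer as `y` is what removes the separate zero, accumulate, and scale steps.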

Nanne · 2015-08-05T12:40:42Z

I think there's some confusion in the code caused by the use of the stepsize param, which is made worse by using lr_policy "step" in the MNIST example.

The alpha/stepsize from the paper should be set via the base_lr, and used with lr_policy: "fixed" as I don't see any recommendations for changing alpha during training. This way you can also get rid of gamma and power in the prototxt (the latter wasn't being used anyway).

The stepsize param should only be used together with lr_policy "step", and if we already set alpha via base_lr then it is not needed at all. t can just be iter_ + 1, as afaik nothing else is needed to compute the effective stepsize. This also removes the need for the stepsize > 0 check in the header.

Moreover, it makes sense to change the MNIST example to use the recommended value from the paper for the base_lr (0.001) and explicitly set momentum and momentum2 to 0.9 and 0.999 respectively, rather than relying on the default values.
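To make the parameter mapping above concrete, here is a minimal pure-Python sketch of one Adam step in the bias-corrected form from the paper. It is not the PR's implementation. `base_lr` plays the role of alpha, `momentum` and `momentum2` stand in for beta1 and beta2, and `t = iteration + 1`. The function name `adam_step` and the epsilon parameter name `delta` are assumptions made for illustration.

```python
import math


def adam_step(params, grads, m, v, iteration,
              base_lr=0.001, momentum=0.9, momentum2=0.999, delta=1e-8):
    """Apply one in-place Adam update to params, m and v.

    iteration is the zero-based iteration counter, so t = iteration + 1.
    delta is the small epsilon constant (name assumed for this sketch).
    """
    t = iteration + 1
    # Bias-corrected effective step size: alpha * sqrt(1 - b2^t) / (1 - b1^t).
    correction = math.sqrt(1.0 - momentum2 ** t) / (1.0 - momentum ** t)
    for i, g in enumerate(grads):
        m[i] = momentum * m[i] + (1.0 - momentum) * g        # first moment
        v[i] = momentum2 * v[i] + (1.0 - momentum2) * g * g  # second moment
        params[i] -= base_lr * correction * m[i] / (math.sqrt(v[i]) + delta)


# Usage: a single parameter with a constant gradient of 1.0.
params, m, v = [0.0], [0.0], [0.0]
for it in range(3):
    adam_step(params, [1.0], m, v, it)
```

With a constant gradient, each step moves the parameter by roughly `base_lr`, which is why a fixed `base_lr` of 0.001 with no decay policy is a reasonable default here.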

PatWie · 2015-08-05T15:12:29Z

I applied all the requested changes: memory usage, the solver_mnist proto, and t is now the current iteration instead of the epoch. There is also a unit test.
There are still some trivial things to do:

- update the solver page in the wiki
- implement the learning rate heuristic 1/sqrt(t) for reproducing results from the paper
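The 1/sqrt(t) heuristic mentioned in the to-do list could look like the following sketch. It assumes that the decay scales the base learning rate and that `t = iteration + 1`, in line with the discussion above. The function name `inv_sqrt_lr` is hypothetical and not part of Caffe.

```python
import math


def inv_sqrt_lr(base_lr, iteration):
    """Return base_lr / sqrt(t) with t = iteration + 1 (zero-based iteration)."""
    t = iteration + 1
    return base_lr / math.sqrt(t)


# The rate starts at base_lr and halves by the fourth iteration (t = 4).
rates = [inv_sqrt_lr(0.001, it) for it in range(4)]
```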

philkr · 2015-08-05T17:03:43Z

src/caffe/solver.cpp

A reference to a shared pointer is never a good idea; just copy the pointer.

philkr · 2015-08-05T17:07:00Z

The solver looks good to me now, though I haven't tested it. I do think that caffe_ipow should be its own PR if we really want to add it. Currently it just bloats this PR and doesn't add any benefit (see https://en.wikipedia.org/wiki/Amdahl%27s_law).

jeffdonahue · 2015-08-05T20:20:08Z

src/caffe/solver.cpp

Please fix the indentation: use 4-space indents when continuing from previous lines (https://google-styleguide.googlecode.com/svn/trunk/cppguide.html#Spaces_vs._Tabs).

ronghanghu · 2015-08-06T07:10:50Z

@PatWie Thank you for this great PR! I just added some comments on the code. Please use 4-space indents when continuing from previous lines, and add more test cases. After that, squash into one commit and I can merge.

ronghanghu · 2015-08-09T07:48:59Z

#2836 and #2866 introduced new conflicts to be resolved.

shelhamer added the focus label Aug 4, 2015


shelhamer mentioned this pull request Aug 4, 2015: Adaptive Solvers: AdaDelta, RMSprop, and ADAM #2860 (Closed, 3 tasks)


ronghanghu added the RH label Aug 5, 2015


ronghanghu mentioned this pull request Aug 7, 2015: RMSprop clean up and rebase #2867 (Merged)


6 participants