Initial implementation of the Adadelta solver as proposed in "ADADELTA: An Adaptive Learning Rate Method" (Zeiler, 2012). Motivation: http://cs.stanford.edu/people/karpathy/convnetjs/demo/trainers.html

Performance on the MNIST autoencoder demo is roughly on par with standard SGD+momentum, but not as good as the Nesterov solver (for comparison see: #741 (comment)). The lack of a learning rate does seem to be a problem in later iterations, in that the loss/accuracy never fully converge, though this could be due to an implementation issue:
Iteration 64500, Testing net (#0)
Test loss: 59.4627
Test net output #0: cross_entropy_loss = 59.4627 (* 1 = 59.4627 loss)
Test net output #1: l2_error = 1.82881
Iteration 64500, Testing net (#1)
Test loss: 59.7422
Test net output #0: cross_entropy_loss = 59.7422 (* 1 = 59.7422 loss)
Test net output #1: l2_error = 1.92399
Iteration 65000, loss = 62.1569
Iteration 65000, Testing net (#0)
Test loss: 60.7756
Test net output #0: cross_entropy_loss = 60.7756 (* 1 = 60.7756 loss)
Test net output #1: l2_error = 2.05861
Iteration 65000, Testing net (#1)
Test loss: 61.0705
Test net output #0: cross_entropy_loss = 61.0705 (* 1 = 61.0705 loss)
Test net output #1: l2_error = 2.15354
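For context, Adadelta has no global learning rate at all: the step size is the ratio of two running RMS accumulators, governed only by the decay rate rho and the conditioning constant eps. A minimal per-element sketch of the update from the paper (illustrative only, not the actual solver code in this PR):

```cpp
#include <cmath>
#include <cstddef>

// Sketch of the Adadelta update from Zeiler (2012); illustrative only,
// not the Caffe implementation in this PR.
void adadelta_update(float* w, const float* grad,
                     float* sq_grad_hist,    // running E[g^2]
                     float* sq_update_hist,  // running E[dx^2]
                     std::size_t n, float rho = 0.95f, float eps = 1e-6f) {
  for (std::size_t i = 0; i < n; ++i) {
    // Accumulate the squared gradient.
    sq_grad_hist[i] = rho * sq_grad_hist[i] + (1.0f - rho) * grad[i] * grad[i];
    // Step size is RMS[dx]_{t-1} / RMS[g]_t -- no learning rate anywhere.
    const float update = -std::sqrt((sq_update_hist[i] + eps) /
                                    (sq_grad_hist[i] + eps)) * grad[i];
    // Accumulate the squared update, then apply it.
    sq_update_hist[i] = rho * sq_update_hist[i] +
                        (1.0f - rho) * update * update;
    w[i] += update;
  }
}
```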
A couple of things to note:

- Adadelta requires tracking both gradient and update history. I chose to store both sequentially in "history_", partly so that SnapshotSolverState() and RestoreSolverState() can be reused unchanged (see the layout sketch after this list).
- All the tests pass, but a couple (those with multiple iterations) are ridiculously slow, even though the MNIST demo, for example, is not noticeably slower with Adadelta than with the other solvers. I still need to look into this.
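Concretely, the sequential layout amounts to something like the sketch below (simplified; in Caffe, "history_" holds Blob pointers rather than plain vectors, and the type and method names here are illustrative):

```cpp
#include <cstddef>
#include <vector>

// Simplified sketch of storing both accumulators sequentially in history_.
// The first half holds the squared-gradient history, the second half the
// squared-update history, so snapshot/restore code that just serialises
// history_ end to end keeps working without changes.
struct AdadeltaState {
  std::vector<std::vector<float>> history_;

  AdadeltaState(std::size_t num_param_blobs, std::size_t blob_size)
      : history_(2 * num_param_blobs, std::vector<float>(blob_size, 0.0f)) {}

  // history_[0 .. n-1]: accumulated squared gradients, E[g^2].
  std::vector<float>& sq_grad_hist(std::size_t param_id) {
    return history_[param_id];
  }
  // history_[n .. 2n-1]: accumulated squared updates, E[dx^2].
  std::vector<float>& sq_update_hist(std::size_t param_id) {
    return history_[history_.size() / 2 + param_id];
  }
};
```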
slow tests: The slowness of the tests seems to have nothing to do with the number of iterations, but rather with how I'm storing both update and gradient history. It's unreasonably slow even for a small number of iterations. I haven't had a chance to look into it in detail.
learning rate: Yes, I suspect that the lack of a global learning rate might be a problem. It could also be that the history needs to be randomly initialised, since the initial step size is otherwise too large and takes a while to decrease to a reasonable value. It could also be a bug. I would need to double-check the implementation against the one in ConvNetJS, but I can only get around to that next week.
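For illustration, warm-starting the accumulators along the lines suggested above might look like the following (purely hypothetical; the function name and scale constant are made up, and this is not part of the PR):

```cpp
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical warm start for the squared-gradient history: nonzero initial
// values shrink the very first steps instead of letting the update ratio
// start out near sqrt(eps / eps) for small gradients.
void warm_start_history(std::vector<float>& sq_grad_hist,
                        float scale = 1e-2f, unsigned seed = 0) {
  std::mt19937 rng(seed);
  std::uniform_real_distribution<float> dist(0.0f, scale);
  for (std::size_t i = 0; i < sq_grad_hist.size(); ++i) {
    sq_grad_hist[i] = dist(rng);
  }
}
```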