
Conversation

@auroraustc (Contributor)

Fixes #27605: The C++ L-BFGS Optimizer will not work properly if there are one or more registered tensors with no grad in the model:

terminate called after throwing an instance of 'c10::Error'
  what():  There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::view.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CUDATensorId, QuantizedCPUTensorId, VariableTensorId, CPUTensorId, MkldnnCPUTensorId] (lookup_ at /pytorch/aten/src/ATen/core/dispatch/DispatchTable.h:245)

Adds `if (!parameter.grad().defined()) {...}` checks in `torch/csrc/api/src/optim/lbfgs.cpp`.
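A minimal sketch of where the guard lands, assuming it sits in a gradient-flattening loop (the names `views` and `flat_grad` and the zero-fill fallback are illustrative here; the exact code appears later in this thread):

```cpp
// Flatten all parameter gradients into one vector for L-BFGS.
std::vector<torch::Tensor> views;
for (auto& parameter : parameters_) {
  if (!parameter.grad().defined()) {
    // Parameter never received a gradient: substitute flat zeros
    // instead of calling .view() on an undefined tensor (the crash above).
    views.push_back(parameter.new_empty({parameter.numel()}).zero_());
  } else {
    views.push_back(parameter.grad().view(-1));
  }
}
auto flat_grad = torch::cat(views, 0);
```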

@pytorchbot added the `module: cpp` (Related to C++ API) label on Oct 9, 2019.
@yf225 (Contributor) left a comment:

Thanks a lot for the fix @auroraustc! I left some comments.

for (auto& parameter : parameters_) {
  if (!parameter.grad().defined()) {
    continue;
  }
@yf225 commented on this diff:

It seems that the Python implementation for this part is

if p.grad is None:
    view = p.new(p.numel()).zero_()

Do you think we can do the same for the C++ implementation?

@auroraustc (Contributor, Author) replied:

Thank you for your review! I have modified the code in `torch/csrc/api/src/optim/lbfgs.cpp` according to the Python implementation in `pytorch/torch/optim/lbfgs.py`:

...
    if (!parameter.grad().defined()) {
      views.push_back(parameter.new_empty({parameter.numel()}).zero_());
    }
    else if (parameter.grad().is_sparse()) {
      views.push_back(parameter.grad().to_dense().view(-1));
    }
    else {
      views.push_back(parameter.grad().view(-1));
    }
...

Now the test in #27605 can run without error on both CPU and GPU:

# Output
Epoch: 0 Batch: 0 loss: 0.903995
[ Variable[CUDAFloatType]{} ]
Epoch: 0 Batch: 0 loss: 1.413
[ Variable[CUDAFloatType]{} ]
Epoch: 1 Batch: 0 loss: 0.731126
[ Variable[CUDAFloatType]{} ]
Epoch: 1 Batch: 0 loss: 0.771402
[ Variable[CUDAFloatType]{} ]
Epoch: 2 Batch: 0 loss: 0.482919
[ Variable[CUDAFloatType]{} ]
Epoch: 2 Batch: 0 loss: 0.296223
[ Variable[CUDAFloatType]{} ]
Epoch: 3 Batch: 0 loss: 0.22991
[ Variable[CUDAFloatType]{} ]
Epoch: 3 Batch: 0 loss: 0.0694233
[ Variable[CUDAFloatType]{} ]
Epoch: 4 Batch: 0 loss: 0.0918769
...

However, another error occurred (caused by `torch::optim::LBFGS::add_grad`) if there is a sparse tensor in the model:

terminate called after throwing an instance of 'c10::Error'
  what():  add(sparse, dense) is not supported. Use add(dense, sparse) instead. (add_out_sparse_cuda at /home/aurora/Softwares/pytorch/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu:390)

I tested using Python and got a similar error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/torch/optim/lbfgs.py", line 427, in step
    self._add_grad(t, d)
  File "/usr/local/lib/python3.5/dist-packages/torch/optim/lbfgs.py", line 264, in _add_grad
    p.data.add_(step_size, update[offset:offset + numel].view_as(p.data))
RuntimeError: add(sparse, dense) is not supported. Use add(dense, sparse) instead.
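For reference, a standalone sketch (hypothetical, not from the PR) of the failing pattern behind both errors: an in-place add of a dense update into a sparse tensor is unsupported, while the reverse direction works, matching the error text above.

```cpp
#include <torch/torch.h>

int main() {
  auto sparse = torch::rand({4, 4}).to_sparse();  // sparse COO tensor
  auto dense = torch::rand({4, 4});
  // sparse.add_(dense);        // would throw: add(sparse, dense) is not supported
  auto ok = dense.add(sparse);  // add(dense, sparse) is the supported direction
  return 0;
}
```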

The sparse tensor was added to the model by registering a `sparse_fc3` parameter:

sparse_fc3 = register_parameter("sparse_fc3", (3 * torch::rand({128,1})).to(torch::kInt).to(torch::kFloat).to_sparse(), true);

and by adding

x = torch::mm(x, sparse_fc3.to_dense());

in the forward() function of the model.
I will try to work on this problem.
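For context, a self-contained sketch of the reproduction described above (the module and layer names besides `sparse_fc3` are illustrative assumptions):

```cpp
#include <torch/torch.h>

// Hypothetical module mirroring the reproduction: a dense layer plus a
// registered sparse parameter that is densified inside forward().
struct SparseNet : torch::nn::Module {
  torch::nn::Linear fc1{nullptr};
  torch::Tensor sparse_fc3;

  SparseNet() {
    fc1 = register_module("fc1", torch::nn::Linear(32, 128));
    // Register a sparse tensor as a learnable parameter, as in the thread.
    sparse_fc3 = register_parameter(
        "sparse_fc3",
        (3 * torch::rand({128, 1})).to(torch::kInt).to(torch::kFloat).to_sparse(),
        /*requires_grad=*/true);
  }

  torch::Tensor forward(torch::Tensor x) {
    x = torch::relu(fc1->forward(x));
    return torch::mm(x, sparse_fc3.to_dense());  // densify before the matmul
  }
};
```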

@yf225 (Contributor) replied:

Thanks so much for the investigation! I think sparse tensor support is still experimental within PyTorch, and we will need to also fix the Python version of LBFGS to make the sparse use case work. I think this PR as it stands now fixes the problem mentioned in the original issue, and we can merge it now and fix the sparse tensor issue (if desired) in another PR.

for (auto& parameter : parameters_) {
  if (!parameter.grad().defined()) {
    continue;
  }
@yf225 commented on this diff:

I suspect that we don't need this change, because this function doesn't access parameter.grad() after the check.

if (!parameter.grad().defined()) {
  continue;
}

@yf225 commented:

It would be awesome to add a minimal test showing that the bug no longer occurs after the fix. :D Thanks!
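A hypothetical minimal test along those lines (module, sizes, and names are illustrative; the actual test added may differ): register a parameter with `requires_grad=false` so its grad stays undefined, then take an LBFGS step, which threw the `aten::view` error before the fix.

```cpp
#include <torch/torch.h>

int main() {
  struct Net : torch::nn::Module {
    torch::nn::Linear fc{nullptr};
    torch::Tensor frozen;
    Net() {
      fc = register_module("fc", torch::nn::Linear(4, 1));
      // Registered but excluded from autograd, so grad() stays undefined.
      frozen = register_parameter("frozen", torch::ones({4}),
                                  /*requires_grad=*/false);
    }
    torch::Tensor forward(torch::Tensor x) { return fc->forward(x * frozen); }
  };

  auto net = std::make_shared<Net>();
  torch::optim::LBFGS optimizer(net->parameters(),
                                torch::optim::LBFGSOptions(0.1));

  auto x = torch::randn({8, 4});
  auto y = torch::randn({8, 1});
  // Before the fix, this step() aborted with the aten::view dispatch error.
  optimizer.step([&]() -> torch::Tensor {
    optimizer.zero_grad();
    auto loss = torch::mse_loss(net->forward(x), y);
    loss.backward();
    return loss;  // LBFGS re-evaluates the closure internally
  });
  return 0;
}
```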

@yf225 (Contributor) left a comment:

Thanks a lot for the fix @auroraustc!

@facebook-github-bot (Contributor) left a comment:

@yf225 is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@yf225 merged this pull request in f7d7c4b.

thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request on Feb 4, 2020:
Summary:
Fixes pytorch#27605 (see the PR description above): add `if (!parameter.grad().defined()) {...}` checks in `torch/csrc/api/src/optim/lbfgs.cpp`.
Pull Request resolved: pytorch#27606

Differential Revision: D17866550

Pulled By: yf225

fbshipit-source-id: bcaf0bf75b93c57304856b03d8984c1617ebbfef

Labels

Merged · module: cpp (Related to C++ API)

Development

Successfully merging this pull request may close these issues:

C++ L-BFGS optimizer not working for models containing one or more registered parameters with requires_grad=false (#27605)