
Conversation

@auroraustc (Contributor)

Fixes #27605: The C++ L-BFGS Optimizer will not work properly if there are one or more registered tensors with no grad in the model:

terminate called after throwing an instance of 'c10::Error'
  what():  There were no tensor arguments to this function (e.g., you passed an empty list of Tensors), but no fallback function is registered for schema aten::view.  This usually means that this function requires a non-empty list of Tensors.  Available functions are [CUDATensorId, QuantizedCPUTensorId, VariableTensorId, CPUTensorId, MkldnnCPUTensorId] (lookup_ at /pytorch/aten/src/ATen/core/dispatch/DispatchTable.h:245)

Adds `if (!parameter.grad().defined()) {...}` checks in `torch/csrc/api/src/optim/lbfgs.cpp`.
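A minimal sketch of where the guard lands, assuming it sits in a gradient-flattening loop (the names `views` and `flat_grad` and the zero-fill fallback are illustrative here; the exact code appears later in this thread):

```cpp
// Flatten all parameter gradients into one vector for L-BFGS.
std::vector<torch::Tensor> views;
for (auto& parameter : parameters_) {
  if (!parameter.grad().defined()) {
    // Parameter never received a gradient: substitute flat zeros
    // instead of calling .view() on an undefined tensor (the crash above).
    views.push_back(parameter.new_empty({parameter.numel()}).zero_());
  } else {
    views.push_back(parameter.grad().view(-1));
  }
}
auto flat_grad = torch::cat(views, 0);
```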

@pytorchbot added the `module: cpp` (Related to C++ API) label on Oct 9, 2019.
@yf225 (Contributor) left a comment:

Thanks a lot for the fix @auroraustc! I left some comments.

for (auto& parameter : parameters_) {
  if (!parameter.grad().defined()) {
    continue;
  }
@yf225 commented on this diff:

It seems that the Python implementation for this part is

if p.grad is None:
    view = p.new(p.numel()).zero_()

Do you think we can do the same for the C++ implementation?

@auroraustc (Contributor, Author) replied:

Thank you for your review! I have modified the code in `torch/csrc/api/src/optim/lbfgs.cpp` according to the Python implementation in `pytorch/torch/optim/lbfgs.py`:

...
    if (!parameter.grad().defined()) {
      views.push_back(parameter.new_empty({parameter.numel()}).zero_());
    }
    else if (parameter.grad().is_sparse()) {
      views.push_back(parameter.grad().to_dense().view(-1));
    }
    else {
      views.push_back(parameter.grad().view(-1));
    }
...

Now the test in #27605 can run without error on both CPU and GPU:

# Output
Epoch: 0 Batch: 0 loss: 0.903995
[ Variable[CUDAFloatType]{} ]
Epoch: 0 Batch: 0 loss: 1.413
[ Variable[CUDAFloatType]{} ]
Epoch: 1 Batch: 0 loss: 0.731126
[ Variable[CUDAFloatType]{} ]
Epoch: 1 Batch: 0 loss: 0.771402
[ Variable[CUDAFloatType]{} ]
Epoch: 2 Batch: 0 loss: 0.482919
[ Variable[CUDAFloatType]{} ]
Epoch: 2 Batch: 0 loss: 0.296223
[ Variable[CUDAFloatType]{} ]
Epoch: 3 Batch: 0 loss: 0.22991
[ Variable[CUDAFloatType]{} ]
Epoch: 3 Batch: 0 loss: 0.0694233
[ Variable[CUDAFloatType]{} ]
Epoch: 4 Batch: 0 loss: 0.0918769
...

However, another error occurred (caused by `torch::optim::LBFGS::add_grad`) if there is a sparse tensor in the model:

terminate called after throwing an instance of 'c10::Error'
  what():  add(sparse, dense) is not supported. Use add(dense, sparse) instead. (add_out_sparse_cuda at /home/aurora/Softwares/pytorch/aten/src/ATen/native/sparse/cuda/SparseCUDATensorMath.cu:390)

I tested using Python and got a similar error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.5/dist-packages/torch/optim/lbfgs.py", line 427, in step
    self._add_grad(t, d)
  File "/usr/local/lib/python3.5/dist-packages/torch/optim/lbfgs.py", line 264, in _add_grad
    p.data.add_(step_size, update[offset:offset + numel].view_as(p.data))
RuntimeError: add(sparse, dense) is not supported. Use add(dense, sparse) instead.
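For reference, a standalone sketch (hypothetical, not from the PR) of the failing pattern behind both errors: an in-place add of a dense update into a sparse tensor is unsupported, while the reverse direction works, matching the error text above.

```cpp
#include <torch/torch.h>

int main() {
  auto sparse = torch::rand({4, 4}).to_sparse();  // sparse COO tensor
  auto dense = torch::rand({4, 4});
  // sparse.add_(dense);        // would throw: add(sparse, dense) is not supported
  auto ok = dense.add(sparse);  // add(dense, sparse) is the supported direction
  return 0;
}
```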

The sparse tensor was added to the model by registering a `sparse_fc3` parameter:

sparse_fc3 = register_parameter("sparse_fc3", (3 * torch::rand({128,1})).to(torch::kInt).to(torch::kFloat).to_sparse(), true);

and by adding

x = torch::mm(x, sparse_fc3.to_dense());

in the forward() function of the model.
I will try to work on this problem.
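For context, a self-contained sketch of the reproduction described above (the module and layer names besides `sparse_fc3` are illustrative assumptions):

```cpp
#include <torch/torch.h>

// Hypothetical module mirroring the reproduction: a dense layer plus a
// registered sparse parameter that is densified inside forward().
struct SparseNet : torch::nn::Module {
  torch::nn::Linear fc1{nullptr};
  torch::Tensor sparse_fc3;

  SparseNet() {
    fc1 = register_module("fc1", torch::nn::Linear(32, 128));
    // Register a sparse tensor as a learnable parameter, as in the thread.
    sparse_fc3 = register_parameter(
        "sparse_fc3",
        (3 * torch::rand({128, 1})).to(torch::kInt).to(torch::kFloat).to_sparse(),
        /*requires_grad=*/true);
  }

  torch::Tensor forward(torch::Tensor x) {
    x = torch::relu(fc1->forward(x));
    return torch::mm(x, sparse_fc3.to_dense());  // densify before the matmul
  }
};
```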

@yf225 (Contributor) replied:

Thanks so much for the investigation! I think sparse tensor support is still experimental within PyTorch, and we will need to also fix the Python version of LBFGS to make the sparse use case work. I think this PR as it stands now fixes the problem mentioned in the original issue, and we can merge it now and fix the sparse tensor issue (if desired) in another PR.

for (auto& parameter : parameters_) {
  if (!parameter.grad().defined()) {
    continue;
  }
@yf225 commented on this diff:

I suspect that we don't need this change, because this function doesn't access parameter.grad() after the check.

if (!parameter.grad().defined()) {
  continue;
}

@yf225 commented:

It would be awesome to add a minimal test showing that the bug no longer occurs after the fix. :D Thanks!
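A hypothetical minimal test along those lines (module, sizes, and names are illustrative; the actual test added may differ): register a parameter with `requires_grad=false` so its grad stays undefined, then take an LBFGS step, which threw the `aten::view` error before the fix.

```cpp
#include <torch/torch.h>

int main() {
  struct Net : torch::nn::Module {
    torch::nn::Linear fc{nullptr};
    torch::Tensor frozen;
    Net() {
      fc = register_module("fc", torch::nn::Linear(4, 1));
      // Registered but excluded from autograd, so grad() stays undefined.
      frozen = register_parameter("frozen", torch::ones({4}),
                                  /*requires_grad=*/false);
    }
    torch::Tensor forward(torch::Tensor x) { return fc->forward(x * frozen); }
  };

  auto net = std::make_shared<Net>();
  torch::optim::LBFGS optimizer(net->parameters(),
                                torch::optim::LBFGSOptions(0.1));

  auto x = torch::randn({8, 4});
  auto y = torch::randn({8, 1});
  // Before the fix, this step() aborted with the aten::view dispatch error.
  optimizer.step([&]() -> torch::Tensor {
    optimizer.zero_grad();
    auto loss = torch::mse_loss(net->forward(x), y);
    loss.backward();
    return loss;  // LBFGS re-evaluates the closure internally
  });
  return 0;
}
```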

@yf225 (Contributor) left a comment:

Thanks a lot for the fix @auroraustc!

@facebook-github-bot (Contributor) left a comment:

@yf225 is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@yf225 merged this pull request in f7d7c4b.

thiagocrepaldi pushed a commit to thiagocrepaldi/pytorch that referenced this pull request on Feb 4, 2020:
Summary:
Fixes pytorch#27605 (see the PR description above): add `if (!parameter.grad().defined()) {...}` checks in `torch/csrc/api/src/optim/lbfgs.cpp`.
Pull Request resolved: pytorch#27606

Differential Revision: D17866550

Pulled By: yf225

fbshipit-source-id: bcaf0bf75b93c57304856b03d8984c1617ebbfef

Labels

Merged · module: cpp (Related to C++ API)

Development

Successfully merging this pull request may close these issues:

C++ L-BFGS optimizer not working for models containing one or more registered parameters with requires_grad=false (#27605)