"Reduce Failed to Synchronise" in F.binary_cross_entropy  #5560

@angusturner

Description

Since upgrading PyTorch to the master branch, I am occasionally receiving the following error:

/home/user/cuda-ubuntu-16.04-ec2/pytorch/aten/src/THCUNN/BCECriterion.cu:30: Acctype bce_functor<Dtype, Acctype>::operator()(Tuple) [with Tuple = thrust::detail::tuple_of_iterator_references<thrust::device_reference<float>, thrust::device_reference<float>, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type, thrust::null_type>, Dtype = float, Acctype = float]: block: [0,0,0], thread: [223,0,0] Assertion `input >= 0. && input <= 1.` failed.
Traceback (most recent call last):
  File "train_model.py", line 138, in <module>
    train_model(config)
  File "train_model.py", line 105, in train_model
    worker.train(train_loader, plot_lr=plot_lr, on_iter=on_iter)
  File "/home/user/src/worker.py", line 204, in train
    time_loss = F.binary_cross_entropy(time_pred, time_hist.float())
  File "/home/user/miniconda3/envs/cuda/lib/python3.6/site-packages/torch/nn/functional.py", line 1507, in binary_cross_entropy
    return torch._C._nn.binary_cross_entropy(input, target, weight, size_average, reduce)
RuntimeError: reduce failed to synchronize: device-side assert triggered

In this trace, time_pred is the output of a linear network with nn.Sigmoid() on the output, and time_hist comes from a binary dataset, which I am confident is correct (because I can complete multiple epochs before it fails).
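Since the device-side assert `input >= 0. && input <= 1.` only fires inside the CUDA kernel, one way to get a readable error at the failing iteration is a host-side check before the loss call. This is a hypothetical helper (not part of the issue); note that a NaN produced upstream (e.g. by exploding gradients) also fails the same assertion, even though sigmoid output is nominally in [0, 1]:

```python
import torch

def check_bce_input(pred: torch.Tensor) -> None:
    # Hypothetical sanity check: fail fast on the host instead of
    # triggering the opaque CUDA device-side assert in BCECriterion.cu.
    if torch.isnan(pred).any():
        raise ValueError("BCE input contains NaN")
    # Sigmoid output should lie in [0, 1]; anything else trips the assert.
    if pred.min().item() < 0.0 or pred.max().item() > 1.0:
        raise ValueError("BCE input outside [0, 1]")
```

Calling this on `time_pred` right before `F.binary_cross_entropy` would distinguish a NaN blow-up from a genuinely out-of-range value.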

I haven't checked if F.binary_cross_entropy_with_logits fixes the issue.
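For reference, a minimal sketch of that alternative: `binary_cross_entropy_with_logits` takes the raw (pre-sigmoid) scores and fuses the sigmoid into a numerically stable log-sum-exp form, so it cannot hit the `input >= 0. && input <= 1.` assertion. The tensors below are illustrative, not from the issue:

```python
import torch
import torch.nn.functional as F

logits = torch.randn(8, requires_grad=True)      # raw network output, no nn.Sigmoid()
target = torch.randint(0, 2, (8,)).float()       # binary targets

# Fused, numerically stable form operating on logits directly.
loss_fused = F.binary_cross_entropy_with_logits(logits, target)

# Equivalent two-step form; this is the path that can assert on CUDA
# if sigmoid's output is perturbed out of [0, 1] or becomes NaN.
loss_naive = F.binary_cross_entropy(torch.sigmoid(logits), target)
```

For well-behaved inputs the two losses agree; the fused version simply avoids materialising a probability tensor that the kernel then re-validates.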

System details:

  • OS: Ubuntu 16.04
  • PyTorch version: 0.4.0a0+55c64e5
  • How you installed PyTorch (conda, pip, source): source
  • Python version: Python 3.6.1
  • CUDA/cuDNN version: CUDA release 9.0, V9.0.176 / CUDNN 7005
  • GPU models and configuration: 4x Nvidia M60
  • GCC version (if compiling from source): GCC 4.4.7

Metadata

Labels

  • module: cuda — Related to torch.cuda, and CUDA support in general
  • todo — Not as important as medium or high priority tasks, but we will work on these.
  • triaged — This issue has been looked at by a team member, and triaged and prioritized into an appropriate module