In certain situations nn.BCEWithLogitsLoss returns the wrong answer. This seems to happen when the output has a trailing singleton dimension (the typical case when it is the output of an nn.Linear in binary classification) but the target does not.
Snippet to reproduce:
import torch
import torch.nn as nn
from torch.autograd import Variable
import numpy as np
sigmoid = nn.Sigmoid()
t = np.round(np.random.rand(64))  # binary targets, shape (64,)
o = np.random.rand(64, 1) - 0.5   # logits with a trailing singleton dim, shape (64, 1)
t = Variable(torch.Tensor(t))
o = Variable(torch.Tensor(o))
print(nn.BCEWithLogitsLoss()(o, t))
print(nn.BCELoss()(sigmoid(o), t)) # Different numbers
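# Aside (my assumption, not verified against the implementation): the
# elementwise math inside BCEWithLogitsLoss may broadcast the (64, 1)
# output against the (64,) target to a (64, 64) matrix before averaging.
# NumPy exhibits the same broadcast on these shapes:
print((o.data.numpy() * t.data.numpy()).shape)  # (64, 64) rather than (64,)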
o = np.random.rand(64) - 0.5      # logits with shape (64,), matching the target
o = Variable(torch.Tensor(o))
print(nn.BCEWithLogitsLoss()(o, t))
print(nn.BCELoss()(sigmoid(o), t)) # Same numbers
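If it helps, here is a minimal sketch of a workaround, assuming the silent shape mismatch between the (64, 1) output and the (64,) target is indeed the culprit: reshape the target so both tensors have the same shape before computing either loss (the t.view(-1, 1) call is my suggestion, not part of the original repro).

o = Variable(torch.Tensor(np.random.rand(64, 1) - 0.5))
t2 = t.view(-1, 1)  # reshape the (64,) target to (64, 1) to match the output
print(nn.BCEWithLogitsLoss()(o, t2))
print(nn.BCELoss()(sigmoid(o), t2))  # same numbers once the shapes agree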
Thanks :)