Add high order grad support for Some operator #1507
Conversation
* master:
  Add F.normalize (pytorch#1467)
  Expose custom attributes from C++ functions (pytorch#1430)
  Add high order gradient support for Sigmoid (pytorch#1496)
torch/autograd/_functions/reduce.py (outdated)

    if ctx.dim is None:
        input, = ctx.saved_variables
        if ctx.norm_type == 2:
            scale = (grad_output[0] / ctx.norm).data[0]
torch/autograd/_functions/reduce.py (outdated)

    -    scale = grad_output[0] / self.norm ** (self.norm_type - 1)
    -    return input.mul(pow).mul(scale)
    +    pow = input.abs().pow(ctx.norm_type - 2)
    +    scale = (grad_output[0] / ctx.norm ** (ctx.norm_type - 1)).data[0]
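For reference, the formula these snippets implement (my own derivation, not text from the PR): for the whole-tensor p-norm,

    N = \lVert x \rVert_p = \Big(\sum_i |x_i|^p\Big)^{1/p},
    \qquad
    \frac{\partial N}{\partial x_i} = \frac{x_i \, |x_i|^{p-2}}{N^{p-1}},

so pow = input.abs().pow(norm_type - 2) and scale = grad_output / norm ** (norm_type - 1) combine into grad_input = input * pow * scale; for norm_type == 2 this reduces to grad_output * input / norm.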
apaszke left a comment:
That looks great! Thanks for the PR
torch/autograd/_functions/reduce.py (outdated)

    if ctx.dim is None:
        input, = ctx.saved_variables
        if ctx.norm_type == 2:
            scale_v = (grad_output[0] / ctx.norm).expand_as(input)
torch/autograd/_functions/reduce.py (outdated)

    -    scale = grad_output[0] / self.norm ** (self.norm_type - 1)
    -    return input.mul(pow).mul(scale)
    +    pow = input.abs().pow(ctx.norm_type - 2)
    +    scale_v = (grad_output[0] / ctx.norm ** (ctx.norm_type - 1)).expand_as(input)
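To make the intent concrete, here is a minimal sketch of a dim=None p-norm written against the modern torch.autograd.Function API (the names PNorm and p are mine, not the PR's): because the backward is composed only of differentiable tensor ops, second-order gradients can flow through it.

    import torch
    from torch.autograd import Function


    class PNorm(Function):
        """Toy whole-tensor p-norm whose backward is built from differentiable ops."""

        @staticmethod
        def forward(ctx, input, p=2.0):
            ctx.p = p
            ctx.save_for_backward(input)
            return input.abs().pow(p).sum().pow(1.0 / p)

        @staticmethod
        def backward(ctx, grad_output):
            input, = ctx.saved_tensors
            p = ctx.p
            norm = input.abs().pow(p).sum().pow(1.0 / p)
            # d||x||_p / dx_i = x_i * |x_i|^(p-2) / ||x||_p^(p-1)
            grad_input = grad_output * input * input.abs().pow(p - 2) / norm.pow(p - 1)
            # Only differentiable tensor ops were used, so grad_input carries a
            # grad_fn when create_graph=True, enabling second-order gradients.
            return grad_input, None


    x = torch.randn(6, dtype=torch.double, requires_grad=True)
    y = PNorm.apply(x, 3.0)
    g, = torch.autograd.grad(y, x, create_graph=True)   # first backward, graph kept
    gg, = torch.autograd.grad(g.sum(), x)                # second backward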
torch/autograd/_functions/reduce.py (outdated)

    -    self.norm_type = norm_type
    -    self.dim = dim
    +    @staticmethod
    +    def forward(ctx, input, norm_type=2, dim=None):
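The lines above show the old-style to new-style conversion this PR performs: per-call state moves from self onto ctx, and forward/backward become static methods. A toy sketch of that pattern (ScaleBy is a hypothetical example, not from the PR), assuming the modern API:

    import torch
    from torch.autograd import Function

    class ScaleBy(Function):
        # Per-call state goes on ctx instead of self.
        @staticmethod
        def forward(ctx, input, factor=2.0):
            ctx.factor = factor
            return input * factor

        @staticmethod
        def backward(ctx, grad_output):
            # Return one gradient per forward argument; non-tensor args get None.
            return grad_output * ctx.factor, None

    x = torch.randn(3, requires_grad=True)
    y = ScaleBy.apply(x, 3.0)    # new-style Functions are invoked via .apply
    y.sum().backward()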
        return grad_input.mul(-1)


    class Threshold(Function):
        )
    else:
        mask = input > ctx.threshold
        grad_input = mask.type_as(grad_output) * grad_output
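A standalone illustration (modern API, my own example) of why the mask.type_as(grad_output) * grad_output form is double-backward friendly: the float-mask multiplication stays in the autograd graph, so gradients can flow through grad_output a second time.

    import torch

    grad_output = torch.randn(4, requires_grad=True)
    input = torch.randn(4)
    threshold = 0.0

    mask = input > threshold                               # boolean mask, no gradient of its own
    grad_input = mask.type_as(grad_output) * grad_output   # float mask * grad_output: differentiable

    gg, = torch.autograd.grad(grad_input.sum(), grad_output)
    print(gg)   # equals the mask cast to float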
@apaszke Thank you for your suggestions. I have applied all of them besides the last one. I tried the last one (using masked_fill) as well, but it fails. Here is my test code:

import torch
import torch.nn as nn
from torch import autograd

def calc_gradient_penalty(netD, real_data, fake_data):
    alpha = torch.rand(BATCH_SIZE, 1)
    alpha = alpha.expand(real_data.size())
    alpha = alpha.cuda() if use_cuda else alpha
    interpolates = alpha * real_data + ((1 - alpha) * fake_data)
    if use_cuda:
        interpolates = interpolates.cuda()
    interpolates = autograd.Variable(interpolates, requires_grad=True)
    disc_interpolates = netD(interpolates)
    gradients = autograd.grad(outputs=disc_interpolates, inputs=interpolates,
                              grad_outputs=torch.ones(disc_interpolates.size()).cuda() if use_cuda else torch.ones(disc_interpolates.size()),
                              create_graph=True, only_inputs=True, retain_graph=True)[0]
    gradient_penalty = ((gradients.norm(2, dim=1) - 1) ** 2).mean() * LAMBDA
    return gradient_penalty
use_cuda = False
BATCH_SIZE=256
LAMBDA = 0.1
DIM = 256
noise = torch.neg(torch.randn(BATCH_SIZE, 2))
if use_cuda:
noise = noise.cuda()
noisev = autograd.Variable(noise)
noise1 = torch.randn(BATCH_SIZE, 2)
if use_cuda:
noise1 = noise1.cuda()
noise1v = autograd.Variable(noise1)
netD = nn.Sequential(
nn.Linear(2, DIM),
nn.ReLU(True),
nn.Linear(DIM, DIM),
nn.ReLU(True),
nn.Linear(DIM, DIM),
nn.ReLU(True),
nn.Linear(DIM, 1),
)
netD.zero_grad()
print netD
gp = calc_gradient_penalty(netD, noisev.data, noise1v.data)
gp.backward()
for p in netD.parameters():
    print p.grad

Then I get an error:

NotImplementedErrorTraceback (most recent call last)
<ipython-input-2-a841c9970f31> in <module>()
44 print netD
45 gp = calc_gradient_penalty(netD, noisev.data, noise1v.data)
---> 46 gp.backward()
47 for p in netD.parameters():
48 print p.grad
/home/users/gang.cao/env/lib/python2.7/site-packages/torch/autograd/variable.pyc in backward(self, gradient, retain_variables)
150 raise TypeError("gradient has to be a Tensor, Variable or None")
151 gradient = Variable(gradient, volatile=True)
--> 152 self._execution_engine.run_backward((self,), (gradient,), retain_variables)
153
154 def register_hook(self, hook):
/home/users/gang.cao/env/lib/python2.7/site-packages/torch/autograd/function.pyc in backward(*grad_outputs)
170 be the gradient w.r.t. the corresponding input.
171 """
--> 172 raise NotImplementedError
173
174
NotImplementedError:

The only difference from the current commit is the modification suggested in your last comment (using masked_fill).
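For context on where this exception comes from (my reading, illustrated with a hypothetical NoBackward function on the modern API): the base Function.backward raises NotImplementedError, so the error means some function reached during the double backward has no backward implemented yet.

    import torch
    from torch.autograd import Function

    class NoBackward(Function):    # toy Function: forward only, backward deliberately left undefined
        @staticmethod
        def forward(ctx, input):
            return input * 2

    x = torch.randn(3, requires_grad=True)
    y = NoBackward.apply(x).sum()
    y.backward()   # raises NotImplementedError from the base Function.backward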
apaszke left a comment:
I looked into that masked_fill issue and it turns out that it was a bug (I've included a fix in #1506). Can you please change it as I said? The tests will fail now, but should be ok once my PR is merged.
            value,
            inplace
        )
        return output
Yes, I'll push that soon as well.
Can you please fix the conflicts? My branch is merged now.
* master: (26 commits)
  Fix Linear function
  Fix comparison functions
  Expose variable attribute of AccumulateGrad
  Don't modify non-volatile grads in zero_grad
  Minor fix in Prod backward
  Add new flags to Variable.backward
  Replace retain_variables with retain_graph
  Improve output wrapping logic in autograd
  Remove spurious memo argument in Module.parameters() (pytorch#1527)
  Make torch.cat not synchronize the host and device
  Reference counting documentation. (pytorch#1520)
  Restore examples with keepdim=True default.
  Explicitly pass keepdim=False for tests that require it.
  Change keepdim default to False.
  Fix test_normalize NN test.
  Add a keepdim test to torch_test.
  Make (non-legacy) nn backwards compatible.
  Add autograd tests for keepdim
  Add documentation for keepdim.
  Change all legacy/nn modules to use keepdim=True (even if tests don't fail).
  ...

# Conflicts:
#	torch/autograd/_functions/reduce.py
#	torch/autograd/variable.py
Hi @apaszke, there may still be something wrong with the current commit. With the same test code above, I got another error.
Hi @apaszke, when will this PR be merged?
I've been busy working on other things. I'll try to review it this weekend.
@pytorchbot test this please
Thank you!
Hi @caogang @apaszke, I am trying the WGAN-GP implementation with PyTorch installed from the latest git, but I still get the same error as @caogang. Any idea how to solve it?

TypeError: torch.addmm received an invalid combination of arguments - got (int, torch.ByteTensor, int, torch.ByteTensor, torch.FloatTensor, out=torch.ByteTensor), but expected one of:
* (torch.ByteTensor source, torch.ByteTensor mat1, torch.ByteTensor mat2, *, torch.ByteTensor out)
* (torch.ByteTensor source, torch.SparseByteTensor mat1, torch.ByteTensor mat2, *, torch.ByteTensor out)
* (int beta, torch.ByteTensor source, torch.ByteTensor mat1, torch.ByteTensor mat2, *, torch.ByteTensor out)
* (torch.ByteTensor source, int alpha, torch.ByteTensor mat1, torch.ByteTensor mat2, *, torch.ByteTensor out)
* (int beta, torch.ByteTensor source, torch.SparseByteTensor mat1, torch.ByteTensor mat2, *, torch.ByteTensor out)
* (torch.ByteTensor source, int alpha, torch.SparseByteTensor mat1, torch.ByteTensor mat2, *, torch.ByteTensor out)
* (int beta, torch.ByteTensor source, int alpha, torch.ByteTensor mat1, torch.ByteTensor mat2, *, torch.ByteTensor out)
didn't match because some of the arguments have invalid types: (int, torch.ByteTensor, int, torch.ByteTensor, !torch.FloatTensor!, out=torch.ByteTensor)
* (int beta, torch.ByteTensor source, int alpha, torch.SparseByteTensor mat1, torch.ByteTensor mat2, *, torch.ByteTensor out)
didn't match because some of the arguments have invalid types: (int, torch.ByteTensor, int, !torch.ByteTensor!, !torch.FloatTensor!, out=torch.ByteTensor)
@EthanZhu90, this may be a bug in the current branch. In torch/nn/_functions/thnn/activation.py:
     else:
+        mask = input > ctx.threshold
+        grad_input = mask.type_as(grad_output) * grad_output
-        grad_input = grad_output.masked_fill(input > ctx.threshold, 0)
     return grad_input, None, None, None
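With a fix along these lines, a quick sanity check of the WGAN-GP pattern might look like the sketch below (modern API, my own code, not from this thread): a gradient penalty through a small ReLU network exercises Threshold's double backward.

    import torch
    import torch.nn as nn
    from torch import autograd

    net = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1))
    x = torch.randn(4, 2, requires_grad=True)
    out = net(x)
    grads, = autograd.grad(out.sum(), x, create_graph=True)
    penalty = ((grads.norm(2, dim=1) - 1) ** 2).mean()
    penalty.backward()                      # second-order backward into the network parameters
    print(net[0].weight.grad is not None)   # True if double backward worked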
This PR adds high order gradient support in new-style functions, solving issue #1483. I have finished the following features:
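As an illustration of how such support is typically verified (my sketch on the modern API; the PR's actual tests live in test/test_autograd.py and are more thorough), gradcheck and gradgradcheck can confirm first- and second-order gradients of an operator such as norm:

    import torch
    from torch.autograd import gradcheck, gradgradcheck

    x = torch.randn(5, dtype=torch.double, requires_grad=True)

    assert gradcheck(lambda t: t.norm(3), (x,))       # first-order gradients
    assert gradgradcheck(lambda t: t.norm(3), (x,))   # second-order gradients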