This is my fault, from my previous implementation in PR #1507. :(
The logic here is wrong:
torch/nn/_functions/thnn/activation.py
else:
    grad_input = grad_output.masked_fill(input > ctx.threshold, 0)
return grad_input, None, None, None
This causes abnormal values in my application.
I will fix it shortly.
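
For illustration, assuming the quoted line belongs to the backward pass of a threshold activation (output equals the input where input > threshold, and a constant value otherwise), the gradient should pass through where input > threshold and be zero elsewhere, so the mask in the quoted line is presumably inverted. Below is a minimal pure-Python sketch under that assumption. The function names are hypothetical and plain lists stand in for tensors; this is not the PyTorch code or the actual fix.

```python
# Hypothetical helpers for illustration only; plain lists stand in for tensors.

def masked_fill(values, mask, fill):
    """Return a copy of values with entries replaced by fill where mask is True."""
    return [fill if m else v for v, m in zip(values, mask)]


def threshold_backward_quoted(grad_output, inputs, threshold):
    # Mirrors the quoted line: zeros the gradient where input > threshold.
    mask = [x > threshold for x in inputs]
    return masked_fill(grad_output, mask, 0.0)


def threshold_backward_expected(grad_output, inputs, threshold):
    # Assumed intended behaviour: zero the gradient where input <= threshold,
    # because those positions output a constant that does not depend on the input.
    mask = [x <= threshold for x in inputs]
    return masked_fill(grad_output, mask, 0.0)


inputs = [-1.0, 0.5, 2.0]
grad_output = [1.0, 1.0, 1.0]

quoted = threshold_backward_quoted(grad_output, inputs, 1.0)
expected = threshold_backward_expected(grad_output, inputs, 1.0)
print(quoted)    # gradient kept only where input <= threshold
print(expected)  # gradient kept only where input > threshold
```

With a threshold of 1.0, the quoted mask keeps the gradient for the first two inputs and drops it for the third, which is the opposite of what the assumed forward definition requires.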