
Conversation

@SeanNaren

Should be ready to go! I've added skip_input for RNNs (see here for more information). Let me know of any feedback!
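To make the intended semantics concrete, here is a minimal sketch of what skip_input means for a single GRU-style gate computation, assuming (as this PR does) that with skip_input the first layer has no input-to-hidden weights, so the input must already be hidden_size wide and feeds the gates directly. Illustrative tensors only, not the PR's code:

import torch
import torch.nn.functional as F

batch, hidden_size = 4, 8
x = torch.randn(batch, hidden_size)   # with skip_input the input is already hidden_size wide
h = torch.randn(batch, hidden_size)
w_ih = torch.randn(3 * hidden_size, hidden_size)
w_hh = torch.randn(3 * hidden_size, hidden_size)
b_hh = torch.randn(3 * hidden_size)

# standard path: project the input before forming the gates
gi_standard = F.linear(x, w_ih)                          # (batch, 3 * hidden_size)

# skip_input path: no input projection, the input is replicated once per gate
gi_skip = x.unsqueeze(1).expand(batch, 3, hidden_size)   # (batch, 3, hidden_size)

gh = F.linear(h, w_hh, b_hh)
print(gi_standard.shape, gi_skip.shape, gh.shape)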

for x_layer, y_layer in zip(rnn.all_weights, weights_val):
    for x, y in zip(x_layer, y_layer):
        # with skip_input some weight slots are None, so only copy real parameters
        if x is not None and y is not None:
            x.data.copy_(y.data)


grad_output = torch.randn(batch, seq_length, hidden_size * num_directions)
grad_output = torch.randn(seq_length, batch, hidden_size * num_directions)
if skip_input:
    input_val = torch.randn(seq_length, batch, hidden_size)


for param_from, param_to in zip(layer_params_from, layer_params_to):
    if param_from is not None and param_to is not None:
        assert param_from.type() == param_to.type()
        param_to.copy_(param_from)


gh = F.linear(hidden, w_hh, b_hh)
i_r, i_i, i_n = [x.squeeze(1) for x in gi.chunk(3, 1)]
h_r, h_i, h_n = gh.chunk(3, 1)


if self.skip_input:
    grad_weight = [tuple(w for w in layer_grad_weight if w is not None)
                   for layer_grad_weight in grad_weight]


hx, cx = hidden
x_h = input.unsqueeze(1).expand(input.size(0), 4, input.size(1)) if w_ih is None else F.linear(input, w_ih, b_ih)


gates = x_h + F.linear(hx, w_hh, b_hh)
ingate, forgetgate, cellgate, outgate = [x.squeeze(1) for x in gates.chunk(4, 1)]


gi = input.unsqueeze(1).expand(input.size(0), 3, input.size(1)) if w_ih is None else F.linear(input, w_ih, b_ih)
gh = F.linear(hidden, w_hh, b_hh)
i_r, i_i, i_n = [x.squeeze(1) for x in gi.chunk(3, 1)]


SeanNaren added 3 commits May 3, 2017 09:57
@SeanNaren (Author)

Back on trying to fix this, making some progress! I'm uncertain how to deal with the current issue: the bias is still being added by cuDNN v6 on the input layer when skip_input is set to true, which isn't the correct behaviour. @apaszke @ngimel, what do you think is the best solution for this? (Refer here for more info on this issue!)
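To spell the issue out with a simplified single-gate sketch (made-up tensors, not the PR's code): with skip_input the first-layer gate pre-activation should be the raw input plus the recurrent term W_hh·h + b_hh, but cuDNN v6 still adds the input bias b_ih. Zeroing that bias, which is what the params[layer_index][2].fill_(0) workaround later in this thread appears to do (assuming index 2 is the input bias in the cuDNN parameter layout), turns the unwanted add into a no-op:

import torch
import torch.nn.functional as F

batch, hidden_size = 2, 4
x = torch.randn(batch, hidden_size)   # skip_input: input already has hidden_size features
h = torch.randn(batch, hidden_size)
w_hh = torch.randn(hidden_size, hidden_size)
b_hh = torch.randn(hidden_size)
b_ih = torch.randn(hidden_size)       # the bias cuDNN v6 keeps applying

intended = x + F.linear(h, w_hh, b_hh)          # skip input: no W_ih and no b_ih
observed = x + b_ih + F.linear(h, w_hh, b_hh)   # what cuDNN v6 computes instead

# workaround: zero the input bias so the extra add contributes nothing
b_ih.zero_()
patched = x + b_ih + F.linear(h, w_hh, b_hh)
print(torch.allclose(intended, patched))        # True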

Sean Naren added 2 commits May 12, 2017 15:15
* Fixes for skip rnn

* Fixes for RNN cells, patch cuDNN for true skip input behaviour
@SeanNaren (Author)

@apaszke, tests are passing but this needs a review! Let me know of any feedback :)

@apaszke (Contributor) left a comment


Looks good for the most part!

grad_output = make_noncontig(grad_output)
grad_hy = make_noncontig(grad_hy)
input_val = make_noncontig(input_val)


assert not ((param_from is None or param_from.dim() == 0) ^ (param_to is None or param_to.dim() == 0))
if not ((param_from is None or param_from.dim() == 0) and (param_to is None or param_to.dim() == 0)):
    assert param_from.type() == param_to.type()
    param_to.copy_(param_from)


if fn.skip_input:
    params = get_parameters(fn, handle, w)
    for layer_index in range(fn.num_directions):
        params[layer_index][2].fill_(0)


if fn.skip_input:
    for layer_index in range(fn.num_directions):
        params[layer_index][0] = None
        params[layer_index][2] = None


def LSTMCell(input, hidden, w_ih, w_hh, b_ih=None, b_hh=None):
    if input.is_cuda:
        igates = input.expand(4, input.size(0), input.size(1)).transpose(0, 1) if w_ih is None else \
            F.linear(input, w_ih)


return state(gi, gh, hidden) if b_ih is None else state(gi, gh, hidden, b_ih, b_hh)

gi = input.expand(3, input.size(0), input.size(1)).transpose(0, 1).contiguous() if w_ih is None else \
    F.linear(input, w_ih, b_ih)


i_r, i_i, i_n = torch.unbind(gi.view(input.size(0), 3, -1), 1)
h_r, h_i, h_n = torch.unbind(gh.view(input.size(0), 3, -1), 1)


@SeanNaren (Author)

Made some changes as requested, but still have to figure out the unbind and fused rnn stuff!

@SeanNaren (Author) commented May 17, 2017

Hey @apaszke, any thoughts/feedback? :)

EDIT: Replaced the unbind calls with chunk.

@SeanNaren (Author)

I've modified the line to use chunk rather than the previously implemented unbind!
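For anyone comparing the two formulations, a quick check with standard torch ops (illustrative shapes only) shows that, in the usual 2-D case, chunking along dim 1 produces the same slices as unbinding a reshaped view:

import torch

batch, hidden_size = 2, 5
gi = torch.randn(batch, 3 * hidden_size)

u_r, u_i, u_n = torch.unbind(gi.view(batch, 3, -1), 1)   # previous approach
c_r, c_i, c_n = gi.chunk(3, 1)                           # current approach

print(torch.equal(u_r, c_r), torch.equal(u_i, c_i), torch.equal(u_n, c_n))  # True True True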

@SeanNaren (Author)

Can I get a status on the PR? (It's blocking some deep speech stuff) :)

@SeanNaren (Author)

After speaking to peeps, I'm going to close this PR in favour of a correct implementation of skip input until cuDNN addresses this. Thanks @justinchiu :)

@SeanNaren closed this Jun 5, 2017
jjsjann123 pushed a commit to jjsjann123/pytorch that referenced this pull request Aug 5, 2021
* Fix placement of block sync with halo loop
* hdiff test
jaglinux pushed a commit to jaglinux/pytorch that referenced this pull request Feb 16, 2022
zhuhong61 pushed a commit to zhuhong61/pytorch that referenced this pull request Jun 8, 2022
