Conversation

@alykhantejani
Contributor

Once merged we can probably close #101

torch/nn/init.py Outdated
    return tensor
else:
    fan_in, _ = _calculate_fan_in_and_fan_out(tensor)
    std = gain * np.sqrt(1.0 / fan_in)

@alykhantejani
Contributor Author

Darn, looks like the build fails because scipy isn't installed. I use it to do goodness-of-fit tests of the returned weights against a particular distribution. Do you know of an alternative package I can use for this, if we don't want to have scipy as a requirement?
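
For context, a minimal sketch of the kind of goodness-of-fit check being described, using scipy.stats.kstest (the helper name and significance level are illustrative, not taken from the PR):

from scipy import stats

def looks_normal(tensor, mean, std, alpha=0.0001):
    # Null hypothesis: the flattened weights were drawn from N(mean, std);
    # the check passes unless the Kolmogorov-Smirnov test rejects it.
    samples = tensor.view(-1).tolist()
    _, p_value = stats.kstest(samples, 'norm', args=(mean, std))
    return p_value > alpha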

@soumith
Contributor

soumith commented Feb 23, 2017

@alykhantejani you can keep the scipy parts in the tests. Do the following:

at the top of test_nn.py add these lines:

TEST_SCIPY = True
try:
    import scipy
except ImportError:
    TEST_SCIPY = False

Then, in the tests that reference scipy, add the annotation:

@unittest.skipIf(not TEST_SCIPY, "SCIPY unavailable")
Here's an example: https://github.com/pytorch/pytorch/blob/master/test/test_nn.py#L673

Then, here: https://github.com/pytorch/pytorch/blob/master/.travis.yml#L21
Add the line: - travis_retry pip install scipy
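
Put together, the pattern looks roughly like this (the test name and body below are illustrative, not the actual tests in this PR):

import unittest

TEST_SCIPY = True
try:
    import scipy
except ImportError:
    TEST_SCIPY = False


class TestNNInit(unittest.TestCase):

    @unittest.skipIf(not TEST_SCIPY, "SCIPY unavailable")
    def test_xavier_normal_distribution(self):
        # Anything that touches scipy lives inside the test body, so the
        # whole test is skipped when scipy cannot be imported.
        pass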

torch/nn/init.py Outdated
return tensor.normal_(0, std)


def kaiming_uniform(tensor, gain=1):

torch/nn/init.py Outdated
    return tensor
else:
    fan_in, _ = _calculate_fan_in_and_fan_out(tensor)
    std = gain * np.sqrt(1.0 / fan_in)

test/test_nn.py Outdated
expected_std = gain * np.sqrt(2.0 / ((tensor_shape[1] + tensor_shape[0]) * receptive_field))
assert self._is_normal(input_tensor, 0, expected_std)

def test_kaiming_uniform_errors_on_inputs_smaller_than_2d(self):

@alykhantejani
Contributor Author

@soumith The pip install scipy in .travis.yml seems to be failing for the 2.x builds. Any idea why?

@soumith
Contributor

soumith commented Feb 24, 2017

@colesbury
Member

colesbury left a comment

This looks pretty good. I think the most important thing is to remove the numpy dependency, since PyTorch does not currently require numpy.
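
One way to drop the numpy calls (e.g. the np.prod over trailing dimensions used below), assuming only the standard library, is a reduce over the tensor's sizes; the helper name here is hypothetical:

from functools import reduce
from operator import mul

def _receptive_field_size(tensor):
    # Product of all dimensions after the first two, i.e. the kernel
    # height * width (* depth, ...); replaces np.prod(tensor.numpy().shape[2:])
    # without importing numpy.
    return reduce(mul, tensor.size()[2:], 1)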

torch/nn/init.py Outdated
    return tensor
else:
    fan_in, _ = _calculate_fan_in_and_fan_out(tensor)
    std = gain * np.sqrt(1.0 / fan_in)

torch/nn/init.py Outdated
@@ -1,0 +1,240 @@
import numpy as np

torch/nn/init.py Outdated
"""Fills the input Tensor or Variable with values according to the method described in "Understanding the difficulty of training
deep feedforward neural networks" - Glorot, X. and Bengio, Y., using a uniform distribution.
The resulting tensor will have values sampled from U(-a, a) where a = gain * sqrt(2/(fan_in + fan_out))
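
As a minimal sketch of the formula that docstring describes (relying on the PR's _calculate_fan_in_and_fan_out helper; the function name and exact signature here are assumptions):

import math

def xavier_uniform_sketch(tensor, gain=1):
    # a = gain * sqrt(2 / (fan_in + fan_out)), then fill with U(-a, a).
    fan_in, fan_out = _calculate_fan_in_and_fan_out(tensor)
    a = gain * math.sqrt(2.0 / (fan_in + fan_out))
    return tensor.uniform_(-a, a)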

torch/nn/init.py Outdated
num_input_fmaps = tensor.size(1)
num_output_fmaps = tensor.size(0)
receptive_field_size = np.prod(tensor.numpy().shape[2:])
receptive_field_size = reduce(mul, (tensor.numpy().shape[2:]))

torch/nn/init.py Outdated
raise ValueError("Only tensors with 2 or more dimensions are supported.")

flattened_shape = (tensor.size(0), int(np.prod(tensor.numpy().shape[1:])))
flattened_shape = (tensor.size(0), int(reduce(mul, tensor.numpy().shape[1:])))

test/test_nn.py Outdated
if rows > cols:
assert np.allclose(np.dot(flattened_tensor.T, flattened_tensor), np.eye(cols) * gain ** 2,
atol=1e-6)
assert torch.dist(torch.mm(flattened_tensor.t(), flattened_tensor),

torch/nn/init.py Outdated
else:
    num_input_fmaps = tensor.size(1)
    num_output_fmaps = tensor.size(0)
    receptive_field_size = reduce(mul, (tensor.numpy().shape[2:]))

torch/nn/init.py Outdated
if isinstance(tensor, Variable):
    uniform(tensor.data, a=a, b=b)
    return tensor
else:

@alykhantejani
Contributor Author

alykhantejani commented Feb 26, 2017

I've addressed most of the review comments, but I think the following questions are still open:

  1. As @szagoruyko pointed out, there are two variants of Kaiming initialization: one divides by nInputPlane*kw*kh, the other by nOutputPlane*kw*kh (see here). The method in this PR divides by nInputPlane*kw*kh (fan_in). Do we want to support both? If so, what's the best way to modify the interface, i.e. should we take an additional param use_fan_out=False?

  2. Should we remove the gain parameter from the xavier_* functions? I initially added it because that's what the Lasagne implementation (used as a reference) has. However, I'm now not sure it's needed or useful.

  3. Is it still the preferred method to rebase and squash all commits? If so, I will do this once we are all happy (so no more commits go in)

@apaszke
Contributor

apaszke commented Feb 26, 2017

  1. I think adding a use_fan_out flag makes sense. But if Kaiming says that he used nOutputPlane, I think we should use that as the default.
  2. No strong opinions on that. I can't see the value, but I don't have a problem with it either. It's already implemented, so I think we can leave it.
  3. No need to squash yourself, we can do that on GitHub while merging the PR.

test/test_nn.py Outdated

fan_in = input_tensor.size(1)
if input_tensor.dim() > 2:
    fan_in *= input_tensor[0][0].numel()

@szagoruyko
Contributor

Regarding fan_out: fan_in stabilizes the forward activations, while fan_out stabilizes the gradients in the backward pass, so which one works better depends on the architecture.
I think we should have a string argument so that we can extend it in the future, e.g. TF also has fan_avg = (fan_in + fan_out) / 2.
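
A sketch of what such a string-based mode could look like (the helper name, the supported modes, and the error handling here are assumptions, not the final implementation):

def _calculate_correct_fan(tensor, mode='fan_in'):
    # Dispatch on a string so new modes (e.g. 'fan_avg') can be added later
    # without changing the signature of the kaiming_* initializers.
    fan_in, fan_out = _calculate_fan_in_and_fan_out(tensor)
    if mode == 'fan_in':
        return fan_in
    elif mode == 'fan_out':
        return fan_out
    elif mode == 'fan_avg':
        return (fan_in + fan_out) / 2.0
    else:
        raise ValueError("Unsupported mode: {}".format(mode))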

@apaszke
Contributor

apaszke commented Feb 26, 2017

String arguments sound good to me.

@alykhantejani
Contributor Author

@szagoruyko @apaszke I've now added a mode kwarg to the kaiming_* initializers.
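
For illustration, usage would then look something like this (assuming the kwarg accepts the 'fan_in' and 'fan_out' strings discussed above):

import torch
from torch.nn import init

w = torch.Tensor(64, 3, 3, 3)            # e.g. a conv2d weight
init.kaiming_uniform(w, mode='fan_in')   # stabilizes forward activations
init.kaiming_uniform(w, mode='fan_out')  # stabilizes backward gradients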

@apaszke apaszke closed this Mar 1, 2017
@apaszke apaszke reopened this Mar 1, 2017
@apaszke
Contributor

apaszke commented Mar 1, 2017

@pytorchbot add to whitelist

@apaszke apaszke merged commit 37e0548 into pytorch:master Mar 1, 2017
@apaszke
Contributor

apaszke commented Mar 1, 2017

Thank you!

@alykhantejani alykhantejani deleted the add_nn_init branch March 2, 2017 08:53
jjsjann123 pushed a commit to jjsjann123/pytorch that referenced this pull request May 19, 2021

Development

Successfully merging this pull request may close these issues:

weight initializations for conv2d and linear