
Conversation

@suyashb95
Contributor

@suyashb95 suyashb95 commented Oct 22, 2019

@yf225 yf225 left a comment

@suyash458 Thanks so much for the awesome work! I left some comments.

Contributor

@suyash458 Thanks so much for the awesome work! I think we can change the arguments in DropoutOptions to include p and inplace instead of rate to match the Python version better. And we can have using AlphaDropoutOptions = DropoutOptions; and then use AlphaDropoutOptions everywhere in the AlphaDropout code :D
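For reference, a minimal sketch of what that could look like (the TORCH_ARG-based layout mirrors the existing options headers; the exact constructor and defaults here are only illustrative):

#include <torch/arg.h>

/// Options shared by the dropout modules (sketch).
struct DropoutOptions {
  /* implicit */ DropoutOptions(double p = 0.5) : p_(p) {}

  /// The probability with which each element is zeroed out.
  TORCH_ARG(double, p) = 0.5;

  /// Whether the operation should modify the input tensor in place.
  TORCH_ARG(bool, inplace) = false;
};

/// AlphaDropout reuses the same options type.
using AlphaDropoutOptions = DropoutOptions;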

Contributor Author

@yf225 Thanks for having a look! Adding p and inplace makes sense, I'll add the extra params to DropoutOptions. This'll affect DropoutImpl and FeatureDropoutImpl as well

Contributor

Yes, and I think we should rename rate to p in DropoutOptions; it is okay to break backward compatibility in favor of code simplicity and parity with the Python API.

Contributor

I believe we can write

Suggested change
- inline Tensor alpha_dropout(const Tensor& input, const DropoutOptions& options, bool is_training) {
+ inline Tensor alpha_dropout(const Tensor& input, const AlphaDropoutOptions& options = {}, bool training = false) {

to match the Python names better and to be more consistent with the naming of other functional options :D
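Under that signature, calls like the following would then be possible (illustrative; F is assumed to be the usual alias for torch::nn::functional):

namespace F = torch::nn::functional;

auto y1 = F::alpha_dropout(x); // defaults: p = 0.5, inplace = false, training = false
auto y2 = F::alpha_dropout(x, AlphaDropoutOptions(0.2).inplace(true), /*training=*/true);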

Contributor

It would be awesome to express the same logic as in the Python version:

    if p < 0. or p > 1.:
        raise ValueError("dropout probability has to be between 0 and 1, "
                         "but got {}".format(p))
    return (_VF.alpha_dropout_(input, p, training)
            if inplace
            else _VF.alpha_dropout(input, p, training))
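A C++ counterpart could look roughly like this (a sketch only; it assumes the AlphaDropoutOptions discussed above plus the torch::alpha_dropout / torch::alpha_dropout_ overloads from Functions.h, and takes input by value so the in-place overload can bind to it):

inline Tensor alpha_dropout(Tensor input, const AlphaDropoutOptions& options = {}, bool training = false) {
  TORCH_CHECK(
    options.p() >= 0 && options.p() <= 1,
    "dropout probability has to be between 0 and 1, but got ", options.p());
  // Mirror the Python dispatch: use the in-place variant when requested.
  return options.inplace()
    ? torch::alpha_dropout_(input, options.p(), training)
    : torch::alpha_dropout(input, options.p(), training);
}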

Contributor Author

@suyashb95 suyashb95 Oct 24, 2019

Just wondering, wouldn't this bit of code in dropout.cpp take care of the check for the p value?

DropoutImplBase<Derived>::DropoutImplBase(const DropoutOptions& options_)
    : options(options_) {
  TORCH_CHECK(options.rate() >= 0, "Dropout rate must not be less than zero");
  TORCH_CHECK(options.rate() <= 1, "Dropout rate must not be greater than one");
}

Contributor

Yes I think it would take care of the check when the user is calling F::alpha_dropout from the AlphaDropout module, but the user might also call F::alpha_dropout directly, and they would be out of luck :D

Contributor Author

Right, I've added the same checks in the F::alpha_dropout function

Contributor

I believe this can be

Suggested change
- return torch::alpha_dropout(input, options.rate(), this->is_training());
+ return F::alpha_dropout(input, options.p(), this->is_training());

to match the Python version even better :D

Contributor

We might want to print both p and inplace, to match the Python version even better :D
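For instance (a sketch; the exact output format is only illustrative):

void AlphaDropoutImpl::pretty_print(std::ostream& stream) const {
  stream << "torch::nn::AlphaDropout(p=" << options.p()
         << ", inplace=" << options.inplace() << ")";
}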

Contributor

Thanks so much for the awesome tests! It would be great if we could also add a test for the F::alpha_dropout(input) (i.e. no value passed for either options or training) use case :D
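Such a test could look roughly like this (a sketch; it assumes the existing FunctionalTest fixture and that training defaults to false, in which case the functional should be a pass-through):

TEST_F(FunctionalTest, AlphaDropoutDefaultOptions) {
  auto input = torch::randn(100);
  auto output = F::alpha_dropout(input); // no options, no training flag passed
  // With training == false the input should come back unchanged.
  ASSERT_TRUE(torch::allclose(input, output));
}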

Contributor Author

Ah right, I missed that. Will add a test case for it

Contributor

Thanks a lot for your help!

Contributor

It would be awesome to add a module test similar to

TEST_F(ModulesTest, Dropout) {
  Dropout dropout(0.5);
  torch::Tensor x = torch::ones(100, torch::requires_grad());
  torch::Tensor y = dropout(x);
  y.backward(torch::ones_like(y));
  ASSERT_EQ(y.ndimension(), 1);
  ASSERT_EQ(y.size(0), 100);
  ASSERT_LT(y.sum().item<float>(), 130); // Probably
  ASSERT_GT(y.sum().item<float>(), 70); // Probably
  dropout->eval();
  y = dropout(x);
  ASSERT_EQ(y.sum().item<float>(), 100);
}

Contributor Author

I have added a test similar to the one for Dropout. From what I read in this paper, the sum of the output should lie between 40 and 130:

New Dropout Technique. Standard dropout randomly sets an activation x to zero with probability 1 − q for 0 < q ≤ 1. In order to preserve the mean, the activations are scaled by 1/q during training. If x has mean E(x) = μ and variance Var(x) = ν, and the dropout variable d follows a binomial distribution B(1, q), then the mean E((1/q)dx) = μ is kept. Dropout fits well to rectified linear units, since zero is in the low variance region and corresponds to the default value. For scaled exponential linear units, the default and low variance value is lim_{x→−∞} selu(x) = −λα = α′. Therefore, we propose “alpha dropout”, that randomly sets inputs to α′. The new mean and new variance is E(xd + α′(1 − d)) = qμ + (1 − q)α′, and Var(xd + α′(1 − d)) = q((1 − q)(α′ − μ)² + ν). We aim at keeping mean and variance to their original values after “alpha dropout”, in order to ensure the self-normalizing property even for “alpha dropout”. The affine transformation a(xd + α′(1 − d)) + b allows to determine parameters a and b such that mean and variance are kept to their values: E(a(xd + α′(1 − d)) + b) = μ and Var(a(xd + α′(1 − d)) + b) = ν. In contrast to dropout, a and b will depend on μ and ν, however our SNNs converge to activations with zero mean and unit variance. With μ = 0 and ν = 1, we obtain a = (q + α′²q(1 − q))^(−1/2) and b = −(q + α′²q(1 − q))^(−1/2)(1 − q)α′. The parameters a and b only depend on the dropout rate 1 − q and the most negative activation α′.
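Concretely, an AlphaDropout counterpart of the Dropout test quoted above could look like this (a sketch; the 40/130 bounds are the ones mentioned in this comment):

TEST_F(ModulesTest, AlphaDropout) {
  AlphaDropout alpha_dropout(0.5);
  torch::Tensor x = torch::ones(100, torch::requires_grad());
  torch::Tensor y = alpha_dropout(x);
  y.backward(torch::ones_like(y));
  ASSERT_EQ(y.ndimension(), 1);
  ASSERT_EQ(y.size(0), 100);
  ASSERT_LT(y.sum().item<float>(), 130); // Probably
  ASSERT_GT(y.sum().item<float>(), 40);  // Probably
  alpha_dropout->eval();
  y = alpha_dropout(x);
  ASSERT_EQ(y.sum().item<float>(), 100);
}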

Contributor

Thanks so much for adding the test! :D

Contributor

nit: would be awesome to put this line above #include <torch/nn/functional/embedding.h>, following alphabetical order

@suyashb95
Contributor Author

@yf225 I have added another commit with the required changes, lemme know if they look good!

@yf225 yf225 left a comment

@suyash458 Thanks so much for the update! I left some minor comments.

Contributor

Thanks a lot for your help!

Contributor

To strictly match the Python version, maybe we could do the following instead:

TORCH_CHECK(
  options.p() >= 0 && options.p() <= 1,
  "dropout probability has to be between 0 and 1, but got ", options.p());

Contributor

In the Python version, I think we execute different code paths based on the value of options.inplace():

    return (_VF.alpha_dropout_(input, p, training)
            if inplace
            else _VF.alpha_dropout(input, p, training))

It would be awesome to do the same in the C++ version as well (we might need to change the type of input from const Tensor& to Tensor in order to achieve this) :D

Contributor Author

In Functions.h I see two declarations for alpha_dropout

static inline Tensor & alpha_dropout_(Tensor & self, double p, bool train) {...}
static inline Tensor alpha_dropout(const Tensor & self, double p, bool train) {...}

Guessing we have to use the first one for in-place operation?

Contributor

nit: we can remove the new line here

Contributor

I think we can put the default value for inplace as part of the TORCH_ARG declaration instead, something like

TORCH_ARG(bool, inplace) = false;

instead of specifying it as a default value for the constructor argument. (Ideally we should do the same for p as well, but there are probably too many people using Dropout(0.5) directly, and we have to preserve it for backward compatibility.)

Contributor Author

Makes sense, I'll change the TORCH_ARG declaration

Contributor

To strictly match the Python version, maybe we could do the following instead:

TORCH_CHECK(
  options.p() >= 0 && options.p() <= 1,
  "dropout probability has to be between 0 and 1, but got ", options.p());

Contributor

Thanks so much for adding the test! :D

@yf225
Contributor

yf225 commented Oct 31, 2019

We will probably need to rebase the PR because of the conflicts :(

 - add `AlphaDropoutImpl` to `modules/dropout.h` and `modules/dropout.cpp`
 - add `functional/dropout.h` containing the `alpha_dropout` function
 - include `functional/dropout.h` in `nn/functional.h`
 - update `DropoutOptions` to have params `p` and `inplace` instead of `rate`
 - update `DropoutImpl` and `FeatureDropoutImpl` to use the new `DropoutOptions`
 - add `AlphaDropoutOptions` for use in `AlphaDropoutImpl`
 - update `torch::alpha_dropout` call to `F::alpha_dropout` in `modules/dropout.cpp`
 - add checks in `functional/dropout.cpp` to check the dropout rate
 - add functional/module/prettyprint tests and update existing `Dropout` tests to work with the new `DropoutOptions`
@suyashb95
Contributor Author

Rebasing to resolve conflicts in test files

@suyashb95
Contributor Author

@yf225 is the current build failing on Linux? I get an error saying:

../caffe2/opt/onnxifi_op.cc: In function ‘void caffe2::{anonymous}::copyDescriptor(const caffe2::ExternalTensorDescriptor*, onnxTensorDescriptorV1*)’:
../caffe2/opt/onnxifi_op.cc:95:7: error: ‘onnxTensorDescriptorV1 {aka struct onnxTensorDescriptorV1}’ has no member named ‘isOffline’
   to->isOffline = from->isOffline;

Here's the complete stack trace

@suyashb95
Contributor Author

@yf225 lemme know if this looks fine, I'll rebase to fix the conflicts then

@facebook-github-bot facebook-github-bot left a comment

@yf225 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@yf225 yf225 left a comment

@suyash458 Thanks a lot for the awesome work!

@yf225 yf225 changed the title from "C++/Python API Parity: add AlphaDropout" to "C++/Python API Parity: add AlphaDropout and FeatureAlphaDropout" on Nov 19, 2019
@suyashb95
Contributor Author

> @suyash458 Thanks a lot for the awesome work!

Glad I could help!

@facebook-github-bot
Contributor

@yf225 merged this pull request in e88d096.

@yf225 yf225 added the "module: cpp" (Related to C++ API) label on Nov 20, 2019