Fix PReLU #3420
Conversation
rcurtin left a comment
Thank you! There are a lot of layers to adapt, and I appreciate that you took the time to adapt this one. :)
We should add an entry to HISTORY.md for this. 👍
    //! Set value of alpha to the one given by user.
    // TODO: this doesn't even make any sense. is it trainable or not?
    // why is there userAlpha? is that for initialization only?
This looks like a slightly frustrated comment I left some time ago 😄
We should probably figure it out now in this PR. :) I am not sure what the original intention of the code was---does the paper suggest that setting a controllable initial value of alpha is a good thing?
Also, this is unsafe: when a user deserializes a network from disk, this will overwrite alpha with whatever the userAlpha was. So either we need to ensure that userAlpha is always equal to what alpha(0) is when we serialize (and thus will also be equal when we deserialize), or find some other way to initialize it.
I know I am now leaving three comments here... sorry about that.
I realized the "clean" way to do this with the abstractions we have is to override CustomInitialize(); take a look at layer/layer.hpp or layer/batch_norm_impl.hpp for an example. That will allow setting the single weight of the network to userAlpha, and that will only be called when the network is being initialized---not when it's being just serialized or deserialized.
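For illustration, a header-side sketch of what that override could look like (only a sketch: the exact signature should be copied from layer.hpp, and everything beyond the alpha/userAlpha names used in this PR is an assumption):

```c++
//! Sketch: initialize the layer's single trainable weight (alpha) to the
//! user-supplied starting value.  The FFN calls this only when the network is
//! initialized, not when a model is serialized or deserialized.
void CustomInitialize(MatType& W, const size_t elements);
```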
According to the paper mentioned in the code, yes, alpha is a trainable parameter and userAlpha is just for initialization. I am not sure, but I don't think there is a need to use CustomInitialize(). In the last commit I used alpha just like ratio in layer/dropout.hpp, which is also trainable. Please guide me if I am not understanding this correctly.
This comment explains that the strategy here is incorrect. Please read it carefully and reference relevant code (and walk through the FFN code) if you don't believe me. We will need to use CustomInitialize().
    userAlpha(userAlpha)
    {
      alpha.set_size(WeightSize(), 1);
      alpha(0) = userAlpha;
In the new API this is actually not valid---when the layer object is constructed, its trainable weights are not valid. In fact even the size given by WeightSize() is not valid. Everything about the layer is invalid until ComputeOutputDimensions() and then SetWeightsPtr() is called. So, the process of setting the weight value in alpha can't be done until SetWeightsPtr().
Just to clarify, in the constructors, we do not need to deal with alpha at all---we can leave it empty. It will be set by SetWeights() before the network is used. 👍
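In other words, a constructor along these lines would be enough (a sketch only; the class, base class, and member names follow this PR's diff, the rest is an assumption):

```c++
// Sketch: only remember the user-requested initial value.  alpha stays empty
// here; it becomes a valid alias of the network's weight memory only after
// ComputeOutputDimensions() and SetWeights() have been called.
template<typename MatType>
PReLUType<MatType>::PReLUType(const double userAlpha) :
    Layer<MatType>(),
    userAlpha(userAlpha)
{
  // Intentionally empty: alpha is not touched in the constructor.
}
```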
rcurtin left a comment
Here are some clarifications of how the whole neural network infrastructure works; hopefully the comments are helpful in debugging the build issues. 👍
      throw std::invalid_argument("PReLUType::CustomInitialize(): wrong "
          "elements size!");
    }
    MakeAlias(alpha, W.memptr(), 1, 1);
Sorry if this is a little unclear, but consider the whole setup of the network to happen in two functions:
- SetWeights(): this function should call MakeAlias() to set the internally-held alpha matrix (which is a single element) to be an alias of whatever memory is given.
- CustomInitialize(): this function should set the single value of the alpha matrix to the initialization value given by the user (userAlpha). This function will be called after SetWeights() is called, so there is no need to create the alias here.
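A rough sketch of how those two pieces could fit together (the exact SetWeights() signature should be taken from layer.hpp; names follow the diff above and anything beyond that is an assumption):

```c++
// Sketch: SetWeights() only wires alpha up as an alias into the weight memory
// the FFN hands to this layer; it assigns no value.
template<typename MatType>
void PReLUType<MatType>::SetWeights(typename MatType::elem_type* weightsPtr)
{
  MakeAlias(alpha, weightsPtr, 1, 1);
}

// Sketch: CustomInitialize() writes the user-requested starting value into
// that memory.  It runs only during network initialization, not during
// serialization or deserialization.
template<typename MatType>
void PReLUType<MatType>::CustomInitialize(MatType& W, const size_t elements)
{
  if (elements != 1)
  {
    throw std::invalid_argument("PReLUType::CustomInitialize(): wrong "
        "elements size!");
  }

  W(0) = userAlpha;
}
```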
    ar(cereal::base_class<Layer<MatType>>(this));

    ar(CEREAL_NVP(userAlpha));
    ar(CEREAL_NVP(alpha));
There's no need to serialize alpha; the actual trained weights of the network (which includes alpha) are serialized by the FFN class. Here's how this works: when a network is deserialized (loaded), the FFN class will deserialize all the layers individually, and then deserialize the weights (all as one block), and then call SetWeights() on each layer to correctly set the layer weights. So you can see in that process that alpha will be correctly set in that case.
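So serialize() only needs to store the hyperparameter, roughly like this (a sketch following the diff above; the signature and version argument are assumptions based on how other layers do it):

```c++
// Sketch: only userAlpha is serialized here.  The trained value of alpha
// lives in the network's weight block, which the FFN class serializes itself
// and restores via SetWeights() when the model is loaded.
template<typename MatType>
template<typename Archive>
void PReLUType<MatType>::serialize(Archive& ar, const uint32_t /* version */)
{
  ar(cereal::base_class<Layer<MatType>>(this));
  ar(CEREAL_NVP(userAlpha));
}
```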
rcurtin left a comment
Thanks for the fixes. This all looks good to me, just some really minor final comments. 👍
    @@ -0,0 +1,96 @@
    /**
     * @file tests/ann/layer/parametric_relu.cpp
Do you want to also remove the unadapted PReLU tests from not_adapted/?
Co-authored-by: Ryan Curtin <[email protected]>
rcurtin left a comment
Awesome, thanks for adapting this layer! No more comments from my side. 👍
Second approval provided automatically after 24 hours. 👍
Thanks again! 👍
This puts PReLU back in working condition by applying the changes discussed in #2777.