Fix PReLU #3420
Conversation
rcurtin left a comment
Thank you! There are a lot of layers to adapt, and I appreciate that you took the time to adapt this one. :)
We should add an entry to HISTORY.md for this. 👍
    //! Set value of alpha to the one given by user.
    // TODO: this doesn't even make any sense. is it trainable or not?
    // why is there userAlpha? is that for initialization only?
This looks like a slightly frustrated comment I left some time ago 😄
We should probably figure it out now in this PR. :) I am not sure what the original intention of the code was---does the paper suggest that setting a controllable initial value of alpha is a good thing?
Also, this is unsafe: when a user deserializes a network from disk, this will overwrite alpha with whatever the userAlpha was. So either we need to ensure that userAlpha is always equal to what alpha(0) is when we serialize (and thus will also be equal when we deserialize), or find some other way to initialize it.
I know I am now leaving three comments here... sorry about that.
I realized the "clean" way to do this with the abstractions we have is to override CustomInitialize(); take a look at layer/layer.hpp or layer/batch_norm_impl.hpp for an example. That will allow setting the single weight of the network to userAlpha, and that will only be called when the network is being initialized---not when it's being just serialized or deserialized.
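For illustration, a header-side sketch of what that override could look like (only a sketch: the exact signature should be copied from layer.hpp, and everything beyond the alpha/userAlpha names used in this PR is an assumption):

```c++
//! Sketch: initialize the layer's single trainable weight (alpha) to the
//! user-supplied starting value.  The FFN calls this only when the network is
//! initialized, not when a model is serialized or deserialized.
void CustomInitialize(MatType& W, const size_t elements);
```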
According to the paper mentioned in the code, yes, alpha is a trainable parameter and userAlpha is just for initialization. I am not sure, but I don't think there is a need to use CustomInitialize(). In the last commit I used alpha just like ratio in layer/dropout.hpp, which is also trainable. Please guide me if I am not understanding this correctly.
This comment explains that the strategy here is incorrect. Please read it carefully and reference relevant code (and walk through the FFN code) if you don't believe me. We will need to use CustomInitialize().
    userAlpha(userAlpha)
    {
      alpha.set_size(WeightSize(), 1);
      alpha(0) = userAlpha;
In the new API this is actually not valid---when the layer object is constructed, its trainable weights are not valid. In fact even the size given by WeightSize() is not valid. Everything about the layer is invalid until ComputeOutputDimensions() and then SetWeightsPtr() is called. So, the process of setting the weight value in alpha can't be done until SetWeightsPtr().
Just to clarify, in the constructors, we do not need to deal with alpha at all---we can leave it empty. It will be set by SetWeights() before the network is used. 👍
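In other words, a constructor along these lines would be enough (a sketch only; the class, base class, and member names follow this PR's diff, the rest is an assumption):

```c++
// Sketch: only remember the user-requested initial value.  alpha stays empty
// here; it becomes a valid alias of the network's weight memory only after
// ComputeOutputDimensions() and SetWeights() have been called.
template<typename MatType>
PReLUType<MatType>::PReLUType(const double userAlpha) :
    Layer<MatType>(),
    userAlpha(userAlpha)
{
  // Intentionally empty: alpha is not touched in the constructor.
}
```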
rcurtin left a comment
Here are some clarifications of how the whole neural network infrastructure works; hopefully the comments are helpful in debugging the build issues. 👍
      throw std::invalid_argument("PReLUType::CustomInitialize(): wrong "
          "elements size!");
    }
    MakeAlias(alpha, W.memptr(), 1, 1);
Sorry if this is a little unclear, but consider the whole setup of the network to happen in two functions:
- SetWeights(): this function should call MakeAlias() to set the internally-held alpha matrix (which is a single element) to be an alias of whatever memory is given.
- CustomInitialize(): this function should set the single value of the alpha matrix to the initialization value given by the user (userAlpha). This function will be called after SetWeights() is called, so there is no need to create the alias here.
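A rough sketch of how those two pieces could fit together (the exact SetWeights() signature should be taken from layer.hpp; names follow the diff above and anything beyond that is an assumption):

```c++
// Sketch: SetWeights() only wires alpha up as an alias into the weight memory
// the FFN hands to this layer; it assigns no value.
template<typename MatType>
void PReLUType<MatType>::SetWeights(typename MatType::elem_type* weightsPtr)
{
  MakeAlias(alpha, weightsPtr, 1, 1);
}

// Sketch: CustomInitialize() writes the user-requested starting value into
// that memory.  It runs only during network initialization, not during
// serialization or deserialization.
template<typename MatType>
void PReLUType<MatType>::CustomInitialize(MatType& W, const size_t elements)
{
  if (elements != 1)
  {
    throw std::invalid_argument("PReLUType::CustomInitialize(): wrong "
        "elements size!");
  }

  W(0) = userAlpha;
}
```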
    ar(cereal::base_class<Layer<MatType>>(this));

    ar(CEREAL_NVP(userAlpha));
    ar(CEREAL_NVP(alpha));
There's no need to serialize alpha; the actual trained weights of the network (which includes alpha) are serialized by the FFN class. Here's how this works: when a network is deserialized (loaded), the FFN class will deserialize all the layers individually, and then deserialize the weights (all as one block), and then call SetWeights() on each layer to correctly set the layer weights. So you can see in that process that alpha will be correctly set in that case.
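So serialize() only needs to store the hyperparameter, roughly like this (a sketch following the diff above; the signature and version argument are assumptions based on how other layers do it):

```c++
// Sketch: only userAlpha is serialized here.  The trained value of alpha
// lives in the network's weight block, which the FFN class serializes itself
// and restores via SetWeights() when the model is loaded.
template<typename MatType>
template<typename Archive>
void PReLUType<MatType>::serialize(Archive& ar, const uint32_t /* version */)
{
  ar(cereal::base_class<Layer<MatType>>(this));
  ar(CEREAL_NVP(userAlpha));
}
```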
rcurtin left a comment
Thanks for the fixes. This all looks good to me, just some really minor final comments. 👍
    @@ -0,0 +1,96 @@
    /**
     * @file tests/ann/layer/parametric_relu.cpp
Do you want to also remove the unadapted PReLU tests from not_adapted/?
Co-authored-by: Ryan Curtin <[email protected]>
rcurtin left a comment
Awesome, thanks for adapting this layer! No more comments from my side. 👍
Second approval provided automatically after 24 hours. 👍
Thanks again! 👍
This puts PReLU back in working condition by applying the changes discussed in #2777.