Fix #334: Improve the batch norm algorithm #339
zolkis wants to merge 1 commit into webmachinelearning:main from
Conversation
Note this is not supposed to build yet until #337 is merged.
Fixed the issue of initialization vs execution for batch norm.
I am preparing the PR for input() and constant(), and that changes this PR, too. Moving it back to WiP.
Small changes done, removing WiP. Can be reviewed/merged. @huningxin PTAL.
index.bs
Outdated
Normalize the tensor values of input features using [[Batch-Normalization]].
Batch normalization modifies the *mean* and *variance* of the inputs of a layer's activation functions, using parameters learned during the training phase over mini-batches of the training set. The goal is to reduce the internal covariate shift of the inputs, hence allowing higher learning rates and regularizing the inputs.
During training, a pair of scaling and biasing parameters *gamma* and *beta* are learned over the input distribution of the mini-batches of the training set. Also, a pair of *moving mean* and *moving variance* are calculated using a parameter called *momentum* and the averages of means and variances calculated over the mini-batches. During inference, the layer output is normalized using the learned *gamma* (used as *scale*) and *beta* (used as *bias*) parameters, as well as the *moving mean* (used as *mean*) and *moving variance* (used as *variance*) parameters.
Although it would be hard for me to precisely review the accuracy of these new statements about batchnorm, what I can say here is that this additional information is not necessarily needed when describing the behavior of this operation during model inference.
For instance, the paragraph starts with "Batch normalization modifies the mean and variance...". This isn't true for the inference pass. The mean and variance values are simply given as inputs to the inference pass of batchnorm. They were previously computed during the training pass, but WebNN only concerns itself with inference, not training, so how these values came to be may be irrelevant. The only reason the original text even mentions that the mean and variance values are previously computed is to draw a distinction with the similar operation instanceNormalization, where both value vectors are computed on the fly, within a single batch, during the inference pass.
My recommendation is to keep the original text as-is.
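To illustrate the reviewer's point, here is a minimal, hypothetical sketch (plain Python, not the WebNN API) of what the inference pass of batch normalization does: mean and variance arrive as precomputed inputs, and nothing is derived from the current batch. The function name and parameters are illustrative only.

```python
import math

# Hypothetical sketch of inference-time batch normalization.
# mean and variance are *inputs*, precomputed during training;
# the inference pass computes nothing from the current batch.
def batch_norm_inference(x, mean, variance, scale, bias, epsilon=1e-5):
    # y = (x - mean) / sqrt(variance + epsilon) * scale + bias
    inv_std = 1.0 / math.sqrt(variance + epsilon)
    return [(v - mean) * inv_std * scale + bias for v in x]
```

With epsilon set to 0 for a clean round-trip, an input of [1.0, 3.0] with mean 2.0 and variance 1.0 normalizes to [-1.0, 1.0].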
index.bs
Outdated
- *axis*: an {{unsigned long}} scalar. The index to the feature count dimension of the input shape for which the mean and variance values are. Its value must be in the range [0, N-1] where N is the rank of input tensor. When it's not specified, the default value is 1.
- *epsilon*: a {{float}} scalar. A small value to prevent computational error due to divide-by-zero. The default value is 0.00001 when not specified.
- *axis*: a {{long}} scalar, the index of the feature dimension of the input shape, which is collapsed for calculating the mean and variance for all other input dimensions. The value is between 0 and the rank of the input tensor. When it's not specified, the default value is 1, corresponding to the channel (*"c"*) dimension in the *"nchw"* data layout.
axis is an {{unsigned long}} type. I believe that's what we agreed on, and it's consistent with other operations that take the same param. Why do we change it here to long?
Also, please keep the existing description of the axis param. The new description doesn't read well to me. For instance, the index refers to the "feature count" dimension of the input shape, not the "feature" dimension. Nor is that dimension collapsed for the calculation of the mean and variance values. Those wordings just seem wrong.
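To make the axis semantics concrete, here is a hypothetical sketch (plain Python lists, not the WebNN API) of batch normalization over an "nchw"-shaped input with axis=1: the 1-D mean, variance, scale, and bias vectors each have length input.shape[axis], and each channel is normalized with its own per-channel values. All names are illustrative.

```python
import math

# Hypothetical sketch: batchnorm over an "nchw" input with axis=1
# (the channel dimension). mean/var/scale/bias are 1-D vectors whose
# length equals the size of the input's dimension at `axis`; they are
# broadcast across all the other dimensions.
def batch_norm_nchw(x, mean, var, scale, bias, eps=1e-5):
    n_, c_, h_, w_ = len(x), len(x[0]), len(x[0][0]), len(x[0][0][0])
    assert len(mean) == len(var) == len(scale) == len(bias) == c_
    return [[[[(x[n][c][h][w] - mean[c]) / math.sqrt(var[c] + eps)
               * scale[c] + bias[c]
               for w in range(w_)] for h in range(h_)]
             for c in range(c_)] for n in range(n_)]
```

For a tiny (1, 2, 1, 1) input whose two channel values equal their per-channel means, the output is all zeros.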
index.bs
Outdated
- *scale*: an {{MLOperand}}. The 1-D tensor (vector) of the scaling values (the *gamma* parameters learned during training), whose size is equal to the size of the input dimension indexed by {{MLBatchNormalizationOptions/axis}}.
- *bias*: an {{MLOperand}}. The 1-D tensor (vector) of the bias values (the *beta* parameter learned during training) whose size is equal to the size of the input dimension indexed by {{MLBatchNormalizationOptions/axis}}.
The additional text added here (about the gamma and beta params) isn't necessary and could well be confusing to readers, since neither is a parameter of the batchnorm API.
index.bs
Outdated
- *axis*: a {{long}} scalar, the index of the feature dimension of the input shape, which is collapsed for calculating the mean and variance for all other input dimensions. The value is between 0 and the rank of the input tensor. When it's not specified, the default value is 1, corresponding to the channel (*"c"*) dimension in the *"nchw"* data layout.
- *scale*: an {{MLOperand}}. The 1-D tensor (vector) of the scaling values (the *gamma* parameters learned during training), whose size is equal to the size of the input dimension indexed by {{MLBatchNormalizationOptions/axis}}.
- *bias*: an {{MLOperand}}. The 1-D tensor (vector) of the bias values (the *beta* parameter learned during training) whose size is equal to the size of the input dimension indexed by {{MLBatchNormalizationOptions/axis}}.
- *epsilon*: a {{float}} scalar. A small value added to the variance of the batch in order to prevent computational error due to divide-by-zero. When it's not specified, the default value is 0.00001.
According to the batchnorm calculations, the epsilon value is added to the square of the corresponding variance value of the mini-batch being inspected. I think this is extra information that may not be needed in the description of epsilon, because the spec already provides a reference to the full paper, which includes all the calculations relevant to batchnorm. Moreover, the additional wording introduced here may, in my opinion, cause more confusion: it sounds as if each of the precomputed variance values is adjusted by the epsilon amount, which would be incorrect from the mathematical aspect of the behavior. My recommendation is to keep the existing wording.
index.bs
Outdated
<div class="note">
The axis value designates the feature or channel count dimension of the input tensor.
</div>
This is redundant with what is already stated above this section.
<div class="informalsteps">
1. If |options|.activation [=map/exists=], the implementation MAY use it to fuse it with the batch normalization.
</div>
If we feel compelled to add this note, there may be a better way to express it. The current wording doesn't read well because it doesn't explain what "fuse" means in this context, or what "it" refers to. For example, a more complete wording may be: "At graph execution time, the implementation may choose to combine the execution of the standard batch normalization operation with the optional activation function provided by options.activation, to improve the overall execution performance of the combined operation."
I agree, but I think the use of "fuse" should be handled in a separate issue/PR, consistently over the whole spec.
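To clarify what fusion buys in this discussion, here is a hypothetical sketch (plain Python, not the WebNN API) contrasting the unfused two-pass form with a fused single-pass form. Both produce the same result; the fused version traverses the data once. All names are illustrative.

```python
import math

def relu(v):
    # A simple activation used for the demonstration.
    return max(v, 0.0)

def batch_norm_then_activation(x, mean, var, scale, bias, eps, act):
    # Unfused: two passes over the data (normalize, then activate).
    normalized = [(v - mean) / math.sqrt(var + eps) * scale + bias for v in x]
    return [act(v) for v in normalized]

def fused_batch_norm_activation(x, mean, var, scale, bias, eps, act):
    # Fused: a single pass, applying the activation immediately
    # to each normalized element.
    inv_std = 1.0 / math.sqrt(var + eps)
    return [act((v - mean) * inv_std * scale + bias) for v in x]
```

An implementation could pick either form at graph execution time; the observable output is identical, which is why fusion is purely a performance choice.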
I will reboot this PR completely. Marking WiP.
Force-pushed: d74b0f2 to 710e5ba
I have reset to the original text, and kept the new/moved text.
Force-pushed: 710e5ba to 24fbd8d
Signed-off-by: Zoltan Kis <[email protected]>
Force-pushed: 24fbd8d to 0eea13e
@wchao1115 PTAL - reduced the branch to one clean commit.
Requested changes have been done in 0eea13e.
Closing this PR, as it's moved to #397.
A draft PR for now. Needs review, though.
Depends on prior merge of #337 and possibly #329.
Please make general comments and discussion in #334, and syntax-related comments in this PR.