Skip to content

Conversation

@SherlockNoMad
Copy link
Contributor

@SherlockNoMad SherlockNoMad commented Oct 30, 2022

Stack from ghstack (oldest at bottom):

Fixes pytorch/torchdynamo#1802

There are a few problems,

  1. torch.fused_moving_avg_obs_fake_quant doesn't have OpInfo test
  2. self.empty_like() is not a valid call. it should be torch.empty_like(self)
  3. python meta function has some unexplained behavior for arguments with default value of bool type?

In particular, problem 3 is the most concerning one.
UPDATE: This is expected behavior, see discussion below for explanation.

Without setting the default value for per_row_fake_quant and symmetric_quant, it gets the following error when running with meta tensor.

meta__fused_moving_avg_obs_fq_helper() missing 2 required positional arguments: 'per_row_fake_quant' and 'symmetric_quant'

I can fix this by adding the default values to these two args. However, I observer something strange when examining the actual value in meta function.

    print("per_row_fake_quant", per_row_fake_quant)
    print("symmetric_quant", symmetric_quant)

When default values are False, printed value correctly reflect the args value populated from call site.
When default values are True, printed value is ALWAYS True, regardless of the populated value from call site.
When default Values are None, printed value is None when call site set the value to 'False', printed value is 'True' when call site sets the value to 'True'.

I also verify that this bug also affect for other meta function with default args....

My speculation is that this is something about pybind value packing when called from c++ dispatcher to python meta function, and default value parsing for python meta function (and other python dispatch functions) ?

I tried to find the c++ call stack, but gdb is missing symbols and C++ stacktrace is not working properly... Appreciate anyone who can point me to the source file for pybind value packing.

cc @ezyang
cc @bdhirsh. I know you had a fix in the symbolic shape branch...
cc @yanboliang who reported this bug

@pytorch-bot
Copy link

pytorch-bot bot commented Oct 30, 2022

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/88058

Note: Links to docs will display an error until the docs builds have been completed.

⏳ No Failures, 3 Pending

As of commit e3bd2a7:
💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@github-actions
Copy link
Contributor

This PR needs a label

If your changes are user facing and intended to be a part of release notes, please use a label starting with release notes:.

If not, please add the topic: not user facing label.

For more information, see https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

@ezyang
Copy link
Contributor

ezyang commented Oct 31, 2022

from WC:

image

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label Oct 31, 2022
Fixes pytorch/torchdynamo#1802 

There are a few problems, 
1. torch.fused_moving_avg_obs_fake_quant doesn't have OpInfo test
2. self.empty_like() is not a valid call. it should be torch.empty_like(self)
3. python meta function has some unexplained behavior for arguments with default value of bool type? 

In particular, problem 3 is the most concerning one.

Without setting the default value for `per_row_fake_quant` and `symmetric_quant`, it gets the following error when running with meta tensor. 
```
meta__fused_moving_avg_obs_fq_helper() missing 2 required positional arguments: 'per_row_fake_quant' and 'symmetric_quant'
```
I can fix this by adding the default values to these two args. However, I observer something strange when examining the actual value in meta function. 

```
    print("per_row_fake_quant", per_row_fake_quant)
    print("symmetric_quant", symmetric_quant)
```

When default values are False, printed value correctly reflect the args value populated from call site. 
When default values are True, printed value is ALWAYS True, regardless of the populated value from call site. 
When default Values are None, printed value is `None` when call site set the value to 'False', printed value is 'True' when call site sets the value to 'True'. 

I also verify that this bug also affect for other meta function with default args.... 

My speculation is that this is something about pybind value packing when called from c++ dispatcher to python meta function, and default value parsing for python meta function (and other python dispatch functions) ?

I tried to find the c++ call stack, but gdb is missing symbols and C++ stacktrace is not working properly... Appreciate anyone who can point me to the source file for pybind value packing. 

cc ezyang 
cc bdhirsh. I know you had a fix in the symbolic shape branch... 
cc yanboliang  who reported this bug

[ghstack-poisoned]
SherlockNoMad added a commit that referenced this pull request Oct 31, 2022
ghstack-source-id: c595ad3
Pull Request resolved: #88058
@SherlockNoMad SherlockNoMad changed the title Fix args with default value for pyimpl kernel Fix args for meta__fused_moving_avg_obs_fq_helper Oct 31, 2022
@SherlockNoMad
Copy link
Contributor Author

@pytorchbot merge -g

@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks on your PR pass since you used the green (-g) flag (ETA: 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Nov 5, 2022
Fixes pytorch/torchdynamo#1802

There are a few problems,
1. torch.fused_moving_avg_obs_fake_quant doesn't have OpInfo test
2. self.empty_like() is not a valid call. it should be torch.empty_like(self)
3. python meta function has some unexplained behavior for arguments with default value of bool type?

In particular, problem 3 is the most concerning one.
**UPDATE: This is expected behavior, see discussion below for explanation.**

Without setting the default value for `per_row_fake_quant` and `symmetric_quant`, it gets the following error when running with meta tensor.
```
meta__fused_moving_avg_obs_fq_helper() missing 2 required positional arguments: 'per_row_fake_quant' and 'symmetric_quant'
```
I can fix this by adding the default values to these two args. However, I observer something strange when examining the actual value in meta function.

```
    print("per_row_fake_quant", per_row_fake_quant)
    print("symmetric_quant", symmetric_quant)
```

When default values are False, printed value correctly reflect the args value populated from call site.
When default values are True, printed value is ALWAYS True, regardless of the populated value from call site.
When default Values are None, printed value is `None` when call site set the value to 'False', printed value is 'True' when call site sets the value to 'True'.

I also verify that this bug also affect for other meta function with default args....

My speculation is that this is something about pybind value packing when called from c++ dispatcher to python meta function, and default value parsing for python meta function (and other python dispatch functions) ?

I tried to find the c++ call stack, but gdb is missing symbols and C++ stacktrace is not working properly... Appreciate anyone who can point me to the source file for pybind value packing.

cc @ezyang
cc @bdhirsh. I know you had a fix in the symbolic shape branch...
cc @yanboliang  who reported this bug
Pull Request resolved: pytorch#88058
Approved by: https://github.com/bdhirsh, https://github.com/yanboliang
kulinseth pushed a commit to kulinseth/pytorch that referenced this pull request Dec 10, 2022
Fixes pytorch/torchdynamo#1802

There are a few problems,
1. torch.fused_moving_avg_obs_fake_quant doesn't have OpInfo test
2. self.empty_like() is not a valid call. it should be torch.empty_like(self)
3. python meta function has some unexplained behavior for arguments with default value of bool type?

In particular, problem 3 is the most concerning one.
**UPDATE: This is expected behavior, see discussion below for explanation.**

Without setting the default value for `per_row_fake_quant` and `symmetric_quant`, it gets the following error when running with meta tensor.
```
meta__fused_moving_avg_obs_fq_helper() missing 2 required positional arguments: 'per_row_fake_quant' and 'symmetric_quant'
```
I can fix this by adding the default values to these two args. However, I observer something strange when examining the actual value in meta function.

```
    print("per_row_fake_quant", per_row_fake_quant)
    print("symmetric_quant", symmetric_quant)
```

When default values are False, printed value correctly reflect the args value populated from call site.
When default values are True, printed value is ALWAYS True, regardless of the populated value from call site.
When default Values are None, printed value is `None` when call site set the value to 'False', printed value is 'True' when call site sets the value to 'True'.

I also verify that this bug also affect for other meta function with default args....

My speculation is that this is something about pybind value packing when called from c++ dispatcher to python meta function, and default value parsing for python meta function (and other python dispatch functions) ?

I tried to find the c++ call stack, but gdb is missing symbols and C++ stacktrace is not working properly... Appreciate anyone who can point me to the source file for pybind value packing.

cc @ezyang
cc @bdhirsh. I know you had a fix in the symbolic shape branch...
cc @yanboliang  who reported this bug
Pull Request resolved: pytorch#88058
Approved by: https://github.com/bdhirsh, https://github.com/yanboliang
@facebook-github-bot facebook-github-bot deleted the gh/SherlockNoMad/64/head branch June 8, 2023 14:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ciflow/trunk Trigger trunk jobs on your pull request Merged topic: not user facing topic category

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants