
Add support for ONNX-only and Caffe2 ONNX export #4120

Closed
thiagocrepaldi wants to merge 5 commits into facebookresearch:main from thiagocrepaldi:main

Conversation

@thiagocrepaldi
Contributor

@thiagocrepaldi thiagocrepaldi commented Apr 1, 2022

Summary of changes

This PR fixes both the ONNX-only and Caffe2-ONNX exporters for the latest versions of this repo and PyTorch.

For ONNX-only export, the main issue is that add_export_config(cfg) is not exposed when Caffe2 is not compiled along with PyTorch, even though ONNX-only scenarios do not need that dependency. Therefore, add_export_config is moved from detectron2/export/api.py to detectron2/export/__init__.py
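The optional-dependency pattern behind this move can be sketched with the standard library. This is an illustration of the guarding technique only, not detectron2's actual `__init__.py`; the helper name is made up:

```python
import importlib.util

def module_available(name: str) -> bool:
    """Check whether a top-level module can be imported, without importing it.

    Illustrative helper (not detectron2 code): Caffe2-dependent exporters
    are only exposed when Caffe2 is present, while add_export_config stays
    importable unconditionally.
    """
    return importlib.util.find_spec(name) is not None

# In an __init__.py, Caffe2-only symbols would sit behind the guard:
if module_available("caffe2"):
    pass  # e.g. import the Caffe2-dependent exporters here
# add_export_config would be imported here, outside the guard.
```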

A second contribution is a new test_export_onnx.py test file that exports almost the same set of models as the test_export_tracing.py tests.

For Caffe2-ONNX, the main issue was a dependency on an ONNX optimizer pass that is deprecated in newer ONNX versions.
This PR removes that dependency, because the fuse_bn_into_conv optimization pass is already performed by torch.onnx.export anyway.
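The folding that the exporter performs can be sketched as follows. This is a from-scratch illustration of the fuse_bn_into_conv math under standard BatchNorm semantics, not the torch.onnx.export implementation:

```python
import numpy as np

def fuse_bn_into_conv(w, b, gamma, beta, mean, var, eps=1e-5):
    """Fold a BatchNorm that follows a conv into the conv's parameters.

    BN(conv(x)) == conv'(x), where the conv weights are scaled per output
    channel by gamma / sqrt(var + eps) and the bias is shifted accordingly.
    w has shape (C_out, C_in, kH, kW); the BN stats are per output channel.
    """
    scale = gamma / np.sqrt(var + eps)        # per-output-channel scale
    w_fused = w * scale[:, None, None, None]  # scale each output filter
    b_fused = (b - mean) * scale + beta
    return w_fused, b_fused
```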

Fixes #3488
Fixes pytorch/pytorch#69674 (PyTorch repo)

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Apr 1, 2022
@thiagocrepaldi thiagocrepaldi changed the title Fix caffe2 export support Fix ONNX-only export support Apr 11, 2022
@thiagocrepaldi thiagocrepaldi changed the title Fix ONNX-only export support Add support for ONNX-only export Apr 12, 2022
@thiagocrepaldi thiagocrepaldi marked this pull request as ready for review April 12, 2022 19:45
@thiagocrepaldi thiagocrepaldi marked this pull request as draft April 13, 2022 00:47
@FrancescoMandru

Hi, any update on this merge request?

@thiagocrepaldi
Contributor Author

Hi, any update on this merge request?

This PR will probably need to be split in 2. One that fixes the PointRend model (and requires a new symbolic ::grid_sampler) and a second that fixes everything else. The former should take some time, but the second should be fairly quick

@FrancescoMandru

Hi, any update on this merge request?

This PR will probably need to be split in 2. One that fixes the PointRend model (and requires a new symbolic ::grid_sampler) and a second that fixes everything else. The former should take some time, but the second should be fairly quick

Hope this new feature comes soon! If I can help with anything, don't hesitate to contact me.

@FrancescoMandru

Hi, any update on this feature?

@thiagocrepaldi
Contributor Author

I am working on this today. If you know how to add new onnx symbolics to pytorch, you could try working on adding grid_sampler

@thiagocrepaldi
Contributor Author

#4132 breaks ONNX export. The root cause is related to how the to() symbolic is implemented, or how detectron2 calls to(device) in its code.

@thiagocrepaldi
Contributor Author

I am working on this today. If you know how to add new onnx symbolics to pytorch, you could try working on adding grid_sampler

pytorch/pytorch#76159 adds grid_sample to PyTorch, so we should have this working very soon

@thiagocrepaldi
Contributor Author

We need pytorch/pytorch#76498 to fix ONNX export after #4132 was merged

We also need pytorch/pytorch#76159 to add the ::grid_sampler symbolic

After both of these, this PR is ready to go
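For context, a missing symbolic like this can be supplied through the torch.onnx.register_custom_op_symbolic mechanism. The sketch below hardcodes the GridSample attributes for brevity; a complete grid_sampler symbolic must decode the mode/padding/align_corners arguments from the graph, and newer PyTorch versions ship this mapping natively:

```python
import torch
import torch.onnx

def grid_sampler_symbolic(g, input, grid, mode, padding_mode, align_corners):
    # Map aten::grid_sampler to the ONNX GridSample op (opset >= 16).
    # Attributes are hardcoded here for illustration only; a real symbolic
    # would read the constant mode/padding/align_corners inputs.
    return g.op("GridSample", input, grid,
                mode_s="bilinear", padding_mode_s="zeros",
                align_corners_i=0)

# "aten::grid_sampler" is the op that F.grid_sample lowers to in TorchScript.
torch.onnx.register_custom_op_symbolic(
    "aten::grid_sampler", grid_sampler_symbolic, opset_version=16
)
```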

@thiagocrepaldi
Contributor Author

@garymm somehow I cannot request another review from you through GH's UI, but I have addressed your comments

@thiagocrepaldi thiagocrepaldi marked this pull request as ready for review April 28, 2022 18:45

@garymm garymm left a comment


Thanks!

@thiagocrepaldi thiagocrepaldi force-pushed the main branch 4 times, most recently from 6087718 to 595eaef Compare April 28, 2022 20:39
@facebook-github-bot
Contributor

@thiagocrepaldi has updated the pull request. You must reimport the pull request before landing.

17 similar comments
The main root cause was that torch.nn.GroupNorm changed its behavior in
newer PyTorch versions (pytorch/pytorch#74293).
This PR fixes the RetinaNetHead according to the new behavior.

Also, add_export_config was not exposed when Caffe2
was not compiled along with PyTorch, preventing users with legacy
scripts from calling it. This PR exposes it and adds a warning message
informing users about its deprecation in future versions.

This PR also adds test_export_onnx.py with several unit tests for
detectron2 models.

ONNX is added as a dependency to the CI to allow the
aforementioned tests to run.

Lastly, because detectron2 CI still uses old PyTorch versions (1.8, 1.9
and 1.10), this PR adds custom symbolic functions to fix bugs in these
versions.
@facebook-github-bot
Contributor

@thiagocrepaldi has updated the pull request. You must reimport the pull request before landing.

@thiagocrepaldi
Contributor Author

Now I found this PR has many unreviewed changes and frankly I can see some of them are wrong. Could you move back to a state where it only contains reviewed changes? IIRC I only reviewed two PRs (for ONNX export and caffe2 export). FYI, @FrancescoMandru is not a developer of the project and just having their approval is not enough.

I fully understand that the delay in merging is not a good experience. But adding more changes to make a super huge PR is not helping. Let's just wait for maintainers when they have a chance.

Cheers @ppwwyyxx! Sorry for the trouble; my goal was to simplify the merge process by combining both approved PRs #4120/#4153, which were waiting to be merged. In the meantime, I found a couple of minor bugs that I decided to append so that we can provide an end-to-end ONNX export implementation that works out of the box.

Instead of reverting it, I've just rebased the PR, splitting it into 5 commits with a detailed explanation in each.
In summary, the first 2 are the same changes you approved before (for ONNX and Caffe2). The next three fix the ONNX export, remove unnecessary outputs from the ONNX graph (and the ONNX graph only), and force tracing of heatmaps_to_keypoints(), which is OK because ONNX uses the dynamic_axes argument to ensure the graph will accept dynamic inputs.

I kindly ask that you give reviewing them a shot, and if you still think I should split them again, I will do it.

Happy Friday!

@FrancescoMandru

@FrancescoMandru is not a developer of the project and just having their approval is not enough.

Definitely true, I'm trying to help @thiagocrepaldi as much as I can!

Thiago Crepaldi added 4 commits May 27, 2022 10:16
This commit fixes the Caffe2Tracer.export_caffe2 API for the latest PyTorch repo.

Before this change, all tests at tests/test_export_caffe2.py fail because
the latest onnx releases no longer have the onnx.optimizer namespace,
while detectron2 code tried to use fuse_bn_into_conv from it.

However, the fuse_bn_into_conv optimization previously present in onnx.optimizer
is already performed by torch.onnx.export. Therefore it was removed
from detectron2 code.

Depends on pytorch/pytorch#75718
Fixes facebookresearch#3488
Fixes pytorch/pytorch#69674 (PyTorch repo)
Although facebookresearch#4120 exported a valid ONNX graph, after running it with ONNX
Runtime, I realized that all model inputs were exported as constants.

The root cause was that at detectron2/structures/image_list.py, the
padding of batched images was done on a temporary (copy) variable,
so the ONNX tracer could not track the user input, leading it to
suppress the inputs and store them as constants during export.

Also, len(torch.Tensor) is traced as a constant shape, which poses a
problem if the next input has a different shape.
This PR fixes that by using torch.Tensor.shape[0] instead
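The len() pitfall can be reproduced in isolation with torch.jit.trace; this is a minimal sketch of the tracing behavior, not the image_list.py code:

```python
import torch

def with_len(x):
    # len(x) becomes a plain Python int during tracing, so the output
    # shape is baked into the graph as a constant.
    return torch.zeros(len(x))

def with_shape(x):
    # x.shape[0] is recorded as a size query on the input, so it stays
    # dynamic in the traced graph.
    return torch.zeros(x.shape[0])

example = torch.randn(3, 4)
traced_len = torch.jit.trace(with_len, example)
traced_shape = torch.jit.trace(with_shape, example)

# Calling the traced functions with a differently sized input shows the
# difference: the len() version keeps the original constant shape.
bigger = torch.randn(5, 4)
```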
TracingAdapter creates extra outputs (through flatten_to_tuple) to store
metadata needed to rebuild the original data format.

This is unnecessary during ONNX export, as the original data will never
be reconstructed to its original format through the Schema.__call__ API.
This PR suppresses such extra output constants during
torch.onnx.export() execution. Outside this API, the behavior is not
changed, ensuring BC.
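The flatten-plus-schema idea behind TracingAdapter can be sketched generically. This is a deliberately simplified illustration; detectron2's actual flatten_to_tuple and Schema handle many more container types (Instances, Boxes, etc.):

```python
def flatten_to_tuple(obj):
    """Flatten nested dicts/lists/tuples into a flat tuple of leaves plus a
    schema describing how to rebuild the original structure."""
    if isinstance(obj, dict):
        keys = sorted(obj)
        parts = [flatten_to_tuple(obj[k]) for k in keys]
        flat = tuple(x for f, _ in parts for x in f)
        return flat, ("dict", [(k, s) for k, (_, s) in zip(keys, parts)])
    if isinstance(obj, (list, tuple)):
        parts = [flatten_to_tuple(item) for item in obj]
        flat = tuple(x for f, _ in parts for x in f)
        return flat, ("seq", type(obj), [s for _, s in parts])
    return (obj,), "leaf"

def rebuild(flat, schema):
    """Inverse of flatten_to_tuple: consume the flat leaves per the schema."""
    it = iter(flat)
    def go(s):
        if s == "leaf":
            return next(it)
        if s[0] == "dict":
            return {k: go(sub) for k, sub in s[1]}
        _, typ, subs = s
        return typ(go(sub) for sub in subs)
    return go(schema)
```

The schema is the "metadata" that would otherwise become extra constant outputs in the exported graph; since an ONNX consumer never calls rebuild, those outputs can be dropped during export.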

Although not strictly necessary to achieve the same numerical results as
PyTorch, when an ONNX model schema is compared to PyTorch's, the different
number of outputs (the ONNX model will have more outputs than PyTorch) may
not only confuse users, but also result in false negatives when coding
model comparison helpers.
PyTorch ONNX export has some limitations when exporting models that use
torch.jit.script. This PR suppresses `torch.jit.script_if_tracing`
while torch.onnx.export is in execution. Outside this API context, the
original behavior is maintained to ensure BC.

This fixes ONNX export of
detectron2.modeling.roi_heads.KRCNNConvDeconvUpsampleHead
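One way to realize the suppression described here is to patch the decorator for the duration of the export. This is a hedged sketch of the idea only; detectron2's actual patch point and mechanism may differ:

```python
import contextlib
from unittest import mock

import torch

@contextlib.contextmanager
def disable_script_if_tracing():
    """Temporarily make torch.jit.script_if_tracing a no-op, so functions
    decorated inside this context are traced rather than scripted during
    torch.onnx.export.

    Caveat: this only affects decorators applied while the context is
    active; functions already wrapped at import time are unaffected, which
    is why a real implementation needs a more careful patch point.
    """
    with mock.patch.object(torch.jit, "script_if_tracing", lambda fn: fn):
        yield
```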
@facebook-github-bot
Contributor

@thiagocrepaldi has updated the pull request. You must reimport the pull request before landing.

Comment on lines +356 to +365
try:
    norm_name = str(type(get_norm(norm, 1)))
    if "BN" in norm_name:
        logger.warning(
            f"Shared BatchNorm (type={norm_name}) may not work well in RetinaNetHead."
        )
except ValueError:
    # https://github.com/pytorch/pytorch/pull/74293
    if isinstance(norm, str):
        logger.warning(f"No BatchNorm was found for '{norm}' for RetinaNetHead.")
Contributor


Please undo this change. There is no reason to ignore the exception.

Contributor

@ppwwyyxx ppwwyyxx left a comment


The first commit cb755fe#diff-362c43f324b3aaa4d55eae710f24245e361fd9be77cb2ecaded84672ef88b0d5R356-R365 looks good, but there is a problematic change that I don't remember having reviewed. Added a comment.

The second commit 8f86528 looks OK, but like I said, caffe2 is deprecated, so there is a chance that it may not get accepted. That should be a decision separate from the ONNX-only export, so it's better as a separate PR.

The third commit b352172 appears to be fixing something, but I cannot see what problem it's trying to fix, so I cannot accept it.
If it is trying to fix a caffe2 export correctness issue, please put it in the caffe2-onnx PR; in that case it's less likely to be accepted, because it's more work for a deprecated functionality.
If it's trying to fix an ONNX-only export correctness issue, please put it in this ONNX-only export PR and add a test to show what problems it's trying to fix.

Similarly, I don't understand what the 4th commit f108953 is trying to do, so I cannot accept it.
It seems to be about a user experience improvement, not a bug; in that case it should be a separate PR. Also, I expect to see a demonstration of the problem first, before jumping to solutions: there is a chance that the extra outputs are useful and should not be removed, but I cannot tell if I don't know what the problem is.

I cannot tell without running some tests myself whether the 5th commit 6fabe50 is correct or not. I strongly suspect it's incorrect, because the scripted function has a for loop with dynamic length, which I assumed ONNX tracing cannot handle. But I don't know about "dynamic_axes" and might be wrong here. Anyway, that's a conversation better had separately as well, because this does not affect the basic (fast/mask r-cnn) models.

In summary, all of the unreviewed changes will need more iterations. Having them together will slow down the merge.

@thiagocrepaldi
Contributor Author

Divided into 3 different PRs:

#4291
#4295
#4315


Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unable to export PointRend model (from detectron2) to ONNX
Exporting a model to ONNX; onnx.optimizer

5 participants