
Adding maskrcnn model from torchvision #547

Closed

xuzhao9 wants to merge 38 commits into main from xz9/add-maskrcnn-cudagraph

Conversation

@xuzhao9
Contributor

@xuzhao9 xuzhao9 commented Nov 8, 2021

Add maskrcnn model from torchvision.
This is the first step towards adding dynamic shapes support for maskrcnn.

@xuzhao9 xuzhao9 changed the title [WIP] Adding maskrcnn model from torchvision Adding maskrcnn model from torchvision Nov 9, 2021
@xuzhao9 xuzhao9 requested a review from Krovatkin November 9, 2021 18:35
@xuzhao9
Contributor Author

xuzhao9 commented Nov 9, 2021

The eval GPU utilization looks terrible:

[profiler screenshot]

There is a big gap in train as well:

[profiler screenshot]

Looking into it.

@xuzhao9
Contributor Author

xuzhao9 commented Nov 11, 2021

After changing to the COCO 2017 data, the utilization for evaluation improves but still does not look great:

[profiler screenshot]

Train utilization looks much better, but there is still a gap at the beginning (possibly caused by data copying):

[profiler screenshot]

@xuzhao9
Contributor Author

xuzhao9 commented Nov 11, 2021

After adding data prefetch, the gap at the front is gone, but the gap at the end still exists for eval:

[profiler screenshot]

At the same time, train profiling looks terrific:

[profiler screenshot]
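The prefetch approach described above can be sketched roughly as follows. This is a hypothetical `Prefetcher` wrapper, not the PR's actual implementation: the idea is to stage the next batch's host-to-device copy on a side CUDA stream while the current batch is being computed, falling back to plain iteration on CPU.

```python
import torch

class Prefetcher:
    """Overlap host-to-device copies with compute via a side CUDA stream.

    Sketch only; the PR's actual prefetch code may differ. Falls back to
    plain iteration when no GPU is available.
    """

    def __init__(self, loader, device):
        self.loader = loader
        self.device = torch.device(device)
        self.stream = torch.cuda.Stream() if self.device.type == "cuda" else None

    def __iter__(self):
        if self.stream is None:
            for batch in self.loader:
                yield batch.to(self.device)
            return
        staged = None
        for batch in self.loader:
            # Issue the copy for the *next* batch on the side stream.
            with torch.cuda.stream(self.stream):
                nxt = batch.to(self.device, non_blocking=True)
            if staged is not None:
                yield staged
            # Make the compute stream wait for the staged copy to finish.
            torch.cuda.current_stream().wait_stream(self.stream)
            staged = nxt
        if staged is not None:
            yield staged
```

For non-blocking copies to actually overlap, the host-side batches should live in pinned memory (e.g. `pin_memory=True` on the DataLoader).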

@xuzhao9 xuzhao9 changed the title Adding maskrcnn model from torchvision [WIP] Adding maskrcnn model from torchvision Nov 11, 2021
Contributor

@Krovatkin Krovatkin left a comment


A few quick questions to understand this PR a bit better.

lr = 0.02
momentum = 0.9
weight_decay = 1e-4
params = [p for p in self.model.parameters() if p.requires_grad]
Contributor


Out of curiosity, is this some kind of an optimization? Wouldn't autograd ignore parameters that don't require gradients?

Contributor Author

@xuzhao9 xuzhao9 Nov 11, 2021


@Krovatkin I borrowed these default values from the torchvision reference code: https://github.com/pytorch/vision/blob/main/references/detection/train.py#L77. These parameters seem to be the defaults for torchvision maskrcnn training.
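For reference, the setup under discussion amounts to something like the following minimal sketch. The two-layer model here is purely illustrative (the PR uses torchvision's Mask R-CNN); the hyperparameters are the ones quoted above.

```python
import torch

# Illustrative stand-in model; the PR uses torchvision's maskrcnn_resnet50_fpn.
model = torch.nn.Sequential(torch.nn.Linear(4, 4), torch.nn.Linear(4, 2))
model[0].weight.requires_grad_(False)  # e.g. a frozen backbone parameter

# Defaults borrowed from torchvision's references/detection/train.py.
params = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.SGD(params, lr=0.02, momentum=0.9, weight_decay=1e-4)
```

The filter mainly keeps frozen parameters out of the optimizer's param group; as the follow-up discussion notes, `lr` itself still has to be passed explicitly.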

Member


I think we don't need this anymore, including in torchvision.

Contributor Author

@xuzhao9 xuzhao9 Nov 16, 2021


Test failed when removing these parameters (https://app.circleci.com/pipelines/github/pytorch/benchmark/2710/workflows/26c92ded-69b3-4cdc-8c35-3cc4ab5f8e13/jobs/2793): "ValueError: parameter group didn't specify a value of required optimization parameter lr", so they are still needed.

@xuzhao9 xuzhao9 requested a review from Krovatkin November 11, 2021 16:46
@fmassa
Member

fmassa commented Nov 11, 2021

Hi @xuzhao9

I believe the slowdown during inference of Mask R-CNN is due to this function.

Indeed, we perform many small operations in a for loop, which is not ideal for maximizing GPU utilization. In particular, I believe this line is the main culprit.

I think we can further optimize this function at the Python level by leveraging a single call to grid_sample, but it would require rewriting the function. I believe detectron2 contains a more optimized version of this function here.
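The "single call to grid_sample" idea can be illustrated in miniature (hypothetical shapes, not the torchvision function itself): a batched call produces the same result as a per-instance Python loop while launching far fewer kernels.

```python
import torch
import torch.nn.functional as F

n = 8  # number of predicted instance masks (hypothetical)
masks = torch.rand(n, 1, 28, 28)

# Identity transform per instance, just to have a sampling grid; the real
# code would build one grid per box from its paste coordinates.
theta = torch.eye(2, 3).repeat(n, 1, 1)
grid = F.affine_grid(theta, size=(n, 1, 56, 56), align_corners=False)

# One kernel launch for all instances...
batched = F.grid_sample(masks, grid, align_corners=False)

# ...versus n tiny launches in a Python loop.
looped = torch.cat(
    [F.grid_sample(masks[i : i + 1], grid[i : i + 1], align_corners=False)
     for i in range(n)]
)
```

Both paths compute the same values; the difference is purely in dispatch overhead, which is what dominates the trace here.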

Also, ccing @datumbox as this could be a good task to optimize the torchvision function

@eircfb
Contributor

eircfb commented Nov 11, 2021

> After adding data prefetch, the gap in the front is gone but the gap at the end still exists for eval:
>
> [profiler screenshot]
>
> At the same time, train profiling looks terrific:
>
> [profiler screenshot]

Question for model authors:

This still doesn't look great. Is this how the model is used in nature?

What can be done here to improve utilization? Is this from a lack of prefetch for the second model? It's no good to have that idle section at the end.

[edit: the whole inference looks right on the edge of being underutilized; do we need a different way to run this model?]

@eircfb
Contributor

eircfb commented Nov 11, 2021

> Hi @xuzhao9
>
> I believe the slowdown during inference of mask r-cnn is due to this function
>
> Indeed, we perform many small operations in a for loop, which is not ideal for maximizing GPU utilization. In particular, I believe this line is the main culprit.
>
> I think we can further optimize this function at the Python-level by leveraging a single call to grid_sample, but it would require rewriting the function. I believe detectron2 contains a more optimized version of this function in here
>
> Also, ccing @datumbox as this could be a good task to optimize the torchvision function

Right, launching all these tiny kernels against independent data. That's a perf error; it causes dispatch overhead to dominate. It can maybe be patched up by using CUDA graphs, but it's really not good.

Is this really how the model runs in nature?
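The CUDA graphs idea mentioned here can be sketched as follows. This is a hypothetical `make_graphed` helper, not code from the PR: CUDA graphs require static shapes and fixed tensor addresses, which Mask R-CNN's data-dependent shapes make awkward. On a CPU-only machine it simply falls back to eager execution.

```python
import torch

def make_graphed(fn, static_input):
    """Capture fn once into a CUDA graph, then replay it with a single
    dispatch instead of many tiny kernel launches. Sketch only.
    """
    if not torch.cuda.is_available():
        return fn  # eager fallback on CPU

    # Warm up on a side stream before capture, as CUDA graphs require.
    s = torch.cuda.Stream()
    s.wait_stream(torch.cuda.current_stream())
    with torch.cuda.stream(s):
        for _ in range(3):
            fn(static_input)
    torch.cuda.current_stream().wait_stream(s)

    graph = torch.cuda.CUDAGraph()
    with torch.cuda.graph(graph):
        static_output = fn(static_input)

    def replay(x):
        static_input.copy_(x)  # graphs read from the captured addresses
        graph.replay()
        return static_output

    return replay
```

Replaying a graph amortizes per-kernel launch cost, but it only helps when the captured workload's shapes never change between calls.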

@fmassa
Member

fmassa commented Nov 11, 2021

@eircfb I believe I've answered your questions in my answer just before; let me know if you have further questions. It's the post-processing steps that are the culprit here, and they can definitely be optimized (without having to resort to custom CUDA kernels).

This is normally how we use the model, yes.

@eircfb
Contributor

eircfb commented Nov 11, 2021

> @eircfb I believe I've answered your questions in my answer just before, let me know if you have further questions. It's the post-processing steps that are the culprit in here, and they can definitely be optimized (without having to resort to custom CUDA kernels)
>
> This is normally how we use the model, yes.

Right, we posted at the same time. Isn't that loop of tiny dispatches just for the first section? What causes the low utilization towards the end of the trace?

@fmassa
Member

fmassa commented Nov 12, 2021

@eircfb I believe the section I pointed out should correspond to the last part of the trace (although to be 100% sure I would need to check the full trace with further info, @xuzhao9 can you send it to me?)

For the first section with slowdown, I would need to check more carefully but my first guess would be that it comes from this section https://github.com/pytorch/vision/blob/0fa747eea4569f349fb0dacab44604b7b11c1a74/torchvision/models/detection/rpn.py#L253-L274

Maybe the slowdown could be attributed to this function being called instead of this one, but again, I would need to see the full trace to be sure

@xuzhao9
Contributor Author

xuzhao9 commented Nov 12, 2021

> @eircfb I believe the section I pointed out should correspond to the last part of the trace (although to be 100% sure I would need to check the full trace with further info, @xuzhao9 can you send it to me?)
>
> For the first section with slowdown, I would need to check more carefully but my first guess would be that it comes from this section https://github.com/pytorch/vision/blob/0fa747eea4569f349fb0dacab44604b7b11c1a74/torchvision/models/detection/rpn.py#L253-L274
>
> Maybe the slowdown could be attributed to this function being called instead of this one, but again, I would need to see the full trace to be sure

Here is the full trace of eval (bs==4, with prefetch): pytorch-benchmark-dev-0_1709.1636590532626.pt.trace.json.zip

@xuzhao9 xuzhao9 force-pushed the xz9/add-maskrcnn-cudagraph branch from 407ab50 to deadddc on November 15, 2021 19:12
@xuzhao9 xuzhao9 changed the title [WIP] Adding maskrcnn model from torchvision Adding maskrcnn model from torchvision Nov 15, 2021
@facebook-github-bot
Contributor

@xuzhao9 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@xuzhao9 xuzhao9 requested a review from fmassa November 16, 2021 01:25
@fmassa
Member

fmassa commented Nov 16, 2021

@xuzhao9 after looking at the full trace, my comments from before seem to be accurate.

The first slow-down is due to this part of the implementation. I have a gut feeling that this might be due to bad hyperparameters here, so if you could try always dispatching to this codepath, it would give hints as to whether we can easily fix this.

For the second (and biggest) slowdown, the problem is what I mentioned before about the mask pasting into images, and it would require re-writing the function in a better way.

@xuzhao9
Contributor Author

xuzhao9 commented Nov 16, 2021

> @xuzhao9 after looking at the full trace, my comments from before seems to be accurate.
>
> The first slow-down is due to this part of the implementation. I have a gut-feeling that this might be due to bad hyperparameters in here, so if you could try always dispatching to this codepath it would give hints if we can easily fix this.
>
> For the second (and biggest slowdown), the problem is what I mentioned before on the mask pasting in images, and would require re-writing the function in a better way

Thank you @fmassa! I think your explanations make sense (though I am not an expert). I think the code is in decent shape now and we are confident it represents common use cases. We can use it as a baseline to help further improve the model quality.

Please help review and stamp it so that we can proceed with developing dynamic shapes.

Thanks!

Member

@fmassa fmassa left a comment


Thanks!

I've made a few comments that could simplify the implementation (by removing unused functions), and also to mention that both TorchScript and CPU are supported.

Otherwise the rest LGTM, so I'm approving this; it can be merged as-is.

@xuzhao9 xuzhao9 force-pushed the xz9/add-maskrcnn-cudagraph branch from 9d43a62 to 69d3dcb on November 17, 2021 21:02
@facebook-github-bot
Contributor

@xuzhao9 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Comment on lines +89 to +90
if self.device != "cuda":
    raise NotImplementedError("CPU is not supported by this model")
Member


For my understanding, did you face any errors when running the code on the CPU?

Contributor Author

@xuzhao9 xuzhao9 Nov 18, 2021


Member


Note that during scripting, the model always returns (Losses, Detections) (as pointed out in the error message).

So you would need to change the training code to always unpack losses, detections = model(inputs), and go from there.
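The pattern described here, a scripted model whose forward always returns a (losses, detections) tuple, looks like this in miniature. The toy module below is a hypothetical stand-in for the torchvision detection model, just to show the unpacking.

```python
from typing import Dict, List, Tuple

import torch

class ToyDetector(torch.nn.Module):
    """Toy stand-in: like torchvision detection models under
    torch.jit.script, forward always returns (losses, detections)."""

    def forward(
        self, images: torch.Tensor
    ) -> Tuple[Dict[str, torch.Tensor], List[torch.Tensor]]:
        losses = {"loss_total": images.sum()}
        detections = [images]
        return losses, detections

scripted = torch.jit.script(ToyDetector())
# Training code must unpack both, even if it only uses the losses:
losses, detections = scripted(torch.ones(2, 3))
```

In eager mode torchvision returns losses in training and detections in eval, but TorchScript requires a single return type, hence the fixed tuple.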

@xuzhao9 xuzhao9 force-pushed the xz9/add-maskrcnn-cudagraph branch from 89ef243 to 15fb801 on November 22, 2021 18:24
@facebook-github-bot
Contributor

@xuzhao9 has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@xuzhao9 merged this pull request in 39b5b6d.

@xuzhao9 xuzhao9 deleted the xz9/add-maskrcnn-cudagraph branch December 10, 2021 00:42
@jdsgomes

> I've made a few comments that could simplify the implementation (by removing unused functions), and also to mention that both torchscript as well as CPU are supported.

I re-ran the profiling with the code changes from this PR, and indeed it improves the overall figure (visible in the middle part, where we are running nms).

After changes:
[profiler screenshot]

Before changes:
[profiler screenshot]
