
Conversation

@peterbell10
Collaborator

Fixes #32008

This is similar to @CaoZhongZ's patch which runs on all OpenMP threads in the team and selectively exits early to scale the number of threads active. I have also restored the if clause from before #26963 so that running on 1 thread should still avoid additional synchronisation.

One comment is that this does slightly change the meaning of at::get_num_threads inside a parallel_for loop, since it's not guaranteed that the function was called on that many threads. I've looked at the uses within ATen and couldn't see anything that would be problematic. There are a few places in quantized that seem to make this assumption, but they always use a grain size of 1, so they should be safe:

const int num_tasks = at::get_num_threads();
at::parallel_for(0, num_tasks, 1, [&](int64_t begin, int64_t end) {
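
For illustration, here is a minimal sketch of the scheme described above. The names (parallel_for_sketch and the local divup helper) are hypothetical and this is not the actual ATen implementation: every thread in the OpenMP team enters the region, surplus threads exit early so the number of active threads scales with the available work, and the if clause skips the parallel region when a single thread is enough.

#include <omp.h>
#include <algorithm>
#include <cstdint>
#include <functional>

// Hypothetical helper mirroring ATen's divup: integer division, rounded up.
inline int64_t divup(int64_t x, int64_t y) { return (x + y - 1) / y; }

void parallel_for_sketch(int64_t begin, int64_t end, int64_t grain_size,
                         const std::function<void(int64_t, int64_t)>& f) {
  int64_t range = end - begin;
  if (range <= 0) return;
  // The `if` clause avoids a parallel region (and its synchronisation) when
  // the range fits in a single grain or only one thread is available.
#pragma omp parallel if (range > grain_size && omp_get_max_threads() > 1)
  {
    // Cap the number of active threads by the amount of work available.
    int64_t num_threads = std::min<int64_t>(
        omp_get_num_threads(), divup(range, std::max<int64_t>(grain_size, 1)));
    int64_t tid = omp_get_thread_num();
    int64_t chunk_size = divup(range, num_threads);
    int64_t chunk_begin = begin + tid * chunk_size;
    if (chunk_begin < end) {  // surplus threads in the team simply exit early
      f(chunk_begin, std::min(end, chunk_begin + chunk_size));
    }
  }
}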

@ezyang
Contributor

ezyang commented Jan 31, 2020

@VitalyFedyunin do you think you could take a look? I added some other folks who might be interested too.

@ezyang added the triaged label (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module) on Feb 3, 2020
Contributor

@facebook-github-bot left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@ilia-cher left a comment

Could you add a comment for at::get_num_threads in ATen/Parallel.h noting the behavior there? Alternatively, we could probably change get_num_threads to get_max_threads to emphasize the upper bound.

@peterbell10
Collaborator Author

@ilia-cher I've added the comment, but noticed that at::get_num_threads actually calls omp_get_max_threads, so it was never guaranteed to be the actual number of threads launched anyway.
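
As a side note on the distinction mentioned here, the following small illustration (plain OpenMP, not ATen code) shows that omp_get_max_threads() is only an upper bound, while omp_get_num_threads() reports how many threads are actually executing the current parallel region:

#include <omp.h>
#include <cstdio>

int main() {
  // Upper bound on the team size for future parallel regions.
  printf("omp_get_max_threads(): %d\n", omp_get_max_threads());
#pragma omp parallel num_threads(2)
  {
#pragma omp single
    // Actual team size of the region we are currently in (typically 2 here).
    printf("omp_get_num_threads(): %d\n", omp_get_num_threads());
  }
  return 0;
}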

@dr-ci

dr-ci bot commented Feb 7, 2020

💊 CircleCI build failures summary and remediations

As of commit f8fc17a:

Commit f8fc17a was recently pushed. Waiting for builds...


This comment was automatically generated by Dr. CI.

Contributor

@facebook-github-bot left a comment

@VitalyFedyunin is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Contributor

@facebook-github-bot left a comment

@VitalyFedyunin has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot
Contributor

@VitalyFedyunin merged this pull request in 0808485.

ttumiel pushed a commit to ttumiel/pytorch that referenced this pull request Mar 4, 2020
Summary:
Fixes pytorch#32008

This is similar to CaoZhongZ's patch which runs on all OpenMP threads in the team and selectively exits early to scale the number of threads active. I have also restored the `if` clause from before pytorch#26963 so that running on 1 thread should still avoid additional synchronisation.

One comment is that this does slightly change the meaning of `at::get_num_threads` inside of a `parallel_for` loop since it's not guaranteed that the function was called on that many threads. I've looked at the uses within ATen and couldn't see anything that would be problematic. There are a few places in `quantized` that seem to make this assumption but they always use a grain size of 1 so should be safe:
https://github.com/pytorch/pytorch/blob/d9e99ab544cceaf346605db1af4a862197a107cd/aten/src/ATen/native/quantized/cpu/qconv.cpp#L436-L437
Pull Request resolved: pytorch#32875

Differential Revision: D19775823

Pulled By: VitalyFedyunin

fbshipit-source-id: 4f843b78cdb9e2766339590d728923786a00af6d
}

int64_t tid = omp_get_thread_num();
int64_t chunk_size = divup((end - begin), omp_get_num_threads());
Contributor

@imaginary-person Feb 27, 2021

@peterbell10, this line in the original code was incorrect.
Since the work is to be distributed among num_threads threads, this line should have been:
int64_t chunk_size = divup((end - begin), num_threads);

That's why @CaoZhongZ's workaround patch also delivered correct results.

Collaborator Author

@peterbell10

omp_get_num_threads() corresponds to the number of threads in the active parallel region. See these docs:
https://gcc.gnu.org/onlinedocs/libgomp/omp_005fget_005fnum_005fthreads.html

Specifically, note the reference to the NUM_THREADS clause:

At runtime, the size of the current team may be set either by the NUM_THREADS clause or by omp_set_num_threads.
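
A small worked illustration of this point, not taken from the ATen sources: when a region is launched with a num_threads clause, omp_get_num_threads() inside that region normally equals the requested team size, so dividing by it is equivalent to dividing by num_threads. The divup helper below is a hypothetical stand-in for the one in the snippet.

#include <omp.h>
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Hypothetical stand-in for ATen's divup: integer division, rounded up.
inline int64_t divup(int64_t x, int64_t y) { return (x + y - 1) / y; }

int main() {
  const int64_t begin = 0, end = 10;
  const int requested_threads = 4;
#pragma omp parallel num_threads(requested_threads)
  {
    int64_t tid = omp_get_thread_num();
    // With the num_threads clause above, this is normally divup(10, 4) == 3.
    int64_t chunk_size = divup(end - begin, omp_get_num_threads());
    int64_t chunk_begin = begin + tid * chunk_size;
    if (chunk_begin < end) {
      printf("thread %lld handles [%lld, %lld)\n",
             (long long)tid, (long long)chunk_begin,
             (long long)std::min(end, chunk_begin + chunk_size));
    }
  }
  return 0;
}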

Contributor

@imaginary-person Feb 27, 2021

Oh, thanks a lot! I'm learning OpenMP & this is helping a lot. I mistook it for omp_get_max_threads.


Labels

Merged · open source · triaged (This issue has been looked at by a team member, and triaged and prioritized into an appropriate module)


Development

Successfully merging this pull request may close these issues.

Memory Leak & Performance Decrease on AMD CPU with PyTorch >1.3

7 participants