[DPER] Introduce barrier operation to force synchronization of threads in async execution#49322

Closed
kennyhorror wants to merge 1 commit intopytorch:masterfrom
kennyhorror:export-D24933471

@kennyhorror
Contributor

Summary:
In some cases, async execution might lose dependencies (for example, with Alias-like ops) or produce suboptimal scheduling when there is a choice of which parts to schedule first. An example of the latter can happen in ModelParallel training, where a copy can get lower priority than the rest of the execution on a given GPU, which causes other GPUs to starve.

This operator addresses these issues by introducing extra explicit dependencies between ops.
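
To illustrate the idea only, here is a minimal Python sketch of how a pass-through barrier can add dependencies when a scheduler derives its graph from blob names. This is not the code in this diff and not the Caffe2 API: the op names (`copy_to_gpu1`, `local_compute`, `next_layer`, `barrier`), the blob names, and the helpers `direct_deps` and `all_deps` are all hypothetical, and the assumption that the barrier reads and rewrites the same blobs in place is mine.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical op description: (name, input blobs, output blobs).
Op = Tuple[str, List[str], List[str]]


def direct_deps(ops: List[Op]) -> Dict[str, Set[str]]:
    """Derive each op's direct dependencies from the last writer of its inputs."""
    last_writer: Dict[str, str] = {}
    deps: Dict[str, Set[str]] = {}
    for name, inputs, outputs in ops:
        deps[name] = {last_writer[b] for b in inputs if b in last_writer}
        for blob in outputs:
            last_writer[blob] = name
    return deps


def all_deps(ops: List[Op], name: str) -> Set[str]:
    """Collect every op that must finish before `name` can start."""
    deps = direct_deps(ops)
    seen: Set[str] = set()
    stack = [name]
    while stack:
        for dep in deps[stack.pop()]:
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


# Without a barrier, next_layer only waits for local_compute, so the
# cross-GPU copy is free to be scheduled last.
without_barrier: List[Op] = [
    ("copy_to_gpu1", ["act_gpu0"], ["act_gpu1"]),
    ("local_compute", ["w"], ["h"]),
    ("next_layer", ["h"], ["out"]),
]

# Assumed barrier shape: it reads both blobs and rewrites them in place,
# so anything that reads "h" afterwards also waits for the copy.
with_barrier: List[Op] = [
    ("copy_to_gpu1", ["act_gpu0"], ["act_gpu1"]),
    ("local_compute", ["w"], ["h"]),
    ("barrier", ["act_gpu1", "h"], ["act_gpu1", "h"]),
    ("next_layer", ["h"], ["out"]),
]

print(sorted(all_deps(without_barrier, "next_layer")))
print(sorted(all_deps(with_barrier, "next_layer")))
```

In this sketch the barrier does no work of its own. Its only effect is that `next_layer` now transitively depends on `copy_to_gpu1`, which is the kind of extra explicit dependency described above.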

Test Plan:
Unit tests.
E2E testing in future diffs.

Reviewed By: xianjiec

Differential Revision: D24933471

@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D24933471

…s in async execution (pytorch#49322)

Summary:
Pull Request resolved: pytorch#49322

In some cases, async execution might lose dependencies (for example, with Alias-like ops) or produce suboptimal scheduling when there is a choice of which parts to schedule first. An example of the latter can happen in ModelParallel training, where a copy can get lower priority than the rest of the execution on a given GPU, which causes other GPUs to starve.

This operator addresses these issues by introducing extra explicit dependencies between ops.

Test Plan:
Unit tests.
E2E testing in future diffs.

Reviewed By: xianjiec

Differential Revision: D24933471

fbshipit-source-id: 18e29c0899a97183115339528dc5c3c8b090205a
@codecov

codecov bot commented Dec 15, 2020

Codecov Report

Merging #49322 (c6bb365) into master (5a5e576) will increase coverage by 0.00%.
The diff coverage is n/a.

@@           Coverage Diff           @@
##           master   #49322   +/-   ##
=======================================
  Coverage   80.56%   80.56%           
=======================================
  Files        1875     1875           
  Lines      202701   202701           
=======================================
+ Hits       163307   163309    +2     
+ Misses      39394    39392    -2     

@facebook-github-bot
Contributor

This pull request has been merged in 46debe7.

hwangdeyu pushed a commit to hwangdeyu/pytorch that referenced this pull request Jan 6, 2021
…s in async execution (pytorch#49322)

Summary:
Pull Request resolved: pytorch#49322

In some cases, async execution might lose dependencies (for example, with Alias-like ops) or produce suboptimal scheduling when there is a choice of which parts to schedule first. An example of the latter can happen in ModelParallel training, where a copy can get lower priority than the rest of the execution on a given GPU, which causes other GPUs to starve.

This operator addresses these issues by introducing extra explicit dependencies between ops.

Test Plan:
Unit tests.
E2E testing in future diffs.

Reviewed By: xianjiec

Differential Revision: D24933471

fbshipit-source-id: 1668994c7856d73926cde022378a99e1e8db3567