Skip to content

Conversation

@masnesral
Copy link
Contributor

@masnesral masnesral commented May 21, 2024

Stack from ghstack (oldest at bottom):

Summary:
Adds a "safe" parallel compile implementation that a) Popens a sub-process with an entry point we control, and b) Uses a ProcessPoolExecutor in that sub-processes to perform parallel compiles. This change essentially squashes these two implementations from jansel, but removes the "thread-based" approach since benchmarking revealed that compile-time performance was poor compared to the existing impl:
#124682
#122941

This PR adds the implementation, but defaults to the existing "fork". I'll submit a separate change to enable.

cc @voznesenskym @penguinwu @EikanWang @jgong5 @Guobing-Chen @XiaobingSuper @zhuhaozhe @blzheng @wenzhe-nrv @jiayisunx @peterbell10 @ipiszy @yf225 @chenyang78 @kadeng @muchulee8 @ColinPeppler @amjames @desertfire @chauhang

Summary:
Adds a "safe" parallel compile implementation that a) Popens a sub-process with an entry point we control, and b) Uses a ProcessPoolExecutor in that sub-processes to perform parallel compiles. This change essentially squashes these two implementations from jansel, but removes the "thread-based" approach since benchmarking revealed that compile-time performance was poor compared to the existing impl:
#124682
#122941

This PR adds the implementation, but defaults to the existing "fork". I'll submit a separate change to enable.

[ghstack-poisoned]
@pytorch-bot
Copy link

pytorch-bot bot commented May 21, 2024

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/126816

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (3 Unrelated Failures)

As of commit ecde0bf with merge base 0ff2f8b (image):

UNSTABLE - The following jobs failed but were likely due to flakiness present on trunk and has been marked as unstable:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

@masnesral masnesral requested review from eellison and jansel May 21, 2024 22:23
if (
os.environ.get("TORCH_TNT_IN_USE", "0") == "1"
or os.environ.get("TORCH_WARM_POOL", "1") != "1"
# TODO(masnesral): warm pool is broken, need to fix it
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jansel, I commandeered this TODO you had originally, but I'm not sure there's anything to actually do here? We're calling _warm_process_pool() on the ProcessPoolExecutor owned by the subproc in SubprocMain.init(). Or did you have some other kind of warmup?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was getting errors with this turned on. Does it pass for you?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm ok. It's passing for me, but looks like something weird is happening -- extra subprocesses being spawned, probably from the subprocess itself.

if (
os.environ.get("TORCH_TNT_IN_USE", "0") == "1"
or os.environ.get("TORCH_WARM_POOL", "1") != "1"
# TODO(masnesral): warm pool is broken, need to fix it
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was getting errors with this turned on. Does it pass for you?

Summary:
Adds a "safe" parallel compile implementation that a) Popens a sub-process with an entry point we control, and b) Uses a ProcessPoolExecutor in that sub-processes to perform parallel compiles. This change essentially squashes these two implementations from jansel, but removes the "thread-based" approach since benchmarking revealed that compile-time performance was poor compared to the existing impl:
#124682
#122941

This PR adds the implementation, but defaults to the existing "fork". I'll submit a separate change to enable.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
import sys
import typing

from torch._inductor.codecache import _set_triton_ptxas_path, caching_device_properties
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_set_triton_ptxas_path isn't defined in torch._inductor.codecache, do you mean the one defined in compile_tasks.py?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm; I guess it works because codecache.py imports that function. But yeah, will fix

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

python torch/_inductor/compile_worker/main.py doesn't seem to run for me if you just call it directly, not sure if there's some special way to call it so it passes today

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm perplexed. Seems to work fine for me:

$ python torch/_inductor/compile_worker/__main__.py --help
usage: __main__.py [-h] [--workers WORKERS] [--parent PARENT]

options:
  -h, --help         show this help message and exit
  --workers WORKERS
  --parent PARENT

(but making change anyway)

# Wrapper around ProcessPoolExecutor forks in a new process we control
pool = SubprocPool(config.compile_threads)
else:
# ensure properties have been calculated before processes
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: assert that the worker_start_method is in the other possible start_methods

Comment on lines +229 to +237
# Examples:
# A simple x + x + x script: 10ms seconds in the middle of the program, 2ms at startup
# tf_efficientnet_b0 benchmark: 50ms! in the middle of the program , 3ms at startup

# So we want to start the workers early when it is still cheap, and also to allow the workers to get
# ready before we have work for them.

# ProcessPoolExecutor also does not launch the workers until it finds a point when all the workers are idle.
# But if we waited until then fork time will be long and we will be waiting for the processes to initialize.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are these comments still relevant ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lemme add a TODO and I'll benchmark

Summary:
Adds a "safe" parallel compile implementation that a) Popens a sub-process with an entry point we control, and b) Uses a ProcessPoolExecutor in that sub-processes to perform parallel compiles. This change essentially squashes these two implementations from jansel, but removes the "thread-based" approach since benchmarking revealed that compile-time performance was poor compared to the existing impl:
#124682
#122941

This PR adds the implementation, but defaults to the existing "fork". I'll submit a separate change to enable.

cc voznesenskym penguinwu EikanWang jgong5 Guobing-Chen XiaobingSuper zhuhaozhe blzheng wenzhe-nrv jiayisunx peterbell10 ipiszy yf225 chenyang78 kadeng muchulee8 ColinPeppler amjames desertfire chauhang

[ghstack-poisoned]
@masnesral
Copy link
Contributor Author

@pytorchbot merge

@pytorch-bot pytorch-bot bot added the ciflow/trunk Trigger trunk jobs on your pull request label May 29, 2024
@pytorchmergebot
Copy link
Collaborator

Merge started

Your change will be merged once all checks pass (ETA 0-4 Hours).

Learn more about merging in the wiki.

Questions? Feedback? Please reach out to the PyTorch DevX Team

Advanced Debugging
Check the merge workflow status
here

Aidyn-A pushed a commit to tinglvv/pytorch that referenced this pull request May 30, 2024
Summary:
Adds a "safe" parallel compile implementation that a) Popens a sub-process with an entry point we control, and b) Uses a ProcessPoolExecutor in that sub-processes to perform parallel compiles. This change essentially squashes these two implementations from jansel, but removes the "thread-based" approach since benchmarking revealed that compile-time performance was poor compared to the existing impl:
pytorch#124682
pytorch#122941

This PR adds the implementation, but defaults to the existing "fork". I'll submit a separate change to enable.

Pull Request resolved: pytorch#126816
Approved by: https://github.com/jansel
@PaliC
Copy link
Contributor

PaliC commented May 31, 2024

@pytorchbot revert -m "causing internal regression" -c "ghfirst"

@pytorchmergebot
Copy link
Collaborator

@pytorchbot successfully started a revert job. Check the current status here.
Questions? Feedback? Please reach out to the PyTorch DevX Team

@pytorchmergebot
Copy link
Collaborator

Reverting PR 126816 failed

Reason: Command git -C /home/runner/work/pytorch/pytorch revert --no-edit 6c81856dcaab477e748aeda4c575b89f9e4dc4dd returned non-zero exit code 1

Auto-merging torch/_inductor/codecache.py
CONFLICT (content): Merge conflict in torch/_inductor/codecache.py
CONFLICT (modify/delete): torch/_inductor/compile_worker/__main__.py deleted in parent of 6c81856dcaa ([inductor] Add a subprocess-based parallel compile (#126816)) and modified in HEAD.  Version HEAD of torch/_inductor/compile_worker/__main__.py left in tree.
error: could not revert 6c81856dcaa... [inductor] Add a subprocess-based parallel compile (#126816)
hint: After resolving the conflicts, mark them with
hint: "git add/rm <pathspec>", then run
hint: "git revert --continue".
hint: You can instead skip this commit with "git revert --skip".
hint: To abort and get back to the state before "git revert",
hint: run "git revert --abort".
hint: Disable this message with "git config advice.mergeConflict false"
Details for Dev Infra team Raised by workflow job

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants