PyTorch ThroughputBenchmark #20766

Conversation
soumith left a comment
this should be a separate binary / utility, outside of csrc -- from what I see it doesn't need to hook into any of the internals in any way.
Probably going into the benchmark folder as a separate utility?
pytorch/benchmarks indeed feels like a better place. You can make it a separate binary/library and link against libtorch.so. The reasoning is that it's more of a scaffolding around pytorch that doesn't require changes in the core (like e.g. the autograd profiler), so putting it separately is cleaner. Even if we were to put it in the same build, csrc is probably the wrong place for it. cc @gchanan for input too.
@dzhulgakov, @soumith, my motivation is that I would like any researcher to easily see the perf of their model in an inference-like setup. It is not specific to a model; it is a generic benchmarking utility. So yes, I would like it to be part of the same build. Then one doesn't need to recompile anything after they have developed a model and can just call this utility. How I see it, it should work this way both in OSS and internally. No issues with the pytorch/benchmarks folder specifically, but would it work the way I described above?
dzhulgakov left a comment
Hm, if we want to expose it in the main build, I guess it doesn't make sense to build it as a separate python extension.
The question becomes whether it cuts the bar for general usefulness (similarly to the autograd profiler) - @soumith @gchanan. But in that case we probably want to put it into some sub-namespace (torch.profiler? torch.benchmark? the autograd.profiler name is a bit unfortunate)
Putting things into a different python namespace makes sense to me; putting things into jit was mostly meant to trigger this discussion and figure out what the proper place is :) @soumith, @zdevito, @gchanan, given the goal I described above (making this easily available to any researcher / ml practitioner so they can test their model throughput without additional builds), should I just go ahead and create a new python namespace like torch.benchmark? I assume, in this case, the c++ code stays in the same place, but please let me know if there is a better place for it. autograd.profiler should likely also live there :)
We should probably provide a torch.profiler package as part of the main build, or in an easily installable package similar to torchvision. First, this is where the autograd profiler should live: it has long since become more generic than just profiling autograd. We can make torch.autograd.profiler alias torch.profiler for now. Second, a generic harness for timing the throughput and latency of a model under appropriate multi-core load is a useful enough tool that it seems to make sense going into torch.profiler. Note that this will enforce a higher bar of quality than if it were for our own use only, but that is probably a good thing, because we want to make it easier to accurately benchmark our models anyway. It can serve as a way to capture best practices for our external users. Example: the JIT requires warm-up (at least one iteration). I don't expect users to know this, so if they write their own benchmark harness they often report slower numbers.
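To illustrate the warm-up point, a minimal hand-rolled timing loop might look like the sketch below (the model and iteration counts are illustrative):

```python
import time
import torch

model = torch.jit.script(torch.nn.Linear(128, 64)).eval()
example = torch.randn(32, 128)

with torch.no_grad():
    # Warm-up: the JIT optimizes the graph during the first few
    # iterations, so timing them reports misleadingly slow numbers.
    for _ in range(10):
        model(example)

    iters = 1000
    start = time.perf_counter()
    for _ in range(iters):
        model(example)
    elapsed = time.perf_counter() - start

print(f"avg latency: {elapsed / iters * 1e3:.3f} ms")
```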
zdevito left a comment
Seems like a good first pass. If this is going to go into PyTorch proper and have python bindings, it needs to work with nn.Module, not just ScriptModules.
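As a sketch of what that could look like from the caller's side, assuming the ThroughputBenchmark API discussed in this PR (it eventually landed as torch.utils.ThroughputBenchmark), the same harness would accept either flavor of module:

```python
import torch
from torch.utils import ThroughputBenchmark

eager_model = torch.nn.Linear(128, 64).eval()
scripted_model = torch.jit.script(eager_model)

# The same harness accepts both eager nn.Modules and ScriptModules.
for model in (eager_model, scripted_model):
    bench = ThroughputBenchmark(model)
    bench.add_input(torch.randn(32, 128))  # same signature as model.forward
    stats = bench.benchmark(
        num_calling_threads=1,
        num_warmup_iters=10,
        num_iters=100,
    )
    print(type(model).__name__, stats.latency_avg_ms)
```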
There is precedent (and good design) for this. Take a look at the bottleneck utility: it was a tool aimed at telling users in a one-touch way what part of their code is a bottleneck -- without the user understanding how to use the profiler, or plugging profiler code deeper into their codebase.
similarly, we can probably aim for
the other thing I can think of is that the API needs to be improved beyond CPU throughput.
Force-pushed e10aa56 to 137be71
So here is a new version. I think I addressed all the API questions except the following:
I also didn't move the module location (it is currently under torch.jit for no particular reason); I intend to move it to torch.profiler as soon as we agree on this step for sure :) Update summary:
dzhulgakov left a comment
Looks nice, I'd defer to @gchanan on naming. My guess would be torch.profiler.ThroughputBenchmark
you could probably cast to something like std::chrono::duration<float, std::milli> (rep type first, then period), but I'm not sure how exactly it works
spoke with @zdevito; going to add support for nn.Module in order to be more generic.
the final location should be in ... I also think ...
Force-pushed 137be71 to 7cacabd
@dzhulgakov, sorry for the churn. I addressed XQ's feedback on the internal diff wrt CV usage; the python module part needs some update.
Force-pushed 7cacabd to b64057e
Force-pushed f815e7a to 69077d2
@bddppq pointed me to https://pybind11.readthedocs.io/en/stable/faq.html#someclass-declared-with-greater-visibility-than-the-type-of-its-field-someclass-member-wattributes - so it seems that pybind11 has this known issue. You can either declare your classes explicitly as ...
Force-pushed 69077d2 to f8871f0
Force-pushed f8871f0 to c8df563
Force-pushed c8df563 to 0630532
apaszke left a comment
There are some things that we should clean up and improve before this is merged. There's one place that might be running without the GIL, which is unsafe; cloneInputs doesn't clone inputs; etc.
If you plan to add things later, add them later. There's no need to have dead code in the codebase. It's not like adding this field will break BC.
nit: prefer using instead of typedef
facebook-github-bot left a comment
@salexspb is landing this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Hey @apaszke, I am going to kindly ask you to follow up offline with me about this PR, to make sure we don't have too much back and forth here, as I consider your last set of comments non-blocking, with all due respect. Kindly note that this PR has been outstanding for quite some time. I am going to respectfully land this for now, and if you are able to show me some drawbacks of the current approach, I will iterate on it later. I think I have already addressed feedback from four reviewers on this PR, who kindly spent a lot of time iterating on it for several weeks, while there was not a single comment from you here before. In my view, addressing comments from four reviewers who have been highly involved in this PR is sufficient for merging. The only reason I didn't merge this yet is the annoying visibility issues, which I have now fixed thanks to @dzhulgakov's and @bddppq's help. I am also going to be in NYC next week; not sure where you are based, but if you are there, happy to chat in person! We are also going to have some sessions there specifically designed to address extra back and forth on diffs, which was a BIG pain point for people in AI Dev Platform. This PR is included in the list! :) Please also note that I have made quite a few compromises here already. In my view, having support for nn.Module is not required for merging this, though I compromised and added it in the first version, which complicated things quite a bit and slowed this down.
Force-pushed 0630532 to b43a5f2
Force-pushed b43a5f2 to 99cda35
Summary: This is useful for measuring the inference performance of your models. This is a very basic benchmark for now: we don't support batching on the benchmark side, and no inter- and intra-op parallelism is supported yet, just caller-based parallelism. The main philosophy here is that the user should be able to provide inputs from Python and just stack them within the benchmark; the API should be exactly the same as passing inputs to module.forward. Pull Request resolved: pytorch#20766 Test Plan: Added a new unit test Differential Revision: D15435461 Pulled By: salexspb fbshipit-source-id: e8759c417faf6c52807d5a685eed48e023cd40e9
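For illustration, the caller-based parallelism described in that summary maps to something like the following (again assuming the torch.utils.ThroughputBenchmark API this PR eventually became; thread and iteration counts are illustrative):

```python
import torch
from torch.utils import ThroughputBenchmark

model = torch.jit.script(torch.nn.Linear(128, 64)).eval()

bench = ThroughputBenchmark(model)
# Register a pool of example inputs from Python; add_input takes
# exactly the arguments that module.forward would.
for _ in range(8):
    bench.add_input(torch.randn(32, 128))

# Caller-based parallelism: several calling threads replay the
# registered inputs against the module; the benchmark itself does
# not manage any inter- or intra-op parallelism.
stats = bench.benchmark(
    num_calling_threads=4,
    num_warmup_iters=100,
    num_iters=1000,
)
print(stats.latency_avg_ms, stats.iters_per_second)
```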
Force-pushed 99cda35 to 1cf0505
@salexspb this PR appears to have broken Windows builds on master :(
fyi, unlanding because it broke the Windows build.
Apparently I might've missed the error because it was in the "full logs" only. Will try to resend tonight.
revert of the revert: #22185 |