-
Notifications
You must be signed in to change notification settings - Fork 26.3k
Lightweight at-most-once logging for API usage #20698
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
facebook-github-bot
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dzhulgakov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
| * | ||
| * In order to ensure light-weightness of logging, we utilize static variable | ||
| * trick - LogAPIUsage will be invoked only once and further invocations will | ||
| * just do an atomic check. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an atomic check on for example a hot-loop, wouldn't that add a few hundred nano-seconds per TensorImpl usage?
Is the logging only enabled if the env variable is set, and the atomic check skippable if it's not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The logging will be enabled all the time in e.g. FB environment. The envvar thing is just for testing in default implementation.
I've tried to measure overhead and if I don't wipe out cache (it's hard to measure with cold cache as the overhead is so tiny), it comes down to something like 0.3ns (nanoseconds) so it's definitely fine. The static var initialization trick is used widely (including for singletons), so it should be cheap to sprinkle around - https://stackoverflow.com/questions/23829389/does-a-function-local-static-variable-automatically-incur-a-branch
Summary: Resubmit #20698 which got messed up. Idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple very lightweight mechanism to do so - only first invocation of a trigger point would be logged. This is significantly more lightweight than #18235 and thus we can allow to put logging in e.g. TensorImpl. Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcomed. Pull Request resolved: #20745 Differential Revision: D15429196 Pulled By: dzhulgakov fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
apaszke
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this log-once thing of any use? How were the places where we add the log selected? It all seems very arbitrary to me and only contributes to the messiness of our codebase.
| batch_sampler=None, num_workers=0, collate_fn=default_collate, | ||
| pin_memory=False, drop_last=False, timeout=0, | ||
| worker_init_fn=None): | ||
| torch._C._log_api_usage_once("python.data_loader") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well those are not free, so why are we hardcoding them in the core even though literally no other user than FB cares about this feature?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it's time to actually flesh out a design and expose logging properly. i think this feature would be pretty useful to a wider set of folks -- anyone doing thousands of experiments fleet-wide.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm ok with adding a API to do this, but some of the decisions taken in this one seem fairly arbitrary to me.
Idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple very lightweight mechanism to do so - only first invocation of a trigger point would be logged. This is significantly more lightweight than #18235 and thus we can allow to put logging in e.g. TensorImpl.
Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcomed.
Test plan:
Used
PYTORCH_API_USAGE_STDERR=1env with various scenarios, verified that logging is indeed triggered.Given the only-once nature of logging, I'm not sure adding unittest would be that beneficial as it might be impact by how multiple unittests are linked together in one binary.