
Conversation

@dzhulgakov (Collaborator) commented May 20, 2019:

The idea is that when PyTorch is used in a custom build environment (e.g. at Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so: only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235, so we can afford to put logging in places like TensorImpl.

It also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking against libtorch.so will not cause any logging. Further suggestions of what to log are welcome.

Test plan:
Used the PYTORCH_API_USAGE_STDERR=1 env var with various scenarios and verified that logging is indeed triggered.
Given the only-once nature of the logging, I'm not sure a unit test would be that beneficial, as it might be impacted by how multiple unit tests are linked together in one binary.

@pytorchbot added labels on May 20, 2019: caffe2, oncall: jit, module: cuda, module: dataloader, oncall: distributed, module: internals, module: nn, module: optimizer, module: pybind
@facebook-github-bot (Contributor) commented:
@dzhulgakov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@dzhulgakov dzhulgakov merged commit d3059b9 into pytorch:master May 20, 2019
*
* In order to ensure light-weightness of logging, we utilize static variable
* trick - LogAPIUsage will be invoked only once and further invocations will
* just do an atomic check.
Contributor:
An atomic check in, for example, a hot loop: wouldn't that add a few hundred nanoseconds per TensorImpl usage? Is the logging only enabled if the env variable is set, and the atomic check skipped if it's not?

@dzhulgakov (Collaborator, author) replied:

The logging will be enabled all the time in e.g. the FB environment. The env var is just for testing the default implementation.

I've tried to measure the overhead, and if I don't wipe the cache (it's hard to measure with a cold cache, as the overhead is so tiny), it comes down to something like 0.3 ns (nanoseconds), so it's definitely fine. The static-variable initialization trick is used widely (including for singletons), so it should be cheap to sprinkle around: https://stackoverflow.com/questions/23829389/does-a-function-local-static-variable-automatically-incur-a-branch

@dzhulgakov dzhulgakov deleted the api-metrics branch May 21, 2019 05:01
facebook-github-bot pushed a commit that referenced this pull request May 24, 2019
Summary:
Resubmit #20698 which got messed up.

The idea is that when PyTorch is used in a custom build environment (e.g. at Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple, very lightweight mechanism to do so: only the first invocation of a trigger point is logged. This is significantly more lightweight than #18235, so we can afford to put logging in places like TensorImpl.

It also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking against libtorch.so will not cause any logging. Further suggestions of what to log are welcome.
Pull Request resolved: #20745

Differential Revision: D15429196

Pulled By: dzhulgakov

fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
@apaszke (Contributor) left a comment:

Why is this log-once thing of any use? How were the places where we add the log selected? It all seems very arbitrary to me and only contributes to the messiness of our codebase.

batch_sampler=None, num_workers=0, collate_fn=default_collate,
pin_memory=False, drop_last=False, timeout=0,
worker_init_fn=None):
torch._C._log_api_usage_once("python.data_loader")
Contributor:

Well, those are not free, so why are we hardcoding them in the core when literally no user other than FB cares about this feature?

Contributor:

I think it's time to actually flesh out a design and expose logging properly. I think this feature would be pretty useful to a wider set of folks: anyone doing thousands of experiments fleet-wide.

Contributor:

I'm OK with adding an API to do this, but some of the decisions taken in this one seem fairly arbitrary to me.
