Skip to content

Conversation

@dzhulgakov
Copy link
Collaborator

Resubmit #20698 which got messed up.

Idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple very lightweight mechanism to do so - only first invocation of a trigger point would be logged. This is significantly more lightweight than #18235 and thus we can allow to put logging in e.g. TensorImpl.

Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcomed.

Test plan:
Used PYTORCH_API_USAGE_STDERR=1 env with various scenarios, verified that logging is indeed triggered.
Given the only-once nature of logging, I'm not sure adding unittest would be that beneficial as it might be impact by how multiple unittests are linked together in one binary.

Added a tiny benchmark to verify overhead. Since we're talking about static variable only, it seems to be 0.3ns with warm cache (note it's 1000 invocations reported below). Didn't test with cold cache as it's pretty hard to capture such a small overhead:

Running bin/core_overhead_benchmark
Run on (24 X 2394.5 MHz CPU s)
CPU Caches:
  L1 Data 32K (x24)
  L1 Instruction 32K (x24)
  L2 Unified 4096K (x24)
  L3 Unified 16384K (x24)
-------------------------------------------------------
Benchmark                Time           CPU Iterations
-------------------------------------------------------
BM_APILogging         2816 ns       2815 ns     244608
BM_NoAPILogging       2471 ns       2470 ns     277368

@pytorchbot pytorchbot added caffe2 oncall: jit Add this issue/PR to JIT oncall triage queue module: build Build system issues module: cuda Related to torch.cuda, and CUDA support in general module: dataloader Related to torch.utils.data.DataLoader and Sampler oncall: distributed Add this issue/PR to distributed oncall triage queue module: internals Related to internal abstractions in c10 and ATen module: nn Related to torch.nn module: optimizer Related to torch.optim module: pybind Related to our Python bindings / interactions with other Python libraries labels May 21, 2019
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dzhulgakov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@ezyang
Copy link
Contributor

ezyang commented May 21, 2019

Is there a way to ensure that we continue not to trigger these trigger points via static initialization? Seems like something subtle that's easy to break.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TORCH_CHECK please :)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks uber slow lol

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's called from python, so it should be ok :) I'm putting it only in construction of classes, so the impact should be minimal

Copy link
Contributor

@ezyang ezyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Implementation seems sound

@dzhulgakov
Copy link
Collaborator Author

Added a simple python test to ensure that we don't add accidentally some static initialization logging (it's a pretty hacky test but it works - also verified that it fails if such logging is actually added)

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dzhulgakov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@dzhulgakov dzhulgakov force-pushed the api-metrics branch 2 times, most recently from 4702df8 to d155af7 Compare May 23, 2019 00:45
@pytorchbot pytorchbot added the module: onnx Related to torch.onnx label May 23, 2019
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dzhulgakov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@dzhulgakov dzhulgakov force-pushed the api-metrics branch 2 times, most recently from e63fc5c to 9e81649 Compare May 23, 2019 07:18
Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dzhulgakov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

Copy link
Contributor

@facebook-github-bot facebook-github-bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dzhulgakov has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

zdevito pushed a commit to zdevito/ATen that referenced this pull request May 24, 2019
Summary:
Resubmit #20698 which got messed up.

Idea is that when PyTorch is used in a custom build environment (e.g. Facebook), it's useful to track usage of various APIs centrally. This PR introduces a simple very lightweight mechanism to do so - only first invocation of a trigger point would be logged. This is significantly more lightweight than #18235 and thus we can allow to put logging in e.g. TensorImpl.

Also adds an initial list of trigger points. Trigger points are added in such a way that no static initialization triggers them, i.e. just linking with libtorch.so will not cause any logging. Further suggestions of what to log are welcomed.
Pull Request resolved: pytorch/pytorch#20745

Differential Revision: D15429196

Pulled By: dzhulgakov

fbshipit-source-id: a5e41a709a65b7ebccc6b95f93854e583cf20aca
@facebook-github-bot
Copy link
Contributor

@dzhulgakov merged this pull request in c25e337.

@ezyang
Copy link
Contributor

ezyang commented May 24, 2019

ASAN started failing after this PR merged but I am not sure if this diff could have actually caused it:

ay 24 09:05:25 test_structseq_repr (__main__.TestTorch) ... ASAN:DEADLYSIGNAL
May 24 09:05:25 =================================================================
May 24 09:05:25 ==7655==ERROR: AddressSanitizer: SEGV on unknown address 0x7fb1181631c0 (pc 0x7fb15c5ac746 bp 0x7ffded12b8f0 sp 0x7ffded12b078 T0)
May 24 09:05:25 ==7655==The signal is caused by a READ memory access.
May 24 09:05:26     #0 0x7fb15c5ac745 in strlen /build/glibc-LK5gWL/glibc-2.23/string/../sysdeps/x86_64/strlen.S:76
May 24 09:05:26     #1 0x7fb15cbaf1ac  (/usr/lib/llvm-5.0/lib/clang/5.0.0/lib/linux/libclang_rt.asan-x86_64.so+0xa71ac)
May 24 09:05:26     #2 0x7fb149d6f142 in torch::utils::returned_structseq_repr(PyTupleObject*) (/opt/conda/lib/python3.6/site-packages/torch/lib/libtorch_python.so+0x18e0142)
May 24 09:05:26     #3 0x55f56be9a824 in PyObject_Repr /tmp/build/80754af9/python_1546130271559/work/Objects/object.c:521
May 24 09:05:26     #4 0x55f56be5b704 in _PyCFunction_FastCallDict /tmp/build/80754af9/python_1546130271559/work/Objects/methodobject.c:209
May 24 09:05:26     #5 0x55f56bee242b in call_function /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4851
May 24 09:05:26     #6 0x55f56bf07389 in _PyEval_EvalFrameDefault /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
May 24 09:05:26     #7 0x55f56bedc53a in _PyFunction_FastCall /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4933
May 24 09:05:26     #8 0x55f56bedc53a in fast_function /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4968
May 24 09:05:26     #9 0x55f56bee2504 in call_function /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4872
May 24 09:05:26     #10 0x55f56bf07389 in _PyEval_EvalFrameDefault /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
May 24 09:05:26     #11 0x55f56bedb8e3 in _PyEval_EvalCodeWithName /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4166
May 24 09:05:26     #12 0x55f56bedce69 in _PyFunction_FastCallDict /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:5084
May 24 09:05:26     #13 0x55f56be5bb0e in _PyObject_FastCallDict /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2310
May 24 09:05:26     #14 0x55f56be606a2 in _PyObject_Call_Prepend /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2373
May 24 09:05:26     #15 0x55f56be5b54d in PyObject_Call /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2261
May 24 09:05:26     #16 0x55f56bf08a6b in do_call_core /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:5120
May 24 09:05:26     #17 0x55f56bf08a6b in _PyEval_EvalFrameDefault /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3404
May 24 09:05:26     #18 0x55f56bedb8e3 in _PyEval_EvalCodeWithName /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4166
May 24 09:05:26     #19 0x55f56bedcc4b in _PyFunction_FastCallDict /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:5084
May 24 09:05:26     #20 0x55f56be5bb0e in _PyObject_FastCallDict /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2310
May 24 09:05:26     #21 0x55f56be606a2 in _PyObject_Call_Prepend /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2373
May 24 09:05:26     #22 0x55f56be5b54d in PyObject_Call /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2261
May 24 09:05:26     #23 0x55f56beb5a90 in slot_tp_call /tmp/build/80754af9/python_1546130271559/work/Objects/typeobject.c:6207
May 24 09:05:26     #24 0x55f56be5b92a in _PyObject_FastCallDict /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2331
May 24 09:05:26     #25 0x55f56bee257d in call_function /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4875
May 24 09:05:26     #26 0x55f56bf07389 in _PyEval_EvalFrameDefault /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
May 24 09:05:26     #27 0x55f56bedb8e3 in _PyEval_EvalCodeWithName /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4166
May 24 09:05:26     #28 0x55f56bedce69 in _PyFunction_FastCallDict /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:5084
May 24 09:05:26     #29 0x55f56be5bb0e in _PyObject_FastCallDict /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2310
May 24 09:05:26     #30 0x55f56be606a2 in _PyObject_Call_Prepend /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2373
May 24 09:05:26     #31 0x55f56be5b54d in PyObject_Call /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2261
May 24 09:05:26     #32 0x55f56bf08a6b in do_call_core /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:5120
May 24 09:05:26     #33 0x55f56bf08a6b in _PyEval_EvalFrameDefault /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3404
May 24 09:05:26     #34 0x55f56bedb8e3 in _PyEval_EvalCodeWithName /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4166
May 24 09:05:26     #35 0x55f56bedcc4b in _PyFunction_FastCallDict /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:5084
May 24 09:05:26     #36 0x55f56be5bb0e in _PyObject_FastCallDict /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2310
May 24 09:05:26     #37 0x55f56be606a2 in _PyObject_Call_Prepend /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2373
May 24 09:05:26     #38 0x55f56be5b54d in PyObject_Call /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2261
May 24 09:05:26     #39 0x55f56beb5a90 in slot_tp_call /tmp/build/80754af9/python_1546130271559/work/Objects/typeobject.c:6207
May 24 09:05:26     #40 0x55f56be5b92a in _PyObject_FastCallDict /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2331
May 24 09:05:26     #41 0x55f56bee257d in call_function /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4875
May 24 09:05:26     #42 0x55f56bf07389 in _PyEval_EvalFrameDefault /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
May 24 09:05:26     #43 0x55f56bedc53a in _PyFunction_FastCall /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4933
May 24 09:05:26     #44 0x55f56bedc53a in fast_function /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4968
May 24 09:05:26     #45 0x55f56bee2504 in call_function /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4872
May 24 09:05:26     #46 0x55f56bf07389 in _PyEval_EvalFrameDefault /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
May 24 09:05:26     #47 0x55f56bedc53a in _PyFunction_FastCall /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4933
May 24 09:05:26     #48 0x55f56bedc53a in fast_function /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4968
May 24 09:05:26     #49 0x55f56bee2504 in call_function /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4872
May 24 09:05:26     #50 0x55f56bf07389 in _PyEval_EvalFrameDefault /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
May 24 09:05:26     #51 0x55f56bedbd70 in _PyEval_EvalCodeWithName /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4166
May 24 09:05:26     #52 0x55f56bedce69 in _PyFunction_FastCallDict /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:5084
May 24 09:05:26     #53 0x55f56be5bb0e in _PyObject_FastCallDict /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2310
May 24 09:05:26     #54 0x55f56be606a2 in _PyObject_Call_Prepend /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2373
May 24 09:05:26     #55 0x55f56be5b54d in PyObject_Call /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2261
May 24 09:05:26     #56 0x55f56beb50fa in slot_tp_init /tmp/build/80754af9/python_1546130271559/work/Objects/typeobject.c:6420
May 24 09:05:26     #57 0x55f56bee2766 in type_call /tmp/build/80754af9/python_1546130271559/work/Objects/typeobject.c:915
May 24 09:05:26     #58 0x55f56be5b92a in _PyObject_FastCallDict /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2331
May 24 09:05:26     #59 0x55f56bedca59 in _PyObject_FastCallKeywords /tmp/build/80754af9/python_1546130271559/work/Objects/abstract.c:2496
May 24 09:05:26     #60 0x55f56bee257d in call_function /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4875
May 24 09:05:26     #61 0x55f56bf08146 in _PyEval_EvalFrameDefault /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3351
May 24 09:05:26     #62 0x55f56be8a22a in _PyFunction_FastCall /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4933
May 24 09:05:26     #63 0x55f56bee2504 in call_function /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4872
May 24 09:05:26     #64 0x55f56bf07389 in _PyEval_EvalFrameDefault /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:3335
May 24 09:05:26     #65 0x55f56bedd288 in _PyEval_EvalCodeWithName /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4166
May 24 09:05:26     #66 0x55f56bedd288 in PyEval_EvalCodeEx /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:4187
May 24 09:05:26     #67 0x55f56bede01b in PyEval_EvalCode /tmp/build/80754af9/python_1546130271559/work/Python/ceval.c:731
May 24 09:05:26     #68 0x55f56bf603c3 in run_mod /tmp/build/80754af9/python_1546130271559/work/Python/pythonrun.c:1025
May 24 09:05:26     #69 0x55f56bf607c0 in PyRun_FileExFlags /tmp/build/80754af9/python_1546130271559/work/Python/pythonrun.c:978
May 24 09:05:26     #70 0x55f56bf609c2 in PyRun_SimpleFileExFlags /tmp/build/80754af9/python_1546130271559/work/Python/pythonrun.c:419
May 24 09:05:26     #71 0x55f56bf644b2 in run_file /tmp/build/80754af9/python_1546130271559/work/Modules/main.c:340
May 24 09:05:26     #72 0x55f56bf644b2 in Py_Main /tmp/build/80754af9/python_1546130271559/work/Modules/main.c:811
May 24 09:05:26     #73 0x55f56be2d02d in main /tmp/build/80754af9/python_1546130271559/work/Programs/python.c:69
May 24 09:05:26     #74 0x7fb15c54182f in __libc_start_main /build/glibc-LK5gWL/glibc-2.23/csu/../csu/libc-start.c:291
May 24 09:05:26     #75 0x55f56bf0de0d in _start /home/rdonnelly/mc/conda-bld/compilers_linux-64_1534865402226/work/.build/src/glibc-2.12.2/csu/../sysdeps/x86_64/elf/start.S:103
May 24 09:05:26 
May 24 09:05:26 AddressSanitizer can not provide additional info.
May 24 09:05:26 SUMMARY: AddressSanitizer: SEGV /build/glibc-LK5gWL/glibc-2.23/string/../sysdeps/x86_64/strlen.S:76 in strlen

ezyang added a commit that referenced this pull request May 24, 2019
ezyang added a commit that referenced this pull request May 24, 2019
This reverts commit c25e337.

gh-metadata: pytorch pytorch 20906 gh/ezyang/139/head
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

caffe2 Merged module: build Build system issues module: cuda Related to torch.cuda, and CUDA support in general module: dataloader Related to torch.utils.data.DataLoader and Sampler module: internals Related to internal abstractions in c10 and ATen module: nn Related to torch.nn module: onnx Related to torch.onnx module: optimizer Related to torch.optim module: pybind Related to our Python bindings / interactions with other Python libraries oncall: distributed Add this issue/PR to distributed oncall triage queue oncall: jit Add this issue/PR to JIT oncall triage queue

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants