@alanwaketan alanwaketan commented Sep 15, 2021

Merge master to the lazy_tensor_staging branch in order to use TorchBench's on-demand PR CI.

cc @pietern @mrshenli @pritamdamania87 @zhaojuanmao @satgera @rohan-varma @gqchen @aazzolini @osalpekar @jiayisuse @SciPioneer @H-Huang @cbalioglu @gcramer23

ezyang and others added 30 commits September 14, 2021 06:10
Test Plan: revert-hammer

Differential Revision:
D30711934 (1cd0252)

Original commit changeset: 0af808ddf528

fbshipit-source-id: 6f67ed5cbaf333cc55729be2a23e385772e31b10
Summary:
Pull Request resolved: #64641

`sum`, `mean`, and `norm` were ported to structured kernels in #61642, #61643, and #62711,
respectively. Those PRs changed related overloads into composite kernels. However, their
dispatch section remained the same, when they really should be marked as
`CompositeExplicitAutograd`. This PR fixes this issue.
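As a mental model (a toy sketch in plain Python, not the real c10 dispatcher), `CompositeExplicitAutograd` means a single composite kernel entry serves every backend unless a backend-specific kernel is registered to override it:

```python
# Toy dispatcher sketch: one composite kernel registered under an umbrella
# key is used for all backends, instead of per-backend entries.
KERNELS = {}

def register(op, key, fn):
    KERNELS[(op, key)] = fn

def dispatch(op, backend):
    # A backend-specific kernel wins; otherwise fall back to the composite one.
    return KERNELS.get((op, backend),
                       KERNELS.get((op, "CompositeExplicitAutograd")))

register("sum.dim_IntList", "CompositeExplicitAutograd", lambda xs: sum(xs))
fn = dispatch("sum.dim_IntList", "CPU")
print(fn([1, 2, 3]))  # 6
```

Marking the overloads `CompositeExplicitAutograd` records this single-entry intent explicitly in the dispatch section rather than leaving it implicit.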

Test Plan: Imported from OSS

Reviewed By: ngimel

Differential Revision: D30867122

Pulled By: ezyang

fbshipit-source-id: b951aee41a3cab9ca546df826a285d60013e3b3a
Summary:
Pull Request resolved: #64933

Fixes pytorch/functorch#108

This is a short-term fix. A longer-term fix would be to either:
1. have proper {select,slice,diagonal}_embed functions
2. have efficient {select,slice,diagonal}_scatter functions (and
efficient zero tensors).

NB: I didn't use diag_embed because diag_embed is slightly different
from diagonal_backward.
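The scatter-style backward can be sketched in 1-D with plain Python lists (helper names are illustrative; the real ops work on tensors of any rank): the gradient of a slice is embedded into a zero buffer shaped like the input.

```python
def slice_scatter_1d(base, src, start, end):
    """Return a copy of `base` with base[start:end] replaced by `src`."""
    out = list(base)
    out[start:end] = list(src)
    return out

def slice_backward_1d(grad_output, input_len, start, end):
    """Embed the slice's gradient into a zero buffer shaped like the input."""
    zeros = [0] * input_len
    return slice_scatter_1d(zeros, grad_output, start, end)

print(slice_backward_1d([1, 2], 5, 1, 3))  # [0, 1, 2, 0, 0]
```

An efficient `*_scatter` primitive (plus cheap zero tensors) would let each backward be a single op instead of a zeros-allocate-then-copy sequence.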

There are no BC concerns because TorchScript (luckily) does not
serialize the backwards graph.

Test Plan:
- run tests
- run benchmarks.
https://gist.github.com/zou3519/e7c0774d1ac97f32aa02ec44d81e60e1.
Surprisingly the instruction count goes down. This is probably because
we create fewer autograd nodes now.

Reviewed By: ezyang

Differential Revision: D30909333

Pulled By: zou3519

fbshipit-source-id: 3b33e13010ba13b4d487b346aa9bee8a0e8c378c
…64948)

Summary:
Connected with issue #64845, takeover of #64091

Pull Request resolved: #64948

Reviewed By: malfet, seemethere

Differential Revision: D30908592

Pulled By: janeyx99

fbshipit-source-id: dc31b0bbc9f4e35d23412aa14acbbab7422b4146
Summary:
There were several reports of the target determinator incorrectly skipping
tests; the most recent one is #64902.

Let's disable it until it can be further stabilized.

Pull Request resolved: #64921

Reviewed By: seemethere, janeyx99

Differential Revision: D30901186

Pulled By: malfet

fbshipit-source-id: 531afd2d390c6b51f727330d5dd1882d70b6fdde
Summary:
The library will no longer link properly on VS 2019 (14.29.30133). To
ensure that engineers building on Windows can use and debug with this
build type, incremental linking needs to be turned off for this build
flag.

Verified that this build type successfully builds, links, and provides
debuggable Python modules on Windows.

Pull Request resolved: #64892

Reviewed By: jbschlosser

Differential Revision: D30902565

Pulled By: malfet

fbshipit-source-id: e5286a4c6f45c7cbe4cdc1b98560129bd386970b
Summary:
Pull Request resolved: #64903

Fix the accuracy regression caused by #63895.

Test Plan:
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_periodic_model_averager
buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D30894688

fbshipit-source-id: fe00b8b23b860d9f806f87c1b6caba1d0b807485
Summary:
Pull Request resolved: #64945

In the const folding pass, we try to create `get_attr` nodes in submod_1 for `get_attr` nodes that are in the main graph, but submod_1 doesn't hold the real attributes. To fix this, we assign the main module as the owning module of submod_1's graph.

The fix above would cause problems for `call_module` nodes in submod_1, because during the split, modules get inlined into submod_1 (targets change from "mod.a.b" to "mod_a_b"). Changing the owning module would make those `call_module` nodes unable to find the modules they refer to. To fix this, we retarget them to the main module.

Reviewed By: jfix71

Differential Revision: D30905949

fbshipit-source-id: cd67bc8fe4b8ad4344ae97b8e36753fdce3ece6d
Summary:
Pull Request resolved: #64447

As the code comment says, we needn't worry about Jupyter notebooks on mobile.
ghstack-source-id: 137951718

Test Plan: Profiled startup of //caffe2/caffe2/fb/high_perf_models/pytorch/benchmark_framework_overheads:cpp_benchmark on devserver with -niter 0 -nrep 0 and `C10_DISPATCHER_ONE_KERNEL_PER_DISPATCH_KEY` defined. Time spent in sherwood_v3_table lookups went way down.

Reviewed By: ezyang, bhosmer

Differential Revision: D30736094

fbshipit-source-id: bcc22cd0d9adceba259a03898c992759d501fe89
Summary:
per title

Pull Request resolved: #64972

Reviewed By: mruberry

Differential Revision: D30924598

Pulled By: ngimel

fbshipit-source-id: 1ac1ec8fd50ca27e3cd36c12a588d334e7466899
Summary:
Pull Request resolved: #64937

Adds CLI output for rendered test results to go alongside test execution, so users can quickly diagnose test failures like so:
![CLI test results screenshot](https://user-images.githubusercontent.com/1700823/133156245-ba939cbf-8aa2-47a7-b1fb-7cc876ca75c4.png)

Signed-off-by: Eli Uriegas <[email protected]>

cc ezyang seemethere malfet lg20987 pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D30917897

Pulled By: seemethere

fbshipit-source-id: f51ea499462e3cfd64496cb711b84a93971c91bd
…ialization Level [1/2] (#64268)

Summary:
Pull Request resolved: #64268

If the same pair of operator name and num inputs has been used to add an instruction to the operator table previously (and the operator's schema is not vararg), use the same index as that instruction rather than creating a new one.
ghstack-source-id: 138014905
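The de-duplication idea can be sketched in plain Python (names are illustrative, not the mobile interpreter's actual data structures): an index keyed by `(name, num_inputs)` is consulted before appending a new operator entry.

```python
def add_operator(op_table, op_index, name, num_inputs, is_vararg=False):
    """Return the instruction index for (name, num_inputs), reusing an
    existing table entry unless the operator's schema is vararg."""
    key = (name, num_inputs)
    if not is_vararg and key in op_index:
        return op_index[key]          # de-duplicate: reuse prior index
    idx = len(op_table)
    op_table.append(key)
    if not is_vararg:
        op_index[key] = idx
    return idx

table, index = [], {}
a = add_operator(table, index, "aten::add", 2)
b = add_operator(table, index, "aten::add", 2)  # reuses index 0
print(a, b, len(table))  # 0 0 1
```

Vararg schemas are excluded because their behavior can depend on the actual input count at each call site, so their entries are not safe to share.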

Test Plan: Phabricator tests, and test performance changes in next diff

Reviewed By: iseeyuan, tugsbayasgalan

Differential Revision: D30615434

fbshipit-source-id: f442f557f12412693a73004ce44733ccef063b82
…ialization Level [2/2] (#64269)

Summary:
Pull Request resolved: #64269

Revert changes in D29826210 (693d8f2) (we don't need operator lambda caching since there aren't duplicate operators anymore)

This diff stack results in an additional approx 12% speedup in model loading time (from 229ms to 200ms) when run against an 87MB speech model that jiatongzhou provided.
ghstack-source-id: 138014904

Test Plan:
**Speech Transducer v25 model (as in D29826210 (693d8f2f0767413bb995b895fccad87dfd4f05a7))**

| | Before | After |
|---|---|---|
| Load Time | [229ms](https://www.internalfb.com/intern/aibench/details/160889436133243) | [200ms](https://www.internalfb.com/intern/aibench/details/837884532607514) |
| Save File Size | [86.23 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658544950) | [86.1 MB](https://lookaside.facebook.com/intern/diff/file/data/?number=658554403) |

The "after" flamegraph shows significantly less time is spent on ```append_operator``` than before.

Steps
- Check out desired commit in devserver (base branch or this diff)
- ```buck build bento/kernels:bento_kernel_pytorch```
- Use N1094068 with pytorch_local kernel to save model for lite interpreter
- Edit ```aibench/specifications/models/pytorch/speech_transducer/v25.json ``` to have new model location and md5
- ```buck run aibench:run_bench -- -b aibench/specifications/models/pytorch/speech_transducer/v25.json --framework pytorch --platform android/arm64 --devices "S8US" --force_profile --remote ```

**Test that saving a model with de-dup ops doesn't change its output**
https://www.internalfb.com/intern/anp/view/?id=1137434

Reviewed By: iseeyuan

Differential Revision: D30615710

fbshipit-source-id: bb4052f0f16eccab386585e94411056f94bce43c
…l and has different type (#65004)

Summary:
Pull Request resolved: #65004

If we have code like `torch.add(x, 1)` where x is a float tensor, conversion falls apart, because we currently add a constant layer of int32 dtype for `1` when we actually need float dtype.

This diff adds an arg to `get_trt_tensor` which specifies the dtype of the constant layer to be created.

Also, start adding doc strings for functions.
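The dtype-propagation idea can be sketched in plain Python (helper name and dtype strings are illustrative, not the fx2trt API): without a hint from the peer tensor, a Python int defaults to an int32 constant.

```python
def constant_dtype(scalar, peer_dtype=None):
    """Hypothetical sketch: pick the dtype for a scalar's constant layer.

    With no hint, a Python int maps to 'int32'; passing the peer tensor's
    dtype keeps torch.add(x, 1) in float32 when x is a float tensor."""
    if peer_dtype is not None:
        return peer_dtype
    return 'int32' if isinstance(scalar, int) else 'float32'

print(constant_dtype(1))             # int32  (the old, wrong behavior)
print(constant_dtype(1, 'float32'))  # float32 (dtype taken from x)
```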

Reviewed By: yinghai

Differential Revision: D30852156

fbshipit-source-id: 650ce72d2794093a4616e640ea503dcc1c6b2bc4
Summary:
Pull Request resolved: #64031

Avoid more copies of tuple elements.
ghstack-source-id: 137978948

Test Plan:
Pixel 3 before: https://our.intern.facebook.com/intern/aibench/details/724509739115867
Pixel 3 after: https://our.intern.facebook.com/intern/aibench/details/232361457767293

Top-line number doesn't seem to have moved, but we can see that the vector copy disappeared in the flame graph.

Reviewed By: raziel

Differential Revision: D30559545

fbshipit-source-id: e5343abae96b8e80e0ccec482ad316884ae231ea
…63993)

Summary:
Pull Request resolved: #63993

This seems to be unused, and it's pretty scary.
ghstack-source-id: 137978949

Test Plan: CI

Reviewed By: lw

Differential Revision: D30560441

fbshipit-source-id: 08b7ce971fd1e2dbeddbf37b02413fef513b4753
Summary:
Pull Request resolved: #64110

As the code comment says, we can exploit pickler string interning to accelerate OpCode parsing. No more strcmp!
ghstack-source-id: 137978946
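The interning trick is easy to see in Python itself (the unpickler's C++ version replaces strcmp with a pointer comparison):

```python
import sys

# After interning, equal opcode strings share one object, so the parser can
# compare by identity instead of comparing contents byte-by-byte.
op_a = sys.intern("BINPUT")
op_b = sys.intern("BIN" + "PUT")  # built separately, then interned
print(op_a is op_b)  # True: an identity check replaces the strcmp
```

Since the pickler emits a fixed set of opcode strings, interning them once up front makes every subsequent comparison O(1).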

Test Plan:
Pixel 3 before: https://www.internalfb.com/intern/aibench/details/591414145082422
Pixel 3 after: https://www.internalfb.com/intern/aibench/details/484557404703261

New mean is 292 ms, down from 302 ms.

Reviewed By: dhruvbird

Differential Revision: D30615052

fbshipit-source-id: 9707625e778388a7920ab72704d71ad57ddaac17
Summary:
Pull Request resolved: #64277

Just moved the vector implementation to ArrayRef and re-implemented the former using the latter.
ghstack-source-id: 137978947

Test Plan: existing CI

Reviewed By: dhruvbird

Differential Revision: D30647666

fbshipit-source-id: c0f4f06c348d36882ec0db802be44d8c7749562f
Summary:
Pull Request resolved: #64623

The config api will change, but we'll add configs gradually for TensorRT to unblock experimentation

Test Plan:
python torch/fx/experimental/fx2trt/example/unittests.py

Imported from OSS

Reviewed By: vkuzo

Differential Revision: D30800474

fbshipit-source-id: 3c4640de1205a0f19b62943ab84f386d80394ec2
…64951)

Summary: Pull Request resolved: #64951

Test Plan: Imported from OSS

Reviewed By: VitalyFedyunin

Differential Revision: D30910035

Pulled By: ejguan

fbshipit-source-id: d687fe10939920a3617a60552fe743e8526438a0
…nfo (#63978)

Summary: Pull Request resolved: #63978

Test Plan: Imported from OSS

Reviewed By: saketh-are

Differential Revision: D30558877

Pulled By: heitorschueroff

fbshipit-source-id: 3e62ff24a935784fc93a76a0f46a1deb060ba680
Summary:
Pull Request resolved: #64885

1) The constructor accepts a local optimizer instance instead of the inputs of local optimizer constructor and the class type.
2) The parameters are read from local optimizer's `param_groups` instead of a separate input.

Proposal: #59699
ghstack-source-id: 137865867
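The API change can be sketched in plain Python (class and helper names are illustrative, not the torch.distributed implementation): the wrapper takes a constructed optimizer and shares its `param_groups` rather than taking a class plus constructor inputs.

```python
class PostLocalSGDOptimizer:
    """Sketch of the new API: wrap an already-constructed local optimizer
    and read its param_groups, instead of taking (optimizer_class, *args)."""
    def __init__(self, optim, averager=None):
        self.optim = optim
        self.param_groups = optim.param_groups  # shared, not a separate input
        self.averager = averager

    def step(self):
        self.optim.step()
        # parameter averaging would run here after the local step

class FakeSGD:
    """Stand-in local optimizer for the sketch."""
    def __init__(self, params):
        self.param_groups = [{'params': params}]
    def step(self):
        pass

opt = PostLocalSGDOptimizer(FakeSGD([1, 2, 3]))
print(opt.param_groups[0]['params'])  # [1, 2, 3]
```

Sharing `param_groups` keeps the wrapper and the local optimizer in sync by construction, instead of requiring the caller to pass the same parameters twice.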

Test Plan: buck test mode/dev-nosan //caffe2/test/distributed:distributed_nccl_spawn -- test_post_localSGD_optimizer_parity

Reviewed By: rohan-varma

Differential Revision: D30888794

fbshipit-source-id: 21261b480f6bbb9b2333426020e3f350da3f73c2
…um to BinaryUfuncInfo

Test Plan: revert-hammer

Differential Revision:
D30558877 (382e008)

Original commit changeset: 3e62ff24a935

fbshipit-source-id: 3b9f03c1f43c6d5f2738ed139d0236f2ded78dbf
Summary:
Puts memory sharing intro under Sharing memory... header, where it should have been all along.

Pull Request resolved: #64996

Reviewed By: mruberry

Differential Revision: D30948619

Pulled By: ngimel

fbshipit-source-id: 5d9dd267b34e9d3fc499d4738377b58a22da1dc2
Summary:
Fixes #62793

This is mostly a quick fix. I think the more correct fix could be updating `unique_dim` to `_unique_dim`, which could be BC-breaking for C++ users (maybe). Maybe something else I am missing.

~~Not sure how to add a test for it.~~ Have tested it locally.

We can add a test like the following. Tested this locally: it fails currently but passes with the fix.
```python
def test_wildcard_import(self):
    exec('from torch import *')
```

Pull Request resolved: #63080

Reviewed By: gchanan

Differential Revision: D30738711

Pulled By: zou3519

fbshipit-source-id: b86d0190e45ba0b49fd2cffdcfd2e3a75cc2a35e
…rs (#64988)

Summary:
Pull Request resolved: #64988

Pull Request resolved: #64968

The current wrapper (provided by [Vulkan-Tools](https://github.com/KhronosGroup/Vulkan-Tools/tree/master/common)) can't handle dynamically loading Vulkan on Windows/Mac. Therefore, we can bring in [volk](https://github.com/zeux/volk) to load the vulkan libraries for other platforms.

1. Use `volk` with `link_style="static"` on Windows only. Use `vulkan_wrapper` for all other platforms (temporary solution)
2. Make DotSlash work on Windows when resolving glslc path

Test Plan:
For Android:

```
cd ~/fbsource
buck build -c ndk.custom_libcxx=false -c pt.enable_qpl=0 //xplat/caffe2:pt_vulkan_api_test_binAndroid\#android-arm64 --show-output
adb push buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAndroid\#android-arm64 /data/local/tmp/vulkan_api_test
adb shell "/data/local/tmp/vulkan_api_test"
cd -
```

For Mac:
```
buck build //xplat/caffe2:pt_vulkan_api_test_binAppleMac
./buck-out/gen/xplat/caffe2/pt_vulkan_api_test_binAppleMac\#macosx-x86_64
```

On Local OSS repo with `pr/64988` branch:

The build and test are fine. Note that `VulkanAPITest.log_softmax()` has been broken for the past month. Ivan will take a look when he is available.

Build: `BUILD_TEST=1 USE_VULKAN=1 USE_VULKAN_SHADERC_RUNTIME=1 USE_VULKAN_WRAPPER=0 MACOSX_DEPLOYMENT_TARGET=10.9 CC=clang CXX=clang++ python setup.py install`

Test: `$PYTORCH_ROOT/build/bin/vulkan_api_test /data/local/tmp`

```
Running main() from ../third_party/googletest/googletest/src/gtest_main.cc
[==========] Running 69 tests from 1 test suite.
[----------] Global test environment set-up.
[----------] 69 tests from VulkanAPITest
[ RUN      ] VulkanAPITest.adaptive_avg_pool2d
[       OK ] VulkanAPITest.adaptive_avg_pool2d (228 ms)
[ RUN      ] VulkanAPITest.add
[       OK ] VulkanAPITest.add (51 ms)
[ RUN      ] VulkanAPITest.add_broadcast0
[       OK ] VulkanAPITest.add_broadcast0 (13 ms)
[ RUN      ] VulkanAPITest.add_broadcast1
[       OK ] VulkanAPITest.add_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.add_broadcast2
[       OK ] VulkanAPITest.add_broadcast2 (9 ms)
[ RUN      ] VulkanAPITest.add_
[       OK ] VulkanAPITest.add_ (60 ms)
[ RUN      ] VulkanAPITest.add_broadcast0_
[       OK ] VulkanAPITest.add_broadcast0_ (10 ms)
[ RUN      ] VulkanAPITest.add_broadcast1_
[       OK ] VulkanAPITest.add_broadcast1_ (1 ms)
[ RUN      ] VulkanAPITest.add_scalar
[       OK ] VulkanAPITest.add_scalar (24 ms)
[ RUN      ] VulkanAPITest.add_scalar_
[       OK ] VulkanAPITest.add_scalar_ (8 ms)
[ RUN      ] VulkanAPITest.addmm
[       OK ] VulkanAPITest.addmm (22 ms)
[ RUN      ] VulkanAPITest.addmm_expand
[       OK ] VulkanAPITest.addmm_expand (12 ms)
[ RUN      ] VulkanAPITest.avg_pool2d
[       OK ] VulkanAPITest.avg_pool2d (9 ms)
[ RUN      ] VulkanAPITest.clamp
[       OK ] VulkanAPITest.clamp (92 ms)
[ RUN      ] VulkanAPITest.clamp_
[       OK ] VulkanAPITest.clamp_ (60 ms)
[ RUN      ] VulkanAPITest.conv2d
[       OK ] VulkanAPITest.conv2d (15 ms)
[ RUN      ] VulkanAPITest.conv2d_dw
[       OK ] VulkanAPITest.conv2d_dw (15 ms)
[ RUN      ] VulkanAPITest.conv2d_pw
[       OK ] VulkanAPITest.conv2d_pw (34 ms)
[ RUN      ] VulkanAPITest.conv2d_winograd
[       OK ] VulkanAPITest.conv2d_winograd (10 ms)
[ RUN      ] VulkanAPITest.copy
[       OK ] VulkanAPITest.copy (1 ms)
[ RUN      ] VulkanAPITest.div
[       OK ] VulkanAPITest.div (32 ms)
[ RUN      ] VulkanAPITest.div_broadcast0
[       OK ] VulkanAPITest.div_broadcast0 (11 ms)
[ RUN      ] VulkanAPITest.div_broadcast1
[       OK ] VulkanAPITest.div_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.div_broadcast2
[       OK ] VulkanAPITest.div_broadcast2 (7 ms)
[ RUN      ] VulkanAPITest.div_
[       OK ] VulkanAPITest.div_ (46 ms)
[ RUN      ] VulkanAPITest.div_broadcast0_
[       OK ] VulkanAPITest.div_broadcast0_ (9 ms)
[ RUN      ] VulkanAPITest.div_broadcast1_
[       OK ] VulkanAPITest.div_broadcast1_ (2 ms)
[ RUN      ] VulkanAPITest.div_scalar
[       OK ] VulkanAPITest.div_scalar (95 ms)
[ RUN      ] VulkanAPITest.div_scalar_
[       OK ] VulkanAPITest.div_scalar_ (18 ms)
[ RUN      ] VulkanAPITest.empty
[       OK ] VulkanAPITest.empty (0 ms)
[ RUN      ] VulkanAPITest.hardsigmoid
[       OK ] VulkanAPITest.hardsigmoid (76 ms)
[ RUN      ] VulkanAPITest.hardsigmoid_
[       OK ] VulkanAPITest.hardsigmoid_ (80 ms)
[ RUN      ] VulkanAPITest.hardshrink
[       OK ] VulkanAPITest.hardshrink (630 ms)
[ RUN      ] VulkanAPITest.hardshrink_
[       OK ] VulkanAPITest.hardshrink_ (573 ms)
[ RUN      ] VulkanAPITest.leaky_relu
[       OK ] VulkanAPITest.leaky_relu (271 ms)
[ RUN      ] VulkanAPITest.leaky_relu_
[       OK ] VulkanAPITest.leaky_relu_ (254 ms)
[ RUN      ] VulkanAPITest.hardswish
[       OK ] VulkanAPITest.hardswish (83 ms)
[ RUN      ] VulkanAPITest.hardswish_
[       OK ] VulkanAPITest.hardswish_ (72 ms)
[ RUN      ] VulkanAPITest.max_pool2d
[       OK ] VulkanAPITest.max_pool2d (16 ms)
[ RUN      ] VulkanAPITest.mean
[       OK ] VulkanAPITest.mean (17 ms)
[ RUN      ] VulkanAPITest.mean2d
[       OK ] VulkanAPITest.mean2d (20 ms)
[ RUN      ] VulkanAPITest.mm
[       OK ] VulkanAPITest.mm (12 ms)
[ RUN      ] VulkanAPITest.mul
[       OK ] VulkanAPITest.mul (28 ms)
[ RUN      ] VulkanAPITest.mul_broadcast0
[       OK ] VulkanAPITest.mul_broadcast0 (9 ms)
[ RUN      ] VulkanAPITest.mul_broadcast1
[       OK ] VulkanAPITest.mul_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.mul_broadcast2
[       OK ] VulkanAPITest.mul_broadcast2 (9 ms)
[ RUN      ] VulkanAPITest.mul_
[       OK ] VulkanAPITest.mul_ (43 ms)
[ RUN      ] VulkanAPITest.mul_broadcast0_
[       OK ] VulkanAPITest.mul_broadcast0_ (8 ms)
[ RUN      ] VulkanAPITest.mul_broadcast1_
[       OK ] VulkanAPITest.mul_broadcast1_ (1 ms)
[ RUN      ] VulkanAPITest.mul_scalar
[       OK ] VulkanAPITest.mul_scalar (64 ms)
[ RUN      ] VulkanAPITest.mul_scalar_
[       OK ] VulkanAPITest.mul_scalar_ (17 ms)
[ RUN      ] VulkanAPITest.reflection_pad2d
[       OK ] VulkanAPITest.reflection_pad2d (7 ms)
[ RUN      ] VulkanAPITest.reshape
[       OK ] VulkanAPITest.reshape (73 ms)
[ RUN      ] VulkanAPITest.reshape_
[       OK ] VulkanAPITest.reshape_ (41 ms)
[ RUN      ] VulkanAPITest.sigmoid
[       OK ] VulkanAPITest.sigmoid (81 ms)
[ RUN      ] VulkanAPITest.sigmoid_
[       OK ] VulkanAPITest.sigmoid_ (68 ms)
[ RUN      ] VulkanAPITest.softmax
[       OK ] VulkanAPITest.softmax (28 ms)
[ RUN      ] VulkanAPITest.log_softmax
Max Diff allowed: 5.87862e-05
../aten/src/ATen/test/vulkan_api_test.cpp:1470: Failure
Value of: check
  Actual: false
Expected: true
[  FAILED  ] VulkanAPITest.log_softmax (19 ms)
[ RUN      ] VulkanAPITest.tanh
[       OK ] VulkanAPITest.tanh (63 ms)
[ RUN      ] VulkanAPITest.tanh_
[       OK ] VulkanAPITest.tanh_ (68 ms)
[ RUN      ] VulkanAPITest.sub
[       OK ] VulkanAPITest.sub (28 ms)
[ RUN      ] VulkanAPITest.sub_broadcast0
[       OK ] VulkanAPITest.sub_broadcast0 (9 ms)
[ RUN      ] VulkanAPITest.sub_broadcast1
[       OK ] VulkanAPITest.sub_broadcast1 (9 ms)
[ RUN      ] VulkanAPITest.sub_broadcast2
[       OK ] VulkanAPITest.sub_broadcast2 (8 ms)
[ RUN      ] VulkanAPITest.sub_
[       OK ] VulkanAPITest.sub_ (43 ms)
[ RUN      ] VulkanAPITest.sub_broadcast0_
[       OK ] VulkanAPITest.sub_broadcast0_ (10 ms)
[ RUN      ] VulkanAPITest.sub_broadcast1_
[       OK ] VulkanAPITest.sub_broadcast1_ (2 ms)
[ RUN      ] VulkanAPITest.upsample_nearest2d
[       OK ] VulkanAPITest.upsample_nearest2d (5 ms)
[ RUN      ] VulkanAPITest.mobilenetv2
[       OK ] VulkanAPITest.mobilenetv2 (82 ms)
[----------] 69 tests from VulkanAPITest (3885 ms total)

[----------] Global test environment tear-down
[==========] 69 tests from 1 test suite ran. (3885 ms total)
[  PASSED  ] 68 tests.
[  FAILED  ] 1 test, listed below:
[  FAILED  ] VulkanAPITest.log_softmax

 1 FAILED TEST
```

Differential Revision: D30925995

fbshipit-source-id: 1b1b7f7f22090064424a5379d2f0559d0da7846a
Summary:
This PR plays around with implementation & usage of a `parametrize` decorator for test parametrization similar to `pytest.mark.parametrize`, based on previous work introducing a `_TestParametrizer` class. It works with the internal `DeviceTest` hierarchy & composes with `dtype`, `skip*`, and other decorators. Basic usage is demonstrated in `test/test_blah.py`:

```python
import unittest
from itertools import product
from torch.testing._internal.common_device_type import (
    instantiate_device_type_tests, deviceCountAtLeast, ops)
from torch.testing._internal.common_methods_invocations import op_db
from torch.testing._internal.common_utils import (
    TestCase, run_tests, parametrize, instantiate_parametrized_tests, subtest)

class TestBlah(TestCase):
    @parametrize("x", range(5))
    def test_default_names(self, x):
        print('Passed in:', x)

    # Use default names but add an expected failure.
    @parametrize("x", [subtest(0, decorators=[unittest.expectedFailure]),
                       *range(1, 5)])
    def test_default_names_expected_failure(self, x):
        if x == 0:
            raise RuntimeError('Boom')
        print('Passed in:', x)

    @parametrize("bias", [False, True], name_fn=lambda b: 'bias' if b else 'no_bias')
    def test_custom_names(self, bias):
        print('Passed in:', bias)

    @parametrize("bias", [subtest(True, name='bias'),
                          subtest(False, name='no_bias')])
    def test_custom_names_alternate(self, bias):
        print('Passed in:', bias)

    @parametrize("x,y", [(1, 2), (1, 3), (1, 4)])
    def test_two_things_default_names(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x", [1, 2, 3])
    @parametrize("y", [4, 5, 6])
    def test_two_things_composition(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x", [subtest(0, decorators=[unittest.expectedFailure]),
                       *range(1, 3)])
    @parametrize("y", [4, 5, subtest(6, decorators=[unittest.expectedFailure])])
    def test_two_things_composition_expected_failure(self, x, y):
        if x == 0 or y == 6:
            raise RuntimeError('Boom')
        print('Passed in:', x, y)

    @parametrize("x", [1, 2])
    @parametrize("y", [3, 4])
    @parametrize("z", [5, 6])
    def test_three_things_composition(self, x, y, z):
        print('Passed in:', x, y, z)

    @parametrize("x", [1, 2], name_fn=str)
    @parametrize("y", [3, 4], name_fn=str)
    @parametrize("z", [5, 6], name_fn=str)
    def test_three_things_composition_custom_names(self, x, y, z):
        print('Passed in:', x, y, z)

    @parametrize("x,y", product(range(2), range(3)))
    def test_two_things_product(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x,y", [subtest((1, 2), name='double'),
                         subtest((1, 3), name='triple'),
                         subtest((1, 4), name='quadruple')])
    def test_two_things_custom_names(self, x, y):
        print('Passed in:', x, y)

    @parametrize("x,y", [(1, 2), (1, 3), (1, 4)], name_fn=lambda x, y: '{}_{}'.format(x, y))
    def test_two_things_custom_names_alternate(self, x, y):
        print('Passed in:', x, y)

class TestDeviceBlah(TestCase):
    @parametrize("x", range(10))
    def test_default_names(self, device, x):
        print('Passed in:', device, x)

    @parametrize("x,y", [(1, 2), (3, 4), (5, 6)])
    def test_two_things(self, device, x, y):
        print('Passed in:', device, x, y)

    @deviceCountAtLeast(1)
    def test_multiple_devices(self, devices):
        print('Passed in:', devices)

    @ops(op_db)
    @parametrize("flag", [False, True], lambda f: 'flag_enabled' if f else 'flag_disabled')
    def test_op_parametrized(self, device, dtype, op, flag):
        print('Passed in:', device, dtype, op, flag)

instantiate_parametrized_tests(TestBlah)
instantiate_device_type_tests(TestDeviceBlah, globals())

if __name__ == '__main__':
    run_tests()
```

Generated tests:
```
TestBlah.test_custom_names_alternate_bias
TestBlah.test_custom_names_alternate_no_bias
TestBlah.test_custom_names_bias
TestBlah.test_custom_names_no_bias
TestBlah.test_default_names_expected_failure_x_0
TestBlah.test_default_names_expected_failure_x_1
TestBlah.test_default_names_expected_failure_x_2
TestBlah.test_default_names_expected_failure_x_3
TestBlah.test_default_names_expected_failure_x_4
TestBlah.test_default_names_x_0
TestBlah.test_default_names_x_1
TestBlah.test_default_names_x_2
TestBlah.test_default_names_x_3
TestBlah.test_default_names_x_4
TestBlah.test_three_things_composition_custom_names_1_3_5
TestBlah.test_three_things_composition_custom_names_1_3_6
TestBlah.test_three_things_composition_custom_names_1_4_5
TestBlah.test_three_things_composition_custom_names_1_4_6
TestBlah.test_three_things_composition_custom_names_2_3_5
TestBlah.test_three_things_composition_custom_names_2_3_6
TestBlah.test_three_things_composition_custom_names_2_4_5
TestBlah.test_three_things_composition_custom_names_2_4_6
TestBlah.test_three_things_composition_x_1_y_3_z_5
TestBlah.test_three_things_composition_x_1_y_3_z_6
TestBlah.test_three_things_composition_x_1_y_4_z_5
TestBlah.test_three_things_composition_x_1_y_4_z_6
TestBlah.test_three_things_composition_x_2_y_3_z_5
TestBlah.test_three_things_composition_x_2_y_3_z_6
TestBlah.test_three_things_composition_x_2_y_4_z_5
TestBlah.test_three_things_composition_x_2_y_4_z_6
TestBlah.test_two_things_composition_expected_failure_x_0_y_4
TestBlah.test_two_things_composition_expected_failure_x_0_y_5
TestBlah.test_two_things_composition_expected_failure_x_0_y_6
TestBlah.test_two_things_composition_expected_failure_x_1_y_4
TestBlah.test_two_things_composition_expected_failure_x_1_y_5
TestBlah.test_two_things_composition_expected_failure_x_1_y_6
TestBlah.test_two_things_composition_expected_failure_x_2_y_4
TestBlah.test_two_things_composition_expected_failure_x_2_y_5
TestBlah.test_two_things_composition_expected_failure_x_2_y_6
TestBlah.test_two_things_composition_x_1_y_4
TestBlah.test_two_things_composition_x_1_y_5
TestBlah.test_two_things_composition_x_1_y_6
TestBlah.test_two_things_composition_x_2_y_4
TestBlah.test_two_things_composition_x_2_y_5
TestBlah.test_two_things_composition_x_2_y_6
TestBlah.test_two_things_composition_x_3_y_4
TestBlah.test_two_things_composition_x_3_y_5
TestBlah.test_two_things_composition_x_3_y_6
TestBlah.test_two_things_custom_names_alternate_1_2
TestBlah.test_two_things_custom_names_alternate_1_3
TestBlah.test_two_things_custom_names_alternate_1_4
TestBlah.test_two_things_custom_names_double
TestBlah.test_two_things_custom_names_quadruple
TestBlah.test_two_things_custom_names_triple
TestBlah.test_two_things_default_names_x_1_y_2
TestBlah.test_two_things_default_names_x_1_y_3
TestBlah.test_two_things_default_names_x_1_y_4
TestBlah.test_two_things_product_x_0_y_0
TestBlah.test_two_things_product_x_0_y_1
TestBlah.test_two_things_product_x_0_y_2
TestBlah.test_two_things_product_x_1_y_0
TestBlah.test_two_things_product_x_1_y_1
TestBlah.test_two_things_product_x_1_y_2
TestDeviceBlahCPU.test_default_names_x_0_cpu
TestDeviceBlahCPU.test_default_names_x_1_cpu
TestDeviceBlahCPU.test_default_names_x_2_cpu
TestDeviceBlahCPU.test_default_names_x_3_cpu
TestDeviceBlahCPU.test_default_names_x_4_cpu
TestDeviceBlahCPU.test_default_names_x_5_cpu
TestDeviceBlahCPU.test_default_names_x_6_cpu
TestDeviceBlahCPU.test_default_names_x_7_cpu
TestDeviceBlahCPU.test_default_names_x_8_cpu
TestDeviceBlahCPU.test_default_names_x_9_cpu
TestDeviceBlahCPU.test_multiple_devices_cpu
TestDeviceBlahCPU.test_op_parametrized_<opname>_<variant>_cpu_uint8_flag_enabled_cpu
TestDeviceBlahCPU.test_two_things_x_1_y_2_cpu
TestDeviceBlahCPU.test_two_things_x_3_y_4_cpu
TestDeviceBlahCPU.test_two_things_x_5_y_6_cpu
TestDeviceBlahMETA.test_default_names_x_0_meta
TestDeviceBlahMETA.test_default_names_x_1_meta
TestDeviceBlahMETA.test_default_names_x_2_meta
TestDeviceBlahMETA.test_default_names_x_3_meta
TestDeviceBlahMETA.test_default_names_x_4_meta
TestDeviceBlahMETA.test_default_names_x_5_meta
TestDeviceBlahMETA.test_default_names_x_6_meta
TestDeviceBlahMETA.test_default_names_x_7_meta
TestDeviceBlahMETA.test_default_names_x_8_meta
TestDeviceBlahMETA.test_default_names_x_9_meta
TestDeviceBlahMETA.test_multiple_devices_meta
TestDeviceBlahMETA.test_op_parametrized_<opname>_<variant>_meta_uint8_flag_enabled_meta
TestDeviceBlahMETA.test_two_things_x_1_y_2_meta
TestDeviceBlahMETA.test_two_things_x_3_y_4_meta
TestDeviceBlahMETA.test_two_things_x_5_y_6_meta
```

Caveats:
* `parametrize` decorators cannot be "stacked" yet; each one overwrites the previous. This will change to either:
  * Allow stacking of multiple decorators
  * Error out with a nice error message if multiple decorators are specified

The PR introduces `instantiate_parametrized_tests()` in addition to `instantiate_device_type_tests()`. The former should be used for non-device-specific tests, and the latter should be used for device-specific tests, as usual. Both of these support the `parametrize` decorator. Only the latter supports the `ops` decorator (no change here- this was already the case).

Pull Request resolved: #60753

Reviewed By: saketh-are

Differential Revision: D30606615

Pulled By: jbschlosser

fbshipit-source-id: a34f36d643f68a6e221f419d9bb3e1ae1d84dd65
Summary:
Pull Request resolved: #64935

As title

Test Plan: CI

Reviewed By: dskhudia

Differential Revision: D30889157

fbshipit-source-id: 316c808806b084bd2e44c56e1cdb61adf2369a9d
Summary:
Addresses pytorch/functorch#78.

Pull Request resolved: #62315

Reviewed By: mruberry

Differential Revision: D30932765

Pulled By: zou3519

fbshipit-source-id: 481c67b59a966b4d640973d252b3e392d8db728e
…am with attempt to close and additional warning (#64788)

Summary:
ghstack is not working for the second commit so I'm manually creating this PR for now. Please only look at changes related to the second commit in this PR (there is a PR for the first commit).

This PR removes TarArchiveReader's dependency on FileLoader DataPipe, by allowing it to use an IterDataPipe of path names as input rather than a tuple of path name and a stream.

It also adds additional tests to ensure that the DataPipe is functioning properly when it is read multiple times or reset half way through reading.

The whole stack fixes #64281 - issues related to unclosed buffer stream.

Stack:
* __->__ #64788
* #64786

cc VitalyFedyunin ejguan

Pull Request resolved: #64788

Reviewed By: jbschlosser, ejguan

Differential Revision: D30901176

Pulled By: NivekT

fbshipit-source-id: 59746a8d0144fc6d3ce0feb2d76445b82e6d414e
suo and others added 21 commits September 22, 2021 10:15
Summary: Pull Request resolved: #65477

Test Plan: Imported from OSS

Reviewed By: zhouzhuojie

Differential Revision: D31115936

Pulled By: suo

fbshipit-source-id: fb16911a683713fdc2393bfe7150fc29c7d6814f
…TACKTRACES=1"

Summary: Original commit changeset: 9cfda47cafb3

Test Plan: unland

Reviewed By: ezyang

Differential Revision: D31116643

fbshipit-source-id: 631eea446ed48c63ca39281d24163a2eadbe8d12
Summary:
Pull Request resolved: #65235

1. Updated the legacy type checks in `torch/csrc/autograd/engine.cpp` to individually validate the dtype, device, and layout equality for grad and tensor.
2. Removed device field from `InputMetadata` since it's already stored via storing options. Also, added `dtype()` and `layout()` methods to `InputMetadata`. To make this change, some calls had to be updated due to the change in constructor.
3. To fix #65016:
     a. Added a `is_tensor_subclass` field in `InputMetadata` to skip device checks for grad and tensor when the tensor has
         python key set on it (tensor subclass).
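The real checks live in C++ (`torch/csrc/autograd/engine.cpp`); as a hypothetical Python-level sketch of the per-field validation and the subclass device-check skip described above — field names and error strings here are illustrative, not the engine's actual messages:

```python
from dataclasses import dataclass


@dataclass
class InputMetadata:
    """Toy stand-in for the C++ InputMetadata described in this PR."""
    dtype: str
    device: str
    layout: str
    is_tensor_subclass: bool = False


def validate_grad(metadata, grad_dtype, grad_device, grad_layout):
    """Validate a gradient against the forward input's metadata,
    checking dtype, layout, and device individually."""
    if grad_dtype != metadata.dtype:
        raise RuntimeError(
            f"invalid gradient: expected dtype {metadata.dtype}, got {grad_dtype}")
    if grad_layout != metadata.layout:
        raise RuntimeError(
            f"invalid gradient: expected layout {metadata.layout}, got {grad_layout}")
    # Skip the device check for tensor subclasses (see #65016): a subclass
    # may legitimately report a different device than its grad.
    if not metadata.is_tensor_subclass and grad_device != metadata.device:
        raise RuntimeError(
            f"invalid gradient: expected device {metadata.device}, got {grad_device}")
```

Separate checks also yield more precise error messages than a single "expected type X, got type Y" comparison of full TensorOptions.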

Test Plan: Imported from OSS

Reviewed By: jbschlosser

Differential Revision: D31117318

Pulled By: anjali411

fbshipit-source-id: 825401df98695c48bf9b320be54585f6aff500bd
Summary:
Fixes #65368. See discussion in the issue.

cc mruberry SsnL jbschlosser soulitzer

Pull Request resolved: #65415

Reviewed By: soulitzer

Differential Revision: D31093303

Pulled By: albanD

fbshipit-source-id: 621c74c7a2aceee95e3d3b708c7f1a1d59e59b93
Summary:
Pull Request resolved: #65340

I thought about a few possible ways of doing this.  The main hazard is
that if I create a CPU tensor that doesn't have any real storage, the
moment I actually try to access the data on the tensor I will segfault.
So I don't want to use _make_subclass on a "cpu meta tensor" because
the CPU meta tensor (with no subclass) is radioactive: printing it
will immediately cause a segfault.  So instead, I have to create
the CPU meta tensor AND subclass all in one go, and that means I need
another function for it.  One downside to doing it this way is
I need another overload for explicit strides, and in general it is
difficult to get the view relationships to all work out properly;
tracked at #65339

Fixes #62972
Fixes #62730

Signed-off-by: Edward Z. Yang <[email protected]>

Test Plan: Imported from OSS

Reviewed By: albanD

Differential Revision: D31057231

Pulled By: ezyang

fbshipit-source-id: 73522769e093ae8a1bf0c7f7e594659bfb827b28
Summary:
Related to #30987 and #33628. Fix the following tasks:

- Remove the use of `.data` in all our internal code:
  - [x] `benchmarks/`
  - [x] `torch/utils/tensorboard/`

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23 albanD gchanan

Pull Request resolved: #65389

Reviewed By: soulitzer

Differential Revision: D31093464

Pulled By: albanD

fbshipit-source-id: 3a9c8834fd544a59a1cc2b930ae538fd1d46b232
Summary:
Currently, the description of torch.any would be parsed like

```
param input
the input tensor.
```

However, it should be

```
Tests if any element in input evaluates to True.
```

Pull Request resolved: #65310

Reviewed By: ezyang

Differential Revision: D31102918

Pulled By: soulitzer

fbshipit-source-id: 678ade20ba16ad2643639fbd2420c8b36fcd8bd7
…one map (#65380)

Summary:
Pull Request resolved: #65380

Fixing bugs that arise when running setup.py develop

cc pietern mrshenli pritamdamania87 zhaojuanmao satgera rohan-varma gqchen aazzolini osalpekar jiayisuse SciPioneer H-Huang gcramer23

Test Plan: Imported from OSS

Reviewed By: mrshenli

Differential Revision: D31104844

Pulled By: jaceyca

fbshipit-source-id: acfd4cf316c71177df758ca55b470f51a17f776b
Summary:
Pull Request resolved: #65486

Adding this after observing jobs running for 6+ hours on `pytorch/pytorch-canary`; still trying to debug why they happen there, but this should resolve jobs running forever.

Signed-off-by: Eli Uriegas <[email protected]>

cc ezyang seemethere malfet pytorch/pytorch-dev-infra

Test Plan: Imported from OSS

Reviewed By: ezyang, malfet, janeyx99

Differential Revision: D31117497

Pulled By: seemethere

fbshipit-source-id: 126a10e844bdef77c2852cc5c392e5f37f130f7e
Summary:
Pull Request resolved: #63819

ghstack-source-id: 138521664

Test Plan:
buck test mode/dev-nosan caffe2/torch/csrc/deploy:test_deploy_gpu

buck test mode/opt-split-dwarf caffe2/torch/csrc/deploy:test_deploy_gpu

Reviewed By: wconstab

Differential Revision: D30499301

fbshipit-source-id: 0bc165b4ed5be28ebb0becc65f292cf26368692f
Summary:
Reported by cloudhan in #64733 (comment)

Fixes regression introduced by 047e682

cc malfet seemethere

Pull Request resolved: #65444

Reviewed By: dagitses, seemethere

Differential Revision: D31103260

Pulled By: malfet

fbshipit-source-id: 9d5454a64cb8a0b96264119cf16582cc5afed284
Summary:
Pull Request resolved: #64513

Proposal: #63041
Support custom buffer reduction in DDP via hook
ghstack-source-id: 138655663

Test Plan: CI

Reviewed By: SciPioneer

Differential Revision: D30751152

fbshipit-source-id: 257a9d46bb178d8812d4ea5a4d9c6140b8a1791f
Summary:
Pull Request resolved: #64514

sync_params is a misnomer since we don't actually synchronize
parameters. While removing this I realized
`self._check_and_sync_module_buffers` does almost everything we need it to, so
I just refactored that and made the DDP forward call into it.
ghstack-source-id: 138684982

Test Plan: CI

Reviewed By: zhaojuanmao

Differential Revision: D30751231

fbshipit-source-id: add7c684f5c6c71dad9e9597c7759849fa74f47a
Summary:
- Replace THCNumerics with `at::_isnan`
- Replace `contiguous` with `expect_contiguous`
- Don't use `contiguous` on output tensors. Instead skip the copy and
  just create a new empty tensor.

Pull Request resolved: #65350

Reviewed By: ezyang

Differential Revision: D31103501

Pulled By: ngimel

fbshipit-source-id: 9030869e28d6c570fad074fd0502076de8e2ab09
Summary:
Pull Request resolved: #65315

ghstack-source-id: 138703808

Test Plan:
- OSS builds and BUCK builds
- CircleCI

Reviewed By: hanton

Differential Revision: D31048011

fbshipit-source-id: 824a8e32d65de2caf25e41efef2b022ddbb63156
…65387)

Summary:
Pull Request resolved: #65387

Added a customized NNC implementation for signed log1p kernel and enabled the fusion pass that adds the fused signed log1p op.

Also, added a SR microbenchmark for this kernel which shows the performance improvement.

Without fusion:
```
--------------------------------------------------------------------------------
Benchmark                                         Time           CPU Iterations
--------------------------------------------------------------------------------
BM_signed_log1p/16                             1953 ns       1953 ns     358746
BM_signed_log1p/64                             2049 ns       2049 ns     342145
BM_signed_log1p/512                            3291 ns       3291 ns     214342
BM_signed_log1p/4096                          15559 ns      15559 ns      44420
BM_signed_log1p/32768                        101936 ns     101935 ns       6843
BM_signed_log1p/65536                        194792 ns     194789 ns       3615
```

With NNC fusion:
```
--------------------------------------------------------------------------------
Benchmark                                         Time           CPU Iterations
--------------------------------------------------------------------------------
BM_signed_log1p/16                              369 ns        369 ns    1896179
BM_signed_log1p/64                              497 ns        497 ns    1406995
BM_signed_log1p/512                            1618 ns       1618 ns     430209
BM_signed_log1p/4096                          11327 ns      11326 ns      61463
BM_signed_log1p/32768                         84099 ns      84086 ns       8325
BM_signed_log1p/65536                        166531 ns     166510 ns       4186
```

This clearly shows >15% improvement in performance of this kernel with NNC fusion.

On inline_cvr local model, there is a small improvement in terms of profiled time spent on ops:
  without fusion: `0.9%` (computed by adding the % spent on all the 4 ops involved)
  with NNC fusion: `0.55%`
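Assuming "signed log1p" denotes the elementwise operation sign(x) · log1p(|x|) — my reading of the kernel name, not spelled out in the PR text — a scalar reference in plain Python:

```python
import math


def signed_log1p(x):
    """Scalar reference for the fused kernel, assuming it computes
    sign(x) * log1p(|x|) elementwise."""
    # copysign transfers the sign of x onto the (non-negative) log1p result,
    # and correctly maps 0.0 -> 0.0.
    return math.copysign(math.log1p(abs(x)), x)
```

The fusion wins in the benchmark above come from computing this in one NNC-generated loop instead of materializing intermediates for each of the component ops (abs, log1p, sign, mul).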

Test Plan:
`buck test mode/opt-clang //caffe2/benchmarks/static_runtime:static_runtime_cpptest -- SignedLog1p`

Also, did the accuracy test with inline_cvr as described here, https://fb.quip.com/qmdDAJzEmPtf, on the full size model (285298536_1)

```
get 57220 prediction values
get 57220 prediction values
max_error:  0  total:  0
```

Reviewed By: hlu1

Differential Revision: D30609492

fbshipit-source-id: d2e68df580569a30ee61abb0ef18d2c4c56827bd
Summary:
Fixes #60397. I'm not sure how aliases are supposed to be implemented, but this is the most basic/direct way, IMO. As a side-effect, this implementation results in a "duplicate" doc entry, inheriting the one from `add_module`:

![monkey-patch](https://user-images.githubusercontent.com/7027770/133693137-8408d8e7-1f4f-436b-b176-57dda9bc3a32.png)

An alternative implementation could be:

```python
def register_module(self, name: str, module: Optional['Module']) -> None:
    r"""Alias for :func:`add_module`."""
    self.add_module(name, module)
```

which results in this documentation:

![image](https://user-images.githubusercontent.com/7027770/133693249-d969a71a-be44-489d-9633-4f38b44ab887.png)

Questions:
1. Should I replicate the tests? There are two for `add_module`: [test_add_module_raises_error_if_attr_exists](https://github.com/pytorch/pytorch/blob/873255c6d95342d144e9d1b633c16410844b934e/test/test_nn.py#L1420-L1434) and [test_add_module](https://github.com/pytorch/pytorch/blob/873255c6d95342d144e9d1b633c16410844b934e/test/test_nn.py#L1837-L1855).
2. This PR only adds `register_module` to `nn.Module`. There is an `add_module` in [`_RemoteModule`](https://github.com/pytorch/pytorch/blob/master/torch/distributed/nn/api/remote_module.py#L311-L312), which raises `NotSupported`, and there is another one in [`ConcreteModuleTypeBuilder`](https://github.com/pytorch/pytorch/blob/873255c6d95342d144e9d1b633c16410844b934e/torch/_C/__init__.pyi.in#L468), which means something else, I think. Should I do anything about them?

cc ngimel SsnL

Pull Request resolved: #65174

Reviewed By: soulitzer

Differential Revision: D31089717

Pulled By: jbschlosser

fbshipit-source-id: abd8d14a434fd8c7efa0bd8c242df56da33491e9
Summary:
## 🐛 Bug
`CosineAnnealingWarmRestarts` object has no attribute `T_cur`.
In the constructor of `CosineAnnealingWarmRestarts`, we call the constructor of the parent class (`_LRScheduler`), which in turn calls the `step` method of `CosineAnnealingWarmRestarts`.
The called method tries to update the object's attribute `T_cur`, which is not defined yet, so it raises the error.
This only occurs when the value given for the `last_epoch` argument is 0 or greater while initializing the `CosineAnnealingWarmRestarts` object.

![Bug_in_CosineAnnealingWarmRestarts](https://user-images.githubusercontent.com/77477328/132552212-70abc8b5-0357-4c35-90a9-832648bac607.png)
## To Reproduce

Steps to reproduce the behavior:

1. Give the value for the `last_epoch` argument as zero, OR
2. Give the value for the `last_epoch` argument as a positive integer.

## Expected behavior

I only expected the 'CosineAnnealingWarmRestarts' object to be initialized.

## Environment

PyTorch version: 1.9.0+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 20.04.2 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.21.2
Libc version: glibc-2.31
Python version: 3.8.10  [GCC 9.4.0] (64-bit runtime)
Python platform: Linux-5.8.0-59-generic-x86_64-with-glibc2.29
Is CUDA available: False
CUDA runtime version: No CUDA

## Additional context
We can solve this bug by moving the line `self.T_cur = self.last_epoch` above the `super(CosineAnnealingWarmRestarts, self).__init__()` call, so that `self.T_cur` is initialized on the object before the parent constructor invokes `step()`.
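The failure mode is ordinary Python: a parent constructor calls a method that reads an attribute the subclass has not set yet. A minimal stand-in reproduction — simplified scheduler classes, no PyTorch required; the real `_LRScheduler` has more logic than this:

```python
class _LRScheduler:
    """Toy parent: mimics the real scheduler base calling step() from __init__."""
    def __init__(self, last_epoch=-1):
        self.last_epoch = last_epoch
        if last_epoch >= 0:
            # Calls the subclass's step() before the subclass __init__ finishes.
            self.step()


class Broken(_LRScheduler):
    def __init__(self, last_epoch=-1):
        super().__init__(last_epoch)
        self.T_cur = last_epoch  # too late: step() may already have needed it

    def step(self):
        self.T_cur += 1  # AttributeError when called from the parent __init__


class Fixed(_LRScheduler):
    def __init__(self, last_epoch=-1):
        self.T_cur = last_epoch  # set before the parent constructor runs
        super().__init__(last_epoch)

    def step(self):
        self.T_cur += 1
```

`Broken(last_epoch=0)` raises `AttributeError`, while `Fixed(last_epoch=0)` initializes cleanly — the same reordering this PR applies.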

Pull Request resolved: #64758

Reviewed By: ezyang

Differential Revision: D31113694

Pulled By: jbschlosser

fbshipit-source-id: 98c0e292291775895dc3566fda011f2d6696f721
Summary:
Pull Request resolved: #65014

ghstack-source-id: 138656948

Test Plan:
```
(pytorch) [[email protected] ~/pytorch] python3 test/test_jit.py TestPeephole
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
........s......................
----------------------------------------------------------------------
Ran 31 tests in 0.393s

OK (skipped=1)
(pytorch) [[email protected] ~/pytorch] python3 test/test_jit.py TestPeephole.test_normalized_rsub
CUDA not available, skipping tests
monkeytype is not installed. Skipping tests for Profile-Directed Typing
.
----------------------------------------------------------------------
Ran 1 test in 0.015s

OK
```

Reviewed By: eellison

Differential Revision: D30941389

fbshipit-source-id: 03f0416d99090845c9bfb1e5fcf771d5f1d7a050
Summary:
Pull Request resolved: #65181

This PR changes `state_dict()` during sync to explicit `named_parameters` and `named_buffers`. The underlying motivation is that `state_dict()` does not necessarily equal "params + buffers" in all cases: state_dict is mainly used for checkpointing, while params/buffers are used for training. We might have cases where params/buffers take a different form from state_dict (e.g., for state_dict we may want to save small pieces of tensors, while in training we want to concatenate the tensors together for performance reasons).
ghstack-source-id: 138701159
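The distinction can be sketched with a toy module (plain Python, no torch): training-time sync should iterate the live `named_parameters`/`named_buffers` tensors, while `state_dict()` may store the same data in a different, checkpoint-oriented form. The split-weight layout below is an invented example of such a divergence, not anything from the PR itself.

```python
class TinyModule:
    """Toy stand-in for nn.Module with one parameter and one buffer."""
    def __init__(self):
        self._params = {"weight": [1.0, 2.0]}
        self._buffers = {"running_mean": [0.0]}

    def named_parameters(self):
        return self._params.items()

    def named_buffers(self):
        return self._buffers.items()

    def state_dict(self):
        # Checkpoint form may differ, e.g. the weight saved in pieces.
        return {"weight.piece0": [1.0], "weight.piece1": [2.0],
                "running_mean": [0.0]}


def tensors_to_sync(module):
    """Collect exactly the live tensors that training-time sync should touch."""
    return {**dict(module.named_parameters()), **dict(module.named_buffers())}
```

Syncing from `state_dict()` here would broadcast the checkpoint-shaped pieces rather than the tensors training actually reads, which is the mismatch the PR avoids.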

Test Plan: wait for ci

Reviewed By: divchenko, rohan-varma

Differential Revision: D31007085

fbshipit-source-id: 4e1c4fbc07110163fb9b09b043ef7b4b75150f18
Summary:
Pull Request resolved: #65481

Previously we had `acc_ops.transpose`, but after a recent diff `torch.transpose` is mapped to `acc_ops.permute`. Here we clean up the fx2trt unittest for transpose and add support for negative indices in permute.
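A common way to support negative indices — presumably what the permute fix amounts to, though the fx2trt code isn't shown here — is to normalize each dimension modulo the rank and validate that the result is a full permutation:

```python
def normalize_permutation(dims, rank):
    """Map possibly-negative permutation indices into [0, rank)."""
    out = []
    for d in dims:
        if not -rank <= d < rank:
            raise ValueError(f"dim {d} out of range for rank {rank}")
        out.append(d % rank)  # -1 -> rank - 1, -2 -> rank - 2, ...
    if sorted(out) != list(range(rank)):
        raise ValueError("dims must be a permutation of all axes")
    return out
```

For example, `permute(0, -1, -2)` on a rank-3 tensor normalizes to `(0, 2, 1)` before being handed to the TensorRT converter.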

Reviewed By: wushirong

Differential Revision: D31115280

fbshipit-source-id: 58e689e6dd14181aea5186f3bb5b8745a07d0e51
@alanwaketan alanwaketan force-pushed the alanwaketan/ltc_merge_master branch 3 times, most recently from 9c16447 to b95e4ec on September 23, 2021 at 08:00
@alanwaketan alanwaketan force-pushed the alanwaketan/ltc_merge_master branch from b95e4ec to 5e0fbc9 on September 23, 2021 at 17:59
@alanwaketan
Collaborator Author

The failures seem minor. Let's merge it.

@alanwaketan alanwaketan merged commit ebc2ebc into lazy_tensor_staging Sep 23, 2021
alanwaketan added a commit that referenced this pull request Sep 23, 2021
@alanwaketan alanwaketan deleted the alanwaketan/ltc_merge_master branch September 24, 2021 22:11

Labels

cla signed module: fx oncall: distributed Add this issue/PR to distributed oncall triage queue oncall: jit Add this issue/PR to JIT oncall triage queue
