Impl for ParameterList #40850
Closed
Conversation
💊 CI failures summary and remediations

As of commit 24952d5 (more details on the Dr. CI page):

💚 Looks good so far! There are no failures yet. 💚

This comment was automatically generated by Dr. CI. Follow this link to opt-out of these comments for your Pull Requests. Please report bugs/suggestions on the GitHub issue tracker or post in the (internal) Dr. CI Users group. This comment has been revised 14 times.
Summary: Pull Request resolved: #40830 Fixes #40725 Signed-off-by: Edward Z. Yang <[email protected]> Test Plan: Imported from OSS Differential Revision: D22323886 Pulled By: ezyang fbshipit-source-id: b8a61496923d9f086d4c201024748505ba783238
Summary: Pull Request resolved: #40578 Test Plan: Imported from OSS Differential Revision: D22239286 Pulled By: IvanKobzarev fbshipit-source-id: 7a4160b621af8cfcc3b3d9e6da1a75c8afefba27
Summary: `long long == int64_t != long` in MSVC Pull Request resolved: #40783 Differential Revision: D22328757 Pulled By: ezyang fbshipit-source-id: bc7301d6b0e7e00ee6d7ca8637e3fce7810b15e2
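The distinction matters because Windows uses the LLP64 data model: under MSVC `long` stays 32-bit and `int64_t` is `long long`, whereas on LP64 Linux `int64_t` is usually `long`. A minimal, hypothetical sketch (not PyTorch code) of the difference:

```cpp
// Illustration only. On MSVC (LLP64), `long` is 32 bits and int64_t is
// typedef'd to `long long`; code that assumes int64_t == long breaks there.
#include <cstdint>
#include <type_traits>

#if defined(_MSC_VER)
static_assert(sizeof(long) == 4, "MSVC: long is only 32 bits");
static_assert(std::is_same<std::int64_t, long long>::value,
              "MSVC: int64_t is long long, not long");
#endif

int main() { return 0; }
```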
#40845) Summary: … file. This prevents the implementations of those functions (as lambdas) from being embedded as weak symbols into every shared library that includes this header. The combination of this and #40844 reduces the size of `libcaffe2_module_test_dynamic.so` from 500KB to 50KB. Pull Request resolved: #40845 Differential Revision: D22334779 Pulled By: malfet fbshipit-source-id: 64706918fc2947350a58c0877f294b1b8b085455
…40844) Summary: If a virtual function is implemented in a header file, its implementation will be included as a weak symbol in every shared library that includes this header, along with all of its dependencies. This was one of the reasons why the size of libcaffe2_module_test_dynamic.so was 500KB (the AddRelatedBlobInfo implementation pulled in a quarter of libprotobuf.a with it). The combination of this and #40845 reduces the size of `libcaffe2_module_test_dynamic.so` from 500KB to 50KB. Pull Request resolved: #40844 Differential Revision: D22334725 Pulled By: malfet fbshipit-source-id: 836a4cbb9f344355ddd2512667e77472546616c0
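As a rough sketch of the pattern being fixed (class and file names here are invented for illustration): declaring the virtual method in the header but defining it in a single .cc file means only one translation unit emits the symbol and links in its dependencies, instead of every library that includes the header carrying a weak copy.

```cpp
// my_module.h -- hypothetical example of the pattern, not the actual caffe2 code.
#pragma once
#include <string>

class MyModule {
 public:
  virtual ~MyModule();
  // Only declared here; the definition lives in exactly one .cc file, so the
  // implementation (and whatever it pulls in) is not emitted as a weak symbol
  // by every shared library that includes this header.
  virtual std::string RelatedBlobInfo() const;
};
```

```cpp
// my_module.cc
#include "my_module.h"

MyModule::~MyModule() = default;

std::string MyModule::RelatedBlobInfo() const {
  return "blob-info";  // In the real case this dragged in part of libprotobuf.
}
```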
Summary: Pull Request resolved: #40709 Cast to kHalf and back to kFloat before the linear operator to mimic FP16 quant support Test Plan: python test/test_quantization.py test_convert_dynamic_fp16 Imported from OSS Differential Revision: D22335977 fbshipit-source-id: f964128ec733469672a1ed4cb0d757d0a6c22c3a
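A minimal ATen (C++) sketch of the same trick, assuming a standalone helper (the function name and the use of plain `at::linear` are illustrative, not the quantized op the PR actually wires up): round-tripping the weight through half precision reproduces FP16 quantization error while still running a float kernel.

```cpp
#include <ATen/ATen.h>

// Sketch: emulate FP16 weight quantization by casting the weight to half and
// back to float, then running an ordinary float linear.
at::Tensor fp16_emulated_linear(const at::Tensor& input,
                                const at::Tensor& weight,
                                const at::Tensor& bias) {
  at::Tensor w = weight.to(at::kHalf).to(at::kFloat);  // lossy round-trip
  return at::linear(input, w, bias);
}
```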
Summary: Pull Request resolved: #40710 Test Plan: python test/test_quantization.py Imported from OSS Differential Revision: D22335975 fbshipit-source-id: 5c176bb6b9c300e1beb83df972149dd5a400b854
…Destruction (#40585) Summary: Pull Request resolved: #40585 This PR aborts incomplete NCCL Communicators in the ProcessGroupNCCL destructor. This should prevent pending NCCL communicators from blocking other CUDA ops. ghstack-source-id: 106988073 Test Plan: Sandcastle/ OSS CI Differential Revision: D22244873 fbshipit-source-id: 4b4fe65e1bd875a50151870f8120498193d7535e
Summary: Moving this to a file that can be sourced by downstream pytorch/xla. Pull Request resolved: #40828 Reviewed By: malfet Differential Revision: D22339513 Pulled By: ailzhang fbshipit-source-id: c43b18fa2b7e1e8bb6810a6a43bb7dccd4756238
Summary: Pull Request resolved: #40876 clang format reducer.cpp ghstack-source-id: 106980050 Test Plan: unit test Differential Revision: D22321422 fbshipit-source-id: 54afdff206504c7bbdf2e408928cc32068e15cdc
Summary: Pull Request resolved: #38257 It seems we're doing a runtime type check for custom classes on each operator call if the operator has custom class arguments. This does not have an effect on operators without custom class arguments, but this is a problem for operators with custom class arguments, for example operators taking a at::native::xnnpack::Conv2dOpContext argument. The long term solution would be to move those checks to op registration time instead of doing them at call time, but as an intermediate fix, we can at least make the check fast by - Using ska::flat_hash_map instead of std::unordered_map - Using std::type_index instead of std::string (i.e. avoid calling std::hash on a std::string) ghstack-source-id: 106805209 Test Plan: waitforsandcastle Reviewed By: ezyang Differential Revision: D21507226 fbshipit-source-id: bd120d5574734be843c197673ea4222599fee7cb
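A hedged sketch of the speedup (identifiers here are made up): keying the lookup by `std::type_index` means the hash is computed from a lightweight type handle instead of re-hashing a type-name string on every call; the real change additionally swaps `std::unordered_map` for the vendored `ska::flat_hash_map`, which has a compatible interface.

```cpp
#include <string>
#include <typeindex>
#include <typeinfo>
#include <unordered_map>

// Hypothetical registry keyed by std::type_index rather than a string name.
struct CustomClassInfo {
  std::string qualified_name;
};

std::unordered_map<std::type_index, CustomClassInfo>& customClassRegistry() {
  static std::unordered_map<std::type_index, CustomClassInfo> registry;
  return registry;
}

template <typename T>
const CustomClassInfo* lookupCustomClass() {
  auto& registry = customClassRegistry();
  auto it = registry.find(std::type_index(typeid(T)));  // no std::string hashing
  return it == registry.end() ? nullptr : &it->second;
}
```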
Summary: Pull Request resolved: #40747 - ghstack-source-id: 106833603 Test Plan: waitforsandcastle Reviewed By: ezyang Differential Revision: D22299161 fbshipit-source-id: 6e34999b5f8244d9582e4978754039d340720ca8
Summary: Right now it is used to check whether `math.remainder` exists, which is the case for both Python 3.7 and 3.8. Pull Request resolved: #40868 Differential Revision: D22343454 Pulled By: malfet fbshipit-source-id: 6b6d4869705b64c4b952309120f92c04ac7e39fd
…ning-amenable mode (#40546) Summary: This PR introduces a warning when a user tries to export a model to ONNX in training-amenable mode while constant folding is turned on. We want to warn against any unintentional use, because constant folding may fold some parameters that may be intended to be trainable in the exported model. Pull Request resolved: #40546 Reviewed By: hl475 Differential Revision: D22310917 Pulled By: houseroad fbshipit-source-id: ba83b8e63af7c458b5ecca8ff2ee1c77e2064f90
Summary: Pull Request resolved: #40649 The original implementation of RemoveOpsByType is pretty buggy and does not remove all instances of the ops that should be removed. It's also quite complicated and hard to modify. I reimplemented it by first converting the graph to its SSA form. The algorithm is quite simple once the graph is in SSA form; it's very similar to constant propagation with a few modifications. The hardest part is handling the case where the output of a removed op is also an output of the predict net, because that output has to be preserved. (Note: this ignores all push blocking failures!) Reviewed By: yinghai, dzhulgakov Differential Revision: D22220798 fbshipit-source-id: faf6ed5242f1e2f310125d964738c608c6c55c94
Summary: Pull Request resolved: #40885 `TracingState::setValue` associates a concrete IValue in the traced program with a `Value*` symbolic. Previously, the logic for how GenericDicts worked was special cased to only work for very simple cases and silently eat other cases. This PR generalizes the logic to reflect the same behavior as using dictionaries on input: whenever we encounter a dictionary in the system, we completely "burn in" all the keys into the graph, and then recursively call `setValue` on the associated value. This has the effect of requiring that any dictionary structure you are creating in a traced program be of fixed structure, similar to how any dictionary used as input must be static as well. Test Plan: Imported from OSS Differential Revision: D22342490 Pulled By: suo fbshipit-source-id: 93e610a4895d61d9b8b19c8d2aa4e6d57777eaf6
Summary: Move the Storage class from __init__.pyi.in to types.py and make it a protocol, since this is not a real class. Expose the `PyTorchFileReader` and `PyTorchFileWriter` native classes. Ignore function attributes, as there is not yet a good way to type-annotate those; see python/mypy#2087. Pull Request resolved: #40862 Differential Revision: D22344743 Pulled By: malfet fbshipit-source-id: 95cdb6f980ee79383960f306223e170c63df3232
…#40814) Summary: Pull Request resolved: #40814

Summary of the entire stack: This diff is part of an attempt to refactor the RPC tests. They currently suffer from several problems:
- Several ways to specify the agent to use: there exists one "generic" fixture that uses the global variable TEST_CONFIG to look up the agent name, and is used for process group and Thrift, and then there are separate fixtures for the flaky agent and the TensorPipe one.
- These two ways lead to having two separate decorators (`requires_process_group_agent` and `@_skip_if_tensorpipe_agent`) which must both be specified, making it unclear what the effect of each of them is and what happens if only one is given.
- Thrift must override the TEST_CONFIG global variable before any other import (in order for the `requires_process_group_agent` decorator to work correctly) and for that it must use a "trap" file, which makes it even harder to track which agent is being used, and which is specific to Buck, and thus cannot be used in OSS by other agents.
- Even if the TensorPipe fixture doesn't use TEST_CONFIG, it still needs to set it to the right value for other parts of the code to work. (This is done in `dist_init`.)
- There are a few functions in dist_utils.py that return some properties of the agent (e.g., a regexp to match against the error it returns in case of shutdown). These functions are effectively chained if/elses on the various agents, which has the effect of "leaking" some part of the Thrift agent into OSS.
- Each test suite (RPC, dist autograd/dist optimizer, their JIT versions, remote module, ...) must be run on each agent (or almost; the faulty one is an exception) in both fork and spawn mode. Each of these combinations is a separate file, which leads to a proliferation of scripts.
- There is no "master list" of what combinations make sense and should be run. Therefore it has happened that when adding new tests or new agents we forgot to enroll them into the right tests. (TensorPipe is still missing a few tests, it turns out.)
- All of these tiny "entry point" files contain almost the same duplicated boilerplate. This makes it very easy to get the wrong content into one of them due to a bad copy-paste.

This refactoring aims to address these problems by:
- Avoiding global state, defaults/overrides, traps, if/elses, ... and having a single way to specify the agent, based on an abstract base class and several concrete subclasses which can be "mixed in" to any test suite.
- Instead of enabling/disabling tests using decorators, the tests that are specific to a certain agent are now in a separate class (which is a subclass of the "generic" test suite) so that they are only picked up by the agent they apply to.
- Instead of having one separate entry point script for each combination, it uses one entry point for each agent, and in that script it provides a list of all the test suites it wants to run on that agent. And it does that by trying to deduplicate the boilerplate as much as possible. (In fact, the various agent-suite combinations could be grouped in any way, not necessarily by agent as I did here.)

It provides further advantages:
- It puts all the agents on equal standing, by not having any of them be the default, making it thus easier to migrate from process group to TensorPipe.
- It will make it easier to add more versions of the TensorPipe tests (e.g., one that disables the same-machine backends in order to test the TCP-based ones) without a further duplication of entry points, of boilerplate, ...
Summary of this commit -- This prepares the stack by simplifying the TensorPipe fixture. A comment says that the TensorPipe fixture cannot subclass the generic fixture class as that would lead to a diamond class hierarchy which Python doesn't support (whereas in fact it does), and therefore it copies over two properties that are defined on the generic fixture. However, each class that uses the TensorPipe fixture also inherits from the generic fixture, so there's no need to redefine those properties. And, in fact, by not redefining it we save ourselves some trouble when the TensorPipe fixture would end up overriding another override. ghstack-source-id: 107045914 Test Plan: Sandcastle and CircleCI Differential Revision: D22287533 fbshipit-source-id: 254c38b36ba51c9d852562b166027abacbbd60ef
Summary: Pull Request resolved: #40815. Summary of the entire stack: same as for #40814 above.
Summary of this commit -- This prepares the stack by aligning the `ddp_under_dist_autograd` test to the other ones, so that later changes will be more consistent and thus easier to follow. It does so by moving the `skipIf` decorators and the `setUp` methods from the base test suite to the entry point scripts. ghstack-source-id: 107045911 Test Plan: Sandcastle and CircleCI Differential Revision: D22287535 fbshipit-source-id: ab0c9eb774b21d81e0ebd3078df958dbb4bfa0c7
Summary: Pull Request resolved: #40913. Summary of the entire stack: same as for #40814 above.
Summary of this commit -- Once we start merging multiple test suites in a single file (which will happen in the next diffs in the stack) the OSX tests on CircleCI start failing due to "too many open files". This indicates a file descriptor leak. I then managed to repro it on Linux too by lowering the limit on open file descriptors (`ulimit -n 500`). Each test method that unittest runs is run on a new instance of the TestCase class. With our multiprocessing wrappers, this instance contains a list of child processes. Even after these processes are terminated, it appears they still hold some open file descriptor (for example a pipe to communicate with the subprocess). It also appears unittest is keeping these TestCase instances alive until the entire suite completes, which I suspect is what leads to this "leak" of file descriptors. Based on that guess, in this diff I am resetting the list of subprocesses during shutdown, and this seems to fix the problem. ghstack-source-id: 107045908 Test Plan: Sandcastle and CircleCI Differential Revision: D22356784 fbshipit-source-id: c93bb9db60fde72cae0b0c735a50c17e427580a6
Summary: Pull Request resolved: #40889 Test Plan: Imported from OSS Differential Revision: D22348266 fbshipit-source-id: eed2ece5c94fcfaf187d6770bed4a7109f0c0b4a
Summary: Pull Request resolved: #40652 Resolves #40304, but looking for feedback on whether there is a better approach for this. In order to profile `rpc_async` calls made within a torchscript function, we add the profiling logic to `rpcTorchscript` which is the point where the RPC is dispatched and is called by the jit `rpc_async` operator. We take a somewhat similar approach to how this is done in the python API. If profiling is enabled, we call `record_function_enter` which creates a `RecordFunction` object and runs its starting callbacks. Then, we schedule end callbacks for this `RecordFunction` to be run when the jit future completes. One caveat is that `rpcTorchscript` can also be called by rpc_async from a non-JIT function, in which case the profiling logic lives in Python. We add a check to ensure that we don't double profile in this case. ghstack-source-id: 107109485 Test Plan: Added relevant unittests. Differential Revision: D22270608 fbshipit-source-id: 9f62d1a2a27f9e05772d0bfba47842229f0c24e1
Summary: Pull Request resolved: #40931 Fix docstrings for dynamic quantized Linear/LSTM and associated classes ghstack-source-id: 107064446 Test Plan: Docs show up correctly Differential Revision: D22360787 fbshipit-source-id: 8e357e081dc59ee42fd7f12ea5079ce5d0cc9df2
Summary: The doc for `torch.distributed.launch` has been missing since v1.2.0 (see issue #36386) because PR #22501 added some imports at the first line. https://github.com/pytorch/pytorch/blob/542ac74987141c81b6326b9e7d5c6a1d00fc3701/torch/distributed/launch.py#L1-L5 I moved the imports below the docstring so that autodoc in Sphinx works normally. Pull Request resolved: #40963 Differential Revision: D22380816 Pulled By: mrshenli fbshipit-source-id: ee8406785b9a198bbf3fc65e589854379179496f
Summary: Pull Request resolved: #40926 Reviewed By: eellison Differential Revision: D22359471 Pulled By: Krovatkin fbshipit-source-id: 823e87674e2d2917f075255ec926e0485972f4e2
Summary: Pull Request resolved: #40866 Reviewed By: pbelevich Differential Revision: D22358988 Pulled By: Krovatkin fbshipit-source-id: 7118d7f8d4eaf056cfb71dc0d588d38b1dfb0fc7
Contributor (Author)
Sorry, I made a wrong rebase. I will abandon this diff and create a new one.
Contributor
This one has a rebase issue; #40987 was created instead. Closing this one now.
This was referenced Jul 9, 2020
facebook-github-bot pushed a commit that referenced this pull request on Jul 13, 2020
Summary: This is a new PR for #40850, #40987 and #41206 (unintentionally closed), as I had some issues with rebases for those. Very sorry about that. And I have fixed the tests that failed in that PR. This diff contains the implementation of the C++ API for ParameterList from #25883. Refer to the Python API: https://github.com/pytorch/pytorch/blob/bc9e8af21875dafafe9bbd25c8f542b20b2e660f/torch/nn/modules/container.py#L376 Not sure about some naming differences between the C++ API and the Python API, like `append`: should it be called `push_back`? Pull Request resolved: #41259 Test Plan: Add unit tests in this diff Differential Revision: D22495780 Pulled By: glaringlee fbshipit-source-id: 79ea3592db640f35477d445ecdaeafbdad814bec
Summary:
This diff contains the implementation of the C++ API for ParameterList from #25883.
Refer to the Python API: torch/nn/modules/container.py, line 376 (at commit bc9e8af).
Not sure about some naming differences between the C++ API and the Python API, like `append`: should it be called `push_back`? (See the usage sketch below.)

Test Plan: Add unit tests in this diff
Reviewers: @glaringlee
Subscribers:
Tasks: #25883
Tags:
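For reference, a hedged usage sketch of what the resulting C++ `ParameterList` could look like from user code. The method names (`append`, `size`, indexing) mirror the Python `nn.ParameterList` per the description above; treat them as illustrative assumptions rather than the final, authoritative C++ API.

```cpp
#include <torch/torch.h>

// Sketch only: assumes torch::nn::ParameterList exposes append/size/operator[]
// in the spirit of Python's nn.ParameterList.
struct MyModel : torch::nn::Module {
  MyModel() {
    params_ = register_module("params", torch::nn::ParameterList());
    for (int i = 0; i < 3; ++i) {
      params_->append(torch::randn({4, 4}, torch::requires_grad()));
    }
  }

  torch::Tensor forward(torch::Tensor x) {
    for (size_t i = 0; i < params_->size(); ++i) {
      x = torch::mm(x, (*params_)[i]);
    }
    return x;
  }

  torch::nn::ParameterList params_{nullptr};
};
```

Whether the insertion method stays `append` or becomes `push_back` (as in `std::vector` and some other C++ frontend containers) is exactly the naming question raised in the description.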