
Conversation

@arktrail
Contributor

@arktrail arktrail commented Jun 15, 2020

This PR tries to implement nn::ParameterDict for the C++ front-end, from #25883.

@yf225 please take a look and let me know if the PR has any problems. I used #36904 and #28652 as references. Any suggestions are welcome.

@dr-ci

dr-ci bot commented Jun 15, 2020

💊 CI failures summary and remediations

As of commit 2ee86b2 (more details on the Dr. CI page):


  • 2/2 failures possibly introduced in this PR
    • 1/2 non-CircleCI failure(s)

1 failure not recognized by patterns:

| Job | Step | Action |
| --- | --- | --- |
| CircleCI binary_linux_libtorch_3_7m_cpu_gcc5_4_cxx11-abi_shared-with-deps_build | Spin Up Environment | 🔁 rerun |

Extra GitHub checks: 1 failed



@arktrail
Contributor Author

When implementing the ParameterDict, I referred to the ModuleDict from #28652, in which ModuleDict derives from Cloneable. It seems that Cloneable is just for Module, so I guess that for the Parameter part (both ParameterDict and ParameterList), I don't need to derive from Cloneable?

This is my first time contributing to PyTorch, so please correct me if my understanding is totally wrong. Thank you all so much!

@glaringlee glaringlee self-requested a review June 17, 2020 18:52
@glaringlee
Contributor

@yyn19951228
Thanks for working on this. I think there are some misunderstandings here. Let me explain.
Overall, we treat the ParameterDict/List like a Module.

  1. ParameterDict should inherit from the Module class and be clone-able (what you've done is right). The reason is that we want the ParameterDict to be visible to all the Module functions, and clone-able, so a ParameterDict can be used as a normal dict, but it can also be used as a Module if necessary. We don't have any use case for that yet, but we should keep it consistent with the Python impl.

  2. You don't need to create a new OrderedDict in your ParameterDict class; instead, you should reuse the parameter OrderedDict within the Module class, so that all the other Module functions can see the ParameterDict's entries, and clone() works in that case.

Take a look at the Python implementation here:
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/container.py#L485
https://github.com/pytorch/pytorch/blob/master/torch/nn/modules/module.py (as reference)
We have the Module C++ impl here, and it supports most of the functionality in module.py:
https://github.com/pytorch/pytorch/blob/master/torch/csrc/api/include/torch/nn/module.h
https://github.com/pytorch/pytorch/blob/master/torch/csrc/api/src/nn/module.cpp

What you need to do is follow the ParameterDict Python impl and translate it into C++. I think most of the functions will be exactly the same as in the Python implementation. For the update() API, you might need to overload it for different argument types (iterator, dict, etc.). Let me know if you need any help. A rough sketch of the overall shape is below.
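For illustration only, a sketch of that shape, assuming libtorch's `register_parameter()` and `named_parameters()`; the `My*` names are placeholders, not the final API:

```cpp
#include <torch/torch.h>

// Sketch: the dict derives from Cloneable so it is a Module and clone() works.
class MyParameterDictImpl : public torch::nn::Cloneable<MyParameterDictImpl> {
 public:
  void reset() override {}

  // Route insertion through Module::register_parameter() so entries land in
  // the Module-owned parameter dict, visible to all Module functions.
  // Preserve the requires_grad flag of whatever is passed in.
  torch::Tensor& insert(std::string key, torch::Tensor param) {
    bool requires_grad = param.requires_grad();
    return register_parameter(std::move(key), std::move(param), requires_grad);
  }

  // update() would be overloaded for the different source types (another
  // ParameterDict, an OrderedDict, etc.); a real impl would replace existing
  // keys rather than rely on insert() rejecting duplicates.
  void update(const torch::OrderedDict<std::string, torch::Tensor>& other) {
    for (const auto& item : other) {
      insert(item.key(), item.value());
    }
  }

  bool contains(const std::string& key) const {
    return named_parameters(/*recurse=*/false).contains(key);
  }

  size_t size() const {
    return named_parameters(/*recurse=*/false).size();
  }
};

TORCH_MODULE(MyParameterDict);
```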

Contributor

@glaringlee glaringlee left a comment


See my comment above.

@arktrail
Contributor Author

@glaringlee Many thanks for reviewing my diff! I did have some misunderstandings about the implementation; let me rewrite those parts.

@arktrail
Contributor Author

@glaringlee I have updated my impl for ParameterDict; please check whether it meets the requirements. Also, I am stuck in two places:

  1. For the constructor, I would like to add an overload taking a std::initializer_list, like this:

```cpp
explicit ParameterDictImpl(
    std::initializer_list<std::pair<std::string, Tensor>> initializer_list) {
  parameters_.reserve(initializer_list.size());
  for (const auto& item : initializer_list) {
    // Note: std::move() on members of a const reference degrades to a copy.
    insert(std::move(item.first), std::move(item.second));
  }
}
```

But when I try to initialize it with

```cpp
ParameterDict dict(
    {
      {"a", torch::tensor({1.0})}
    }
  );
```

the compiler reports:

```
no matching constructor for initialization of 'torch::nn::ParameterDict' ccls(2)
pimpl.h(53, 18): candidate inherited constructor not viable: cannot convert initializer list argument to 'std::nullptr_t' (aka 'nullptr_t')
(162, 1): constructor from base class 'ModuleHolder<torch::nn::ParameterDictImpl>' inherited here
pimpl.h(70, 18): candidate inherited constructor not viable: cannot convert initializer list argument to 'std::shared_ptr<ParameterDictImpl>'
.....
no instance of constructor "torch::nn::ParameterDict::ParameterDict" matches the argument list -- argument types are: ({...})
```

May I ask what might be the proper way to implement this?

  2. I use a container as the argument for update(); may I ask whether an iterator-based overload is also needed in that case?

Thank you very much! And sorry if these questions seem naive.

@arktrail arktrail requested a review from glaringlee June 19, 2020 08:36
Contributor

@glaringlee glaringlee left a comment


@yyn19951228 Thank you so much for working on this. I've made some comments; please take a look. I pasted some links in the comments so you can see why the code should work that way. Let me know if you need any help. I had some environment issues today and couldn't build libtorch for a long time; sorry for the late reply.

"(b): Parameter containing: [CPUFloatType of size [1]]\n"
"(c): Parameter containing: [CPUFloatType of size [1]]\n"
")");
} No newline at end of file
Contributor


Please add a newline at the end of the file to avoid a lint error.

```cpp
TORCH_MODULE(ParameterDict);

} // namespace nn
} // namespace torch
```
(No newline at end of file.)
Contributor


Please add a newline at the end of the file to avoid a lint error.

```cpp
    torch::OrderedDict<std::string, torch::Tensor> params) {
  parameters_.reserve(params.size());
  for (const auto& item : params) {
    insert(std::move(item.key()), std::move(item.value()));
```
Contributor


I think here you should insert directly into `parameters_`. Check ordered_dict.h; maybe you can just do `parameters_(params)`. The reason is that this is a constructor, so we should keep whatever is passed in: if the passed-in parameters don't need grad, then we don't have to set requires_grad to true by default.

Contributor Author

@arktrail arktrail Jun 21, 2020


The compiler complains:

```
torch::OrderedDict<std::__1::string, at::Tensor> torch::nn::Module::parameters_
The registered parameters of this `Module`.
In order to access parameters_ in ParameterDict and ParameterList

type 'OrderedDict<std::string, at::Tensor>' (aka 'OrderedDict<basic_string<char, char_traits<char>, allocator<char> >, at::Tensor>') does not provide a call operator
call of an object of a class type without appropriate operator() or conversion functions to pointer-to-function type
```

if I use parameters_(params) here. So can I use parameters_ = params here? I think OrderedDict implements the assignment operator:

```cpp
OrderedDict& operator=(OrderedDict&& other) = default;
```
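For what it's worth, a minimal sketch of the assignment form under discussion, assuming `parameters_` is accessible from the subclass: since `parameters_` is inherited from Module, it cannot appear in the derived constructor's member-initializer list, and `parameters_(params)` inside the body is parsed as a call expression (hence the "call operator" error), so plain assignment sidesteps both:

```cpp
// Sketch only: assign in the constructor body; this uses OrderedDict's
// copy/move assignment and keeps requires_grad of the passed-in tensors.
explicit ParameterDictImpl(torch::OrderedDict<std::string, torch::Tensor> params) {
  parameters_ = std::move(params);
}
```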

```
@@ -0,0 +1,114 @@
#include <gtest/gtest.h>
```
Contributor


Please update the test as needed after you finish modifying the ParameterDict implementation.

@glaringlee
Contributor

Now to answer your previous two questions:

  1. You cannot use an initializer_list at this point. We actually use a ModuleHolder class to hold your implementation, so the initializer_list is passed to ModuleHolder first and then forwarded to the ParameterDict impl class. Unfortunately, we have no template that can correctly forward a single initializer_list at this point; see here. (A sketch of a workaround follows below.)

  2. We don't need to support a container type in the iterator, since the iterator is for the ParameterDict itself; the only underlying type is the OrderedDict (parameters_).
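As a sketch of the workaround for (1), construction can go through a torch::OrderedDict instead of a bare initializer_list, assuming a constructor like the one in this PR that takes an OrderedDict:

```cpp
// Build the OrderedDict first, then hand it to ParameterDict; ModuleHolder
// can forward a named object where it cannot forward an initializer_list.
torch::OrderedDict<std::string, torch::Tensor> params;
params.insert("a", torch::tensor({1.0}));
params.insert("b", torch::tensor({2.0}));
torch::nn::ParameterDict dict(params);
```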

@arktrail arktrail requested a review from glaringlee June 22, 2020 05:25
@glaringlee
Contributor

@yyn19951228 I'm stuck with another code review today; I will review yours tomorrow if I can't get to it today.

@arktrail
Contributor Author

@glaringlee No worries.

@zou3519 zou3519 added the "triaged" label (this issue has been looked at by a team member, and triaged and prioritized into an appropriate module) Jun 23, 2020
Contributor

@glaringlee glaringlee left a comment


@yyn19951228 Sorry about the delay. I added more comments; please take a look. Thanks a lot.

@arktrail arktrail requested a review from glaringlee June 25, 2020 05:41
Contributor

@glaringlee glaringlee left a comment


@yyn19951228 A few nits, then we are good to go. Thanks a lot for your help!

@arktrail arktrail requested a review from glaringlee June 26, 2020 05:44
@arktrail
Contributor Author

Hi @glaringlee, I have addressed the comments! Please check whether anything is left; thank you very much!

jeffdaily and others added 6 commits June 26, 2020 09:45
Summary:
Test was added a few months back in #36503 but recently became flaky for ROCm.

CC ezyang xw285cornell sunway513
Pull Request resolved: #40577

Differential Revision: D22258196

Pulled By: ezyang

fbshipit-source-id: 8a22b0c17b536b3d42d0382f7737df0f8823ba08
Summary: Pull Request resolved: #40534

Differential Revision: D22258874

Pulled By: seemethere

fbshipit-source-id: 1954a22ed52e1a65caf89725ab1db9f40ff917b8
Summary:
Switch Windows CPU testers from `windows.xlarge` to `windows.medium` class.
Remove the VS 14.16 CUDA build.
Only do smoke force-on-CPU tests using the VS2019+CUDA10.1 config.
Pull Request resolved: #40592

Differential Revision: D22259351

Pulled By: malfet

fbshipit-source-id: f934ff774dfc7d47f12c3da836ca314c12d92208
Summary:
Pull Request resolved: #40584

Also patch [this github issue](#33124)
involving an illegal assembly instruction in 8x8-dq-aarch64-neon.S.

Test Plan:
Build binaries, copy to shaker, run executables. Also run all
existing caffe tests.

Reviewed By: kimishpatel

Differential Revision: D22240670

fbshipit-source-id: 51960266ce58699fe6830bcf75632b92a122f638
Summary: Pull Request resolved: #40381

Differential Revision: D22173938

Pulled By: Krovatkin

fbshipit-source-id: 305fc4484977e828cc4cee6e053a1e1ab9f0d6c7
Summary:
Pull Request resolved: #40614

This update pulls in a one-liner fix, which sets the TCP_NODELAY option on the TCP sockets of the UV transport. This leads to exceptional performance gains in terms of latency, with about a 25x improvement in one simple benchmark. It thus resolves a regression that TensorPipe had compared to the ProcessGroup agent and, in fact, ends up beating it by 2x.
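(For context: TCP_NODELAY disables Nagle's algorithm, so small messages are sent immediately instead of being coalesced. A generic POSIX sketch of such a one-liner, not the actual TensorPipe code:)

```cpp
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>

// Disable Nagle's algorithm on a connected TCP socket so small RPC
// messages are flushed immediately rather than batched.
void setTcpNoDelay(int fd) {
  int one = 1;
  setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}
```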

The benchmark I ran is this, with the two endpoints pinned to different cores of the same machine:
```
@torch.jit.script
def remote_fn(t: int):
    return t

@torch.jit.script
def local_fn():
    for _ in range(1_000_000):
        fut = rpc.rpc_async("rhs", remote_fn, (42,))
        fut.wait()
```

And the average round-trip time (one iteration) is:
- TensorPipe with SHM: 97.2 us
- TensorPipe with UV _after the fix_: 205us
- Gloo: 440us
- TensorPipe with UV _before the fix_: 5ms

Test Plan: Ran PyTorch RPC test suite

Differential Revision: D22255393

fbshipit-source-id: 3f6825d03317d10313704c05a9280b3043920507
@glaringlee
Contributor

@yyn19951228
Hi, I think there are some problems with your rebase.
This PR should only have your changes, but it now has 28 files changed :).
If you have trouble rebasing this correctly, you can open a new PR against the latest PyTorch and put your code in.
I will close this one and approve your new PR.

Meghan Lele and others added 14 commits June 26, 2020 12:35
Summary:
Pull Request resolved: #40611

This commit removes a dead store in `transformWith` of exit_transforms.cpp.

Test Plan: Continuous integration.

Reviewed By: suo

Differential Revision: D22254136

fbshipit-source-id: f68c4625f7be8ae29b3500303211b2299ce5d6f6
Summary:
Pull Request resolved: #40595

ghstack-source-id: 106691774

Test Plan: waitforsandcastle

Differential Revision: D22247729

fbshipit-source-id: 14745588cae267c1e0cc51cd9541a9b8abb830e5
Summary: Pull Request resolved: #40632

Differential Revision: D22262316

Pulled By: malfet

fbshipit-source-id: 3d525767bfbfc8e2497541849d85cabf0379a43b
Summary:
Pull Request resolved: #36764

This allows bundling inputs that are large uniform buffers in
channels-last memory format.

Test Plan: Unit test.

Differential Revision: D21142660

Pulled By: dreiss

fbshipit-source-id: 31bbea6586d07c1fd0bcad4cb36ed2b8bb88a7e4
Summary:
Pull Request resolved: #37055

Sometimes it's okay to bundle a large example input tensor with a model.
Add a utility function to make it easy for users to do that *on purpose*.

Test Plan: Unit test.

Differential Revision: D22264239

Pulled By: dreiss

fbshipit-source-id: 05c6422be1aa926cca850f994ff1ae83c0399119
Summary:
This replicates the pattern of other "do for luck" commands.
Prep change to add ROCm to CircleCI.
Pull Request resolved: #40631

Differential Revision: D22261707

Pulled By: malfet

fbshipit-source-id: 3dadfa434deab866a8800715f3197e84169cf43e
Summary:
Reland of #35594
Pull Request resolved: #40551

Reviewed By: ezyang

Differential Revision: D22249831

Pulled By: ngimel

fbshipit-source-id: b221b3c0a490ccaaabba50aa698a2490536e0917
…ion API (#39459)

Summary:
Pull Request resolved: #39459

Update to this PR: this code isn't going to fully solve #37010; the changes required for #37010 are more than this PR initially planned. Instead, this PR switches the op registration of RNG-related tests to use the new API (similar to what was done in #36925).

Test Plan:
1) unit tests

Imported from OSS

Reviewed By: ezyang

Differential Revision: D22264889

fbshipit-source-id: 82488ac6e3b762a756818434e22c2a0f9cb9dd47
Summary:
Pull Request resolved: #40466

Extend the row-wise sparse Adagrad fusion op to FP16 (rounding-to-nearest) for PyTorch.

Reviewed By: jianyuh

Differential Revision: D22003571

fbshipit-source-id: e97e01745679a9f6e7b0f81ce5a6ebf4d4a1df41
Summary:
## TLDR
Support using a NaN default value for missing dense features in RawInputProcessor for *DPER2*, in preparation for subsequent support for null-flag features in *compute meta*. For train_eval, this is already supported in DPER3, and we do not plan to support it in DPER2 train_eval.
## Overview
Intern project plan: support adding dense flags for missing feature values instead of replacing them with zero.

Project plan :
https://docs.google.com/document/d/1OsPUTjpJycwxWLCue3Tnb1mx0uDC_2KKWvC1Rwpo2NI/edit?usp=sharing

## Code paths:
See https://fb.quip.com/eFXUA0tbDmNw for the call stack for all affected code paths.

Test Plan:
# A. DPER3 blob value inspection
## 1. Build local bento kernel in fbcode folder
`buck build mode/dev-nosan //bento/kernels:bento_kernel_ads_ranking`

## 2. Use kernel `ads_ranking (local)` to print dense feature blob values
n280239

## 2.1 Try `default_dense_value = "0.0"` (default)
```
preproc_6/feature_preproc_6/dper_feature_processor_7/raw_input_proc_7/float_feature_sparse_to_dense_7/float_features [[0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [0.       ]
 [1.       ]
 [1.7857143]
 [1.7777778]
 [1.       ]
 [0.       ]
 [0.5625   ]
 [0.       ]
 [0.       ]
 [0.8      ]
 [0.       ]
 [1.       ]
 [0.56     ]
 [0.       ]]
```
## 2.2 Try `default_dense_value = "123"`
```
preproc_2/feature_preproc_2/dper_feature_processor_3/raw_input_proc_3/float_feature_sparse_to_dense_3/float_features [[123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [123.       ]
 [  1.       ]
 [  1.7857143]
 [  1.7777778]
 [  1.       ]
 [123.       ]
 [  0.5625   ]
 [123.       ]
 [123.       ]
 [  0.8      ]
 [123.       ]
 [  1.       ]
 [  0.56     ]
 [123.       ]]
```
## 2.3 Try `default_dense_value = float("nan")`
```
RuntimeError: [enforce fail at enforce_finite_op.h:40] std::isfinite(input_data[i]). Index 0 is not finite (e.g., NaN, Inf): -nan (Error from operator:
input: "unary_4/logistic_regression_loss_4/average_loss_4/average_loss" name: "" type: "EnforceFinite" device_option { random_seed: 54 })
```
which is expected given the NaN input.

# B. Unit test
`buck test  fblearner/flow/projects/dper/tests/preprocs:raw_feature_extractor_test`

https://www.internalfb.com/intern/testinfra/testconsole/testrun/5348024586274923/

{F241336814}

Differential Revision: D21961595

fbshipit-source-id: 3dcb153b3c7f42f391584f5e7f52f3d9c76de31f
@arktrail
Contributor Author

Hi @glaringlee, sorry about that! I have created a new PR, #40654; could you please take a look at that one? Thank you very much!

@glaringlee
Contributor

Closing this PR since it is hard to rebase here.
@yyn19951228 opened a new PR, #40654, with the same change.
