Impl for Parameter Dict #40014

Conversation
💊 CI failures summary (as of commit 2ee86b2): 1 failure not recognized by patterns (Extra GitHub checks: 1 failed). This comment was automatically generated by Dr. CI.
When implementing the ParameterDict, I referred to the ModuleDict from #28652 and the class it derives from. This is my first time contributing to PyTorch, so please correct me if my understanding is totally wrong. Thank you guys so much!
@yyn19951228
Take a look at the Python implementation. What you need to do is follow the Python ParameterDict implementation and translate it into C++. I think most of the functions will be exactly the same as in the Python implementation. For the update() API, you might need to overload it into different versions (iterator, dict, etc.). Let me know if you need any help.
glaringlee left a comment
See my comment above.
@glaringlee Many thanks for reviewing my diff! I did have some misunderstandings about the implementation; let me rewrite these parts.
@glaringlee I have updated my impl for ParameterDict; please check whether it meets the requirements. Also, I got stuck in two places:
When I try to initialize using [...], it shows [...]. May I ask what might be the proper way to implement this?
Thank you very much! And sorry if these questions seem basic.
glaringlee left a comment
@yyn19951228 Thank you so much for working on this. I've made some comments; please take a look. I pasted some links in the comments so you know why the code should work that way. Let me know if you need any help. I ran into an environment issue today and could not build libtorch for a long time, so sorry for the late reply.
test/cpp/api/parameterdict.cpp (Outdated)

```cpp
      "(b): Parameter containing: [CPUFloatType of size [1]]\n"
      "(c): Parameter containing: [CPUFloatType of size [1]]\n"
      ")");
}
```

No newline at end of file.
Please add a newline at the end of the file to avoid the lint error.
```cpp
TORCH_MODULE(ParameterDict);

} // namespace nn
} // namespace torch
```

No newline at end of file.
Please add a newline at the end of the file to avoid the lint error.
```cpp
    torch::OrderedDict<std::string, torch::Tensor> params) {
  parameters_.reserve(params.size());
  for (const auto& item : params) {
    insert(std::move(item.key()), std::move(item.value()));
```
I think here you should insert directly into `parameters_`. Check `ordered_dict.h` (`class OrderedDict`); maybe you can just do `parameters_(params)`. The reason is that this is a constructor, so we should keep whatever is passed in: if the passed-in parameters don't need grad, then we shouldn't set `requires_grad` to true by default.
The compiler complains:

```
torch::OrderedDict<std::__1::string, at::Tensor> torch::nn::Module::parameters_
The registered parameters of this `Module`. In order to access parameters_ in ParameterDict and ParameterList
type 'OrderedDict<std::string, at::Tensor>' (aka 'OrderedDict<basic_string<char, char_traits<char>, allocator<char> >, at::Tensor>') does not provide a call operator
call of an object of a class type without appropriate operator() or conversion functions to pointer-to-function type
```

if I use `parameters_(params)` here. So can I use `parameters_ = params` here instead? I think `OrderedDict` implements the assignment operator:

```cpp
OrderedDict& operator=(OrderedDict&& other) = default;
```
(Several review comments on torch/csrc/api/include/torch/nn/modules/container/parameterdict.h are outdated and marked resolved.)
```diff
@@ -0,0 +1,114 @@
+#include <gtest/gtest.h>
```
Please modify the test, if necessary, after you finish modifying the ParameterDict implementation.
Now, to answer your previous two questions:
@yyn19951228 I am stuck with another code review today; I will review yours tomorrow if I cannot get to it today.
@glaringlee No problem.
glaringlee
left a comment
@yyn19951228 Sorry about the delay. I added more comments; please take a look. Thanks a lot.
(More review comments on torch/csrc/api/include/torch/nn/modules/container/parameterdict.h are outdated and marked resolved.)
@yyn19951228 Some nit-level things, then we are good to go. Thanks a lot for your help!
(Further review comments on torch/csrc/api/include/torch/nn/modules/container/parameterdict.h are outdated and marked resolved.)
Hi @glaringlee, I have addressed these comments! Please check whether there is anything left. Thank you very much!
Summary: Pull Request resolved: #40534 Differential Revision: D22258874 Pulled By: seemethere fbshipit-source-id: 1954a22ed52e1a65caf89725ab1db9f40ff917b8
Summary: Switch windows CPU testers from `windows.xlarge` to `windows.medium` class. Remove VS 14.16 CUDA build Only do smoke force-on-cpu tests using VS2019+CUDA10.1 config. Pull Request resolved: #40592 Differential Revision: D22259351 Pulled By: malfet fbshipit-source-id: f934ff774dfc7d47f12c3da836ca314c12d92208
Summary: Pull Request resolved: #40584 Also patch [this github issue](#33124) involving an illegal assembly instruction in 8x8-dq-aarch64-neon.S. Test Plan: Build binaries, copy to shaker, run executables. Also run all existing caffe tests. Reviewed By: kimishpatel Differential Revision: D22240670 fbshipit-source-id: 51960266ce58699fe6830bcf75632b92a122f638
Summary: Pull Request resolved: #40381 Differential Revision: D22173938 Pulled By: Krovatkin fbshipit-source-id: 305fc4484977e828cc4cee6e053a1e1ab9f0d6c7
Summary: Pull Request resolved: #40614 This update pulls in a one-liner fix, which sets the TCP_NODELAY option on the TCP sockets of the UV transport. This leads to exceptional performance gains in terms of latency, with about a 25x improvement in one simple benchmark. This thus resolves a regression that TensorPipe had compared to the ProcessGroup agent and, in fact, ends up beating it by 2x. The benchmark I ran is this, with the two endpoints pinned to different cores of the same machine:

```python
@torch.jit.script
def remote_fn(t: int):
    return t

@torch.jit.script
def local_fn():
    for _ in range(1_000_000):
        fut = rpc.rpc_async("rhs", remote_fn, (42,))
        fut.wait()
```

And the average round-trip time (one iteration) is:
- TensorPipe with SHM: 97.2 us
- TensorPipe with UV _after the fix_: 205 us
- Gloo: 440 us
- TensorPipe with UV _before the fix_: 5 ms

Test Plan: Ran PyTorch RPC test suite. Differential Revision: D22255393 fbshipit-source-id: 3f6825d03317d10313704c05a9280b3043920507
@yyn19951228
Summary: Pull Request resolved: #40611 This commit removes a dead store in `transformWith` of exit_transforms.cpp. Test Plan: Continuous integration. Reviewed By: suo Differential Revision: D22254136 fbshipit-source-id: f68c4625f7be8ae29b3500303211b2299ce5d6f6
Summary: Pull Request resolved: #40595 ghstack-source-id: 106691774 Test Plan: waitforsandcastle Differential Revision: D22247729 fbshipit-source-id: 14745588cae267c1e0cc51cd9541a9b8abb830e5
Summary: Pull Request resolved: #40632 Differential Revision: D22262316 Pulled By: malfet fbshipit-source-id: 3d525767bfbfc8e2497541849d85cabf0379a43b
Summary: Pull Request resolved: #36764 This allows bundling inputs that are large uniform buffers in channels-last memory format. Test Plan: Unit test. Differential Revision: D21142660 Pulled By: dreiss fbshipit-source-id: 31bbea6586d07c1fd0bcad4cb36ed2b8bb88a7e4
Summary: Pull Request resolved: #37055 Sometimes it's okay to bundle a large example input tensor with a model. Add a utility function to make it easy for users to do that *on purpose*. Test Plan: Unit test. Differential Revision: D22264239 Pulled By: dreiss fbshipit-source-id: 05c6422be1aa926cca850f994ff1ae83c0399119
Summary: This replicates the pattern of other "do for luck" commands. Prep change to add RocM to CircleCI. Pull Request resolved: #40631 Differential Revision: D22261707 Pulled By: malfet fbshipit-source-id: 3dadfa434deab866a8800715f3197e84169cf43e
…ion API (#39459) Summary: Pull Request resolved: #39459 Update to this PR: this code isn't going to fully solve #37010. The changes required for 37010 is more than this PR initially planned. Instead, this PR switches op registration of rng related tests to use the new API (similar to what was done in #36925) Test Plan: 1) unit tests Imported from OSS Reviewed By: ezyang Differential Revision: D22264889 fbshipit-source-id: 82488ac6e3b762a756818434e22c2a0f9cb9dd47
Summary: Pull Request resolved: #40466 Extend row wise sparse Adagrad fusion op to FP16 (rounding-to-nearest) for PyTorch. Reviewed By: jianyuh Differential Revision: D22003571 fbshipit-source-id: e97e01745679a9f6e7b0f81ce5a6ebf4d4a1df41
Summary:
## TLDR
Support using a NaN default value for missing dense features in RawInputProcessor for *DPER2*, in preparation for subsequent support for null-flag features in *compute meta*. For train_eval this is already supported in DPER3, and we do not plan to support it in DPER2 train_eval.
## Overview
Intern project plan to support adding dense flags for missing feature values instead of replacing them with zero. Project plan: https://docs.google.com/document/d/1OsPUTjpJycwxWLCue3Tnb1mx0uDC_2KKWvC1Rwpo2NI/edit?usp=sharing
## Code paths
See https://fb.quip.com/eFXUA0tbDmNw for the call stack of all affected code paths.
Test Plan:
### A. DPER3 blob value inspection
1. Build a local bento kernel in the fbcode folder: `buck build mode/dev-nosan //bento/kernels:bento_kernel_ads_ranking`
2. Use the kernel `ads_ranking (local)` to print dense feature blob values (n280239):
   - With `default_dense_value = "0.0"` (the default), `preproc_6/feature_preproc_6/dper_feature_processor_7/raw_input_proc_7/float_feature_sparse_to_dense_7/float_features` contains `[0, 0, 0, 0, 0, 0, 0, 1, 1.7857143, 1.7777778, 1, 0, 0.5625, 0, 0, 0.8, 0, 1, 0.56, 0]`.
   - With `default_dense_value = "123"`, the missing entries become 123: `[123, 123, 123, 123, 123, 123, 123, 1, 1.7857143, 1.7777778, 1, 123, 0.5625, 123, 123, 0.8, 123, 1, 0.56, 123]`.
   - With `default_dense_value = float("nan")`:
     ```
     RuntimeError: [enforce fail at enforce_finite_op.h:40] std::isfinite(input_data[i]).
     Index 0 is not finite (e.g., NaN, Inf): -nan (Error from operator: input: "unary_4/logistic_regression_loss_4/average_loss_4/average_loss" name: "" type: "EnforceFinite" device_option { random_seed: 54 })
     ```
     which is expected, given the NaN input.
### B. Unit test
`buck test fblearner/flow/projects/dper/tests/preprocs:raw_feature_extractor_test` — https://www.internalfb.com/intern/testinfra/testconsole/testrun/5348024586274923/
Differential Revision: D21961595 fbshipit-source-id: 3dcb153b3c7f42f391584f5e7f52f3d9c76de31f
Hi @glaringlee, sorry about that! I have created a new PR, #40654; could you please check that one? Thank you very much!
Closing this PR, since it is hard to rebase here.
Try to implement `nn::ParameterDict` for the C++ front-end, from #25883. @yf225 please take a look and tell me if the PR has problems. I took #36904 and #28652 as references. I am hoping for any suggestions.