Conversation
Add StandardScaleWrapperTransformer kernel and tests.
Add NumericalizeFeaturizer.
Bring kernel implementation for MeanImputer, MedianImputer, MinMaxImputer and ModeImputer.
Add MinMaxImputerTransformer.
Add mean_imputer_transformer_test.
Add MedianImputerTest.
Add ModeImputerTest.
Add FromStringTransformer.
Add NormalizeFeaturizer and tests.
First version of PCA and Truncated SVD implementation.
TODO: CountVectorizer, TfidfVectorizer with their tests.
…izer tests due to linkage errors.
```cpp
namespace onnxruntime {
namespace featurizers {

template <typename T>
```
Do these structs need to be repeated in every source file? #Resolved
This is part of the generated code. The structs are per featurizer, so they may differ between source files.
```cpp
  }
};

class ModeImputerTransformer final : public OpKernel {
```
Do you want to allow copy/move operations for this class? #Resolved
The class is stateless and kernels are never copied. I'm not sure I understood the question.
```cpp
Eigen::Map<MatrixT> output_matrix(output_data, dim_0, dim_1);

std::function<void(MatrixT val)> callback;
callback = [&output_matrix](MatrixT val) {
```
Is there any disadvantage in assigning this lambda in the previous line itself? #Resolved
There was a problem hiding this comment.
The previous line would not be an assignment. It would require a constuctor which std::function does not have.
In reply to: 373741707 [](ancestors = 373741707)
Oh, maybe the constructor doesn't exist in the compiler version we're using? For reference: https://en.cppreference.com/w/cpp/utility/functional/function/function #Resolved
```cpp
namespace {
template <typename T>
std::vector<uint8_t> GetStream(const std::vector<typename NS::Traits<T>::nullable_type>& trainingBatches, size_t colIndex) {
```
naming convention of colIndex is off. #Pending
```cpp
  }
);
void RegisterFromStringFeaturizerVer1() {
  static const char* doc = R"DOC(
```
I would try to avoid such static strings for op documentation, as they occupy space in the binary and are not required when running the model. Maybe we can disable them behind a macro like ONNX does (__ONNX_NO_DOC_STRINGS). #Resolved
Unfortunately SetDoc only reduces the memory footprint of the opschema object. The static string is still stuck in the binary occupying space. The ONNX schema is filled with many such examples and so is our contrib ops. #Resolved
cmake/onnxruntime_providers.cmake (Outdated)
```cmake
if (onnxruntime_USE_FEATURIZERS)
  if(NOT MSVC)
    set_source_files_properties(${onnxruntime_cpu_featurizers_cc_srcs} PROPERTIES COMPILE_FLAGS "-Wno-unknown-warning")
```
You may add the following line to https://github.com/microsoft/onnxruntime/blob/master/cmake/CMakeLists.txt#L502:

```cmake
check_cxx_compiler_flag(-Wno-unknown-warning HAS_NO_UNKNOWN_WARNING)
```

Then here you can use:

```cmake
if(HAS_NO_UNKNOWN_WARNING)
  set_source_files_properties(${onnxruntime_cpu_featurizers_cc_srcs} PROPERTIES COMPILE_FLAGS "-Wno-unknown-warning")
endif()
```
```diff
@@ -1,3 +1,4 @@
```
Better to exclude this file from this PR.
Maybe it's a bit off topic, but what is L1NormalizeFeaturizer? Based on what I know, L1/L2 norm is only used in training, not inference.
Merge up to commit 4f4f4bc. There were several very large pull requests in public master: #2956 #2958 #2961

**BERT-Large, FP16, seq=128:** Batch = 66, Throughput = 189.049 ex/sec
**BERT-Large, FP16, seq=512:** Batch = 10, Throughput = 36.6335 ex/sec
**BERT-Large, FP32, seq=128:** Batch = 33, Throughput = 42.2642 ex/sec
**BERT-Large, FP32, seq=512:** Batch = 5, Throughput = 9.32792 ex/sec

**BERT-Large LAMB convergence:** (convergence plot not captured)

`$ python watch_experiment.py --subscription='4aaa645c-5ae2-4ae9-a17a-84b9023bc56a' --resource_group='onnxtraining' --workspace='onnxtraining' --remote_dir='logs/tensorboard/' --local_dir='D:/tensorboard/bert-large/fp16/lamb/seq128/lr3e-3/wr0.2843/master/' --run='BERT-ONNX_1581120364_71872cef'`

**E2E**: PASSED https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=117300&view=results
Description: Provide ORT kernels for the following:
CountVectorizer and TfidfVectorizer are left out pending a decision.
• FromStringFeaturizer
• L1NormalizeFeaturizer
• L2NormalizeFeaturizer
• MaxNormalizeFeaturizer
• MeanImputerFeaturizer
• MedianImputerFeaturizer
• MinMaxImputerFeaturizer
• ModeImputerFeaturizer
• NumericalizeFeaturizer
• PCAFeaturizer
• StandardScaleWrapperFeaturizer
• TruncatedSVDFeaturizer