Conversation
Add StandardScaleWrapperTransformer kernel and tests.
Add NumericalizeFeaturizer.
Bring kernel implementation for MeanImputer, MedianImputer, MinMaxImputer and ModeImputer.
Add MinMaxImputerTransformer.
Add mean_imputer_transformer_test.
Add MedianImputerTest.
Add ModeImputerTest.
Add FromStringTransformer.
Add NormalizeFeaturizer and tests.
First version of PCA and Truncated SVD implementation.
TODO: CountVectorizer, TfidfVectorizer with their tests.
…izer tests due to linkage errors.
```cpp
namespace onnxruntime {
namespace featurizers {

template <typename T>
```
Do these structs need to be repeated in every source file? #Resolved
This is part of the generated code. The structs are per featurizer, so they may differ between source files.
```cpp
  }
};

class ModeImputerTransformer final : public OpKernel {
```
Do you want to allow copy/move operations for this class? #Resolved
The class is stateless and kernels are never copied. I'm not sure I understood the question.
```cpp
Eigen::Map<MatrixT> output_matrix(output_data, dim_0, dim_1);

std::function<void(MatrixT val)> callback;
callback = [&output_matrix](MatrixT val) {
```
Is there any disadvantage in assigning this lambda in the previous line itself? #Resolved
There was a problem hiding this comment.
The previous line would not be an assignment. It would require a constuctor which std::function does not have.
In reply to: 373741707 [](ancestors = 373741707)
Oh, maybe the constructor doesn't exist in the compiler version we're using? For reference: https://en.cppreference.com/w/cpp/utility/functional/function/function #Resolved
```cpp
namespace {
template <typename T>
std::vector<uint8_t> GetStream(const std::vector<typename NS::Traits<T>::nullable_type>& trainingBatches, size_t colIndex) {
```
naming convention of colIndex is off. #Pending
```cpp
  }
);
void RegisterFromStringFeaturizerVer1() {
  static const char* doc = R"DOC(
```
I would try to avoid such static strings for op documentation, as they occupy space in the binary and are not required when running the model. Maybe we can disable them behind a macro like ONNX does (__ONNX_NO_DOC_STRINGS). #Resolved
Unfortunately SetDoc only reduces the memory footprint of the opschema object. The static string is still stuck in the binary occupying space. The ONNX schema is filled with many such examples and so is our contrib ops. #Resolved
cmake/onnxruntime_providers.cmake (Outdated)
```cmake
if (onnxruntime_USE_FEATURIZERS)
  if(NOT MSVC)
    set_source_files_properties(${onnxruntime_cpu_featurizers_cc_srcs} PROPERTIES COMPILE_FLAGS "-Wno-unknown-warning")
```
You may add the following line to https://github.com/microsoft/onnxruntime/blob/master/cmake/CMakeLists.txt#L502:

```cmake
check_cxx_compiler_flag(-Wno-unknown-warning HAS_NO_UNKNOWN_WARNING)
```

Then here you can use:

```cmake
if(HAS_NO_UNKNOWN_WARNING)
  set_source_files_properties(${onnxruntime_cpu_featurizers_cc_srcs} PROPERTIES COMPILE_FLAGS "-Wno-unknown-warning")
endif()
```
```diff
@@ -1,3 +1,4 @@
```
Better to exclude this file from this PR.
Maybe it's a bit off topic, but what is L1NormalizeFeaturizer? Based on what I know, L1/L2 norm is only used in training, not inference.
Merge up to commit 4f4f4bc. There were several very large pull requests in public master: #2956 #2958 #2961

**BERT-Large, FP16, seq=128:** Batch = 66, Throughput = 189.049 ex/sec
**BERT-Large, FP16, seq=512:** Batch = 10, Throughput = 36.6335 ex/sec
**BERT-Large, FP32, seq=128:** Batch = 33, Throughput = 42.2642 ex/sec
**BERT-Large, FP32, seq=512:** Batch = 5, Throughput = 9.32792 ex/sec

**BERT-Large LAMB convergence:** (convergence plot not captured)

`$ python watch_experiment.py --subscription='4aaa645c-5ae2-4ae9-a17a-84b9023bc56a' --resource_group='onnxtraining' --workspace='onnxtraining' --remote_dir='logs/tensorboard/' --local_dir='D:/tensorboard/bert-large/fp16/lamb/seq128/lr3e-3/wr0.2843/master/' --run='BERT-ONNX_1581120364_71872cef'`

**E2E**: PASSED https://aiinfra.visualstudio.com/Lotus/_build/results?buildId=117300&view=results
Description: Provide ORT kernels for the following:
CountVectorizer and TfidfVectorizer are left out pending a decision.
• FromStringFeaturizer
• L1NormalizeFeaturizer
• L2NormalizeFeaturizer
• MaxNormalizeFeaturizer
• MeanImputerFeaturizer
• MedianImputerFeaturizer
• MinMaxImputerFeaturizer
• ModeImputerFeaturizer
• NumericalizeFeaturizer
• PCAFeaturizer
• StandardScaleWrapperFeaturizer
• TruncatedSVDFeaturizer