Conversation
…range for better profiling
### Description The current compare function to decide whether two elements in std::set<OrtDevice> in SessionState::ResolveMemoryPatternFlag() is not right, which caused mem_pattern_ different behavior between main and pe branch ### Motivation and Context This change is to make mem_pattern_ the same behavior between main and pe branch Co-authored-by: Lei Cao <[email protected]@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
|
This pull request introduces 18 alerts and fixes 6 when merging 709f424 into 1ab11a1 - view on LGTM.com new alerts:
fixed alerts:
|
…ame (#13334) replace vector with InlinedVector in ExecutionContext and ExecutionFrame ### Description replace vector with InlinedVector in ExecutionContext and ExecutionFrame ### Motivation and Context replace vector with InlinedVector in ExecutionContext and ExecutionFrame Co-authored-by: Lei Cao <[email protected]>
|
This pull request introduces 18 alerts and fixes 6 when merging 94c163b into e398241 - view on LGTM.com new alerts:
fixed alerts:
|
There is no need to use deviceId as parameter, we can get device id in the implementation of GetAllocator() Refers to: onnxruntime/core/framework/allocation_planner.cc:1899 in 94c163b. [](commit_id = 94c163b, deletion_comment = False) |
### Description use cudaStreamNonBlocking for perf improvement ### Motivation and Context use cudaStreamNonBlocking for perf improvement Co-authored-by: Lei Cao <[email protected]@orttrainingdev9.d32nl1ml4oruzj4qz3bqlggovf.px.internal.cloudapp.net>
pranavsharma
left a comment
There was a problem hiding this comment.
Still reviewing. In the meanwhile, can you please resolve the comments added by the bots as it's becoming quite inconvenient to review?
| static const char* const kOrtSessionOptionsConfigStrictShapeTypeInference = "session.strict_shape_type_inference"; | ||
|
|
||
| // The file saves configuration for partitioning node among logic streams | ||
| static const char* const kNodePartitionConfigFile = "session.node_partition_config_file"; |
There was a problem hiding this comment.
Is this required for this PR? If not, can we please separate these as it's experimental?
There was a problem hiding this comment.
it is not strongly required, but we need this to trigger the unit tests for the multi-stream in single graph feature. let me fix other comments first then go back to this one.
There was a problem hiding this comment.
i will keep this comments in the new PR: #13495
| constexpr size_t kAllocAlignment = 256; | ||
|
|
||
| class IAllocator; | ||
| std::function<void*(size_t)> GetAllocationFn(std::shared_ptr<IAllocator> allocator, bool use_reserve, Stream* stream, WaitNotificationFn wait_fn); |
There was a problem hiding this comment.
Why is the allocator aware of the stream and notification?
There was a problem hiding this comment.
This function is mainly used for GPU EP's GetScratch buffer, which need to aware which stream the scratch buffer is assigned to. it is a helper function to wrapper the stream aware allocation function, it won't impact other ep's usage.
|
This pull request introduces 18 alerts and fixes 6 when merging 72c3e6b into 928c988 - view on LGTM.com new alerts:
fixed alerts:
|
skottmckay
left a comment
There was a problem hiding this comment.
Initial comments. Have mainly looked at headers in the core code so far and still have a few of those to go before looking at implementation details.
onnxruntime/core/providers/tensorrt/tensorrt_execution_provider.h
Outdated
Show resolved
Hide resolved
pranavsharma
left a comment
There was a problem hiding this comment.
- Can you add unit tests for all the new constructs?
- What's the code coverage?
- Do we plan to exercise all the use cases this enables in CI?
- How do we plan to measure correctness? I don't see tests that do this.
- Debuggability: can you check all places where adding more debug statements will help?
| * CountDownBarrier is only for scenario that the v is set | ||
| * to the # of consumers and each consumer calls Dec() exactly once. | ||
| */ | ||
| class CountDownBarrier { |
There was a problem hiding this comment.
- Would waitable counter from nsync work instead of reinventing the wheel? https://github.com/google/nsync/blob/master/public/nsync_counter.h
- If we've to write this, would be nice to keep it in a general utility folder/header file. core/util
There was a problem hiding this comment.
yes nsync counter will work, but nsync only available on non-Windows platforms, right? This is a relatively simple implementation, so i'd prefer just to implement it instead of use nsync on linux but another implementation on windows.
I am ok to keep it in separate file if we believe it is useful, once we confirm there is no other choice, i can work on separate it.
There was a problem hiding this comment.
let's continue the discussion in the new PR: #13495
|
This pull request introduces 1 alert and fixes 6 when merging 9873a3e into 8b0669b - view on LGTM.com new alerts:
fixed alerts:
|
|
Since there are too many commits in this PR, we perform a squash and open a new PR at #13495 |
Description: This PR including following works:
Motivation and Context
currently, the execution plan is just a linear list of those primitives, ort will execute them step by step. For any given graph, ORT will serialize it to a fixed execution order. This sequential execution design simplifies most scenarios, but it has the following limitations:
This PR enhance our existing execution plan and executor to support multiple stream execution. we use an unified algorithm to mange both single stream and multiple stream scenarios.
This PR mainly focus on the infrastructure support for multiple stream execution, that is said, given a valid stream assignment, onnxruntime can execute it correctly. How to generate a good stream assignment for a given model will be in the future PR.