Skip to content

Comments

Support Slice#10

Merged
Honry merged 6 commits intostable-diffusionfrom
slice
May 18, 2023
Merged

Support Slice#10
Honry merged 6 commits intostable-diffusionfrom
slice

Conversation

@Honry
Copy link
Owner

@Honry Honry commented May 17, 2023

No description provided.

@Honry
Copy link
Owner Author

Honry commented May 17, 2023

Thanks @zesongw for contributing this op.

@fdwr, please help review it tomorrow. Thanks!

const logging::Logger& logger) const {
const auto& input_defs = node.InputDefs();
std::vector<int64_t> input_shape;
ORT_RETURN_IF_NOT(GetShape(*input_defs[0], input_shape, logger), "Cannot get shape");
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ORT_RETURN_IF_NOT(GetShape(*input_defs[0], input_shape, logger), "Cannot get shape");
ORT_RETURN_IF_NOT(GetShape(*input_defs[0], input_shape, logger), "Cannot get input shape");


emscripten::val inputs = model_builder.GetOperand(input_defs[0]->Name());
std::vector<int32_t> starts(rank, 0);
std::vector<int32_t> sizes(input_shape.begin(), input_shape.end());
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't the build complain about integer 64->32 truncation? The other places I've seen use transform to downcast.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't see the complain in build.

Copy link

@fdwr fdwr May 17, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weird, maybe there are different build rules for the native code. 🤷‍♂️

Well I've seen this pattern frequently enough...

    std::transform(begin, end, output.begin, [](int64_t i) { return SafeInt<uint32_t>(i); });

...that it's probably worth adding a helper to downcast it. e.g. https://github.com/microsoft/onnxruntime/blob/5abaca9d6969f65998708abf6cbc7616036b1769/onnxruntime/core/providers/dml/OperatorAuthorHelper/OperatorHelper.cpp#L245-L254

DowncastDimensions(gsl::make_span(data, dimCount), /*out*/ outputDimensions);

data.clear();

// This is an optional input, return empty vector.
if (input_defs.size() <= input_idx)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional tensors can be indicated by an empty name too. e.g.:

    if (input_idx >= input_defs.size())
        return Status::OK();

    const auto& input_name = input_defs[input_idx]->Name();
    if (input_name.empty())
        return Status::OK();

Note starts and ends are not optional. So consider passing a boolean is-required parameter to the helper function . e.g. ORT_RETURN_IF_ERROR(CopyInputData(1, input_starts, true)); vs ORT_RETURN_IF_ERROR(CopyInputData(3, input_axes, false));

Btw, Bruce Dai Feng had a similar related function for reading tensors into a 1D array for Split. Maybe you can mostly reuse that here?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Btw, Bruce Dai Feng had a similar related function for reading tensors into a 1D array for Split. Maybe you can mostly reuse that here?

Umm, there's a bit difference that Slice accepts both Int32 and Int64 tensor, I will try to make it compatible.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No big deal if it's not used - just noticing some similar patterns and wanting to reduce code duplication.

We also have a little helper in the DML EP that might be useful for reference (though it reads into int32 instead of uint32): https://github.com/microsoft/onnxruntime/blob/5abaca9d6969f65998708abf6cbc7616036b1769/onnxruntime/core/providers/dml/OperatorAuthorHelper/OperatorHelper.cpp#L132-L174

return false;
}
} else if (data_type == ONNX_NAMESPACE::TensorProto_DataType_INT32) {
auto tensor_data = std::vector<int32_t>(reinterpret_cast<int32_t*>(unpacked_tensor.data()),
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to allocate a temporary vector if we're only using it to check all_of?

Does if (!std::all_of(unpacked_tensor.data(), unpacked_tensor.data() + unpacked_tensor.size(), [](int32_t i) { return i == 1; })) work?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could also use any_of, but both work, and I'm content either way:

if (std::any_of(tensor_data.begin(), tensor_data.end(), [](int32_t i) { return i != 1; }) {
}

}
}

const auto& starts_name = input_defs[1]->Name();
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd double check input_defs.size() before accessing 1 and 2 (a malformed model could cause a read access violation).

Copy link

@fdwr fdwr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wow, so fast. Thanks.

@Honry
Copy link
Owner Author

Honry commented May 18, 2023

@fdwr, thanks for your comments, PTAL again.

} else if constexpr (std::is_same<T, int32_t>::value) {
std::transform(array_data, array_data + rank,
std::back_inserter(array),
[](int64_t dim) -> int32_t { return SafeInt<int32_t>(dim); });
Copy link

@fdwr fdwr May 18, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
[](int64_t dim) -> int32_t { return SafeInt<int32_t>(dim); });
[](int64_t dim) -> T { return SafeInt<T>(dim); });

And then this else branch can catch anything (currently there is no assert for the else branch in case another type T is used). Then we shouldn't need the second if.

Copy link

@fdwr fdwr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some easy comments. No need for another review.

@Honry Honry merged commit 5deb825 into stable-diffusion May 18, 2023
Copy link

@fdwr fdwr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

Honry pushed a commit that referenced this pull request Aug 28, 2023
### Description
Release OrtEnv before main function returns. Before this change, OrtEnv
is deleted when C/C++ runtime destructs all global variables in ONNX
Runtime's core framework.
The callstack is like this:
```
  * frame #0: 0x00007fffee39f5a6 libonnxruntime.so.1.16.0`onnxruntime::Environment::~Environment(this=0x00007fffee39fbf2) at environment.h:20:7
    frame #1: 0x00007fffee39f614 libonnxruntime.so.1.16.0`std::default_delete<onnxruntime::Environment>::operator()(this=0x00007ffff4c30e50, __ptr=0x0000000005404b00) const at unique_ptr.h:85:2
    frame #2: 0x00007fffee39edca libonnxruntime.so.1.16.0`std::unique_ptr<onnxruntime::Environment, std::default_delete<onnxruntime::Environment>>::~unique_ptr(this=0x5404b00) at unique_ptr.h:361:17
    frame #3: 0x00007fffee39e2ab libonnxruntime.so.1.16.0`OrtEnv::~OrtEnv(this=0x00007ffff4c30e50) at ort_env.cc:43:1
    frame #4: 0x00007fffee39fa96 libonnxruntime.so.1.16.0`std::default_delete<OrtEnv>::operator()(this=0x00007fffefff8f78, __ptr=0x00007ffff4c30e50) const at unique_ptr.h:85:2
    frame #5: 0x00007fffee39f394 libonnxruntime.so.1.16.0`std::unique_ptr<OrtEnv, std::default_delete<OrtEnv>>::~unique_ptr(this=0x7ffff4c30e50) at unique_ptr.h:361:17
    frame #6: 0x00007ffff78574b5 libc.so.6`__run_exit_handlers + 261
    frame #7: 0x00007ffff7857630 libc.so.6`exit + 32
    frame #8: 0x00007ffff783feb7 libc.so.6`__libc_start_call_main + 135
    frame #9: 0x00007ffff783ff60 libc.so.6`__libc_start_main@@GLIBC_2.34 + 128
    frame #10: 0x0000000000abbdee node`_start + 46
```
After this change, OrtEnv will be deleted before the main function
returns and nodejs is still alive.
Honry pushed a commit that referenced this pull request Aug 7, 2024
… transient connection exceptions. (microsoft#21612)

### Description
Improve docker commands to make docker image layer caching works.
It can make docker building faster and more stable.
So far, A100 pool's system disk is too small to use docker cache.
We won't use pipeline cache for docker image and remove some legacy
code.

### Motivation and Context
There are often an exception of
```
64.58 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
286.4 curl: (92) HTTP/2 stream 0 was not closed cleanly: INTERNAL_ERROR (err 2)
```
Because Onnxruntime pipeline have been sending too many requests to
download Nodejs in docker building.
Which is the major reason of pipeline failing now

In fact, docker image layer caching never works.
We can always see the scrips are still running
```
#9 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
#9 0.234 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
#9 0.235 /bin/sh: warning: setlocale: LC_ALL: cannot change locale (en_US.UTF-8)
#9 0.235 /tmp/scripts/install_centos.sh: line 1: !/bin/bash: No such file or directory
#9 0.235 ++ '[' '!' -f /etc/yum.repos.d/microsoft-prod.repo ']'
#9 0.236 +++ tr -dc 0-9.
#9 0.236 +++ cut -d . -f1
#9 0.238 ++ os_major_version=8
....
#9 60.41 + curl https://nodejs.org/dist/v18.17.1/node-v18.17.1-linux-x64.tar.gz -sSL --retry 5 --retry-delay 30 --create-dirs -o /tmp/src/node-v18.17.1-linux-x64.tar.gz --fail
#9 60.59 + return 0
...
```

This PR is improving the docker command to make image layer caching
work.
Thus, CI won't send so many redundant request of downloading NodeJS.
```
#9 [2/5] ADD scripts /tmp/scripts
#9 CACHED

#10 [3/5] RUN cd /tmp/scripts && /tmp/scripts/install_centos.sh && /tmp/scripts/install_deps.sh && rm -rf /tmp/scripts
#10 CACHED

#11 [4/5] RUN adduser --uid 1000 onnxruntimedev
#11 CACHED

#12 [5/5] WORKDIR /home/onnxruntimedev
#12 CACHED
```

###Reference
https://docs.docker.com/build/drivers/

---------

Co-authored-by: Yi Zhang <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants