Skip to content

[Java] JNI refactor for ONNX Tensor#12281

Merged
yuslepukhin merged 11 commits intomicrosoft:masterfrom
Craigacp:jni-onnx-tensor
Aug 8, 2022
Merged

[Java] JNI refactor for ONNX Tensor#12281
yuslepukhin merged 11 commits intomicrosoft:masterfrom
Craigacp:jni-onnx-tensor

Conversation

@Craigacp
Copy link
Contributor

@Craigacp Craigacp commented Jul 22, 2022

Description:

Following on from #12013, this PR improves the error handling in the JNI code for OnnxTensor. OrtSession and the OrtJniUtil helper file are the remaining bits of unfixed JNI code.

Motivation and Context

@Craigacp
Copy link
Contributor Author

Craigacp commented Aug 3, 2022

@fs-eire @yuslepukhin please can I get a review for this PR?

@yuslepukhin
Copy link
Member

@fs-eire @yuslepukhin please can I get a review for this PR?

Will take a look

@yuslepukhin
Copy link
Member

yuslepukhin commented Aug 4, 2022

}

Still need to release OrtStatus


In reply to: 1205601429


In reply to: 1205601429


Refers to: java/src/main/native/OrtJniUtil.c:1074 in 68b389f. [](commit_id = 68b389f, deletion_comment = False)

@yuslepukhin
Copy link
Member

yuslepukhin commented Aug 4, 2022

OrtErrorCode checkOrtStatus(JNIEnv *jniEnv, const OrtApi * api, OrtStatus * status) {

Documentation question.
There is a number of checkOrtStatus calls in the implementation. Say the first one detects an error and then throws Java exception. However, the code continues to do other calls and other checkOrtStatus() call.
Is it the correct thing to do? Can java throw multiple exceptions at once?


In reply to: 1205761943


In reply to: 1205761943


Refers to: java/src/main/native/OrtJniUtil.c:1063 in 68b389f. [](commit_id = 68b389f, deletion_comment = False)

}

// Assign the strings into the Tensor
checkOrtStatus(jniEnv, api, api->FillStringTensor(ortValue, strings, length));
Copy link
Member

@yuslepukhin yuslepukhin Aug 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

checkOrtStatus

What if this has an error, what happens if the next checkOrtStatus also errors out? Nested exceptions or just the last one?
Should we create a chain of exceptions? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It returns the first exception thrown rather than any later ones.

@Craigacp
Copy link
Contributor Author

Craigacp commented Aug 4, 2022

OrtErrorCode checkOrtStatus(JNIEnv *jniEnv, const OrtApi * api, OrtStatus * status) {

Documentation question. There is a number of checkOrtStatus calls in the implementation. Say the first one detects an error and then throws Java exception. However, the code continues to do other calls and other checkOrtStatus() call. Is it the correct thing to do? Can java throw multiple exceptions at once?

Refers to: java/src/main/native/OrtJniUtil.c:1063 in 68b389f. [](commit_id = 68b389f, deletion_comment = False)

Java cannot throw multiple exceptions at once, the remainder are discarded. The goal of this refactor is that no other checkOrtStatus calls happen after an exception has been thrown on the Java side, so if there still is one (in this code, I've not finished the OrtSession or OrtJniUtil files) then that's something that needs fixing.

@Craigacp
Copy link
Contributor Author

Craigacp commented Aug 4, 2022

I've fixed all the things discussed.

@yuslepukhin
Copy link
Member

We are getting troubles from cpplinter. Can you, please, add this change to your PR?
#12094

@Craigacp
Copy link
Contributor Author

Craigacp commented Aug 5, 2022

My understanding from #12013 (comment) was that we'd leave the linter as is and accept that it was grumpy about not using C++ casts. But I can add it if you want.

@fs-eire
Copy link
Contributor

fs-eire commented Aug 5, 2022

My understanding from #12013 (comment) was that we'd leave the linter as is and accept that it was grumpy about not using C++ casts. But I can add it if you want.

How about we redo #12094 but only exclude the JNI subfolder?

if (code == ORT_OK) {
// Create the buffers for the Java strings
const char** strings = NULL;
code = checkOrtStatus(jniEnv, api, api->AllocatorAlloc(allocator, sizeof(char*) * length, (void**)&strings));
Copy link
Member

@yuslepukhin yuslepukhin Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reason to use ORT Allocator here. One can just use malloc or whatever. It would result in a simpler code.

The data is internally copied anyway because C++ code stores it in std::string objects.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've been moderately consistently using ORT's allocator as otherwise there are three different memory allocators in use (Java's, malloc and ORT's allocators). I can replace the use of ORT's allocators with malloc if you want, but I'd assumed it was ok as ORT uses it's own allocators to allocate strings in a bunch of the C API methods, rather than malloc and requiring users to free that memory.

Copy link
Member

@yuslepukhin yuslepukhin Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did not mean to replace all of the cases. Sometimes, the API requires it to be allocated with the ORT allocator. But in this case, it is just a piece of temporary memory which could be on the stack if we could do it.

@Craigacp
Copy link
Contributor Author

Craigacp commented Aug 5, 2022

My understanding from #12013 (comment) was that we'd leave the linter as is and accept that it was grumpy about not using C++ casts. But I can add it if you want.

How about we redo #12094 but only exclude the JNI subfolder?

I am fine with excluding only the JNI subfolder. Running cpplint on C code is just wrong.

Ok, I believe I've modified the cpplint action to exclude java/src/native/*.c.

return NAN;
(void) jobj; // Required JNI parameter not needed by functions which don't need to access their host object.
const OrtApi* api = (const OrtApi*) apiHandle;
if (onnxType == 9) {
Copy link
Member

@yuslepukhin yuslepukhin Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this be enum from ORT API or Java? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is in Java, but these numbers are only in there. As far as I can tell the C enums don't expose the ordinal and I didn't want to have yet another copy of the enum I'd need to keep in sync.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

C enums are ints with values in the order or declaration. But they can be assigned any value you wish.

jfloat floatVal = convertHalfToFloat(*arr);
return floatVal;
}
} else if (onnxType == 10) {
Copy link
Member

@yuslepukhin yuslepukhin Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

10

enum #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

(void) jobj; // Required JNI parameter not needed by functions which don't need to access their host object.
(void) jobj; // Required JNI parameter not needed by functions which don't need to access their host object.
const OrtApi* api = (const OrtApi*) apiHandle;
if (onnxType == 1) {
Copy link
Member

@yuslepukhin yuslepukhin Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1

enum.
In this case, the code seems to be completely identical.
We shold combine these two cases with uint8_t. The type is casted away anyway. #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

return floatVal;
}
} else if (onnxType == 10) {
jfloat* arr = NULL;
Copy link
Member

@yuslepukhin yuslepukhin Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

jfloat

Is jfloat always 32 bits? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. All the java types are defined to be a fixed size across platforms.

(void) jobj; // Required JNI parameter not needed by functions which don't need to access their host object.
(void) jobj; // Required JNI parameter not needed by functions which don't need to access their host object.
const OrtApi* api = (const OrtApi*) apiHandle;
if (onnxType == 3) {
Copy link
Member

@yuslepukhin yuslepukhin Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if (onnxType == 3) {

Same comments as above.
enum + combine the code.
What worries me here, is that in Java a positve uint16_t can become negative.
What would be the best way dealing with it?
I would say we need to return unsigned integers in larger ones. uint16_t needs to be returned in a 32-bit signed.
This may warrant restructuring the API. For example:

We can group types together in a different manner.
uint8_t + int16_t -> jshort
uint16_t + int32_t -> jint
uint32_t + int64_t -> jlong
uint64_t + ????? Some big Java type? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion on that is that people who get unsigned values back in Java should deal with that themselves. The bit pattern is correct, and if you're working in unsigned types you should understand how to sort that out for your use case.

There is no bigger primitive type than long, so we don't have a good solution for that one because you'd end up making an array of objects to hold it and it quickly becomes intractable. When Java gets value types then we'll be able to define unsigned types and return those, but for the moment people usually expect to get the signed versions and if they need something else it's on them to figure it out from the bits.

@Craigacp
Copy link
Contributor Author

Craigacp commented Aug 5, 2022

My understanding from #12013 (comment) was that we'd leave the linter as is and accept that it was grumpy about not using C++ casts. But I can add it if you want.

How about we redo #12094 but only exclude the JNI subfolder?

I am fine with excluding only the JNI subfolder. Running cpplint on C code is just wrong.

Ok, I believe I've modified the cpplint action to exclude java/src/native/*.c.

The cpplint action doesn't pass through the exclude argument (https://github.com/cpplint/cpplint/blob/develop/cpplint.py#L206). So I guess the option is to remove .c files from cpplint, but that seems like it has a wider blast radius.

(void) jobj; // Required JNI parameter not needed by functions which don't need to access their host object.
(void) jobj; // Required JNI parameter not needed by functions which don't need to access their host object.
const OrtApi* api = (const OrtApi*) apiHandle;
if (onnxType == 5) {
Copy link
Member

@yuslepukhin yuslepukhin Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(onnxType == 5) {

Same as above #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.


// Get reference to the string
jobject output = (*jniEnv)->GetObjectArrayElement(jniEnv, outputArray, 0);
jobjectArray outputArray = createStringArrayFromTensor(jniEnv, api, (OrtAllocator*) allocatorHandle,
Copy link
Member

@yuslepukhin yuslepukhin Aug 5, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

createStringArrayFromTensor

My understanding this is a single string case. Does the function check it is a single element tensor? #Resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We check that in Java before this call. The individual "getX" methods are private and only called if it's a single element.

Copy link
Member

@yuslepukhin yuslepukhin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🕐

@yuslepukhin
Copy link
Member

My understanding from #12013 (comment) was that we'd leave the linter as is and accept that it was grumpy about not using C++ casts. But I can add it if you want.

How about we redo #12094 but only exclude the JNI subfolder?

I am fine with excluding only the JNI subfolder. Running cpplint on C code is just wrong.

Ok, I believe I've modified the cpplint action to exclude java/src/native/*.c.

The cpplint action doesn't pass through the exclude argument (https://github.com/cpplint/cpplint/blob/develop/cpplint.py#L206). So I guess the option is to remove .c files from cpplint, but that seems like it has a wider blast radius.

This might be helpful. google/styleguide#220

@Craigacp
Copy link
Contributor Author

Craigacp commented Aug 5, 2022

My understanding from #12013 (comment) was that we'd leave the linter as is and accept that it was grumpy about not using C++ casts. But I can add it if you want.

How about we redo #12094 but only exclude the JNI subfolder?

I am fine with excluding only the JNI subfolder. Running cpplint on C code is just wrong.

Ok, I believe I've modified the cpplint action to exclude java/src/native/*.c.

The cpplint action doesn't pass through the exclude argument (https://github.com/cpplint/cpplint/blob/develop/cpplint.py#L206). So I guess the option is to remove .c files from cpplint, but that seems like it has a wider blast radius.

This might be helpful. google/styleguide#220

Ah, no I mean the GitHub action which runs cpplint doesn't pass through the exclude argument to the cpplint execution. You can see it complain in one of the runs that exclude is an unsupported option.

@edgchen1
Copy link
Contributor

edgchen1 commented Aug 5, 2022

My understanding from #12013 (comment) was that we'd leave the linter as is and accept that it was grumpy about not using C++ casts. But I can add it if you want.

How about we redo #12094 but only exclude the JNI subfolder?

I am fine with excluding only the JNI subfolder. Running cpplint on C code is just wrong.

Ok, I believe I've modified the cpplint action to exclude java/src/native/*.c.

The cpplint action doesn't pass through the exclude argument (https://github.com/cpplint/cpplint/blob/develop/cpplint.py#L206). So I guess the option is to remove .c files from cpplint, but that seems like it has a wider blast radius.

This might be helpful. google/styleguide#220

Ah, no I mean the GitHub action which runs cpplint doesn't pass through the exclude argument to the cpplint execution. You can see it complain in one of the runs that exclude is an unsupported option.

does it work if it is added to "flags"?

flags: --linelength=120

@Craigacp
Copy link
Contributor Author

Craigacp commented Aug 7, 2022

Ah, no I mean the GitHub action which runs cpplint doesn't pass through the exclude argument to the cpplint execution. You can see it complain in one of the runs that exclude is an unsupported option.

does it work if it is added to "flags"?

flags: --linelength=120

Yep, that works. I'd missed that option the first time around.

@yuslepukhin
Copy link
Member

/azp run MacOS CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,onnxruntime-python-checks-ci-pipeline

@yuslepukhin
Copy link
Member

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux Nuphar CI Pipeline,Linux OpenVINO CI Pipeline

@yuslepukhin
Copy link
Member

/azp run orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed onnxruntime-binary-size-checks-ci-pipeline

@azure-pipelines
Copy link

Azure Pipelines successfully started running 6 pipeline(s).

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@azure-pipelines
Copy link

Azure Pipelines successfully started running 6 pipeline(s).

@yuslepukhin
Copy link
Member

/azp run onnxruntime-binary-size-checks-ci-pipeline orttraining-ortmodule-dist

@azure-pipelines
Copy link

No pipelines are associated with this pull request.

@yuslepukhin yuslepukhin merged commit 8a86b34 into microsoft:master Aug 8, 2022
@Craigacp Craigacp deleted the jni-onnx-tensor branch August 8, 2022 20:18
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants