[Java] JNI refactor for ONNX Tensor by Craigacp · Pull Request #12281 · microsoft/onnxruntime

Craigacp · 2022-07-22T03:09:59Z

Description:

Following on from #12013, this PR improves the error handling in the JNI code for OnnxTensor. OrtSession and the OrtJniUtil helper file are the remaining bits of unfixed JNI code.

Motivation and Context

Why is this change required? What problem does it solve? The error handling and exception throwing in the JNI code doesn't handle repeated errors very well. See "JAVA API does not handle exceptions correctly - causing crash or potential memory leak" (#11451).
If it fixes an open issue, please link to the issue here. Partial fix for "JAVA API does not handle exceptions correctly - causing crash or potential memory leak" (#11451).

Craigacp · 2022-08-03T18:55:03Z

@fs-eire @yuslepukhin please can I get a review for this PR?

yuslepukhin · 2022-08-03T20:14:53Z

Will take a look

yuslepukhin · 2022-08-04T18:08:46Z

Still need to release OrtStatus

In reply to: 1205601429

Refers to: java/src/main/native/OrtJniUtil.c:1074 in 68b389f.

yuslepukhin · 2022-08-04T20:59:42Z

OrtErrorCode checkOrtStatus(JNIEnv *jniEnv, const OrtApi * api, OrtStatus * status) {

Documentation question.
There are a number of checkOrtStatus calls in the implementation. Say the first one detects an error and throws a Java exception. However, the code then continues on to other calls, including other checkOrtStatus() calls.
Is that the correct thing to do? Can Java throw multiple exceptions at once?

In reply to: 1205761943

Refers to: java/src/main/native/OrtJniUtil.c:1063 in 68b389f.

yuslepukhin · 2022-08-04T21:01:17Z

java/src/main/native/ai_onnxruntime_OnnxTensor.c

+        }
+
+        // Assign the strings into the Tensor
+        checkOrtStatus(jniEnv, api, api->FillStringTensor(ortValue, strings, length));


checkOrtStatus

What if this has an error, what happens if the next checkOrtStatus also errors out? Nested exceptions or just the last one?
Should we create a chain of exceptions? #Resolved

It returns the first exception thrown rather than any later ones.

Craigacp · 2022-08-04T22:03:48Z

Java cannot throw multiple exceptions at once; the remainder are discarded. The goal of this refactor is that no other checkOrtStatus calls happen after an exception has been thrown on the Java side, so if there still is one in this code (I've not yet finished the OrtSession or OrtJniUtil files), that's something that needs fixing.
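
The "stop after the first error" pattern described above can be sketched in plain C. This is a minimal illustration, not the PR's code: `ExEnv`, `ExCode`, `ex_check`, `ex_step` and `ex_run` are hypothetical stand-ins for the JNI environment, `OrtErrorCode`, `checkOrtStatus` and the ORT API calls. It only models the guard shape `if (code == ORT_OK) { code = checkOrtStatus(...); }` that appears in the diff hunks in this thread.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; these are not the real ORT or JNI types. */
typedef enum { EX_OK = 0, EX_FAIL = 1 } ExCode;

typedef struct {
  bool exception_pending; /* models a pending Java exception */
  int throw_count;        /* how many times an exception was "thrown" */
  int first_error_step;   /* step that raised the first error, or -1 */
} ExEnv;

/* Models checkOrtStatus: on failure it records a "throw" and returns
   the non-OK code so the caller can skip the remaining work. */
ExCode ex_check(ExEnv* env, ExCode status, int step) {
  if (status != EX_OK) {
    env->exception_pending = true;
    env->throw_count++;
    if (env->first_error_step < 0) {
      env->first_error_step = step;
    }
  }
  return status;
}

/* Models an API call that may fail. */
ExCode ex_step(bool should_fail) {
  return should_fail ? EX_FAIL : EX_OK;
}

/* Runs three steps; each later step is guarded so that nothing is
   checked (and nothing else can throw) once an earlier step failed. */
ExCode ex_run(ExEnv* env, bool fail1, bool fail2, bool fail3) {
  ExCode code = ex_check(env, ex_step(fail1), 1);
  if (code == EX_OK) {
    code = ex_check(env, ex_step(fail2), 2);
  }
  if (code == EX_OK) {
    code = ex_check(env, ex_step(fail3), 3);
  }
  return code;
}
```

With steps 2 and 3 both set to fail, only step 2 records a throw, because step 3 is never reached.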

Craigacp · 2022-08-04T22:48:32Z

I've fixed all the things discussed.

yuslepukhin · 2022-08-05T19:54:28Z

We are having trouble with cpplint. Could you please add this change to your PR?
#12094

Craigacp · 2022-08-05T20:12:55Z

My understanding from #12013 (comment) was that we'd leave the linter as is and accept that it was grumpy about not using C++ casts. But I can add it if you want.

fs-eire · 2022-08-05T20:50:18Z

How about we redo #12094 but only exclude the JNI subfolder?

yuslepukhin · 2022-08-05T20:52:14Z

java/src/main/native/ai_onnxruntime_OnnxTensor.c

+    if (code == ORT_OK) {
+      // Create the buffers for the Java strings
+      const char** strings = NULL;
+      code = checkOrtStatus(jniEnv, api, api->AllocatorAlloc(allocator, sizeof(char*) * length, (void**)&strings));


There is no reason to use the ORT allocator here. One can just use malloc or whatever. It would result in simpler code.

The data is internally copied anyway because C++ code stores it in std::string objects.

I've been moderately consistently using ORT's allocator as otherwise there are three different memory allocators in use (Java's, malloc and ORT's allocators). I can replace the use of ORT's allocators with malloc if you want, but I'd assumed it was ok as ORT uses its own allocators to allocate strings in a bunch of the C API methods, rather than malloc and requiring users to free that memory.

I did not mean to replace all of the cases. Sometimes, the API requires it to be allocated with the ORT allocator. But in this case, it is just a piece of temporary memory which could be on the stack if we could do it.
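
The malloc alternative suggested above could look roughly like the following sketch. It assumes, as the reviewer states, that the consumer copies the string data internally, so the pointer array is only scratch memory. `ex_consume_strings` and `ex_fill_from` are hypothetical names; `ex_consume_strings` merely stands in for a call such as `FillStringTensor` and does not reproduce its behaviour.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical consumer standing in for an API that copies its inputs
   internally. Here it just returns the total number of bytes seen. */
size_t ex_consume_strings(const char* const* strings, size_t length) {
  size_t total = 0;
  for (size_t i = 0; i < length; i++) {
    total += strlen(strings[i]);
  }
  return total;
}

/* Builds a temporary pointer array with malloc, hands it to the
   consumer, then frees it. Returns 0 on success, -1 if malloc fails. */
int ex_fill_from(const char* const* inputs, size_t length, size_t* total_out) {
  if (length == 0) {
    *total_out = 0; /* nothing to allocate */
    return 0;
  }
  const char** tmp = malloc(sizeof(char*) * length);
  if (tmp == NULL) {
    return -1;
  }
  for (size_t i = 0; i < length; i++) {
    tmp[i] = inputs[i];
  }
  *total_out = ex_consume_strings(tmp, length);
  free(tmp); /* scratch memory only; the consumer kept nothing */
  return 0;
}
```

The trade-off raised in the reply still applies: this adds malloc as a third allocator alongside Java's and ORT's, in exchange for not needing an allocator handle and a status check for a temporary buffer.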

Craigacp · 2022-08-05T21:25:26Z

I am fine with excluding only the JNI subfolder. Running cpplint on C code is just wrong.

Ok, I believe I've modified the cpplint action to exclude java/src/native/*.c.

yuslepukhin · 2022-08-05T21:25:41Z

java/src/main/native/ai_onnxruntime_OnnxTensor.c

-        return NAN;
+  (void) jobj;  // Required JNI parameter not needed by functions which don't need to access their host object.
+  const OrtApi* api = (const OrtApi*) apiHandle;
+  if (onnxType == 9) {


Can this be enum from ORT API or Java? #Resolved

It is in Java, but these numbers are only in there. As far as I can tell the C enums don't expose the ordinal and I didn't want to have yet another copy of the enum I'd need to keep in sync.

C enums are ints with values in the order of declaration. But they can be assigned any value you wish.
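
As a small illustration of that point, the sketch below uses a hypothetical enum, `ExElemType`. It is not ORT's element type enum and not the Java enum, and its names and values are chosen only for the example. Enumerators count up from 0 in declaration order unless one is assigned explicitly, after which counting continues from the assigned value.

```c
#include <string.h>

/* Hypothetical enum for illustration only. */
typedef enum {
  EX_TYPE_UNDEFINED,   /* 0: the first enumerator defaults to 0 */
  EX_TYPE_UINT8,       /* 1: next in declaration order */
  EX_TYPE_INT8,        /* 2 */
  EX_TYPE_FLOAT16 = 9, /* explicitly assigned */
  EX_TYPE_FLOAT        /* 10: continues from the previous value */
} ExElemType;

/* Dispatches on a named enumerator instead of a bare integer literal. */
const char* ex_type_name(int onnxType) {
  switch ((ExElemType)onnxType) {
    case EX_TYPE_UINT8:
      return "uint8";
    case EX_TYPE_INT8:
      return "int8";
    case EX_TYPE_FLOAT16:
      return "float16";
    case EX_TYPE_FLOAT:
      return "float";
    default:
      return "unknown";
  }
}
```

Comparing against a named enumerator keeps the integer value in one place, which is the concern behind the "enum" review comments on the literal type checks.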

yuslepukhin · 2022-08-05T21:26:47Z

java/src/main/native/ai_onnxruntime_OnnxTensor.c

+      jfloat floatVal = convertHalfToFloat(*arr);
+      return floatVal;
+    }
+  } else if (onnxType == 10) {


10

enum #Resolved

yuslepukhin · 2022-08-05T21:27:48Z

java/src/main/native/ai_onnxruntime_OnnxTensor.c

-    (void) jobj; // Required JNI parameter not needed by functions which don't need to access their host object.
+    (void) jobj;  // Required JNI parameter not needed by functions which don't need to access their host object.
    const OrtApi* api = (const OrtApi*) apiHandle;
  if (onnxType == 1) {


1

enum.
In this case, the code seems to be completely identical.
We should combine these two cases with uint8_t. The type is cast away anyway. #Resolved

yuslepukhin · 2022-08-05T21:30:38Z

java/src/main/native/ai_onnxruntime_OnnxTensor.c

+      return floatVal;
+    }
+  } else if (onnxType == 10) {
+    jfloat* arr = NULL;


jfloat

Is jfloat always 32 bits? #Resolved

Yes. All the java types are defined to be a fixed size across platforms.

yuslepukhin · 2022-08-05T21:32:34Z

java/src/main/native/ai_onnxruntime_OnnxTensor.c

-    (void) jobj; // Required JNI parameter not needed by functions which don't need to access their host object.
+    (void) jobj;  // Required JNI parameter not needed by functions which don't need to access their host object.
    const OrtApi* api = (const OrtApi*) apiHandle;
  if (onnxType == 3) {


if (onnxType == 3) {

Same comments as above.
enum + combine the code.
What worries me here is that in Java a positive uint16_t can become negative.
What would be the best way dealing with it?
I would say we need to return unsigned integers in larger ones. uint16_t needs to be returned in a 32-bit signed.
This may warrant restructuring the API. For example:

We can group types together in a different manner.
uint8_t + int16_t -> jshort
uint16_t + int32_t -> jint
uint32_t + int64_t -> jlong
uint64_t + ????? Some big Java type? #Resolved

My opinion on that is that people who get unsigned values back in Java should deal with that themselves. The bit pattern is correct, and if you're working in unsigned types you should understand how to sort that out for your use case.

There is no bigger primitive type than long, so we don't have a good solution for that one because you'd end up making an array of objects to hold it and it quickly becomes intractable. When Java gets value types then we'll be able to define unsigned types and return those, but for the moment people usually expect to get the signed versions and if they need something else it's on them to figure it out from the bits.
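
To make the "bit pattern is correct" argument concrete, here is a minimal C sketch with hypothetical helper names. It shows a uint16_t whose bits are reinterpreted as a signed 16-bit value (what a Java short would hold), and how the unsigned value is recovered by widening and masking. Java's `Short.toUnsignedInt` performs the same widen-and-mask step on the Java side.

```c
#include <stdint.h>
#include <string.h>

/* Reinterprets the 16 bits of an unsigned value as a signed value,
   which is what a Java short holds for the same bit pattern. */
int16_t ex_as_signed16(uint16_t v) {
  int16_t s;
  memcpy(&s, &v, sizeof s);
  return s;
}

/* Widens to 32 bits and masks off the sign extension, recovering the
   original unsigned value in a type large enough to hold it. */
int32_t ex_unsigned16_to_int32(int16_t s) {
  return (int32_t)s & 0xFFFF;
}
```

For example, the uint16_t value 65535 reads back as -1 when treated as signed, and masking after widening returns 65535. The same approach has no primitive target for uint64_t, which is the gap acknowledged in the reply above.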

Craigacp · 2022-08-05T21:35:27Z

The cpplint action doesn't pass through the exclude argument (https://github.com/cpplint/cpplint/blob/develop/cpplint.py#L206). So I guess the option is to remove .c files from cpplint, but that seems like it has a wider blast radius.

This reverts commit 6af4863.

yuslepukhin · 2022-08-05T21:43:07Z

java/src/main/native/ai_onnxruntime_OnnxTensor.c

-    (void) jobj; // Required JNI parameter not needed by functions which don't need to access their host object.
+    (void) jobj;  // Required JNI parameter not needed by functions which don't need to access their host object.
    const OrtApi* api = (const OrtApi*) apiHandle;
  if (onnxType == 5) {


(onnxType == 5) {

Same as above #Resolved

yuslepukhin · 2022-08-05T21:48:30Z

java/src/main/native/ai_onnxruntime_OnnxTensor.c

-
-    // Get reference to the string
-    jobject output = (*jniEnv)->GetObjectArrayElement(jniEnv, outputArray, 0);
+    jobjectArray outputArray = createStringArrayFromTensor(jniEnv, api, (OrtAllocator*) allocatorHandle,


createStringArrayFromTensor

My understanding is that this is the single-string case. Does the function check that it is a single-element tensor? #Resolved

We check that in Java before this call. The individual "getX" methods are private and only called if it's a single element.

yuslepukhin · 2022-08-05T22:33:56Z

This might be helpful. google/styleguide#220

Craigacp · 2022-08-05T22:37:10Z

Ah, no I mean the GitHub action which runs cpplint doesn't pass through the exclude argument to the cpplint execution. You can see it complain in one of the runs that exclude is an unsupported option.

edgchen1 · 2022-08-05T23:00:26Z

does it work if it is added to "flags"?

onnxruntime/.github/workflows/lint.yml

Line 89 in e85e31e

flags: --linelength=120

Craigacp · 2022-08-07T02:52:03Z

Yep, that works. I'd missed that option the first time around.

yuslepukhin · 2022-08-08T17:30:44Z

/azp run MacOS CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,onnxruntime-python-checks-ci-pipeline

yuslepukhin · 2022-08-08T17:30:59Z

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux Nuphar CI Pipeline,Linux OpenVINO CI Pipeline

yuslepukhin · 2022-08-08T17:31:08Z

/azp run orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed onnxruntime-binary-size-checks-ci-pipeline

azure-pipelines · 2022-08-08T17:31:10Z

Azure Pipelines successfully started running 6 pipeline(s).

azure-pipelines · 2022-08-08T17:31:24Z

Azure Pipelines successfully started running 3 pipeline(s).

azure-pipelines · 2022-08-08T17:31:25Z

Azure Pipelines successfully started running 6 pipeline(s).

yuslepukhin · 2022-08-08T18:47:21Z

/azp run onnxruntime-binary-size-checks-ci-pipeline orttraining-ortmodule-dist

azure-pipelines · 2022-08-08T18:47:27Z

No pipelines are associated with this pull request.

Craigacp added 3 commits July 20, 2022 19:11

Working on JNI refactor for OnnxTensor.

b65981b

Finishing OnnxTensor.

7a740c1

Placate the linter.

68b389f

yuslepukhin reviewed Aug 4, 2022 (11 reviews)

java/src/main/native/OrtJniUtil.h (outdated)

java/src/main/native/OrtJniUtil.c (outdated)

java/src/main/native/ai_onnxruntime_OnnxTensor.c (9 reviews, 8 outdated)

Updates from the review.

1dff7dd

Placate the linter.

da5e56d

yuslepukhin reviewed Aug 5, 2022

java/src/main/native/ai_onnxruntime_OnnxTensor.c (outdated)

Craigacp mentioned this pull request Aug 5, 2022

[Java] JNI refactor for OrtSession #12496

Merged

Simplifying the error handling logic in createTensor.

6d9a7b3

yuslepukhin reviewed Aug 5, 2022

Excluding JNI C files from cpplint.

6af4863

yuslepukhin reviewed Aug 5, 2022

Craigacp added 2 commits August 5, 2022 17:35

Revert "Excluding JNI C files from cpplint."

8fe3150

This reverts commit 6af4863.

Collapsing casting branches and migrating to ONNX element type enum.

1972b5c

yuslepukhin reviewed Aug 5, 2022

yuslepukhin requested changes Aug 5, 2022

Disable cpplint for JNI C files.

db240f7

yuslepukhin approved these changes Aug 8, 2022

yuslepukhin merged commit 8a86b34 into microsoft:master Aug 8, 2022

Craigacp deleted the jni-onnx-tensor branch August 8, 2022 20:18

Craigacp mentioned this pull request Aug 9, 2022

[Java] JNI refactor for OrtJniUtil #12516

Merged
