Skip to content

Python add support for utf-8 error messages#16946

Merged
lidizheng merged 1 commit intogrpc:masterfrom
lidizheng:py-utf-8
Nov 6, 2018
Merged

Python add support for utf-8 error messages#16946
lidizheng merged 1 commit intogrpc:masterfrom
lidizheng:py-utf-8

Conversation

@lidizheng
Copy link
Copy Markdown
Contributor

@lidizheng lidizheng added lang/Python release notes: yes Indicates if PR needs to be in release notes labels Oct 19, 2018
@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 1,997,238      Total (=)      1,997,238

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,021,642      Total (<)     11,021,645

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 1,997,238      Total (=)      1,997,238

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,021,641      Total (<)     11,021,646

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 1,997,238      Total (=)      1,997,238

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,021,762      Total (=)     11,021,762

 No significant differences in binary sizes


@lidizheng lidizheng changed the title Add support for utf-8 error messages Python server add support for utf-8 error messages Oct 19, 2018
Copy link
Copy Markdown
Contributor

@mehrdada mehrdada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is an area we really need tests for; ideally ones that failed previously.

@lidizheng
Copy link
Copy Markdown
Contributor Author

The unit test I added will fail if my code are removed. And about the decoding part, I tried to decode to unicode object instead of passing the byte string to the user. It seems more consistent with previous logic.

@lidizheng
Copy link
Copy Markdown
Contributor Author

You are right that I can't reproduce this client-side error... That's why I renamed the title to server-side.

Invalid encoding on b'\xc3'
Traceback (most recent call last):
  File "src/python/grpcio/grpc/_cython/_cygrpc/grpc_string.pyx.pxi", line 51, in grpc._cython.cygrpc._decode
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xc3 in position 0: unexpected end of data

@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 1,997,238      Total (=)      1,997,238

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,021,764      Total (>)     11,021,763

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 1,997,294      Total (=)      1,997,294

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,021,762      Total (<)     11,021,763

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 1,997,294      Total (=)      1,997,294

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,021,756      Total (>)     11,021,754

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 1,997,294      Total (=)      1,997,294

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,021,753      Total (<)     11,021,763

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 1,997,294      Total (=)      1,997,294

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,021,769      Total (>)     11,021,763

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 1,997,294      Total (=)      1,997,294

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,021,767      Total (>)     11,021,761

 No significant differences in binary sizes


@lidizheng
Copy link
Copy Markdown
Contributor Author

lidizheng commented Oct 23, 2018

@mehrdada Can you review the code? I added the unit test for UTF-8 error message, which will fail both the client side and server side without updated common.encode and common.decode.

@euroelessar
Copy link
Copy Markdown
Contributor

Please also add a test for invalid UTF-8 messages, for example b'abc\x80\xd0\xaf' should be converted to something like u'abc\ufffd\u042f' or similar.


def decode(b):
if isinstance(b, str):
def decode(b): #pylint: disable=inconsistent-return-statements
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe smth simple like this?

if isinstance(b, bytes):
  return b.decode('utf-8', 'replace')
return b

similar for def encode, just always encoding it as utf-8 should make a deal

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If possible, I want to keep the message content. The replace will get rid of character that utf-8 can't understand. It will fail some cases, if people transmit pure binary data.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

gRPC HTTP/2 spec explicitly states that message is an utf-8 message, so pure binary data is not an acceptable input (and therefore it should be sanitized, but not honored).
Please see https://github.com/grpc/grpc/blob/master/doc/PROTOCOL-HTTP2.md#responses for more details.

It will be also consistent to grpc-go behavior, which currently sanitizes status messages by replacing invalid bytes sequences with the Unicode replacement character.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please review the latest change. I have adopted your advices and added test cases.

@lidizheng
Copy link
Copy Markdown
Contributor Author

@euroelessar Thanks for your suggestion. Since we don't need to worry about other codec, the logic is much simpler. And I added your test case to the unit test.

@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 1,997,274      Total (=)      1,997,274

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,021,701      Total (>)     11,021,696

 No significant differences in binary sizes


@srini100
Copy link
Copy Markdown
Contributor

@lidizheng, if this is going to affect metadata encoding/decoding, we should keep in mind that binary encoded metadata needs to be support based on this proposal https://github.com/grpc/proposal/blob/master/G1-true-binary-metadata.md.

@lidizheng
Copy link
Copy Markdown
Contributor Author

@srini100
The encode/decode function for Metadata and error message (or status details or status message) is different. The encode/decode function in _common.py which I changed only affect four items:

  1. url to listen/bind
  2. exception message
  3. method handler name
  4. error message/status details. <---

As for the metadata in your concern, it is built in Cython layer which have another set of encode/decode functions in grpc_string.pyx.pxi. So, my change will not affect current logic of metadata at all.

And if we want to proceed to the unicode metadata... I guess we can open another issue for it.

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

Corrupt JSON data (indicates timeout or crash): 
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.new: 10
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.old: 10


[microbenchmarks] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 2,015,578      Total (=)      2,015,578

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,116,197      Total (>)     11,116,193

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Corrupt JSON data (indicates timeout or crash): 
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.new: 10
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.old: 10


[microbenchmarks] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 2,015,578      Total (=)      2,015,578

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,116,200      Total (>)     11,116,194

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Corrupt JSON data (indicates timeout or crash): 
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.new: 10
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.old: 10


[microbenchmarks] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 2,016,505      Total (=)      2,016,505

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,123,078      Total (<)     11,123,080

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Corrupt JSON data (indicates timeout or crash): 
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.new: 10
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.old: 10


[microbenchmarks] No significant performance differences

@lidizheng
Copy link
Copy Markdown
Contributor Author

@nicolasnoble Can you take a look at my latest changes? I added interop tests and they are passed in CI. But other tests that I didn't touch are failing. I tried to de-flake them by force-running multiple times, but they still fail...

* Both server and client should be fine with utf-8 error messages now
* Adding an interop test: special status message
@grpc-testing
Copy link
Copy Markdown

****************************************************************

libgrpc.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]


****************************************************************

libgrpc++.so

     VM SIZE        FILE SIZE
 ++++++++++++++  ++++++++++++++

  [ = ]       0        0  [ = ]



@grpc-testing
Copy link
Copy Markdown

[trickle] No significant performance differences

@grpc-testing
Copy link
Copy Markdown

Objective-C binary sizes
*****************STATIC******************
  New size                      Old size
 2,020,616      Total (=)      2,020,616

 No significant differences in binary sizes

***************FRAMEWORKS****************
  New size                      Old size
11,180,943      Total (=)     11,180,943

 No significant differences in binary sizes


@grpc-testing
Copy link
Copy Markdown

Corrupt JSON data (indicates timeout or crash): 
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.new: 10
    bm_call_create.BM_IsolatedFilter_ClientChannelFilter_NoOp_.counters.old: 10


[microbenchmarks] No significant performance differences

@lidizheng lidizheng merged commit a04a928 into grpc:master Nov 6, 2018
@lock lock bot locked as resolved and limited conversation to collaborators Feb 4, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

lang/Python release notes: yes Indicates if PR needs to be in release notes

Projects

None yet

Development

Successfully merging this pull request may close these issues.

8 participants