fixes #175. utf-8 errors in Python 3. #208

Merged
msumit merged 1 commit into qubole:unreleased from mcarlsen:master
Mar 19, 2019

Conversation

@mcarlsen
Contributor

@mcarlsen mcarlsen commented Nov 6, 2017

Caused by block reads chopping multibyte utf-8 sequences in half

Because the data is read in blocks of 8192 bytes, a multibyte utf-8 character can be split across two blocks, so a block on its own may not be valid utf-8 and decoding it fails.

Fixed by not decoding the block. Instead, encode the delimiter and do the replace operation on bytes rather than str.
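
To illustrate the description above, here is a minimal sketch of both approaches. It is not the SDK's actual code: the function names, the `BLOCK_SIZE` constant, and the stream arguments are hypothetical, and only the 8192-byte block size and the encode-the-delimiter idea come from this PR.

```python
import io

BLOCK_SIZE = 8192  # read block size mentioned in the description


def copy_decoding_each_block(src, dst, old_delim, new_delim):
    """Failing approach (hypothetical helper): decode every block on its own."""
    while True:
        block = src.read(BLOCK_SIZE)
        if not block:
            break
        # Raises UnicodeDecodeError when a multibyte character
        # straddles the block boundary.
        text = block.decode("utf-8")
        dst.write(text.replace(old_delim, new_delim).encode("utf-8"))


def copy_replacing_bytes(src, dst, old_delim, new_delim):
    """Fixed approach (hypothetical helper): never decode, replace on bytes."""
    old_bytes = old_delim.encode("utf-8")
    new_bytes = new_delim.encode("utf-8")
    while True:
        block = src.read(BLOCK_SIZE)
        if not block:
            break
        dst.write(block.replace(old_bytes, new_bytes))


# 8191 ASCII bytes followed by a 2-byte character: the first block
# ends in the middle of "é".
payload = ("a" * 8191 + "é\x01b").encode("utf-8")
```

With this payload, `copy_decoding_each_block(io.BytesIO(payload), io.BytesIO(), "\x01", "\t")` fails on the first block, while `copy_replacing_bytes` copies the data through unchanged apart from the delimiter. Replacing on bytes is safe for utf-8 because an encoded character never appears inside the encoding of a different character. One caveat of this sketch: a delimiter that encodes to more than one byte could itself be split across two blocks and would then be missed.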

@dancrew32

This patch is great, thanks! Who should we assign it to in order to merge it into master?

@Kirill-Babkin

Kirill-Babkin commented Mar 15, 2019

Hey, is anyone still trying to get this in? This fix solved a problem I was having with the SDK, and I would love to see it included.

@chattarajoy chattarajoy changed the base branch from master to unreleased March 15, 2019 03:46
@msumit msumit merged commit 3648855 into qubole:unreleased Mar 19, 2019
@msumit
Contributor

msumit commented Mar 19, 2019

Thanks for the patch. Merged into the unreleased branch for now; it will be picked up in the next release and merged into master as well.

chattarajoy pushed a commit that referenced this pull request May 14, 2019
Caused by block reads chopping multibyte utf-8 sequences in half.

5 participants