fix(ingestion): escape surrogate characters when producing to kafka #9368

Merged
yakkomajuri merged 11 commits into master from
escape-json
Apr 11, 2022

@yakkomajuri yakkomajuri commented Apr 8, 2022

Summary of the problem we have as posted on Slack:

ClickHouse crashes when parsing JSONEachRow records (or JSON formats in general) if it encounters a surrogate character without a pair.
In our case, some user’s events contained \ud83d\.

The approach I took follows what was done in fastify/fast-json-stringify#151.

That means escaping all surrogates, rather than trying to detect only the ones without a pair. I think that's a valid tradeoff: valid pairs get escaped as well, but no pair detection is needed.

Once this was implemented in JS, I made sure Python had the same functionality.

There are tests for both sides.
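
As a rough illustration of the escape-all-surrogates idea described above, here is a minimal Python sketch. It is not the code from this PR: the function name `escape_surrogates` and the regex are assumptions made for illustration, and the exact escaped output format used in the PR may differ.

```python
import re

# Matches any single code point in the UTF-16 surrogate range U+D800..U+DFFF.
# (Hypothetical helper for illustration; not taken from the PR itself.)
_SURROGATE_RE = re.compile(r"[\ud800-\udfff]")


def escape_surrogates(text: str) -> str:
    """Replace every surrogate code point with its literal \\uXXXX text.

    No attempt is made to check whether a surrogate is part of a pair;
    every match is escaped.
    """
    return _SURROGATE_RE.sub(
        lambda m: "\\u{:04x}".format(ord(m.group())), text
    )


if __name__ == "__main__":
    raw = "hello \ud83d world"  # contains a lone high surrogate
    safe = escape_surrogates(raw)
    # The escaped string holds only the ASCII text of the escape,
    # so it can be encoded as UTF-8 without raising an error.
    print(safe.encode("utf-8"))
```

The original string above cannot be encoded as UTF-8 at all, because a lone surrogate is not a valid Unicode scalar value. After escaping, the surrogate is carried as plain ASCII text instead.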

@yakkomajuri yakkomajuri mentioned this pull request Apr 8, 2022

2 participants