Skip to content
This repository was archived by the owner on Sep 14, 2020. It is now read-only.

[275] Parse JSON lines ourselves to avoid aiohttp's "Line is too long"#276

Merged
nolar merged 4 commits intozalando-incubator:masterfrom
nolar:275-line-too-long
Dec 19, 2019
Merged

[275] Parse JSON lines ourselves to avoid aiohttp's "Line is too long"#276
nolar merged 4 commits intozalando-incubator:masterfrom
nolar:275-line-too-long

Conversation

@nolar
Copy link
Copy Markdown
Contributor

@nolar nolar commented Dec 16, 2019

Parse the JSON lines of Kubernetes watch-streams with no memory limit.

Issue : #275

Description

Kubernetes can send JSON lines with huge objects. E.g., secrets of 2MB in #275.

aiohttp has a hard-coded limit the lines received from the stream, and it is 128KB (aiohttp.streams.DEFAULT_LIMIT = 2**16 = 64 KB for the low watermark of the buffer, multiplied by 2x for high watermark, where the exception is raised).

There is no way to externally configure the buffer size of the aiohttp's StreamReader, so as there is no way to inject our own StreamReader instances with limit= properly set. The docs say:

User should never instantiate streams manually but use existing aiohttp.web.Request.content and aiohttp.ClientResponse.content properties for accessing raw BODY data.

For this reason, we implement our own per-line iterator with no memory control — it takes as much memory, as it is needed. Though, we try not to waste it too much by not having duplicates of each line in the buffer and by not having multiple copies of the same line.

The efficiency is not tested and is not measured. Specifically, it can be slow if there are hundreds or thousands of such huge resources or events are happening very often. But we assume it is sufficient for now, and can later be improved when some performance optimisations and measurements are performed.

FYI: The provided test clearly shows how the pure aiohttp approach fails with "Line is too long" — if used without this wrapping iterator.

Types of Changes

  • Bug fix (non-breaking change which fixes an issue)

Review

List of tasks the reviewer must do to review the PR

  • Tests

@nolar nolar added the bug Something isn't working label Dec 16, 2019
@nolar nolar requested a review from samurang87 as a code owner December 16, 2019 18:49
@zincr
Copy link
Copy Markdown

zincr bot commented Dec 16, 2019

🤖 zincr found 0 problems , 0 warnings

✅ Large Commits
✅ Approvals
✅ Specification
✅ Dependency Licensing

@zincr
Copy link
Copy Markdown

zincr bot commented Dec 16, 2019

🤖 zincr found 1 problem , 0 warnings

❌ Approvals
✅ Large Commits
✅ Specification
✅ Dependency Licensing

Details on how to resolve are provided below


Approvals

All proposed changes must be reviewed by project maintainers before they can be merged

Not enough people have approved this pull request - please ensure that 1 additional user, who have not contributed to this pull request approve the changes.

  • ✅ Approved by PR author @nolar
  • ❌ 1 additional approval needed
     

@pshchelo
Copy link
Copy Markdown
Contributor

Thanks a lot for a quick patch!
I will test it tomorrow on our actual use case and report back.

@nolar
Copy link
Copy Markdown
Contributor Author

nolar commented Dec 16, 2019

PS: Releasing it as 0.23.3 hotfix on top of 0.23.2 is not possible, as the CI/CD of that old state is broken due to new releases of the external components (#272 #269). So, this fix goes on top of the master — with all the changes that are already there — and will be released as 0.24.

Preview of 0.24:

image

Otherwise, it freezes at the real cluster (example 99) due to an empty
line is yielded, and cannot be parsed.
@pshchelo
Copy link
Copy Markdown
Contributor

@nolar seems to work fine with us! eagerly waiting for 0.24 release now ...

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants