The -f and -i format selection flags for Zed tools recognize both "json" and "ndjson". While the exact behavior of the "json" format is not easy to describe, "-f json" is roughly equivalent to "-f ndjson" together with "summarize collect(this) | yield this", while "-i json" is roughly equivalent to "-i ndjson" together with "over this". These Zed language constructs weren't available when the "json" format was originally implemented, but now that they are, its specialized behavior is unneeded.

Simplify by renaming the "ndjson" format to "json". More specifically:
1. Give "json" the current behavior of "ndjson" for -f, -i, and zio/anyio.
2. Remove "ndjson" as a format recognized by -f, -i, and zio/anyio.
3. Remove zio/ndjsonio.
4. Replace zio/jsonio.WriterOpts and its ForceArray option with zio/zjsonio.ArrayWriter (for application/json HTTP responses).
zio/jsonio.Reader is slow for a few reasons.
1. To normalize record field order, it unmarshals JSON to Go values and then marshals back to JSON (via encoding/json).
2. To unmarshal JSON to Zed values, it uses a zio/zsonio.Reader, which is not optimized.
3. Each call to its Read method calls zsonio.NewReader, which does a number of allocations.

Speed it up by replacing all of that with an optimized builder that minimizes the number of allocations (though encoding/json.Decoder.Token still does too many).

Note that unmarshaling via encoding/json can discard precision for numbers because it unmarshals them into a float64. Removing this step accounts for the output change in zio/jsonio/ztests/reader.yaml.
mccanne approved these changes on Feb 3, 2022
brim-bot added a commit to brimdata/brimcap that referenced this pull request on Feb 3, 2022
This is an auto-generated commit with a Zed dependency update. The Zed PR brimdata/super#3558, authored by @nwt, has been merged.

speed up zio/jsonio.Reader with a builder

zio/jsonio.Reader is slow for a few reasons.
1. To normalize record field order, it unmarshals JSON to Go values and then marshals back to JSON (via encoding/json).
2. To unmarshal JSON to Zed values, it uses a zio/zsonio.Reader, which is not optimized.
3. Each call to its Read method calls zsonio.NewReader, which does a number of allocations.

Speed it up by replacing all of that with an optimized builder that minimizes the number of allocations (though encoding/json.Decoder.Token still does too many).

Note that unmarshaling via encoding/json can discard precision for numbers because it unmarshals them into a float64. Removing this step accounts for the output change in zio/jsonio/ztests/reader.yaml.

Reading NDJSON with `-f json` is about three times faster for me with this branch compared to main.

Depends on brimdata/super#3555.
Reading NDJSON with `-i json` is about three times faster for me with this branch compared to main.

Depends on #3555.