speed up zio/jsonio.Reader with a builder #3558

Merged: nwt merged 8 commits into main from jsonio-builder on Feb 3, 2022

@nwt commented Feb 3, 2022

zio/jsonio.Reader is slow for a few reasons.

  1. To normalize record field order, it unmarshals JSON to Go values and
    then marshals back to JSON (via encoding/json).

  2. To unmarshal JSON to Zed values, it uses a zio/zsonio.Reader, which
    is not optimized.

  3. Each call to its Read method calls zsonio.NewReader, which does a
    number of allocations.
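
For illustration, the round trip in item 1 can be sketched with encoding/json alone. This is a minimal sketch, not the code removed by this PR; the function name normalize is hypothetical. It relies on the documented behavior that json.Marshal writes map keys in sorted order.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// normalize is a hypothetical helper, not part of zio/jsonio. It round-trips
// a JSON document through generic Go values. Objects become
// map[string]interface{}, and json.Marshal emits map keys in sorted order,
// so the result has a canonical field order.
func normalize(in []byte) ([]byte, error) {
	var v interface{}
	if err := json.Unmarshal(in, &v); err != nil {
		return nil, err
	}
	return json.Marshal(v)
}

func main() {
	out, err := normalize([]byte(`{"b":1,"a":{"d":true,"c":null}}`))
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out)) // prints {"a":{"c":null,"d":true},"b":1}
}
```

Every value is fully decoded and then fully re-encoded before any Zed parsing starts, which is the extra work the description refers to.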

Speed it up by replacing all of that with an optimized builder that
minimizes the number of allocations (though encoding/json.Decoder.Token
still does too many).
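
As a rough sketch of the token-driven approach, assuming nothing about the actual builder in this branch: one json.Decoder is created once, each Read walks the tokens of a single top-level value, and the output slice is reused across calls. The names tokenReader, newTokenReader, and readAll are hypothetical, and the string "leaves" stand in for real Zed values.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"strconv"
	"strings"
)

// tokenReader is a hypothetical type for illustration only.
type tokenReader struct {
	dec    *json.Decoder // created once, not once per Read
	leaves []string      // backing array reused across Read calls
}

func newTokenReader(r io.Reader) *tokenReader {
	dec := json.NewDecoder(r)
	dec.UseNumber() // keep numbers as their original text
	return &tokenReader{dec: dec}
}

// Read consumes one top-level JSON value and returns its tokens (object keys
// included) as tagged strings. The slice is only valid until the next Read.
func (t *tokenReader) Read() ([]string, error) {
	t.leaves = t.leaves[:0]
	depth := 0
	for {
		tok, err := t.dec.Token()
		if err != nil {
			return nil, err
		}
		switch v := tok.(type) {
		case json.Delim:
			if v == '{' || v == '[' {
				depth++
			} else {
				depth--
			}
		case json.Number:
			t.leaves = append(t.leaves, "number:"+v.String())
		case string:
			t.leaves = append(t.leaves, "string:"+v)
		case bool:
			t.leaves = append(t.leaves, "bool:"+strconv.FormatBool(v))
		case nil:
			t.leaves = append(t.leaves, "null")
		}
		if depth == 0 {
			return t.leaves, nil
		}
	}
}

// readAll drains the input, copying each result because Read reuses its slice.
func readAll(input string) ([][]string, error) {
	r := newTokenReader(strings.NewReader(input))
	var all [][]string
	for {
		leaves, err := r.Read()
		if err == io.EOF {
			return all, nil
		}
		if err != nil {
			return nil, err
		}
		all = append(all, append([]string(nil), leaves...))
	}
}

func main() {
	vals, err := readAll("{\"a\":1,\"b\":[true,null]}\n\"x\"\n")
	if err != nil {
		panic(err)
	}
	for _, v := range vals {
		fmt.Println(v)
	}
}
```

Decoder.Token still boxes each token in an interface value, which is consistent with the remark that it allocates more than one would like.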

Note that unmarshaling via encoding/json can discard precision for
numbers because it unmarshals them into a float64. Removing this step
accounts for the output change in zio/jsonio/ztests/reader.yaml.
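
The precision loss is easy to reproduce outside Zed. In this sketch (function names are hypothetical), the same document is round-tripped twice: once through the default float64 decoding, and once with Decoder.UseNumber, which keeps the number as text. UseNumber is used here only to show the contrast; it is not a claim about how the new builder handles numbers.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// viaFloat64 round-trips JSON through interface{}, where encoding/json
// decodes every number into a float64.
func viaFloat64(in string) (string, error) {
	var v interface{}
	if err := json.Unmarshal([]byte(in), &v); err != nil {
		return "", err
	}
	out, err := json.Marshal(v)
	return string(out), err
}

// viaNumber does the same round trip but keeps numbers as json.Number,
// which preserves the original digits.
func viaNumber(in string) (string, error) {
	dec := json.NewDecoder(strings.NewReader(in))
	dec.UseNumber()
	var v interface{}
	if err := dec.Decode(&v); err != nil {
		return "", err
	}
	out, err := json.Marshal(v)
	return string(out), err
}

func main() {
	// 2^53 + 1 is not representable as a float64.
	in := `{"n":9007199254740993}`
	a, err := viaFloat64(in)
	if err != nil {
		panic(err)
	}
	b, err := viaNumber(in)
	if err != nil {
		panic(err)
	}
	fmt.Println("float64:    ", a, "changed:", a != in)
	fmt.Println("json.Number:", b, "changed:", b != in)
}
```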

Reading NDJSON with -f json -i json is about three times faster for me with this branch compared to main.

Depends on #3555.

nwt added 5 commits February 2, 2022 17:27
The -f and -i format selection flags for Zed tools recognize both "json"
and "ndjson".  While the exact behavior for the "json" format is not
easy to describe, "-f json" is roughly equivalent to "-f ndjson"
together with "summarize collect(this) | yield this" while "-i json" is
roughly equivalent to "-i ndjson" together with "over this".  These Zed
language constructs weren't available when the "json" format was originally
implemented, but now that they are, its specialized behavior is unneeded.

Simplify by renaming the "ndjson" format to "json".  More specifically,

1. Give "json" the current behavior of "ndjson" for -f, -i, and
   zio/anyio.

2. Remove "ndjson" as a format recognized by -f, -i, and zio/anyio.

3. Remove zio/ndjsonio.

4. Replace zio/jsonio.WriterOpts and its ForceArray option with
   zio/zjsonio.ArrayWriter (for application/json HTTP responses).
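
To make item 4 concrete, here is a minimal sketch of the general idea of a writer that emits all values as one JSON array, as an application/json response body would need. The arrayWriter type and encodeArray helper are hypothetical stand-ins; this is not the implementation or API of zio/zjsonio.ArrayWriter.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"io"
)

// arrayWriter is a hypothetical type, not zio/zjsonio.ArrayWriter. It
// streams values out as elements of a single JSON array.
type arrayWriter struct {
	w     io.Writer
	wrote bool // true once the opening bracket has been written
}

// Write marshals v and appends it to the array.
func (a *arrayWriter) Write(v interface{}) error {
	b, err := json.Marshal(v)
	if err != nil {
		return err
	}
	sep := ","
	if !a.wrote {
		sep = "["
		a.wrote = true
	}
	_, err = io.WriteString(a.w, sep+string(b))
	return err
}

// Close terminates the array, writing "[]" if nothing was written.
func (a *arrayWriter) Close() error {
	end := "]"
	if !a.wrote {
		end = "[]"
	}
	_, err := io.WriteString(a.w, end)
	return err
}

// encodeArray is a small helper that runs the writer over vals.
func encodeArray(vals ...interface{}) (string, error) {
	var buf bytes.Buffer
	w := &arrayWriter{w: &buf}
	for _, v := range vals {
		if err := w.Write(v); err != nil {
			return "", err
		}
	}
	if err := w.Close(); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	s, err := encodeArray(1, "a", map[string]bool{"ok": true})
	if err != nil {
		panic(err)
	}
	fmt.Println(s) // prints [1,"a",{"ok":true}]
}
```
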
@nwt requested a review from a team February 3, 2022 17:38
Base automatically changed from rename-ndjson-to-json to main February 3, 2022 20:40
@nwt merged commit f67f3ab into main Feb 3, 2022
@nwt deleted the jsonio-builder branch February 3, 2022 22:12
brim-bot added a commit to brimdata/brimcap that referenced this pull request Feb 3, 2022
This is an auto-generated commit with a Zed dependency update. The Zed PR
brimdata/super#3558, authored by @nwt,
has been merged.

speed up zio/jsonio.Reader with a builder

zio/jsonio.Reader is slow for a few reasons.

1. To normalize record field order, it unmarshals JSON to Go values and
   then marshals back to JSON (via encoding/json).

2. To unmarshal JSON to Zed values, it uses a zio/zsonio.Reader, which
   is not optimized.

3. Each call to its Read method calls zsonio.NewReader, which does a
   number of allocations.

Speed it up by replacing all of that with an optimized builder that
minimizes the number of allocations (though encoding/json.Decoder.Token
still does too many).

Note that unmarshaling via encoding/json can discard precision for
numbers because it unmarshals them into a float64.  Removing this step
accounts for the output change in zio/jsonio/ztests/reader.yaml.

Reading NDJSON with `-f json` is about three times faster for me with this branch compared to main.

Depends on brimdata/super#3555.