-
Notifications
You must be signed in to change notification settings - Fork 1.4k
parser_json: use JSON as fallback parser instead of Yajl for performance #4813
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
|
we are happy if we could observe micro benchmark result. |
Micro benchmarkbenchmark coderequire 'bundler/inline'
gemfile do
source 'https://rubygems.org'
gem 'benchmark-ips'
gem 'oj'
gem 'json'
gem 'yajl-ruby', require: 'yajl'
end
Benchmark.ips do |x|
json = '{"a":"Alpha","b":true,"c":12345,"d":[true,[false,[-123456789,null],3.9676,["Something else.",false],null]],"e":{"zero":null,"one":1,"two":2,"three":[3],"four":[0,1,2,3,4]},"f":null,"h":{"a":{"b":{"c":{"d":{"e":{"f":{"g":null}}}}}}},"i":[[[[[[[null]]]]]]]}'
x.report('Oj.load') { Oj.load(json) }
x.report('JSON.load') { JSON.load(json) }
x.report('Yajl.load') { Yajl.load(json) }
x.compare!
endresult |
2078284 to
9b3887f
Compare
|
By the way, which json gem version did you tested with? |
I tested with json v2.9.1 (latest version), it is default version in Ruby 3.4.1 If we will use Ruby 3.4 or later in the next fluent-package LTS, hmmm, I think we don't need to specify the version... |
9968af9 to
bfc027e
Compare
|
NOTE: will not be backported to v1.16 |
Signed-off-by: Shizuo Fujita <[email protected]>
bfc027e to
aa67069
Compare
daipom
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks!!
Waiting CI.
|
I created separated PR in #4817 |
**Which issue(s) this PR fixes**: Fixes # **What this PR does / why we need it**: Ref. #4813 (comment) Fix wrong usage of `JSON.load` method. We should use `JSON.parse` instead. JSON.load performs unnecessary deserialisation. ``` irb(main):001> JSON.load('{ "json_class": "String", "raw": [72, 101, 108, 108, 111] }') => "Hello" irb(main):002> JSON.parse('{ "json_class": "String", "raw": [72, 101, 108, 108, 111] }') => {"json_class" => "String", "raw" => [72, 101, 108, 108, 111]} ``` **Docs Changes**: **Release Note**: Signed-off-by: Shizuo Fujita <[email protected]>
…rmance (#4835) **Which issue(s) this PR fixes**: Fixes # **What this PR does / why we need it**: Recently, Ruby's json has incredible performance improvements. It might be faster than oj gem. So, I think json is a suitable as fallback. This is similar with #4813 * Before * It spent 70.708872503 sec to handle 10 GB file * After * It spent 54.428017716 sec to handle 10 GB file * config ``` <source> @type tail path "#{File.expand_path('~/tmp/fluentd/access*.log')}" pos_file "#{File.expand_path('~/tmp/fluentd/access.log.pos')}" tag log read_from_head true <parse> @type none </parse> </source> <match **> @type file path "#{File.expand_path('~/tmp/fluentd/output/log')}" # use formatter_json format json </match> ``` **Docs Changes**: **Release Note**: --------- Signed-off-by: Shizuo Fujita <[email protected]>
|
Following this, can yajl-ruby be a development_dependency, so that it is not installed by default? |
|
The json gem cannot handle stream-based JSON using IO objects, so we still need |
Could you clarify what you mean by streaming here? Is it SAX style parsing ? Or JSONL ? Other ? I tried searching for YAJL use in that repo, but saw nothing like that. |
|
@byroot See in_forward.rb for example. The input is read from a socket in chunks. |
Ah, sorry. It is just IO stream. |
This seem to be the code in question: fluentd/lib/fluent/plugin/in_forward.rb Lines 250 to 257 in d63d193
But just looking at the code, I'm not 100% sure I get what the format it. Seems to be JSONL ? e.g. {"foo": 1}
{"foo": 2}Is that correct? |
|
For example, we parse a JSON from an IO stream in the following. fluentd/lib/fluent/plugin/parser_json.rb Lines 96 to 102 in d63d193
|
|
@Watson1978 yes, I saw that, but just the code doesn't really help me. What I'd like to see is the content of that |
|
@byroot We have simple test case for that in fluentd/test/plugin/test_parser_json.rb Lines 146 to 168 in d63d193
require 'yajl'
y = Yajl::Parser.new
y.on_parse_complete = ->(record){
puts "Finished parsing: #{record}"
}
rio, wio = IO.pipe
wio.write '{"message": "hello"}'
wio.close
y.parse(rio)
rio.close |
|
Hum, right, that that's a single JSON document. What I'm trying to understand is whether it's expected that the pipe may contain multiple documents one after the other. That's what I think I understand, but I'll try to play a bit more with YAJL, to see what it offer. |
Which issue(s) this PR fixes:
Fixes #
What this PR does / why we need it:
Recently, Ruby's json has incredible performance improvements.
It might be faster than oj gem.
So, I think json is a suitable as fallback.
This is similar with #4759
Here is easily benchmark. (I used same log file in #4759)
Before
After
config
FYI)
Docs Changes:
fluent/fluentd-docs-gitbook#560
Release Note: