Skip to content

Conversation

@Watson1978
Copy link
Contributor

@Watson1978 Watson1978 commented Jan 31, 2025

Which issue(s) this PR fixes:
Fixes #

What this PR does / why we need it:
Recently, Ruby's json has incredible performance improvements.
It might be faster than oj gem.
So, I think json is a suitable as fallback.

This is similar with #4759

Here is easily benchmark. (I used same log file in #4759)

  • Before

    • It spent 90.50963662 sec to handle 10 GB file
  • After

    • It spent 74.624230077 sec to handle 10 GB file
  • config

<source>
  @type tail
  path "#{File.expand_path '~/tmp/access*.log'}"
  pos_file "#{File.expand_path '~/tmp/fluentd/access.log.pos'}"
  tag log
  read_from_head true
  <parse>
    @type json
  </parse>
</source>

<match **>
  @type file
  path "#{File.expand_path '~/tmp/fluentd/log'}"
</match>

FYI)

Docs Changes:
fluent/fluentd-docs-gitbook#560

Release Note:

@kenhys
Copy link
Contributor

kenhys commented Jan 31, 2025

we are happy if we could observe micro benchmark result.

@Watson1978
Copy link
Contributor Author

Micro benchmark

benchmark code

require 'bundler/inline'
gemfile do
  source 'https://rubygems.org'
  gem 'benchmark-ips'
  gem 'oj'
  gem 'json'
  gem 'yajl-ruby', require: 'yajl'
end

Benchmark.ips do |x|
  json = '{"a":"Alpha","b":true,"c":12345,"d":[true,[false,[-123456789,null],3.9676,["Something else.",false],null]],"e":{"zero":null,"one":1,"two":2,"three":[3],"four":[0,1,2,3,4]},"f":null,"h":{"a":{"b":{"c":{"d":{"e":{"f":{"g":null}}}}}}},"i":[[[[[[[null]]]]]]]}'

  x.report('Oj.load') { Oj.load(json) }
  x.report('JSON.load') { JSON.load(json) }
  x.report('Yajl.load') { Yajl.load(json) }

  x.compare!
end

result

ruby 3.4.1 (2024-12-25 revision 48d4efcb85) +PRISM [x86_64-linux]
Warming up --------------------------------------
             Oj.load    31.383k i/100ms
           JSON.load    35.513k i/100ms
           Yajl.load    13.839k i/100ms
Calculating -------------------------------------
             Oj.load    319.187k (± 1.5%) i/s    (3.13 μs/i) -      1.601M in   5.015522s
           JSON.load    364.950k (± 0.5%) i/s    (2.74 μs/i) -      1.847M in   5.060214s
           Yajl.load    137.015k (± 2.5%) i/s    (7.30 μs/i) -    691.950k in   5.053679s

Comparison:
           JSON.load:   364950.5 i/s
             Oj.load:   319187.1 i/s - 1.14x  slower
           Yajl.load:   137015.2 i/s - 2.66x  slower

@kenhys
Copy link
Contributor

kenhys commented Jan 31, 2025

By the way, which json gem version did you tested with?
Do we need to use specific version of json gem?

@Watson1978
Copy link
Contributor Author

Watson1978 commented Jan 31, 2025

By the way, which json gem version did you tested with? Do we need to use specific version of json gem?

I tested with json v2.9.1 (latest version), it is default version in Ruby 3.4.1
https://github.com/ruby/ruby/blob/master/doc/NEWS/NEWS-3.4.0.md

If we will use Ruby 3.4 or later in the next fluent-package LTS, hmmm, I think we don't need to specify the version...

@Watson1978 Watson1978 force-pushed the parser_json branch 2 times, most recently from 9968af9 to bfc027e Compare February 3, 2025 00:57
@kenhys
Copy link
Contributor

kenhys commented Feb 3, 2025

NOTE: will not be backported to v1.16

Copy link
Contributor

@daipom daipom left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks!!
Waiting CI.

@daipom daipom added this to the v1.19.0 milestone Feb 3, 2025
@daipom daipom merged commit 1d5b0de into fluent:master Feb 3, 2025
10 checks passed
@Watson1978 Watson1978 deleted the parser_json branch February 3, 2025 08:25
@Watson1978
Copy link
Contributor Author

I created separated PR in #4817

daipom pushed a commit that referenced this pull request Feb 4, 2025
**Which issue(s) this PR fixes**: 
Fixes #

**What this PR does / why we need it**: 
Ref. #4813 (comment)
Fix wrong usage of `JSON.load` method.
We should use `JSON.parse` instead.

JSON.load performs unnecessary deserialisation.

```
irb(main):001> JSON.load('{ "json_class": "String", "raw": [72, 101, 108, 108, 111] }')
=> "Hello"
irb(main):002> JSON.parse('{ "json_class": "String", "raw": [72, 101, 108, 108, 111] }')
=> {"json_class" => "String", "raw" => [72, 101, 108, 108, 111]}
```


**Docs Changes**:

**Release Note**:

Signed-off-by: Shizuo Fujita <[email protected]>
daipom pushed a commit that referenced this pull request Feb 21, 2025
…rmance (#4835)

**Which issue(s) this PR fixes**: 
Fixes #

**What this PR does / why we need it**: 
Recently, Ruby's json has incredible performance improvements.
It might be faster than oj gem.
So, I think json is a suitable as fallback.

This is similar with #4813

* Before
  * It spent 70.708872503 sec to handle 10 GB file 
* After
  * It spent 54.428017716 sec to handle 10 GB file

* config
```
<source>
  @type tail
  path "#{File.expand_path('~/tmp/fluentd/access*.log')}"
  pos_file "#{File.expand_path('~/tmp/fluentd/access.log.pos')}"
  tag log
  read_from_head true
  <parse>
    @type none
  </parse>
</source>

<match **>
  @type file
  path "#{File.expand_path('~/tmp/fluentd/output/log')}"

  # use formatter_json
  format json
</match>
```


**Docs Changes**:

**Release Note**:

---------

Signed-off-by: Shizuo Fujita <[email protected]>
@orgads
Copy link
Contributor

orgads commented Sep 10, 2025

Following this, can yajl-ruby be a development_dependency, so that it is not installed by default?

@Watson1978
Copy link
Contributor Author

The json gem cannot handle stream-based JSON using IO objects, so we still need yajl.

@byroot
Copy link

byroot commented Sep 11, 2025

The json gem cannot handle stream-based JSON using IO objects

Could you clarify what you mean by streaming here? Is it SAX style parsing ? Or JSONL ? Other ?

I tried searching for YAJL use in that repo, but saw nothing like that.

@orgads
Copy link
Contributor

orgads commented Sep 11, 2025

@byroot See in_forward.rb for example. The input is read from a socket in chunks.

@Watson1978
Copy link
Contributor Author

Could you clarify what you mean by streaming here? Is it SAX style parsing ? Or JSONL ? Other ?

Ah, sorry. It is just IO stream.

@byroot
Copy link

byroot commented Sep 11, 2025

Ah, sorry. It is just IO stream.

This seem to be the code in question:

if first == '{' || first == '[' # json
parser = Yajl::Parser.new
parser.on_parse_complete = ->(obj){
block.call(obj, bytes, serializer)
bytes = 0
}
serializer = :to_json.to_proc
feeder = ->(d){ parser << d }

But just looking at the code, I'm not 100% sure I get what the format it. Seems to be JSONL ? e.g.

{"foo": 1}
{"foo": 2}

Is that correct?

@Watson1978
Copy link
Contributor Author

For example, we parse a JSON from an IO stream in the following.

def parse_io(io, &block)
y = Yajl::Parser.new
y.on_parse_complete = ->(record){
block.call(parse_time(record), record)
}
y.parse(io, @stream_buffer_size)
end

@byroot
Copy link

byroot commented Sep 11, 2025

@Watson1978 yes, I saw that, but just the code doesn't really help me. What I'd like to see is the content of that io. Is there a test case perhaps?

@Watson1978
Copy link
Contributor Author

Watson1978 commented Sep 11, 2025

@byroot We have simple test case for that in

def test_yajl_parse_io_with_buffer_smaller_than_input
parser = Fluent::Test::Driver::Parser.new(Fluent::Plugin::JSONParser)
parser.configure(
'keep_time_key' => 'true',
'json_parser' => 'yajl',
'stream_buffer_size' => 1,
)
text = "100"
waiting(5) do
rd, wr = IO.pipe
wr.write "{\"time\":\"#{text}\"}"
parser.instance.parse_io(rd) do |time, record|
assert_equal text.to_i, time.sec
assert_equal text, record['time']
# Once a record has been received the 'write' end of the pipe must be
# closed, otherwise the test will block waiting for more input.
wr.close
end
end
end

require 'yajl'

y = Yajl::Parser.new
y.on_parse_complete = ->(record){
  puts "Finished parsing: #{record}"
}

rio, wio = IO.pipe
wio.write '{"message": "hello"}'
wio.close

y.parse(rio)
rio.close

@byroot
Copy link

byroot commented Sep 11, 2025

Hum, right, that that's a single JSON document. What I'm trying to understand is whether it's expected that the pipe may contain multiple documents one after the other. That's what I think I understand, but I'll try to play a bit more with YAJL, to see what it offer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants