Description
Describe the bug
in_exec cannot handle non-ASCII characters in command output.
This is caused by the default encoding settings of child_process_execute:

- external_encoding: ascii-8bit
- internal_encoding: utf-8
- encoding_options: invalid: :replace, undef: :replace

With these settings, non-ASCII characters are always broken.
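A minimal sketch of what this conversion does to a single string. It uses String#encode with the same options instead of reading from a child process's IO, which is an assumption made only to keep the example self-contained. Bytes labelled ASCII-8BIT have no defined mapping to UTF-8 once they are 0x80 or above, so `undef: :replace` turns each such byte into U+FFFD.

```ruby
# Raw UTF-8 bytes of "こんにちは\n", labelled as binary (ASCII-8BIT),
# which is how external_encoding: 'ascii-8bit' treats the child's output.
raw = "こんにちは\n".b

# Convert to UTF-8 with the same options used when scrub: true.
converted = raw.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)

puts converted           # 15 replacement characters (one per byte) and a newline
puts converted.encoding  # UTF-8

# Pure ASCII output is unaffected, which is why the problem only shows up
# with non-ASCII text.
ascii = "hello\n".b.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
puts ascii               # hello
```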
fluentd/lib/fluent/plugin_helper/child_process.rb
Lines 65 to 72 in 1a2759c
```ruby
def child_process_execute(
  title, command,
  arguments: nil, subprocess_name: nil, interval: nil, immediate: false, parallel: false,
  mode: [:read, :write], stderr: :discard, env: {}, unsetenv: false, chdir: nil,
  internal_encoding: 'utf-8', external_encoding: 'ascii-8bit', scrub: true, replace_string: nil,
  wait_timeout: nil, on_exit_callback: nil,
  &block
)
```
fluentd/lib/fluent/plugin_helper/child_process.rb
Lines 247 to 253 in 1a2759c
```ruby
encoding_options = {}
if scrub
  encoding_options[:invalid] = encoding_options[:undef] = :replace
  if replace_string
    encoding_options[:replace] = replace_string
  end
end
readio.set_encoding(external_encoding, internal_encoding, **encoding_options)
```
We can easily confirm this IO behavior with irb:
```
irb(main):001:0> require "open3"
=> true
irb(main):002:0> w_io, r_io, thread = Open3.popen2("echo こんにちは")
=> [#<IO:fd 6>, #<IO:fd 7>, #<Process::Waiter:0x00007f7d942fea40 run>]
irb(main):003:0> r_io.read
=> "こんにちは\n"
irb(main):004:0> w_io, r_io, thread = Open3.popen2("echo こんにちは")
=> [#<IO:fd 8>, #<IO:fd 9>, #<Process::Waiter:0x00007f7d942d45b0 run>]
irb(main):005:0> r_io.set_encoding(Encoding::ASCII_8BIT, Encoding::UTF_8, invalid: :replace, undef: :replace)
=> #<IO:fd 9>
irb(main):006:0> r_io.read
=> "���������������\n"
irb(main):007:0>
```

I'm wondering if we should fix the implementation of in_exec as follows:
```diff
diff --git a/lib/fluent/plugin/in_exec.rb b/lib/fluent/plugin/in_exec.rb
index c2851366..ab514957 100644
--- a/lib/fluent/plugin/in_exec.rb
+++ b/lib/fluent/plugin/in_exec.rb
@@ -74,9 +74,9 @@ module Fluent::Plugin
       super
       if @run_interval
-        child_process_execute(:exec_input, @command, interval: @run_interval, mode: [@connect_mode], &method(:run))
+        child_process_execute(:exec_input, @command, interval: @run_interval, mode: [@connect_mode], internal_encoding: nil, &method(:run))
       else
-        child_process_execute(:exec_input, @command, immediate: true, mode: [@connect_mode], &method(:run))
+        child_process_execute(:exec_input, @command, immediate: true, mode: [@connect_mode], internal_encoding: nil, &method(:run))
       end
     end
```

By specifying internal_encoding: nil, we can stop the automatic encoding conversion in child_process_execute.
This allows in_exec to handle non-ASCII characters.
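To illustrate the effect of internal_encoding: nil, here is a sketch that again uses an IO.pipe in place of the child process (an assumption). Without an internal encoding, no transcoding happens: the bytes arrive intact and are only labelled ASCII-8BIT. The final step assumes the real output encoding is UTF-8, which is an assumption about the command, not something in_exec knows.

```ruby
readio, writeio = IO.pipe
writeio.binmode
writeio.write("こんにちは\n".b)
writeio.close

# Proposed setting: internal encoding nil, so no transcoding takes place.
# The scrub options are still passed, but there is no conversion for them to act on.
readio.set_encoding('ascii-8bit', nil, invalid: :replace, undef: :replace)
output = readio.read
readio.close

binary = output.encoding == Encoding::ASCII_8BIT
puts binary  # true: the bytes are untouched, only labelled as binary

# Code that knows the real encoding (assumed UTF-8 here) can relabel the bytes.
text = output.force_encoding(Encoding::UTF_8)
puts text.valid_encoding?  # true
puts text                  # こんにちは
```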
Does the current automatic encoding conversion make any sense?
One possible reason for it could be that the data is expected to be in UTF-8.
Even if so, I believe it is wrong for in_exec to always convert the command output to UTF-8 regardless of the output's actual encoding.
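As a sketch of why the actual encoding matters, assume a hypothetical command that writes Shift_JIS (for example, a program on a Japanese Windows console). The Shift_JIS bytes are produced here with String#encode to keep the example self-contained. Treating them as ASCII-8BIT and converting to UTF-8 destroys them, while a conversion that knows the source encoding recovers the text.

```ruby
# Hypothetical command output in Shift_JIS, handled as raw bytes.
sjis_bytes = "こんにちは".encode(Encoding::Shift_JIS).b

# What the current defaults do: treat the bytes as ASCII-8BIT, convert to UTF-8.
forced = sjis_bytes.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)

# A conversion that uses the real source encoding.
correct = sjis_bytes.dup.force_encoding(Encoding::Shift_JIS).encode(Encoding::UTF_8)

puts sjis_bytes.bytesize  # 10: two bytes per character, all >= 0x80
puts forced               # 10 replacement characters
puts correct              # こんにちは
```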
To Reproduce
Run the following sample config.
Expected behavior
in_exec should handle non-ASCII characters in command output as well.
Your Environment
- Fluentd version: 1.16.5
- Operating system: Ubuntu 20.04.6 LTS, Windows 10
- Kernel version: 5.15.0-101-generic

Your Configuration
```
<source>
  @type exec
  command "echo こんにちは"
  tag test
  <parse>
    @type none
  </parse>
</source>
<match test>
  @type stdout
</match>
```

Your Error Log
(There is no error, but the stdout output is shown here.)
```
2024-04-03 16:51:59 +0900 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-04-03 16:51:59 +0900 [info]: parsing config file is succeeded path="/test/fluentd/config/in_exec/1.conf"
2024-04-03 16:51:59 +0900 [info]: gem 'fluentd' version '1.16.5'
2024-04-03 16:51:59 +0900 [info]: using configuration file: <ROOT>
  <source>
    @type exec
    command "echo こんにちは"
    tag "test"
    <parse>
      @type "none"
    </parse>
  </source>
  <match test>
    @type stdout
  </match>
</ROOT>
2024-04-03 16:51:59 +0900 [info]: starting fluentd-1.16.5 pid=439655 ruby="3.2.2"
2024-04-03 16:51:59 +0900 [info]: spawn command to main: cmdline=["/home/daipom/.rbenv/versions/3.2.2/bin/ruby", "-r/home/daipom/.rbenv/versions/3.2.2/lib/ruby/site_ruby/3.2.0/bundler/setup", "-Eascii-8bit:ascii-8bit", "/home/daipom/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/bin/fluentd", "-c", "/test/fluentd/config/in_exec/1.conf", "--under-supervisor"]
2024-04-03 16:51:59 +0900 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-04-03 16:51:59 +0900 [info]: adding match pattern="test" type="stdout"
2024-04-03 16:51:59 +0900 [info]: adding source type="exec"
2024-04-03 16:51:59 +0900 [info]: #0 starting fluentd worker pid=439675 ppid=439655 worker=0
2024-04-03 16:51:59 +0900 [info]: #0 fluentd worker is now running worker=0
2024-04-03 16:51:59.808444702 +0900 test: {"message":"���������������"}
```

Additional context
No response