in_exec: Can't handle non-ASCII characters output #4460

@daipom

Description

Describe the bug

in_exec cannot handle command output that contains non-ASCII characters.

This is caused by the default arguments of child_process_execute:

  • external_encoding: ascii-8bit
  • internal_encoding: utf-8
  • encoding_options: invalid: :replace, undef: :replace

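This always breaks non-ASCII characters: bytes read as ascii-8bit (binary) that fall outside the ASCII range have no defined mapping to utf-8, so each one is replaced.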

def child_process_execute(
  title, command,
  arguments: nil, subprocess_name: nil, interval: nil, immediate: false, parallel: false,
  mode: [:read, :write], stderr: :discard, env: {}, unsetenv: false, chdir: nil,
  internal_encoding: 'utf-8', external_encoding: 'ascii-8bit', scrub: true, replace_string: nil,
  wait_timeout: nil, on_exit_callback: nil,
  &block
)

  encoding_options = {}
  if scrub
    encoding_options[:invalid] = encoding_options[:undef] = :replace
    if replace_string
      encoding_options[:replace] = replace_string
    end
  end

  readio.set_encoding(external_encoding, internal_encoding, **encoding_options)

We can easily confirm this IO behavior in irb:

irb(main):001:0> require "open3"
=> true
irb(main):002:0> w_io, r_io, thread = Open3.popen2("echo こんにちは")
=> [#<IO:fd 6>, #<IO:fd 7>, #<Process::Waiter:0x00007f7d942fea40 run>]
irb(main):003:0> r_io.read
=> "こんにちは\n"
irb(main):004:0> w_io, r_io, thread = Open3.popen2("echo こんにちは")
=> [#<IO:fd 8>, #<IO:fd 9>, #<Process::Waiter:0x00007f7d942d45b0 run>]
irb(main):005:0> r_io.set_encoding(Encoding::ASCII_8BIT, Encoding::UTF_8, invalid: :replace, undef: :replace)
=> #<IO:fd 9>
irb(main):006:0> r_io.read
=> "���������������\n"
irb(main):007:0> 
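The replacement characters in the output above line up one-to-one with the bytes of the original string. Here is a minimal String-level sketch of the same effect, assuming String#encode applies the same conversion rules as the IO transcoding in the irb session (the variable names are illustrative only):

```ruby
# Sketch: transcode binary-tagged bytes to UTF-8 with replacement enabled.
text = "こんにちは"   # 5 characters, 3 bytes each in UTF-8
raw  = text.b         # same bytes, but tagged ASCII-8BIT (binary)

# Bytes >= 0x80 have no mapping from ASCII-8BIT to UTF-8, so with
# undef: :replace every such byte becomes U+FFFD instead of raising.
converted = raw.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)

puts raw.bytesize                # number of bytes the command produced
puts converted.count("\uFFFD")   # number of replacement characters

# Pure ASCII bytes survive the same conversion unchanged.
ascii = "abc".b.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
puts ascii
```

Because each non-ASCII byte is replaced individually, the original characters cannot be recovered after the conversion.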

I'm wondering if we should fix the implementation of in_exec as follows:

diff --git a/lib/fluent/plugin/in_exec.rb b/lib/fluent/plugin/in_exec.rb
index c2851366..ab514957 100644
--- a/lib/fluent/plugin/in_exec.rb
+++ b/lib/fluent/plugin/in_exec.rb
@@ -74,9 +74,9 @@ module Fluent::Plugin
       super

       if @run_interval
-        child_process_execute(:exec_input, @command, interval: @run_interval, mode: [@connect_mode], &method(:run))
+        child_process_execute(:exec_input, @command, interval: @run_interval, mode: [@connect_mode], internal_encoding: nil, &method(:run))
       else
-        child_process_execute(:exec_input, @command, immediate: true, mode: [@connect_mode], &method(:run))
+        child_process_execute(:exec_input, @command, immediate: true, mode: [@connect_mode], internal_encoding: nil, &method(:run))
       end
     end

By specifying internal_encoding: nil, we can stop the automatic encoding conversion in child_process_execute.
This allows in_exec to handle non-ASCII characters.
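The effect of passing nil as the internal encoding can be sketched outside Fluentd with a plain pipe. This is only an illustration of IO#set_encoding, not the plugin code itself. It assumes the process runs without a default internal encoding that would force conversion, and the pipe stands in for the command's stdout:

```ruby
# Sketch: read UTF-8 bytes through a pipe with no internal encoding set.
r_io, w_io = IO.pipe
w_io.write("こんにちは\n")
w_io.close

# external: binary, internal: nil => the bytes are not transcoded on read.
r_io.set_encoding("ascii-8bit", nil, invalid: :replace, undef: :replace)
data = r_io.read
r_io.close

p data.encoding                        # the string is tagged as binary
p data.bytes == "こんにちは\n".bytes   # the bytes arrive intact

# A consumer that knows the bytes are UTF-8 can relabel them without loss.
utf8 = data.dup.force_encoding(Encoding::UTF_8)
p utf8.valid_encoding?
puts utf8
```

With no internal encoding there is nothing to transcode to, so the replace options have no bytes to act on.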

Does the current automatic encoding conversion serve any purpose?
One possible reason is that the data is required to be utf-8.
Even so, I believe it is wrong for in_exec to always convert the command output from its actual encoding to utf-8.
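If downstream code does need valid utf-8, relabeling the bytes and scrubbing only the invalid sequences is one possible alternative to transcoding from binary. The helper below is hypothetical and not part of Fluentd. It is a sketch that assumes the command really emits UTF-8:

```ruby
# Hypothetical helper (not a Fluentd API): treat raw bytes as UTF-8 and
# replace only the byte sequences that are not valid UTF-8.
def to_utf8_lossy(raw, replacement = "?")
  raw.dup.force_encoding(Encoding::UTF_8).scrub(replacement)
end

valid_bytes  = "こんにちは".b   # valid UTF-8 bytes, tagged as binary
broken_bytes = "ok\xFF".b       # contains one byte that is invalid in UTF-8

kept     = to_utf8_lossy(valid_bytes)
repaired = to_utf8_lossy(broken_bytes)

puts kept       # valid characters are preserved
puts repaired   # only the invalid byte is replaced
```

Unlike the binary-to-utf-8 conversion, this keeps every valid multi-byte character and replaces only the broken input.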

To Reproduce

Run Fluentd with the sample config shown under "Your Configuration" below.

Expected behavior

in_exec can handle non-ASCII characters output as well.

Your Environment

- Fluentd version: 1.16.5
- Operating system: Ubuntu 20.04.6 LTS, Windows 10
- Kernel version: 5.15.0-101-generic

Your Configuration

<source>
  @type exec
  command "echo こんにちは"
  tag test
  <parse>
    @type none
  </parse>
</source>

<match test>
  @type stdout
</match>

Your Error Log

(No error, but I put the stdout output here.)

2024-04-03 16:51:59 +0900 [info]: init supervisor logger path=nil rotate_age=nil rotate_size=nil
2024-04-03 16:51:59 +0900 [info]: parsing config file is succeeded path="/test/fluentd/config/in_exec/1.conf"
2024-04-03 16:51:59 +0900 [info]: gem 'fluentd' version '1.16.5'
2024-04-03 16:51:59 +0900 [info]: using configuration file: <ROOT>
  <source>
    @type exec
    command "echo こんにちは"
    tag "test"
    <parse>
      @type "none"
    </parse>
  </source>
  <match test>
    @type stdout
  </match>
</ROOT>
2024-04-03 16:51:59 +0900 [info]: starting fluentd-1.16.5 pid=439655 ruby="3.2.2"
2024-04-03 16:51:59 +0900 [info]: spawn command to main:  cmdline=["/home/daipom/.rbenv/versions/3.2.2/bin/ruby", "-r/home/daipom/.rbenv/versions/3.2.2/lib/ruby/site_ruby/3.2.0/bundler/setup", "-Eascii-8bit:ascii-8bit", "/home/daipom/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/bin/fluentd", "-c", "/test/fluentd/config/in_exec/1.conf", "--under-supervisor"]
2024-04-03 16:51:59 +0900 [info]: #0 init worker0 logger path=nil rotate_age=nil rotate_size=nil
2024-04-03 16:51:59 +0900 [info]: adding match pattern="test" type="stdout"
2024-04-03 16:51:59 +0900 [info]: adding source type="exec"
2024-04-03 16:51:59 +0900 [info]: #0 starting fluentd worker pid=439675 ppid=439655 worker=0
2024-04-03 16:51:59 +0900 [info]: #0 fluentd worker is now running worker=0
2024-04-03 16:51:59.808444702 +0900 test: {"message":"���������������"}

Additional context

No response
