IO::Read is changing encoding from ASCII-8BIT(binary) to UTF-8 #4986

@nightsurge

Description

Environment

Provide at least:

  • JRuby version 9.1.15.0
  • JRUBY_OPTS='-J-Xmx2048m -J-Xmn512m --server'
  • macOS 10.12.6

Other relevant info you may wish to add:

  • HTTP.rb (http gem v 3.0.0)
  • Rails 4.1

Expected Behavior

  • When I read from a file with IO#read into a binary buffer, I expect the buffer's encoding to stay binary (ASCII-8BIT).
require "stringio"

# Provides IO interface
class IOTest
  # IO object
  def initialize(io)
    @buffer = String.new
    puts "@buffer: #{@buffer.encoding} in initialize\n\n"
    @io = if io.is_a?(String)
            StringIO.new(io)
          elsif io.respond_to?(:read)
            io
          else
            raise ArgumentError,
              "#{io.inspect} is neither a String nor an IO object"
          end
  end

  # @param [Integer] length Number of bytes to retrieve
  # @param [String] outbuf String to be replaced with retrieved data
  #
  # @return [String, nil]
  def read(length = nil, outbuf = nil)
    outbuf = outbuf.to_s.clear

    puts "Outbuf: #{outbuf.encoding} setup"
    puts "@buffer: #{@buffer.encoding} setup"
    @io.read(length, @buffer)

    puts "Outbuf: #{outbuf.encoding} after read"
    puts "@buffer: #{@buffer.encoding} after read"
    outbuf << @buffer

    puts "Outbuf: #{outbuf.encoding} after outbuf append"
    puts "@buffer: #{@buffer.encoding} after outbuf append"

    # Bytes still wanted after this read (no-op when length is nil)
    length -= @buffer.length if length

    outbuf unless length && outbuf.empty?
  end

  def read_force_outbuf(length = nil, outbuf = nil)
    outbuf = outbuf.to_s.clear
    outbuf.force_encoding(Encoding::BINARY)

    puts "Outbuf: #{outbuf.encoding} setup"
    puts "@buffer: #{@buffer.encoding} setup"
    @io.read(length, @buffer)

    puts "Outbuf: #{outbuf.encoding} after read"
    puts "@buffer: #{@buffer.encoding} after read"
    outbuf << @buffer

    puts "Outbuf: #{outbuf.encoding} after outbuf append"
    puts "@buffer: #{@buffer.encoding} after outbuf append"

    # Bytes still wanted after this read (no-op when length is nil)
    length -= @buffer.length if length

    outbuf unless length && outbuf.empty?
  end

  def read_force_both(length = nil, outbuf = nil)
    outbuf = outbuf.to_s.clear
    outbuf.force_encoding(Encoding::BINARY)

    puts "Outbuf: #{outbuf.encoding} setup"
    puts "@buffer: #{@buffer.encoding} setup"
    @io.read(length, @buffer)
    # Retag the freshly read data as binary before it is appended below
    @buffer.force_encoding(Encoding::BINARY)

    puts "Outbuf: #{outbuf.encoding} after read"
    puts "@buffer: #{@buffer.encoding} after read"
    outbuf << @buffer

    puts "Outbuf: #{outbuf.encoding} after outbuf append"
    puts "@buffer: #{@buffer.encoding} after outbuf append"

    # Bytes still wanted after this read (no-op when length is nil)
    length -= @buffer.length if length

    outbuf unless length && outbuf.empty?
  end
end

the_test = IOTest.new(File.new('/some_image_file.jpg'))
puts "**************** read without force_encoding"
the_test.read
puts "**************** END read without force_encoding\n\n"


the_test_2 = IOTest.new(File.new('/some_image_file.jpg'))
puts "**************** read with outbuf force_encoding"
the_test_2.read_force_outbuf
puts "**************** END read with outbuf force_encoding\n\n"

the_test_3 = IOTest.new(File.new('/some_image_file.jpg'))
puts "**************** read with both outbuf/@buffer force_encoding"
the_test_3.read_force_both
puts "**************** END read with both outbuf/@buffer force_encoding"
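
One thing that may be worth comparing against the reproduction above: the files there are opened with File.new and no mode, i.e. in text mode. The following is a minimal sketch, assuming MRI-like IO semantics, that prints the encoding produced by a few different ways of reading the same bytes. The temp file, the JPEG_BYTES constant and the encodings_by_mode helper are made up for illustration and are not part of the original report; whether JRuby 9.1.15.0 gives the same results is exactly what would need checking.

```ruby
require "tempfile"

# Hypothetical stand-in for the image file: bytes that are not valid UTF-8.
JPEG_BYTES = [0xFF, 0xD8, 0xFF, 0xE0].pack("C*")

# Writes the bytes to a temp file and returns the encoding reported by
# several read strategies, keyed by a descriptive name.
def encodings_by_mode(bytes)
  Tempfile.create("io_encoding_sketch") do |tmp|
    tmp.binmode
    tmp.write(bytes)
    tmp.flush
    path = tmp.path

    {
      # Binary-oriented reads
      binread: File.binread(path).encoding,
      rb_read: File.open(path, "rb") { |f| f.read }.encoding,
      rb_read_into_buffer: File.open(path, "rb") { |f| f.read(nil, String.new) }.encoding,
      # Text-mode reads, as in the reproduction (File.new without a mode)
      text_read_into_buffer: File.open(path) { |f| f.read(nil, String.new) }.encoding,
      text_read_with_length: File.open(path) { |f| f.read(2) }.encoding
    }
  end
end

encodings_by_mode(JPEG_BYTES).each { |name, enc| puts "#{name}: #{enc}" }
```

If the binary-mode variants report ASCII-8BIT while only text_read_into_buffer reports the default external encoding, then opening the file with "rb" (or calling binmode on it) before handing it to IOTest might be a workaround, independent of whether the text-mode result is considered a bug.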

Actual Behavior

  • Reading from a file with IO#read changes the buffer's encoding from ASCII-8BIT to UTF-8, which breaks downstream file upload processing.
@buffer: ASCII-8BIT in initialize

**************** read without force_encoding
Outbuf: US-ASCII setup
@buffer: ASCII-8BIT setup
Outbuf: US-ASCII after read
@buffer: UTF-8 after read
Outbuf: UTF-8 after outbuf append
@buffer: UTF-8 after outbuf append
**************** END read without force_encoding

@buffer: ASCII-8BIT in initialize

**************** read with outbuf force_encoding
Outbuf: ASCII-8BIT setup
@buffer: ASCII-8BIT setup
Outbuf: ASCII-8BIT after read
@buffer: UTF-8 after read
Outbuf: UTF-8 after outbuf append
@buffer: UTF-8 after outbuf append
**************** END read with outbuf force_encoding

@buffer: ASCII-8BIT in initialize

**************** read with both outbuf/@buffer force_encoding
Outbuf: ASCII-8BIT setup
@buffer: ASCII-8BIT setup
Outbuf: ASCII-8BIT after read
@buffer: ASCII-8BIT after read
Outbuf: ASCII-8BIT after outbuf append
@buffer: ASCII-8BIT after outbuf append
**************** END read with both outbuf/@buffer force_encoding
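
The second run above is probably explained by String#<< rather than by the read itself: an empty binary string takes on the encoding of a non-ASCII string appended to it, so forcing only outbuf to binary does not survive the append. A minimal sketch of that, assuming standard String semantics; RAW and append_encoding are hypothetical names used only for this illustration.

```ruby
# Bytes that are not valid UTF-8, standing in for image data.
RAW = [0xFF, 0xD8, 0xFF, 0xE0].pack("C*")

# Appends chunk to a fresh, empty binary accumulator and returns the
# accumulator's encoding after the append.
def append_encoding(chunk)
  outbuf = String.new.force_encoding(Encoding::BINARY)
  outbuf << chunk
  outbuf.encoding
end

utf8_tagged   = RAW.dup.force_encoding(Encoding::UTF_8)   # same bytes, UTF-8 label
binary_tagged = RAW.dup.force_encoding(Encoding::BINARY)  # same bytes, binary label

puts "append UTF-8-tagged chunk:  #{append_encoding(utf8_tagged)}"
puts "append binary-tagged chunk: #{append_encoding(binary_tagged)}"
puts "UTF-8-tagged chunk valid?   #{utf8_tagged.valid_encoding?}"
```

This is why only the third variant, which retags @buffer itself before appending, keeps everything as ASCII-8BIT. It also hints at why downstream code breaks: the same bytes labelled as UTF-8 are not a valid UTF-8 string.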
