cksum: Figure out the use of --binary on Windows #9168

@RenjiSann

Description

Quoting info '(coreutils) md5sum invocation':

‘-b’
‘--binary’
Treat each input file as binary, by reading it in binary mode and
outputting a ‘*’ flag. This is the inverse of ‘--text’. On
systems like GNU that do not distinguish between binary and text
files, this option merely flags each input mode as binary: the MD5
checksum is unaffected. This option is the default on systems like
MS-DOS that distinguish between binary and text files, except for
reading standard input when standard input is a terminal.

and quoting Windows' documentation:

In text mode, carriage return-line feed (CRLF) combinations are translated into single line feed (LF) characters on input, and LF characters are translated to CRLF combinations on output. When a Unicode stream-I/O function operates in text mode (the default), the source or destination stream is assumed to be a sequence of multibyte characters. Therefore, the Unicode stream-input functions convert multibyte characters to wide characters (as if by a call to the mbtowc function). For the same reason, the Unicode stream-output functions convert wide characters to multibyte characters (as if by a call to the wctomb function).
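As a rough illustration of the newline translation quoted above, here is a minimal Python sketch. The helper names `read_text_mode` and `write_text_mode` are hypothetical, and the sketch only models the CRLF/LF rule from the quote; it ignores every other aspect of text-mode streams (such as the multibyte/wide-character conversion).

```python
def read_text_mode(raw: bytes) -> bytes:
    """Model text-mode input: each CRLF pair becomes a single LF.

    Hypothetical helper, not a real runtime API.
    """
    return raw.replace(b"\r\n", b"\n")


def write_text_mode(data: bytes) -> bytes:
    """Model text-mode output: each LF becomes a CRLF pair.

    Assumes `data` contains bare LFs only (no existing CRLF pairs).
    """
    return data.replace(b"\n", b"\r\n")


on_disk = b"test\r\ntest"                   # bytes stored in the file
seen_by_program = read_text_mode(on_disk)   # what a text-mode reader would get
print(seen_by_program)
print(write_text_mode(seen_by_program))
```

If this model is right, a program reading in text mode never sees the `0d` byte, so anything computed from the stream (a checksum, for instance) would differ from the binary-mode result.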

So on Windows, I expect that checksumming a file containing CRLFs in text mode gives a different result than checksumming it with --binary.
I could not observe this with GNU md5sum:

$ md5sum --version
md5sum (GNU coreutils) 8.32
Copyright (C) 2020 Free Software Foundation, Inc.

$ xxd my_test
00000000: 7465 7374 0d0a 7465 7374                 test..test # CRLF File
$ xxd my_test2
00000000: 7465 7374 0a74 6573 74                   test.test  # LF File

CRLF File

$ md5sum my_test
76ce9f441de2ed5de337d391ad4516b7 *my_test
$ md5sum my_test --binary
76ce9f441de2ed5de337d391ad4516b7 *my_test
$ md5sum my_test --text
76ce9f441de2ed5de337d391ad4516b7  my_test  # I would have expected this hash to be different

$ coreutils.exe hashsum --md5 my_test
76ce9f441de2ed5de337d391ad4516b7 *my_test
$ coreutils.exe hashsum --md5 my_test --binary
76ce9f441de2ed5de337d391ad4516b7 *my_test
$ coreutils.exe hashsum --md5 my_test --text
be778b473235e210cc577056226536a4  my_test # Like this one

LF File

$ md5sum my_test2
be778b473235e210cc577056226536a4 *my_test2
$ md5sum my_test2 --binary
be778b473235e210cc577056226536a4 *my_test2
$ md5sum my_test2 --text
be778b473235e210cc577056226536a4  my_test2 # This I expect

$ coreutils.exe hashsum --md5 my_test2
be778b473235e210cc577056226536a4 *my_test2
$ coreutils.exe hashsum --md5 my_test2 --binary
be778b473235e210cc577056226536a4 *my_test2
$ coreutils.exe hashsum --md5 my_test2 --text
be778b473235e210cc577056226536a4  my_test2 # Like this one
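To make the expectation concrete, here is a small Python sketch that hashes the exact bytes shown in the xxd dumps, once as-is (binary) and once after replacing CRLF with LF. That replacement is only my assumption of what text mode does on input, based on the documentation quoted earlier; it is not taken from either implementation's source.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of `data` as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


# Byte contents copied from the xxd dumps of my_test and my_test2.
crlf_file = bytes.fromhex("746573740d0a74657374")  # "test\r\ntest"
lf_file = bytes.fromhex("746573740a74657374")      # "test\ntest"

binary_crlf = md5_hex(crlf_file)
# Assumed text-mode behavior: CRLF collapsed to LF before hashing.
text_crlf = md5_hex(crlf_file.replace(b"\r\n", b"\n"))
binary_lf = md5_hex(lf_file)

print("CRLF file, binary mode:        ", binary_crlf)
print("CRLF file, simulated text mode:", text_crlf)
print("LF file, binary mode:          ", binary_lf)
```

Under that assumption, the simulated text-mode hash of the CRLF file is the same as the hash of the LF file, which is the pattern in the `coreutils.exe hashsum --md5 my_test --text` output above and not in the GNU `md5sum --text` output.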

Several things could explain this discrepancy:

  1. My Windows setup is flawed and I have a skill issue preventing me from witnessing the desired behavior
  2. The version of GNU coreutils I could get my hands on for Windows is too old
  3. I misunderstood the documentation
  4. There is a bug in GNU's coreutils
