Description
Quoting info '(coreutils) md5sum invocation':
‘-b’
‘--binary’
Treat each input file as binary, by reading it in binary mode and
outputting a ‘*’ flag. This is the inverse of ‘--text’. On
systems like GNU that do not distinguish between binary and text
files, this option merely flags each input mode as binary: the MD5
checksum is unaffected. This option is the default on systems like
MS-DOS that distinguish between binary and text files, except for
reading standard input when standard input is a terminal.
and quoting Windows' documentation:
In text mode, carriage return-line feed (CRLF) combinations are translated into single line feed (LF) characters on input, and LF characters are translated to CRLF combinations on output. When a Unicode stream-I/O function operates in text mode (the default), the source or destination stream is assumed to be a sequence of multibyte characters. Therefore, the Unicode stream-input functions convert multibyte characters to wide characters (as if by a call to the mbtowc function). For the same reason, the Unicode stream-output functions convert wide characters to multibyte characters (as if by a call to the wctomb function).
So I expect that on Windows, hashing a file that contains CRLFs in text mode gives a different checksum than hashing it with --binary.
I could not observe this:
$ md5sum --version
md5sum (GNU coreutils) 8.32
Copyright (C) 2020 Free Software Foundation, Inc.
$ xxd my_test
00000000: 7465 7374 0d0a 7465 7374 test..test # CRLF File
$ xxd my_test2
00000000: 7465 7374 0a74 6573 74 test.test # LF File
CRLF File
$ md5sum my_test
76ce9f441de2ed5de337d391ad4516b7 *my_test
$ md5sum my_test --binary
76ce9f441de2ed5de337d391ad4516b7 *my_test
$ md5sum my_test --text
76ce9f441de2ed5de337d391ad4516b7 my_test # I would have expected this hash to be different
$ coreutils.exe hashsum --md5 my_test
76ce9f441de2ed5de337d391ad4516b7 *my_test
$ coreutils.exe hashsum --md5 my_test --binary
76ce9f441de2ed5de337d391ad4516b7 *my_test
$ coreutils.exe hashsum --md5 my_test --text
be778b473235e210cc577056226536a4 my_test # Like this one
LF File
$ md5sum my_test2
be778b473235e210cc577056226536a4 *my_test2
$ md5sum my_test2 --binary
be778b473235e210cc577056226536a4 *my_test2
$ md5sum my_test2 --text
be778b473235e210cc577056226536a4 my_test2 # This I expect
$ coreutils.exe hashsum --md5 my_test2
be778b473235e210cc577056226536a4 *my_test2
$ coreutils.exe hashsum --md5 my_test2 --binary
be778b473235e210cc577056226536a4 *my_test2
$ coreutils.exe hashsum --md5 my_test2 --text
be778b473235e210cc577056226536a4 my_test2 # Like this one
Several reasons could explain this discrepancy:
- My Windows setup is flawed and I have a skill issue preventing me from witnessing the desired behavior
- The version of coreutils I could get my hands on for Windows is too old
- I misunderstood the documentation
- There is a bug in GNU's coreutils
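
To help narrow down which of these applies, the effect I expect from text mode can be reproduced without either tool. Below is a minimal Python sketch that emulates the CRLF-to-LF translation by hand and compares digests. The byte strings mirror the xxd dumps above; `emulate_text_mode_read` is a hypothetical helper written for this illustration, not something taken from coreutils or the C runtime, and it only models the CRLF translation described in the quoted documentation.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a lowercase hex string."""
    return hashlib.md5(data).hexdigest()


def emulate_text_mode_read(data: bytes) -> bytes:
    """Hypothetical helper: collapse CRLF to LF, as text-mode input is
    described to do in the quoted Windows documentation."""
    return data.replace(b"\r\n", b"\n")


# Same bytes as the two test files in the xxd dumps (no trailing newline).
crlf_bytes = b"test\r\ntest"  # my_test
lf_bytes = b"test\ntest"      # my_test2

binary_digest = md5_hex(crlf_bytes)                        # untranslated bytes
text_digest = md5_hex(emulate_text_mode_read(crlf_bytes))  # after translation
lf_digest = md5_hex(lf_bytes)

print("CRLF file, binary mode:    ", binary_digest)
print("CRLF file, emulated text:  ", text_digest)
print("LF file:                   ", lf_digest)
```

If text mode really translates line endings, the second and third digests must be equal and the first must differ from both. That is the pattern `coreutils.exe hashsum --md5 --text` shows above, and the one `md5sum --text` does not.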