cksum: Reverse-engineering implicit behavior of text/binary tag/untagged #6364

@BenWiederhake

cksum has some really weird and funky implicit tags going on, see #6256

So let's figure out what exactly cksum is doing.

$ ../gnu/src/cksum -a md5 --tag README.md # This is the tagged format:
MD5 (README.md) = add2d697731ef0facc3a56207aa03a9b
$ ../gnu/src/cksum -a md5 README.md # tagged by default:
MD5 (README.md) = add2d697731ef0facc3a56207aa03a9b
$ ../gnu/src/cksum -a md5 --text README.md # tagged+text is a problem:
../gnu/src/cksum: --text mode is only supported with --untagged
Try '../gnu/src/cksum --help' for more information.
[$? = 1]
$ ../gnu/src/cksum -a md5 --text --tag README.md # tagged+text is not a problem?!
MD5 (README.md) = add2d697731ef0facc3a56207aa03a9b

So yes, something funny is going on. Let's just brute-force all 1024 + 256 + 64 + 16 + 4 + 1 = 1365 possible sequences of zero to five arguments (each one of --binary, --text, --tag, --untagged), and visualize the behavior as a graph:
[Figure: general_nondet_graph]

(legend: edges are marked b/t/T/U for binary/text/Tag/Untagged, and vertices are the observed behavior: E/T/A/S for Error/Tagged/UntaggedAsterisk/UntaggedSpace)
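The brute-force search could be sketched in Python roughly as follows. This is a minimal sketch, not the script actually used to produce the graphs: the helper names (`all_sequences`, `classify`, `observe`) are made up for illustration, the binary path and file name are taken from the transcript above, and the untagged output shape (`<hash>`, a space, then `*` or a second space, then the file name) is an assumption based on the usual md5sum-style format.

```python
import itertools
import subprocess

# Edge labels from the legend, mapped to the long options.
FLAGS = {"b": "--binary", "t": "--text", "T": "--tag", "U": "--untagged"}


def all_sequences(max_len=5):
    """Yield every flag sequence of length 0..max_len as a string like 'TbU'."""
    for n in range(max_len + 1):
        for combo in itertools.product(FLAGS, repeat=n):
            yield "".join(combo)


def classify(returncode, stdout):
    """Map one cksum run to a vertex label: E, T, A or S."""
    lines = stdout.splitlines()
    if returncode != 0 or not lines:
        return "E"
    line = lines[0]
    if line.startswith("MD5 ("):
        return "T"  # tagged format: "MD5 (file) = <hash>"
    # Assumed untagged format: "<hash> <marker><file>", marker is '*' or ' '.
    _digest, rest = line.split(" ", 1)
    return "A" if rest.startswith("*") else "S"


def observe(seq, cksum="../gnu/src/cksum", path="README.md"):
    """Run cksum with the flags named in seq and return the observed label."""
    args = [cksum, "-a", "md5", *(FLAGS[c] for c in seq), path]
    proc = subprocess.run(args, capture_output=True, text=True)
    return classify(proc.returncode, proc.stdout)
```

Something like `{seq: observe(seq) for seq in all_sequences()}` would then give a table of all 1365 observations, from which a graph can be drawn.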

First, observe that -b/-t seem to be doing precisely what we would hope for: toggling between binary and text mode. Good!

Next, observe that --tag/--untagged seem to be the flags that have the weird behavior attached to them. In particular, the T state seems to be more than one actual state, probably differentiated along the "text-binary-axis".

Removing --untagged from the brute-force search reveals that --tag always pulls the state in the binary direction:
[Figure: nountagged_nondet_graph]

Removing --binary from the brute-force search reveals that --untagged always pulls the state away from E (so in a binary-ish direction), but A is unreachable ("Asterisk", which indicates a binary file in the untagged format):
[Figure: nobinary_nondet_graph]

Hypothesis: There are three states along the "text-binary-axis": always-binary, always-text, and binary-ish. For simplicity, let's assume the same thing along the tagged-ness-axis: always-tagged, always-untagged, and tagged-ish.

By the previous observations, --tag implies either always-binary or binary-ish. (Probably "binary-ish".)

Ending in bU does not determine the result:

  • bU outputs A
  • TbU outputs S
  • UbU outputs A
  • bUbU outputs A
  • TUbU outputs A
  • UTbU outputs S
  • Therefore, U does not set the binary-ness to a constant; its effect depends on the current tagged-ness. Huh?
  • Assuming that we start with "tagged-ish" and T/U set "always-tagged/always-untagged", this means that U does not interfere with the binary-ness in the "tagged-ish" and "always-untagged" states, but in the "always-tagged" state it resets the binary-ness to "binary-ish". What a surprising decision! (It probably made sense at the time it was written, and is probably also why it is no longer listed in --help.)

… and that finally predicts the correct behavior without any exceptions, hooray!
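The rules derived above can be written down as a small state machine. This is a minimal sketch under the stated hypothesis, not a copy of the linked check_model.py: the function name `predict` and the state strings are chosen for illustration. Two points are assumptions: T sets "binary-ish" rather than "always-binary" (the text only says "probably"), and "binary-ish" or "always-text" in untagged mode prints the space marker (inferred from TbU giving S).

```python
def predict(seq):
    """Predict the vertex label (E/T/A/S) for a flag sequence such as 'TbU'."""
    binary = "binary-ish"  # or "always-binary" / "always-text"
    tagged = "tagged-ish"  # or "always-tagged" / "always-untagged"
    for flag in seq:
        if flag == "b":
            binary = "always-binary"
        elif flag == "t":
            binary = "always-text"
        elif flag == "T":
            tagged = "always-tagged"
            binary = "binary-ish"  # assumption: could also be "always-binary"
        elif flag == "U":
            # The surprising part: U only resets the binary-ness
            # when coming from the "always-tagged" state.
            if tagged == "always-tagged":
                binary = "binary-ish"
            tagged = "always-untagged"
        else:
            raise ValueError(f"unknown flag {flag!r}")
    if tagged != "always-untagged":
        # Tagged output refuses explicit text mode.
        return "E" if binary == "always-text" else "T"
    return "A" if binary == "always-binary" else "S"


# Observations quoted in this issue (flag sequence -> observed label).
OBSERVED = {
    "": "T", "T": "T", "t": "E", "tT": "T",
    "bU": "A", "TbU": "S", "UbU": "A",
    "bUbU": "A", "TUbU": "A", "UTbU": "S",
}
mismatches = {s: (want, predict(s)) for s, want in OBSERVED.items()
              if predict(s) != want}
```

Here `mismatches` collects any observation the sketch fails to reproduce, so it doubles as a quick regression check against the transcript and the bU list.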

A simple piece of logic, but so much pain.

End result: https://github.com/BenWiederhake/worsethanfailure_cksum/blob/master/check_model.py#L19
