
tr: Need to rewrite solve_set_characters and TranslateOperationComplement #6133

@BenWiederhake


I was trying to improve tr, gave up, and want to write down some findings, so I'm abusing the issue tracker to track the following issues.

Sorry for creating yet another meta-issue; I've split it into a few manageable regular issues.

Should not accept unaligned [:upper:]

Moved to #6341

Should fix whitespace definition

Fixed by #6141

Fun fact: I was actually trying to detect a different bug (Unicode whitespace), but that one is apparently fine.

Does this ordering issue affect other character classes, too? No, apparently not.

Should not allow special classes in set2

Moved to #6342

Should handle high input bytes in command-line

Moved to #6343

Should handle high input bytes in input during translate complement

$ echo -en '\001amp\0376\0377' | tr -c 'abc' '0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789ABCDEF'     
1aadBC
$ echo -en '\001amp\0376\0377' | cargo run tr -c 'abc' '0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789ABCDEF'     
thread 'main' panicked at 'attempt to add with overflow', /rustc/90c541806f23a127002de5b4038be731ba1458ca/library/core/src/iter/range.rs:391:1
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
[$? = 101]

Fixed by #6340
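The GNU output above follows from taking the complement of SET1 in ascending byte order and mapping it position by position onto SET2 (for example, `m` is the 107th complement byte, so it becomes SET2[106] = `a`). A minimal Rust sketch of that mapping is below. It assumes both sets are already-expanded byte strings (no ranges, classes, or escapes) and that SET2 is at least as long as the complement. The function names are hypothetical, not uutils internals. Iterating with `0..=u8::MAX` reaches byte 255 without ever computing 255 + 1, which sidesteps the kind of `u8` "attempt to add with overflow" shown in the panic above.

```rust
// Hypothetical sketch, not the uutils implementation: `tr -c SET1 SET2`
// over raw bytes, assuming SET1/SET2 are already-expanded byte strings
// and SET2 is at least as long as the complement of SET1.

/// Returns every byte value NOT in `set1`, in ascending order.
fn complement(set1: &[u8]) -> Vec<u8> {
    let mut in_set1 = [false; 256];
    for &b in set1 {
        in_set1[b as usize] = true;
    }
    // `0..=u8::MAX` is a RangeInclusive: it yields 255 and then stops
    // without computing 255 + 1, which would overflow a u8.
    (0..=u8::MAX).filter(|&b| !in_set1[b as usize]).collect()
}

/// Builds a 256-entry table mapping the i-th complement byte to set2[i].
fn complement_table(set1: &[u8], set2: &[u8]) -> [u8; 256] {
    let mut table = [0u8; 256];
    for b in 0..=u8::MAX {
        table[b as usize] = b; // identity for bytes that are not translated
    }
    // zip stops at the shorter side; under our assumption that is the complement.
    for (&from, &to) in complement(set1).iter().zip(set2) {
        table[from as usize] = to;
    }
    table
}

/// Applies the lookup table to every input byte.
fn translate(input: &[u8], table: &[u8; 256]) -> Vec<u8> {
    input.iter().map(|&b| table[b as usize]).collect()
}

fn main() {
    // A 256-byte SET2 built from the same pattern as the command above.
    let set2 = "0123456789abcdef".repeat(15) + "0123456789ABCDEF";
    let table = complement_table(b"abc", set2.as_bytes());
    let out = translate(b"\x01amp\xfe\xff", &table);
    println!("{}", String::from_utf8_lossy(&out)); // prints "1aadBC"
}
```

Computing the full 256-entry table once up front keeps the per-byte work to a single array lookup and never does arithmetic on byte values.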

Correctly "fill" set2 in case of complement

Moved to #6344

Replicate weird behavior around 256-limit

Moved to #6345

Feel free to grab this issue (or parts of it).

I gave up for two reasons:

  • Deduplicating set1 and establishing a proper ordering for set2 completely goes against how the current tr implementation works, so I feel a total rewrite of solve_set_characters and TranslateOperationComplement is necessary. (EDIT: Fixed by #6340, "tr: calculate complement set early"!)
  • However, I can't build a reasonably complete mental model of exactly how that needs to work. I hope the above examples provide a good starting point for whoever comes after me.
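Calculating the complement early, from a 256-entry membership table, is one way to get deduplication and ordering essentially for free. Repeated bytes in SET1 just set the same flag twice, and walking 0..=255 produces the complement in ascending order however SET1 was written. The hypothetical Rust sketch below (not the code from #6340) shows this and then fills SET2. It assumes the default pad-with-last-character rule that GNU tr documents for a SET2 shorter than SET1, and ignores any complement-specific subtleties.

```rust
// Hypothetical sketch, not the uutils code: complement computed "early"
// from a membership table, plus padding SET2 by repeating its last byte
// (assumed GNU/BSD-style fill). Sets are assumed to be already-expanded bytes.

/// Every byte value not in `set1`, ascending and without duplicates.
fn complement(set1: &[u8]) -> Vec<u8> {
    let mut present = [false; 256];
    for &b in set1 {
        present[b as usize] = true; // duplicates just set the same flag again
    }
    (0..=u8::MAX).filter(|&b| !present[b as usize]).collect()
}

/// Extends `set2` to at least `len` bytes by repeating its last byte.
fn fill_set2(set2: &[u8], len: usize) -> Vec<u8> {
    let mut filled = set2.to_vec();
    if let Some(&last) = set2.last() {
        filled.resize(len.max(set2.len()), last);
    }
    filled
}

fn main() {
    // "cbaa" and "abc" describe the same set, so their complements match.
    assert_eq!(complement(b"cbaa"), complement(b"abc"));

    let set1 = complement(b"abc");
    let set2 = fill_set2(b"xy", set1.len());
    // The first complement byte (0x00) maps to 'x'; every later one maps to the padding 'y'.
    assert_eq!((set1[0], set2[0]), (0x00, b'x'));
    assert_eq!((set1[252], set2[252]), (0xFF, b'y'));
    println!("{} complement bytes, set2 filled to {}", set1.len(), set2.len());
}
```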
