Skip to content

Conversation

@sylvestre
Copy link
Contributor

@sylvestre sylvestre commented May 25, 2024

Summary of the change:

  • Move the common code into checksum
  • Create a structure HashAlgorithm to handle the algorithm (instead of the 3 variables)
  • Use the same function for cksum & hashsum for --check (perform_checksum_validation)
  • Use the same for function for the hash generation (digest_reader)
  • Add unit tests
  • Add integration tests
  • Fix some incorrect tests

@sylvestre
Copy link
Contributor Author

Sorry for the big commit but as you can imagine, it wasn't possible to split it :(

@github-actions
Copy link

GNU testsuite comparison:

GNU test failed: tests/cksum/md5sum. tests/cksum/md5sum is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/cksum/md5sum-bsd. tests/cksum/md5sum-bsd is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/cksum/sha1sum. tests/cksum/sha1sum is passing on 'main'. Maybe you have to rebase?
Skip an intermittent issue tests/tail/inotify-dir-recreate (fails in this run but passes in the 'main' branch)

@sylvestre sylvestre marked this pull request as draft May 25, 2024 11:00
@github-actions
Copy link

GNU testsuite comparison:

GNU test failed: tests/cksum/md5sum. tests/cksum/md5sum is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/cksum/md5sum-bsd. tests/cksum/md5sum-bsd is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/cksum/sha1sum. tests/cksum/sha1sum is passing on 'main'. Maybe you have to rebase?

Copy link
Collaborator

@BenWiederhake BenWiederhake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't have the time to go though all the code itself, but the tests alone are a great starting point :D

  • A filename confusion, which might allow an attacker to pass verification by simply staging a "correct" file at the confused filename
  • A few missed errors about silly flag combinations
  • It seems like you removed some of the flag parsing code in hashsum? I can't see where some of the flags get stored.
  • Some forgotten debug prints in test code.

Comment on lines +182 to +185
pub const STATUS: &str = "status";
pub const WARN: &str = "warn";
pub const IGNORE_MISSING: &str = "ignore-missing";
pub const QUIET: &str = "quiet";
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is named "refactor the common code" and you implement 4 new features? XD

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i had to :(
because one of them didn't implemented the option of the other

Copy link
Collaborator

@BenWiederhake BenWiederhake left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Removing the "changes requested" flag due to administrative reasons, even though I'm still requesting the above changes.)

@github-actions
Copy link

GNU testsuite comparison:

GNU test failed: tests/cksum/cksum-c. tests/cksum/cksum-c is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/cksum/md5sum. tests/cksum/md5sum is passing on 'main'. Maybe you have to rebase?
GNU test failed: tests/cksum/md5sum-bsd. tests/cksum/md5sum-bsd is passing on 'main'. Maybe you have to rebase?

@github-actions
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

sylvestre added 10 commits May 29, 2024 09:08
Summary of the change:
* Move the common code into checksum
* Create a structure HashAlgorithm to handle the algorithm (instead of the 3 variables)
* Use the same function for cksum & hashsum for --check (perform_checksum_validation)
* Use the same for function for the hash generation (digest_reader)
* Add unit tests
* Add integration tests
* Fix some incorrect tests
We have 3 different kinds of input:
* "algo (filename) = checksum"
  example: `BLAKE2 (a) = bedfbb90d858c2d67b7ee8f7523be3d3b54004ef9e4f02f2ad79a1d05bfdfe49b81e3c92ebf99b504102b6bf003fa342587f5b3124c205f55204e8c4b4ce7d7c`
* "checksum  filename"
  example: `60b725f10c9c85c70d97880dfe8191b3  a`
* "checksum filename"
  example: `60b725f10c9c85c70d97880dfe8191b3 a`

These algo/regexp are tricky as files can be called "a, " b", " ", or "*c".
We look at the first time to analyze the kind of input and reuse the same regexp then.
@github-actions
Copy link

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

@sylvestre sylvestre marked this pull request as ready for review June 1, 2024 08:36
@sylvestre
Copy link
Contributor Author

@cakebaker it is ready for review. :)
thanks

@github-actions
Copy link

github-actions bot commented Jun 2, 2024

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

Comment on lines +316 to +330
const ALGO_BASED_REGEX: &str = r"^\s*\\?(?P<algo>(?:[A-Z0-9]+|BLAKE2b))(?:-(?P<bits>\d+))?\s?\((?P<filename>.*)\)\s*=\s*(?P<checksum>[a-fA-F0-9]+)$";
const DOUBLE_SPACE_REGEX: &str = r"^(?P<checksum>[a-fA-F0-9]+)\s{2}(?P<filename>.*)$";

// In this case, we ignore the *
const SINGLE_SPACE_REGEX: &str = r"^(?P<checksum>[a-fA-F0-9]+)\s(?P<filename>\*?.*)$";

fn get_filename_for_output(filename: &OsStr, input_is_stdin: bool) -> String {
if input_is_stdin {
"standard input"
} else {
filename.to_str().unwrap()
}
.maybe_quote()
.to_string()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would swap the consts and the get_filename_for_output function so that the consts are next to the determine_regex function.

filename: &OsStr,
input_is_stdin: bool,
lines: &[String],
) -> UResult<(Regex, bool)> {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of this return type and you already have to workaround it in the perform_checksum_validation function ;-) But I think fixing it is something for a future PR.

Comment on lines +365 to +366
#[allow(clippy::too_many_arguments)]
#[allow(clippy::cognitive_complexity)]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very true :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, i will improve this in a future PR :)

Some(Some(bits_value / 8))
} else {
properly_formatted = false;
None // Return None to signal a parsing or divisibility issue
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in case of a parse issue you don't reach this code.

Suggested change
None // Return None to signal a parsing or divisibility issue
None // Return None to signal a divisibility issue

// When a specific algorithm name is input, use it and default bits to None
(a.to_lowercase(), length_input)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the last part of the comment, "and default bits to None", is incorrect.

util_name(),
filename_input.maybe_quote(),
);
//skip_summary = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//skip_summary = true;

}
}

/// Calculates the length of the digest for the given algorithm.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Calculates the length of the digest for the given algorithm.
/// Calculates the length of the digest.

Comment on lines +40 to +45
//check: bool,
tag: bool,
nonames: bool,
status: bool,
quiet: bool,
strict: bool,
warn: bool,
//status: bool,
//quiet: bool,
//strict: bool,
//warn: bool,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the reason for commenting them out instead of removing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

because i will implement them next

//let strict = matches.get_flag("strict");
let warn = matches.get_flag("warn") && !status;
let zero = matches.get_flag("zero");
let zero: bool = matches.get_flag("zero");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let zero: bool = matches.get_flag("zero");
let zero = matches.get_flag("zero");


let reader = BufReader::new(file);
let lines: Vec<String> = reader.lines().collect::<Result<_, _>>()?;
let (chosen_regex, algo_based_format) =
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rename algo_based_format to something like is_algo_based_format so that it's obvious it's a bool.


let length = match input_length {
Some(length) => {
if binary_name == ALGORITHM_OPTIONS_BLAKE2B || binary_name == "b2sum" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

binary_name == ALGORITHM_OPTIONS_BLAKE2B is always false because of line 186 ;-)

Comment on lines 251 to 252
let text_flag: bool = matches.get_flag("text");
let binary_flag: bool = matches.get_flag("binary");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let text_flag: bool = matches.get_flag("text");
let binary_flag: bool = matches.get_flag("binary");
let text_flag = matches.get_flag("text");
let binary_flag = matches.get_flag("binary");

Comment on lines 258 to 263
// Determine the appropriate algorithm option to pass
let algo_option = if algo.name.is_empty() {
None
} else {
Some(algo.name)
};
Copy link
Contributor

@cakebaker cakebaker Jun 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's possible I'm missing something, but I think algo.name is never empty and thus the algo_option var is unnecessary.

@github-actions
Copy link

github-actions bot commented Jun 3, 2024

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

@sylvestre sylvestre requested a review from cakebaker June 3, 2024 18:53
@github-actions
Copy link

github-actions bot commented Jun 3, 2024

GNU testsuite comparison:

Skipping an intermittent issue tests/tail/inotify-dir-recreate (passes in this run but fails in the 'main' branch)

@cakebaker cakebaker merged commit f56e121 into uutils:main Jun 4, 2024
@cakebaker
Copy link
Contributor

Good work :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants