Scripts: command_history behaviour by Lestropie · Pull Request #1389 · MRtrix3/mrtrix3

Lestropie · 2018-07-09T03:09:19Z

This was giving me the irrits with my own data, so I fixed it.
(More details in commit messages)

Do however need to make sure that all scripts behave as expected before merging. They pass pylint, but best to be sure.

One thing I did with dwiintensitynorm / population_template is: outputs that are based on the whole population, rather than just one input subject, will only have a single line in command_history, corresponding to the script invocation. I feel like that's more 'honest' information than including the command history of one arbitrarily-chosen subject in the history of such data.

Closes #1188.

Because Python scripts frequently call multiple underlying commands, the key-value field 'command_history' typically gets mangled by including a large number of steps that occur internally within the script and were not executed by the user; additionally, any requisite conversion to NIfTI within a script will result in loss of the command history, as well as any other header key-value fields. This commit adds the necessary functionalities in order for the header key-value entries of an image generated by an MRtrix3 Python script to behave identically to any C++ executable command: The key-value entries are copied from the input image to the output, and field 'command_history' is updated specifically with the command-line invocation of that particular script. This is handled internally using a new mrconvert option '-compel_keyvalues', which is appropriately hidden from users. Closes #1188.

This option will erase two key-value entries from the image header: - "comments": This frequently contains a subject name from DICOM conversion. - "command_history": This can contain subject identifying information from file / directory names of either previous steps, or indeed from the command currently being executed; accordingly, removal of this field must occur within function Header::create(), as the concatenation of the currently-executed command should also be suppressed if the goal is to yield anonymised data. Requirement logged in #1188.

Lestropie · 2018-07-10T01:57:45Z

Current code:

$ dwi2response tournier dwi.mif -mask mask.mif test.txt -force -voxels test.mif
...
$ mrinfo test.mif 
************************************************
Image:               "test.mif"
************************************************
  ...
  command_history:   /home/rob/src/mrtrix3/bin/mrconvert "/home/rob/MRI_data/SIFT2/diffusion/dwi.mif" "-" "-strides" "0,0,0,1"  (version=3.0_RC3-73-g343dba54-dirty)
  [9 entries]        /home/rob/src/mrtrix3/bin/dwiextract "-" "/home/rob/MRI_data/SIFT2/diffusion/dwi2response-tmp-1IOC8S/dwi.mif" "-singleshell" "-no_bzero"  (version=3.0_RC3-73-g343dba54-dirty)
                     ...
                     /home/rob/src/mrtrix3/bin/mrthreshold "iter5_CF.mif" "-top" "300" "iter5_SF.mif"  (version=3.0_RC3-73-g343dba54-dirty)
                     /home/rob/src/mrtrix3/bin/mrconvert "voxels.mif" "/home/rob/MRI_data/SIFT2/diffusion/test.mif" "-force"  (version=3.0_RC3-73-g343dba54-dirty)
  ...

This PR:

$ dwi2response tournier dwi.mif -mask mask.mif test.txt -force -voxels test.mif
...
$ mrinfo test.mif 
************************************************
Image:               "test.mif"
************************************************
  ...
  command_history:   /home/rob/src/mrtrix3/bin/dwi2response "tournier" "dwi.mif" "-mask" "mask.mif" "test.txt" "-force" "-voxels" "test.mif"  (version=3.0_RC3-37-g6bd397ce)
  ...

@thijsdhollander: This also adds an -anonymise option to mrconvert to prevent command_history from appearing at all in the output file, which should solve the issue you had with the workshop data.

jdtournier · 2018-07-11T12:18:32Z

Looks good.

Not sure about the -anonymise option though. It seems the same can be achieved with mrconvert -clear command_history -clear comments? I'm not sure this warrants a whole new option when we can simply document this. Also, I'm not convinced it's necessarily a good idea to call this -anonymise, this implies some level of guarantee that I'm not convinced we can provide - there may be patient identifiable information in other (custom) fields that would be totally ignored (unlikely, admittedly, but possible). Whereas I think simply documenting the process for clearing these fields doesn't hold us responsible for ensuring actual anonymity in the output data (at least not to the same extent).

Also, as implemented you've put the handling of the -anonymise option deep in the core header handling code, in the main library - but only mrconvert has that option available. I'm guessing you're anticipating this becoming available in other commands too? In which case it would make sense to have it in the standard options section. Otherwise, I'd say the handling of this option should remain within cmd/mrconvert.cpp.

Lestropie · 2018-07-11T13:01:13Z

The problem with mrconvert -clear_property command_history is that the current command will still be written to this field within Header::create() after cmd/mrconvert.cpp has wiped it, and that one line may contain identifying information; that's specifically the issue Thijs ran into with the workshop data. That's why handling wiping this field needs to be in Header::create().

In retrospect though, an alternative solution would be to ditch the -anonymise option, and instead look for the presence of -clear_property command_history within Header::create(), using the presence of that option to prevent addition of the one line representing the current command. This wouldn't have the same "implied guarantees" than an "-anonymise" option may...?

thijsdhollander · 2018-07-12T08:55:26Z

Also, I'm not convinced it's necessarily a good idea to call this -anonymise, this implies some level of guarantee that I'm not convinced we can provide - there may be patient identifiable information in other (custom) fields that would be totally ignored (unlikely, admittedly, but possible).

Yeah, I agree with that entirely. Also the reverse, I think: there are quite feasible scenarios where anonymising would not strictly have to imply deleting the command history. But it's a sneaky one, since folder names can end up invariably in the command history, even actually caused by e.g. other earlier steps that performed some kind(s) of anonymisation, but in doing so, those commands inserted sensitive information in the command history. From this, it's clear that a page in the documentation entirely dedicated to the topic of anonimisation wouldn't be a bad idea (if anyone has the time to write it, that is). So well, for all those reasons, I wouldn't include this -anonymise option with this functionality either. Also, anything of this kind of functionality (even just the options to remove command history entries or comments) should definitely not appear in other commands (but mrconvert), I'd say.

That of course doesn't do away with the fact that...

The problem with mrconvert -clear_property command_history is that the current command will still be written to this field within Header::create() after cmd/mrconvert.cpp has wiped it,

...but then entirely independent of the other (anonymisation related) fact that...

and that one line may contain identifying information; that's specifically the issue Thijs ran into with the workshop data. That's why handling wiping this field needs to be in Header::create().

So all anonymisation entirely aside (even though extra sneaky in that context), when I encountered this, I found this more of a case of conflicting post-conditions from a user interface point of view (i.e. user interfacing with the command; agnostic of implementation). Our post-conditions towards the user aren't entirely formalised, but are often (intuitively) implied. One of them is that running a command will (typically) append it to the command history. Another one is that mrconvert -clear_property clears a property, with "clear" meaning removing the corresponding header field and contents, so the post-condition being that they are no longer present after this option has been applied. Fulfilling both in a mrconvert -clear_property command_history is clearly impossible. Introducing a notion of the order of both operations towards the user is violating good encapsulation (and still violating a post-condition anyway, by the way). However, since these post-conditions aren't formalised anywhere, the question is which one is intuitively in the users' mind/intention when they run mrconvert -clear_property command_history. This is by definition subjective, but I'd argue that them actively specifying that they wish to clear the property, means that that goal and the associated post-condition is their first and foremost interest. Also, many "typical" users aren't even relying on the command history for anything, I'd think (most of them are probably not running commands with the intention of adding stuff to the command history; it's just a side effect, an "optional" feature if you will). So well, I'd then argue that mrconvert -clear_property command_history needs to fulfil the post-condition of removing that property (entirely). Also, just from a practical point of view, I don't think there's much use (to anyone even?) to have that particular command in the command history...?

look for the presence of -clear_property command_history within Header::create(), using the presence of that option to prevent addition of the one line representing the current command.

So I'd agree with that. 👍 (if that is the cleanest way to achieve this? Sadly still a bit of an ugly hack, but there may simply be no other way...?)

This wouldn't have the same "implied guarantees" than an "-anonymise" option may...?

No, certainly not. The only promise there is "to remove the command history field and contents". Actually, at the moment that is the somewhat "implied" guarantee that is not yet delivered upon. So fixing this would better serve the user; even in a scenario where they're after anonymisation step(s) and wouldn't want to be caught off-guard by the current somewhat "sneaky" behaviour.

jdtournier · 2018-07-12T10:54:09Z

Right, I'd forgotten that particular issue (that -clear command_history was applied prior to adding the current command to the history). That clearly needs to be fixed. Glad we're all on side regarding the -anonymise option.

As to how best to achieve that, well I guess the idea of inserting the handling of the option in the Header::create() is an appealing candidate. There is another option however, which I reckon might be better and cleaner: we move the lines responsible for appending to the command history from the Header::create() call to the Header constructors - just add a Header::append_command_to_history() call or something, and invoke in all constructors (apart from move constructor). In practice, this means nothing changes in terms of how the rest of the code is written, but it does mean that manipulations to the command history before the Header::create() will be final. What do you reckon?

There is one potential issue with this approach, which is that we'd get duplicate entries in the history if a Header is constructed (either fresh or by copy), and then copy-constructed again subsequently, and that copy is then fed to Header::create(). I'd need to check whether that might be a serious problem - I have a feeling such operations happen routinely to handle numbered file formats (i.e. here and here), I'd need to check that this doesn't introduce issues... If it does, then the handling would need to remain where it is, as would the explicit clearing of the history - can't really see a way around that.

jdtournier · 2018-07-12T10:57:12Z

Actually, on second thoughts, I can't see how my idea of appending to history from Header constructor can work. That would mean the Header::open() would insert the current command as the first entry in the history, before it's even started loading anything, since it's using a local default-constructed Header. Not a great idea...

- Remove -anonymise option added in 6bd397c. - When -clear_property command_history is used, prevent concatenation of the current command string to the "command_history" property within Header::create(). based on discussion in #1389.

Lestropie · 2018-07-13T05:34:59Z

👍?

jdtournier · 2018-07-17T15:04:19Z

Sorry for taking my time over this, difficult to find the time with everything going on ATM...

Having had a look through your proposed changes, there's a few things about it that I'd like to modify. In no particular order:

I'm not sold on the hidden -compel_keyvalues idea. I'd rather we had an explicit, non-hidden -copy_properties original option, which would transfer all generic header entries from original over to the output image wholesale (this might be a useful option to have anyway). You could then additionally use the -set_property command_history "mrconvert blah blah" to override the command-line history (or maybe the -append_property option would be better suited here?). It would then be a matter of ensuring the current command wasn't also added to the command history, much like we've been discussing already.
I'm still uneasy with so much of the handling being in the main Header::create() call. I think I'd rather add a bool add_command_to_history = true default argument added to the Header::create() and Image::create() calls, which mrconvert could then tap into to prevent the current command from being added to the history. This would mean that all command history manipulations could be performed within the mrconvert code, removing the need for such explicit manipulation in the library itself.
Approach (1) above would remove the need for hidden options, which I'm not a fan of - if we need to hide functionality like this, I reckon we'd be better off using environment variables or similar approaches, which bypass the command-line parsing altogether. In fact, it might be an idea to modify the current App::parse_special_options() call to such an approach... Means we'd need to modify docs/generate_user_docs.sh to run something like:
```
__MRTRIX_DUMP_HELP_PAGE__=SYNOPSIS command
```
instead of:
```
command __print_synopsis__
```
as is currently the case. But I digress...

I realise this means fairly drastic changes to your pull request... What do you reckon...?

Not within core library, as discussed on #1389

jdtournier · 2018-07-20T16:17:56Z

OK, I had a go at modifying the handling to match what I was suggesting. I think it's cleaner that way, and involves fewer modifications to the core. It'll need testing to make sure it works as advertised, and also the testing will need to be fixed up (not sure why Travis is struggling so much at the moment, but there's a lot of errors on multiple fronts...). Let me know what you think...

Using type_image_in() identifier for the input argument to this option would have precluded import of header key-value pairs via JSON file, which is utilised within dwipreproc to erase specific key-value pairs upon script completion.

Functionality of loading key-value entries from a JSON file was moved from core/header.cpp to cmd/mrconvert.cpp in 31fd405.

Rather than just detecting the use of -copy_properties and deleting the next five entries (assuming the use of function app.mrconvertOutputOption()), which may cause issues if a script were to use mrconvert -copy_properties in some other way, be more selective in the detection and removal of these options from the terminal printout.

Lestropie · 2018-07-21T02:40:32Z

Some small tweaks; looks fine, but I need to confirm that dwipreproc works before merging.

Lestropie · 2018-08-01T01:38:15Z

Confirmed that dwipreproc works and that the header fields are correct. So happy for this to go now to dev, unless anybody else wants to test it further beforehand.

jdtournier · 2018-08-01T14:26:25Z

Merged. This is going to dev, it's the best place for it to receive more exhaustive testing...

Function app.mrconvertOutputOption() can only be applied when run.command() is calling mrconvert; the command-line options it exploits are not present in other binaries.

dwiintensitynorm: Fix bug introduced in #1389

dwipreproc performs explicit removal of unwanted header fields from the output DWI, as introduced in 47e7288 / #1389, by exporting the DWI header key-value pairs to a JSON file and explicitly editing the JSON contents. The writing of the input JSON file was however lost during the process of resolving divergent code in 7617787. This commit fixes the issue. Correct operation of code for selecting & executing different versions of eddy - including if both CUDA and OpenMP versions are installed but the former does not run correctly - has been verified for the changes in #1449.

Related to 5eec075; follows #1389. Because the contents of the current command string that are to be appended to "command_history" may itself contain escape quotes (e.g. for filepaths containing spaces), this becomes error-prone if trying to parse within the input to run.command(). Therefore, with these changes, the intended "template" image / file from which key-value pairs will be pulled during a final mrconvert call at the end of a Python script is instead handled by providing a named argument to run.command(), "-mrconvert_keyval". run.command() is then responsible for inserting the relevant command-line options to the mrconvert call. Coincidentally this also makes it much easier to avoid including these particular command-line options in the terminal output feedback generated by run.command(). Due to these modifications, the management of forcing output file overwrite has also been moved to a named input argument for run.command(). Similarly to C++ in 5eec075: - "COMMAND_STRING" has been renamed to "COMMAND_HISTORY_STRING" to better reflect its contents. - Functions matrix.save_*() accept dictionaries via the "header" and "footer" named arguments, and will write the corresponding key-value pairs as comments in the output text file. - Boolean controlling whether or not the current command string should be appended to "command_history" is provided as a named argument. Note changes also include basic fixes to matrix.load_numeric() and matrix.save_numeric().

Lestropie added 2 commits July 6, 2018 18:09

Lestropie added enhancement scripts labels Jul 9, 2018

Lestropie self-assigned this Jul 9, 2018

Lestropie added 2 commits July 13, 2018 12:06

Merge branch 'dev' into scripts_command_history

2daca80

mrconvert: handle command-line history modifications within app

31fd405

Not within core library, as discussed on #1389

jdtournier force-pushed the scripts_command_history branch from 5795a94 to 31fd405 Compare July 20, 2018 15:59

jdtournier added 2 commits July 20, 2018 17:02

app.cpp: remove hidden option handling

a1356a5

update script command history handling to use copy_properties

066859e

Lestropie added 3 commits July 21, 2018 11:37

mrconvert -copy_properties: Make type_various()

37e79b7

Using type_image_in() identifier for the input argument to this option would have precluded import of header key-value pairs via JSON file, which is utilised within dwipreproc to erase specific key-value pairs upon script completion.

core/header.cpp: Remove JSON header

bb6e743

Functionality of loading key-value entries from a JSON file was moved from core/header.cpp to cmd/mrconvert.cpp in 31fd405.

jdtournier added 2 commits July 30, 2018 16:20

Merge branch 'dev' into scripts_command_history

abe7f33

update docs to match syntax changes

9aa69c4

Merge branch 'dev' into scripts_command_history

b26311d

jdtournier merged commit f831ec3 into dev Aug 1, 2018

jdtournier deleted the scripts_command_history branch August 1, 2018 14:26

Lestropie mentioned this pull request Sep 1, 2018

Scripts: Proper behaviour of command_history #1188

Closed

Lestropie added a commit that referenced this pull request Jan 14, 2019

Merge pull request #1534 from MRtrix3/dwiintensitynorm_fix

ca806b0

dwiintensitynorm: Fix bug introduced in #1389

Lestropie mentioned this pull request May 30, 2020

dwifslpreproc: Don't discard modified dw_scheme #2067

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Scripts: command_history behaviour#1389

Scripts: command_history behaviour#1389
jdtournier merged 13 commits intodevfrom
scripts_command_history

Lestropie commented Jul 9, 2018

Uh oh!

Lestropie commented Jul 10, 2018

Uh oh!

jdtournier commented Jul 11, 2018

Uh oh!

Lestropie commented Jul 11, 2018

Uh oh!

thijsdhollander commented Jul 12, 2018

Uh oh!

jdtournier commented Jul 12, 2018

Uh oh!

jdtournier commented Jul 12, 2018

Uh oh!

Lestropie commented Jul 13, 2018

Uh oh!

jdtournier commented Jul 17, 2018

Uh oh!

jdtournier commented Jul 20, 2018

Uh oh!

Lestropie commented Jul 21, 2018

Uh oh!

Lestropie commented Aug 1, 2018

Uh oh!

jdtournier commented Aug 1, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

Lestropie commented Jul 9, 2018

Uh oh!

Lestropie commented Jul 10, 2018

Uh oh!

jdtournier commented Jul 11, 2018

Uh oh!

Lestropie commented Jul 11, 2018

Uh oh!

thijsdhollander commented Jul 12, 2018

Uh oh!

jdtournier commented Jul 12, 2018

Uh oh!

jdtournier commented Jul 12, 2018

Uh oh!

Lestropie commented Jul 13, 2018

Uh oh!

jdtournier commented Jul 17, 2018

Uh oh!

jdtournier commented Jul 20, 2018

Uh oh!

Lestropie commented Jul 21, 2018

Uh oh!

Lestropie commented Aug 1, 2018

Uh oh!

jdtournier commented Aug 1, 2018

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants