Skip to content

Scripts: command_history behaviour#1389

Merged
jdtournier merged 13 commits intodevfrom
scripts_command_history
Aug 1, 2018
Merged

Scripts: command_history behaviour#1389
jdtournier merged 13 commits intodevfrom
scripts_command_history

Conversation

@Lestropie
Copy link
Member

This was giving me the irrits with my own data, so I fixed it.
(More details in commit messages)

Do however need to make sure that all scripts behave as expected before merging. They pass pylint, but best to be sure.

One thing I did with dwiintensitynorm / population_template is: outputs that are based on the whole population, rather than just one input subject, will only have a single line in command_history, corresponding to the script invocation. I feel like that's more 'honest' information than including the command history of one arbitrarily-chosen subject in the history of such data.

Closes #1188.

Lestropie added 2 commits July 6, 2018 18:09
Because Python scripts frequently call multiple underlying commands, the key-value field 'command_history' typically gets mangled by including a large number of steps that occur internally within the script and were not executed by the user; additionally, any requisite conversion to NIfTI within a script will result in loss of the command history, as well as any other header key-value fields.
This commit adds the necessary functionalities in order for the header key-value entries of an image generated by an MRtrix3 Python script to behave identically to any C++ executable command: The key-value entries are copied from the input image to the output, and field 'command_history' is updated specifically with the command-line invocation of that particular script.
This is handled internally using a new mrconvert option '-compel_keyvalues', which is appropriately hidden from users.
Closes #1188.
This option will erase two key-value entries from the image header:
- "comments": This frequently contains a subject name from DICOM conversion.
- "command_history": This can contain subject identifying information from file / directory names of either previous steps, or indeed from the command currently being executed; accordingly, removal of this field must occur within function Header::create(), as the concatenation of the currently-executed command should also be suppressed if the goal is to yield anonymised data.
Requirement logged in #1188.
@Lestropie
Copy link
Member Author

Current code:

$ dwi2response tournier dwi.mif -mask mask.mif test.txt -force -voxels test.mif
...
$ mrinfo test.mif 
************************************************
Image:               "test.mif"
************************************************
  ...
  command_history:   /home/rob/src/mrtrix3/bin/mrconvert "/home/rob/MRI_data/SIFT2/diffusion/dwi.mif" "-" "-strides" "0,0,0,1"  (version=3.0_RC3-73-g343dba54-dirty)
  [9 entries]        /home/rob/src/mrtrix3/bin/dwiextract "-" "/home/rob/MRI_data/SIFT2/diffusion/dwi2response-tmp-1IOC8S/dwi.mif" "-singleshell" "-no_bzero"  (version=3.0_RC3-73-g343dba54-dirty)
                     ...
                     /home/rob/src/mrtrix3/bin/mrthreshold "iter5_CF.mif" "-top" "300" "iter5_SF.mif"  (version=3.0_RC3-73-g343dba54-dirty)
                     /home/rob/src/mrtrix3/bin/mrconvert "voxels.mif" "/home/rob/MRI_data/SIFT2/diffusion/test.mif" "-force"  (version=3.0_RC3-73-g343dba54-dirty)
  ...

This PR:

$ dwi2response tournier dwi.mif -mask mask.mif test.txt -force -voxels test.mif
...
$ mrinfo test.mif 
************************************************
Image:               "test.mif"
************************************************
  ...
  command_history:   /home/rob/src/mrtrix3/bin/dwi2response "tournier" "dwi.mif" "-mask" "mask.mif" "test.txt" "-force" "-voxels" "test.mif"  (version=3.0_RC3-37-g6bd397ce)
  ...

@thijsdhollander: This also adds an -anonymise option to mrconvert to prevent command_history from appearing at all in the output file, which should solve the issue you had with the workshop data.

@jdtournier
Copy link
Member

Looks good.

Not sure about the -anonymise option though. It seems the same can be achieved with mrconvert -clear command_history -clear comments? I'm not sure this warrants a whole new option when we can simply document this. Also, I'm not convinced it's necessarily a good idea to call this -anonymise, this implies some level of guarantee that I'm not convinced we can provide - there may be patient identifiable information in other (custom) fields that would be totally ignored (unlikely, admittedly, but possible). Whereas I think simply documenting the process for clearing these fields doesn't hold us responsible for ensuring actual anonymity in the output data (at least not to the same extent).

Also, as implemented you've put the handling of the -anonymise option deep in the core header handling code, in the main library - but only mrconvert has that option available. I'm guessing you're anticipating this becoming available in other commands too? In which case it would make sense to have it in the standard options section. Otherwise, I'd say the handling of this option should remain within cmd/mrconvert.cpp.

@Lestropie
Copy link
Member Author

The problem with mrconvert -clear_property command_history is that the current command will still be written to this field within Header::create() after cmd/mrconvert.cpp has wiped it, and that one line may contain identifying information; that's specifically the issue Thijs ran into with the workshop data. That's why handling wiping this field needs to be in Header::create().

In retrospect though, an alternative solution would be to ditch the -anonymise option, and instead look for the presence of -clear_property command_history within Header::create(), using the presence of that option to prevent addition of the one line representing the current command. This wouldn't have the same "implied guarantees" than an "-anonymise" option may...?

@thijsdhollander
Copy link
Contributor

Also, I'm not convinced it's necessarily a good idea to call this -anonymise, this implies some level of guarantee that I'm not convinced we can provide - there may be patient identifiable information in other (custom) fields that would be totally ignored (unlikely, admittedly, but possible).

Yeah, I agree with that entirely. Also the reverse, I think: there are quite feasible scenarios where anonymising would not strictly have to imply deleting the command history. But it's a sneaky one, since folder names can end up invariably in the command history, even actually caused by e.g. other earlier steps that performed some kind(s) of anonymisation, but in doing so, those commands inserted sensitive information in the command history. From this, it's clear that a page in the documentation entirely dedicated to the topic of anonimisation wouldn't be a bad idea (if anyone has the time to write it, that is). So well, for all those reasons, I wouldn't include this -anonymise option with this functionality either. Also, anything of this kind of functionality (even just the options to remove command history entries or comments) should definitely not appear in other commands (but mrconvert), I'd say.

That of course doesn't do away with the fact that...

The problem with mrconvert -clear_property command_history is that the current command will still be written to this field within Header::create() after cmd/mrconvert.cpp has wiped it,

...but then entirely independent of the other (anonymisation related) fact that...

and that one line may contain identifying information; that's specifically the issue Thijs ran into with the workshop data. That's why handling wiping this field needs to be in Header::create().

So all anonymisation entirely aside (even though extra sneaky in that context), when I encountered this, I found this more of a case of conflicting post-conditions from a user interface point of view (i.e. user interfacing with the command; agnostic of implementation). Our post-conditions towards the user aren't entirely formalised, but are often (intuitively) implied. One of them is that running a command will (typically) append it to the command history. Another one is that mrconvert -clear_property clears a property, with "clear" meaning removing the corresponding header field and contents, so the post-condition being that they are no longer present after this option has been applied. Fulfilling both in a mrconvert -clear_property command_history is clearly impossible. Introducing a notion of the order of both operations towards the user is violating good encapsulation (and still violating a post-condition anyway, by the way). However, since these post-conditions aren't formalised anywhere, the question is which one is intuitively in the users' mind/intention when they run mrconvert -clear_property command_history. This is by definition subjective, but I'd argue that them actively specifying that they wish to clear the property, means that that goal and the associated post-condition is their first and foremost interest. Also, many "typical" users aren't even relying on the command history for anything, I'd think (most of them are probably not running commands with the intention of adding stuff to the command history; it's just a side effect, an "optional" feature if you will). So well, I'd then argue that mrconvert -clear_property command_history needs to fulfil the post-condition of removing that property (entirely). Also, just from a practical point of view, I don't think there's much use (to anyone even?) to have that particular command in the command history...?

look for the presence of -clear_property command_history within Header::create(), using the presence of that option to prevent addition of the one line representing the current command.

So I'd agree with that. 👍 (if that is the cleanest way to achieve this? Sadly still a bit of an ugly hack, but there may simply be no other way...?)

This wouldn't have the same "implied guarantees" than an "-anonymise" option may...?

No, certainly not. The only promise there is "to remove the command history field and contents". Actually, at the moment that is the somewhat "implied" guarantee that is not yet delivered upon. So fixing this would better serve the user; even in a scenario where they're after anonymisation step(s) and wouldn't want to be caught off-guard by the current somewhat "sneaky" behaviour.

@jdtournier
Copy link
Member

Right, I'd forgotten that particular issue (that -clear command_history was applied prior to adding the current command to the history). That clearly needs to be fixed. Glad we're all on side regarding the -anonymise option.

As to how best to achieve that, well I guess the idea of inserting the handling of the option in the Header::create() is an appealing candidate. There is another option however, which I reckon might be better and cleaner: we move the lines responsible for appending to the command history from the Header::create() call to the Header constructors - just add a Header::append_command_to_history() call or something, and invoke in all constructors (apart from move constructor). In practice, this means nothing changes in terms of how the rest of the code is written, but it does mean that manipulations to the command history before the Header::create() will be final. What do you reckon?

There is one potential issue with this approach, which is that we'd get duplicate entries in the history if a Header is constructed (either fresh or by copy), and then copy-constructed again subsequently, and that copy is then fed to Header::create(). I'd need to check whether that might be a serious problem - I have a feeling such operations happen routinely to handle numbered file formats (i.e. here and here), I'd need to check that this doesn't introduce issues... If it does, then the handling would need to remain where it is, as would the explicit clearing of the history - can't really see a way around that.

@jdtournier
Copy link
Member

Actually, on second thoughts, I can't see how my idea of appending to history from Header constructor can work. That would mean the Header::open() would insert the current command as the first entry in the history, before it's even started loading anything, since it's using a local default-constructed Header. Not a great idea...

- Remove -anonymise option added in 6bd397c.
- When -clear_property command_history is used, prevent concatenation of the current command string to the "command_history" property within Header::create().
based on discussion in #1389.
@Lestropie
Copy link
Member Author

👍?

@jdtournier
Copy link
Member

Sorry for taking my time over this, difficult to find the time with everything going on ATM...

Having had a look through your proposed changes, there's a few things about it that I'd like to modify. In no particular order:

  1. I'm not sold on the hidden -compel_keyvalues idea. I'd rather we had an explicit, non-hidden -copy_properties original option, which would transfer all generic header entries from original over to the output image wholesale (this might be a useful option to have anyway). You could then additionally use the -set_property command_history "mrconvert blah blah" to override the command-line history (or maybe the -append_property option would be better suited here?). It would then be a matter of ensuring the current command wasn't also added to the command history, much like we've been discussing already.

  2. I'm still uneasy with so much of the handling being in the main Header::create() call. I think I'd rather add a bool add_command_to_history = true default argument added to the Header::create() and Image::create() calls, which mrconvert could then tap into to prevent the current command from being added to the history. This would mean that all command history manipulations could be performed within the mrconvert code, removing the need for such explicit manipulation in the library itself.

  3. Approach (1) above would remove the need for hidden options, which I'm not a fan of - if we need to hide functionality like this, I reckon we'd be better off using environment variables or similar approaches, which bypass the command-line parsing altogether. In fact, it might be an idea to modify the current App::parse_special_options() call to such an approach... Means we'd need to modify docs/generate_user_docs.sh to run something like:

    __MRTRIX_DUMP_HELP_PAGE__=SYNOPSIS command
    

    instead of:

    command __print_synopsis__
    

    as is currently the case. But I digress...

I realise this means fairly drastic changes to your pull request... What do you reckon...?

@jdtournier jdtournier force-pushed the scripts_command_history branch from 5795a94 to 31fd405 Compare July 20, 2018 15:59
@jdtournier
Copy link
Member

OK, I had a go at modifying the handling to match what I was suggesting. I think it's cleaner that way, and involves fewer modifications to the core. It'll need testing to make sure it works as advertised, and also the testing will need to be fixed up (not sure why Travis is struggling so much at the moment, but there's a lot of errors on multiple fronts...). Let me know what you think...

Using type_image_in() identifier for the input argument to this option would have precluded import of header key-value pairs via JSON file, which is utilised within dwipreproc to erase specific key-value pairs upon script completion.
Functionality of loading key-value entries from a JSON file was moved from core/header.cpp to cmd/mrconvert.cpp in 31fd405.
Rather than just detecting the use of -copy_properties and deleting the next five entries (assuming the use of function app.mrconvertOutputOption()), which may cause issues if a script were to use mrconvert -copy_properties in some other way, be more selective in the detection and removal of these options from the terminal printout.
@Lestropie
Copy link
Member Author

Some small tweaks; looks fine, but I need to confirm that dwipreproc works before merging.

@Lestropie
Copy link
Member Author

Confirmed that dwipreproc works and that the header fields are correct. So happy for this to go now to dev, unless anybody else wants to test it further beforehand.

@jdtournier jdtournier merged commit f831ec3 into dev Aug 1, 2018
@jdtournier jdtournier deleted the scripts_command_history branch August 1, 2018 14:26
@jdtournier
Copy link
Member

Merged. This is going to dev, it's the best place for it to receive more exhaustive testing...

Lestropie added a commit that referenced this pull request Jan 13, 2019
Function app.mrconvertOutputOption() can only be applied when run.command() is calling mrconvert; the command-line options it exploits are not present in other binaries.
Lestropie added a commit that referenced this pull request Jan 14, 2019
dwiintensitynorm: Fix bug introduced in #1389
Lestropie added a commit that referenced this pull request Jan 16, 2019
dwipreproc performs explicit removal of unwanted header fields from the output DWI, as introduced in 47e7288 / #1389, by exporting the DWI header key-value pairs to a JSON file and explicitly editing the JSON contents. The writing of the input JSON file was however lost during the process of resolving divergent code in 7617787. This commit fixes the issue.
Correct operation of code for selecting & executing different versions of eddy - including if both CUDA and OpenMP versions are installed but the former does not run correctly - has been verified for the changes in #1449.
Lestropie added a commit that referenced this pull request Apr 26, 2019
Related to 5eec075; follows #1389.
Because the contents of the current command string that are to be appended to "command_history" may itself contain escape quotes (e.g. for filepaths containing spaces), this becomes error-prone if trying to parse within the input to run.command(). Therefore, with these changes, the intended "template" image / file from which key-value pairs will be pulled during a final mrconvert call at the end of a Python script is instead handled by providing a named argument to run.command(), "-mrconvert_keyval". run.command() is then responsible for inserting the relevant command-line options to the mrconvert call. Coincidentally this also makes it much easier to avoid including these particular command-line options in the terminal output feedback generated by run.command().
Due to these modifications, the management of forcing output file overwrite has also been moved to a named input argument for run.command().

Similarly to C++ in 5eec075:
- "COMMAND_STRING" has been renamed to "COMMAND_HISTORY_STRING" to better reflect its contents.
- Functions matrix.save_*() accept dictionaries via the "header" and "footer" named arguments, and will write the corresponding key-value pairs as comments in the output text file.
- Boolean controlling whether or not the current command string should be appended to "command_history" is provided as a named argument.

Note changes also include basic fixes to matrix.load_numeric() and matrix.save_numeric().
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants