Conversation
Because Python scripts frequently call multiple underlying commands, the key-value field 'command_history' typically gets mangled by including a large number of steps that occur internally within the script and were not executed by the user; additionally, any requisite conversion to NIfTI within a script will result in loss of the command history, as well as any other header key-value fields. This commit adds the necessary functionalities in order for the header key-value entries of an image generated by an MRtrix3 Python script to behave identically to any C++ executable command: The key-value entries are copied from the input image to the output, and field 'command_history' is updated specifically with the command-line invocation of that particular script. This is handled internally using a new mrconvert option '-compel_keyvalues', which is appropriately hidden from users. Closes #1188.
This option will erase two key-value entries from the image header: - "comments": This frequently contains a subject name from DICOM conversion. - "command_history": This can contain subject identifying information from file / directory names of either previous steps, or indeed from the command currently being executed; accordingly, removal of this field must occur within function Header::create(), as the concatenation of the currently-executed command should also be suppressed if the goal is to yield anonymised data. Requirement logged in #1188.
|
Current code: This PR: @thijsdhollander: This also adds an |
|
Looks good. Not sure about the Also, as implemented you've put the handling of the |
|
The problem with In retrospect though, an alternative solution would be to ditch the |
Yeah, I agree with that entirely. Also the reverse, I think: there are quite feasible scenarios where anonymising would not strictly have to imply deleting the command history. But it's a sneaky one, since folder names can end up invariably in the command history, even actually caused by e.g. other earlier steps that performed some kind(s) of anonymisation, but in doing so, those commands inserted sensitive information in the command history. From this, it's clear that a page in the documentation entirely dedicated to the topic of anonimisation wouldn't be a bad idea (if anyone has the time to write it, that is). So well, for all those reasons, I wouldn't include this That of course doesn't do away with the fact that...
...but then entirely independent of the other (anonymisation related) fact that...
So all anonymisation entirely aside (even though extra sneaky in that context), when I encountered this, I found this more of a case of conflicting post-conditions from a user interface point of view (i.e. user interfacing with the command; agnostic of implementation). Our post-conditions towards the user aren't entirely formalised, but are often (intuitively) implied. One of them is that running a command will (typically) append it to the command history. Another one is that
So I'd agree with that. 👍 (if that is the cleanest way to achieve this? Sadly still a bit of an ugly hack, but there may simply be no other way...?)
No, certainly not. The only promise there is "to remove the command history field and contents". Actually, at the moment that is the somewhat "implied" guarantee that is not yet delivered upon. So fixing this would better serve the user; even in a scenario where they're after anonymisation step(s) and wouldn't want to be caught off-guard by the current somewhat "sneaky" behaviour. |
|
Right, I'd forgotten that particular issue (that As to how best to achieve that, well I guess the idea of inserting the handling of the option in the There is one potential issue with this approach, which is that we'd get duplicate entries in the history if a Header is constructed (either fresh or by copy), and then copy-constructed again subsequently, and that copy is then fed to |
|
Actually, on second thoughts, I can't see how my idea of appending to history from |
|
👍? |
|
Sorry for taking my time over this, difficult to find the time with everything going on ATM... Having had a look through your proposed changes, there's a few things about it that I'd like to modify. In no particular order:
I realise this means fairly drastic changes to your pull request... What do you reckon...? |
Not within core library, as discussed on #1389
5795a94 to
31fd405
Compare
|
OK, I had a go at modifying the handling to match what I was suggesting. I think it's cleaner that way, and involves fewer modifications to the core. It'll need testing to make sure it works as advertised, and also the testing will need to be fixed up (not sure why Travis is struggling so much at the moment, but there's a lot of errors on multiple fronts...). Let me know what you think... |
Using type_image_in() identifier for the input argument to this option would have precluded import of header key-value pairs via JSON file, which is utilised within dwipreproc to erase specific key-value pairs upon script completion.
Functionality of loading key-value entries from a JSON file was moved from core/header.cpp to cmd/mrconvert.cpp in 31fd405.
Rather than just detecting the use of -copy_properties and deleting the next five entries (assuming the use of function app.mrconvertOutputOption()), which may cause issues if a script were to use mrconvert -copy_properties in some other way, be more selective in the detection and removal of these options from the terminal printout.
|
Some small tweaks; looks fine, but I need to confirm that |
|
Confirmed that |
|
Merged. This is going to |
Function app.mrconvertOutputOption() can only be applied when run.command() is calling mrconvert; the command-line options it exploits are not present in other binaries.
dwiintensitynorm: Fix bug introduced in #1389
dwipreproc performs explicit removal of unwanted header fields from the output DWI, as introduced in 47e7288 / #1389, by exporting the DWI header key-value pairs to a JSON file and explicitly editing the JSON contents. The writing of the input JSON file was however lost during the process of resolving divergent code in 7617787. This commit fixes the issue. Correct operation of code for selecting & executing different versions of eddy - including if both CUDA and OpenMP versions are installed but the former does not run correctly - has been verified for the changes in #1449.
Related to 5eec075; follows #1389. Because the contents of the current command string that are to be appended to "command_history" may itself contain escape quotes (e.g. for filepaths containing spaces), this becomes error-prone if trying to parse within the input to run.command(). Therefore, with these changes, the intended "template" image / file from which key-value pairs will be pulled during a final mrconvert call at the end of a Python script is instead handled by providing a named argument to run.command(), "-mrconvert_keyval". run.command() is then responsible for inserting the relevant command-line options to the mrconvert call. Coincidentally this also makes it much easier to avoid including these particular command-line options in the terminal output feedback generated by run.command(). Due to these modifications, the management of forcing output file overwrite has also been moved to a named input argument for run.command(). Similarly to C++ in 5eec075: - "COMMAND_STRING" has been renamed to "COMMAND_HISTORY_STRING" to better reflect its contents. - Functions matrix.save_*() accept dictionaries via the "header" and "footer" named arguments, and will write the corresponding key-value pairs as comments in the output text file. - Boolean controlling whether or not the current command string should be appended to "command_history" is provided as a named argument. Note changes also include basic fixes to matrix.load_numeric() and matrix.save_numeric().
This was giving me the irrits with my own data, so I fixed it.
(More details in commit messages)
Do however need to make sure that all scripts behave as expected before merging. They pass
pylint, but best to be sure.One thing I did with
dwiintensitynorm/population_templateis: outputs that are based on the whole population, rather than just one input subject, will only have a single line incommand_history, corresponding to the script invocation. I feel like that's more 'honest' information than including the command history of one arbitrarily-chosen subject in the history of such data.Closes #1188.