Skip to content

Implement Pydra code-generators#2665

Open
tclose wants to merge 28 commits intoMRtrix3:devfrom
tclose:pydra-usage
Open

Implement Pydra code-generators#2665
tclose wants to merge 28 commits intoMRtrix3:devfrom
tclose:pydra-usage

Conversation

@tclose
Copy link
Contributor

@tclose tclose commented Jun 29, 2023

Implementation of #2605

Tasks:

  • __print_pydra_code__ for C++ commands.
  • __print_pydra_code__ for Python commands
  • Script to loop through all cmds and print their "pydra usage" to *.py files
  • Fix up package TOML (maybe switch to hatchling?)
  • GitHub action to be run on new release to upload pydra-mrtrix3 package to PyPI
  • Tests to check generated code can print cmdlines properly (@Lestropie, do you have any ideas for this? I was thinking if there was a hidden "dry-run" option then

Blocked by:

@Lestropie
Copy link
Member

As mentioned in #2605, Python will require #1392 via ankitasanil#1 to have typed command-line parameters.
@ankitasanil: Still interest in completing the PR, or will @MRtrix3/mrtrix3-devs need to finish it off?

@Lestropie Lestropie marked this pull request as draft June 29, 2023 04:44
@tclose
Copy link
Contributor Author

tclose commented Jun 30, 2023

As mentioned in #2605, Python will require #1392 via ankitasanil#1 to have typed command-line parameters. @ankitasanil: Still interest in completing the PR, or will @MRtrix3/mrtrix3-devs need to finish it off?

I have written the Python version as best as I can with the current level of typing, but you are right that it will need more type information to get it to work in a guaranteed way.

On typing of files, how many different formats does MRtrix interact with (e.g. bvec, bval, response.txt, etc...) and would it be too much to ask to be able to give each format its own ArgType (like Image* and Track* do currently)? Because I have a PR that refactors Pydra's type-checking to be aware of file-formats (nipype/pydra#662), and it would be great to be able to specify them exactly instead of defaulting to a generic FileSet class.

@Lestropie
Copy link
Member

how many different (file) formats does MRtrix interact with

Assuming upstream exclusion of images & tractograms from that set, here's what comes to mind:

  • 1D & 2D numerical data input & output (currently ASCII only but also NPY after Npy file format support & half-precision floating-point #2437)
  • TSFs
  • JSONs (in multiple contexts)
  • .dcm files
  • Other text files:
    • Lists of paths to other files (eg. stats command inputs)
    • Generic tabulated data (eg. parcellation lookup tables but others also)

As far as further custom types in user interfacing, I would argue that:

  • That could potentially be the case for TSFs / JSONs / .dcms, since those represent singular unambiguous cases of file extension / content of data / requisite parsing code / interpretation of data.
  • The specific examples you provide (bvec / bval / response functions) I'm more sceptical of, as they are just numerical data, albeit sometimes with expectations of dimensionality / size; they're just subsets of "Nd numerical data" to which certain interpretations are applied after the data are parsed.

Curious to hear others' thoughts.

@tclose
Copy link
Contributor Author

tclose commented Jun 30, 2023

how many different (file) formats does MRtrix interact with

Assuming upstream exclusion of images & tractograms from that set, here's what comes to mind:

  • 1D & 2D numerical data input & output (currently ASCII only but also NPY after Npy file format support #2437)

  • TSFs

  • JSONs (in multiple contexts)

  • .dcm files

  • Other text files:

    • Lists of paths to other files (eg. stats command inputs)
    • Generic tabulated data (eg. parcellation lookup tables but others also)

As far as further custom types in user interfacing, I would argue that:

  • That could potentially be the case for TSFs / JSONs / .dcms, since those represent singular unambiguous cases of file extension / content of data / requisite parsing code / interpretation of data.
  • The specific examples you provide (bvec / bval / response functions) I'm more sceptical of, as they are just numerical data, albeit sometimes with expectations of dimensionality / size; they're just subsets of "Nd numerical data" to which certain interpretations are applied after the data are parsed.

Curious to hear others' thoughts.

Yes, maybe file formats is a bit of a misnomer, data types would be more accurate/actually what is captured in Pydra. In any case, it would be very helpful on the Pydra code-gen front if these file-data-types could be specified at bit finer granularity

@tclose
Copy link
Contributor Author

tclose commented Jul 12, 2023

Ok, so this just needs some unit tests to test out the generated Pydra interfaces and to wait for #1392.

@Lestropie, on the unit tests, do you think it would be an issue if I added a --dry-run option to the general app options, which basically tests to see whether the provided options parse and then exits?

@tclose
Copy link
Contributor Author

tclose commented Jul 12, 2023

Ok, so this just needs some unit tests to test out the generated Pydra interfaces and to wait for #1392.

@Lestropie, on the unit tests, do you think it would be an issue if I added a --dry-run option to the general app options, which basically tests to see whether the provided options parse and then exits?

If you don't want to clutter the help menu further, maybe there could be special configure arg to insert it in?

@Lestropie
Copy link
Member

One note I might throw in here just because I encountered it while working on the code and it's of some limited relevance.

For a command-line option, in C++, you specify the option, and then you can specify some number of compulsory command-line arguments that must follow that option, and get consumed from the command-line by the parser immediately following the appearance of the option. Each of those arguments may have its own type.

In Python, this can't currently be done. In argparse, the option itself is assigned a type, as well as a number of arguments. Therefore you can't have >1 arguments following a command-line option where those arguments have different types to one another.

@tclose tclose force-pushed the pydra-usage branch 3 times, most recently from cb2e711 to 0dff23a Compare August 22, 2023 07:44
@daljit46
Copy link
Member

daljit46 commented Aug 24, 2023

Please note that this PR will need to conform to the formatting requirements introduced in #2652, thus all C++ files need to be formatted using clang-format (version 16). See here.

@tclose
Copy link
Contributor Author

tclose commented Aug 25, 2023

In Python, this can't currently be done. In argparse, the option itself is assigned a type, as well as a number of arguments. Therefore you can't have >1 arguments following a command-line option where those arguments have different types to one another.

I think the typical way to represent this is to use the typing.Tuple[MyType1, MyType2] form

@Lestropie
Copy link
Member

I think the typical way to represent this is to use the typing.Tuple[MyType1, MyType2] form

Having had a small taste of Pydra now, this might be better than doing the most permissible thing of just accepting any string in such a scenario, but doesn't fully address the issue to the extent that the MRtrix3 C++ interface does, at least to the best of my understanding.

Say for instance a command-line option takes as input a string and then an integer. Yes you could make that command-line option a typing.Tuple[str, int], but that would mean that the user could erroneously specify an integer and then a string, and that wouldn't be caught by the command-line parser.

PS. Relevance to #2580: making the argument types in C++ a bitwise flag indicating what set of types is permissible would be faithful to the typing.Tuple[MyType1, MyType2] usage I've seen.

@Lestropie
Copy link
Member

@tclose: After discussion with @jdtournier, there is trepidation about embedding the Pydra interfaces within the repository.

On the plus side, it would enforce consistency, PR changeset would show interface changes that might otherwise be invisible to the CI checks of the online documentation, and be convenient from the perspective of workflow managers grabbing both the software and the interface to it.

On the minus side, it's implicitly taking on management responsibility for interfaces to an environment for which none of the core development team have experience; even if the interfaces were to be committed in a functional state, and the MRtrix3 commands were to not change, it would nevertheless be possible for there to be changes to Pydra that would necessitate revisions here.

What would it look like from the Pydra perspective if the MRtrix3 Pydra interface were to be provided as a stand-alone repository? Ie. What would be the magnitude of inconvenience? There's obviously the issue that on a tagged release of MRtrix3 you would need to update that external repository and generate a corresponding tag.

@tclose
Copy link
Contributor Author

tclose commented Feb 28, 2024

@Lestropie @jdtournier just to clarify that the interfaces are not being stored, instead generated from the MRtrix C++ & Python command declarations (see __print_usage_pydra__). Therefore, the interfaces should evolve with changes to the MRtrix commands as long as the command definition syntax stays stable.

It would be great if the print_usage_pydra special command (or whatever it is called) can live in the MRtrix source, as I'm not sure it will be easy to implement/maintain that in a separate code-base (if possible at all). However, the generation code and Python package can be stored in a separate repository if required. This is what has been typically done for other toolboxes.

We are hoping to move to a model where the Pydra interfaces could be integrated into the tool source repos to make it easier for them them stay in step with each other. Given it is my code, I'm happy to put my hand up to deal with any issues that might arise, as I would need to do this if it were stored in a separate repository. But in any case, totally up to you guys, I can integrate the changes into the existing nipype/pydra-mrtrix3 repo instead.

@jdtournier
Copy link
Member

Hey @tclose!
Just a few points/questions:

No problem at all from our point of view to include the code responsible for producing the interface definitions. That would indeed be really tricky to maintain outside of the core repo, but it's not something that I expect to change much once implemented, and it makes sense for it to be there. I would opt for a name such as __print_usage_json__ (assuming Pydra does expect JSON...) just to avoid giving the impression that this can only be used in Pydra - if it's JSON, there's no reason it can't be used in other contexts as well .

The bigger question from my point of view is whether to store the generated definitions in the core repo. It's trivial to generate them at build-time, and maybe make it part of the CMake install or something. But that doesn't require the generated interface definitions themselves to also be stored on the main repo. What I was trying to figure out was how you envisage these files to be used in practice. If they're stored alongside a full MRtrix install, it would be possible for an application like Pydra to query them at run-time and offer a version of the interface that is always guaranteed to be in lock-step with the version of MRtrix installed. If they're stored elsewhere, then there is always the risk of a mismatch between the version installed and the interface files, which could lead to odd errors. I expect with that kind of approach, you would need to host interface definitions for each version of MRtrix, along with the ability to determine the current version of MRtrix, and would only guarantee correct operation for official release versions. But the main reason I'm discussing this is because in the former case, I don't see a need to host these interface definitions on the main repo, while I might see a call for that with the latter approach (though I still think it would be cleaner from our point of view to have them hosted elsewhere, presumably somewhere more obviously related to the Pydra project).

What do you reckon?

@Lestropie
Copy link
Member

It might be the case that we need to consider separately:

  1. The implementation of __print_usage_pydra__
    (which without a doubt should be implemented within the main repository, even if it may one day need to be changed in response to Pydra changes)

  2. The storage of the outcomes of __print_usage_pydra__, which would extend Generate interfaces #2808, and potentially not require any more developer effort (eg. have a single script that generates both the interfaces and the online documentation), and CI / code review would potentially highlight CLI changes not exposed by help page changes.

  3. The additional content currently present in this PR.
    Current discussion with @jdtournier may have overlooked the fact that there's a lot of additional content here over and above the direct outputs from __print_usage_pydra__. Which if your concern is developer maintenance burden, this would exacerbate that quite a lot.

I wondered about an intermediate solution where we do 1 and 2 but not 3. However I don't think that that solves @jdtournier's concerns, and it might not make @tclose's handling any cleaner.

@tclose
Copy link
Contributor Author

tclose commented Feb 29, 2024

Ooops, I just realised that in between switching between the old build script and the cmake version I inadvertently committed my old pre-cmake build directory to the branch, which might have been confusing things. After speaking to Rob today we decided to strip everything out of this PR except to the print_usage_pydra special methods in favour of storing the package stub in https://github.com/nipype/pydra-mrtrix3. However, if you don't mind, I have kept in the GH action workflow which generates the interfaces and runs the tests with the PARSE_ONLY env var set.

The Pydra export is actually Python code rather than JSON so it will need to be specific to Pydra if that is ok. Also, the interface syntax is likely/hopefully going to change a fair bit in the near future if my proposal is accepted, so we can probably just leave this PR as a WIP until then (I can build and release the package on PyPI with this dev branch in the mean time) and the first cmake release.

@Lestropie
Copy link
Member

  • Add test data and tests for new interface generation capability (both C++ and Python)

@tclose
Copy link
Contributor Author

tclose commented Aug 28, 2025

@Lestropie do you know why I would be getting

github.GithubException.GithubException: Resource not accessible by integration: 403 {"message": "Resource not accessible by integration", "documentation_url": "https://docs.github.com/rest/pulls/reviews#create-a-review-for-a-pull-request", "status": "403"}

in the clang-tidy-review and

Error: _diff_image: [ERROR] images "tmp/cfe.mif" and "fixelcfestats/default/cfe.mif" do not match within absolute precision of 9.9999999999999995e-07 (5114.81 vs 5114.81)
Error: _diff_image: [ERROR] exception thrown from one or more threads "loop threads"
CMake Error at /Users/runner/work/mrtrix3/mrtrix3/cmake/RunTest.cmake:17 (message):
  Test
  /Users/runner/work/mrtrix3/mrtrix3/testing/binaries/tests/fixelcfestats/default
  failed!

in the macos test?

@tclose tclose changed the title WIP: Implement Pydra code-generators Implement Pydra code-generators Aug 28, 2025
@tclose tclose marked this pull request as ready for review August 28, 2025 01:44
@Lestropie
Copy link
Member

First one I've not seen before. I'm not familiar with the API permissions necessary for allowing the clang tidy review action to post to PRs. Hopefully @daljit46 can resolve.

Second one is due to #3149 not yet having been merged back to dev. I need to first check #3101 and merge that before dealing with the post-3.0.5 master changes.

@tclose
Copy link
Contributor Author

tclose commented Aug 28, 2025

  • Add test data and tests for new interface generation capability (both C++ and Python)

I have set up the pydra.yaml GH action to clone the https://github.com/nipype/pydra-tasks-mrtrix3 run the generated pytest tests. Currently these are all just marked as XFAIL but it will test whether the generated code is valid Pydra at least

@tclose
Copy link
Contributor Author

tclose commented Aug 28, 2025

#3074

For complicated corner cases, I think manual post-generation edits in https://github.com/nipype/pydra-tasks-mrtrix3 will end up being necessary

@tclose
Copy link
Contributor Author

tclose commented Nov 28, 2025

I have refactored the code so that it matches the new argument parsing api. I believe this is good to merge now. I'm not sure what the issue with the clang-tidy-review is, doesn't seem to be related to my code.

Is it possible to get this merged in before any more API changes are done? Just takes me a while to work out the new syntax and update my code each time.

@tclose
Copy link
Contributor Author

tclose commented Nov 28, 2025

Looks like the clang-tidy-review error is a 403 permissions issue

@tclose
Copy link
Contributor Author

tclose commented Nov 28, 2025

Just to confirm that the new types member of ArgTypesFlag represents an or operation

@daljit46
Copy link
Member

The clang-tidy-review issue is due to the PR being created against a fork rather than a branch. I will make a PR at some point to get around that.

@tclose
Copy link
Contributor Author

tclose commented Dec 1, 2025

Ok, I think I have addressed all those code-style issues now

@tclose
Copy link
Contributor Author

tclose commented Dec 7, 2025

Is there any chance this request can get merged in now or does it need to wait for @daljit46 to fix the clang review? If the latter, can we move the PR branch into the main repo so it can run?

Also, once it is done, is there any chance you could tag a pre-release version I could point to in my builds?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants