Implement Pydra code-generators by tclose · Pull Request #2665 · MRtrix3/mrtrix3

tclose · 2023-06-29T03:55:41Z

Implementation of #2605

Tasks:

__print_pydra_code__ for C++ commands.
__print_pydra_code__ for Python commands
Script to loop through all cmds and print their "pydra usage" to *.py files
Fix up package TOML (maybe switch to hatchling?)
GitHub action to be run on new release to upload pydra-mrtrix3 package to PyPI
Tests to check generated code can print cmdlines properly (@Lestropie, do you have any ideas for this? I was thinking if there was a hidden "dry-run" option then

Blocked by:

Complete run.command(): Support for command-line piping #1392 so that Python commands type file-types

Lestropie · 2023-06-29T04:44:37Z

As mentioned in #2605, Python will require #1392 via ankitasanil#1 to have typed command-line parameters.
@ankitasanil: Still interest in completing the PR, or will @MRtrix3/mrtrix3-devs need to finish it off?

tclose · 2023-06-30T00:45:22Z

As mentioned in #2605, Python will require #1392 via ankitasanil#1 to have typed command-line parameters. @ankitasanil: Still interest in completing the PR, or will @MRtrix3/mrtrix3-devs need to finish it off?

I have written the Python version as best as I can with the current level of typing, but you are right that it will need more type information to get it to work in a guaranteed way.

On typing of files, how many different formats does MRtrix interact with (e.g. bvec, bval, response.txt, etc...) and would it be too much to ask to be able to give each format its own ArgType (like Image* and Track* do currently)? Because I have a PR that refactors Pydra's type-checking to be aware of file-formats (nipype/pydra#662), and it would be great to be able to specify them exactly instead of defaulting to a generic FileSet class.

Lestropie · 2023-06-30T01:54:12Z

how many different (file) formats does MRtrix interact with

Assuming upstream exclusion of images & tractograms from that set, here's what comes to mind:

1D & 2D numerical data input & output (currently ASCII only but also NPY after Npy file format support & half-precision floating-point #2437)
TSFs
JSONs (in multiple contexts)
.dcm files
Other text files:
- Lists of paths to other files (eg. stats command inputs)
- Generic tabulated data (eg. parcellation lookup tables but others also)

As far as further custom types in user interfacing, I would argue that:

That could potentially be the case for TSFs / JSONs / .dcms, since those represent singular unambiguous cases of file extension / content of data / requisite parsing code / interpretation of data.
The specific examples you provide (bvec / bval / response functions) I'm more sceptical of, as they are just numerical data, albeit sometimes with expectations of dimensionality / size; they're just subsets of "Nd numerical data" to which certain interpretations are applied after the data are parsed.

Curious to hear others' thoughts.

tclose · 2023-06-30T02:05:11Z

how many different (file) formats does MRtrix interact with

Assuming upstream exclusion of images & tractograms from that set, here's what comes to mind:

1D & 2D numerical data input & output (currently ASCII only but also NPY after Npy file format support #2437)

TSFs

JSONs (in multiple contexts)

.dcm files

Other text files:

Lists of paths to other files (eg. stats command inputs)

Generic tabulated data (eg. parcellation lookup tables but others also)

As far as further custom types in user interfacing, I would argue that:

That could potentially be the case for TSFs / JSONs / .dcms, since those represent singular unambiguous cases of file extension / content of data / requisite parsing code / interpretation of data.

The specific examples you provide (bvec / bval / response functions) I'm more sceptical of, as they are just numerical data, albeit sometimes with expectations of dimensionality / size; they're just subsets of "Nd numerical data" to which certain interpretations are applied after the data are parsed.

Curious to hear others' thoughts.

Yes, maybe file formats is a bit of a misnomer, data types would be more accurate/actually what is captured in Pydra. In any case, it would be very helpful on the Pydra code-gen front if these file-data-types could be specified at bit finer granularity

tclose · 2023-07-12T06:23:55Z

Ok, so this just needs some unit tests to test out the generated Pydra interfaces and to wait for #1392.

@Lestropie, on the unit tests, do you think it would be an issue if I added a --dry-run option to the general app options, which basically tests to see whether the provided options parse and then exits?

tclose · 2023-07-12T06:28:28Z

Ok, so this just needs some unit tests to test out the generated Pydra interfaces and to wait for #1392.

@Lestropie, on the unit tests, do you think it would be an issue if I added a --dry-run option to the general app options, which basically tests to see whether the provided options parse and then exits?

If you don't want to clutter the help menu further, maybe there could be special configure arg to insert it in?

Lestropie · 2023-08-18T01:17:36Z

One note I might throw in here just because I encountered it while working on the code and it's of some limited relevance.

For a command-line option, in C++, you specify the option, and then you can specify some number of compulsory command-line arguments that must follow that option, and get consumed from the command-line by the parser immediately following the appearance of the option. Each of those arguments may have its own type.

In Python, this can't currently be done. In argparse, the option itself is assigned a type, as well as a number of arguments. Therefore you can't have >1 arguments following a command-line option where those arguments have different types to one another.

daljit46 · 2023-08-24T09:19:40Z

Please note that this PR will need to conform to the formatting requirements introduced in #2652, thus all C++ files need to be formatted using clang-format (version 16). See here.

tclose · 2023-08-25T05:22:26Z

In Python, this can't currently be done. In argparse, the option itself is assigned a type, as well as a number of arguments. Therefore you can't have >1 arguments following a command-line option where those arguments have different types to one another.

I think the typical way to represent this is to use the typing.Tuple[MyType1, MyType2] form

Lestropie · 2024-02-05T02:08:58Z

I think the typical way to represent this is to use the typing.Tuple[MyType1, MyType2] form

Having had a small taste of Pydra now, this might be better than doing the most permissible thing of just accepting any string in such a scenario, but doesn't fully address the issue to the extent that the MRtrix3 C++ interface does, at least to the best of my understanding.

Say for instance a command-line option takes as input a string and then an integer. Yes you could make that command-line option a typing.Tuple[str, int], but that would mean that the user could erroneously specify an integer and then a string, and that wouldn't be caught by the command-line parser.

PS. Relevance to #2580: making the argument types in C++ a bitwise flag indicating what set of types is permissible would be faithful to the typing.Tuple[MyType1, MyType2] usage I've seen.

Lestropie · 2024-02-28T09:40:02Z

@tclose: After discussion with @jdtournier, there is trepidation about embedding the Pydra interfaces within the repository.

On the plus side, it would enforce consistency, PR changeset would show interface changes that might otherwise be invisible to the CI checks of the online documentation, and be convenient from the perspective of workflow managers grabbing both the software and the interface to it.

On the minus side, it's implicitly taking on management responsibility for interfaces to an environment for which none of the core development team have experience; even if the interfaces were to be committed in a functional state, and the MRtrix3 commands were to not change, it would nevertheless be possible for there to be changes to Pydra that would necessitate revisions here.

What would it look like from the Pydra perspective if the MRtrix3 Pydra interface were to be provided as a stand-alone repository? Ie. What would be the magnitude of inconvenience? There's obviously the issue that on a tagged release of MRtrix3 you would need to update that external repository and generate a corresponding tag.

tclose · 2024-02-28T11:47:31Z

@Lestropie @jdtournier just to clarify that the interfaces are not being stored, instead generated from the MRtrix C++ & Python command declarations (see __print_usage_pydra__). Therefore, the interfaces should evolve with changes to the MRtrix commands as long as the command definition syntax stays stable.

It would be great if the print_usage_pydra special command (or whatever it is called) can live in the MRtrix source, as I'm not sure it will be easy to implement/maintain that in a separate code-base (if possible at all). However, the generation code and Python package can be stored in a separate repository if required. This is what has been typically done for other toolboxes.

We are hoping to move to a model where the Pydra interfaces could be integrated into the tool source repos to make it easier for them them stay in step with each other. Given it is my code, I'm happy to put my hand up to deal with any issues that might arise, as I would need to do this if it were stored in a separate repository. But in any case, totally up to you guys, I can integrate the changes into the existing nipype/pydra-mrtrix3 repo instead.

jdtournier · 2024-02-28T12:19:30Z

Hey @tclose!
Just a few points/questions:

No problem at all from our point of view to include the code responsible for producing the interface definitions. That would indeed be really tricky to maintain outside of the core repo, but it's not something that I expect to change much once implemented, and it makes sense for it to be there. I would opt for a name such as __print_usage_json__ (assuming Pydra does expect JSON...) just to avoid giving the impression that this can only be used in Pydra - if it's JSON, there's no reason it can't be used in other contexts as well .

The bigger question from my point of view is whether to store the generated definitions in the core repo. It's trivial to generate them at build-time, and maybe make it part of the CMake install or something. But that doesn't require the generated interface definitions themselves to also be stored on the main repo. What I was trying to figure out was how you envisage these files to be used in practice. If they're stored alongside a full MRtrix install, it would be possible for an application like Pydra to query them at run-time and offer a version of the interface that is always guaranteed to be in lock-step with the version of MRtrix installed. If they're stored elsewhere, then there is always the risk of a mismatch between the version installed and the interface files, which could lead to odd errors. I expect with that kind of approach, you would need to host interface definitions for each version of MRtrix, along with the ability to determine the current version of MRtrix, and would only guarantee correct operation for official release versions. But the main reason I'm discussing this is because in the former case, I don't see a need to host these interface definitions on the main repo, while I might see a call for that with the latter approach (though I still think it would be cleaner from our point of view to have them hosted elsewhere, presumably somewhere more obviously related to the Pydra project).

What do you reckon?

Lestropie · 2024-02-28T23:09:07Z

It might be the case that we need to consider separately:

The implementation of __print_usage_pydra__
(which without a doubt should be implemented within the main repository, even if it may one day need to be changed in response to Pydra changes)
The storage of the outcomes of __print_usage_pydra__, which would extend Generate interfaces #2808, and potentially not require any more developer effort (eg. have a single script that generates both the interfaces and the online documentation), and CI / code review would potentially highlight CLI changes not exposed by help page changes.
The additional content currently present in this PR.
Current discussion with @jdtournier may have overlooked the fact that there's a lot of additional content here over and above the direct outputs from __print_usage_pydra__. Which if your concern is developer maintenance burden, this would exacerbate that quite a lot.

I wondered about an intermediate solution where we do 1 and 2 but not 3. However I don't think that that solves @jdtournier's concerns, and it might not make @tclose's handling any cleaner.

cmd/amp2response.cpp

tclose · 2024-02-29T07:36:24Z

Ooops, I just realised that in between switching between the old build script and the cmake version I inadvertently committed my old pre-cmake build directory to the branch, which might have been confusing things. After speaking to Rob today we decided to strip everything out of this PR except to the print_usage_pydra special methods in favour of storing the package stub in https://github.com/nipype/pydra-mrtrix3. However, if you don't mind, I have kept in the GH action workflow which generates the interfaces and runs the tests with the PARSE_ONLY env var set.

The Pydra export is actually Python code rather than JSON so it will need to be specific to Pydra if that is ok. Also, the interface syntax is likely/hopefully going to change a fair bit in the near future if my proposal is accepted, so we can probably just leave this PR as a WIP until then (I can build and release the package on PyPI with this dev branch in the mean time) and the first cmake release.

cmd/amp2response.cpp

Lestropie · 2024-03-11T11:13:51Z

Add test data and tests for new interface generation capability (both C++ and Python)

tclose · 2025-08-28T01:42:26Z

@Lestropie do you know why I would be getting

github.GithubException.GithubException: Resource not accessible by integration: 403 {"message": "Resource not accessible by integration", "documentation_url": "https://docs.github.com/rest/pulls/reviews#create-a-review-for-a-pull-request", "status": "403"}

in the clang-tidy-review and

Error: _diff_image: [ERROR] images "tmp/cfe.mif" and "fixelcfestats/default/cfe.mif" do not match within absolute precision of 9.9999999999999995e-07 (5114.81 vs 5114.81)
Error: _diff_image: [ERROR] exception thrown from one or more threads "loop threads"
CMake Error at /Users/runner/work/mrtrix3/mrtrix3/cmake/RunTest.cmake:17 (message):
  Test
  /Users/runner/work/mrtrix3/mrtrix3/testing/binaries/tests/fixelcfestats/default
  failed!

in the macos test?

Lestropie · 2025-08-28T02:05:35Z

First one I've not seen before. I'm not familiar with the API permissions necessary for allowing the clang tidy review action to post to PRs. Hopefully @daljit46 can resolve.

Second one is due to #3149 not yet having been merged back to dev. I need to first check #3101 and merge that before dealing with the post-3.0.5 master changes.

tclose · 2025-08-28T04:15:32Z

Add test data and tests for new interface generation capability (both C++ and Python)

I have set up the pydra.yaml GH action to clone the https://github.com/nipype/pydra-tasks-mrtrix3 run the generated pytest tests. Currently these are all just marked as XFAIL but it will test whether the generated code is valid Pydra at least

tclose · 2025-08-28T04:16:50Z

#3074

For complicated corner cases, I think manual post-generation edits in https://github.com/nipype/pydra-tasks-mrtrix3 will end up being necessary

tclose · 2025-11-28T05:42:32Z

I have refactored the code so that it matches the new argument parsing api. I believe this is good to merge now. I'm not sure what the issue with the clang-tidy-review is, doesn't seem to be related to my code.

Is it possible to get this merged in before any more API changes are done? Just takes me a while to work out the new syntax and update my code each time.

tclose · 2025-11-28T06:08:04Z

Looks like the clang-tidy-review error is a 403 permissions issue

tclose · 2025-11-28T07:27:20Z

Just to confirm that the new types member of ArgTypesFlag represents an or operation

cpp/core/app.cpp

daljit46 · 2025-11-28T13:50:48Z

The clang-tidy-review issue is due to the PR being created against a fork rather than a branch. I will make a PR at some point to get around that.

tclose · 2025-12-01T01:08:04Z

Ok, I think I have addressed all those code-style issues now

tclose · 2025-12-07T23:26:34Z

Is there any chance this request can get merged in now or does it need to wait for @daljit46 to fix the clang review? If the latter, can we move the PR branch into the main repo so it can run?

Also, once it is done, is there any chance you could tag a pre-release version I could point to in my builds?

Lestropie marked this pull request as draft June 29, 2023 04:44

tclose force-pushed the pydra-usage branch from 0a07600 to 8cb732d Compare July 12, 2023 06:37

This was referenced Jul 12, 2023

Python CLI changes #2678

Merged

Runtime test of command-line parsing #2679

Open

Lestropie mentioned this pull request Aug 11, 2023

Docs: Fix missing boolean types in full usage #2693

Merged

tclose force-pushed the pydra-usage branch 3 times, most recently from cb2e711 to 0dff23a Compare August 22, 2023 07:44

This was referenced Jan 31, 2024

mrcat: Change positional argument structure #2792

Merged

Save software interfaces in multiple formats #2798

Open

Lestropie mentioned this pull request Feb 19, 2024

Generate interfaces #2808

Closed

tclose force-pushed the pydra-usage branch from 025b718 to 86271ff Compare February 22, 2024 10:02

tclose commented Feb 29, 2024

View reviewed changes

cmd/amp2response.cpp Outdated Show resolved Hide resolved

tclose commented Feb 29, 2024

View reviewed changes

cmd/amp2response.cpp Outdated Show resolved Hide resolved

run pytest instead of just importing pydra package in GH action

9aa1bc7

tclose changed the title ~~WIP: Implement Pydra code-generators~~ Implement Pydra code-generators Aug 28, 2025

tclose marked this pull request as ready for review August 28, 2025 01:44

Lestropie and others added 6 commits November 17, 2025 13:30

Merge branch 'dev' into pydra-usage

10fde8b

updated pydra code gen to match new argument-handling syntax

9a455ea

debugging generation of pydra code after arg parsing refactor merge

23fff5b

fixed up code to pass syntax check

3024e11

cleaned up app.py import

f4e3bf5

Renamed pydra CI workflow

d2dfdfb

renamed pydra CI workflow

cb6bc45