Adjoint matmul in QobjEvo#2802
Conversation
- Add C++ implementations: matmul_csr_dense, matmul_diag_vector - Update Cython interface with new function declarations - Add matmul_dag_dense_dia_dense for adjoint diagonal operations - Update test_mathematics.py with test refactoring
- Add matmul_data and adjoint_rmatmul_data methods - Support scale parameters for efficient accumulation - Update _brtensor.pyx for signature compatibility
- Implement matmul_data for efficient matrix multiplication - Add adjoint_rmatmul_data for right multiplication with adjoint - Support scale parameters for accumulation operations - Update test_qobjevo.py with test refactoring
There was a problem hiding this comment.
Hi @amilsted, thank you very much for your contribution. It will take time to look at this in detail, but it looks like very good and thorough work. Benchmark code would be nice to have (and could potentially be repackaged into a tutorial notebook at some point?)
Storing superoperators not as a big Liouvillian matrix but in a more "analytic" form has been discussed internally for a while, and is a big goal for one of the next major releases. That would be a deeper change, applying not only to Lindblad-type superoperators... however, it will take some time. I imagine that it would make sense to add your feature to QuTiP now, even if it might become obsolete at some later time. We need to have some discussion within the admin team.
The following are just a few things I noticed scrolling through the code for the first time. (Also, please have a look at the warnings generated in the test runs; we treat warnings as errors.)
|
Looks like tests are failing because of warnings. I added coverage for the in-place behavior of the matmul routines, including cases where input and output matrices have different C/Fortran ordering choices (some of the existing code was actually broken for this case). These paths are slower and currently warn. Maybe these should actually be error paths? In any case, at the moment it looks like I just need to teach the tests to expect warnings here. |
Thank you for taking a look! It did occur to me that it might make sense to have a general matrix-form Liouvillian class. I hope that the matmul op work here is useful for future changes along these lines. It wouldn't be hard to generalize LindbladMatrixForm to be a general superoperator class. Could keep the same matmul efficiency if we e.g. handle |
|
Maybe I should also add: the explicit superoperator form for the Liouvillian is still often the fastest for sparse martrices, a according to my benchmarks, so you may not want to drop it entirely! |
|
@amilsted |
|
Thank you for adding mixed C, F order in the tests. It would be good to have tests that check the warnings are properly raised: I don't think they should be error. |
7c64c4b to
602c9fb
Compare
I added some code to just swallow these warnings in all cases. It's currently difficult to detect when they should or shouldn't occur, but I can work on doing that if you prefer. I also did some refactoring as I wasn't quite happy with how the tests were structured. Now there is a separate test variant with its own id for out-of-place as well as each in-place ordering. |
I have now done this. This PR now only includes the matmul stuff. I will make a second for the mesolve stuff. Would you like me to do that now or wait until we are done with this? For now, I have put the remaining changes here: amilsted#1 |
|
Added some benchmark results in a branch here: https://github.com/amilsted/qutip/tree/matmul_bench See the full results here: https://github.com/amilsted/qutip/blob/matmul_bench/BENCHMARKING.md In matmul microbenchmarks:
In JC model dynamics (sesolve and vectorized mesolve, so only exercising mat-vec routines):
|
Ericgig
left a comment
There was a problem hiding this comment.
Thank you for the benchmarks. I will try running them tomorrow.
|
Just wanted to add, thanks for this, looks cool! I had a quick question, I remember looking at what dynamiqs were doing a while back, and they had a nice trick in the Lindblad construction to slightly reduce the number of matrix-matrix multiplications, at the the cost of some small numerical error (https://github.com/dynamiqs/dynamiqs/blob/fdc6c8913bba2b0a03eb0ad314ca01b5a519030b/dynamiqs/integrators/core/diffrax_integrator.py#L297 ). Just curious if you tried this in your benchmarks, and if it was worthwhile. |
Oh man! Yes, I had come across this, but only vaguely remembered it existed. I think it's worth doing for the second PR :) Slight changes to the number of matmul ops can indeed determine the answer to "which solver variant is best". There are many cases where it is a close call! |
Ericgig
left a comment
There was a problem hiding this comment.
I am still running benchmarks, but I am happy with everything except the tests for cases with inputs. Auto detection from signature and adding an argument that you must check that it's not used anywhere except one specific case is not a clean solution.
| # Check if op supports out parameter and out_type is Dense. | ||
| # If so, we will add in-place test variants for both C and F ordered | ||
| # output arrays. | ||
| import inspect | ||
| supports_out = False | ||
| if out_type is Dense: | ||
| try: | ||
| sig = inspect.signature(op) | ||
| supports_out = 'out' in sig.parameters | ||
| except (ValueError, TypeError): | ||
| # Can't introspect (e.g., built-in function), assume no out support | ||
| pass | ||
|
|
||
| # Determine out_prealloc values | ||
| out_prealloc_values = [None, "C", "F"] if supports_out else [None] | ||
|
|
There was a problem hiding this comment.
This is not the way...
This will cause problem if the out argument was ever used in another way or named something else.
It would also skip test were the output is not Dense ,cupy , jax, cuQuantum backends uses non-Dense states in our solvers and we should test them too.
Why not use TernaryOpMixin (or ScaledTernaryOpMixin) with out as simply the third input:
the operation is really A @ B * scale + out. With the output being the same instance as out a bonus.
The function tested will have to be wrapped since scale comes before out (or we could inverse them in the functions).
It would allow to add tests form other specialization easily (says we want to test with a CSR out) and keep test_mathematically_correct and cases_type_shape_product simpler.
There was a problem hiding this comment.
Point taken. That sounds like a better approach!
There was a problem hiding this comment.
I have done this now with a InPlaceMatmulMixin, since the semantics of TernaryOpMixin were a bit different. It does seem a lot cleaner. What do you think?
This time I have verified that the rest of the tests also pass :)
Ericgig
left a comment
There was a problem hiding this comment.
Benchmarks look good on my side.
If you could just remove the unused op_matmul, it's ready.
I saw you removed the towncrier entry. Knowing that there is a follow up PR, it's fine, but this alone is lot of work and deserve a mention.
Removed the op and added the towncrier entry. |
This PR introduces a matrix-form master-equation solver variant, which has dramatically better performance when operators are dense, helping to close the gap to other packages such as QuantumOptics.jl and dynamiqs. It also adds some new "cast-complex-to-double" C++ low-level matmul routines, inspired by existing qutip matvec code, that vectorize better than the existing cython code (at least they are significantly faster on my arm Macbook Pro and, for CSR matrices, help to bring performance closer to the CSC routines used by QuantumOptics.jl).
The matrix-form master-equation solver introduces a new "rhs" class
LindbladMatrixFormthat uses matmul operations to implement the action of the Liouvillian on a density matrix without constructing explicit superoperator matrices. For dense operators of dimension n, this action scales as O(n^3) versus O(n^4) for the current "explicit superoperator matrix" approach. I have also found cases where using CSR representation with the matrix-form solver is faster than both CSR explicit superoperators and fully-dense matrix-form solves.The matrix-form solver is currently opt-in via the
mesolveAPI and is also usable via the newMESolverMatrixFormSolver class.I am happy to provide benchmark code for both claimed improvements.
I worked with gen AI tools to produce this PR. I reviewed AI-generated changes in detail myself and iterated on the changes to ensure a good fit with existing paradigms in the qutip codebase and its tests.
I realize this is a fairly significant addition and fully expect to make possibly major adjustments during the review process! For example, should it be "LindbladMatrixForm" or "LiouvillianMatrixForm"? : )
Update: I split the original PR into two. This PR now just introduces underlying matmul routine changes. A second PR will bring in the master-equation solver.