
fix: SpMatrix::prepare_comm in case of no communication necessary#4844

Merged
WeiqunZhang merged 5 commits into AMReX-Codes:development from julien-fausty-gev:fix-SpMV-CPU
Dec 11, 2025

Conversation

@julien-fausty-gev
Contributor

Summary

Currently, when computing a matrix-vector multiplication on the CPU, the `amrex::SpMV` method can access memory of the input vector at every non-zero column index of the matrix when the matrix is distributed but needs no communication to perform the multiplication. Since out-of-bounds access is undefined behavior, this can silently produce completely wrong results.

This contribution first proposes a fix that adds an else clause for the case where the matrix holds only local non-zero values.

It also moves the OpenMP loop one level higher to avoid excessive thread spawning during this local matrix multiplication.

Additional background

Checklist

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • change answers in the test suite by more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate

Currently, when computing a matrix-vector multiplication on the CPU, the
`amrex::SpMV` method accesses memory of the input vector at every
non-zero column index of the matrix. This can lead to errors, since
out-of-bounds access is undefined behavior and can produce wrong
results.

This commit first proposes a fix that checks whether the column index is
in the valid range.

It also moves the OpenMP loop one level higher to avoid excessive thread
spawning.

The added test assembles a simple algebraic system; it fails without the
previous commit and passes with it.
This commit reverts the previous fix to SpMV and instead targets the
`SpMatrix::prepare_comm` method. When the problem is distributed but the
non-zeros are local to each process, the previous implementation did not
shift the column indices, leading to accesses of unallocated memory.

The fix introduces an else clause that simply shifts the column indices
in this scenario, similar to what happens when the non-zeros are split
into local and remote parts.

This commit allows both the new `SpMv` test and the existing `GMRES`
case to pass.
@julien-fausty-gev
Contributor Author

@WeiqunZhang please review this second iteration when you have the time. The GMRES test should pass as should the new SpMV test.

@WeiqunZhang
Member

Yes, I believe this is the correct fix. There is a minor issue in the test setup. The row offset vector has nrows+1 elements. We need to set the last element too. So I fixed the test and I also fused the two ParallelFor loops.

@WeiqunZhang
Member

/run-hpsf-gitlab-ci

@WeiqunZhang
Member

Actually we don't even need to set row offsets because they are already set correctly in the constructor of SpMatrix.

@WeiqunZhang
Member

/run-hpsf-gitlab-ci

WeiqunZhang added a commit that referenced this pull request Dec 11, 2025
This is why the HPSF Gitlab CI failed to run in #4844.
@WeiqunZhang
Member

/run-hpsf-gitlab-ci

@amrex-gitlab-ci-reporter

GitLab CI 1344715 finished with status: failed. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1344715.

@WeiqunZhang WeiqunZhang merged commit f2f9dda into AMReX-Codes:development Dec 11, 2025
86 of 88 checks passed
@julien-fausty-gev
Contributor Author

Thanks for your help @WeiqunZhang! And thanks for merging

