Make combined omega DF algorithm consistent with uncombined version. #2283
Conversation
There's a lot going on here, so bear with me. In DF algorithms we approximate the four-index two-electron integrals in terms of three-index quantities contracted through the Coulomb metric of the auxiliary basis; in the usual symmetric formulation, the inverse square root of the metric is folded into the three-index intermediates. In cases where the density involved is not totally symmetric, such as excited states, the symmetric DF intermediates are not used and instead the decomposition is asymmetric, with the plain inverse metric sandwiched between the two three-index factors.
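For reference, a sketch of the standard density-fitting conventions being described (the exact normalization of the intermediates in the code may differ):

```latex
% Sketch of the standard DF conventions; the intermediates in the code may
% be defined slightly differently.
\begin{align*}
  J_{PQ} &= (P\,|\,Q) \qquad \text{(Coulomb metric of the auxiliary basis)} \\
  \text{symmetric:} \quad (pq|rs) &\approx \sum_{P} B^{P}_{pq}\,B^{P}_{rs},
      \qquad B^{P}_{pq} = \sum_{Q} (pq\,|\,Q)\,[J^{-1/2}]_{QP} \\
  \text{asymmetric:} \quad (pq|rs) &\approx \sum_{PQ} (pq\,|\,P)\,[J^{-1}]_{PQ}\,(Q\,|\,rs)
\end{align*}
```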
Some tests from the original issue, run on a 48-core Intel Xeon Gold 5120 Linux box with the OneAPI compilers (version 2021.2.0) as well as GCC 7.1, with and without the bug fixed:
Discarding the near-singular values during eigendecomposition cleans things up a lot, so this PR should be merged. However, there's possibly more going on that I'm still investigating. The full inversion using OneAPI is pretty bad with 1 thread, but completely disastrous (like, billions of Hartrees bad) with 8. I tried to reproduce this by writing a standalone code that inverts the exact same metric tensor using LAPACK and OneAPI's MKL implementation, but the two gave the same results, both with and without threads. That strongly hints at a memory bug, as @susilehtola suggested on the original post.

Valgrind takes a long time to run, but with the setup described above it flags what turns out to be a harmless copy construction of a vector that is well defined. I tried to remove the AVX2 code path using the MKL options, but the problem persisted. Finding an old pre-AVX box and running there did the trick, and Valgrind came back clean; it was clean for the GCC build also. Therefore it appears that there is a memory problem, but it's likely either an Intel compiler bug (I think OneAPI is technically still in beta) or a problem on my end from mixing conda and native-compiler versions of the MKL/OpenMP libraries.
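For concreteness, here is a minimal sketch of the kind of eigendecomposition-based pseudoinverse that "discarding the near-singular values" refers to. It assumes LAPACKE is available and uses an illustrative cutoff of 1e-10; the routine and threshold actually used in Psi4 may differ.

```cpp
// Minimal sketch: pseudoinverse of the symmetric, positive-semidefinite
// fitting metric J via eigendecomposition, discarding near-singular
// eigenvalues instead of performing a full inversion.
// Assumes LAPACKE; the cutoff is illustrative, not Psi4's actual value.
#include <stdexcept>
#include <vector>
#include <lapacke.h>

// Overwrites J (n x n, row-major) with its pseudoinverse.
void metric_pseudoinverse(std::vector<double>& J, int n, double cutoff = 1e-10) {
    std::vector<double> evals(n);
    // On exit, column k of J (row-major) holds eigenvector k; eigenvalues ascend.
    lapack_int info = LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'V', 'U', n, J.data(), n, evals.data());
    if (info != 0) throw std::runtime_error("DSYEV failed");

    // J^+ = V * diag(1/lambda_k if lambda_k > cutoff, else 0) * V^T
    std::vector<double> Jinv(static_cast<size_t>(n) * n, 0.0);
    for (int k = 0; k < n; ++k) {
        if (evals[k] <= cutoff) continue;  // drop near-singular directions
        const double inv = 1.0 / evals[k];
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                Jinv[i * n + j] += J[i * n + k] * inv * J[j * n + k];
    }
    J.swap(Jinv);
}
```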
loriab left a comment
Formidable detective work.
FWIW, there's no difference in test-system behavior between the two originally reported environments: both fail to converge with nthreads=8 and show the same small error with nthreads=1. Psi is the 1.4 conda package, so the bugfix in this PR is not applied. The MKLs and iomp5s are in the 2019–2021 range. The defaults-based env has gomp present (it has been since the gcc 7.3 -> 7.5 transition a couple of months ago), but gomp isn't linked into psi and isn't showing trouble.
Thanks, @loriab. I have tried a number of things, including ensuring that MKL and libiomp5 from the native compilers are used instead of the conda versions of those libs, and nothing seems to change the outcome of the unpatched code with multiple threads. It's worth noting that running on a non-AVX platform removes the Valgrind errors but doesn't change the answer, so I think they're probably just coming from the AVX memmove somehow confusing Valgrind. At this point, my only guess is that some weird nested-threading issue may be causing diagonalization problems, but I have no idea why that would affect the unpatched code and not the patched code. In case anybody wants to check the results, here's a standalone code that diagonalizes the exact same matrix, using the exact same LAPACK calls as the unpatched code. So far it looks like things are working in Psi4 after the fix, but I just wanted to document all of the suspicious behavior on this PR, in case we see similar problems in the future.
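The attached standalone code is not reproduced above. Purely as an illustration of the kind of harness involved, here is a sketch in which the file name `metric.dat`, its text format, and the choice of DSYEV are assumptions; the idea is to run it under MKL_NUM_THREADS=1 and 8 and diff the printed spectra.

```cpp
// Illustrative harness only (not the attachment from this thread): read a
// dumped metric, diagonalize it with DSYEV, and print the full spectrum so
// that runs with different thread counts (e.g. MKL_NUM_THREADS=1 vs 8) can
// be diffed. File name and format ("n" then n*n doubles) are assumptions.
#include <cstdio>
#include <vector>
#include <lapacke.h>

int main() {
    std::FILE* f = std::fopen("metric.dat", "r");
    if (!f) { std::fprintf(stderr, "cannot open metric.dat\n"); return 1; }
    int n = 0;
    if (std::fscanf(f, "%d", &n) != 1 || n <= 0) return 1;
    std::vector<double> J(static_cast<size_t>(n) * n), evals(n);
    for (double& x : J)
        if (std::fscanf(f, "%lf", &x) != 1) return 1;
    std::fclose(f);

    lapack_int info = LAPACKE_dsyev(LAPACK_ROW_MAJOR, 'V', 'U', n, J.data(), n, evals.data());
    if (info != 0) { std::fprintf(stderr, "DSYEV failed: %d\n", static_cast<int>(info)); return 1; }

    // Near-singular eigenvalues appear at the low end of the (ascending) spectrum.
    for (int i = 0; i < n; ++i) std::printf("%6d  % .16e\n", i, evals[i]);
    return 0;
}
```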
FWIW, I think all DF codes should just call the orthogonalization routines in …
Just to be clear, all other code that used …
Sorry, I missed your last question, @JonathonMisiewicz. Correct, there is a bug in MKL that is being fixed, so all …
That note was exactly what I was worried about. Thanks much, and approved. |
As of Friday, Andy said this "should be merged ASAP," so without further delay... |
Description
This addresses at least one of the problems (see details below) associated with #2279.
Todos
Checklist
Status