Skip to content

Conversation

@mattwthompson
Copy link
Collaborator

Improves molecule filtering in CI tests by

  • Default to running 50 unique molecules in SMIRNOFF energy tests
  • Default to filtering out molecules with > 40 atoms
  • Randomly selecting the molecule(s) to run instead of always filtering out the same molecule

Might fix #330

@codecov-commenter
Copy link

codecov-commenter commented Mar 21, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.41%. Comparing base (37bec91) to head (6162834).

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #370   +/-   ##
=======================================
  Coverage   71.41%   71.41%           
=======================================
  Files           5        5           
  Lines         822      822           
=======================================
  Hits          587      587           
  Misses        235      235           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@mattwthompson
Copy link
Collaborator Author

@epretti
Copy link
Member

epretti commented Mar 21, 2025

I found you may also need to add another exception to exclude from the list of forcefields. spce should be excluded along with tip and opc in TestSMIRNOFFTemplateGenerator.test_energies (and maybe TestGAFFTemplateGenerator.test_parameterize too?). I guess it was added at some point. spce was excluded from only 1 of the 3 places where water models are checked and excluded. Might be good to have a list at the top of the test file of the strings to check instead of duplicating.

Looks like there are some other actual failures in there in terms of energies not matching up:
https://github.com/openmm/openmmforcefields/actions/runs/13994740439/job/39186960028?pr=370#step:6:6353
I can take a look unless you're already digging into it.

@epretti
Copy link
Member

epretti commented Mar 21, 2025

So just taking a look at the one molecule linked above that is failing, [H][S][C](=[S])[N]([C]([H])([H])[C]([H])([H])[H])[C]([H])([H])[C]([H])([H])[H], the conformer generated by the OpenFF toolkit is far from the equilibrium geometry predicted by smirnoff99Frosst-1.0.2. The large forces cause the molecule to disintegrate when dynamics are run, so one gets extremely large energy values. They then still match very well between the reference and test systems, but the test is absolute energy difference instead of relative, so it fails.

This could be fixed by changing the energy difference check

ENERGY_DEVIATION_TOLERANCE = 1.0e-2 * unit.kilocalories_per_mole
delta = template_energy["total"] - reference_energy["total"]
if abs(delta) > ENERGY_DEVIATION_TOLERANCE:

to also include a relative as well as an absolute tolerance, and/or running a minimization between these lines
context.setPositions(molecule.conformers[0].to_openmm())
integrator.step(nsteps)

Perhaps that would fix some other failures as well.

The minimized geometry for this molecule looks extremely suspect to me. The root issue for this particular molecule is that this force field is assigning a 4 angstrom (!) bond length.
Screenshot 2025-03-21 at 11 37 28

@mattwthompson
Copy link
Collaborator Author

Heh, I was just in the process of tidying up a small number of details in that file before passing it off to you for a more serious analysis. I hoped to find an easy way to get compare_energies to report more information but found it wasn't something I could do very quickly. Anyway -


I think I speak for the entirety of the initiative in saying that smirnoff99Frosst was a essential stepping stone in the development of our force fields but was never meant to be used in perpetuity. Sage force fields mostly (but not completely) replace Parsley force fields which themselves mostly replaced smirnoff99Frosst. Skipping over some details, there's just no reason for somebody to use it as a "starting point" force field in the middle of 2025.

I don't think we should necessarily make it hard for people to use it in downstream packages like this, but it's also not worth the trouble of digging deep into its particular issues or putting a ton of CI time into testing it. I suggest skipping over it either in tests (my preference) or altogether.

@epretti
Copy link
Member

epretti commented Mar 21, 2025

Happy to take a look at it whenever you're ready. Thanks for the note on the context of the development of smirnoff99Frosst.

@mattwthompson
Copy link
Collaborator Author

Seeing failures with (only) this molecule, so I'm looking to see if it's something that only happens with a particular force field

image

@mattwthompson
Copy link
Collaborator Author

Okay, the failure is with 2.2.1RC1 but it's the same as the released force field:

$ diff ~/software/openff-forcefields/openforcefields/offxml/openff-2.2.1-rc1.offxml ~/software/openff-forcefields/openforcefields/offxml/openff-2.2.1.offxml
4c4
<     <Date>2024-07-22</Date>
---
>     <Date>2024-09-11</Date>

@epretti what should we do here?

@epretti
Copy link
Member

epretti commented Mar 24, 2025

OK, I can reproduce, and I'm seeing this with a few force fields. It's the same kind of problem as above with the other molecule I looked at. In this case I think nothing's wrong with the force field, but the generated conformer is bad. A close contact between two atoms creates large forces, the dynamics become unstable, and the resulting configuration can have extremely large energies. Though they match to high precision in the test, the absolute value of the difference can be large.
image1
I think the best way to handle this (that appears to fix the problem for me) is to just minimize before running dynamics; changing

context.setPositions(molecule.conformers[0].to_openmm())
integrator.step(nsteps)

to

context.setPositions(molecule.conformers[0].to_openmm()) 
openmm.LocalEnergyMinimizer.minimize(context)
integrator.step(nsteps) 

molecules = molecules[:MAX_MOLECULES]
ligand_structures = ligand_structures[:MAX_MOLECULES]
print(f"{len(molecules)} molecules remain after filtering")
ligand_structures = self.filter_molecules(molecules, max_molecules=3 if CI else 6)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will cause a problem because the ligand_structures are ParmEd Structure objects, and not Molecule objects. Adding them to Structure objects below, as protein_structure + ligand_structure, puts ParmEd into an infinite recursion. This is a ParmEd bug as that should really raise TypeError, but in any case, the addition should be Structure + Structure.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whoops, didn't even know that was the type of that object. Only was reading portions of the code

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should have caused some test(s) to fail

I think we can leverage an accident here in which these classes have similar and similarly-named attributes? ff351ba

In [1]: import parmed

In [2]: from openff.toolkit import Molecule

In [3]: molecule = Molecule.from_smiles(1000 * "C")

In [4]: structure = parmed.Structure()

In [5]: isinstance(molecule.atoms, list)
Out[5]: True

In [6]: isinstance(structure.atoms, list)
Out[6]: True

In [7]: len(molecule.atoms)
Out[7]: 3002

In [8]: len(structure.atoms)
Out[8]: 0

@mattwthompson mattwthompson force-pushed the improve-molecule-filtering branch from 89abb60 to 78550eb Compare March 28, 2025 19:16
@mattwthompson
Copy link
Collaborator Author

@epretti I think this is mostly ready for you to take over

@epretti epretti marked this pull request as ready for review April 1, 2025 23:19
@epretti
Copy link
Member

epretti commented Apr 1, 2025

This should be working now. I updated the troublesome GAFF partial charge test that we discussed recently to just create example user-defined charges that sum to the specified amount. I'm creating a separate issue for what to do about this problem in general, and if it eventually leads to a PR, I can add separate tests there.

@epretti epretti enabled auto-merge April 10, 2025 18:14
@epretti epretti merged commit ae76e9d into main Apr 10, 2025
14 checks passed
@epretti epretti deleted the improve-molecule-filtering branch April 10, 2025 19:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SMIRNOFF energy test failing

4 participants