Update CHARMM force field to July 2024 release #356

epretti · 2025-01-23T18:50:03Z

This is a work-in-progress and I am opening this pull request as a draft to keep track of issues and potentially solicit feedback on some issues I have encountered during this. This updates the CHARMM force field distributed in FFXML format to the most recent as of this writing (July 2024) release.

Files that used to be excluded from the conversion that now read in properly:
- toppar_all36_prot_aldehydes.str and toppar_all36_na_modifications.str appear to no longer cause the parameter inconsistency reported previously.
- toppar_all36_prot_d_aminoacids.str was excluded for an unknown reason; this file does not cause any naming conflicts or parameter inconsistencies.
- toppar_ions_won.str was excluded for an unknown reason from the water/ions files and is included again (one naming collision below could be worked around with a residue-specific exclusion).
Files that appear to contain duplicate sets of parameters, so one set must be chosen:
- toppar_all36_prot_heme_for_new_psf_gen_code_2022.str and toppar_all36_na_rna_modified_for_new_psf_gen_code_2022.str are excluded in favor of toppar_all36_prot_heme.str and toppar_all36_na_rna_modified.str. I can't find documentation on what this is about, but I am guessing that a backwards-incompatible change in CHARMM led to a need for two versions of these files: no parameters seem to change, just a few entries regarding topology modification. ParmEd doesn't seem to complain with either version.
- The lipid LJPME-optimized files are left out in favor of the standard lipid model. It seems like some of the lipid stream files have updated versions for this LJPME-optimized model, but others don't, so it was unclear that I would be generating a consistent set of parameters if I tried to include LJPME-optimized ones.
Inconsistencies in the CHARMM release that require exclusion of some files or residues:
- Inconsistent parameter values for CG2R61-CG301 bond (bond from 6-membered aromatic ring carbon to aliphatic carbon with no hydrogens). The offending values are at line 574 of par_all36_cgenff.prm and line 22869 of toppar_all36_carb_glycolipid.str. The glycolipid file is excluded as many other files depend on the CGenFF file, and I am proceeding under the assumption that until we know better, it is better to leave out a file than to make manual modifications to one.
- DMPR is defined as a residue at line 26903 of top_all36_cgenff.rtf (dimethylpropanamide) and as a patch at line 386 of toppar_all36_na_reactive_rna.str (thio-substitution of dimyristoyl-D-glycero-1-phosphatidic acid). This name collision prevents ParmEd from reading both files, so the reactive RNA file is excluded.
- TM3P is defined as a residue at line 7093 of top_all36_cgenff.rtf (4'-methyl,3'-phosphate tetrahydrofuran) and Tm3p is defined as a residue at line 341 of toppar_ions_won.str (Thulium(III) ion). ParmEd uppercases many names read which causes a collision when the force field and water/ion XML files are read together. The ion version of the residue is excluded.
- Nucleic acids remain excluded from the Drude conversion because ParmEd (still?) can't read the nucleic acid parameter file. It is not clear if the problems indicated in drude2019.yaml are still ones causing this issue, or if this is due to something else.
Some things whose use was unclear or that were effectively non-functional were removed. If someone knows that they are actually wanted in a working state, let me know:
- protein.yaml to ostensibly create a protein-only version of the force field, except that no corresponding XML file exists in the repository.
- Splitting out of "model compounds" into separate XML files. These toppar_all36_*_model.xml force fields don't actually contain any residues. Two are empty, containing nothing except some spurious references, and two contain a few atom types and parameters that are then never used.
Some missing references were added to charmm36.yaml.

Presently, I am trying to run the tests to ensure that this conversion is not completely incorrect. Unfortunately there are some issues because of distinct residue templates in the force field that OpenMM sees as topologically equivalent. Apparently this was not an issue before, but the test code may need modification. In the meantime, I wanted to document the current issues with the conversion for reference and in case anyone has encountered these exact problems before.

* Checked which excluded files that previously caused problems could now be included * Checked which included files caused inconsistencies or other problems and needed to be excluded * The protein-only force field didn't exist in the ffxml directory so I removed protein.yaml (not sure if this is desired) * Added toppar_ions_won.str to the water models (a citation was present but the file was missing) * Updated some missing citations * Previously, there were various files for model compounds that were split out, but these XML files didn't have any residues in them and only had a few random atom types (or were entirely empty other than containing irrelevant references). These have been removed.

codecov-commenter · 2025-01-23T18:55:33Z

⚠️ Please install the to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.41%. Comparing base (d5022d7) to head (8bb66ea).
Report is 1 commits behind head on main.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files

@@           Coverage Diff           @@
##             main     #356   +/-   ##
=======================================
  Coverage   71.41%   71.41%           
=======================================
  Files           5        5           
  Lines         822      822           
=======================================
  Hits          587      587           
  Misses        235      235

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

peastman · 2025-01-23T19:21:39Z

toppar_all36_prot_d_aminoacids.str was excluded for an unknown reason; this file does not cause any naming conflicts or parameter inconsistencies.

Can these be distinguished from the regular amino acids? I would expect them to differ only in conformation, not topology, and therefore to match the same templates.

epretti · 2025-01-23T19:49:33Z

toppar_all36_prot_d_aminoacids.str was excluded for an unknown reason; this file does not cause any naming conflicts or parameter inconsistencies.

Can these be distinguished from the regular amino acids? I would expect them to differ only in conformation, not topology, and therefore to match the same templates.

This is part of the problem that I am encountering now. ParmEd will read and write them as distinct residues (possibly because they use different atom classes for their chiral centers), and OpenMM will read a force field with them. But when you try to create a system, OpenMM issues a "Multiple non-identical matching templates found" error.

This is not as simple as just excluding the D-amino acids, as some residues have topological equivalents elsewhere. For instance, so far I've run into ILE which has not only DILE from the D-amino acids, but also IIL (the diastereomer alloisoleucine) from another stream file. Do we want to exclude all of these cases (this will prevent users from simulating them but perhaps there is not a lot of demand for them)? If so, I will try to figure out how to go through and find all of them, and make sure I understand why ParmEd isn't eliminating them like it eliminates other topologically equivalent residues. There might have to be a table somewhere where we manually specify which are the "standard" ones that we want since they aren't all localized to one file.

peastman · 2025-01-23T20:00:12Z

How about converting them, but keeping them as separate XML files? If you just include charmm36.xml you'll get only the standard amino acids. If you want the others (which should be pretty rare), you can include the second file. It will create conflicts, but you can resolve them with the residueTemplates argument to createSystem().

* Added functionality to split out residues from selected CHARMM parameter files * Allow specifying "fixes" to the generated XML files: can provide XPath to find elements, and a tree of XML elements to append as children to those found

* Copy improper objects before compression to work around ParmEd bug * Add an XML fix for harmonic improper dihedrals to account for periodicity of the improper angle (to support equilibrium angles of pi)

* Adds functionality to output impropers in a way that should be compatible with their explicit specification in the CHARMM force field files * Ensures that impropers spanning patch and residue atoms will not be neglected * Ensures that duplicate improper entries will not be written to split files

An improper in a patch with some atoms in the patch might need to find those atoms in a residue, but alternatively, the atoms might be in another patch that also modified the residue, so search the residues compatible with a patch for patches compatible with the residue during improper processing.

Goal is to replace test_charmm.py and energy.py with something cleaner and more flexible. Can select CHARMM (if installed), OpenMM-CHARMM, ParmEd-CHARMM, and OpenMM-FFXML modes and compare energies and forces across force groups. Can define tests in a directory containing a test.yaml file instead of having things hardcoded. Plan to port remaining tests over.

for more information, see https://pre-commit.ci

* Uses a force field script to correctly insert impropers. Should handle all cases except for ambiguous equivalent atoms (like OT1/OT2). * Removed some miscellaneous test code from the conversion script since the test script handles this more thoroughly.

* New test script to test against CHARMM, OpenMM (reading a PSF) and OpenMM (reading FFXML). * Converted old tests to new test format. * Several new tests (amino acids and simple peptides for validation). * Started reworking water box PSF generation as this was not placing the correct charges in the PSF files, leading to tests results that were not meaningful.

* Remove files left over from test system generation that are not being used by tests * Remove some old test scripts whose behavior has been incorporated into new test script * Note: 4- and 5-site water tests will be restored once it is determined how to generate the PSFs correctly with the virtual sites

for more information, see https://pre-commit.ci

* Use CHARMM to generate correct PSFs since ParmEd doesn't * Update test script to update virtual site positions in OpenMM and skip testing virtual site forces (since CHARMM and OpenMM report them differently but accumulate them on the atoms the same way).

epretti · 2025-03-06T23:31:49Z

OK, there's a lot here, but this should be ready to look at now. Notes on changes and issues:

Some residues and patches are split out into separate files due to various issues and patching collisions (this is explained in more detail in the README)
Major overhaul to conversion script to allow proper splitting (see above), handling of impropers and anisotropic polarizabilities (it is necessary to bypass incorrect handling in ParmEd and do this directly via a script embedded in the force fields), and automated post-generation patching of FFXML files (needed for disulfide patch and some nucleic acid patches).
There are issues in ParmEd requiring a custom modified version of ParmEd to be used for the force field generation until issues (Issues in parsing CHARMM topology/parameter files ParmEd/ParmEd#1395, Integral net charge check for patching is incompatible with CHARMM36 nucleic acid FF ParmEd/ParmEd#1396, ParmEd OpenMM FFXML writer doesn't correctly handle patches that can modify residues in multiple ways ParmEd/ParmEd#1397) are fixed. The patched version can be found at epretti/ParmEd:patched-for-charmm-conversion. Note that this does not prevent the force fields from being used or tested, though!
Major overhaul to test script. Supports checking against locally installed or Docker containerized CHARMM, OpenMM reading CHARMM files, ParmEd reading CHARMM files, and the generated FFXML. Performs full energy and force decomposition. Test specifications are no longer hard-coded list items but are read in from YAML files.
Reorganized test suite to remove some extraneous files not read by CHARMM or OpenMM and to locate all YAML specifications and associated input files in dedicated directories.
Script to use CHARMM to regenerate water box test input files, since ParmEd doesn't support the necessary options for some water models with lone pairs or Drude atoms.
Testing of FFXML files against OpenMM and CHARMM is now fully automated in CI using a private (GitHub container registry) Docker container.

peastman · 2025-03-14T21:27:51Z

This looks good as far as I can tell. I'm not familiar with the openmmforcefields code, though, so I'm just looking at the changes in isolation, not knowing the context. Perhaps @mikemhenry or @mattwthompson can take a look?

epretti · 2025-03-14T21:38:06Z

Even though this code doesn't really interact with the main openmmforcefields code and is effectively mostly standalone, another set of eyes would certainly be appreciated from anyone who was familiar with how this conversion used to work.

mikemhenry · 2025-03-14T22:22:12Z

Honestly @jchodera might be the best to take a look or know who used to handle the CHARMM conversion stuff -- but I can still look at it :)

epretti · 2025-03-21T21:11:08Z

@mikemhenry Wanted to see if you had a chance to look at this or not, or if you were planning to. No worries if you're busy with other things, as I know it's a substantial set of changes; just wanted to check.

mikemhenry · 2025-03-24T23:23:17Z

Thanks for the bump! I can review it!

mikemhenry

LGTM! It looks like it is working and very self-contained. Also thank you for using idiomatic python!

devtools/conda-envs/test_charmm_env.yaml

mikemhenry · 2025-03-26T19:09:31Z

charmm/Dockerfile

-        csh \
-        gfortran \
-        build-essential
+FROM ubuntu:latest AS build_charmm_base


You might want to pin the ubuntu version here to whatever latest is currently so when it updates

Thanks, will do!

mikemhenry · 2025-03-26T20:51:21Z

charmm/convert_charmm_anisotropy_script.txt

why .txt and not .py ?

I did that intentionally since they're not really valid Python (they have {format} fields in them that get filled in before being output to the force field: a somewhat ugly but simple solution to what I was trying to do). I didn't want an autoformatter/linter/whatever to complain or misinterpret these files, or a user to try to import or run them and be confused by cryptic errors. If you think those aren't problems and it would make more sense to have them as .py, I can do that.

I think that is really valid!

mikemhenry · 2025-03-26T20:51:28Z

charmm/convert_charmm_improper_script.txt

why .txt and not .py ?

See my comment on convert_charmm_anisotropy_script.txt.

mikemhenry · 2025-03-26T21:58:40Z

Once we merge in this PR, we can add the "CHARMM conversion validation" check as required

Co-authored-by: Mike Henry <[email protected]>

mikemhenry · 2025-03-26T23:35:29Z

Thanks! I will turn on auto-merge!

epretti added 4 commits January 21, 2025 11:49

Updated files for water models and ions

96eb471

Excluded an ion due to residue name collision

d451a16

Update main README

6caf871

This was referenced Jan 25, 2025

Update CHARMM files to July 2018 release #69

Closed

Update to July 2020 CHARMM c36 force field and CGenFF 4.4 #130

Closed

Increased flexibility of CHARMM conversion

ac01455

* Added functionality to split out residues from selected CHARMM parameter files * Allow specifying "fixes" to the generated XML files: can provide XPath to find elements, and a tree of XML elements to append as children to those found

epretti and others added 13 commits February 3, 2025 15:29

Work around some issues in ParmEd

3d253bf

* Copy improper objects before compression to work around ParmEd bug * Add an XML fix for harmonic improper dihedrals to account for periodicity of the improper angle (to support equilibrium angles of pi)

Split out CGenFF parameters to avoid conflicts

57f360e

Restore charmm36_protein for basic protein-only testing

b97b17b

[pre-commit.ci] auto fixes from pre-commit.com hooks

a3983b7

for more information, see https://pre-commit.ci

Corrections: typo and line too long caught automatically

aad2ce4

[pre-commit.ci] auto fixes from pre-commit.com hooks

a7243a4

for more information, see https://pre-commit.ci

[pre-commit.ci] auto fixes from pre-commit.com hooks

2d3b82d

for more information, see https://pre-commit.ci

epretti mentioned this pull request Feb 15, 2025

Problem with Simulating SP2 Patch openmm/openmm#4800

Closed

epretti added 2 commits February 18, 2025 10:34

Read coordinates from file instead of standard input for performance

a59a6be

epretti added 2 commits March 5, 2025 17:03

Merge branch 'main' into update-charmm-ff

78decae

Point to development OpenMM and correct CI image

6503664

epretti marked this pull request as ready for review March 6, 2025 23:31

epretti requested review from mattwthompson and mikemhenry March 14, 2025 21:35

mikemhenry approved these changes Mar 26, 2025

View reviewed changes

epretti and others added 3 commits March 26, 2025 15:28

Update devtools/conda-envs/test_charmm_env.yaml

b9daa50

Co-authored-by: Mike Henry <[email protected]>

Merge branch 'main' into update-charmm-ff

ce32d38

Pin to specific Ubuntu version instead of latest

8bb66ea

mikemhenry enabled auto-merge (squash) March 26, 2025 23:35

mikemhenry merged commit 675f14a into openmm:main Mar 26, 2025
14 checks passed

epretti mentioned this pull request Apr 30, 2025

Update CHARMM and Amber force field XML files openmm/openmm#4915

Closed

epretti deleted the update-charmm-ff branch August 6, 2025 23:25

epretti mentioned this pull request Aug 7, 2025

Release 0.15.0 #401

Merged

Update CHARMM force field to July 2024 release #356

Update CHARMM force field to July 2024 release #356

Uh oh!

Conversation

epretti commented Jan 23, 2025

Uh oh!

codecov-commenter commented Jan 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

peastman commented Jan 23, 2025

Uh oh!

epretti commented Jan 23, 2025

Uh oh!

peastman commented Jan 23, 2025

Uh oh!

epretti commented Mar 6, 2025

Uh oh!

peastman commented Mar 14, 2025

Uh oh!

epretti commented Mar 14, 2025

Uh oh!

mikemhenry commented Mar 14, 2025

Uh oh!

epretti commented Mar 21, 2025

Uh oh!

mikemhenry commented Mar 24, 2025

Uh oh!

mikemhenry left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mikemhenry Mar 26, 2025

Choose a reason for hiding this comment

Uh oh!

epretti Mar 26, 2025

Choose a reason for hiding this comment

Uh oh!

mikemhenry Mar 26, 2025

Choose a reason for hiding this comment

Uh oh!

epretti Mar 26, 2025

Choose a reason for hiding this comment

Uh oh!

mikemhenry Mar 26, 2025

Choose a reason for hiding this comment

Uh oh!

mikemhenry Mar 26, 2025

Choose a reason for hiding this comment

Uh oh!

epretti Mar 26, 2025

Choose a reason for hiding this comment

Uh oh!

mikemhenry commented Mar 26, 2025

Uh oh!

mikemhenry commented Mar 26, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

codecov-commenter commented Jan 23, 2025 •

edited

Loading