Skip to content

Conversation

@epretti
Copy link
Member

@epretti epretti commented Jan 23, 2025

This is a work-in-progress and I am opening this pull request as a draft to keep track of issues and potentially solicit feedback on some issues I have encountered during this. This updates the CHARMM force field distributed in FFXML format to the most recent as of this writing (July 2024) release.

  • Files that used to be excluded from the conversion that now read in properly:
    • toppar_all36_prot_aldehydes.str and toppar_all36_na_modifications.str appear to no longer cause the parameter inconsistency reported previously.
    • toppar_all36_prot_d_aminoacids.str was excluded for an unknown reason; this file does not cause any naming conflicts or parameter inconsistencies.
    • toppar_ions_won.str was excluded for an unknown reason from the water/ions files and is included again (one naming collision below could be worked around with a residue-specific exclusion).
  • Files that appear to contain duplicate sets of parameters, so one set must be chosen:
    • toppar_all36_prot_heme_for_new_psf_gen_code_2022.str and toppar_all36_na_rna_modified_for_new_psf_gen_code_2022.str are excluded in favor of toppar_all36_prot_heme.str and toppar_all36_na_rna_modified.str. I can't find documentation on what this is about, but I am guessing that a backwards-incompatible change in CHARMM led to a need for two versions of these files: no parameters seem to change, just a few entries regarding topology modification. ParmEd doesn't seem to complain with either version.
    • The lipid LJPME-optimized files are left out in favor of the standard lipid model. It seems like some of the lipid stream files have updated versions for this LJPME-optimized model, but others don't, so it was unclear that I would be generating a consistent set of parameters if I tried to include LJPME-optimized ones.
  • Inconsistencies in the CHARMM release that require exclusion of some files or residues:
    • Inconsistent parameter values for CG2R61-CG301 bond (bond from 6-membered aromatic ring carbon to aliphatic carbon with no hydrogens). The offending values are at line 574 of par_all36_cgenff.prm and line 22869 of toppar_all36_carb_glycolipid.str. The glycolipid file is excluded as many other files depend on the CGenFF file, and I am proceeding under the assumption that until we know better, it is better to leave out a file than to make manual modifications to one.
    • DMPR is defined as a residue at line 26903 of top_all36_cgenff.rtf (dimethylpropanamide) and as a patch at line 386 of toppar_all36_na_reactive_rna.str (thio-substitution of dimyristoyl-D-glycero-1-phosphatidic acid). This name collision prevents ParmEd from reading both files, so the reactive RNA file is excluded.
    • TM3P is defined as a residue at line 7093 of top_all36_cgenff.rtf (4'-methyl,3'-phosphate tetrahydrofuran) and Tm3p is defined as a residue at line 341 of toppar_ions_won.str (Thulium(III) ion). ParmEd uppercases many names read which causes a collision when the force field and water/ion XML files are read together. The ion version of the residue is excluded.
    • Nucleic acids remain excluded from the Drude conversion because ParmEd (still?) can't read the nucleic acid parameter file. It is not clear if the problems indicated in drude2019.yaml are still ones causing this issue, or if this is due to something else.
  • Some things whose use was unclear or that were effectively non-functional were removed. If someone knows that they are actually wanted in a working state, let me know:
    • protein.yaml to ostensibly create a protein-only version of the force field, except that no corresponding XML file exists in the repository.
    • Splitting out of "model compounds" into separate XML files. These toppar_all36_*_model.xml force fields don't actually contain any residues. Two are empty, containing nothing except some spurious references, and two contain a few atom types and parameters that are then never used.
  • Some missing references were added to charmm36.yaml.

Presently, I am trying to run the tests to ensure that this conversion is not completely incorrect. Unfortunately there are some issues because of distinct residue templates in the force field that OpenMM sees as topologically equivalent. Apparently this was not an issue before, but the test code may need modification. In the meantime, I wanted to document the current issues with the conversion for reference and in case anyone has encountered these exact problems before.

* Checked which excluded files that previously caused problems could now
  be included
* Checked which included files caused inconsistencies or other problems
  and needed to be excluded
* The protein-only force field didn't exist in the ffxml directory so I
  removed protein.yaml (not sure if this is desired)
* Added toppar_ions_won.str to the water models (a citation was present
  but the file was missing)
* Updated some missing citations
* Previously, there were various files for model compounds that were
  split out, but these XML files didn't have any residues in them and only
  had a few random atom types (or were entirely empty other than containing
  irrelevant references).  These have been removed.
@codecov-commenter
Copy link

codecov-commenter commented Jan 23, 2025

⚠️ Please install the 'codecov app svg image' to ensure uploads and comments are reliably processed by Codecov.

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 71.41%. Comparing base (d5022d7) to head (8bb66ea).
Report is 1 commits behind head on main.

❗ Your organization needs to install the Codecov GitHub app to enable full functionality.

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #356   +/-   ##
=======================================
  Coverage   71.41%   71.41%           
=======================================
  Files           5        5           
  Lines         822      822           
=======================================
  Hits          587      587           
  Misses        235      235           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@peastman
Copy link
Member

toppar_all36_prot_d_aminoacids.str was excluded for an unknown reason; this file does not cause any naming conflicts or parameter inconsistencies.

Can these be distinguished from the regular amino acids? I would expect them to differ only in conformation, not topology, and therefore to match the same templates.

@epretti
Copy link
Member Author

epretti commented Jan 23, 2025

toppar_all36_prot_d_aminoacids.str was excluded for an unknown reason; this file does not cause any naming conflicts or parameter inconsistencies.

Can these be distinguished from the regular amino acids? I would expect them to differ only in conformation, not topology, and therefore to match the same templates.

This is part of the problem that I am encountering now. ParmEd will read and write them as distinct residues (possibly because they use different atom classes for their chiral centers), and OpenMM will read a force field with them. But when you try to create a system, OpenMM issues a "Multiple non-identical matching templates found" error.

This is not as simple as just excluding the D-amino acids, as some residues have topological equivalents elsewhere. For instance, so far I've run into ILE which has not only DILE from the D-amino acids, but also IIL (the diastereomer alloisoleucine) from another stream file. Do we want to exclude all of these cases (this will prevent users from simulating them but perhaps there is not a lot of demand for them)? If so, I will try to figure out how to go through and find all of them, and make sure I understand why ParmEd isn't eliminating them like it eliminates other topologically equivalent residues. There might have to be a table somewhere where we manually specify which are the "standard" ones that we want since they aren't all localized to one file.

@peastman
Copy link
Member

How about converting them, but keeping them as separate XML files? If you just include charmm36.xml you'll get only the standard amino acids. If you want the others (which should be pretty rare), you can include the second file. It will create conflicts, but you can resolve them with the residueTemplates argument to createSystem().

* Added functionality to split out residues from selected CHARMM
  parameter files

* Allow specifying "fixes" to the generated XML files: can provide XPath
  to find elements, and a tree of XML elements to append as children to
  those found
epretti and others added 13 commits February 3, 2025 15:29
* Copy improper objects before compression to work around ParmEd bug

* Add an XML fix for harmonic improper dihedrals to account for
  periodicity of the improper angle (to support equilibrium angles of pi)
* Adds functionality to output impropers in a way that should be
  compatible with their explicit specification in the CHARMM force field
  files

* Ensures that impropers spanning patch and residue atoms will not be
  neglected

* Ensures that duplicate improper entries will not be written to split
  files
An improper in a patch with some atoms in the patch might need to find
those atoms in a residue, but alternatively, the atoms might be in
another patch that also modified the residue, so search the residues
compatible with a patch for patches compatible with the residue during
improper processing.
Goal is to replace test_charmm.py and energy.py with something cleaner
and more flexible.  Can select CHARMM (if installed), OpenMM-CHARMM,
ParmEd-CHARMM, and OpenMM-FFXML modes and compare energies and forces
across force groups.  Can define tests in a directory containing a
test.yaml file instead of having things hardcoded.  Plan to port
remaining tests over.
* Uses a force field script to correctly insert impropers.  Should
  handle all cases except for ambiguous equivalent atoms (like OT1/OT2).

* Removed some miscellaneous test code from the conversion script since
  the test script handles this more thoroughly.
* New test script to test against CHARMM, OpenMM (reading a PSF) and
  OpenMM (reading FFXML).

* Converted old tests to new test format.

* Several new tests (amino acids and simple peptides for validation).

* Started reworking water box PSF generation as this was not placing the
  correct charges in the PSF files, leading to tests results that were
  not meaningful.
* Remove files left over from test system generation that are not being
  used by tests

* Remove some old test scripts whose behavior has been incorporated into
  new test script

* Note: 4- and 5-site water tests will be restored once it is determined
  how to generate the PSFs correctly with the virtual sites
* Use CHARMM to generate correct PSFs since ParmEd doesn't

* Update test script to update virtual site positions in OpenMM and skip
  testing virtual site forces (since CHARMM and OpenMM report them
  differently but accumulate them on the atoms the same way).
@epretti epretti marked this pull request as ready for review March 6, 2025 23:31
@epretti
Copy link
Member Author

epretti commented Mar 6, 2025

OK, there's a lot here, but this should be ready to look at now. Notes on changes and issues:

  • Some residues and patches are split out into separate files due to various issues and patching collisions (this is explained in more detail in the README)
  • Major overhaul to conversion script to allow proper splitting (see above), handling of impropers and anisotropic polarizabilities (it is necessary to bypass incorrect handling in ParmEd and do this directly via a script embedded in the force fields), and automated post-generation patching of FFXML files (needed for disulfide patch and some nucleic acid patches).
  • There are issues in ParmEd requiring a custom modified version of ParmEd to be used for the force field generation until issues (Issues in parsing CHARMM topology/parameter files ParmEd/ParmEd#1395, Integral net charge check for patching is incompatible with CHARMM36 nucleic acid FF ParmEd/ParmEd#1396, ParmEd OpenMM FFXML writer doesn't correctly handle patches that can modify residues in multiple ways ParmEd/ParmEd#1397) are fixed. The patched version can be found at epretti/ParmEd:patched-for-charmm-conversion. Note that this does not prevent the force fields from being used or tested, though!
  • Major overhaul to test script. Supports checking against locally installed or Docker containerized CHARMM, OpenMM reading CHARMM files, ParmEd reading CHARMM files, and the generated FFXML. Performs full energy and force decomposition. Test specifications are no longer hard-coded list items but are read in from YAML files.
  • Reorganized test suite to remove some extraneous files not read by CHARMM or OpenMM and to locate all YAML specifications and associated input files in dedicated directories.
  • Script to use CHARMM to regenerate water box test input files, since ParmEd doesn't support the necessary options for some water models with lone pairs or Drude atoms.
  • Testing of FFXML files against OpenMM and CHARMM is now fully automated in CI using a private (GitHub container registry) Docker container.

@peastman
Copy link
Member

This looks good as far as I can tell. I'm not familiar with the openmmforcefields code, though, so I'm just looking at the changes in isolation, not knowing the context. Perhaps @mikemhenry or @mattwthompson can take a look?

@epretti
Copy link
Member Author

epretti commented Mar 14, 2025

Even though this code doesn't really interact with the main openmmforcefields code and is effectively mostly standalone, another set of eyes would certainly be appreciated from anyone who was familiar with how this conversion used to work.

@mikemhenry
Copy link
Collaborator

Honestly @jchodera might be the best to take a look or know who used to handle the CHARMM conversion stuff -- but I can still look at it :)

@epretti
Copy link
Member Author

epretti commented Mar 21, 2025

@mikemhenry Wanted to see if you had a chance to look at this or not, or if you were planning to. No worries if you're busy with other things, as I know it's a substantial set of changes; just wanted to check.

@mikemhenry
Copy link
Collaborator

Thanks for the bump! I can review it!

Copy link
Collaborator

@mikemhenry mikemhenry left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! It looks like it is working and very self-contained. Also thank you for using idiomatic python!

csh \
gfortran \
build-essential
FROM ubuntu:latest AS build_charmm_base
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want to pin the ubuntu version here to whatever latest is currently so when it updates

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, will do!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why .txt and not .py ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did that intentionally since they're not really valid Python (they have {format} fields in them that get filled in before being output to the force field: a somewhat ugly but simple solution to what I was trying to do). I didn't want an autoformatter/linter/whatever to complain or misinterpret these files, or a user to try to import or run them and be confused by cryptic errors. If you think those aren't problems and it would make more sense to have them as .py, I can do that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that is really valid!

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why .txt and not .py ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my comment on convert_charmm_anisotropy_script.txt.

@mikemhenry
Copy link
Collaborator

Once we merge in this PR, we can add the "CHARMM conversion validation" check as required

@mikemhenry
Copy link
Collaborator

Thanks! I will turn on auto-merge!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants