Skip to content

Conversation

@peastman
Copy link
Contributor

This partially address #1136. This code has only been minimally tested! To fully convert a force field and do proper tests, we need to also update the openforcefields conversion script, and possibly add a feature to ParmEd so you can specify a prefix for atom types.

@swails
Copy link
Contributor

swails commented Feb 17, 2021

Looks like master needs to be merged first?

@peastman
Copy link
Contributor Author

What do you mean?

@swails
Copy link
Contributor

swails commented Feb 17, 2021

image

@peastman
Copy link
Contributor Author

I don't get that message! It just tells me merging is restricted to authorized users. I've hopefully done the merge it wants.

@swails
Copy link
Contributor

swails commented Feb 17, 2021

@jchodera - do you have a test for this?

@jchodera
Copy link
Contributor

jchodera commented Feb 20, 2021

@peastman: I'm a bit confused by the implementation. It looks like you go through the effort of tracking which atom types have nonstandard scee/scnb and modifying the 1-4 interactions for those atoms, but only for those that set scee/scnb to 1 exactly. Anything else still throws an exception. Am I missing something, or wouldn't it be straightforward to extend this to track any nonstandard scee/scnb and apply those nonstandard values in the script?

@jchodera - do you have a test for this?

@swails: I'm guessing that the easiest way to properly test this is to convert oldff/leaprc.ff10 and oldff/leaprc.GLYCAM_06j_10, export these to OpenMM ffxml file, and then try to parameterize a short glycosylated peptide such as 1KYJ. If we prepare Amber prmtop/inpcrd files with LEaP and compare the potential energy for OpenMM-parameterized peptide and the Amber prmtop/inpcrd system, we should be able to detect errors.

The difficulty here is that preparing even simple glycosylated peptide systems in Amber is an incredible pain. It would be much easier if there was a pre-prepped system ready to go.

Update: I think I may have found a pre-prepared system in amber20_src/AmberTools/test/leap/glycam/06j_10/Chains.{parm7,rst7}.save, which I can convert to a (someone noncompliant) PDB with CONECT records with

ambpdb -p Glycoprotein.parm7.save -c Glycoprotein.rst7.save -conect > Glycoprotein.pdb

I can work on this test, but I'll have to open a separate PR since this is contributed from @peastman's branch.

Note that the GLYCAM force field conversion may be a problem in general since they inexplicably seem to have made the choice to use a different definition of what constitutes a "residue" than the PDB, so while we can go from AMBER PDB -> OpenMM System, we won't be able to go from RCSB PDB -> OpenMM System with these definitions.

Really looking forward to OpenFF adding support for glycans so all of this will "just work"...

@jchodera
Copy link
Contributor

jchodera commented Feb 20, 2021

@peastman : I realize I'm running really short on time, and don't want to hold this PR up further, so here is a ZIP archive of a directory you can use in this test:
glycan.zip

  • This glycan test directory should be added to ParmEd/test/files/.
  • It looks like ambertools is included in the test environment, so you shouldn't have to include any force field files---just load oldff/leaprc.ff10 and oldff/leaprc.GLYCAM_06j_10 as parameter sets, write them to OpenMM ffxml files, and then use them to parameterize Glycoprotein.pdb and create a System object.
  • This can be compared with a System object created with the OpenMM app AmberPrmtop/Inpcrd loaders from Glycoprotein.rst7.save (inpcrd) and Glycoprotein.parm7.save (prmtop)

Note that the test uses the older ff10 force field and matching glycam force field because more modern test systems were not already pre-prepared in the AmberTools pool of tests.

@peastman
Copy link
Contributor Author

Am I missing something, or wouldn't it be straightforward to extend this to track any nonstandard scee/scnb and apply those nonstandard values in the script?

See the discussion at #1136 (comment).

@jchodera
Copy link
Contributor

@peastman: I'm not sure I quite follow. Couldn't we throw an exception only if we run into an ambiguous case, such as those cases you enumerate? This PR seems to do the opposite, and only handles the special case where the scee/scnb are exactly unity.

@swails
Copy link
Contributor

swails commented Feb 22, 2021

This PR seems to do the opposite, and only handles the special case where the scee/scnb are exactly unity.

This PR is quite narrow in that it supports any combination of force fields where there is only one non-unity set of scaling factors supported. While narrow, it does provide support for all Amber force fields and combinations out there: GLYCAM uses unity scaling factors and the protein, RNA, and Lipid force fields all have the same scaling factors (1/1.2 electrostatic and 1/2 Lennard-Jones).

So this PR should support GLYCAM + any/all of the other Amber force fields, since it will support the single scaling factor from the protein, RNA, and Lipid force fields along with the unity (unscaled) GLYCAM force field.

@jchodera
Copy link
Contributor

OK, let's run with it, then.

@peastman: Here are the files you can use in a test!
#1149 (comment)

@jchodera
Copy link
Contributor

@peastman: Could you add the test that @swails requested using the test input files I provided in #1149?

I'm unfortunately swamped by grant proposals right now, so wouldn't be able to get to this for a couple of weeks.

@peastman
Copy link
Contributor Author

I'll try to get to it within the next few days. I've been busy with other things.

@jchodera
Copy link
Contributor

Thanks! Once this is done, I can quickly convert the GLYCAM force field, which will really help our group prepare a bunch of the new SARS-CoV-2 mutants and get them on Folding@home ASAP so we can assess their impact on ACE2 binding! The lack of GLYCAM is blocking them.

@peastman
Copy link
Contributor Author

I'm not sure what I'm supposed to do with those files? There's a PDB file, and a bash script that just sets a single variable, and then some other files of types I'm not familiar with.

I'm attaching the converted force field. I edited the unscaled_types array by hand to add the glycan- prefix to all the types, but otherwise it's exactly what you get from this code and your convert_amber.py script.

GLYCAM_06j-1.xml.txt

@jchodera
Copy link
Contributor

I'm not sure what I'm supposed to do with those files? There's a PDB file, and a bash script that just sets a single variable, and then some other files of types I'm not familiar with.

@peastman: Apologies for the miscommunication. Here's a summary:

  • @swails asked that a test be included in your PR
  • I suggested a simple test could be to convert the GLYCAM force field and then have OpenMM ForceField parameterize a PDB file that contains a few amino acids and a few glycans, computing the energy both with the ForceField parameterized System and a System created from the prmtop/inpcrd provided by AMBER.
  • I found a GLYCAM test included with the AmberTools20 distribution that includes a short peptide with a few glycans, and ZIPped up a directory containing the prmtop/inpcrd provided by AMBER and a PDB file with CONECT records.
  • I posted this here with a description of the test suggestion since I'm swamped with grant proposals to fund COVID-19 drug discovery work, and asked if you could add the test to your PR in the meantime.

@peastman: Can you also review the important details in openmm/openmmforcefields#156 where my group attempted to use this PR? There are two main issues:

  • The GLYCAM force field uses some Amber protein atom types, so the prefix glycam- will not work. We need to unify the namespace of types or classes with the protein force field. This requires we make some decisions regarding the still-open issue Harmonize prefixes for AMBER protein force field additions openmm/openmmforcefields#99
  • Given that you had to add the prefix by hand, it seems we will also want to extend the ParmEd ffxml parameter writing method to allow us to specify the prefix so that we don't have to reprocess the XML afterwards. This will significantly simplify things.

Thanks so much for your help in wrapping this up, @peastman---the group is trying to get these simulations of SARS-CoV-2 mutants set up ASAP so we can help our experimental collaborators determine whether we need to worry about emerging variants in developing therapies.

@peastman
Copy link
Contributor Author

and ZIPped up a directory containing the prmtop/inpcrd provided by AMBER and a PDB file with CONECT records.

There are no prmtop or inpcrd files in your zip. Can you check that you included the right set of files?

Can you also review the important details in openmm/openmmforcefields#156 where my group attempted to use this PR? There are two main issues:

You left out the biggest problem. Quoting from that issue,

Currently, the primary showstopper for the test is that ffxml/GLYCAM_06j-1.xml does not contain residues that are actually part of glycans. The only residues that are present are the modified protein residues to which glycans can bond (e.g. NLN, which is the modified version of ASN).

I can confirm that only those residues are present in GLYCAM_amino_06j_12SB.lib. I don't know enough about the force field to comment on whether others should be present as well, or where it should get them from.

Given that you had to add the prefix by hand, it seems we will also want to extend the ParmEd ffxml parameter writing method to allow us to specify the prefix so that we don't have to reprocess the XML afterwards.

Possibly so. Alternatively it could be done with a modification to convert_amber.py. See my comment at the top of this thread. That's a separate feature though.

@jchodera
Copy link
Contributor

There are no prmtop or inpcrd files in your zip. Can you check that you included the right set of files?

Sorry about the information being spread out in multiple comments---I mentioned this here. Older versions of amber sometimes use different file naming conventions, so Glycoprotein.parm7.save is an Amber 7 format prmtop file (the current generation of prmtop files we read, I believe), and Glycoprotein.rst7.save is an inpcrd file.

You left out the biggest problem. Quoting from that issue,
I can confirm that only those residues are present in GLYCAM_amino_06j_12SB.lib. I don't know enough about the force field to comment on whether others should be present as well, or where it should get them from.

I think I can fix that problem!

The residue definitions are loaded by leaprc.GLYCAM_06j-1 from a PREP file:

##
#
#       load all prep files for polysaccharides
#
loadamberprep GLYCAM_06j-1.prep

The issue appears to be that ParmEd does not implement the loadamberprep command for LEaP---something I hadn't realized before. It must assume that all old-style .prep files have been converted to lib files, but that has only happened with the modified amino acids for GLYCAM.

The GLYCAM leaprc files seem to be the only files in the current AmberTools distribution that use loadamberprep.

It appears I can manually create a .lib file from this .prep file using a leap script such as this:

loadamberprep GLYCAM_06j-1.prep
saveoff { UYB 4YB VMB 2MA 0YB 0FA ... } GLYCAM_06j-1.lib

where I have to list the whole set of 100+ residue names in {...}.

It looks like I can do this and include the modified lib file and parameters in openmmforcefields, so I should be able to solve that problem there.

@swails: How would we get such a change eventually made upstream in the AmberTools distribution?

Possibly so. Alternatively it could be done with a modification to convert_amber.py. See my comment at the top of this thread. That's a separate feature though.

I think it would simplify things greatly to add this feature to ParmEd so we don't have to hack XML files.

@swails
Copy link
Contributor

swails commented Feb 28, 2021

I think we can simply do a regression test here. Convert a GLYCAM force field into an OpenMM XML file and then just compare the generated XML with what this version of the code produces.

@swails: How would we get such a change eventually made upstream in the AmberTools distribution?

It's not really fair to try and force the GLYCAM maintainers to abandon PREP files to make this particular task easier, although they may be amenable if it increases adoption of GLYCAM.

It may be time to simply bite the bullet and add a PREP file parser (but no writer, since I think Amber is actively trying to kill off that format). I'll also need to break down and actually write code (i.e., copy code from somewhere) to interconvert between internal and cartesian coordinates.

@jchodera
Copy link
Contributor

It's not really fair to try and force the GLYCAM maintainers to abandon PREP files to make this particular task easier, although they may be amenable if it increases adoption of GLYCAM.

The .prep format is 27 years old, isn't it? A pre-LEaP remnant.

@drroe
Copy link

drroe commented Feb 28, 2021

The .prep format is 27 years old, isn't it? A pre-LEaP remnant.

The PDB format is over 40 years old... 🤷‍♂️

@jchodera
Copy link
Contributor

The PDB format is over 40 years old... 🤷‍♂️

That's why the RCSB was sensible enough to formally deprecate it and force everyone to use a more modern, flexible format. :)

Supporting old formats should be "best effort", but can't drive development into a corner.

@drroe
Copy link

drroe commented Feb 28, 2021

Supporting old formats should be "best effort", but can't drive development into a corner.

Absolutely. If everyone stopped using the PDB format tomorrow I would clap my hands and jump for joy. My point however is that as of right now it is still one of the most widely-used formats, so supporting it remains a necessary evil.

If this issue can be solved without adding support for the PREP format, that would be great. However, if supporting PREP is the only (or most straightforward) way to address this issue, then as Jason said it might be time to bite the bullet. I think the next step is to reach out to the Glycam devs directly and get their take. If they are willing to abandon the PREP format, then this becomes a non-issue and that old format can finally be put out to pasture...

@jchodera
Copy link
Contributor

If this issue can be solved without adding support for the PREP format, that would be great. However, if supporting PREP is the only (or most straightforward) way to address this issue, then as Jason said it might be time to bite the bullet. I think the next step is to reach out to the Glycam devs directly and get their take. If they are willing to abandon the PREP format, then this becomes a non-issue and that old format can finally be put out to pasture...

My understanding is that a two-line LEaP script is all that is needed to generate the requisite lib file for inclusion in AmberTools, but that this step hasn't been undertaken because it was not recognized as being useful. Now, we have a clear use case, and all we need to do is persuade the AmberTools/GLYCAM devs to add this two-line LEaP script and run tleap with each new release. Even better would be to also make a one-line change in the GLYCAM leaprc files to use the lib file instead.

In the meantime, I can run the two-line LEaP script manually and bundle the files with openmmforcefields, and eliminate those files once AmberTools is updated. I already do this with all gaff*.dat versions, since AmberTools does not bundle them all (but should).

@zhang-ivy
Copy link
Contributor

@peastman Yes, this is the first issue I encountered when trying to convert the forcefields.
For the short term (testing purposes), I've been generating my system using ffxml/ff14SB.xml here

For the long term, we'll have to make a decision on whether we should enforce protein atom names in ffxml/GLYCAM_06j-1.xml to contain the protein- prefix (to be used with protein.ff14SB)? Or if we should leave ffxml/GLYCAM_06j-1.xml as is and use ff14SB as the protein ff

@peastman
Copy link
Contributor Author

The problem seems to be from missing definitions. I examined one particular bond that is present in amber_after.xml but not in omm_after.xml: the bond between atom 155 (atom N of NLN) and atom 157 (atom CA of NLN). The types of those atoms are N (which has class N) and glycan-CX (which has class CX). Looking at the <HarmonicBondForce> tag, there are no definitions at all involving class CX.

For the long term, we'll have to make a decision on whether we should enforce protein atom names in ffxml/GLYCAM_06j-1.xml to contain the protein- prefix (to be used with protein.ff14SB)?

The prefix is essential. The force field won't work correctly without it. See my comments at #1136 (comment).

@zhang-ivy
Copy link
Contributor

The prefix is essential.

Isn't the prefix only essential when using protein.ff14SB.xml? As I mentioned above, I was using ff14SB.xml to parametrize the above systems, so the atom types do not contain protein- prefixes

@peastman
Copy link
Contributor Author

Looking at GLYCAM_06j.dat, I see that it also contains no parameters for bonds involving CX atoms. Where are those parameters coming from?

@peastman
Copy link
Contributor Author

The prefix is essential when combining multiple types of molecules. The protein and DNA force fields each define a type called "C", for example. Without the prefixes, OpenMM will report an error when you try to combine them, since you aren't allowed to define two different types with the same name.

@zhang-ivy
Copy link
Contributor

At the top of GLYCAM_06j.dat, CX is defined as:

CX 12.01                             protein C-alpha (new to ff10)  copied from parm10

I also don't see bond parameters defined for CX, but I do see angle and torsion parameters. Not sure why this is. Perhaps we CXs need to be interpreted as CAs?

@peastman
Copy link
Contributor Author

That's defining the atom type. Scroll down to the next section which defines bond parameters. There are no definitions involving CX.

@peastman
Copy link
Contributor Author

Sorry, missed the second half of your post. I don't think Amber supports any kind of subclassing for atom types? If an atom has type CX, so far as I know it will only use definitions for that specific type.

@zhang-ivy
Copy link
Contributor

zhang-ivy commented Jun 14, 2021

The prefix is essential when combining multiple types of molecules. The protein and DNA force fields each define a type called "C", for example. Without the prefixes, OpenMM will report an error when you try to combine them, since you aren't allowed to define two different types with the same name.

For the test case above, I noted that I am only using ff14SB.xml and GLYCAM_06j-1_trunc.xml, so there won't be any naming clashes for parametrizing this system.
For long term/general use cases, I can add the protein- prefix to protein atoms in the glycan ff, but I just want to clarify that there should not be any prefix clashes in the test case we are discussing here.

@peastman
Copy link
Contributor Author

I think I see. The problem is in the prefixes. CX is not a glycam type. It's the standard type used for protein CA atoms. So the type of that atom should be protein-CX, not glycan-CX.

In the definition of the NLN residue, there's a mix of atoms with and without the glycan- prefix. How did you add the prefixes?

@zhang-ivy
Copy link
Contributor

@peastman
Copy link
Contributor Author

Ok, I think that clears it up. We already knew that method would need to change to support this force field.

@zhang-ivy
Copy link
Contributor

Ok, it seems like there are a lot of atom types that appear in both GLYCAM_06j.dat and ff14SB.xml. Is the desired behavior: iterate through all atom types in GLYCAM_06j.dat and if the atom type is present in ff14SB, add the protein- prefix, otherwise add the glycan- prefix?

@peastman
Copy link
Contributor Author

I suggest we merge this, then open a separate issue to discuss prefixes. There are several issues that need to be considered, and it may require changes in both ParmEd and openmmforcefields.

@zhang-ivy
Copy link
Contributor

zhang-ivy commented Jun 14, 2021

Sure, I'd be happy to open an issue and start the discussion. My only hesitation with merging this PR is that the nonbonded energies are still very discrepant: #1149 (comment)
So it's unclear whether this PR actually has the right fixes.

@peastman
Copy link
Contributor Author

That's expected. The code to handle scaling of 1-4 interactions depends on atom types. Since the prefixes are currently being applied inconsistently, it won't work correctly.

@zhang-ivy
Copy link
Contributor

Right, but we should probably wait until the atom type prefixes have been fixed (and then run my test again to confirm that the energy discrepancy is no longer present) before merging this. Otherwisse, we cannot assess whether the changes in this PR are scaling energies appropriately.

@zhang-ivy
Copy link
Contributor

zhang-ivy commented Jun 14, 2021

@peastman: I did not open a new issue to discuss the prefix issue, but I tagged you in an existing PR in openmmforcefields here. This PR contains the changes I've made in order to get the glycan ff testing working.

@peastman
Copy link
Contributor Author

This PR wasn't intended to fix all problems with GLYCAM, just one particular one. Getting it working will take multiple PRs across multiple repositories. That's going to be hard to do if we don't merge any of them until all of them are finished.

I already verified that this produces correct scaling when given a force field where all atom types are defined correctly.

@peastman
Copy link
Contributor Author

Is this ok to merge?

@zhang-ivy
Copy link
Contributor

Yes go ahead!

@peastman
Copy link
Contributor Author

@swails is the one who can merge it.

@swails swails merged commit dfb7305 into ParmEd:master Jun 22, 2021
@swails
Copy link
Contributor

swails commented Jun 22, 2021

Merged.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants