Skip to content

Conversation

@jchodera
Copy link
Member

@jchodera jchodera commented Oct 29, 2021

This PR adds support for espaloma small molecule parameters.

Addresses #181

  • Refactor SMIRNOFFResidueTemplateGenerator to separate out OpenMM System conversion into a mixin
  • Add EspalomaResidueTemplateGenerator
  • Add tests for EspalomaResidueTemplateGenerator
  • Add espaloma tests for SystemGenerator
  • Update revision history
  • Update docs in README.md
  • Ensure tests are passing
  • Update conda recipe to include validators (espaloma will be optional)

@jchodera
Copy link
Member Author

The current test failure is because I don't have a way to tell espaloma to use user-provided charges yet:

___________ TestEspalomaTemplateGenerator.test_charge_from_molecules ___________

self = <test_template_generators.TestEspalomaTemplateGenerator testMethod=test_charge_from_molecules>

    def test_charge_from_molecules(self):
        """Test that user-specified partial charges are used if requested"""
        # Create a generator that does not know about any molecules
        generator = self.TEMPLATE_GENERATOR()
        # Create a ForceField
        from openmm.app import ForceField
        forcefield = ForceField()
        # Register the template generator
        forcefield.registerTemplateGenerator(generator.generator)
    
        # Check that parameterizing a molecule using user-provided charges produces expected charges
        import numpy as np
        from openmm import unit
        molecule = self.molecules[0]
        charges = np.random.random([molecule.n_particles])
        total_charge = molecule.total_charge
        if type(total_charge) is unit.Quantity:
            # Handle openforcefield >= 0.7.0
            total_charge /= unit.elementary_charge
        charges += (total_charge - charges.sum()) / molecule.n_particles
        molecule.partial_charges = unit.Quantity(charges, unit.elementary_charge)
        assert (molecule.partial_charges is not None) and not np.all(molecule.partial_charges / unit.elementary_charge == 0)
        # Add the molecule
        generator.add_molecules(molecule)
        # Create the System
        from openmm.app import NoCutoff
        openmm_topology = molecule.to_topology().to_openmm()
        system = forcefield.createSystem(openmm_topology, nonbondedMethod=NoCutoff)
>       assert self.charges_are_equal(system, molecule)
E       AssertionError: assert False
E        +  where False = <bound method TestGAFFTemplateGenerator.charges_are_equal of <test_template_generators.TestEspalomaTemplateGenerator testMethod=test_charge_from_molecules>>(<openmm.openmm.System; proxy of <Swig Object of type 'OpenMM::System *' at 0x7efcfff1d450> >, Molecule with name 'DrugBank_2791' and SMILES '[H]c1c(c(c(c(c1C([H])([H])[H])[H])[H])S(=O)(=O)O[H])[H]')
E        +    where <bound method TestGAFFTemplateGenerator.charges_are_equal of <test_template_generators.TestEspalomaTemplateGenerator testMethod=test_charge_from_molecules>> = <test_template_generators.TestEspalomaTemplateGenerator testMethod=test_charge_from_molecules>.charges_are_equal

openmmforcefields/tests/test_template_generators.py:248: AssertionError

@yuanqing-wang: Is there a way to optionally tell espaloma to use charges from the Molecule object?
And how do I select whether it should use the GCN charges or OpenFF AM1-BCC charges?

@jchodera
Copy link
Member Author

This is waiting on a new release following the merger of choderalab/espaloma#95, where we will be able to add support for retaining the charges provided in Molecule and selecting between AM1-BCC or espaloma charges.

Hopefully, we can also get espaloma onto conda-forge.

@jchodera
Copy link
Member Author

jchodera commented Nov 14, 2021

The espaloma charge API in openmm_system_from_graph() has been updated in choderalab/espaloma#95:

    charge_method : str, optional, default='nn'
        Method to use for assigning partial charges:
        'nn' : Assign partial charges from the espaloma graph net model
        'am1-bcc' : Allow the OpenFF toolkit to assign AM1-BCC charges using default backend
        'gasteiger' : Assign Gasteiger partial charges (not recommended)
        'from-molecule' : Use partial charges provided in the original `Molecule` object

I'll update this PR to test!

@jchodera
Copy link
Member Author

@yuanqing-wang: It looks like espaloma is failing (perhaps due to the charge model?) here. Can you take a look at the exception and stack trace?

@yuanqing-wang
Copy link

yuanqing-wang commented Nov 16, 2021

@yuanqing-wang: It looks like espaloma is failing (perhaps due to the charge model?) here. Can you take a look at the exception and stack trace?

It looks like it's because your testing molecule doesn't have an improper. Let me quickly make a PR indicating that if you don't have improper we're gonna let it go.

@yuanqing-wang
Copy link

@jchodera could you rerun the tests? it should work now

@jchodera
Copy link
Member Author

Tests have been re-run! It seems I'm running into a large discrepancy between the espaloma -> ffxml -> System energy and the System energy:

=================================== FAILURES ===================================
_________________ TestEspalomaTemplateGenerator.test_energies __________________

self = <test_template_generators.TestEspalomaTemplateGenerator testMethod=test_energies>

    def test_energies(self):
        """Test potential energies match between openff-toolkit and OpenMM ForceField"""
        # DEBUG
        from openff.toolkit.topology import Molecule
        molecule = Molecule.from_smiles('C=O')
        molecule.generate_conformers(n_conformers=1)
        from openmm import unit
        molecule.conformers[0][0,0] += 0.1*unit.angstroms
        self.molecules.insert(0, molecule)
        # Test all supported SMIRNOFF force fields
        for small_molecule_forcefield in EspalomaTemplateGenerator.INSTALLED_FORCEFIELDS:
            print(f'Testing energies for {small_molecule_forcefield}...')
            # Create a generator that knows about a few molecules
            # TODO: Should the generator also load the appropriate force field files into the ForceField object?
            generator = EspalomaTemplateGenerator(molecules=self.molecules, forcefield=small_molecule_forcefield)
            # Create a ForceField
            import openmm
            openmm_forcefield = openmm.app.ForceField()
            # Register the template generator
            openmm_forcefield.registerTemplateGenerator(generator.generator)
            # Parameterize some molecules
            for molecule in self.molecules:
                # Create OpenMM System using OpenMM app
                from openmm.app import NoCutoff
                openmm_system = openmm_forcefield.createSystem(molecule.to_topology().to_openmm(), removeCMMotion=False, onbondedMethod=NoCutoff)
    
                # Retrieve System generated by Espaloma
                espaloma_system = generator.get_openmm_system(molecule)
    
                # Compare energies and forces
>               self.compare_energies(molecule, openmm_system, espaloma_system)

openmmforcefields/tests/test_template_generators.py:988: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cls = <class 'test_template_generators.TestEspalomaTemplateGenerator'>
molecule = Molecule with name '' and SMILES '[H]C(=O)[H]'
openmm_system = <openmm.openmm.System; proxy of <Swig Object of type 'OpenMM::System *' at 0x7fda23cd3d50> >
espaloma_system = <openmm.openmm.System; proxy of <Swig Object of type 'OpenMM::System *' at 0x7fda23c1a9f0> >

    @classmethod
    def compare_energies(cls, molecule, openmm_system, espaloma_system):
        """Compare energies between Open Force Field Initiative and OpenMM ForceField objects
    
        The OpenMM System object created internally by the EspalomaTemplateGenerator is used to
        avoid any issues with stochasticity of partial charges due to conformer generation.
    
        Parameters
        ----------
        molecule : openff.toolkit.topology.Molecule
            The Molecule object to compare energy components (including positions)
        openmm_system : openmm.System
            System generated by OpenMM ForceField
        espaloma_system : openmm.System
            System generated by Espaloma engine
    
        """
    
        # Compute energies
        openmm_energy, openmm_forces = cls.compute_energy(openmm_system, molecule.conformers[0])
        espaloma_energy, espaloma_forces = cls.compute_energy(espaloma_system, molecule.conformers[0])
    
        from openmm import unit
    
        def write_xml(filename, system):
            import openmm
            with open(filename, 'w') as outfile:
                print(f'Writing {filename}...')
                outfile.write(openmm.XmlSerializer.serialize(system))
    
        # Compare energies
        ENERGY_DEVIATION_TOLERANCE = 1.0e-2 * unit.kilocalories_per_mole
        delta = (openmm_energy['total'] - espaloma_energy['total'])
        if abs(delta) > ENERGY_DEVIATION_TOLERANCE:
            # Show breakdown by components
            print('Energy components:')
            print(f"{'component':24} {'OpenMM (kcal/mol)':>20} {'Espaloma (kcal/mol)':>20}")
            for key in openmm_energy['components'].keys():
                openmm_component_energy = openmm_energy['components'][key]
                espaloma_component_energy = espaloma_energy['components'][key]
                print(f'{key:24} {(openmm_component_energy/unit.kilocalories_per_mole):20.3f} {(espaloma_component_energy/unit.kilocalories_per_mole):20.3f} kcal/mol')
            print(f'{"TOTAL":24} {(openmm_energy["total"]/unit.kilocalories_per_mole):20.3f} {(espaloma_energy["total"]/unit.kilocalories_per_mole):20.3f} kcal/mol')
            write_xml('system-espaloma.xml', espaloma_system)
            write_xml('system-openmm.xml', openmm_system)
>           raise Exception(f'Energy deviation for {molecule.to_smiles()} ({delta/unit.kilocalories_per_mole} kcal/mol) exceeds threshold ({ENERGY_DEVIATION_TOLERANCE})')
E           Exception: Energy deviation for [H]C(=O)[H] (-109.24874862885405 kcal/mol) exceeds threshold (0.01 kcal/mol)

openmmforcefields/tests/test_template_generators.py:878: Exception
------------------------------ Captured log call -------------------------------
INFO     openmmforcefields.generators.template_generators:template_generators.py:1547 Using espaloma model cached at /home/runner/.espaloma/espaloma-0.2.0.pt
INFO     openmmforcefields.generators.template_generators:template_generators.py:270 Requested to generate parameters for residue <Residue 0 () of chain 0>
INFO     openmmforcefields.generators.template_generators:template_generators.py:1597 Generating a residue template for [H]C(=O)[H] using espaloma-0.2.0
INFO     openff.toolkit.typing.engines.smirnoff.parameters:parameters.py:3121 0 torsions added
INFO     openff.toolkit.typing.engines.smirnoff.parameters:parameters.py:3382 1 impropers added, each applied in a six-fold trefoil
INFO     openff.toolkit.typing.engines.smirnoff.parameters:parameters.py:2858 3 bonds added (0 skipped due to constraints)
INFO     openff.toolkit.typing.engines.smirnoff.parameters:parameters.py:2960 3 angles added (0 skipped due to constraints)
=============================== warnings summary ===============================

I'll have to run this locally and manually inspect the differences between the serialized XML files written during the failure.

@jchodera
Copy link
Member Author

I've figured out why the energy test is failing: The improper ffxml entries we create are not being applied to the molecular system because the impropers defined by espaloma do not match the expected smirnoff 3-fold atom ordering scheme used in the improper matching code in openmm.app.ForceField. In fact, espaloma appears to depart from what SMIRNOFF does in two ways:

  1. It defines a different 6-fold matching scheme
  2. It uses all periodicities 1 through 6 rather than just periodicity 1 terms like OpenFF force fields

The espaloma 6-fold scheme compared with the SMIRNOFF 3-fold scheme is illustrated here:
PXL_20211221_032333383
I would have thought the inclusion of both directions (e.g. 1023 and 1032) would cancel out, but it appears that there may be some reason (???) these do not cancel.

To remedy this, we will either have to produce a new generation of espaloma models that use the SMIRNOFF 3-fold consistent impropers or add a new improper matching mode to openmm.app.ForceField in the next release.

@davidlmobley
Copy link

Yuanqing tagged us on this. We discussed this in a call with John and the details from that must not have gotten propagated back here. I'm dealing with something on deadline, but to summarize very briefly: If you include all six permutations of the relevant atoms, impropers can only be used to force atomic centers to be planar (essentially, one set of three gives force vectors which follow the right hand rule, basically, whereas the other set follows the left-hand rule, so the only configuration of atoms where you don't have a restoring force towards "flat" is when all of the atoms are co-planar). In OpenFF we also allow impropers which are NOT attempting to ensure the atoms involved are co-planar (e.g. such as for trivalent nitrogens), so it's important to ensure you only use the three with the same handedness.

@davidlmobley
Copy link

In terms of periodicity: I don't know why you would want these to have periodicities up to 6. If the minima is away from zero, the best way to deal with this would be to move the minima of the potential away from zero (e.g. applying symmetric potentials but with minima shifted away from zero). But chemically, these should not have high periodicities since there is always only one or two minima (either at "flat" or at "somewhat puckered").

@yuanqing-wang
Copy link

If you include all six permutations of the relevant atoms, impropers can only be used to force atomic centers to be planar

Sorry I didn't quite follow this argument. If one set of parameters force atomic centers to be planar, then if you take the negative of all those parameters, they would force atomic centers to not be planar?

@yuanqing-wang
Copy link

In terms of periodicity: I don't know why you would want these to have periodicities up to 6. If the minima is away from zero, the best way to deal with this would be to move the minima of the potential away from zero

if we have infinitive periodicities, we should be able to approximate any arbitrary functions of the dihedral angles. we just picked a number that's close enough to infinity.

it would be nice if we could also optimize periodicities but that is a discrete variable and is very difficult to optimize

@davidlmobley
Copy link

Sorry I didn't quite follow this argument. If one set of parameters force atomic centers to be planar, then if you take the negative of all those parameters, they would force atomic centers to not be planar?

No. The issue arises if you TRY and force atomic centers to not be planar (as we are doing in some of our fitting experiments, as some centers SHOULD be nonplanar). The direction of the force in such a situation can be computed from the cross product of relevant vectors (per OpenMM docs) but, if the atoms are puckered up (for example) then the cross products of three of the permutations will point up, and the other three will point down. Essentially, they fight with one another. The magnitudes differ in the different directions (up vs down), and the end result is that the force becomes zero (and the center stabilizes) ONLY if it is planar.

To put it another way, it's impossible to impropers that have minima away from zero if you're applying all six permutations; it's only possible to have minima away from zero if you use only terms with the same "handedness".

if we have infinitive periodicities, we should be able to approximate any arbitrary functions of the dihedral angles. we just picked a number that's close enough to infinity.

Of course you could, yes. However, that can introduce a lot of terms. More conventionally force fields have used a SINGLE periodicity for these, though. Impropers are certainly non-periodic so it's unclear to me why you would want to use an expansion with many different periodicities. The two most important scenarios you have to deal with are:

  1. You have a single minimum at an improper angle of 0, and it is roughly harmonic around that, and
  2. You have two symmetric minima at improper angles somewhat away from zero (e.g. +/- 30 degrees) and the potential is roughly harmonic near that

A smal fraction of the time you may see a third scenario where the potential is roughly flat between +/-30 degrees.

@yuanqing-wang
Copy link

yuanqing-wang commented Feb 22, 2022

To put it another way, it's impossible to impropers that have minima away from zero if you're applying all six permutations; it's only possible to have minima away from zero if you use only terms with the same "handedness".

Sorry I still don't quite understand what's wrong with this little thought experiment of mine:

Imagine I used all six permutations. This argument says that it's impossible to have minima away from zero. So the minima has to be at zero.

Now I flip the sign of all of my parameters---zero is now a maxima. I don't know where my minima is. But it sure isn't at zero.

@jchodera
Copy link
Member Author

@davidlmobley : Thanks for the detailed points here!

I don't think there is a concern with using multiple periodicities here---they're all symmetric and periodic, and there is no reason to stick to the lowest-order (n=1) term like the openff-* force fields do.

Instead, the question is whether six-fold impropers can be used for the same class of interactions that the SMIRNOFF three-fold impropers are used (for enforcing planarity or pyramidality). Because the six-fold impropers almost cancel (and would cancel exactly if bond lengths and angles are symmetric), I'm not even sure they can even be used to enforce planarity. I think they end up adding a kind of local multibody term that espaloma may be exploiting advantageously.

@kaminow is adding the standard SMIRNOFF three-fold impropers for us to refit and compare both three-fold and six-fold models, but now you have me wondering if we may not want both: By including three-fold and six-fold impropers (with separate parameters generated for each), we could potentially have the best of both worlds. We should also try ablating impropers entirely to see how much this degrades performance.

For now, I'll proceed by adding support for six-fold impropers to OpenMM as well, so that we can deploy all of these combinations to test the real impact on binding free energies.

@jchodera jchodera added this to the 0.11.0 milestone Mar 8, 2022
@jchodera
Copy link
Member Author

jchodera commented Apr 7, 2022

@yuanqing-wang : Looks like this is finally working! I'll get this merged and then we can do some testing with perses.

@jchodera jchodera marked this pull request as ready for review April 7, 2022 15:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants