Skip to content

Releases: openforcefield/openforcefield-forcebalance

Version 1.3.0 "Parsley"

21 Oct 17:49
f689096

Choose a tag to compare

Description

This minor release contains a fix of amide-related issues; (1) a poor performance of v1.2 in reproducing amide torsional energy profiles and (2) absence of appropriate torsion parameters for dialkyl amides.

1. A poor performance of v1.2 in reproducing amide torsional energy profiles

Torsional energy profile of the amide bond of N-methyl acetamide generated using v1.2.0 failed to reproduce the QM torsional energy profile, forming a small hump at the QM minimum energy point (180 degree), which doesn’t appear in the QM profile.

drawing

Torsional energy profiles of N-methyl acetamide from v1.2.0

MM energy decoupling showed that torsion term contributed the most to the formation of the small peak, which indicated the need for the improvement of the amide torsion parameters.

From a closer inspection, we have found that the training set used in v1.2.0 fitting includes molecules whose geometries are not planar at a point where the amide bond is planar(dihedral angle = 0 or 180 degree) due to their large steric interactions or other chemical interactions, having a small peak at the point in their QM torsional energy profiles. This led the parameter set to be fitted to force the formation of a small peak at around their planar geometries even for simple molecules which don't have strong steric hindrance, like N-methyl acetamide.

So to avoid this problem, we carefully selected a training set for the amide torsion parameters with simple molecules which don’t have strong chemical interactions.

2. Absence of appropriate torsion parameters for dialkyl amides

Before the amide issue was raised, I found that there was a missing torsion term for dialkyl amides in the current parameter set. Due to the lack of appropriate torsion parameters for the dialkyl amides, it gives a low rotational barrier height, resulting in a bad reproduction of QM optimized geomeotries.

drawing

QM optimized geometry of Cc1ccc(=O)n(n1)CC(=O)N2CCCc3c2cccc3 (Magenta: MM optimized geometry with v1.2.0.)

To resolve this, we’ve decided to add new amide-specific torsion parameters(child parameters of t69 and t70). The preliminary study showed that the addition of the new torsion terms improved the performance of reproducing QM optimized geometries of dialkyl amides.

Here's the list of newly added amide-specific torsion parameters:

  • t69 [*:1]~[#7X3,#7X2-1:2]-!@[#6X3:3]~[*:4]
    • t69a [*:1]-[#7X3:2]-[#6X3$(*=[#8,#16,#7]):3]~[*:4]
  • t70 [#1:1]-[#7X3:2]-[#6X3:3]=[#8,#16,#7:4]
    • t70b [*:1]-[#7X3:2]-!@[#6X3:3](=[#8,#16,#7:4])-[#6,#1]
    • t70c [#1:1]-[#7X3:2]-!@[#6X3:3](=[#8,#16,#7:4])-[#6,#1]
    • t70d [*:1]-[#7X3:2]-!@[#6X3:3](=[#8,#16,#7:4])-[#7X3]


With this experience, we re-fitted the parameter set with the new amide torsion parameters to the carefully selected training set and confirmed it fixed the two issues.

Fitting Data and Method

1. Fitting Data

  • Torsiondrive target selection scheme: planar geometries (C-center improper dihedral angle, N-center improper dihedral angle between -5.0, 5.0 degree at the minimum energy point) + geometries which have local minima at 0 and 180 degree, selected from Gen 1 Roche torsiondrive dataset

    • # of selected targets: 62 1-D torsions
  • Optimized geometry, vibrational frequencies targets selection: molecules having t69s or/and t70s, selected from Gen 2 Optimization datasets

    • # of selected targets: 2347 optimized geometries, 532 vibrational frequencies targets

2. Method

Two stage fitting was carried out:
(1) fitting torsion parameters to the torsion training targets, then (2) reoptimizing bond and angle parameters to the optimized geometry, vibrational frequencies targets along with the torsion training targets used in the first stage.

First stage fitting was designed mainly to train the torsion parameters to the QM torsional energy profiles. The prior width of bond and angle parameters were intentionally set to 1/10 of the default scaling factor not to allow big perturbations during the optimization, and it used the default prior width of torsion parameters, so that it could be adjusted with more flexibility than other parameters.

The goal of the second stage fitting is to reoptimize the bond and angle parameters on top of the fitted torsion parameters in the first stage. The parameters were fittied to the QM optimized geometries, vibrational frequencies and torsional energy profiles, while keeping the torsional parameters from changing too much from the first stage by setting the prior width of torsional parameters 1/10 of the default width.

3. Fitting Result

In the first stage fitting, objective function decreased from 2.16e+02 to 6.29e+01 in 6 steps.

In the second stage fitting, objective function decreased from 1.72e+03 to 1.58e+03 in 14 steps.

Test Calculation Result

List of test sets:

  • Test set 1: N-methyl acetamide torsional energy profile;

  • Test set 2: All amide torsional energy profiles in the 2nd generation training dataset;

  • Test set 3: Optimized geometry, relative conformational energy benchmark full set;

  • Test set 4: Torsional energy profile benchmark primary set

The following table lists the single point calculation objective function values from different parameter sets for each test calculation, which give a rough measure of an overall quality of the performance.

test 1 test 2 test 3 test 4
v1.2.0 2.69 24.99 16681.14 384.88
v1.3.0 1.63 23.15 16436.44 404.51

1. Improvement in reproducing QM torsional energy profile of N-methyl acetamide

drawing

Torsional energy profiles of N-methyl acetamide

  • Clear improvement in reproducing the basis depth of QM energy surface near the minimum energy point (180 degree) over v1.2.0

2. Improvement in describing dialkyl amides

drawing

Torsional energy profiles of a dialkyl amide, C[N:3]([CH3:4])[C:2](=O)[C:1]1(CC1)c2ccccc2

drawing

QM optimized geometry of Cc1ccc(=O)n(n1)CC(=O)N2CCCc3c2cccc3 (Magenta: MM optimized geometry with v1.2.0. Green: MM optimized geometry with v1.3.0)

  • It also shows a better reproduction of rotational barriers and optimized geometries of dialkyl amides compared to v1.2.0.

Note

v1.3.0 inherited the manual change made in v1.2.1 to resolve an issue with propyne substituents.

Version 1.2.0 "Parsley" Update

29 May 18:45
67e0010

Choose a tag to compare

Description

We are pleased to announce the release of Parsley version 1.2.0! The major change we made in the new release is a new quantum chemical dataset, which is utilized in training valence parameters in the force field. Parsley 1.2.0 has been fitted to the new, carefully designed quantum chemical dataset and the successive benchmark results showed significantly improved performance in reproducing QM optimized geometries especially for certain phosphonate-containing molecules and for exocyclic divalent and trivalent nitrogen-containing molecules over Parsley 1.1.0.

Fitting Data and Results

  • Fitting targets: 2nd generation training sets (For the details of the training set generation scheme: Training dataset selection from OFF May virtual meeting 2020.

  • Input force field : SMIRNOFF99Frosst w/ some parameter modification (same initial guess used in Parsley 1.1.0 fitting)

  • The objective function decreased from 3.619e+04 to 6.877e+03 in 57 steps. After the optimization, additional Removal of redundancy in t108 SMIRKS pattern was done: [#6X3:1]-@[#16X2,#16X1-1,#16X3+1:2]-@[#6X3,#7X2;r5:3]=@[#6,#7;r5:4][#6X3:1]-@[#16X2,#16X3+1:2]-@[#6X3,#7X2;r5:3]=@[#6,#7;r5:4]

Note: forcebalance v1.7.1 and openforcefield v0.6.0 were used for the target generation and the optimization.

Benchmark Results

For the calculation, full benchmark set was used (25168 optimized geometries, relative energies of 2005 molecules). Details of the test set can be found Here.

Two types of benchmarks were done to compare the performances: (1) QM vs MM optimized geometries and (2) the relative energies between conformers at “QM optimized geometries”. The final objective function value(X2) from FB single point calculation gives a brief overview of the agreement between QM and MM. The lower X2 is, the better the force field reproduces QM structures and energetics.

(1) Reproduction of QM optimized geometries

To investigate the improved performance in reproducing QM optimized geometries, the residual of each molecule, a sum of squared weighted residual of internal coordinates was calculated and plotted.

drawing Equation of residual (demon. for bond, angle, improper torsion are set to be 0.05 Angstrom, 8 degrees and 20 degrees respectively.)

drawing scatter plot of residual from `v1.2.0` - residual from `v1.1.0`

Each y value in the plot (delta residual) is residual from v1.2.0 - residual from v1.1.0. Negative y value means residual from v1.2.0 is lower than that from v1.1.0, indicating an improved performance over v1.1.0. The mean of residual difference is -1.364.

Especially, molecules having deprotonated phosphonate group(RP(=O)(OH)(O^-)) showed significant improvement in v1.2.0, fixing a known issue with protonated phosphorus-connected oxygens which has plagued AMBER-family force fields.

drawing

QM optimized geometry of CC(O)(P@@(O)[O-])P@(O)[O-]. ( transparent orange: MM optimized geometry with v1.1.0, transparent green: v1.2.0)

Here’s one of molecules which has two deprotonated phosphonate groups. In the optimized geometry from the v1.1.0(transparent orange in the figure), the hydroxyl hydrogen is located much closer to the negatively charged oxygen (1.10 Å) than in the QM optimized geometry (> 2.4 Å), forming an overly strong intramolecular hydrogen bond. The v1.2.0 corrects this error and gives a closer agreement with QM optimized geometry by having a larger equilibrium angle for phosphorus-centered angle term(a38: [*:1]~[#15:2]~[*:3]).

drawing QM optimized geometry of CC1(O[C@@h]2CO[C@@]3([C@H]([C@@h]2O1)OC(O3)(C)C)COS(=O)(=O)N)C. ( transparent orange in the left figure: MM optimized geometry with v1.1.0, transparent green in the right figure: MM optimized geometry with v1.2.0))

For some conformers of CC1(O[C@@H]2CO[C@@]3([C@H]([C@@H]2O1)OC(O3)(C)C)COS(=O)(=O)N)C, v1.1.0 introduced a weird spatial orientation of covalent bonds to tetravalent sulfur, resulting in a much smaller angle for N-S-O(63.3 degree) than in QM optimized geometry (97.0 degree). This issue is fixed in v1.2.0.

(2) Reproduction of QM relative energies between conformers at QM optimized geometries

To investigate the improved performance of the new parameter set in reproducing QM relative energies between conformers, QM vs MM relative energies between conformers “at QM optimized geometries” were calculated. Two different ways to calculate MM relative energies were used. For the left figure, MM relative energies were calculated by taking a difference between MM energy at each point and MM energy at the QM minimum. And for the right figure, MM relative energies were obtained by taking a difference between MM energy at each point and MM energy at the MM minimum. Both distribution plots show lower MAD(mean absolute deviation) compared to v1.1.0, indicating a slight, but not notable improvement of the new parameter set in reproducing QM energetics.

drawing QM vs MM relative energies between conformers at QM optimized geometries.

Benchmark analysis done by Victoria Lim shows that the v1.1.0 fails to describe (1) molecules having `#7X2-#7X3` (single bond between divalent nitrogen and trivalent nitrogen); (2) azetidines; and (3) octahydrotetracenes. For further details of the benchmark analysis, please see [here](http://doi.org/10.5281/zenodo.3774680)

We speculated that the poor performances on the chemical moieties are due to the poor chemical coverage of the training set used in Parsley fitting. In the previous release, periodicities for some N-N rotations were modified to properly describe N-N bonds which are under conjugation effect and the change included the modification of components t128. However since there were no exocyclic #7X2-#7X3 rotations in the training set, a proper fitting of t128 couldn’t be carried out, resulting in unphysical optimized k values.

drawing ddE w.r.t. RMSD and ddE w.r.t. TFD of divalent and trivalent nitrogen-containing molecules

Our new training set for this release now includes exocyclic rotations about #7X2-#7X3 and the new parameter set fitted to this new set shows clear improvement in lowering TFD and RMSD which indicates it is improved in reproducing QM optimized geometries while showing slight sacrifice in ddE.

drawing QM optimized geometry of Cc1c(sc(n1)N/N=C/c2ccccn2)C. ( transparent orange: MM optimized geometry with v1.1.0, transparent green: MM optimized geometry with v1.2.0)

One example of molecules with exocyclic #7X2-#7X3. v1.2.0 could successfully reproduce planar structure of Cc1c(sc(n1)N/N=C/c2ccccn2)C.

Version 1.1.0 "Parsley" Update

23 Jan 06:50
67e0010

Choose a tag to compare

Description

This is version 1.1.0 Parsley.

This release contains results of valence parameter re-fitting, with small modifications in the input force field. result.offxml in Assets is the resulting force field file,version 1.1.0 Parsley from the re-fitting. The modifications include the following;

(1) Addition of three new bond and angle terms, a22a, b14a and b36a;

a22a, b14a and b36a were added to effectively consider conjugation effect.

Separating a22a from a22 was intoduced to effectively describe conjugation effect of N=C=S, which makes the equilibrium angles of a22a relatively larger than a22.

		<Angle smirks="[*:1]~[#7X2+0:2]~[*:3]" angle="120.0 * degree" k="140.0 * mole**-1 * radian**-2 * kilocalorie" id="a22" parameterize="k,angle"/>
                <Angle smirks="[*:1]~[#7X2+0:2]~[#6X2:3](~[#16X1])" angle="120.0 * degree" k="140.0 * mole**-1 * radian**-2 * kilocalorie" id="a22a" parameterize="k,angle"/>

Separating b14a from b14 was intoduced to effectively describe resonance of single bonds between sp2 C and O with negative 1 charge.

		<Bond smirks="[#6:1]-[#8:2]" length="1.41 * angstrom" k="640.0 * angstrom**-2 * mole**-1 * kilocalorie" id="b14" parameterize="k,length"/>
                <Bond smirks="[#6X3:1]-[#8X1-1:2]" length="1.41 * angstrom" k="640.0 * angstrom**-2 * mole**-1 * kilocalorie" id="b14a" parameterize="k,length"/>

Separating b36a from b36 was intoduced to effectively describe resonance of double bonds between N with positive 1 charge and N with negative 1 charge.

		<Bond smirks="[#7:1]=[#7:2]" length="1.3 * angstrom" k="680.0 * angstrom**-2 * mole**-1 * kilocalorie" id="b36" parameterize="k,length"/>
                <Bond smirks="[#7+1:1]=[#7-1:2]" length="1.27 * angstrom" k="760.0 * angstrom**-2 * mole**-1 * kilocalorie" id="b36a" parameterize="k,length"/>

(2) Modification of periodicity for N-N rotations;

Periodicity of t128 has been changed to be represented by twofold and threefold periodicity, while t129, t130 and t131 has been changed to have twofold periodicity.

        <Proper smirks="[*:1]~[#7X2:2]-[#7X3:3]~[*:4]" periodicity1="3" periodicity2="2" phase1="0.0 * degree" phase2="180.0 * degree" k1="0.625 * mole**-1 * kilocalorie" k2="0.0 * mole**-1 * kilocalorie" id="t128" idivf1="1.0" idivf2="1.0" parameterize="k1,k2"/>
		<Proper smirks="[*:1]=[#7X2:2]-[#7X2:3]=[*:4]" periodicity1="2" phase1="180.0 * degree" k1="1.4 * mole**-1 * kilocalorie" id="t129" idivf1="1.0" parameterize="k1"/>
		<Proper smirks="[*:1]~[#7X2:2]=,:[#7X2:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1="6.0 * mole**-1 * kilocalorie" id="t130" idivf1="1.0" parameterize="k1"/>
		<Proper smirks="[*:1]~[#7X3+1:2]=,:[#7X2:3]~[*:4]" periodicity1="2" phase1="180.0 * degree" k1="3.0 * mole**-1 * kilocalorie" id="t131" idivf1="1.0"  parameterize="k1"/>

(3) Addition of missing improper torsions (i2a, i3a, i3b, i5a) with the associated torsions (t51a,t51b, t51c).

New torsion t51a was added to describe sp3 C connected to trivalent N which is connected to N=C and t51b was added for sp3 C connected to trivalent N which is connected to N=N or N=O. For sp3 C and trivalent N bond connected to 5-membered hetero-aromatic ring, the torsion was separately assigned to t51c.

		<Proper smirks="[*:1]-[#6X4:2]-[#7X3:3]-[*:4]" periodicity1="3" phase1="0.0 * degree" k1="0.3 * mole**-1 * kilocalorie" id="t51" idivf1="1.0" parameterize="k1"/>	        
		        <Proper smirks="[*:1]-[#6X4:2]-[#7X3:3]-[#7X2:4]=[#6]"  periodicity1="3" periodicity2="2" phase1="0.0 * degree" phase2="180.0 * degree" k1="0.3 * mole**-1 * kilocalorie" k2="0.0 * mole**-1 * kilocalorie" id="t51a" idivf1="1.0" idivf2="1.0" parameterize="k1,k2"/>
                <Proper smirks="[#1:1]-[#6X4:2]-[#7X3:3]-[#7X2:4]=[#6]" periodicity1="3" periodicity2="2" phase1="0.0 * degree" phase2="180.0 * degree" k1="0.3 * mole**-1 * kilocalorie" k2="0.0 * mole**-1 * kilocalorie" id="t51ah" idivf1="1.0" idivf2="1.0" parameterize="k1,k2"/>
                <Proper smirks="[*:1]-[#6X4:2]-[#7X3:3]-[#7X2:4]=[#7X2,#8X1]" periodicity1="3" periodicity2="2" phase1="0.0 * degree" phase2="180.0 * degree" k1="0.3 * mole**-1 * kilocalorie" k2="0.0 * mole**-1 * kilocalorie" id="t51b" idivf1="1.0" idivf2="1.0" parameterize="k1,k2"/>
                <Proper smirks="[#1:1]-[#6X4:2]-[#7X3:3]-[#7X2:4]=[#7X2,#8X1]" periodicity1="3" periodicity2="2" phase1="0.0 * degree" phase2="180.0 * degree" k1="0.3 * mole**-1 * kilocalorie" k2="0.0 * mole**-1 * kilocalorie" id="t51bh" idivf1="1.0" idivf2="1.0" parameterize="k1,k2"/>
                <Proper smirks="[*:1]-[#6X4:2]-[#7X3$(*@1-[*]=,:[*][*]=,:[*]@1):3]-[*:4]" periodicity1="2" phase1="180.0 * degree" k1="2.0 * mole**-1 * kilocalorie" id="t51c" idivf1="1.0" parameterize="k1"/>
                <Proper smirks="[#1:1]-[#6X4:2]-[#7X3$(*@1-[*]=,:[*][*]=,:[*]@1):3]-[*:4]" periodicity1="2" phase1="180.0 * degree" k1="2.0 * mole**-1 * kilocalorie" id="t51ch" idivf1="1.0" parameterize="k1"/>	

There was an addition of new improper torsions as well. i2a was newly added to describe trivalent N centers connected to pi-bond forming S or P. i3a was added for trivalent N centers connected to a pi-bond-forming N and i3b was separately defined for trivalent N centers inside 5-membered hetero-aromatic ring.

                <Improper smirks="[*:1]~[#7X3$(*~[#15,#16](!-[*])):2](~[*:3])~[*:4]" periodicity1="2" phase1="180.0 * degree" k1="1.1 * mole**-1 * kilocalorie" id="i2a" />

                <Improper smirks="[*:1]~[#7X3$(*~[#7X2]):2](~[*:3])~[*:4]" periodicity1="2" phase1="180.0 * degree" k1="1.1 * mole**-1 * kilocalorie" id="i3a" />
                <Improper smirks="[*:1]~[#7X3$(*@1-[*]=,:[*][*]=,:[*]@1):2](~[*:3])~[*:4]" periodicity1="2" phase1="180.0 * degree" k1="10.5 * mole**-1 * kilocalorie" id="i3b" />


(4) t19 removal of redundant periodicity.
A redundant threefold periodicity component of t19 has been removed.

		<Proper smirks="[#1:1]-[#6X4:2]-[#6X3:3]=[#6X3:4]" periodicity1="3" periodicity2="1" phase1="180.0 * degree" phase2="0.0 * degree" k1="0.38 * mole**-1 * kilocalorie" k2="1.15 * mole**-1 * kilocalorie" id="t19" idivf1="1.0" idivf2="1.0"  parameterize="k1,k2"/>

Fitting Data

  • Roche set: 686 1-D torsion profiles; 936 optimized geometries; 669 vibrational frequencies
  • Coverage set: 412 1-D torsion profiles; 842 optimized geometries; 265 vibrational frequencies
  • outside of the two sets: 19 1-D torsion profiles

Results

The objective function decreased from 2.63784e+04 to 4.63632e+03 in 18 steps.

1. Fitting Results

1-1. Addition of three new bond and angle terms, a22a, b14a and b36a

The plots below shows the result of addition of the subgroup a22a .The optimized equilibrium angle values for a22 and a22a are quite different (118 and 143 degree respectively) which validates the sub-grouping.

drawing

Same results were observed in b14 and b36 cases. The final equilibrium bond values are quite different between supersets and their subsets. (Bad initial guesses on b14a and b36a are suspected to be the major factor of slightly off fitting results.)

1-2. Modification of periodicity for N-N rotations

While inspecting the benchmark result of Parsley, significantly large RMSD of MM optimized geometry from QM optimized geometry was found in the molecule Cc1ccc(cc1)N/N=N/c2ccccn2 and the main contribution of the high RMSD value came from dihedral angles. t130 (SMIRKS: [*:1]~[#7X2:2]=,:[#7X2:3]~[*:4]) assigned to 10-13-14-15 torsion was found to have its periodicity 1 and the issue was resolved by simply modifying the periodicity to 2. Also it was confirmed that the modification of the periodicity led to a slight decrease in the final objective value in valence parameter fitting.

drawing

1-3. Addition of missing improper torsions (i2a, i3a, i3b, i5a) with the associated torsions (t51a,t51b, t51c)

We found that Parsley didn't have an improper torsion for the trivalent nitrogen center in tetrazole. But simply Adding the missing improper terms (trivalent nitrogen centers connected to pi-bond-forming atom) couldn't make the nitrogen center planar even with unphysically large k value due to the contribution from torsions with inappropriate periodicity. The torsions assigned to the bond between the trivalent nitrogen and its attached sp3 carbon were t51, whose periodicity is set to 3. Threefold periodicity is reasonable when both ends of central bond are threefold axis. However, when the nitrogen in the center bond is conjugated, forming the nitrogen end twofold axis, like the trivalent nitrogen in tetrazole, threefold periodicity is not able to describe the planar nitrogen center successfully. In other words, the t51 term in the previous force field needed to be seperated into subgroups.

Note: since 5-membered hetero aromaticity is pretty common and one of the important functional groups in drug molecules, we separately defined proper and improper torsions for the trivalent nitrogens inside 5-membered heteroaromatics.

drawing

2. Benchmark Results

2-1. Benchmark Primary Set

Objective value of single point calculation on Primary set for v1.1.0 is 936.17, which is lower than v1.0.0 (948.01) and mu...

Read more

OpenFF "parsley" 1.0.0-RC2: Fitting Valence with QM Data

15 Oct 23:21

Choose a tag to compare

Description

The content of this release is the same as v0.0.9 (the stage-1 fitting of the previous Release 1.0 RC1). link

We decided to use this version as it has a better performance in QM fitting and benchmarks.

The benchmarks on the physical properties also suggested the use of the original LJ parameters.

Please refer to our blog post for details.

Refit of Select VdW Parameters

15 Oct 23:15

Choose a tag to compare

Pre-release

Note

This release was previously created by @SimonBoothroyd and I just ported it from https://github.com/j-wags/release-1-fitting/releases/tag/v0.0.10 to ensure consistency.

Description

This release contains results for re-fitting a select number of Lennard-Jones VdW parameters against a set of experimental density and heat of vaporization data points. The starting point for this fit is the v0.0.9 release.

OpenFF "parsley" 1.0.0-RC1: Refitting Valence with Fitted LJ parameters

30 Aug 02:05

Choose a tag to compare

Description

This is the version 1 release candidate of optimized OpenFF parameters, version number 1.0.0-RC1, code name parsley.

This release includes the refitting of the valence parameters, with the new set of LJ parameters that was fitted to thermodynamic properties with PropertyEstimator here: https://github.com/openforcefield/release-1-fitting/releases/tag/v0.0.10

Method

This release is the final stage of a three-stage fitting strategy.

  • In the first stage, we fit the valence parameters to QM data. This is done in release 0.0.9

  • In the second stage, we performed fitting on selected LJ parameters, against thermodynamic properties (density, heat of vaporization). This is done in [release 0.0.10] (https://github.com/openforcefield/release-1-fitting/releases/tag/v0.0.10)

  • In the final stage, we re-fit the valence parameters with the new LJ parameters from the 2nd stage.
    Note: In ForceBalance, we prepared the forcefield file with the same initial valence parameters as the 1st-stage fitting, together with the fitted LJ parameters from the 2nd-stage fitting. Then we used the load_mvals feature to apply the fitted valence parameters from the 1st-stage. This setting allows us to preserve the "Regularization" term, which is the restrain on parameters that presents overfitting.

Fitting Data

The third stage uses the same QM data as the first stage, and the valence parameters been fitted is the same as the first stage.

Roche set: 669 1-D torsion profiles; 936 optimized geometries; 660 vibrational frequencies;
Coverage set: 417 1-D torsion profiles; 831 optimized geometries; 235 vibrational frequencies;

The parameter coverage is 481/500.

Results

Start fitting with replaced LJ parameters

The fitting started at an objective value of 5198, which is 676 or 15% higher than the final objective of the first stage (4522). This is because the new LJ parameters cause the agreement to QM become worse. Most of the optgeo targets have higher objective values, and a few torsion profile targets have a sizable increase in objective value. This indicates that the LJ parameters have a non-negligible impact on the valence fitting.

One example of such torsion profile is shown below. Left is from the last step of the 1st-stage fitting, and right side shows the profile after the new LJ parameter is applied (1st iteration from the 3rd-stage fitting)
image
Here is the structure of this molecule:
image (4)

The entry label of the torsion profile is c1cc[c:1](cc1)[C:2](=O)[NH:3][c:4]2ccccc2 of the OpenFF Group1 Torsions (Roche torsion) dataset. Note: This example torsion profile was able to reach a similar agreement to the QM data after the 3rd-stage fitting.

Fitting Results

The final objective value is 4893, still higher than the objective value in stage-1. This indicates that the This Further investigations and benchmarks may reveal more insights for this.

The parameters have relatively small changes in this 3rd-stage. The k for angle a13 become higher, and a21 become lower. A few torsion ks changed more, but most of them are similar.

Note: Since the "initial parameters" in the forcefield is the same as the 1st-stage (and we used the load_mvals feature to apply the fitted valence parameters in the 1st iteration), the "parameter_change" script will show the "combined changes" of both 1st stage and 3rd stage. I manually modified the output file to plot the change of "just 3rd stage", and moved the "combined change" plots to a subfolder.

Known issues

parameter a38 matching to all 6 angle in tetrahedral structure

The fitting changed the angle parameter of a38 by a lot, however the agreement of optimized angle barely improved, which is shown in this plot:
image

This means changing this parameter does not really affect the corresponding optimized angle. After looking into this, I found it's because all 6 angles contained in a tetrahedral structure all matches to this angle, so it's effectively not a "real" degree of freedom.

One example molecule that utilizes this angle, C(N)[P@H](=O)O (molecule_id 1458128)
The highlighted tetrahedral structure contains six [X-P-X] angles that all matches to this parameter.

image

To fix this, we may need to add new SMIRKs terms that are more specific to different angles, or we need to freeze the angle of a38 at 109.47 degree.

Notes

  • In an earlier fitting attempt, I tried to provide the fitted parameters from 1st stage directly in the force field (without using load_mvals). This effectively "mutes" the penalty function, and allows the parameters to change even more. Since this setting have a higher risk of overfitting, we do not use that result.

To-do

Future fitting attempts will have these items:

  1. Try fitting with original initial valence parameters without "load_mvals". This setting is effectively just manually replaced the LJ parameters in the original forcefield and run a new fitting. The result of this test could be interesting that we can compare the final objective to the 1st stage here and get an idea of "which set of LJ parameters makes the valence fitting easier", and "Does the 3rd stage end up with a higher objective than 1st stage because of a barrier in the parameter space?"

  2. Improve SMIRKs terms in force field and re-run fitting.

  3. Improve vibrational frequencies fitting. We tried to use "overlap" method to reassign vibrational modes in fitting, but it leads to a worse fitting quality, so we did not use that in the current release. "Internal coordinate hessian fitting" might be the way to go.

Fitting with Screened Torsion Profile (skip hydrogen bonds)

22 Aug 19:48

Choose a tag to compare

Description

This release contains results for fitting with screened torsion profiles, based on hydrogen bonds. The result is highly based on work from @hyejang.

In the current version of the force field, the charge parameters are determined by AM1BCC calculations for each molecule. Since it is not guaranteed that these charge parameters can correctly reproduce strong electrostatic interactions, especially hydrogen bonds, we determined that the torsion profiles with hydrogen bonds should not be included in the fitting.

To detect the hydrogen bonds, we tried two approaches. First is to use the energies of electrostatic interactions. We tried to look at the absolute energies and relative energies along the torsion profile, and we found that this approach is able to find out most of the hydrogen-bonding systems, however it may also have false positive detections on peptides and other molecules with larger charges. Then we tried the geometry-based method, namely Baker-Hubbard (𝜃>120 and 𝑟<2.5A) and found that it gives a good result. Therefore we went for the second approach. (mdtraj package is used for the baker_hubbard function)

For implementation details, see PR #2. @hyejang has also put together a detailed document on the investigations about detecting hydrogen bonds, which is attached in this release.

Fitting Targets

This release contains the same fitting targets for optgeo and vibfreq targets as previous. The torsion profile targets are re-generated.

Roche set: 669 1-D torsion profiles; 936 optimized geometries; 660 vibrational frequencies;
Coverage set: 417 1-D torsion profiles; 831 optimized geometries; 235 vibrational frequencies;

The number of torsion profiles for Roche set decreased from 798 to 669, indicating that about 130 torsion profiles are filtered out because of hydrogen bonds.
The number of torsion profiles for the "Coverage Set" increased from 378 to 417. This is because some tasks previously in an "error" state got fixed on the server.

The parameter coverage is 481/500.

Results

The objective function decreased from 25708 to 4522. The convergence behavior is similar to previous releases, and the relative amount of decrease is a little higher than previous release, indicating that the screened torsion profiles overall are fitted better.

The fitting results for the bond and angle parameters are generally consistent with previous releases. The torsion ks have some differences, such as k1 of t127 and t155 now become negative.

Notes

In this release, the scripts are updated to be compatible with the new qcportal 0.9 and qcelemental 0.6.1.

Fitting with Added Parameter `a34a`

20 Aug 17:45

Choose a tag to compare

Description

We added a new term, a34a, to deal with certain sulfurs with two connections. Particularly, CCc1ccc(cc1)N=S=O (id "1468317") was found to utilize term a34 [*:1]=[#16X2:2]=[*:3], which has an 180 degree equilibrium angle. However, the geometry for this molecule is not linear, and many R=S=O angles also seem to not be linear . Therefore, we introduced a new term a34a, with SMIRKS [*:1]=[#16X2:2]=[#8:3] and initial equilibrium angle of 120 degrees, to handle this case and other related cases while preserving the existing generic.

The new term added in the input force field is:

<Angle smirks="[*:1]=[#16X2:2]=[#8:3]" angle="120.0 * degree" k="140.0 * mole**-1 * radian**-2 * kilocalorie" id="a34a" parameterize="k,angle"/>

As a result, the original term a34 has equilibrium angle frozen together with a16, a21, a26 described in last release 0.0.8.2.

Fitting Targets

The fitting targets are the same as release v0.0.8

Roche set: 798 1-D torsion profiles; 936 optimized geometries; 660 vibrational frequencies;
Coverage set: 378 1-D torsion profiles; 831 optimized geometries; 235 vibrational frequencies;

The new parameter coverage is 481/500.

Results

The objective function decreased from 25738 to 4769 in 12 steps. The slight difference to last release is because the new term a34a have a different initial angle value of 140 instead of 180 of a34.

The final equilibrium angle of a34a and the minimized geometry both show a better agreement with the QM value:

image

Fitting with Frozen Designed Linear Angles

20 Aug 16:46

Choose a tag to compare

Description

This release contains results for fitting which several angles designed to be linear are kept frozen at 180 degree.

Background

When investigating on the ForceBalance fitting result, I noticed that the final total gradient magnitude with respect to all parameters is still quite large, such as 662 in release 0.0.8.1. Although this total gradient is much smaller than the initial total gradient 25580, we still want to know what is preventing the final gradient to further decrease.

After some investigation, I found that the gradients for several parameters are relatively large and do not get reduced well in the last few iterations of fitting. These parameters are the angle of a16, a21, a26, which all have initial value of 180 degree, and fluctuate a bit higher like 180.2.

After discussing with Lee-Ping, we think the angles fluctuating around 180 degree with a high gradient is indicating that the optimizer may want to move the angle slightly above 180 but by definition an angle higher than 180 should not exist, so the fitting may get some weird behavior.

Since these angles are designed to be linear, the best solution is to freeze these angles in fitting.

Results

After freezing the three linear angles a16, a21, a26, the fitting now reaches a much lower total gradient of 163, and the total objective value 4751 is a little lower than previous release 4793.

The fact that this fitting have less parameters but reached a lower objective value, indicates that the freezing of the three linear angles is a successful change.

New validation tools

As discussed in the release-1 meeting on Aug. 15th, C. Baley suggested that the release should come with a dataset for validating future changes. The first dataset would be the single point energies of molecules in the "coverage set".

The implementation is done in two parts.

In part 1, I implemented a script to generate the validation dataset, this includes:

  1. Read the molecules and their QM minimized geometry from the fitting targets.
  2. Load molecule into openforcefield toolkit, compute the partial charges using AM1BCC method.
  3. Evaluate the MM energy using the result forcefield, at the QM minimized geometry.
  4. Minimize the geometry in MM and evaluate the MM energy again.
    After these steps, we store the partial charges, QM minimized geometry, MM minimized geometry, MM potential energy at each geometry.

In part 2, I implemented a script to use the data generated in step 1 to perform a validation. If we want to validate the single point energies of a new force field file "test.offxml", we can run

python run_validate_single_point.py test.offxml

In the validate_single_point/ folder. While running, this script will print a table such as

*** Single point benchmark on QM and MM minimized geometries ***
- Number of molecules: 831
- Test forcefield : test.offxml
- Start Time      : Mon Aug 19 15:07:55 2019
- Unit            : kcal/mol 

    # | Molecule ID     |                   QM Geometry                   |                   MM Geometry                  
      |                 |       E_release          E_test            Diff |       E_release          E_test            Diff
    0 | 1456192         |           0.295           0.295           0.000 |           0.295           0.295           0.000
    1 | 1456193         |          80.510          79.901          -0.609 |          73.103          72.293          -0.810
    2 | 1456206         |         168.696         169.197           0.501 |         158.466         158.742           0.276
    3 | 1456220         |          79.705          78.994          -0.711 |          72.013          71.065          -0.948
    4 | 1456236         |          80.089          78.095          -1.993 |          73.364          71.763          -1.602
    5 | 1456252         |         168.333         168.830           0.497 |         158.633         158.952           0.320
    6 | 1456266         |          79.322          78.484          -0.838 |          71.902          70.999          -0.903
    7 | 1456283         |          61.569          61.715           0.145 |          53.340          53.487           0.147
    8 | 1456291         |         -37.159         -36.370           0.789 |         -39.570         -38.736           0.834

which E_release is from the result forcefield of this release, and E_test is from test.offxml.

Notes

1. a34 is linear but not frozen in this release

In the input forcefield file, there are a total of 4 linear angles, including the a16, a21, a26 that is frozen here, and also the a34 parameter. I found that the angle of a34 changed a lot from 180 degree to about 147 degree in previous fittings, and identified molecules like CCc1ccc(cc1)N=S=O having the non-linear N=S=O matching to a34. This indicates that the term a34 may be too general and covered the unwanted chemistry, so we will fix this by adding new terms in the next release.

  1. Starting from this release, the result forcefield file will be copied to result.offxml in the release folder for convenience.

Fitting with more robust MM geometry optimizer

15 Aug 07:28

Choose a tag to compare

Description

This release contains results for fitting with most robust MM geometry optimization.

Background

  • In ForceBalance, the geometry optimization is carried out by calling the OpenMM simulation.minimizeEnergy() function. This function support two parameters tolerance and maxIterations.
  • In an early approach, I found that when calling this function with tolerance=1e-4 kJ/mol but no maxIterations, the OpenMM engine has a small chance to "freeze", which it stopped responding to any signal until the process is being killed.
  • Then maxIterations is added with a default value of 10000. However, the simulation.minimizeEnergy() function does not have any return value to indicate if the minimization converged or not.
  • I found that in practice, even it converged to a tolerance like 1e-8, it is still possible that re-running the simulation.minimizeEnergy() will cause the energy to decrease more than 1e-6.
  • The "hidden not converged" and "converged but still change" cases described above become a potential source of stochastic error in the fitting. It has been observed that some times re-running a fitting can lead to slightly different results.
  • The "silent failure" of the minimization could lead to issues in fitting. For optgeo targets, because it starts MM minimization at QM geometry, early failure may result in better agreement. Although it's not observed directly, I think it's risky that the optimizer may favor parameters that lead to such early failure. Another issue is the vibrational frequency targets, which are sensitive to the MM energy minimization and rely on the MM gradients.

Fix Implementation

The fix is implemented in ForceBalance to retry the minimization 10 times and check the energy decrease to make sure the geometry optimization really converges.
Detailed implementation can be found here: leeping/forcebalance#156

Fitting Targets

The fitting uses exactly the same targets and settings.

Fitting Results

The fix lead to a small but not negligible difference in the fitting. The objective function decreased from 25753 to 4793 in 14 iterations. In the previous fitting release 0.0.8, it decreased from 25753 to 4737.

Most of the parameters are fitted to the same value. A few parameters undergoes different changes, one example is b28/length:
image
Left: release-0.0.8; Right: release-0.0.8.1

Conclusion

It is believed that the new MM minimization is more robust and will be used in future fittings.