Orientational Potentials Extracted From Protein Structures Improve Native Fold Recognition
Orientational Potentials Extracted From Protein Structures Improve Native Fold Recognition
Abstract
We develop coarse-grained, distance- and orientation-dependent statistical potentials from the growing
protein structural databases. For protein structural classes (␣, , and ␣/), a substantial number of back-
bone–backbone and backbone–side-chain contacts stabilize the native folds. By taking into account the
importance of backbone interactions with a virtual backbone interaction center as the 21st anisotropic site,
we construct a 21 × 21 interaction scheme. The new potentials are studied using spherical harmonics
analysis (SHA) and a smooth, continuous version is constructed using spherical harmonic synthesis (SHS).
Our approach has the following advantages: (1) The smooth, continuous form of the resulting potentials is
more realistic and presents significant advantages for computational simulations, and (2) with SHS, the
potential values can be computed efficiently for arbitrary coordinates, requiring only the knowledge of a few
spherical harmonic coefficients. The performance of the new orientation-dependent potentials was tested
using a standard database of decoy structures. The results show that the ability of the new orientation-
dependent potentials to recognize native protein folds from a set of decoy structures is strongly enhanced
by the inclusion of anisotropic backbone interaction centers. The anisotropic potentials can be used to
develop realistic coarse-grained simulations of proteins, with direct applications to protein design, folding,
and aggregation.
Keywords: side-chain packing; statistical coarse-grained potentials; harmonic analysis; Boltzmann device;
protein-fold recognition
Development of low-resolution models for proteins is es- obtaining residue–residue interaction potentials (Miyazawa
sential in protein structure prediction and protein design. To and Jernigan 1985, 1999b; Godzik et al. 1995; Bahar and
achieve this goal, we need an effective interaction potential. Jernigan 1996; Lee et al. 1999). Tanaka and Scheraga
Starting with the seminal work of Tanaka and Scheraga (1976) proposed that the frequencies of side-chain pairing
(1976), there has been a growing interest in obtaining rea- can be used to determine potential interaction parameters.
sonably accurate force fields. The accelerated growth of Since then, with the exception of a few studies (Sippl 1990),
structural data on thousands of proteins in the Protein Data most of the knowledge-based potentials have been obtained
Bank (PDB; Berman et al. 2000) has been a source for solely in terms of residue–residue contacts (Skolnick et al.
1997, 2000; Miyazawa and Jernigan 1999b; Tobi et al. 2000).
Reprint requests to: John E. Straub, Department of Chemistry, Boston Sippl (1990) used a statistical method, known as the Boltz-
University, Boston, MA 02215, USA; e-mail: [email protected]; fax: (617) mann device, to obtain the distance-dependent mean force
353-6466; or Devarajan Thirumalai, Institute for Physical Science and potentials. This method relies on the assumption that the
Technology, University of Maryland, College Park, MD 20742, USA;
e-mail: [email protected]; fax: (301) 314-9404. known X-ray or NMR-resolved protein structures represent
3
Present address: Laboratory of Chemical Physics, National Institute of classical equilibrium states and, therefore, the distribution
Diabetes and Digestive and Kidney Diseases, National Institutes of Health of distances between two side chains should correspond to
(NIH), Bethesda, MD 20892-0520, USA.
Article and publication are at http://www.proteinscience.org/cgi/doi/ the equilibrium Boltzmann distribution. Other structural pa-
10.1110/ps.03488704. rameters, such as dihedral angles, can also be used in this
862 Protein Science (2004), 13:862–874. Published by Cold Spring Harbor Laboratory Press. Copyright © 2004 The Protein Society
Orientational statistical potentials for proteins
treatment. The statistical potentials developed using this ap- assess the effectiveness of these potentials in recognizing
proach and other methods (Tobi et al. 2000; Tobi and Elber the native folded states. The results show that the newly
2000; Meller et al. 2002) have also focused only on extract- derived orientation-dependent potentials for side chains and
ing and analyzing probability density functions dependent the protein backbone are successful in a vast majority of
solely on distance. cases.
Previous studies have shown that the relative orientation
and packing of side chains in proteins is an important de-
Results
terminant of the local (secondary structure) geometry as
well as three-dimensional (tertiary structure) topology (Ba-
The importance of side chain–backbone and
har and Jernigan 1996; Bagci et al. 2002a,b). By analyzing
backbone–backbone interactions
various families of protein structures, we had previously
shown that certain orientational order parameters are promi- A major improvement of the potentials that are constructed,
nent (Buchete et al. 2003). From a quantitative analysis of presented, and tested in this study is due to the inclusion of
the statistical data on orientational distributions of side a virtual interaction center (Pep) located on the backbone in
chains extracted from PDB structures, we determined ori- the middle of the peptide link (see Materials and Methods).
entation-dependent interactions. This approach is supported In our previous study, we considered the inter-residue in-
by the results of Bahar and Jernigan (1996), who proposed teractions (Buchete et al. 2003) using 20 × 20 side chain–
a backbone-dependent coordination system for studying the side chain (SC–SC) orientation-dependent potentials. We
statistical distribution of residue–residue interactions in pro- were motivated to include explicitly the side chain–back-
teins. They showed that residue-specific coordination loci bone (SC–BB) and backbone–backbone (BB–BB) interac-
and packing characteristics can be extracted by statistical tions in a new 21 × 21 interaction scheme by the results of
analysis of protein structures, and knowledge-based orien- the structural analysis performed on the main protein
tational potentials can be constructed. However, using the classes.
backbone-dependent coordination system employed by Ba- These results, summarized in Table 1, show that back-
har and Jernigan (1996), it is difficult to include variations bone–backbone and backbone–side chain interactions rep-
in the sizes of side chains and their rotational degrees of resent substantial fractions of the total number of contacts
freedom. These effects are important, especially for large for all the main classes of proteins, mainly-␣, mainly-, and
side chains that have several probable rotameric states with ␣/ (Orengo et al. 1997; Pearl et al. 2000, 2003). The pro-
respect to the backbone. Therefore, we use a coordination tein structures analyzed here are taken from the most recent
system based on side-chain local reference frames (LRFs, CATH database (Pearl et al. 2003; http://www.biochem.
see Materials and Methods) that are backbone independent ucl.ac.uk/bsm/cath_new). For short-range interactions (2.0
for most amino acids. On the basis of earlier estimations → 5.6 Å), the fraction of SC–BB contacts is more prevalent
(Buchete et al. 2003) and on statistical corrections for data in structures with -sheet topology (especially for |i – j| ⱖ
sparsity, we can also use smaller orientational bin sizes then 4), but these contacts cannot be ignored even for ␣-helical
used by Bahar and Jernigan (1996). In this study, we present proteins (see Table 1). Interestingly, as shown in Table 1,
a new method for extracting coarse-grained distance- and for medium- (5.6 → 9.2 Å) and long-range (9.2 → 12.8 Å)
orientation-dependent residue–residue statistical potentials. interactions, all of the three interaction types occur with
We first show that in many protein structures, there is a similar frequencies, regardless of the protein architecture.
substantial number of contacts between side chains and This is an effect of the more uniform averaging at larger
backbone. The near globularity of protein structures (i.e., distances when orientational preferences are not that impor-
they are nearly maximally compact) implies that such con- tant. The results presented in Table 1 show that the back-
tacts should be prevalent. The necessity to include explicitly bone interaction centers play an important role in the sta-
the backbone interactions is also supported by the results of bility of the secondary structures of proteins for all the
previous statistical derivations of backbone potentials that protein classes. Thus, these interaction centers must be in-
used virtual bond and torsion angles (Bahar et al. 1997) and cluded in representing the coarse-grained potentials for low-
secondary structure information (Miyazawa and Jernigan resolution structure prediction, as well as for analyzing pro-
1999a). To capture the presence of the large number of tein-folding pathways. This observation is in agreement
side-chain backbone contacts in protein structures, we in- with the previous study of Bahar and Jernigan (1997), where
clude an extra anisotropic backbone interaction center lo- distance-dependent SC–SC, SC–BB, and BB–BB statistical
cated at the peptide bond. Spherical harmonic analysis interactions were investigated in globular proteins. In this
(SHA) and spherical harmonic synthesis (SHS) methods study, the peptide (Pep) backbone moiety is treated as a 21st
express the orientation-dependent potentials in a continuous type of interaction center. By including Pep as an interaction
manner for short-, medium-, and long-range interactions. site, we determine the parameters for the statistical potential
Multiple decoy sets (Samudrala and Levitt 2000) are used to using a 21 × 21 interaction scheme.
www.proteinscience.org 863
Buchete et al.
Table 1. Fractions of side chain–side chain (SC–SC), side chain–backbone (SC–BB), and backbone–backbone (BB–BB) contacts
calculated for the three most typical protein classes, namely ␣, , and mixed ␣/
For |i − j| ⱖ 3
SC–SC SC–BB BB–BB SC–SC SC–BB BB–BB SC–SC SC–BB BB–BB
␣: 44.5% 25.2% 30.3% 30.9% 34.7% 34.4% 31.6% 31.4% 37.0%
␣/: 38.9% 26.0% 35.1% 31.2% 34.2% 34.6% 32.4% 32.3% 35.3%
: 40.0% 25.9% 34.1% 31.0% 34.6% 34.4% 32.2% 32.3% 35.5%
For |i − j| ⱖ 4
SC–SC SC–BB BB–BB SC–SC SC–BB BB–BB SC–SC SC–BB BB–BB
␣: 61.4% 25.9% 12.7% 30.0% 34.7% 35.3% 31.3% 31.1% 37.6%
␣/: 43.1% 26.2% 30.7% 31.2% 34.7% 34.1% 32.1% 32.0% 35.9%
: 40.6% 25.9% 33.5% 31.6% 35.5% 32.9% 31.9% 32.0% 36.1%
The fractions of contacts were calculated both for interaction centers separated by at least three peptide bonds along the sequence (|i − j| ⱖ 3) and by at
least four (|i − j| ⱖ 4).
Orientational probability density maps and results for drophobic core of the protein structures. A related effect is
applying the Boltzmann device noticeable for Arg, which is one of the most hydrophilic
residues. There is a region of high interaction probability
The orientational density maps were calculated by dividing
toward its south pole, which can be explained by the pref-
the interaction range into three regions as described above.
erence of Arg to be exposed to solvent. The orientational
Figure 1 displays the probability density maps in the short
probability maps are also shown on the bottom of Figure 1
range (2.0 → 5.6 Å) for the relative orientations of other side
for Gly and Trp, and they present the same qualitative fea-
chains (denoted by the term “all”, and obtained by averag-
tures that are expected when considering their size and hy-
ing data from all the amino acids types) with respect to Ile
dropathic properties. Data for all the 21 × 21 possible types
(Fig. 1A), Arg (Fig. 1B), Gly (Fig. 1C), and Trp (Fig. 1D).
of relative residu–residue and residue–backbone orienta-
The grayscale mapping is directly proportional to the prob-
tions was collected and analyzed in this work.
abilities of finding another side chain at various orienta-
The orientational probability density maps were further
tions, as shown in the grayscale bar. Note that for Ile, one of
used (see Materials and Methods) to construct the corre-
the most hydrophobic residues, the highest probability of
sponding potentials. The values in the orientational prob-
finding a neighboring residue is toward its north pole,
ability density map for Arg residues around Ile (Fig. 2A) are
which, according to our definitions of local reference frames
divided by the values of the reference map (Fig. 2A) that is
(LRFs, see Materials and Methods), is situated away from
obtained by averaging over all the 20 side-chain types (see
the local backbone. This effect is a manifestation of the high
Materials and Methods). In this study, the reference average
statistical probability of finding Ile residues within the hy-
probabilities are not side chain-specific, but this is closer to
the random mixing approximation of side chains. The nega-
tive logarithm of the result gives the statistical potential for
the relative Ile–Arg orientations (Fig. 2C) in units of kT. In
this work, we derive orientation-dependent potentials for
short- [i.e., rij 僐 (2 Å, 5.6 Å)], medium- [rij 僐 (5.6 Å, 9.2
Å)], and long-range interaction shells [rij 僐 (9.2 Å, 12.8 Å)].
The maps shown here correspond to the second shell of
interactions (i.e., medium range).
We used the same procedure to obtain the orientational
potentials for all the types of SC–SC and SC–BB interac-
tions.
Figure 1. Examples of probability density maps for the relative residue– The spherical harmonic analysis (SHA) of inter-residue
residue orientations in proteins for short range interactions (2.0→5.6 Å).
The grayscale mapping is directly proportional to the probabilities of find-
orientational potentials
ing any other side chain at various orientations with respect to Ile (A), Arg
(B), Gly (C), and Trp (D). The normalized probability values shown on the The statistical potentials derived using the Boltzmann de-
grayscale scale have units of 10−3. vice were further analyzed using SHA (see Materials and
Figure 2. The Boltzmann device. Statistical potentials for the relative residue–residue orientations can be derived from probability
density maps. The orientational probability density map (in units of 10−3) for Arg residues around Ile (A) and the reference map (B)
corresponding to all of the side-chain types are used to build the statistical potential for the relative Ile–Arg orientations (C) in units
of kT. These maps correspond to the second interaction shell [rij僐(5.6 Å,9.2 Å)]. The smoothing effect of spherical harmonic synthesis
(see Materials and Methods) is shown in D.
Methods). We adapted Spherepack routines (Adams and Fig. 3 for an example), suggesting that further filtering
Swarztrauber 1997, 1999) to analyze the potential data, methods can be applied, and that efficient computational
which was first constructed on a 12 × 24 equiangular grid methods using the new smooth potentials resulting from
on spherical domains corresponding to the three (i.e., short, SHS can be developed. The dominance of only a few ex-
middle, and long) interaction ranges. pansion coefficients (Fig. 3) is consistent with our earlier
For example, in Figure 3 we show the amn and bmn co- finding (Buchete et al. 2003) that, in proteins with different
efficients (see eq. 10 in Materials and Methods) computed architectures, only a few orientational order parameters are
for long-range Ile–Arg interactions, up to order n ⳱ 13 (m relevant.
ⱕ n). The analysis of all 21 × 21 types of orientational We show in Figure 4 the reconstructed Ile–Arg orienta-
potentials was performed and the amn and bmn coefficients tional potential using 12 × 24 equiangular bins (up) and a
were stored. Calculation of the expansion coefficients (amn 92 × 184 grid (down), for short-range (left), middle-range
and bmn; see eq. 10) is vital, because it permits rapid (middle), and long-range (right) interactions. When com-
calculation of each specific orientational potential by paring the SHS potential values reconstructed on the
spherical harmonic synthesis (SHS) for any value of the 92 × 184 grid to the original orientational potential values
LRF orientational parameters and . Importantly, not for Ile–Arg shown in Figure 4 (left), the smoothing effect of
many a and b coefficients have large amplitudes (see the SHA/SHS procedure is evident.
The results of the same type of SHA/SHS process are
shown in Figure 5 for the anisotropic virtual backbone in-
teraction centers (Pep) located in the middle of the peptide
bond (see Materials and Methods). The smooth Pep–Pep
orientational potentials are represented for short-range
(left), middle-range (middle), and long-range (right) inter-
actions. The grayscale coding used in this figure is similar
to the one used in Figure 4. From both Figures 4 and 5 we
can see that, for each interaction range, there are specific
anisotropic features of the orientation-dependent statistical
Figure 3. Example of spherical harmonic amn (A) and bmn (B) coefficients potentials. Some of the attractive or repulsive angular re-
(m ⱕ n ⱕ 13) for Ile–Arg orientational statistical potentials. This is a typi-
gions are conserved from one interaction shell to the other,
cal situation for long-range interactions, in which only a few dominant
eigenvalues exist. These values can be used in the spherical harmonics yet some present significant changes that could explain the
synthesis process for reconstructing smooth orientational potentials for all specific features of residue–residue, residue–backbone, and
of the possible relative orientations, with high accuracy. backbone–backbone interactions.
www.proteinscience.org 865
Buchete et al.
Figure 4. The smooth Ile–Arg orientational potentials represented for short-range (top), middle-range (center), and long-range
(bottom) interactions. The potential values, calculated originally for a 12 × 24 equiangular grid are shown in A, C, and E. The
corresponding smooth potentials computed for a 92 × 184 grid using spherical harmonic synthesis are shown in B, D, and F. Dark
regions correspond to attractive (i.e., negative) potentials, whereas white regions are positive, thus less likely to correspond to
interaction loci.
The three-dimensional contour maps (Fig. 6) of continu- presence of other Pep particles is favored at these orienta-
ous and smooth reconstructed statistical potential fields il- tions. The light-gray regions represent unfavorable and re-
lustrate the relative LRF orientation of the “virtual back- pulsive regions around Pep. This type of three-dimensional
bone interaction center” backbone particle (Pep). For the representation can be used to investigate all of the possible
objects shown here, the grayscale is directly proportional to side chain–side chain, side chain–Pep, and Pep–Pep orien-
the amplitude of the potentials. The negative attractive po- tation-dependent statistical potential fields and their unique
tential values are indicated as dark angular regions. The features.
Figure 5. The smooth Pep–Pep orientational potentials represented for short-range (left), middle-range (center), and long-range (right)
interactions. The potential values, calculated originally for a 12 × 24 equiangular grid are shown in A, C, and E. The corresponding
smooth potentials computed for a 92 × 184 grid using spherical harmonic synthesis are shown in B, D, and F. Dark regions correspond
to attractive (i.e., negative) potentials, whereas white regions are positive.
x−x
Zx = (1)
x
www.proteinscience.org 867
Buchete et al.
a very negative C␣ RMSD Z score (ZRMSD) to the decoy Samudrala and Levitt 2000; Fain et al. 2001; Keasar and
structure that has the lowest energy. It is clear from Fig- Levitt 2003) lmds, fisa, fisa_casp3, and 4state are shown in
ure 7 that the UDO–21s (i.e., smooth, distance- and orienta- Figure 8. To assess the performance of the present poten-
tion-dependent potentials that are reconstructed by SHA tials, we compare the results using UDO–20 (dark bars) and
using the 21 × 21 interaction model) are very successful in UDO–21 (white bars). The cases in which the new UDO–21
correctly identifying the native state in the 2cro decoy set. potentials perform better in discriminating the native state
The UDO notation is used for statistical potentials that are from decoys are emphasized by the arrows on the left. For
both distance- and orientation-dependent, whereas the a large majority of decoy sets (84% when considering the
UD potentials depend solely on distance. The numerical energy score ZE), the performance is improved by including
ZE score in this case is lower than what is found using the backbone interaction centers. The values of ZE com-
the UD–20 potential. In this instance, UDO–21s identifies the puted using UDO–21 are more negative then for UDO–20 for
native state as the one with the lowest energy (Fig. 7). On all decoy sets except the 4state (Fig. 8).
the other hand, for the UD–20 potential, there are decoy The results in Figure 8 show an important number of
structures with lower energies than the native state. In improvements for the ZE score. However, our results show
our tests, we use ZE scores to assess the efficacy of the that the ZRMSD score is a noisy quantity, in agreement with
potentials. the observations of Samudrala and Levitt (2000), showing a
We have computed both the energy and the RMSD Z less systematic behavior. As a consequence, we are focusing
scores (ZE and ZRMSD) for the distribution of the total en- on the energy ZE score as the main quantitative performance
ergies for each protein decoy set. Assuming pairwise addi- indicator of the ability of our potentials to identify native-
tivity, the total potential for the residue pair ij is like protein conformations in decoy tests.
ij
UDO 共rij, ij, ij, ji, ji兲 = UDO
ij
共rij, ij, ij兲
Effect of the SHA/SHS method on the
+ UDOji
共rji, ji, ji兲. (2) smoothed potentials
The results for ZE and ZRMSD for the multiple decoy sets To assess the effect of smoothing (see Materials and Meth-
(Park and Levitt 1996; Park et al. 1997; Simons et al. 1997; ods) on the results, we compare in Figure 9 ZE scores cal-
Figure 8. Results from decoy tests on the orientation-dependent potentials. The energy (ZE) and C␣ RMSD Z scores (ZRMSD) calculated
for multiple decoy sets (Park and Levitt 1996; Park et al. 1997; Simons et al. 1997; Samudrala and Levitt 2000; Keasar and Levitt 2003)
lmds, fisa_casp3, fisa, and 4state are compared before (UDO–21) and after (UDO–21s) the SHA/SHS method is applied. The PDB code
for each protein and the number of decoys in its corresponding set are shown in the middle. The dark bars correspond to UDO–21 and
the white bars are for UDO–21s. The cases in which the UDO–21 potentials perform better in discriminating the native state from decoys
are emphasized by the arrows on the left. For a majority of decoy sets (84% for ZE and 56% for ZRMSD), the performance is improved.
www.proteinscience.org 869
Buchete et al.
even further enhanced by including the 21st anisotropic ability of the smoothed orientation- and distance-dependent
backbone interaction site. potentials in recognizing the native structures from a large
number of decoy sets is greatly enhanced by the inclusion of
the virtual interaction center Pep for the backbone. For all
Discussion
decoy sets except for the 4state, the energy Z-scores im-
The widely used statistical procedure for building distance- prove with the potentials used here.
dependent coarse-grained inter-residue statistical potentials The results of testing the reconstructed potentials on the
(Sippl 1990; Godzik et al. 1995; Lee et al. 1999; Miyazawa decoy sets of Samudrala and Levitt (2000) showed that only
and Jernigan 1999b) can be generalized to include orienta- small differences are introduced by the continuous spherical
tional effects (Bahar and Jernigan 1996; Buchete et al. harmonics treatment with respect to the case when the raw-
2003). The large and continually increasing number of ex- binned orientational potentials are used. The discriminatory
perimentally derived protein structures available from pro- power of the new orientational potentials is strongly im-
tein databases (Berman et al. 2000) can be used to extract proved by considering the 21st anisotropic interaction site in
both distance- and orientation-dependent statistical poten- the middle of the peptide bond and it is enhanced even more
tials. Our investigation of relative side chain–side chain by the SHA/SHS procedure in a majority of cases. From a
orientations in proteins, permits the identification of statis- computational point of view, there are potential benefits
tically preferred interaction loci for side chains, indepen- both for free energy calculations and coarse-grained dy-
dently of their neighboring C␣ backbone atoms. In accor- namical simulations that use statistical potentials as follows:
dance with previously reported results for a backbone-de- (1) The memory requirements for storing the spherical har-
pendent coordination system (Bahar and Jernigan 1996) monic coefficients instead of the raw orientational data are
glycine, serine, and threonine present less complex orienta- smaller, and (2) the values of the potentials can be recon-
tional probability distributions, with fewer peaks than other structed for any specific values of the and orientational
amino acids (e.g., arginine or isoleucine). Our orientational parameters as smooth and continuous functions over the
probability density maps present, in general, more peaks entire spherical domain. The parameters of the new orien-
than the orientational distributions of Bahar and Jernigan tation-dependent potentials are available on request. The
(1996), but a detailed comparison is not feasible due to the novel smoothed distance- and orientation-dependent poten-
different residue coordination system used in that study. tials constructed by spherical harmonic analysis of the sta-
Our previous tests (Buchete et al. 2003) showed that ori- tistical data are suited not only for use in protein fold rec-
entation-dependent potentials are better in discriminating ognition studies, but also could help the development of
the native folded state from large sets of alternative decoy new large-scale coarse-grained protein simulations using
structures (Samudrala and Levitt 2000), than the radial-only molecular dynamics or Monte Carlo methods.
dependent statistical potentials.
In this study, we demonstrated that further improvements
can be obtained by including backbone interactions explic- Materials and methods
itly, and by analyzing and describing coarse-grained, orien-
tation-dependent inter-residue potentials in terms of the co- Local reference frames of side chains
efficients resulting from a spherical harmonic analysis. The
statistical data extracted from the PDB structures is used to To get parameters for the orientational dependence of the coarse-
build orientation-dependent potentials that have sufficient grained potentials, it is useful to define local reference frames
continuity properties to make possible their spherical har- (LRFs) for each amino acid (Fig. 12). For any amino acid, a LRF
monic analysis. We have constructed and studied a novel can be constructed by considering at least three noncollinear points
(P1, P2, and P3) that uniquely define the orientation of the LRF,
distance- and orientation-dependent potential for the 20 and a fourth point (usually denoted by Si for the i-th side chain)
amino acid set and an extra anisotropic backbone interaction that specifies the location of the LRF origin. In the coarse-grained
center. The explicit inclusion of the 21st interaction center representation, the Si points can be considered as the interaction
on the backbone was motivated by the important backbone– centers, as all of the relative side chain–side chain distances and
backbone and backbone–side chain contact fractions that orientations are measured with respect to them. The choice of the
three points P1, P2, and P3 is important for the following reasons:
can be observed in a representative set of the main protein (1) The reference points should be unambiguously identifiable in
classes (Pearl et al. 2003). The smooth, continuous potential all of the side chains, regardless of their particular conformation;
is described using its spherical harmonic coefficients for (2) they must not be arranged in a collinear configuration; and (3)
short-, medium-, and long-range interactions. the positions of the side-chain atoms in the LRF should vary as
Because the native structures, regardless of their archi- little as possible for various side-chain conformations of the same
amino acid type. The choice of these three reference points is easy
tecture (␣, , or ␣/), are stabilized by a substantial number for small or relatively rigid amino acids; however, it is more dif-
of backbone–side chain and backbone–backbone interac- ficult for the relatively large ones that are expected to be quite
tions, it is crucial to account for them in any force field. The mobile, with long side chains such as lysine or arginine.
www.proteinscience.org 871
Buchete et al.
C␣i , and Ci. (2) Because Gly and Ala do not have C␥ atoms, we
used the position of the backbone atom Ci as P3. In this way, the
local Oy axis is pointing in the direction defined by the backbone
atoms C␣i and Ci. (3) For Cys and Ser, the corresponding coordi-
nates of the S and O atoms are substituted for the coordinates of
the missing C␥ and are used, therefore, for defining P3. (4) For Ile
and Val, the coordinates of the midpoint between the two C␥ atoms
are used for P3.
These definitions have the advantage that, whereas being side–
chain dependent, the positive Oz axis is always oriented away from
the local backbone, whereas the positive Oy axis points toward
more remote C␥ atoms in the SC. For small side chains, Oy points
toward the next SC on the backbone sequence.
The most important advance made in this study is to introduce
a virtual backbone interaction center (Pep) in the middle of the
peptide bond. We were motivated to include this as the 21st in-
teraction center on the basis of the analysis of proteins structures
that revealed (see Results) that folded structures are stabilized by
a substantial number of side chain–backbone contacts. For Pep, the
positions of the three reference points P1, P2, and P3 are identified
with the positions of the carbonyl C atom, its O atom, and the
peptide bond N atom. The interaction center Si for Pep is placed in
the middle of its C–N peptide link.
These definitions of the LRFs permit the investigation of rela-
tive coordination probabilities (e.g., for hydrogen bonding) as well
as of hydropathic effects in side-chain packing.
structed by averaging over all the interactions corresponding to all erties over the entire angular range. In practice, it is convenient to
of the types of side chains j that are observed around all of the side use a series with real even and odd eigenfunctions, namely,
chains i. The parameter m⬘ is related to the actual number of
measurements obtained for the ij pair. If m is the number of mea-
surements for the ij pair (Sippl 1990), for considering the orien-
U共, 兲 = 兺关a
m,n
mnYnm共,
o
兲 + bmnYnm
e
共, 兲兴. (10)
冋 册
mented in the program Spherepack. Although they were initially
gij共r兲 developed for geophysical processes, we found that the Spherepack
ij
UD 共r兲 = −kT ln (7) routines are general enough and can be used successfully for ana-
gref共r兲
lyzing the data that we extracted from protein structures. We pre-
for the distributions depending only on distances. We define a sent below a short review of the numerical spherical harmonic
more general distance- and orientation-dependent potential: analysis procedure that was performed using Spherepack 3.0 (Ad-
ams and Swarztrauber 1999).
ij
UDO 共r, , 兲 = −kT ln 冋 Pij共r, , 兲
册
Pref共r, , 兲
. (8)
Let N be the number of grid points corresponding to sampling
the data along the angle. We use 2(N – 1) grid points for . These
sampling points are placed on the following equiangular grid:
As mentioned earlier, we use UDO for the statistical potentials that
are both distance- and orientation-dependent, and UD for potentials i = i⌬ − Ⲑ 2, i = 0, 1, . . . , N − 1, ⌬ =
that depend solely on inter-residue distances. As in previous stud- N−1
ies (Buchete et al. 2003), we consider the reference pair distribu- i = j⌬, j = 0, 1, . . . , 2N − 1, ⌬ = ⌬. (11)
tion function gref and reference probability Pref to correspond to Assuming that the angular dependent potential function is suffi-
radial or angular pair distributions averaged over all 20 residue ciently smooth, one can perform its spherical harmonic analysis
types. Databases of nonhomologous proteins are necessary for and find the corresponding coefficients
estimating the pair distributions and for extracting amino acid-
specific interaction potentials that are consistent with various pro- 2 Ⲑ 2
tein architectures. amn = ␣mn 兰 兰 U共, 兲 P 共cos 兲 ⳯ cos共m兲 cos d d
0 − Ⲑ 2
m
n (12)
www.proteinscience.org 873
Buchete et al.
The prime notation (Adams and Swarztrauber 1997) on the sum simulations. I. Functional forms and parameters of long-range side-chain
indicates that the first term corresponding to m ⳱ 0 must be mul- interaction potentials from protein crystal data. J. Comput. Chem. 18: 849–
873.
tiplied by 0.5. Liwo, A., Pincus, M.R., Wawak, R.J., Rackovsky, S., Oldziej, S., and Scheraga,
H.A. 1997b. A united-residue force field for off-lattice protein-structure
simulations. II. Parameterization of short-range interactions and determina-
Acknowledgments tion of weights of energy terms by Z-score optimization. J. Comput. Chem.
18: 874–887.
This work was supported by the National Institutes of Health R01 Liwo, A., Kazmierkiewicz, R., Czaplewski, C., Groth, M., Oldziej, S., Wawak,
NS41356-01 (J.E.S. and D.T.), the National Science Foundation R.J., Rackovsky, S., Pincus, M.R., and Scheraga, H.A. 1998. United-residue
CHE-9975494 (J.E.S.), and CHE-0209340 (D.T.). The authors are force field for off-lattice protein-structure simulations: III. Origin of back-
grateful to the Center for Scientific Computing and Visualization bone hydrogen-bonding cooperativity in united-residue potentials. J. Com-
put. Chem. 19: 259–276.
at Boston University for computational resources. The data visu- Meller, J., Wagner, M., and Elber, R. 2002. Maximum feasibility guideline in
alization was carried out using VMD (Humphrey et al. 1996), the design and analysis of protein folding potentials. J. Comput. Chem. 23:
Raster3D (Merritt and Bacon 1997), and Matlab (The Mathworks, 111–118.
Inc.). Merritt, E.A. and Bacon, D.J. 1997. Raster3D: Photorealistic molecular graph-
The publication costs of this article were defrayed in part by ics. Method. Enzymol. 277: 505–524.
Miyazawa, S. and Jernigan, R.L. 1985. Estimation of effective interresidue
payment of page charges. This article must therefore be hereby contact energies from protein crystal structures: Quasi-chemical approxi-
marked “advertisement” in accordance with 18 USC section 1734 mation. Macromolecules 18: 534–552.
solely to indicate this fact. ———. 1999a. Evaluation of short-range interactions as secondary structure
energies for protein fold and sequence recognition. Proteins 36: 347–356.
———. 1999b. Self-consistent estimation of inter-residue protein contact en-
References ergies based on an equilibrium mixture approximation of residues. Proteins
34: 49–68.
Adams, J.C. and Swarztrauber, P.N. 1997. Spherepack 2.0: A model develop- Orengo, C.A., Michie, A.D., Jones, S., Jones, D.T., and Thornton, J.M. 1997.
ment facility. NCAR Tech. Note NCAR/TN-436-STR. CATH—A hierarchic classification of protein domain structures. Structure
———. 1999. Spherepack 3.0: A model development facility. Monthly Weather 5: 1093–1108.
Rev. 127: 1872–1878. Park, B. and Levitt, M. 1996. Energy functions that discriminate X-ray and near
Arfken, G.B. and Weber, H.J. 1995. Mathematical methods for physicists, 4th native folds from well-constructed decoys. J. Mol. Biol. 258: 367–392.
ed. Academic Press, San Diego, CA. Park, B., Huang, E.S., and Levitt, M. 1997. Factors affecting the ability of
Bagci, Z., Jernigan, R.L., and Bahar, I. 2002a. Residue coordination in proteins energy functions to discriminate correct from incorrect folds. J. Mol. Biol.
conforms to the closest packing of spheres. Polymer 43: 451–459. 266: 831–846.
———. 2002b. Residue packing in proteins: Uniform distribution on a coarse- Pearl, F.M.G., Lee, D., Bray, J.E., Sillitoe, I., Todd, A.E., Harrison, A.P.,
grained scale. J. Chem. Phys. 116: 2269–2276. Thornton, J.M., and Orengo, C.A. 2000. Assigning genomic sequences to
Bahar, I. and Jernigan, R.L. 1996. Coordination geometry of nonbonded resi- CATH. Nucleic Acids Res. 28: 277–282.
dues in globular proteins. Fold. Des. 1: 357–370. Pearl, F.M.G., Bennett, C.F., Bray, J.E., Harrison, A.P., Martin, N., Shepherd,
———. 1997. Inter-residue potentials in globular proteins and the dominance of A., Sillitoe, I., Thornton, J.M., and Orengo, C.A. 2003. The CATH data-
highly specific hydrophilic interactions at close separation. J. Mol. Biol. base: An extended protein family resource for structural and functional
266: 195–214. genomics. Nucleic Acids Res. 31: 452–455.
Bahar, I., Kaplan, M., and Jernigan, R.L. 1997. Short-range conformational Samudrala, R. and Levitt, M. 2000. Decoys ‘R’ Us: A database of incorrect
energies, secondary structure propensities, and recognition of correct se- conformations to improve protein structure prediction. Protein Sci. 9: 1399–
quence-structure matches. Proteins 29: 292–308. 1401.
Berman, H.M., Westbrook, J., Feng, Z., Gilliland, G., Bhat, T.N., Weissig, H., Simons, K.T., Kooperberg, C., Huang, E.S., and Baker, D. 1997. Assembly of
Shindyalov, I.N., and Bourne, P.E. 2000. The Protein Data Bank. Nucleic protein tertiary structures from fragments with similar local sequences using
Acids Res. 28: 235–242. simulated annealing and Bayesian scoring functions. J. Mol. Biol. 268:
Buchete, N.-V., Straub, J.E., and Thirumalai, D. 2003. Anisotropic coarse- 209–225.
grained statistical potentials improve the ability to identify native-like pro- Sippl, M.J. 1990. Calculation of conformational ensembles from potentials of
tein structures. J. Chem. Phys. 118: 7658–7671. mean force. J. Mol. Biol. 213: 859–883.
Fain, B., Xia, Y., and Levitt, M. 2001. Determination of optimal Chebyshev- ———. 1995. Knowledge-based potentials for proteins. Curr. Opin. Struct.
expanded hydrophobic discrimination function for globular proteins. IBM J. Biol. 5: 229–235.
Res. Dev. 45: 525–532. Skolnick, J., Jaroszewski, L., Kolinski, A., and Godzik, A. 1997. Derivation and
Godzik, A., Kolinski, A., and Skolnick, J. 1995. Are proteins ideal mixtures of testing of pair potentials for protein folding. When is the quasichemical
amino-acids? Analysis of energy parameter sets. Protein Sci. 4: 2107–2117. approximation correct? Protein Sci. 6: 1–13.
Hendlich, M., Lackner, P., Weitckus, S., Floeckner, H., Froschauer, R., Gotts- Skolnick, J., Kolinski, A., and Ortiz, A. 2000. Derivation of protein-specific pair
bacher, K., Casari, G., and Sippl, M.J. 1990. Identification of native protein potentials based on weak sequence fragment similarity. Proteins 38: 3–16.
folds amongst a large number of incorrect models: The calculation of low Sneeuw, N. 1994. Global spherical harmonic analysis by least-squares and
energy conformations from potentials of mean force. J. Mol. Biol. 216: numerical quadrature methods in historical perspective. Geophys. J. Int.
167–180. 118: 707–716.
Humphrey, W., Dalke, A., and Schulten, K. 1996. VMD—Visual Molecular Sneeuw, N. and Bun, R. 1996. Global spherical harmonic computation by
Dynamics. J. Mol. Graphics 14: 33–38. two-dimensional Fourier methods. J. Geodesy 70: 224–232.
Keasar, C. and Levitt, M. 2003. A novel approach to decoy set generation: Swarztrauber, P.N. 1979. On the spectral approximation of discrete scalar and
Designing a physical energy function having local minima with native struc- vector functions on the sphere. SIAM J. Numer. Anal. 16: 934–949.
ture characteristics. J. Mol. Biol. 329: 159–174. Tanaka, S. and Scheraga, H.A. 1976. Medium- and long-range interaction pa-
Lee, J., Liwo, A., and Scheraga, H.A. 1999. Energy-based de novo protein rameters between amino acids for predicting three-dimensional structures of
folding by conformational space annealing and an off-lattice united-residue proteins. Macromolecules 9: 945–950.
force field: Application to the 10–55 fragment of staphylococcal protein A Thomas, P.D. and Dill, K.A. 1996. Statistical potentials extracted from protein
and to Apo Calbindin D9K. Proc. Natl. Acad. Sci. 96: 2025–2030. structures: How accurate are they? J. Mol. Biol. 257: 457–469.
Levitt, M. 1992. Accurate modeling of protein conformation by automatic seg- Tobi, D. and Elber, R. 2000. Distance-dependent, pair potential for protein
ment matching. J. Mol. Biol. 226: 507–533. folding: Results from linear optimization. Proteins 41: 40–46.
Liwo, A., Oldziej, S., Pincus, M.R., Wawak, R.J., Rackovsky, S., and Scheraga, Tobi, D., Shafran, G., Linial, N., and Elber, R. 2000. On the design and analysis
H.A. 1997a. A united-residue force field for off-lattice protein-structure of protein folding potentials. Proteins 40: 71–85.