| Coarse-Grained Modeling of the Actin Filament Derived from Atomistic-Scale Simulations Biophysical Journal, Volume 90, Issue 5, 1 March 2006, Pages 1572-1582 Jhih-Wei Chu and Gregory A. Voth Abstract A coarse-grained (CG) procedure that incorporates the information obtained from all-atom molecular dynamics (MD) simulations is presented and applied to actin filaments (F-actin). This procedure matches the averaged values and fluctuations of the effective internal coordinates that are used to define a CG model to the values extracted from atomistic MD simulations. The fluctuations of effective internal coordinates in a CG model are computed via normal-mode analysis (NMA), and the computed fluctuations are matched with the atomistic MD results in a self-consistent manner. Each actin monomer (G-actin) is coarse-grained into four sites, and each site corresponds to one of the subdomains of G-actin. The potential energy of a CG G-actin contains three bonds, two angles, and one dihedral angle; effective harmonic bonds are used to describe the intermonomer interactions in a CG F-actin. The persistence length of a CG F-actin was found to be sensitive to the cut-off distance of assigning intermonomer bonds. Effective harmonic bonds for a monomer with its third nearest neighboring monomers are found to be necessary to reproduce the values of persistence length obtained from all-atom MD simulations. Compared to the elastic network model, incorporating the information of internal coordinate fluctuations enhances the accuracy and robustness for a CG model to describe the shapes of low-frequency vibrational modes. Combining the fluctuation-matching CG procedure and NMA, the achievable time- and length scales of modeling actin filaments can be greatly enhanced. In particular, a method is described to compute the force-extension curve using the CG model developed in this work and NMA. It was found that F-actin is easily buckled under compressive deformation, and a writhing mode is developed as a result. In addition to the bending and twisting modes, this novel writhing mode of F-actin could also play important roles in the interactions of F-actin with actin-binding proteins and in the force-generation process via polymerization. Abstract | Full Text | PDF (352 kb) |
| Usefulness and Limitations of Normal Mode Analysis in Modeling Dynamics of Biomolecular Complexes Structure, Volume 13, Issue 3, 1 March 2005, Pages 373-380 Jianpeng Ma Summary Various types of large-amplitude molecular deformation are ubiquitously involved in the functions of biological macromolecules, especially supramolecular complexes. They can be very effectively analyzed by normal mode analysis with well-established procedures. However, despite its enormous success in numerous applications, certain issues related to the applications of normal mode analysis require further discussion. In this review, the author addresses some common issues so as to raise the awareness of the usefulness and limitations of the method in the general community of structural biology. Summary | Full Text | PDF (346 kb) |
| Rigid-Cluster Models of Conformational Transitions in Macromolecular Machines and Assemblies Biophysical Journal, Volume 89, Issue 1, 1 July 2005, Pages 43-55 Moon K. Kim, Robert L. Jernigan and Gregory S. Chirikjian Abstract We present a rigid-body-based technique (called rigid-cluster elastic network interpolation) to generate feasible transition pathways between two distinct conformations of a macromolecular assembly. Many biological molecules and assemblies consist of domains which act more or less as rigid bodies during large conformational changes. These collective motions are thought to be strongly related with the functions of a system. This fact encourages us to simply model a macromolecule or assembly as a set of rigid bodies which are interconnected with distance constraints. In previous articles, we developed coarse-grained elastic network interpolation (ENI) in which, for example, only atoms are selected as representatives in each residue of a protein. We interpolate distance differences of two conformations in ENI by using a simple quadratic cost function, and the feasible conformations are generated without steric conflicts. Rigid-cluster interpolation is an extension of the ENI method with rigid-clusters replacing point masses. Now the intermediate conformations in an anharmonic pathway can be determined by the translational and rotational displacements of large clusters in such a way that distance constraints are observed. We present the derivation of the rigid-cluster model and apply it to a variety of macromolecular assemblies. Rigid-cluster ENI is then modified for a hybrid model represented by a mixture of rigid clusters and point masses. Simulation results show that both rigid-cluster and hybrid ENI methods generate sterically feasible pathways of large systems in a very short time. For example, the HK97 virus capsid is an icosahedral symmetric assembly composed of 60 identical asymmetric units. Its original Hessian matrix size for a coarse-grained model is >(300,000). However, it reduces to (84) when we apply the rigid-cluster model with icosahedral symmetry constraints. The computational cost of the interpolation no longer scales heavily with the size of structures; instead, it depends strongly on the minimal number of rigid clusters into which the system can be decomposed. Abstract | Full Text | PDF (988 kb) |
Copyright © 2008 The Biophysical Society. All rights reserved.
Biophysical Journal, Volume 94, Issue 9, 3461-3474, 1 May 2008
doi:10.1529/biophysj.107.115956
Biophysical Theory and Modeling
Lei Zhou*,
,
and Steven A. Siegelbaum*, †
* Department of Neuroscience, Howard Hughes Medical Institute, Columbia University, New York, New York
† Department of Pharmacology, Howard Hughes Medical Institute, Columbia University, New York, New York
Address reprint requests to Lei Zhou, Department of Neuroscience, Columbia University, 1051 Riverside Drive, New York, NY 10032. Tel.: 212-543-5584; Fax: 212-795-7997.One major goal of studies of protein structure-function relationships is to identify their macroscopic correlated motions, and how these motions change in response to various external perturbations, such as ligand-binding. A variety of experimental approaches, including x-ray crystallography, NMR spectroscopy, and single-molecule biophysical techniques, have provided insights into macroscopic protein motions by monitoring the structural alterations of the same protein under different conditions. On the other hand, theoretical studies, such as molecular dynamics (MD) simulations and normal mode analysis (NMA), can also provide valuable information about internal protein motions 1,2.
Standard MD simulations sample the conformational space of a protein using the definitions for atomic interactions from various force fields and usually include explicitly treated water to reproduce solvent effects 3,4. Correlated protein motions can then be extracted from the MD simulations through diagonalizing the covariance matrix obtained from a section of the MD trajectory. This is also referred to as essential dynamics 5, principal component analysis (PCA) 6, or quasiharmonic analysis 7,8, due to the complex and anharmonic nature of protein dynamics. However, the size of the system, especially with explicitly treated water molecules, has provided a great computational challenge, generally limiting the timescale of MD simulations for large macromolecules to the nanosecond range, significantly shorter than the biologically relevant timescale of conformational changes that may require milliseconds or longer. Therefore, inefficient sampling is still a significant obstacle to extracting meaningful correlated motions from MD simulations 9,10.
Classical all-atom normal mode analysis (AANM) offers the ability to overcome some of the computational cost of MD simulations. AANM makes the simplifying assumption that protein motions can be described by harmonic motions around a local minimum on the protein energy surface. Starting with an initial protein structure, standard AANM requires an extensive minimization of the system's potential energy followed by the calculation of the Hessian matrix, whose 3N×3N (N=the number of atoms) elements represent the second derivative of the potential energy function along the Cartesian coordinates. Diagonalization of the mass-weighted Hessian matrix can then be used to generate the eigenvectors and eigenvalues of the matrix, which provide, respectively, information about the directions of the various correlated motions within the protein and their amplitudes at a given frequency 11,12,13,14.
However, the application of AANM to biological macromolecules has been limited by the requirements of physical memory to store the all-atom Hessian matrix and the significant CPU time to diagonalize the very large matrix. Therefore, in practice, AANM is normally applied to protein systems containing at most a few hundred residues, in most cases without explicitly treated water molecules. However, since solvent has important and complex interactions with the solute molecule, the explicit treatment of solvents is thought to be essential to faithfully reproduce protein dynamics. For MD simulations, it has been a standard practice to simulate a protein molecule in a box filled with explicitly treated water molecules and use periodic boundary conditions. However, performing AANM on such a system to extract protein motions is usually beyond the capabilities of currently available computational hardware and software.
To date, there have been only a few published AANM studies involving explicitly treated water molecules within a distance of several Angstroms of the protein surface 15,16,17. These studies showed that incorporating surface water is helpful to reproduce experimental observations, including the B-factors determined by x-ray crystallography. However, given the scarcity of these studies, novel techniques, with the ability to efficiently incorporate solvent effects and provide a complete survey of the vibrational spectrum, are still needed to improve the efficiency of AANM for large systems.
Technically, even though the storage of a Hessian matrix has become less of an obstacle due to the introduction of sparse matrix techniques, the diagonalization of the all-atom matrix is still a challenge and new algorithms are being continuously added to various linear algebra packages. These include the method of diagonalization in a mixed basis 18,19 implemented in CHARMM 20 and the iterative Lanczos/Arnoldi factorization method 21 implemented in GROMACS 22, two widely used simulation packages. Nevertheless, these iterative numerical methods are still very time consuming and can only yield a small fraction of the total eigenvectors, usually those corresponding to the lowest vibrational frequencies.
Fortunately, the low-frequency vibrational modes are closely related to large-amplitude correlated protein motions with minimum energy costs, which usually reflect the conformational changes relevant to protein function 1,23. Indeed, the collective motions represented by these eigenvectors are in good agreement with independent experimental measurements 24,25,26. However, pinpointing the most functionally relevant individual mode is not a trivial task. In addition, it has been suggested that a combination of modes is required for a reasonable mapping of the correlated motions 1,13,27. Moreover, recent studies showed that the modes of higher frequencies are also important, because energy input from external perturbations can shift the distribution of different modes to higher frequencies 13,28. Thus, a complete survey of the eigenvector space and corresponding eigenvalues is important for various theoretical applications, such as the calculation of thermodynamic configuration entropy and heat capacity.
As an alternative to classical AANM, coarse-grained approaches have been pursued to reduce the size of the system and improve computational efficiency 14,29,30,31. The block normal mode method (BNM) is an effective coarse-grained NMA approach that treats proteins as a system of rigid blocks 32,33,34. However, BNM still relies on a complex all-atom representation and starts from the same all-atom Hessian matrix as AANM. An important breakthrough came with the introduction of the elastic network model (ENM), which simplifies the complex atomic interactions to potential energy functions with only a single parameter (1kcal/mol/Å2) for C-α atoms, thus bypassing the time-consuming energy minimization steps 35,36. ENM (or isotropic Gaussian network model) reflects the intrinsic protein dynamics embedded in the overall molecular topology and effectively reproduces certain aspects of the atomic fluctuations determined by NMR and x-ray crystallography 37,38,39. The corresponding model used in NMA is referred to as the anisotropic network model (ANM) 40. Despite the dramatic simplifications, ENM is widely applied to large macromolecules and assemblies beyond the reach of traditional methods 41,42,43,44,45.
However, there is a trade-off between accuracy and speed in these coarse-grained methods. Much effort has gone into comparing results from these approximate methods with the results of classical AANM, the parent method, or the results of MD simulations, as a reference 32,33,42,46. Based on the degree of similarity between the low-frequency eigenvectors of AANM and the corresponding eigenvectors of coarse-grained methods, BNM has been found to produce more accurate results than ENM 32,33,45. This is not surprising because BNM starts from an extensively energy-minimized system described by an all-atom force field and then projects the all-atom Hessian matrix to the space of predefined blocks. In the limit, this method allocates only one residue in each block, providing the highest possible resolution in the implementation of the BNM method (however, at the greatest computational cost). Such an approach is implemented in the most recent version of CHARMM. Nonetheless, even BNM results show significant deviations from the AANM approach. Moreover, no coarse-grained method is able to incorporate the contributions from explicitly treated water molecules.
Here we implemented a novel coarse-grained normal mode method (CGNM) based on a partition scheme of the all-atom Hessian matrix to extract the correlated motions in the subspace of C-α atoms. We carried out our initial analysis on the 120-amino-acid cyclic nucleotide-binding domain (CNBD) and adjacent upstream 90-amino-acid cytoplasmic C-linker region from the HCN2 hyperpolarization-activated cyclic nucleotide-regulated cation channel 47. High resolution x-ray crystallography has shown that in the presence of cyclic nucleotides, this isolated soluble protein domain forms a fourfold symmetric tetrameric assembly with one cyclic nucleotide bound in the CNBD of each of the four subunits 48.
In this study, we report that CGNM provides a more accurate description of the motions of the HCN2 CNBD, as well as that of four other proteins, compared to two other coarse-grained methods, ENM and BNM, based on the degree of similarity of the results from the three coarse-grained approaches with the results of a full AANM analysis. It is important to note that we found that CGNM also allowed us to incorporate explicitly treated surface water molecules into protein motions projected in the subspace of the relevant atoms (C-α atoms in this study). Furthermore, a comparison of our CGNM results containing a layer of surface water with MD results on the same protein in a water-filled box demonstrates the importance of incorporating such surface structural water in studying protein dynamics.
We used a representative snapshot from MD simulations based on the x-ray crystal structure of the HCN2 channel C-terminus protein (PDB ID 1Q5O) 48 as the starting structure for the NMA described here. Briefly, the MD simulations, which we described in detail elsewhere 49, were performed as follows. We used the GROMOS96 force field from the GROMACS package 22,50. The whole system contains four subunits and each subunit contains 8636 protein atoms, 4 cAMP molecules, 23,654 water molecules, and 12 chloride ions to balance the charge in the system. For the bound ligand, cAMP, we used the topology generated by the PRODRG server and the partial charges defined in the GROMOS96 force field 50. We used the flexible SP3 water model in the simulation 51. The distance between the protein and each side of the rectangle box was set at 10Å. The particle mesh Ewald method 52, with a cutoff distance of 10Å, was used for the electrostatic potential energy. Before MD simulations, we applied basic energy minimization steps (steepest descent (SD) and conjugate gradient (CG)) to optimize the starting system and remove any nonphysical contacts. During the first 500ps of the MD simulation, the positions of the heavy atoms in the protein were fixed so that the system, especially the explicit water molecules, can be further optimized. After these steps, we carried out a normal MD simulation with a time step of 2fs and collected the trajectory every 0.5ps.
To carry out the normal mode calculations based on the all-atom force fields, we first performed an extensive energy minimization to ensure that the starting structure represents a local minimum on the energy surface. To achieve this, we applied SD and CG followed by the limited-memory Broyden-Fletcher-Goldfarb-Shanno method 22 at double precision numerical accuracy to the representative snapshot structure from the MD simulations obtained above. During these energy minimization steps, the electrostatic energy was described by a switch function with the distance for normal treatment set at 15Å and the cut-off distance set at 18Å 53.
The key step in the analysis was then to calculate the Hessian matrix for the entire system, containing the second derivatives of the potential energy functions (
). The matrix was then partitioned into four sections to extract the C-α components according to the equation
![]() | (1) |
![]() | (2) |
Here, Hxx, Hyy, Hxy, and Hyx submatrices contain the elements representing the interactions of, respectively, relevant to relevant atoms, nonrelevant to nonrelevant atoms, relevant to nonrelevant atoms, and nonrelevant to relevant atoms. In the CGNM method, the energetic contributions of all interactions with and between nonrelevant atoms (the non-C-α atoms here) are incorporated into a simplified Hessian matrix for the relevant-atom subspace, H′xx, using Eq. (2). The theoretical basis for deriving the C-α atom motions based on the atomic fluctuations from classical AANM was published by Berendsen and colleagues 54. A similar equation was used to extract the effective force constant matrix for C-α atoms 55 and discussed in the GROMACS discussion board (www.gromacs.org) in 2005 for the purpose of comparing AANM and MD-based PCA in the C-α subspace. Moreover, a recent study by Eom et al. 56 used a very similar method to obtain a coarse-grained approximation to ENM. After basic matrix manipulations, we found that our partitioning approach (Eqs. (1) and (2)) is identical to that of Eom et al. The major difference between our study and that of Eom et al. is that we have applied a coarse-grained approximation to classical NMA based on all-atom force fields, whereas Eom et al. aimed to improve the efficiency of ENM (which they referred to as the Gaussian network model).
To sort the Hessian into relevant and nonrelevant parts, we first converted the sparsely-stored mass-weighted Hessian matrix into a double precision ASCII file. We then generated an index file in which the indices for all C-α atoms (relevant atoms) were arranged at the beginning followed by non-C-α atoms. Each entry in the sparse Hessian matrix was read into the program and allocated to a new position, using the index file as a key for sorting. The C-α component (xx part) was stored in a dense matrix format. The symmetrical xy and yx parts were stored in a coordinate format for a sparse matrix. Non-C-α components (yy) were stored in a row-major format for a sparse matrix. We used a direct solving routine from the PARDISO package 57 and standard LAPACK and BLAS routines for matrix calculations.
Based on the eigenvectors and the corresponding eigenvalues, the following equation was used to calculate the mean-square fluctuation (MSF)(Å2):
![]() | (3) |
where k is the atom index, i is the eigenvector index, mk is the atom mass, and ω is vibrational frequency.
The following equation was used to calculate the configurational entropy based on the eigenvalues from ENM, CGNM, or PCA 58,59:
![]() | (4) |
where h is Planck's constant, kB is Boltzmann's constant, and ω is the vibrational frequency.
C-α atom coordinates from the energy minimized structures were directly used in the NMA based on the potential energy function defined by the elastic network model (ENM or anisotropic network model (ANM)) 40. For ENM, we used the default settings of the force constant (1 kcal/mol/Å2) and cutoff distance (13Å).
The same all-atom Hessian matrix was projected onto a subspace of rigid blocks, each of which contained a single residue for the protein or a single cAMP molecule for the bound ligand, to pursue the highest resolution possible with this method. The degrees of freedom equal six times the number of blocks. The Fortran code of DIAGRTB (v2.52) was used in this research with a modification of the size of the array, LRWORK, from 32,000,000 to 200,000,000, so that larger systems could be accommodated 32,33.
We used g_covar from GROMACS to perform PCA on a section of the MD trajectory. Overall rotational and translational motions were removed by fitting the protein structure of each time frame to a reference structure (starting frame). For the MD simulations at low temperatures, we reduced the system temperature with a simulated annealing protocol and then collected the MD trajectories after a 200-ps equilibration at the corresponding temperature. For each PCA, we used a 2-ns-long MD trajectory containing 4000 frames. The eigenvalue outputs from the PCA analysis represent the vibrational amplitude and were converted into the square of angular velocity by the equation
![]() | (5) |
The anharmonic factor for each eigenvector from PCA was calculated by the equation
![]() | (6) |
where i is the index for PCA eigenvectors, j is the index for NMA eigenvectors,
is the harmonic mean-square fluctuation as described by the NMA eigenvectors, and Mij is the dot product between the ith PCA eigenvector and jth NMA eigenvector 60.
The reference structures from two eigenvector systems were first aligned to the mass center of the molecule. A rotation matrix was then calculated based on two aligned reference structures. The second set of eigenvectors was rotated by the equation
![]() | (7) |
where k is the eigenvector index, l is the atom index, and m and n are the indices of xyz dimension 12. The following parameters for the overlap analysis, including dot product (Eq. (8)), spanning coefficient (Eq. (9)) 33,46, and cumulative overlap factor (COF) (Eq. (10)) 54, were calculated based on these aligned eigenvector sets.
![]() | (8) |
![]() | (9) |
![]() | (10) |
AANM, CGNM, and BNM use the same mass-weighted Hessian matrix; therefore, the corresponding orthogonal eigenvector output should still be mass-weighted. However, the default output of eigenvectors is not mass-weighted in the GROMACS program and not strictly orthogonal. We modified the source code of GROMACS to generate mass-weighted orthogonal eigenvectors for AANM analysis. We converted the GROMACS eigenvalues (
based on the mass-weighted Hessian matrix, in units of kJ/mol/nm2/amu) into the square of angular velocity (in units of s−2) by multiplying the Gromacs eigenvalues by the conversion factor of 10−24, based on the relation
![]() | (11) |
The following equation was used for calculating the MSF (Å2):
![]() | (12) |
where
is the eigenvalue of the Gromacs unit, k is the atom index, and i is the eigenvector index.
Since we compared the results of different methods in the subspace of C-α atoms, mass-weighting will not affect the eigenvector results of ENM. However, a mass factor is needed for the calculation of vibrational frequencies and atomic fluctuation amplitudes. To our knowledge, there is no standard method for converting the units to compare the ENM results directly with other calculations (e.g., AANM, BNM, etc.) without scaling. Here, we tentatively added a mass factor corresponding to the mass of C-α atom so that the angular velocity in units of s−1 and MSF in units of Å2 can be generated using a force constant of 1 kcal/mol/Å2. The eigenvalues were converted into the square of the angular velocity by multiplying by the factor
![]() | (13) |
The eigenvalue output from PCA analysis (default GROMACS in units of nm2; no mass weighting; C-α only) was converted into the square of angular velocity by the equation
![]() | (14) |
The experimental B-factor obtained through x-ray crystallography can be directly converted to atomic fluctuation (MSF, in Å2) using the equation 61
![]() | (15) |
Our goal in this study was to develop a coarse-grained approximation to classical all-atom normal mode (AANM) analysis 11,12. We have implemented a novel coarse-grained normal mode analysis (CGNM) that decreases the computational cost associated with AANM by partitioning the all-atom Hessian matrix containing the second derivative of the potential energy function into relevant and nonrelevant components, here focusing on the C-α atoms (see Methods, Eq. (2)). To assess the accuracy of our method, we first compared the results of AANM, the standard for these comparisons, with those of CGNM, as well as with results from two other coarse-grained approaches, ENM and BNM 32,40. As ENM and BNM treat proteins in a vacuum in the absence of water, we first compared the four methods under these dehydrated conditions. In the following section, we consider the effects on CGNM results of adding surface water.
As NMA is based on a harmonic approximation of the protein energy surface near an ideally global minimum, it first requires an extensive minimization of the potential energy of the starting protein structure. Here we used a representative structure of the HCN2 C-terminus protein obtained from a 20-ns-long MD trajectory based on the original crystal structure 48. This procedure allows the protein structure to be efficiently equilibrated in the same force field used by subsequent NMA (GROMOS96) 50, as reasonably long MD simulations should optimize the loop conformations and allow for small-scale rearrangement of secondary structures 62,63.
We first removed all water molecules from the representative MD snapshot structure. After an extensive minimization of the system, the final structure containing only the protein and cAMP atoms was used to generate the all-atom Hessian matrix, which was then iteratively diagonalized to produce the AANM result, providing the reference for comparison with the coarse-grained methods. Due to the limitation of computational resources, only a small fraction of the total eigenvectors and corresponding eigenvalues were calculated (2000, or 8% of 26,232). All technical details are given in Methods and Table 1. Briefly, ENM starts from the C-α atom coordinates and generates a complete set of orthogonal eigenvectors. BNM and CGNM methods started from the same all-atom Hessian matrix used by AANM. Whereas BNM simplifies the calculation through projecting the all-atom Hessian matrix into predefined rigid blocks, CGNM relies on a matrix-partitioning scheme to integrate the energetic contributions from non-C-α atoms into the motions of C-α atoms. Since the eigenvector outputs of BNM are in the all-atom space, they were projected to the C-α atom subspace for comparison purposes. This was followed by a normalization step that makes each eigenvector unitary (
) but not strictly orthogonal (Vi · Vj=0, i≠j). The eigenvector outputs of CGNM are naturally orthogonal in the C-α subspace and thus were directly used in the overlap analysis.
| Table 1 Comparison of parameters used in different NMA approaches |
| AANM | ENM | BNM | CGNM | |||
|---|---|---|---|---|---|---|
| Residues | 804 | 804 | 804 | 804 | ||
| Atoms | 8636 | 804 | 8636 | 8636 | ||
| Starting Hessian matrix size | 25,9082 | 24122 | 25,9082 | 25,9082 | ||
| Working Hessian matrix size | 25,9082 | 24122 | 48242 | 24122 | ||
| Practical/theoretical eigenvector set | 2000/25,908 | 2412/2412 | 4824/4824 | 2412/2412 | ||
| Eigenvector dimension | 25,908 | 2412 | 25,908 | 2412 | ||
| C-α only component extraction | Yes | No | Yes | No | ||
| Orthogonality of C-α component | No, but normalized | Yes | No, but normalized | Yes | ||
| CPU time (3.4 Ghz Xeon, sequential implementation) | ∼67h | ∼1h | ∼5h | ∼7h | ||
| Peak physical memory (Mbyte) | ∼1577 | 44 | ∼1400 | ∼1900 | ||
The results of ENM, BNM, and CGNM were compared to the results of AANM in terms of the overlap of the resulting eigenvectors, representing the direction of correlated motion, and eigenvalues, representing the amplitude or the frequency of each motion. Three different methods were used to check the overlap between the eigenvectors from AANM versus a given coarse-grained method. First, a direct view of overlap was obtained from a plot of the inner product between each pair of eigenvectors (Eq. (8)). Such plots confirm previous studies that BNM generates results closer to those of AANM than does ENM; this is shown by the tighter clustering of points near the ideal diagonal relationship for the BNM versus AANM plot 33,46 (Figure 1AB). It is important to note that CGNM provides an even better match (tighter diagonal clustering) with the AANM results than does BNM (Figure 1C). Second, we quantified the overlap between two sets of eigenvectors using the spanning coefficient (Eq. (9)), representing the overlap between each AANM eigenvector with a group of eigenvectors from each coarse-grained analysis 33,45,54. The nearly straight line of the spanning coefficient curve of CGNM up to a frequency of 10cm−1 indicated that the 70 or so lowest-frequency AANM eigenvectors can be almost completely mapped by the first 100 eigenvectors of CGNM (Figure 2A). However, this close mapping only extends as far as the first ∼10 or ∼15 AANM eigenvectors for ENM or BNM, respectively (Figure 2A).
A potential bias of using spanning coefficients is that an arbitrary number (100 here) of eigenvectors needs to be predefined, because the spanning coefficient involving all eigenvectors is theoretically equal to 1. This makes the spanning coefficient less meaningful when comparing systems of different dimensions of freedom. To circumvent this difficulty, we calculated COF, a factor for the overlap between two pools of eigenvectors as a function of pool size 54 (Eq. (10), Figure 2B). Consistent with the other methods of comparison, the COF results show that CGNM significantly outperforms the other two methods: the space represented by the first 100 eigenvectors from AANM overlaps 95% of that of CGNM versus 85% of BNM and only 65% of ENM.
Based on the results shown above, it is clear that the CGNM method generates a more accurate set of eigenvectors than does BNM or ENM. Next, we checked the accuracy of different coarse-grained methods through calculating the atomic fluctuations based on the eigenvalues and eigenvectors of the Hessian matrices, still using the results from AANM as a reference. MSF or root mean-square fluctuation (RMSF) was used to provide a direct measure of the atomic vibrational amplitude. Both MSF and RMSF values can be used to compare computational results with experimental measures of motion, such as B-factors (see Methods, Eq.(15)).
To gain insight into atomic fluctuations we first plotted the eigenvalues from individual coarse-grained methods against the corresponding values from AANM (Figure 2C). Over a large range of eigenvalues, there is a nearly linear relationship between the results of AANM and those of CGNM or BNM, suggesting a close relationship. In contrast, such ENM results did not show a close agreement with those of AANM. Next, we used the eigenvalues and the eigenvectors of the vibrational modes to calculate the atomic fluctuations for each atom (Eq. (3)) (Figure 2D). As predicted from the eigenvalues, the ENM fluctuation results (blue circles) deviate significantly from the other results. Both the CGNM (red circles) and BNM results (green circles) are in good agreement with the results from AANM on a residue-by-residue basis. Based on the correlation coefficient R values, the CGNM results (0.996) show a slightly better agreement with AANM compared to BNM (0.969). In contrast, both CGNM and BNM correlations are significantly better than that of ENM (0.838). To exclude possible errors introduced by the extraction of C-α components and the normalization step associated with AANM and BNM, we performed independent calculations using the original all-atom eigenvectors, which yielded identical results (Supplementary Material, Fig. S1 ).
Next, we expanded CGNM to incorporate the effect of explicitly treated surface water molecules on protein dynamics, an area that to date has not been addressed by other coarse-grained normal mode methods and is computationally expensive for classical AANM. Previous experimental studies showed that the thickness of the surface structural water layer ranged from 3Å for lysozyme, determined by x-ray and neutron scattering 64,65, to 5Å for lactose, determined by terahertz spectroscopy 66. Here, we treated the case of a 4-Å-thick layer of explicit water on the protein surface, a compromise that enables the calculations to stay within the limits of currently available computational resources (Figure 3AB). The all-atom Hessian matrix used in the CGNM calculations incorporated the interactions among all protein atoms, cAMP ligands, and explicitly treated water molecules. The corresponding hydration level is 0.56 (water mass/protein mass) and the system is of significant size (8636 protein atoms, 108 cAMP atoms, and 8817 water atoms). As a result, we could solve for <0.5% (250) of a total of over 52,683 AANM eigenvectors and eigenvalues using local computational resources. The BNM method could not be tested under these conditions because it does not currently include a method for allocating surface water molecules to specific blocks.
Inclusion of surface water led to a significant difference in AANM results compared to those obtained using AANM in the absence of water (Figure 3CD). It was not surprising that AANM results with water also diverged from those obtained with CGNM or ENM in the absence of water. We were impressed that CGNM results in the presence of surface water showed a good agreement with those obtained using AANM in the presence of surface water, with an increased overlap of eigenvectors indicated by a twofold increase in spanning coefficients (∼80%) compared to values obtained with the other methods in the absence of water (∼40% (Figure 3C)). Improvement was also observed in the larger COF values with CGNM (∼93%) versus those with the other methods (∼75% using the first 100 eigenvectors (Figure 3D)).
Using AANM with surface water results as a standard, we next checked the accuracy of the RMSF values for each C-α atom using ENM, BNM, or CGNM (Fig. 4). CGNM with water (red curve) faithfully reproduced the pattern of the corresponding AANM results (black curve, Figure 4A). The shift in absolute amplitude is due to the different number of eigenvectors used (2406, or 99.7%, for CGNM; 244 modes, or 0.5%, for AANM). The striking similarity is reflected in the high R-factor of the CGNM data versus the AANM data (0.925 (Figure 4D, left)), which is much greater than that for ENM (0.780) and slightly greater than BNM (0.915, limited to calculations without water).
The effect of solvent molecules on protein dynamics is an important issue that has been addressed by experimental and computational approaches. Previous studies using AANM revealed that inclusion of surface water dampened the amplitude of atomic fluctuations 16,67. We found a similar effect of surface water using CGNM, in which the average fluctuations of C-α atoms with surface water (0.11Å2) is significantly smaller than that of protein alone (0.16Å2), providing further support for the ability of CGNM to incorporate surface water in protein dynamics.
Are the CGNM results with a 4-Å layer of surface water molecules comparable to results based on MD simulations, in which the protein is fully embedded in a 102×102×81Å3 box filled with both surface and bulk water (Figure 3A)? MD simulations of the protein at 300K did a reasonably good job of reproducing the absolute amplitude and overall pattern of RMSF values from x-ray crystallographic B-factors (Figure 4B). However, the RMSF values from MD simulations at 300K are approximately three times larger than the CGNM results (Figure 4A). Moreover, the R factor between MD results and CGNM results with surface water is only 0.70 (Figure 4E). This deviation is likely caused by the contribution of random, diffusive motions that are included in the MD simulations but are ignored by the harmonic treatment of motions in all NMA approaches.
Since diffusive motions are greater at higher temperatures, we examined the ability of NMA to more accurately correspond to MD simulation results at lower temperatures. We first collected MD trajectories at 120K and 180K, respectively. The RMSF based on the MD simulation at 120K was much smaller than that at 300K; however, the convergence with CGNM results was not improved (0.69) (Figure 4B). Interestingly, the agreement between CGNM and MD simulation results at 180K was much improved, in terms of both the absolute value of fluctuations and overall pattern, with the R-factor increased to 0.85 with a slope factor of 0.82 (Figure 4C). These results indicate that at certain low temperatures (180K), the atomic fluctuations can be largely accounted for by harmonic motions involving the protein plus surface water but do not necessarily involve the bulk solvent that is also present in the MD simulations.
To further examine the effect of surface water on protein dynamics, the distribution of vibrational-mode frequencies was plotted for both CGNM and MD simulations at the three different temperatures. Previous studies have shown that explicit water has a complex influence on protein dynamics, including temperature-dependent frictional dampening and temperature-independent shifting of the vibrational modes to higher frequencies 68,69,70. However, it is not clear whether these effects are due to an interaction of the protein with surface structural water versus an interaction that also requires the presence of bulk solvent 71. Here, using CGNM, we find that surface water alone is sufficient to shift fluctuations to higher frequencies (Figure 5A), an effect that is observed at all three temperatures, thus confirming its temperature-independent character. It is most likely that this effect represents a static interaction of a cagelike structure formed by the interaction of surface water with exposed residues on the protein surface 65.
To isolate the potential influence of the anharmonic, diffusive protein motions captured by MD simulations but not by CGNM, we compared the frequency distributions of vibrational modes between CGNM with 4Å surface water and the MD simulations with bulk solvent (∼10Å from protein surface plus periodic boundary condition). The distribution of vibrational modes from MD simulations at low temperatures (180K and 120K) was quite similar to those from CGNM with water (Figure 5B). However, MD simulations, but not CGNM, revealed a significant shift to lower frequencies upon raising the temperature to 300K. This is in good agreement with previous studies suggesting that the shift to low frequencies is related to the anharmonic nature of protein dynamics (Figure 5C) and that the contributions from bulk solvent are more prominent at high temperatures (300K) and for low-frequency modes 69,70.
As a final test of the various NMA approaches, we compared the orthogonal sets of eigenvectors in the subspace of C-α atoms derived from the three coarse-grained methods (ENM, BNM, and CGNM) versus the eigenvectors based on PCA of MD simulation trajectories (Figure 5D). At 300K, the COF curves show a poor overlap between the eigenvectors at frequencies <25cm−1 from the MD simulations versus those obtained from all three NMA methods. At increasing frequencies, there is a gradual increase in the overlap of the NMA results with the PCA modes, consistent with the idea that frequencies >40–50cm−1 correspond to more “harmonic” protein vibrations 60.
Does the use of CGNM with a layer of water molecules on the protein surface make any significant difference in the overlap of eigenvectors with MD simulations? Careful examination of the COF curve suggests that indeed the results from CGNM with water are slightly but consistently better than the CGNM results without water at 300°K. An even greater improvement in overlap with the MD simulations was observed when using CGNM with water at lower temperatures (180K (Figure 5E) and 120K (Figure 5F)). The increased overlap between CGNM with surface water and the MD simulations at the two lower temperatures (but especially at 120K) suggests not only that CGNM is able to incorporate, at least partially, the contributions from explicit surface water, but also that surface water makes a significant contribution to protein vibrational modes with frequencies >50cm−1.
In this article, we implemented a matrix-partitioning scheme to extract the C-α components from the all-atom Hessian matrix, thus providing a novel coarse-grained NMA approach, which we termed CGNM. This method generated more accurate results than did other coarse-grained NMA methods, including ENM and BNM, based on a comparison with results obtained using classical AANM. However, CGNM retained the benefits of a great reduction in computational cost with the two other coarse-grained approaches. The flexibility in partitioning the all-atom Hessian matrix into relevant versus nonrelevant groups makes it straightforward to scale the scope of analysis, for example, from C-α atoms only to inclusion of all backbone atoms, depending on the size of the system and the available computational resources. In this manner, we were able to model the contributions from explicitly treated surface water to protein motion, which is beyond the reach of other coarse-grained NMA methods. Thus, the CGNM method represents a novel coarse-grained NMA approach that can be used to obtain more accurate results for systems of significant size.
Multiple lines of evidence indicate that CGNM produces more accurate results than ENM or BNM, using AANM results as a reference. Overlap plots and spanning coefficients clearly show that CGNM outperformed the other coarse-grained methods for the first 100 or so individual eigenvectors, which are of great functional significance because they represent the directions of protein conformational changes with highest amplitude, slowest frequency, and least energetic cost. COF, which has the advantage of representing the overlap of two groups of eigenvectors, confirmed that CGNM more closely reproduces classical AANM results than do ENM or BNM methods. We also confirmed that CGNM outperforms ENM or BNM on dozens of other proteins, with a range in size from 200 to 1300 amino acids (results of four other sample proteins are shown in Figs. S2 and S3 ).
A comparison of eigenvalues and related atomic fluctuations among different NMA methods also revealed differences among the three coarse-grained methods. ENM generated a surprisingly good match to the AANM results given the dramatic simplifications in its potential energy functions. However, the results from BNM and CGNM were in much better agreement with the AANM results compared to ENM, indicating that the detailed chemical information embedded in the all-atom Hessian matrix used for AANM, BNM, and CGNM makes an important contribution. These results are in good agreement with a recent comprehensive comparison among NMA approaches of different complexity 41. Moreover, CGNM performed slightly better than BNM, as shown by the correlation coefficient (R) between their MSF values and the values obtained by AANM.
Why does CGNM yield more accurate results than BNM, even though both methods are derived from the same all-atom Hessian matrix? BNM is rooted in the rotation-translation block model, which projects the all-atom Hessian matrix into a subspace of rigid blocks. Even though BNM fully takes into account the coupled motions between different blocks, the method ignores the small high-frequency vibrations related to the intrinsic flexibility within each block 72. Moreover, during analysis, the intermediate BNM results in the subspace of rigid blocks must be projected first back onto the space for all atoms and then onto the subspace of C-α atoms. However, the center of mass for each block is different from the position of the C-α atoms and varies among different amino acid residues. In contrast, CGNM is based on partitioning the all-atom Hessian matrix through a simple but theoretically rigorous scheme, which is then used to derive the motions for the C-α atoms 55. The fact that CGNM implicitly incorporates energetic contributions from non-C-α atoms into C-α atoms may contribute to the greater accuracy of this method.
A key advantage of CGNM is its ability to incorporate the detailed chemical information imbedded in the protein structure, including explicitly treated structural water molecules on the protein surface. For most MD applications, it has been relatively standard to treat solvent molecules explicitly, which is required to reproduce the electrical and dynamic properties of solvents 70,73,74,75,76. Indeed, experimental and theoretical studies have found that the surface water molecules within a radial distance of 3–5Å from the protein surface have very different physical-chemical properties from those in bulk solvent and play important roles in modulating protein motions. For example, the experimental observation that the density of the first hydration shell is ∼5% higher than that of bulk water has been successfully reproduced by MD simulations 65,77,78. Ideally, classical AANM should be performed on the same protein-water system used in MD simulations. However, the size of the system limits the AANM method so that bulk solvent and surface water must often be omitted for proteins of significant size. Here, we applied CGNM to systems containing a layer of explicitly treated water molecules and found that it not only reproduced results based on classical AANM with explicit surface water, but also helped delineate some features of complex solvent effects.
The choice of a surface water layer of 4Å in this study represents a balance that places a modest demand on computational resources but is consistent with experimental observations on protein surface water thickness, ranging from 3Å for lysozyme 64,65 to 5Å for lactose 66. We found that CGNM, which is based on a harmonic approximation to the energy surface, in the presence of surface water is able to reproduce MD results for atomic fluctuations of a fully solvated protein at 180K. Interestingly, this temperature (180K) is near the glass-transition point where diffusion starts to contribute significantly more to protein dynamics than does harmonic vibration 17,71,79,80,81. Moreover, spectroscopic experiments on bovine serum albumin showed that there is a significant dynamic change (glass transition) at around 170K to 180K, which might be due to formation of a rigid structure formed by water molecules covering the protein surface 79,82. These results corroborate this study, in which only the surface water is treated explicitly.
However, it is noticeable that even though CGNM results with water show an improved match with the atomic fluctuations from MD simulations compared to CGNM results on the dehydrated protein, the results from CGNM differ in important respects from those obtained using MD simulations or from experimentally determined crystallographic B-factors. This might be due to the complex nature of the protein energy surface and complex interactions between protein and solvent. The good agreement between the B-factors and the MD results at 300K confirms the advantages of MD, a method that does not involve a harmonic approximation of the protein energy surface and explicitly treats all water molecules (Figure 4B).
The CGNM results are successful in reproducing previous observations that solvation increases protein vibrational frequencies and point to the role of surface water in this phenomenon 24,69,83. Moreover, these effects are likely to reflect temperature-independent interactions in which surface water molecules serve to fill in protein surface irregularities and stabilize polar side chains, forming a cagelike structure around the protein surface 83. In contrast, a comparison of CGNM with surface water to MD simulations including bulk water indicate that bulk water molecules behave more like free water, acting to decrease the vibrational frequency of protein dynamics in a temperature-dependent manner 70. A recent experimental study of the influence of hydration on protein dynamics gives direct support to our results. Quasielastic neutron and light-scattering experiments show that adding an initial hydration layer (h≈0.2) increases the fast vibrational modes. Interestingly, further increasing the hydration level (h>0.2) significantly activates slower processes 78. Therefore, these experimental observations are in good agreement with our simulation results showing the different contributions of solvent molecules to protein dynamics 70,84,85.
The poor overlap in the eigenvectors from various NMA approaches (no water or surface water only) with the MD simulation results (surface water and bulk water), especially for the low-frequency modes (25cm−1), is not surprising. A previous study using a jump-among-minima model, which divides protein motions into intra-substate motions and inter-substate jumps based on a multiple local minima model of the energy surface, generated a much better overlap with MD results than does NMA 86. Moreover, a mixture of harmonic NMA plus diffusive Brownian dynamics has been proven to be effective in reproducing the results of MD simulations and experimental observations 55,87. These studies suggest that the harmonic approximation of the protein energy surface and the neglect of solvent limits the ability of NMA approaches, including AANM, BNM, and CGNM, to reproduce the directionality of intrinsic anharmonic protein dynamics in the native state 54,84. However, for modes beyond 25cm−1, there is a gradual increase in the fidelity of CGNM and BNM, especially for CGNM with a layer of surface water. It is interesting that this frequency region is the same spectrum covered by terahertz absorption spectroscopy (1 THz=33cm−1), where experimental results showed that solvation tends to enhance protein dynamics 88,89,90. Therefore, CGNM provides a convenient tool for modeling the contributions of surface water into protein dynamics at these higher frequencies. In principle, CGNM could be expanded to incorporate the effect of bulk solvent molecules in conjunction with other methods, such as the Langevin Model 71.
Thus far, the results presented for the HCN2 CNBD are for the cyclic-nucleotide bound state of the protein. However, we obtained similar results for the unliganded protein, using a representative snapshot from a 20-ns-long MD simulation with cAMP removed as the starting structure (Fig. S4–S6 ). One theoretical application of a complete set of eigenvectors and eigenvalues from NMA is the estimation of the configurational entropy 58,59. Taking advantage of the CGNM results for the unliganded protein versus the cAMP-bound protein in the subspace of C-α atoms, we estimated the entropy change of C-α atoms upon cAMP binding to be −127.8J/mol without water or −174.3J/mol with surface water (Table 2). Both values should be smaller than the estimate involving all atoms. However, the direction of the changes from the two independent calculations is consistent with previous MD results and the concept that ligand binding for hydrophilic or charged ligands (cAMP carries a negative charge) usually involves a reduction in the configurational entropy of the protein 91,92,93. Further improvements of the computational routine will focus on reducing the memory cost and use of more efficient routines for sparse matrix manipulation. With advances in computational algorithms, more memory-efficient and high-performance (sequential or parallel) routines could further improve this method and thus widen its application to more complex systems.
| Table 2 Configurational entropy of C-α atoms based on NMA and PCA (300K) |
| Unliganded protein (8.31J/mol/K) | cAMP-bound protein (8.31J/mol/K) | Difference (8.31J/mol/K) | |||
|---|---|---|---|---|---|
| ENM-protein | 4103.92 | 4126.26 | 22.34 | ||
| ENM-water | 4154.64 | 4156.63 | 1.99 | ||
| CGNM-protein | 2555.63 | 2540.23 | −15.40 | ||
| CGNM-water | 2390.10 | 2369.13 | −20.97 | ||
| PCA (10ns) | 3051.79 | 2957.70 | −97.09 | ||
| PCA (4ns) | 2756.74 | 2680.41 | −76.33 | ||
We are grateful for the computational time provided by Pittsburgh Supercomputing Center through the National Resource Allocation Committee (grant MCB060032N) to L.Z. and S.A.S. This work was partially supported by grant NS36658 from the National Institutes of Health (to S.A.S.).
1. (2000). Collective protein dynamics in relation to function. Curr. Opin. Struct. Biol. 10, 165–169. CrossRef | PubMed
2. (1999). Investigating protein dynamics in collective coordinate space. Curr. Opin. Struct. Biol. 9, 164–169. CrossRef | PubMed
3. (2002). Molecular dynamics simulations of biomolecules. Nat. Struct. Biol. 9, 646–652. CrossRef | PubMed
4. (1977). Dynamics of folded proteins. Nature 267, 585–590. CrossRef | PubMed
5. (1993). Essential dynamics of proteins. Proteins 17, 412–425. CrossRef | PubMed
6. (1992). Large-amplitude nonlinear motions in proteins. Phys. Rev. Lett. 68, 2696–2699. CrossRef | PubMed
7. (1991). Collective motions in proteins: a covariance analysis of atomic fluctuations in molecular dynamics and normal mode simulations. Proteins 11, 205–217. CrossRef | PubMed
8. (1995). Harmonic analysis of large systems. III. Comparison with molecular dynamics. J. Comput. Chem. 16, 1554–1566. CrossRef | PubMed
9. (2000). Similarities between principal components of protein dynamics and random diffusion. Phys. Rev. E Stat. Phys. Plasmas Fluids Relat. Interdiscip. Topics 62, 8438–8448. CrossRef | PubMed
10. (1996). Principal component analysis and long time protein dynamics. J. Phys. Chem. 100, 2567–2572. CrossRef | PubMed
11. (1983). Harmonic dynamics of proteins: normal modes and fluctuations in bovine pancreatic trypsin inhibitor. Proc. Natl. Acad. Sci. USA 80, 6571–6575. CrossRef | PubMed
12. (1995). Harmonic analysis of large systems. I. Methodology. J. Comput. Chem. 16, 1522–1542. CrossRef | PubMed
13. (2005). Usefulness and limitations of normal mode analysis in modeling dynamics of biomolecular complexes. Structure 13, 373–380. Abstract | Full Text | PDF (346 kb) | CrossRef | PubMed
14. (1983). Dynamics of a small globular protein in terms of low-frequency vibrational modes. Proc. Natl. Acad. Sci. USA 80, 3696–3700. CrossRef | PubMed
15. (2004). A normal mode analysis of structural plasticity in the biomolecular motor F(1)-ATPase. J. Mol. Biol. 340, 345–372. CrossRef | PubMed
16. (1997). Ligand-induced conformational changes in ras p21: a normal mode and energy minimization analysis. J. Mol. Biol. 274, 114–131. CrossRef | PubMed
17. (2003). Thermodynamics of protein hydration computed by molecular dynamics and normal modes. J. Phys. Chem. B 107, 12820–12828. PubMed
18. (1994). A new approach for determining low-frequency normal modes in macromolecules. Biopolymers 34, 759–771. CrossRef | PubMed
19. (1995). Computation of low-frequency normal modes in macromolecules: improvements to the method of diagonalization in a mixed basis and application to hemoglobin. Comput. Chem. 19, 241–246. CrossRef | PubMed
20. (1983). CHARMM: a program for macromolecular energy, minimization, and dynamics calculations. J. Comput. Chem. 4, 187–217. CrossRef | PubMed
21. (1999). LAPACK Users’ Guide. 3rd edition, (Philadelphia: Society for Industrial and Applied Mathematics). PubMed
22. (2001). GROMACS 3.0: a package for molecular simulation and trajectory analysis. J. Mol. Model 7, 306–317, (online computer file). PubMed
23. (1976). The hinge-bending mode in lysozyme. Nature 262, 325–326. CrossRef | PubMed