| Molecular Dynamics Simulation of the Human U2B Protein Complex with U2 snRNA Hairpin IV in Aqueous Solution Biophysical Journal, Volume 81, Issue 2, 1 August 2001, Pages 630-642 Jian-xin Guo and William H. Gmeiner Abstract A 2200-ps molecular dynamics (MD) simulation of the U2 snRNA hairpin IV/U2B″ complex was performed in aqueous solution using the particle mesh Ewald method to consider long-range electrostatic interactions. To investigate the interaction and recognition process between the RNA and protein, the free energy contributions resulting from individual amino acids of the protein component of the RNA/protein complex were calculated using the recently developed glycine-scanning method. The results revealed that the loop region of the U2 snRNA hairpin IV interacted mainly with three regions of the U2B″ protein: 1) 1-helix A, 2) 2-3, and 3) 4-helix C. U2 snRNA hairpin IV bound U2B″ in a similar orientation as that previously described for U1 snRNA with the U1A′ protein; however, the details of the interaction differed in several aspects. In particular, 1-helix A and 4-helix C in U2B″ were not observed to interact with RNA in the U1A′ protein complex. Most of the polar and charged residues in the interacting regions had larger mutant free energies than the nonpolar residues, indicating that electrostatic interactions were important for stabilizing the RNA/protein complex. The interaction was further stabilized by a network of hydrogen bonds and salt bridges formed between RNA and protein that was maintained throughout the MD trajectory. In addition to the direct interactions between RNA and the protein, solvent-mediated interactions also contributed significantly to complex stability. A detailed analysis of the ordered water molecules in the hydration of the RNA/protein complex revealed that bridged water molecules reside at the interface of RNA and protein as long as 2100ps in the 2200-ps trajectory. At least 20 bridged water molecules, on average, contributed to the instantaneous stability of the RNA/protein complex. The stabilizing interaction energy due to bridging water molecules was obtained from ab initio Hartree-Fock and density functional theory calculations. Abstract | Full Text | PDF (478 kb) |
| DNA Polymorphism: A Comparison of Force Fields for Nucleic Acids Biophysical Journal, Volume 84, Issue 3, 1 March 2003, Pages 1421-1449 Swarnalatha Y. Reddy, Fabrice Leclerc and Martin Karplus Abstract The improvements of the force fields and the more accurate treatment of long-range interactions are providing more reliable molecular dynamics simulations of nucleic acids. The abilities of certain nucleic acid force fields to represent the structural and conformational properties of nucleic acids in solution are compared. The force fields are AMBER 4.1, BMS, CHARMM22, and CHARMM27; the comparison of the latter two is the primary focus of this paper. The performance of each force field is evaluated first on its ability to reproduce the B-DNA decamer d(CGATTAATCG) in solution with simulations in which the long-range electrostatics were treated by the particle mesh Ewald method; the crystal structure determined by Quintana et al. (1992) is used as the starting point for all simulations. A detailed analysis of the structural and solvation properties shows how well the different force fields can reproduce sequence-specific features. The results are compared with data from experimental and previous theoretical studies. Abstract | Full Text | PDF (2509 kb) |
| Stability of the I-motif Structure Is Related to the Interactions between Phosphodiester Backbones Biophysical Journal, Volume 84, Issue 6, 1 June 2003, Pages 3838-3847 Thérèse E. Malliavin, Jocelyne Gau, Karim Snoussi and Jean-Louis Leroy Abstract The i-motif DNA tetrameric structure is formed of two parallel duplexes intercalated in a head-to-tail orientation, and held together by hemiprotonated cytosine pairs. The four phosphodiester backbones forming the structure define two narrow and wide grooves. The short interphosphate distances across the narrow groove induce a strong repulsion which should destabilize the tetramer. To investigate this point, molecular dynamics simulations were run on the [d(C2)]4 and [d(C4)]4 tetramers in 3′E and 5′E topologies, for which the interaction of the phosphodiester backbones through the narrow groove is different. The analysis of the simulations, using the Molecular Mechanics Generalized Born Solvation Area and Molecular Mechanics Poisson-Boltzmann Solvation Area approaches, shows that it is the van der Waals energy contribution which displays the largest relative difference between the two topologies. The comparison of the solvent-accessible area of each topology reveals that the sugar-sugar interactions account for the greater stability of the 3′E topology. This stresses the importance of the sugar-sugar contacts across the narrow groove which, enforcing the optimal backbone twisting, are essential to the base stacking and the i-motif stability. Tighter interactions between the sugars are observed in the case of N-type sugar puckers. Abstract | Full Text | PDF (275 kb) |
Copyright © 1999 The Biophysical Society. All rights reserved.
Biophysical Journal, Volume 77, Issue 3, 1284-1305, 1 September 1999
doi:10.1016/S0006-3495(99)76979-1
Biophysical Theory and Modeling
Address reprint requests to Dr. Lennart Nilsson, Center for Structural Biochemistry, Department of Bioscience at NOVUM, Karolinska Institutet, S-141 57 Huddinge, Sweden. Tel.: +46-8-6089228; Fax: +46-8-6089290.RNA molecules play a central role in a wide range of biological processes, such as storage of genetic material, propagation of genetic information, protein biosynthesis, and enzymatic activity. The large number of RNA structures that have recently been elucidated at high resolution by x-ray crystallography (Holbrook and Kim, 1997) and NMR spectroscopy (Ramos et al) have advanced our understanding of RNA structure and function (Uhlenbeck et al). Usually, RNA molecules perform their functions in tight association with RNA-binding proteins rather than on their own, so RNA-protein interactions are essential to these biological processes involving RNA. As a number of RNA-protein complex structures have been determined (Nagai, 1996,Arnez and Cavarelli, 1997), it is possible to investigate the specific features of RNA-protein interactions at the molecular level. For example, protein-induced RNA conformational changes are common and substantial in RNA-protein interactions, which resemble protein-protein interactions rather than DNA-protein interactions (Draper, 1995,Varani, 1997,Frankel and Smith, 1998).
The most common structural motif in RNA-binding proteins is the ribonucleoprotein (RNP) motif, found in more than 200 proteins (Varani and Nagai, 1998). An RNP motif is composed of 90–100 amino acids, which form an RNA-binding domain (RBD) that is present in one or more copies in proteins to bind pre-mRNA, mRNA, pre-ribosomal RNA (rRNA), and small nuclear RNAs (snRNAs) (Burd and Dreyfuss, 1994,Nagai et al). These small RNP domains share a common βαββαβ sandwich tertiary folding structure (Nagai et al) and are highly conserved, although they bind diverse RNA targets with different affinities and specificities, ranging from picomolar to micromolar. The two central β strands (β3 and β1) of the folded domain, named RNP1 and RNP2, respectively, are identified as the RNA binding sites.
The human U1A protein is a 282-amino acid protein associated with the U1 snRNP (small nuclear RNP), a large RNA-protein complex involved in pre-mRNA splicing (Lührmann et al). It contains two copies of the RNP motif, one at the N-terminus and one at the C-terminus, which are connected by a protease-sensitive polypeptide of ∼100 residues (Sillekens et al). The 102-amino acid N-terminal RNP motif binds specifically to hairpin II of U1 snRNA (Lutz-Freyermuth et al), with binding sites exclusively located in the loop region of the hairpin, whereas the C-terminal RNP does not appear to associate with any RNA (Lu and Hall, 1995). It has been found that U1A protein also binds to the 3′-untranslated region (3′UTR) of its own pre-mRNA, where it can inhibit polyadenylation at the 3′ end and regulate its own translation (Boelens et al). The U1A protein can bind to the two distinct RNA targets with very high affinity and specificity (both with a subnanomolar dissociation constant). The two RNA molecules contain the same AUUGCAC heptanucleotide in the loop region (see Fig. 1), which is believed to be the main site of interaction with the protein. A common recognition strategy has thus been suggested for the two different RNA molecules to bind to the same protein (Jovine et al).
Recently, the structure of a complex between the human U1A protein (N-terminal RNP domain with 98 residues) and a 21-nucleotide RNA hairpin, representing the hairpin II of U1 snRNA, was elucidated by x-ray crystallography at 1.92-Å resolution (Fig. 1; Oubridge et al). Within the structure there are many direct and water-mediated hydrogen bonds and stacking interactions between the RNA bases and aromatic side chains of the protein, revealing the stereochemical basis for sequence-specific RNA recognition by the RNP domain. Another complex of the same protein domain (including 102 residues) with the 3′UTR of U1A pre-mRNA was determined structurally by NMR spectroscopy (Allain et al), which further confirms the structural basis of the RNA binding specificity of U1A protein (Allain et al) and provides us an opportunity to compare the recognition strategy of the two RNA targets. The structure of the free U1A protein (Nagai et al,Avis et al) and free 3′UTR (Gubser and Varani, 1996) has also been elucidated by x-ray crystallography and NMR spectroscopy, allowing us to investigate conformational changes during RNA-protein interactions. The two structures of U1A protein complexed with distinct RNA targets have clarified many important aspects of RNP-RNA recognition. However, they remain static views, and they have raised intriguing questions concerning the molecular basis of specificity; for example, interactions involving the protein variable loops differ significantly (Allain et al). Therefore, it is necessary to study the specific interactions of U1A protein complexed with RNA from a dynamic and thermodynamic point of view.
Hall observed that chemical shifts of many protons from the bound U1 snRNA were substantially different from those of the free RNA, especially in the loop region of the hairpin (Hall, 1994). Allain and Varani studied the free structure of the RNA hairpin, which shows that the hairpin loop is largely unstructured in free RNA apart from the first three bases in the loop, which stack on each other (unpublished results, just mentioned in Oubridge et al). However, no structural data of this hairpin in the free state are available. Therefore, it is also necessary to run a conformational analysis on the RNA hairpin before and after its binding to the protein, to investigate how much the conformation of the RNA is changed by the binding of U1A protein, which could help us to better understand how the RNA performs its function.
Molecular dynamics (MD) simulation is a powerful tool for analyzing the structural and dynamic features of biomacromolecules (Karplus and Petsko, 1990,Tidor, 1997). It provides detailed information about atomic interactions as a function of time, which can both enhance and complement the experimental results. Water molecules, which have been shown to play an important role in both the affinity and specificity of protein-nucleic acid interactions (Schwabe, 1997), are easily incorporated into MD simulations to model the solvent effects. Besides proteins, there are an increasing number of MD simulations focusing on nucleic acids (Auffinger and Westhof, 1998), from which some conformational features of nucleic acids have been obtained (Cheatham and Kollman, 1997). For RNA, much of the simulation effort has been concentrated on UUCG tetraloop (Miller and Kollman, 1997), transfer RNA (Auffinger and Westhof, 1997,Auffinger et al), and hammerhead ribozyme (Hermann et al). Several MD simulations to date have also been performed on DNA-protein complexes (Eriksson et al,Tang and Nilsson, 1998,Nilsson, 1998). However, MD studies of RNA-protein interactions are still at an early stage (Reyes and Kollman, 1999). This is due, in part, to the fact that the structures of only a handful of RNA-protein complexes have been determined at high resolution, and the available RNA-protein complexes are often large and the RNA substrates are structurally more diverse than the DNA (Weeks, 1997), which makes RNA-protein interactions less tractable by simulation.
Pressure and temperature are thermodynamic variables, which may affect the simulation results. In biophysical chemistry, pressure has long been used as an environmental variable for probing the interactions of proteins with ligands and to study conformational equilibria, protein dynamics, and other properties of the native state of proteins. To mimic experimental conditions, the constant pressure and temperature (NPT) supercanonical ensemble (Kitchen et al,Ceccarelli and Marchi, 1997) was used in the present work. We performed an NPT MD simulation on the U1A protein complexed with the hairpin II of U1 snRNA and the free RNA hairpin in solution. The aim of this study is to investigate the dynamic features of the specific U1A-RNA interaction, to recognize the conformational changes of the RNA before and after binding to the protein, and to explore the structural basis of U1A protein sequence-specific binding to the RNA substrate.
The solvated constant-pressure molecular dynamics simulations were performed on a DEC AXP 4100/300E 4CPU parallel computer and a four-node α cluster (∼13h per 10ps for the complex and 6h per 10ps for the free RNA), using the program CHARMM (Brooks et al), version c26a2, and the all-atom version 22 force field (MacKerell et al,MacKerell et al). The TIP3P water model was used to simulate the solvent (Jorgensen et al).
The starting coordinates of the U1A-RNA complex were extracted from the B/Q monomer in the x-ray trimer structure at 1.92-Å resolution (Oubridge et al), PDB entry code 1URN (Bernstein et al), including 769 protein heavy atoms, 436 RNA heavy atoms, and 157 water oxygen atoms. In the x-ray structure, the C-terminus of U1A was poorly ordered beyond residue 96, and residues Met1 and Lys98 were omitted. Two mutations, Tyr31→His and Gln36→Arg, which were not directly involved in RNA recognition, were contained in the protein structure (see Fig. 1; Oubridge et al). Bases U13, C14, C15 and the 3′ end of the RNA chain were also poorly ordered. All of the hydrogen atoms were added by the CHARMM subroutine HBUILD (Brünger and Karplus, 1988). The complex structure together with the 157 crystal water molecules was minimized in vacuo for 1000 steps, using the adopted basis Newton-Raphson (ABNR) method, to remove the unfavorable contacts, keeping harmonic constraints with a force constant of 20.0kcal/(mol·Å2) on heavy atoms of the complex.
The minimized complex structure, together with the 157 solvent water molecules, was then inserted into the center of a water box, keeping any complex atoms at least 10.0Å away from the boundary and leading to a box size of 84×56×52Å3. Water molecules closer than 1.8Å from any solute or crystallographic water atoms were deleted from the water box. Thirteen sodium counterions were added at random positions into the system, at least 3.0Å from any complex atoms, to make the system electroneutral. The final system contained 24,805 atoms, including the 2250 solute atoms. The solvated U1A-RNA complex system was minimized again, first with the steepest descent method for 100 steps, then with the ABNR method for 1000 steps, to adjust the water molecules and counterions locally and eliminate any residual geometrical strain, keeping the heavy atoms of the complex fixed. The minimized solvated system was served as the initial structure of the subsequent molecular dynamics simulation.
After the initial structure was prepared, the molecular dynamics simulation was begun with an initial and equilibration stage, followed by a 400-ps production run, giving a total simulation time of 600ps. All of the complex atoms were released during the whole process. The initial atomic velocities were assigned from a Gaussian distribution corresponding to a temperature of 300K. The nonbonded energies and forces were smoothly shifted to zero at 12.0Å (Steinbach and Brooks, 1994), and a constant dielectric (ϵ=1) was used for electrostatic interactions. The nonbonded list, including neighboring atoms within a 14.0-Å distance, was updated every 20 steps. All hydrogens were treated explicitly; a time step of 2fs was used to integrate the equations of motion with the Leapfrog Verlet algorithm. All bonds involving hydrogens were constrained with the SHAKE algorithm (Ryckaert et al). Coordinates and energies were saved every 100 time steps for further analysis.
In the energy minimization and the molecular dynamics simulation, periodic boundary conditions were applied in all directions. The solvent and counterion images were updated every 20 steps. The NPT ensemble was implemented, using the weak coupling scheme (Berendsen et al), with a pressure coupling time of 5.0ps and a temperature coupling time of 5.0ps. The system temperature was set at 300K, and the reference pressure of the system was set at 1.0atm. The isothermal compressibility was set at 4.63×10−5atm−1, and the value was approximated from experimental data for water.
The starting coordinates of the free RNA hairpin were extracted from the complex, followed by the same pretreatment and molecular dynamics process as mentioned above, with a water box of 64×46×40Å3. Twenty sodium counterions were placed 3.6Å from the phosphorous atom on the OPO bisectors. The whole system contained 11,800 atoms. The molecular dynamics simulation consisted of a 0.6-ns initial stage followed by a 0.6-ns production run, giving a total simulation ime of 1.2ns.
The analyses of the simulations focused on the production stages. Four structures averaged from different short periods of the production stage of the complex, together with the average structure from the whole production run, were used in the analysis of the complex. All of these structures were minimized with the ABNR method for 500 steps with a 20.0kcal/(mol·Å2) harmonic force constant on the heavy atoms, and the water molecules and counterions were deleted. The average conformations of the RNA hairpin free and in complex created from the corresponding production stage, with a similar treatment, were used in the analysis of RNA.
The root mean square deviations (RMSDs) of the selected atoms were calculated from the trajectory at 0.2-ps intervals, with the initial structure as the reference. Average RMSD values of the backbone and the side chains or bases were then calculated for each residue or nucleotide. The root mean square fluctuation (RMSF) of each residue was calculated similarly.
The protein-RNA, protein-solvent, and RNA-solvent nonbonded interaction energies were calculated from the trajectory at 1.0-ps intervals, using CHARMM force field.
The electrostatic potentials of the protein and RNA were calculated from the average dynamics structure of the complex and shown on the solvent-accessible surface, using the program GRASP (Nicholls, 1992).
The solvent-accessible surface areas were calculated from the trajectory at 6.0-ps intervals, using the definition of Lee and Richards, 1971, with a probe of radius 1.4Å. The Lennard-Jones radii values were used for the complex atoms.
The hydrogen bonds were analyzed from the production trajectories with 0.2-ps and 1.0-ps time resolutions. A hydrogen bond (A···H−D) was defined by an A-H distance of less than 2.5Å and an A-H-D angle of more than 120°. The percent occupancy of a hydrogen bond was defined as the number of frames with the hydrogen bond present divided by the number of total analysis frames. The lifetime of a specific incarnation of a hydrogen bond was calculated as the time elapsed from its first appearance until it was first broken. The average lifetime of a hydrogen bond during the simulation was then calculated as the average of all of its incarnations, with the option of excluding those with a lifetime shorter than a time cutoff of 0.1, 1.0, and 5.0ps, separately. The water-mediated hydrogen bonds were analyzed similarly. A water bridge occurred between the protein and RNA atoms, which formed hydrogen bonds with a common water molecule.
The most probable positions for the water molecules and sodium ions around the solute were calculated from the production trajectories at 1.0-ps intervals, by the use of a three-dimensional histogram with a (2Å)3 bin size, by orienting every coordinate set to be superimposed on the average structure of the solute. Only probabilities larger than 40% for the water and 10% for the sodium were considered in the density map.
The potential energy of the system, the temperature, and the RMSD values of the protein and RNA atoms from the initial x-ray structure remained quite stable during the production run (shown in Fig. 2), except that the RMSD of the protein underwent a small increase at 450–500ps. The global RMSD of the complex was below 2.5Å after a 600-ps simulation.
The U1A protein–RNA hairpin nonbonded interaction energies were stable in the simulation (Figure 3A). The van der Waals interaction energy was constant, at about −90kcal/mol. The major component of the total interaction energy was the electrostatic (including hydrogen bonding) interaction energy at about −310kcal/mol, with some fluctuation. In particular, there was a decrease at 450–500ps, indicating that the electrostatic interaction was more favorable during this period. These interaction energies did not include the effects of the solvent.
The origins of the favorable electrostatic interaction energies are clearly illustrated by the electrostatic potential calculated from the average structure of the complex. A strongly positive electrostatic potential region was found on the solvent-accessible surfaces of loops 1 and 3 of the U1A protein, whereas the major groove surface of the RNA double-helical stem was a strongly negative potential region (Fig. 4). The positive and negative potential regions faced each other and matched very well.
The solvent-accessible surface area of the complex increased slightly with time (Figure 3B). In the first 250ps of the production run (200–450ps), the surface area of the complex was constant, but after this it increased a little and remained constant again after 500ps, at a 300Å2 higher level. The increased surface area of the complex came from the protein, and the exposed area of the RNA was constant. However, the buried surface area remained constant, at ∼1000Å2 for each part, during the simulation. The buried surface area was ∼17% for the protein and ∼22% for the RNA.
From the results above, it seemed that something had happened on the U1A protein at 450–500ps. Average structures from four short windows, 270–300, 370–400, 470–500, and 570–600ps, together with the x-ray structure, were superimposed to investigate the changes (Fig. 5). Compared to the x-ray structure of the complex, most of the folded domain of the protein (α-carbon atoms only) overlapped very well, except that the N- and C-terminals were in severe disorder. In particular, there were structural deviations in loop 3 and the end of helix B for the second structure; in loops 1 and 3 and helix B for the third structure; and in loop 1 and the beginning of helix B for the last structure. The RNA hairpin (backbone only) appeared flexible in the loop from U8 to C15, but highly stable in the double-helix stem and A6, U7.
The detailed local motions of the complex structure during the final simulation were reflected quantitatively from the RMSD and RMSF values of each residue in the complex. Fig. 6 shows a uniform mobility for most of the residues in the U1A protein in the production stage, with ∼0.5Å for the RMSD and RMSF values for the backbone atoms, except for the residues at the two termini, especially the N-terminal. The backbone of residues Glu19 to Lys22 of loop 1 had higher RMSD values close to 1.0Å, whereas the RMSF values of these residues remained stable, indicating that these four residues changed their conformations during the simulation. The backbone of residues Ser46 to Lys50 of loop 3 had RMSD values a little higher than the average, but RMSF values very little higher than the average, which demonstrated that these residues had also changed their conformations somewhat. Notably, the long positively charged side chain of Arg52 remained highly stable with very low RMSD and RMSF values, indicative of the important structural role played by Arg52 in the RNA-protein interaction. Interestingly, residues Leu69 and Arg70 of helix B, which had no direct contacts with RNA, also had higher than average RMSD values. The side chain of Arg70 in particular was highly flexible and deviated from the original position.
Figure 6CD, shows that the sugar-phosphate backbone of nucleotides U13 to G16 had RMSD values of ≥1.5Å in the complex, whereas the backbone of nucleotides U8, U13, C14, and U15 had RMSF values of ≥1.0Å, except for the two ends. The bases of U8, U13, and C15 had unusually high RMSD and RMSF values, besides the 3′-end nucleotide U21. C14 also had a high RMSF value, but this was weaker than that of U8, U13, and C15. Contrasted to the high mobility of U8, nucleotides A6, U7, and G9CAC12 in the loop region were quite stable during the simulation, the backbone as well as the bases, which is very similar to the nucleotides within the double-helix stem.
Compared to the RNA in complex, the free RNA hairpin had a quite different conformation for the loop region, from U8 to C15, indicated by the RMSD values much higher than 2.0Å and RMSF higher than 1.5Å (Figure 6CD). In most of these nucleotides, the bases were deviating and fluctuating more than the backbone, especially U13 and C14, the bases of which deviated from the starting structure by more than 14Å. The double-helix stem and the first two nucleotides of the loop remained stable, with RMSDs less than 1.0Å. Nucleotides A1, A2, G16, and U20 had RMSF values greater than 1.5Å for the backbone.
The overall structure of the complex was very similar to the initial x-ray structure, with RMSD values of 0.92Å and 1.68Å for the backbone and side chains of the U1A protein, and 1.35Å and 2.38Å for the backbone and bases of the RNA hairpin, respectively. For the U1A protein in the average structure, the four β-sheets were in an antiparallel arrangement on the surface. Hydrophobic residues in these sheets were packed on each other and formed a hydrophobic core, which linked the four β-sheets together (Figure 7A). Among these residues, only Tyr13, Leu44, and Phe56 were exposed to bind to the RNA. Therefore, the four β-sheets, together with helices A and B, were quite stable and almost completely overlapped with those in the x-ray structure. Loop 3 between β2 and β3 of the protein protruded from the surface and was confronted with the N- and C-terminals. There was a large hydrophilic and charged region on the outside surface of loop 3, loop 1, and part of helix A. Together with loop 5 between β4 and helix C, which is also highly polar, these three polar loops were involved in extensive contacts with the RNA hairpin (Figure 7A).
The RNA hairpin is attached to the surface of the protein in an open conformation. The 10-nucleotide loop of the RNA hairpin encircles loop 3 of the protein and falls into the hydrophobic saddle region between loop 3 and the C-terminal of the protein (Figure 7A). The seven conserved nucleotides, A6UUGCAC12, were splayed out on the protein surface. The bases of A6 and U7 were stacked with the bases of the double-helix stem. They and the base of G9 were located in the inside of the loop, whereas the bases of U8, C10, A11, and C12 were on the outside of the loop. The base plane of C10 was approximately perpendicular to that of A11, whereas the bases of A11 and C12 were stacked on each other. The bases of G9CAC12 were buried in the protein hydrophobic core, but U8 had no obvious contacts with the protein, which was different from the situation in the x-ray structure, although the related three residues Asn16, Lys80, and Arg83 remained stable during the simulation. The last three nucleotides in the loop, UCC, had no obvious contacts with the protein, with bases extending into the solution and the backbone pointing toward the protein. Their bases were partially stacked on each other. The double-helical stem of the hairpin faced toward the protein through the major groove but had no obvious contacts with the surface of the protein. The 5′ end of the stem was very close to several positively charged residues of the protein, such as Lys20, Lys22, and Arg47.
There are several nucleotides contacting with U1A protein at the RNA-protein interface, such as A6, U7, G9, C10, A11, C12, and G16. The residues in U1A protein contacting RNA bases included Tyr13, Asn15, Glu19, Leu44, Arg52, Gln54, Phe56, Gln85, Tyr86, Lys88, Thr89, Asp90, Ser91, and Asp92. These nucleotides and amino acid residues form a variety of electrostatic, hydrophobic, and hydrogen-bonding interactions at their interface.
From Fig. 4 we can see that the positive electrostatic potential on the outside of the U1A protein loops 1 and 3 was just opposed to the negative electrostatic potential on the major groove surface of the RNA hairpin. In the positively charged region of the protein, residues Lys20 and Lys22 formed electrostatic interactions with the phosphate groups of nucleotides A2, U3, and C4 of the RNA (Table 1). Arg47 also formed some electrostatic interactions with these nucleotides, but with low occupancy, and so is not listed in Table 1. The phosphate groups of C10 and G16 formed electrostatic interactions with the side chains of Lys88 and Arg52, respectively. Moreover, we found an electrostatic interaction between the phosphate group of C15 and the side chain of Lys23, which was not observed in the x-ray structure.
The bases of G9, C10, A11, and C12, four of the seven conserved nucleotides in the loop region, were buried in the hydrophobic core (Fig. 7), so a wide range of hydrophobic contacts was found between these nucleotides and the protein (Table 2). The side chain of Gln54 was parallel to the base plane of G9 and underwent a hydrophobic interaction. The aromatic plane of Tyr13 and the long side chain of Lys88 were parallel to the base of C10 on both sides and underwent stacking interactions with the base plane of C10. The aromatic plane of Phe56 was stacked with the base plane of A11. The CD2atom of Leu44 underwent a hydrophobic interaction with the C2atom of A11. One side of the base of C12 was stacked with the base of A11, and on the other side, it had hydrophobic contacts with the side chain of Asp92. In contrast to the x-ray structure, the base of C12 did not undergo stacking interactions with the side chain of Asp92. The sugar rings of G9, C10, and A11 also made partial hydrophobic contacts with Gln54 and Phe56, respectively. The side chain of Leu49 underwent several hydrophobic interactions with both the sugar and base of G16, and it interacted with the base of G6. In addition, the CB atom of Ser48 underwent hydrophobic interactions with the sugar backbone of C15. This was the second interaction related to C15.
| Table 2 Hydrophobic contacts at the RNA-protein interface |
| Nucleotide atoms | Amino acid atoms | Average distance (Å) | RMS fluctuation (Å) | ||
|---|---|---|---|---|---|
| A6 C2 | Leu49 CD1 | 3.93 | 0.32 | ||
| G9 C1′ | Gln54 CD | 4.14 | 0.23 | ||
| G9 C2′ | Gln54 CD | 3.96 | 0.26 | ||
| A11 C1′ | Phe56 CE1 | 3.69 | 0.21 | ||
| A11 C1′ | Phe56 CZ | 3.85 | 0.24 | ||
| A11 C2 | Leu44 CD2 | 4.18 | 0.65 | ||
| C12 C2 | Asp92 CB | 3.89 | 0.22 | ||
| C15 C5′ | Ser48 CB | 4.02 | 0.27 | ||
| G16 C1′ | Leu49 CG | 3.99 | 0.22 | ||
| G16 C1′ | Leu49 CD1 | 3.98 | 0.26 | ||
| G16 C4 | Leu49 CD1 | 4.11 | 0.30 | ||
| G16 C8 | Leu49 CD1 | 3.83 | 0.27 | ||
| G9 base | Gln54 side chain | 3.99 | |||
| C10 base | Tyr13 side chain | 3.74 | |||
| C10 base | Lys88 side chain | 4.06 | |||
| A11 base | Phe56 side chain | 3.92 | |||
Besides hydrophobic contacts, the four buried nucleotides G9CAC12 also formed hydrogen-bonding interactions with the U1A protein (Fig. 7).
A number of direct and water-mediated hydrogen bonds were monitored at the RNA-protein interface during the simulation (Table 3). These hydrogen bonds were formed not only between the backbone of RNA and the backbone of protein, but also between backbone and side chain, bases and backbone, and bases and side chain of RNA and protein, respectively. Eight of the direct hydrogen bonds and five water-mediated ones were formed between RNA bases and protein side chains. In particular, the side chain of residue Glu19 formed three direct and four water-mediated hydrogen bonds with the bases of A6, U7, and G9, which indicated that Glu19 was important to the specificity. Direct hydrogen bonds between atoms A6 N1 and Arg52 HH12, U7 H3 and Glu19 OE1, G9 O6 and Asn16 HN, C10 H41 and Tyr86 O, A11 N1 and Ser91 HG1, C12 H41 and Asp90 O, and G16 N7 and Arg52 HH11 had high occupancies (more than 80%) and were also found in the x-ray structure.
Eleven of the 15 direct hydrogen bonds were consistent with those observed in the x-ray structure. All of these hydrogen bonds were present more than half of the production run time. Four of these with more than 80% occupancy formed between bases of RNA and side chain of protein, namely A6 N1 and Arg52 HH12, U7 H3 and Glu19 OE1, A11 N1 and Ser91 HG1, and G16 N7 and Arg52 HH11, indicating that these four hydrogen-bonding interactions were critical for the specific RNA-protein interaction. Among the 12 water-mediated hydrogen bonds, in which three bridged more than two residues, only two were present in the x-ray structure, possibly because the water-mediated hydrogen bonds were less stable than the direct ones.
In the x-ray structure, it was found that most of the hydrogen bond donors and acceptors of G9CAC12 of the loop were involved in hydrogen bonding interactions with the protein residues. However, in the dynamics simulation some of these hydrogen bonds were present for less than one-third of the simulation, so they were not listed in Table 3. Some of the direct hydrogen bonds became water-mediated. The base of U8 did not form any hydrogen bonds with the protein.
Some hydrogen bonds changed their donors or acceptors. For example, the phosphate group of G16 formed a hydrogen bond with the HN atom of Arg47 in the x-ray structure, but we found that the phosphate group of G16 formed a hydrogen bond with the HN atom of Leu49 with ∼63% occupancy.
Some new hydrogen bonds were found from the simulation. In particular, most of the water-mediated hydrogen bonds were new, some of them with more than 70% occupancy, which suggested that these water-mediated bonds contribute to the specific interactions and should not be ignored. It was noted that the phosphate group of C15 underwent two water-mediated hydrogen-bonding interactions with the side chains of Ser46 and Ser48, and the phosphate group of U13 also interacted with the side chain of Asp92 through a water bridge.
As indicated in Figure 6CD, the loop of the free RNA hairpin was highly flexible and disordered, which was also reflected by the superimposition of the six snapshots taken from the production simulation with an interval of 0.1ns (Figure 8A). The double-helix stem of the RNA, as well as the first two nucleotides in the loop, was superimposed very well. Only the sugar-phosphate backbone of nucleotides A1, U7, G16, and U20 exhibited some fluctuation.
Superimposing the average conformations of the free and bound RNA (Figure 8B) shows that the double-helix stem of the RNA hairpin in the two conformations overlapped quite well. The RMS deviation was just 0.36Å for all the atoms of the 10 nucleotides in the stem. Both of them were close to the canonical A-RNA conformation. However, the conformation of the stem in complex was closer to the canonical A-RNA than that in free RNA, in which the terminal base pair A1U20 was more mobile. The significant differences between the two conformations were located in the loop region, from U8 to C15, as well as the 3′-end U21. Among the 10 loop nucleotides, A6, U7, G9, C10, U13, and C14 were located on the inside of the loop, whereas U8, A11, C12, and C15 were on the outside, so the space enclosed by the loop was small. However, the space enclosed by the loop was enlarged in the complex, in which only A6, U7, and G9 remained on the inside of the loop. Nucleotides U13 and C14 had the most deviations in the complex from the free conformation of the RNA.
In the x-ray structure of the U1A-RNA hairpin complex, only nucleotides U8, C10, and U13 were found to have C2′-endo pucker (U13 was not mentioned in Oubridge et al), and the other ones had C3′-endo pucker. However, this situation changed during the simulations. Besides nucleotides U8, C10, and U13 with C2′-endo pucker in the complex, the sugar puckers of U7 and U21 changed from C3′-endo to C2′-endo spontaneously. The other nucleotides maintained a C3′-endo pucker during the simulation. In the free state, nucleotides C12, C14, C15, and U21 changed their ribose pucker from C3′-endo to C2′-endo spontaneously in the course of the simulation, in addition to U8, C10, and U13 keeping their C2′-endo puckers. However, U7 in the unbound state kept its C3′-endo conformation. Among the 10 nucleotides in the loop region, interestingly, only the three purines, namely A6, G9, and A11, maintained the C3′-endo pucker both in the free state and in the complex, whereas the pyrimidines preferred C2′-endo pucker to C3′-endo.
Table 4 lists all of the hydrogen-bonding interactions occurring within the RNA hairpin in the two different states, calculated from the corresponding production run with 0.2-ps resolution. From Table 4A, we can see that most of the direct hydrogen bonds present in the free conformation were base pairing interactions in the double-helix stem, with four of these base pairs having almost 100% occupancy and a long average lifetime, which indicated that these base pairs were quite stable during the simulation. However, the ending base pair A1U20 only had about half occupancy for the two hydrogen bonds, indicating that this base pair was less well defined during the simulation. Among the four hydrogen bonds involving the 2′-OH group, which were found only in the loop region, three were formed with their own O3′ atoms with less than 50% occupancy and a very short average lifetime, and the fourth, C12 H2′, was formed with the phosphate oxygen atom O2P of the next nucleotide with more than 50% occupancy and a longer average lifetime. In addition, the 5′-end hydrogen A1 H5T formed a hydrogen bond with A2 O1P.
| Table 4 Hydrogen bonds within the RNA hairpin (with occupancy≥33.3%) |
| Free RNA | RNA in complex | |||||
|---|---|---|---|---|---|---|
| Hydrogen bonds | Occupancy (%) | Average lifetime (ps) | Occupancy (%) | Average lifetime (ps) | ||
| (A) Normal direct hydrogen bonds | ||||||
| A1 H5T … A2 O1P | 44.0 | 2.2 | 34.4 | 1.4 | ||
| A1 H61 … U20 O4 | 66.6 | 9.5 | 97.6 | 12.6 | ||
| A1 N1 … U20 H3 | 51.1 | 4.4 | 91.6 | 5.8 | ||
| A2 H61 … U19 O4 | 96.4 | 6.2 | 95.2 | 5.1 | ||
| A2 N1 … U19 H3 | 99.6 | 54.3 | 99.9 | 99.9 | ||
| U3 H3 … A18 N1 | 99.9 | 119.8 | 99.8 | 99.9 | ||
| U3 O4 … A18 H61 | 96.4 | 6.3 | 87.0 | 2.3 | ||
| C4 H41 … G17 O6 | 97.2 | 7.9 | 97.0 | 11.4 | ||
| C4 N3 … G17 H1 | 99.9 | 199.8 | 99.0 | 44.0 | ||
| C4 O2 … G17 H21 | 99.6 | 49.8 | 99.7 | 57.0 | ||
| C5 H41 … G16 O6 | 99.6 | 49.8 | 99.3 | 28.4 | ||
| C5 N3 … G16 H1 | 99.9 | 199.9 | 99.9 | 99.9 | ||
| C5 O2 … G16 H21 | 98.9 | 18.5 | 99.0 | 18.9 | ||
| U7 H2′ … U7 O3′ | 35.4 | 0.3 | 33.4 | 0.4 | ||
| G9 H2′ … G9 O3′ | 45.5 | 0.4 | 44.6 | 0.4 | ||
| A11 H2′ … A11 O3′ | 37.1 | 0.4 | 46.0 | 0.4 | ||
| C12 H2′ … U13 O2P | 58.5 | 3.7 | ||||
| U7 O2 … G9 H22 | 53.2 | 1.1 | ||||
| C12 H2′ … C12 O3′ | 34.2 | 0.3 | ||||
| U13 H2′ … U13 O3′ | 38.0 | 0.4 | ||||
| C15 H2′ … C15 O3′ | 44.0 | 0.4 | ||||
| U19 H2′ … U19 O3′ | 33.8 | 0.3 | ||||
| (B) Unusual direct CH … O hydrogen bonds | ||||||
| A2 H8 … A2 O5′ | 49.6 | 0.5 | 45.2 | 0.5 | ||
| U3 H6 … U3 O5′ | 52.6 | 0.5 | 53.4 | 0.5 | ||
| C4 H6 … C4 O5′ | 48.1 | 0.5 | 54.4 | 0.5 | ||
| C5 H6 … C5 O5′ | 52.9 | 0.5 | 45.3 | 0.4 | ||
| A6 H8 … A6 O5′ | 58.5 | 0.5 | 44.8 | 0.4 | ||
| G17 H8 … G17 O5′ | 59.4 | 0.6 | 56.4 | 0.5 | ||
| A18 H8 … A18 O5′ | 48.2 | 0.5 | 56.9 | 0.5 | ||
| U19 H6 … U19 O5′ | 39.9 | 0.5 | 45.2 | 0.5 | ||
| U7 H6 … U7 O5′ | 63.6 | 0.6 | ||||
| C5 H2” … A6 O4′ | 36.4 | 0.4 | ||||
| C12 H6 … C12 O5′ | 54.6 | 0.5 | ||||
| G16 H8 … G16 O2P | 64.2 | 0.9 | ||||
| (C) Water-mediated hydrogen bonds | ||||||
| A2 O1P … w … U3 O1P | 54.8 | 1.4 | 46.2 | 0.9 | ||
| A2 O2P … w … U3 O1P | 58.9 | 2.7 | 35.0 | 2.4 | ||
| U3 O1P … w … C4 O1P | 46.9 | 0.8 | 51.1 | 0.8 | ||
| U3 O5′ … w … C4 O1P | 35.0 | 0.5 | 46.8 | 0.6 | ||
| C4 O5′ … w … C5 O1P | 46.0 | 0.6 | 45.3 | 0.7 | ||
| C5 O5′ … w … A6 O1P | 56.4 | 0.7 | 41.6 | 0.6 | ||
| C5 O2P … w … A6 O1P | 44.1 | 1.5 | 48.4 | 1.8 | ||
| A6 O2P … w … U7 O1P | 78.8 | 5.6 | 86.4 | 5.6 | ||
| U7 O2P … w … U8 O1P | 64.2 | 2.6 | 55.2 | 4.0 | ||
| U8 O2P … w … G9 O2P | 48.5 | 4.6 | 38.1 | 4.1 | ||
| G9 O1P … w … C10 O1P | 61.4 | 1.9 | 47.2 | 1.3 | ||
| A11 O5′ … w … C12 O1P | 60.3 | 1.4 | 63.8 | 0.8 | ||
| C12 O2P … w … U13 O2′ | 34.4 | 1.7 | 77.7 | 5.4 | ||
| G16 N7 … w … G17 N7 | 49.1 | 1.2 | 45.5 | 0.8 | ||
| G17 N7 … w … A18 H62 | 42.0 | 0.6 | 50.4 | 0.7 | ||
| G17 O2P … w … A18 O1P | 49.0 | 2.4 | 56.8 | 2.5 | ||
| A18 O2P … w … U19 O1P | 52.1 | 2.9 | 55.0 | 2.2 | ||
| U3 O2P … w … C4 O1P | 41.3 | 1.7 | ||||
| C5 O1P … w … A6 O1P | 46.8 | 0.6 | ||||
| G9 O1P … w … C10 H42 | 44.8 | 0.8 | ||||
| G9 H22 … w … C10 O4′ | 38.3 | 0.6 | ||||
| C10 O2P … w … C12 O1P | 50.6 | 3.1 | ||||
| C10 O2P … w … C12 O2P | 40.3 | 3.5 | ||||
| C12 O1P … w … U13 O2′ | 43.7 | 3.1 | ||||
| C12 O2P … w … C14 O1P | 82.9 | 5.1 | ||||
| C15 O1P … w … G16 O1P | 45.6 | 6.2 | ||||
| A6 O1P … w … U7 O1P | 35.2 | 0.8 | ||||
| U8 O3′ … w … G9 N7 | 39.0 | 0.8 | ||||
| C10 O2 … w … A11 N7 | 74.2 | 1.0 | ||||
| C10 O2P … w … A11 O2P | 87.3 | 8.3 | ||||
| A11 N3 … w … A11 O2′ | 40.6 | 0.6 | ||||
| C12 O1P … w … U13 O2P | 52.5 | 1.4 | ||||
| C12 O2P … w … U13 O2P | 55.4 | 1.9 | ||||
| A18 O5′ … w … U19 O1P | 34.4 | 0.6 | ||||
| U19 O2P … w … U20 O1P | 49.4 | 1.8 | ||||
| U20 O2P … w … U21 O4′ | 35.8 | 1.3 | ||||
| Time cutoff 0.1ps, resolution 0.2ps. |
In contrast to these, more hydrogen bonds were present within the RNA in the complex. Almost all of the hydrogen-bonding interactions between the five base pairs were 100% occupied, including the first one, A1U20. Only the occupancy of hydrogen bond between U3 O4 and A18 H61 was slightly lower. In addition to the three hydrogen bonds between the 2′-OH group and the O3′ atom of the same nucleotide present in the free state, four more such interactions appeared in the three loop nucleotides C12, U13, and C15, and in U19, with about one-third occupancy. Another hydrogen bond was found to connect the bases of U7 and G9. The hydrogen bond connecting the backbone of G12 and U13 in the free state disappeared in the complex. Besides these hydrogen bonds within the RNA, a large number of hydrogen bonds formed between the RNA and the U1A protein have been described above.
In contrast to the normal O(N)-H···O(N) hydrogen-bonding interactions, a number of unusual C-H···O hydrogen bonds (Wahl and Sundaralingam, 1997) were found in the RNA hairpin (Table 4B). Most of these hydrogen bonds were present between the base H6 or H8atom and the backbone O5′ atom within the same nucleotide with a C3′-endo sugar pucker. Most of these hydrogen bonds were formed in the double-helix stem, such as A2, U3, C4, C5, G17, A18, and U19, both in the free state and in complex, with about half occupancy and a very short average lifetime. In the complex, G16 had a C-H···O hydrogen bond between the H8atom and O2P instead of the O5′ atom of the backbone, with higher occupancy than the others, showing that the phosphate group of G16 was a little different from the others in the stem. Only a few C-H···O hydrogen bonds were present in the loop nucleotides, except A6 and U7 in the free conformation and A6 in the complex, which were extensions of the double-helix stem. Nevertheless, a C-H···O hydrogen bond was found in C12 of the loop in the complex. Another strange C-H···O hydrogen bond was found to connect the backbones of C5 and A6 in the complex, with about one-third occupancy.
The water-mediated hydrogen bonds were also analyzed from the production run (Table 4C). From Table 4C, we see that in both conformations most of the adjacent phosphate groups were bridged to form hydrogen bonds through water molecules with about half occupancy, especially in the double-helix stem. The 2′-OH group of U13 and the phosphate group of C12, the base of G17 and bases of G16 and A18 formed water-mediated hydrogen bonds, but no bridging water was seen between the backbones of G16 and G17. Nevertheless, differences were obvious between the two states, particularly in the loop region. In the free state, the backbone and base of G9 formed water-mediated hydrogen bonds with the base and backbone of C10, respectively. The phosphate groups of C10 and C12, and C12 and C14 also formed water-mediated hydrogen-bonding interactions. However, these interactions disappeared in the complex, and instead the sugar of U8 and base of G9, bases of C10 and A11, phosphate groups of C10 and A11, C12 and U13 formed water-mediated hydrogen bonds. No water-mediated C-H···O hydrogen bonds were found during the simulations.
Besides the hydrogen-bonding interactions, there were also some base stacking interactions within the RNA hairpin. In addition to the base stacking interactions in the double-helix stem, the bases of A6 and U7, G9 and C10, A11 and C12, and U13 and C14 were stacked separately on each other in pairs in the free state. In the complex, this situation was changed. A6 and U7 kept the stacking interaction together with the stem, the bases of C10, A11, and C12 underwent partially stacking interactions, and U13, C14, and C15 were also partially stacked. More importantly, the bases of G9CAC12 were buried in the hydrophobic core of the protein and formed stacking interactions with the protein separately.
The distributions of water molecules and counterions around the complex during the simulation are shown in Figure 9A, according to the probability density map. Figure 9A illustrates that most of the water molecules with large probability were distributed uniformly around the folded domain of the protein and part of the loop of the RNA, whereas the water molecules around the two terminals of the protein and the RNA double-helix stem were more randomly distributed with low probability density. During the simulation the most probable positions of the sodium ions were close to the major groove of the RNA stem and outside of the protein C-terminal helix. The former region was the negative electrostatic region, and the latter one had two negative residues, Asp90 and Asp92.
For the free RNA hairpin, the sodium counterions were initially placed 3.6Å from the phosphorous atoms. In the course of time, the counterions gradually moved away from the RNA atoms. In particular, four ions were detained in the major groove of the hairpin stem and stayed within 6.0Å of the phosphate groups, as seen from the most probable distribution region of the ions shown in Figure 9B. The probability densities of the counterions were higher (up to 45%) in the free state than in the complex (<30%). Figure 9B also shows the probability density map of water molecules around the free RNA, in which water molecules with a probability of more than 40% were distributed around the hairpin stem.
In both states of the RNA hairpin, most of the hydrogen-bonding donors and acceptors formed hydrogen bonds with water molecules during the simulations, indicating the high degree of hydration of the RNA hairpin. The water-mediated hydrogen-bonding interactions (listed in Table 4C) were only a part of the hydration. From the production runs, we found that almost all of the phosphate group oxygen atoms were hydrated with three water molecules in both states, except that in the complex O1P of C15 and both O1P and O2P of G16 formed no more than two hydrogen bonds to water molecules. For the sugar and base atoms, the hydration degrees are listed in Table 5, in which almost all atoms in the free RNA have at least one hydrogen bond to water 50% of the time. Apart from a few differences, most of the atoms in the double-helix stem had very similar hydrations in both states. All O2′ atoms had occupancies of ∼140%; that is, they were hydrated by about one or two water molecules. On the contrary, for the hairpin loop atoms, the bases of G9, C10, A11, and C12 had high occupancy for the free RNA, but low occupancy in the complex, because of the hydrophobic interactions with the protein.
| Table 5 Hydrogen-bonding occupancy (≥50%) with water for each donor or acceptor atom of RNA sugar and base |
| Occupancy | Occupancy | Occupancy | |||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Nucleotide | Atom | Free | Complex | Nucleotide | Atom | Free | Complex | Nucleotide | Atom | Free | Complex | ||
| A1 | O5′ | 146.3 | 79.6 | G9 | O2′ | 129.4 | 114.8 | C15 | O4′ | 99.9 | 54.4 | ||
| O2′ | 134.7 | 36.8 | N3 | 81.2 | 70.4 | O2′ | 145.8 | 140.1 | |||||
| H2′ | 52.4 | 54.4 | H1 | 92.6 | 89.3 | O2 | 205.9 | 200.6 | |||||
| N7 | 92.9 | 93.9 | H21 | 84.6 | N3 | 113.8 | 115.8 | ||||||
| N1 | 77.1 | H22 | 85.4 | H41 | 66.6 | 66.8 | |||||||
| N3 | 76.7 | 72.6 | O6 | 141.3 | 64.0 | H42 | 63.2 | ||||||
| H62 | 79.1 | 80.2 | N7 | 171.7 | 91.0 | G16 | O5′ | 68.7 | 84.0 | ||||
| A2 | O4′ | 52.3 | C10 | O4′ | 66.5 | O4′ | 62.7 | ||||||
| O2′ | 132.3 | 136.6 | O2′ | 148.2 | 51.0 | O2′ | 143.1 | 140.4 | |||||
| N7 | 64.4 | 72.3 | H2′ | 63.9 | H2′ | 53.8 | 56.1 | ||||||
| N3 | 84.1 | 78.0 | O2 | 188.3 | 95.8 | N3 | 77.1 | 79.8 | |||||
| H62 | 77.5 | 84.2 | N3 | 98.9 | H22 | 79.9 | 79.6 | ||||||
| U3 | O4′ | 58.9 | 54.6 | H41 | 69.5 | O6 | 91.1 | 73.8 | |||||