Article Outline

Article Information

PubMed

Related Articles

  • …more

Copyright © 1999 The Biophysical Society. All rights reserved.
Biophysical Journal, Volume 77, Issue 3, 1284-1305, 1 September 1999

doi:10.1016/S0006-3495(99)76979-1

Biophysical Theory and Modeling

Molecular Dynamics Simulations of the Complex between Human U1A Protein and Hairpin II of U1 Small Nuclear RNA and of Free RNA in Solution

Yun Tang and Lennart NilssonGo To Corresponding Author 

Center for Structural Biochemistry, Department of Bioscience at Novum, Karolinska Institutet, S-141 57 Huddinge, Sweden

Address reprint requests to Dr. Lennart Nilsson, Center for Structural Biochemistry, Department of Bioscience at NOVUM, Karolinska Institutet, S-141 57 Huddinge, Sweden. Tel.: +46-8-6089228; Fax: +46-8-6089290.

Abstract

RNA-protein interactions are essential to a wide range of biological processes. In this paper, a 0.6-ns molecular dynamics simulation of the sequence-specific interaction of human U1A protein with hairpin II of U1 snRNA in solution, together with a 1.2-ns simulation of the free RNA hairpin, is reported. Compared to the findings in the x-ray structure of the complex, most of the interactions remained stable. The nucleotide U8, one of the seven conserved nucleotides AUUGCAC in the loop region, was unusually flexible during the simulation, leading to a loss of direct contacts with the protein, in contrast to the situation in the x-ray structure. Instead the sugar-phosphate backbone of nucleotide C15 was found to form several interactions with the protein. Compared to the NMR structure of U1A protein complexed with the 3′-untranslated region of its own pre-mRNA, the protein core kept the same conformation, and in the two RNA molecules the conserved AUUGCAC of the loop and the closest CG base pair were located in very similar positions and orientations, and underwent very similar interactions with the protein. Therefore, a common sequence-specific interaction mechanism was suggested for the two RNA substrates to bind to the U1A protein. Conformational analysis of the RNA hairpin showed that the conformational changes of the RNA primarily occurred in the loop region, which is just involved in the sites of binding to the protein and in agreement with experimental observation. Both the loop and stem of the RNA became more ordered upon binding to the protein. It was also demonstrated that the molecular dynamics method could be successfully used to simulate the dynamical behavior of a large RNA-protein complex in aqueous solution, thus opening a path for the exploration of the complex biological processes involving RNA at a molecular level.

Introduction

RNA molecules play a central role in a wide range of biological processes, such as storage of genetic material, propagation of genetic information, protein biosynthesis, and enzymatic activity. The large number of RNA structures that have recently been elucidated at high resolution by x-ray crystallography (Holbrook and Kim, 1997) and NMR spectroscopy (Ramos et al) have advanced our understanding of RNA structure and function (Uhlenbeck et al). Usually, RNA molecules perform their functions in tight association with RNA-binding proteins rather than on their own, so RNA-protein interactions are essential to these biological processes involving RNA. As a number of RNA-protein complex structures have been determined (Nagai, 1996,Arnez and Cavarelli, 1997), it is possible to investigate the specific features of RNA-protein interactions at the molecular level. For example, protein-induced RNA conformational changes are common and substantial in RNA-protein interactions, which resemble protein-protein interactions rather than DNA-protein interactions (Draper, 1995,Varani, 1997,Frankel and Smith, 1998).

The most common structural motif in RNA-binding proteins is the ribonucleoprotein (RNP) motif, found in more than 200 proteins (Varani and Nagai, 1998). An RNP motif is composed of 90–100 amino acids, which form an RNA-binding domain (RBD) that is present in one or more copies in proteins to bind pre-mRNA, mRNA, pre-ribosomal RNA (rRNA), and small nuclear RNAs (snRNAs) (Burd and Dreyfuss, 1994,Nagai et al). These small RNP domains share a common βαββαβ sandwich tertiary folding structure (Nagai et al) and are highly conserved, although they bind diverse RNA targets with different affinities and specificities, ranging from picomolar to micromolar. The two central β strands (β3 and β1) of the folded domain, named RNP1 and RNP2, respectively, are identified as the RNA binding sites.

The human U1A protein is a 282-amino acid protein associated with the U1 snRNP (small nuclear RNP), a large RNA-protein complex involved in pre-mRNA splicing (Lührmann et al). It contains two copies of the RNP motif, one at the N-terminus and one at the C-terminus, which are connected by a protease-sensitive polypeptide of ∼100 residues (Sillekens et al). The 102-amino acid N-terminal RNP motif binds specifically to hairpin II of U1 snRNA (Lutz-Freyermuth et al), with binding sites exclusively located in the loop region of the hairpin, whereas the C-terminal RNP does not appear to associate with any RNA (Lu and Hall, 1995). It has been found that U1A protein also binds to the 3′-untranslated region (3′UTR) of its own pre-mRNA, where it can inhibit polyadenylation at the 3′ end and regulate its own translation (Boelens et al). The U1A protein can bind to the two distinct RNA targets with very high affinity and specificity (both with a subnanomolar dissociation constant). The two RNA molecules contain the same AUUGCAC heptanucleotide in the loop region (see Fig. 1), which is believed to be the main site of interaction with the protein. A common recognition strategy has thus been suggested for the two different RNA molecules to bind to the same protein (Jovine et al).

Display large version of this figure
Figure 1
(A) Schematic representation of the x-ray structure of the U1A-RNA hairpin complex. The U1A protein is shown in ribbon, and the RNA is shown with a ball-and-stick model. This figure, together with Figure 5ABBC and Figure 7ABBC and Figure 8ABBC and Figure 9ABBC and Figure 10ABBC and Figure 11ABBC, was generated with the program MOLSCRIPT (Kraulis, 1991). (B) Sequence and number of related protein and RNA segments: (a) U1A protein; (b) the synthetic RNA hairpin used in the x-ray structure; (c) the hairpin II of U1 snRNA; (d) the 3′-untranslated region of U1A pre-mRNA (box 2) used in the NMR complex structure.

Recently, the structure of a complex between the human U1A protein (N-terminal RNP domain with 98 residues) and a 21-nucleotide RNA hairpin, representing the hairpin II of U1 snRNA, was elucidated by x-ray crystallography at 1.92-Å resolution (Fig. 1; Oubridge et al). Within the structure there are many direct and water-mediated hydrogen bonds and stacking interactions between the RNA bases and aromatic side chains of the protein, revealing the stereochemical basis for sequence-specific RNA recognition by the RNP domain. Another complex of the same protein domain (including 102 residues) with the 3′UTR of U1A pre-mRNA was determined structurally by NMR spectroscopy (Allain et al), which further confirms the structural basis of the RNA binding specificity of U1A protein (Allain et al) and provides us an opportunity to compare the recognition strategy of the two RNA targets. The structure of the free U1A protein (Nagai et al,Avis et al) and free 3′UTR (Gubser and Varani, 1996) has also been elucidated by x-ray crystallography and NMR spectroscopy, allowing us to investigate conformational changes during RNA-protein interactions. The two structures of U1A protein complexed with distinct RNA targets have clarified many important aspects of RNP-RNA recognition. However, they remain static views, and they have raised intriguing questions concerning the molecular basis of specificity; for example, interactions involving the protein variable loops differ significantly (Allain et al). Therefore, it is necessary to study the specific interactions of U1A protein complexed with RNA from a dynamic and thermodynamic point of view.

Hall observed that chemical shifts of many protons from the bound U1 snRNA were substantially different from those of the free RNA, especially in the loop region of the hairpin (Hall, 1994). Allain and Varani studied the free structure of the RNA hairpin, which shows that the hairpin loop is largely unstructured in free RNA apart from the first three bases in the loop, which stack on each other (unpublished results, just mentioned in Oubridge et al). However, no structural data of this hairpin in the free state are available. Therefore, it is also necessary to run a conformational analysis on the RNA hairpin before and after its binding to the protein, to investigate how much the conformation of the RNA is changed by the binding of U1A protein, which could help us to better understand how the RNA performs its function.

Molecular dynamics (MD) simulation is a powerful tool for analyzing the structural and dynamic features of biomacromolecules (Karplus and Petsko, 1990,Tidor, 1997). It provides detailed information about atomic interactions as a function of time, which can both enhance and complement the experimental results. Water molecules, which have been shown to play an important role in both the affinity and specificity of protein-nucleic acid interactions (Schwabe, 1997), are easily incorporated into MD simulations to model the solvent effects. Besides proteins, there are an increasing number of MD simulations focusing on nucleic acids (Auffinger and Westhof, 1998), from which some conformational features of nucleic acids have been obtained (Cheatham and Kollman, 1997). For RNA, much of the simulation effort has been concentrated on UUCG tetraloop (Miller and Kollman, 1997), transfer RNA (Auffinger and Westhof, 1997,Auffinger et al), and hammerhead ribozyme (Hermann et al). Several MD simulations to date have also been performed on DNA-protein complexes (Eriksson et al,Tang and Nilsson, 1998,Nilsson, 1998). However, MD studies of RNA-protein interactions are still at an early stage (Reyes and Kollman, 1999). This is due, in part, to the fact that the structures of only a handful of RNA-protein complexes have been determined at high resolution, and the available RNA-protein complexes are often large and the RNA substrates are structurally more diverse than the DNA (Weeks, 1997), which makes RNA-protein interactions less tractable by simulation.

Pressure and temperature are thermodynamic variables, which may affect the simulation results. In biophysical chemistry, pressure has long been used as an environmental variable for probing the interactions of proteins with ligands and to study conformational equilibria, protein dynamics, and other properties of the native state of proteins. To mimic experimental conditions, the constant pressure and temperature (NPT) supercanonical ensemble (Kitchen et al,Ceccarelli and Marchi, 1997) was used in the present work. We performed an NPT MD simulation on the U1A protein complexed with the hairpin II of U1 snRNA and the free RNA hairpin in solution. The aim of this study is to investigate the dynamic features of the specific U1A-RNA interaction, to recognize the conformational changes of the RNA before and after binding to the protein, and to explore the structural basis of U1A protein sequence-specific binding to the RNA substrate.


Methods

The solvated constant-pressure molecular dynamics simulations were performed on a DEC AXP 4100/300E 4CPU parallel computer and a four-node α cluster (∼13h per 10ps for the complex and 6h per 10ps for the free RNA), using the program CHARMM (Brooks et al), version c26a2, and the all-atom version 22 force field (MacKerell et al,MacKerell et al). The TIP3P water model was used to simulate the solvent (Jorgensen et al).

Simulation details

The starting coordinates of the U1A-RNA complex were extracted from the B/Q monomer in the x-ray trimer structure at 1.92-Å resolution (Oubridge et al), PDB entry code 1URN (Bernstein et al), including 769 protein heavy atoms, 436 RNA heavy atoms, and 157 water oxygen atoms. In the x-ray structure, the C-terminus of U1A was poorly ordered beyond residue 96, and residues Met1 and Lys98 were omitted. Two mutations, Tyr31→His and Gln36→Arg, which were not directly involved in RNA recognition, were contained in the protein structure (see Fig. 1; Oubridge et al). Bases U13, C14, C15 and the 3′ end of the RNA chain were also poorly ordered. All of the hydrogen atoms were added by the CHARMM subroutine HBUILD (Brünger and Karplus, 1988). The complex structure together with the 157 crystal water molecules was minimized in vacuo for 1000 steps, using the adopted basis Newton-Raphson (ABNR) method, to remove the unfavorable contacts, keeping harmonic constraints with a force constant of 20.0kcal/(mol·Å2) on heavy atoms of the complex.

The minimized complex structure, together with the 157 solvent water molecules, was then inserted into the center of a water box, keeping any complex atoms at least 10.0Å away from the boundary and leading to a box size of 84×56×52Å3. Water molecules closer than 1.8Å from any solute or crystallographic water atoms were deleted from the water box. Thirteen sodium counterions were added at random positions into the system, at least 3.0Å from any complex atoms, to make the system electroneutral. The final system contained 24,805 atoms, including the 2250 solute atoms. The solvated U1A-RNA complex system was minimized again, first with the steepest descent method for 100 steps, then with the ABNR method for 1000 steps, to adjust the water molecules and counterions locally and eliminate any residual geometrical strain, keeping the heavy atoms of the complex fixed. The minimized solvated system was served as the initial structure of the subsequent molecular dynamics simulation.

After the initial structure was prepared, the molecular dynamics simulation was begun with an initial and equilibration stage, followed by a 400-ps production run, giving a total simulation time of 600ps. All of the complex atoms were released during the whole process. The initial atomic velocities were assigned from a Gaussian distribution corresponding to a temperature of 300K. The nonbonded energies and forces were smoothly shifted to zero at 12.0Å (Steinbach and Brooks, 1994), and a constant dielectric (ϵ=1) was used for electrostatic interactions. The nonbonded list, including neighboring atoms within a 14.0-Å distance, was updated every 20 steps. All hydrogens were treated explicitly; a time step of 2fs was used to integrate the equations of motion with the Leapfrog Verlet algorithm. All bonds involving hydrogens were constrained with the SHAKE algorithm (Ryckaert et al). Coordinates and energies were saved every 100 time steps for further analysis.

In the energy minimization and the molecular dynamics simulation, periodic boundary conditions were applied in all directions. The solvent and counterion images were updated every 20 steps. The NPT ensemble was implemented, using the weak coupling scheme (Berendsen et al), with a pressure coupling time of 5.0ps and a temperature coupling time of 5.0ps. The system temperature was set at 300K, and the reference pressure of the system was set at 1.0atm. The isothermal compressibility was set at 4.63×10−5atm−1, and the value was approximated from experimental data for water.

The starting coordinates of the free RNA hairpin were extracted from the complex, followed by the same pretreatment and molecular dynamics process as mentioned above, with a water box of 64×46×40Å3. Twenty sodium counterions were placed 3.6Å from the phosphorous atom on the OPO bisectors. The whole system contained 11,800 atoms. The molecular dynamics simulation consisted of a 0.6-ns initial stage followed by a 0.6-ns production run, giving a total simulation ime of 1.2ns.


Analysis of the simulation

The analyses of the simulations focused on the production stages. Four structures averaged from different short periods of the production stage of the complex, together with the average structure from the whole production run, were used in the analysis of the complex. All of these structures were minimized with the ABNR method for 500 steps with a 20.0kcal/(mol·Å2) harmonic force constant on the heavy atoms, and the water molecules and counterions were deleted. The average conformations of the RNA hairpin free and in complex created from the corresponding production stage, with a similar treatment, were used in the analysis of RNA.

The root mean square deviations (RMSDs) of the selected atoms were calculated from the trajectory at 0.2-ps intervals, with the initial structure as the reference. Average RMSD values of the backbone and the side chains or bases were then calculated for each residue or nucleotide. The root mean square fluctuation (RMSF) of each residue was calculated similarly.

The protein-RNA, protein-solvent, and RNA-solvent nonbonded interaction energies were calculated from the trajectory at 1.0-ps intervals, using CHARMM force field.

The electrostatic potentials of the protein and RNA were calculated from the average dynamics structure of the complex and shown on the solvent-accessible surface, using the program GRASP (Nicholls, 1992).

The solvent-accessible surface areas were calculated from the trajectory at 6.0-ps intervals, using the definition of Lee and Richards, 1971, with a probe of radius 1.4Å. The Lennard-Jones radii values were used for the complex atoms.

The hydrogen bonds were analyzed from the production trajectories with 0.2-ps and 1.0-ps time resolutions. A hydrogen bond (A···H−D) was defined by an A-H distance of less than 2.5Å and an A-H-D angle of more than 120°. The percent occupancy of a hydrogen bond was defined as the number of frames with the hydrogen bond present divided by the number of total analysis frames. The lifetime of a specific incarnation of a hydrogen bond was calculated as the time elapsed from its first appearance until it was first broken. The average lifetime of a hydrogen bond during the simulation was then calculated as the average of all of its incarnations, with the option of excluding those with a lifetime shorter than a time cutoff of 0.1, 1.0, and 5.0ps, separately. The water-mediated hydrogen bonds were analyzed similarly. A water bridge occurred between the protein and RNA atoms, which formed hydrogen bonds with a common water molecule.

The most probable positions for the water molecules and sodium ions around the solute were calculated from the production trajectories at 1.0-ps intervals, by the use of a three-dimensional histogram with a (2Å)3 bin size, by orienting every coordinate set to be superimposed on the average structure of the solute. Only probabilities larger than 40% for the water and 10% for the sodium were considered in the density map.



Results

The potential energy of the system, the temperature, and the RMSD values of the protein and RNA atoms from the initial x-ray structure remained quite stable during the production run (shown in Fig. 2), except that the RMSD of the protein underwent a small increase at 450–500ps. The global RMSD of the complex was below 2.5Å after a 600-ps simulation.

Display large version of this figure
Figure 2
Time evolution of the potential energy of the system (solid line) and temperature (dotted line) of (A) the complex and (C) the free RNA. Time evolution of the root mean square (RMS) deviations of (B) U1A protein (solid line) and RNA hairpin (dotted line) from the initial structure, and (D) the RNA free (solid line) and in complex (dotted line).

Interaction energies and solvent-accessible surface

The U1A protein–RNA hairpin nonbonded interaction energies were stable in the simulation (Figure 3A). The van der Waals interaction energy was constant, at about −90kcal/mol. The major component of the total interaction energy was the electrostatic (including hydrogen bonding) interaction energy at about −310kcal/mol, with some fluctuation. In particular, there was a decrease at 450–500ps, indicating that the electrostatic interaction was more favorable during this period. These interaction energies did not include the effects of the solvent.

Display large version of this figure
Figure 3
(A) Time-dependent RNA-protein nonbonded interaction energies. Solid line: total energy; dashed line: van der Waals energy; dotted line: electrostatic energy. (B) Time-dependent solvent-accessible surface areas of the complex. Thick solid line: whole complex; dashed line: protein; dotted line: RNA; thin solid line: buried area.

The origins of the favorable electrostatic interaction energies are clearly illustrated by the electrostatic potential calculated from the average structure of the complex. A strongly positive electrostatic potential region was found on the solvent-accessible surfaces of loops 1 and 3 of the U1A protein, whereas the major groove surface of the RNA double-helical stem was a strongly negative potential region (Fig. 4). The positive and negative potential regions faced each other and matched very well.

Display large version of this figure
Figure 4
Electrostatic potentials of (A) the U1A protein and (B) the RNA hairpin are shown on the solvent-accessible surface, produced with the program GRASP (Nicholls, 1992). The surface charge distribution is color coded as follows: dark blue, positive (∼8.5kT); red, negative (∼−6.5kT); white, neutral. k is the Boltzmann constant and T is the temperature.

The solvent-accessible surface area of the complex increased slightly with time (Figure 3B). In the first 250ps of the production run (200–450ps), the surface area of the complex was constant, but after this it increased a little and remained constant again after 500ps, at a 300Å2 higher level. The increased surface area of the complex came from the protein, and the exposed area of the RNA was constant. However, the buried surface area remained constant, at ∼1000Å2 for each part, during the simulation. The buried surface area was ∼17% for the protein and ∼22% for the RNA.


Deviations and fluctuations

From the results above, it seemed that something had happened on the U1A protein at 450–500ps. Average structures from four short windows, 270–300, 370–400, 470–500, and 570–600ps, together with the x-ray structure, were superimposed to investigate the changes (Fig. 5). Compared to the x-ray structure of the complex, most of the folded domain of the protein (α-carbon atoms only) overlapped very well, except that the N- and C-terminals were in severe disorder. In particular, there were structural deviations in loop 3 and the end of helix B for the second structure; in loops 1 and 3 and helix B for the third structure; and in loop 1 and the beginning of helix B for the last structure. The RNA hairpin (backbone only) appeared flexible in the loop from U8 to C15, but highly stable in the double-helix stem and A6, U7.

Display large version of this figure
Figure 5
Stereo view of the superposition of four structures of (A) U1A protein and (B) RNA hairpin averaged from 270–300ps (red), 370–400ps (green), 470–500ps (blue), and 570–600ps (purple), together with the x-ray structure (black). (C) The backbone RMS deviations of the four average protein structures, compared with the x-ray structure.

The detailed local motions of the complex structure during the final simulation were reflected quantitatively from the RMSD and RMSF values of each residue in the complex. Fig. 6 shows a uniform mobility for most of the residues in the U1A protein in the production stage, with ∼0.5Å for the RMSD and RMSF values for the backbone atoms, except for the residues at the two termini, especially the N-terminal. The backbone of residues Glu19 to Lys22 of loop 1 had higher RMSD values close to 1.0Å, whereas the RMSF values of these residues remained stable, indicating that these four residues changed their conformations during the simulation. The backbone of residues Ser46 to Lys50 of loop 3 had RMSD values a little higher than the average, but RMSF values very little higher than the average, which demonstrated that these residues had also changed their conformations somewhat. Notably, the long positively charged side chain of Arg52 remained highly stable with very low RMSD and RMSF values, indicative of the important structural role played by Arg52 in the RNA-protein interaction. Interestingly, residues Leu69 and Arg70 of helix B, which had no direct contacts with RNA, also had higher than average RMSD values. The side chain of Arg70 in particular was highly flexible and deviated from the original position.

Display large version of this figure
Figure 6
The RMS deviations of (A) each residue in the U1A protein and (C) each nucleotide in the RNA hairpin, as well as the RMS fluctuations of (B) each residue in the U1A protein and (D) each nucleotide in the RNA hairpin from the production run, compared with the initial structure of the complex. ——, Backbone; ·····, side chains or bases. —●—, backbone of free RNA; ···○···, bases of free RNA.

Figure 6CD, shows that the sugar-phosphate backbone of nucleotides U13 to G16 had RMSD values of ≥1.5Å in the complex, whereas the backbone of nucleotides U8, U13, C14, and U15 had RMSF values of ≥1.0Å, except for the two ends. The bases of U8, U13, and C15 had unusually high RMSD and RMSF values, besides the 3′-end nucleotide U21. C14 also had a high RMSF value, but this was weaker than that of U8, U13, and C15. Contrasted to the high mobility of U8, nucleotides A6, U7, and G9CAC12 in the loop region were quite stable during the simulation, the backbone as well as the bases, which is very similar to the nucleotides within the double-helix stem.

Compared to the RNA in complex, the free RNA hairpin had a quite different conformation for the loop region, from U8 to C15, indicated by the RMSD values much higher than 2.0Å and RMSF higher than 1.5Å (Figure 6CD). In most of these nucleotides, the bases were deviating and fluctuating more than the backbone, especially U13 and C14, the bases of which deviated from the starting structure by more than 14Å. The double-helix stem and the first two nucleotides of the loop remained stable, with RMSDs less than 1.0Å. Nucleotides A1, A2, G16, and U20 had RMSF values greater than 1.5Å for the backbone.


Overall structure of the complex

The overall structure of the complex was very similar to the initial x-ray structure, with RMSD values of 0.92Å and 1.68Å for the backbone and side chains of the U1A protein, and 1.35Å and 2.38Å for the backbone and bases of the RNA hairpin, respectively. For the U1A protein in the average structure, the four β-sheets were in an antiparallel arrangement on the surface. Hydrophobic residues in these sheets were packed on each other and formed a hydrophobic core, which linked the four β-sheets together (Figure 7A). Among these residues, only Tyr13, Leu44, and Phe56 were exposed to bind to the RNA. Therefore, the four β-sheets, together with helices A and B, were quite stable and almost completely overlapped with those in the x-ray structure. Loop 3 between β2 and β3 of the protein protruded from the surface and was confronted with the N- and C-terminals. There was a large hydrophilic and charged region on the outside surface of loop 3, loop 1, and part of helix A. Together with loop 5 between β4 and helix C, which is also highly polar, these three polar loops were involved in extensive contacts with the RNA hairpin (Figure 7A).

Display large version of this figure
Figure 7
Interactions of the U1A protein with the RNA hairpin. (A) An overall stereo view of the complex. The hydrophobic core is shown in red, the polar loops 1 and 3 in green, and the polar loop 5 in blue. (B and C) Detailed interaction stereoviews. Residues in the U1A protein are green, nucleotides in the RNA red, hydrogen bonds blue, and the purple spheres represent water molecules.

The RNA hairpin is attached to the surface of the protein in an open conformation. The 10-nucleotide loop of the RNA hairpin encircles loop 3 of the protein and falls into the hydrophobic saddle region between loop 3 and the C-terminal of the protein (Figure 7A). The seven conserved nucleotides, A6UUGCAC12, were splayed out on the protein surface. The bases of A6 and U7 were stacked with the bases of the double-helix stem. They and the base of G9 were located in the inside of the loop, whereas the bases of U8, C10, A11, and C12 were on the outside of the loop. The base plane of C10 was approximately perpendicular to that of A11, whereas the bases of A11 and C12 were stacked on each other. The bases of G9CAC12 were buried in the protein hydrophobic core, but U8 had no obvious contacts with the protein, which was different from the situation in the x-ray structure, although the related three residues Asn16, Lys80, and Arg83 remained stable during the simulation. The last three nucleotides in the loop, UCC, had no obvious contacts with the protein, with bases extending into the solution and the backbone pointing toward the protein. Their bases were partially stacked on each other. The double-helical stem of the hairpin faced toward the protein through the major groove but had no obvious contacts with the surface of the protein. The 5′ end of the stem was very close to several positively charged residues of the protein, such as Lys20, Lys22, and Arg47.


Contacts at the RNA-protein interface

There are several nucleotides contacting with U1A protein at the RNA-protein interface, such as A6, U7, G9, C10, A11, C12, and G16. The residues in U1A protein contacting RNA bases included Tyr13, Asn15, Glu19, Leu44, Arg52, Gln54, Phe56, Gln85, Tyr86, Lys88, Thr89, Asp90, Ser91, and Asp92. These nucleotides and amino acid residues form a variety of electrostatic, hydrophobic, and hydrogen-bonding interactions at their interface.

From Fig. 4 we can see that the positive electrostatic potential on the outside of the U1A protein loops 1 and 3 was just opposed to the negative electrostatic potential on the major groove surface of the RNA hairpin. In the positively charged region of the protein, residues Lys20 and Lys22 formed electrostatic interactions with the phosphate groups of nucleotides A2, U3, and C4 of the RNA (Table 1). Arg47 also formed some electrostatic interactions with these nucleotides, but with low occupancy, and so is not listed in Table 1. The phosphate groups of C10 and G16 formed electrostatic interactions with the side chains of Lys88 and Arg52, respectively. Moreover, we found an electrostatic interaction between the phosphate group of C15 and the side chain of Lys23, which was not observed in the x-ray structure.

The bases of G9, C10, A11, and C12, four of the seven conserved nucleotides in the loop region, were buried in the hydrophobic core (Fig. 7), so a wide range of hydrophobic contacts was found between these nucleotides and the protein (Table 2). The side chain of Gln54 was parallel to the base plane of G9 and underwent a hydrophobic interaction. The aromatic plane of Tyr13 and the long side chain of Lys88 were parallel to the base of C10 on both sides and underwent stacking interactions with the base plane of C10. The aromatic plane of Phe56 was stacked with the base plane of A11. The CD2atom of Leu44 underwent a hydrophobic interaction with the C2atom of A11. One side of the base of C12 was stacked with the base of A11, and on the other side, it had hydrophobic contacts with the side chain of Asp92. In contrast to the x-ray structure, the base of C12 did not undergo stacking interactions with the side chain of Asp92. The sugar rings of G9, C10, and A11 also made partial hydrophobic contacts with Gln54 and Phe56, respectively. The side chain of Leu49 underwent several hydrophobic interactions with both the sugar and base of G16, and it interacted with the base of G6. In addition, the CB atom of Ser48 underwent hydrophobic interactions with the sugar backbone of C15. This was the second interaction related to C15.

Besides hydrophobic contacts, the four buried nucleotides G9CAC12 also formed hydrogen-bonding interactions with the U1A protein (Fig. 7).

A number of direct and water-mediated hydrogen bonds were monitored at the RNA-protein interface during the simulation (Table 3). These hydrogen bonds were formed not only between the backbone of RNA and the backbone of protein, but also between backbone and side chain, bases and backbone, and bases and side chain of RNA and protein, respectively. Eight of the direct hydrogen bonds and five water-mediated ones were formed between RNA bases and protein side chains. In particular, the side chain of residue Glu19 formed three direct and four water-mediated hydrogen bonds with the bases of A6, U7, and G9, which indicated that Glu19 was important to the specificity. Direct hydrogen bonds between atoms A6 N1 and Arg52 HH12, U7 H3 and Glu19 OE1, G9 O6 and Asn16 HN, C10 H41 and Tyr86 O, A11 N1 and Ser91 HG1, C12 H41 and Asp90 O, and G16 N7 and Arg52 HH11 had high occupancies (more than 80%) and were also found in the x-ray structure.

Eleven of the 15 direct hydrogen bonds were consistent with those observed in the x-ray structure. All of these hydrogen bonds were present more than half of the production run time. Four of these with more than 80% occupancy formed between bases of RNA and side chain of protein, namely A6 N1 and Arg52 HH12, U7 H3 and Glu19 OE1, A11 N1 and Ser91 HG1, and G16 N7 and Arg52 HH11, indicating that these four hydrogen-bonding interactions were critical for the specific RNA-protein interaction. Among the 12 water-mediated hydrogen bonds, in which three bridged more than two residues, only two were present in the x-ray structure, possibly because the water-mediated hydrogen bonds were less stable than the direct ones.

In the x-ray structure, it was found that most of the hydrogen bond donors and acceptors of G9CAC12 of the loop were involved in hydrogen bonding interactions with the protein residues. However, in the dynamics simulation some of these hydrogen bonds were present for less than one-third of the simulation, so they were not listed in Table 3. Some of the direct hydrogen bonds became water-mediated. The base of U8 did not form any hydrogen bonds with the protein.

Some hydrogen bonds changed their donors or acceptors. For example, the phosphate group of G16 formed a hydrogen bond with the HN atom of Arg47 in the x-ray structure, but we found that the phosphate group of G16 formed a hydrogen bond with the HN atom of Leu49 with ∼63% occupancy.

Some new hydrogen bonds were found from the simulation. In particular, most of the water-mediated hydrogen bonds were new, some of them with more than 70% occupancy, which suggested that these water-mediated bonds contribute to the specific interactions and should not be ignored. It was noted that the phosphate group of C15 underwent two water-mediated hydrogen-bonding interactions with the side chains of Ser46 and Ser48, and the phosphate group of U13 also interacted with the side chain of Asp92 through a water bridge.


Conformation of the RNA hairpin

As indicated in Figure 6CD, the loop of the free RNA hairpin was highly flexible and disordered, which was also reflected by the superimposition of the six snapshots taken from the production simulation with an interval of 0.1ns (Figure 8A). The double-helix stem of the RNA, as well as the first two nucleotides in the loop, was superimposed very well. Only the sugar-phosphate backbone of nucleotides A1, U7, G16, and U20 exhibited some fluctuation.

Display large version of this figure
Figure 8
Stereo view of superimposition of (A) six snapshots of the free RNA hairpin, taken from the production run with an interval of 0.1ns, colored from red to blue as time evolution. (B) The two conformations of the RNA free (red) and in complex (green), as well as a canonical A-RNA double helix (blue) with the same sequence as the hairpin stem.

Superimposing the average conformations of the free and bound RNA (Figure 8B) shows that the double-helix stem of the RNA hairpin in the two conformations overlapped quite well. The RMS deviation was just 0.36Å for all the atoms of the 10 nucleotides in the stem. Both of them were close to the canonical A-RNA conformation. However, the conformation of the stem in complex was closer to the canonical A-RNA than that in free RNA, in which the terminal base pair A1U20 was more mobile. The significant differences between the two conformations were located in the loop region, from U8 to C15, as well as the 3′-end U21. Among the 10 loop nucleotides, A6, U7, G9, C10, U13, and C14 were located on the inside of the loop, whereas U8, A11, C12, and C15 were on the outside, so the space enclosed by the loop was small. However, the space enclosed by the loop was enlarged in the complex, in which only A6, U7, and G9 remained on the inside of the loop. Nucleotides U13 and C14 had the most deviations in the complex from the free conformation of the RNA.

In the x-ray structure of the U1A-RNA hairpin complex, only nucleotides U8, C10, and U13 were found to have C2′-endo pucker (U13 was not mentioned in Oubridge et al), and the other ones had C3′-endo pucker. However, this situation changed during the simulations. Besides nucleotides U8, C10, and U13 with C2′-endo pucker in the complex, the sugar puckers of U7 and U21 changed from C3′-endo to C2′-endo spontaneously. The other nucleotides maintained a C3′-endo pucker during the simulation. In the free state, nucleotides C12, C14, C15, and U21 changed their ribose pucker from C3′-endo to C2′-endo spontaneously in the course of the simulation, in addition to U8, C10, and U13 keeping their C2′-endo puckers. However, U7 in the unbound state kept its C3′-endo conformation. Among the 10 nucleotides in the loop region, interestingly, only the three purines, namely A6, G9, and A11, maintained the C3′-endo pucker both in the free state and in the complex, whereas the pyrimidines preferred C2′-endo pucker to C3′-endo.


Interactions maintaining the RNA conformation

Table 4 lists all of the hydrogen-bonding interactions occurring within the RNA hairpin in the two different states, calculated from the corresponding production run with 0.2-ps resolution. From Table 4A, we can see that most of the direct hydrogen bonds present in the free conformation were base pairing interactions in the double-helix stem, with four of these base pairs having almost 100% occupancy and a long average lifetime, which indicated that these base pairs were quite stable during the simulation. However, the ending base pair A1U20 only had about half occupancy for the two hydrogen bonds, indicating that this base pair was less well defined during the simulation. Among the four hydrogen bonds involving the 2′-OH group, which were found only in the loop region, three were formed with their own O3′ atoms with less than 50% occupancy and a very short average lifetime, and the fourth, C12 H2′, was formed with the phosphate oxygen atom O2P of the next nucleotide with more than 50% occupancy and a longer average lifetime. In addition, the 5′-end hydrogen A1 H5T formed a hydrogen bond with A2 O1P.

Table 4 Hydrogen bonds within the RNA hairpin (with occupancy≥33.3%)
Free RNARNA in complex
Hydrogen bondsOccupancy (%)Average lifetime (ps)Occupancy (%)Average lifetime (ps)
(A) Normal direct hydrogen bonds
A1 H5T … A2 O1P44.02.234.41.4
A1 H61 … U20 O466.69.597.612.6
A1 N1 … U20 H351.14.491.65.8
A2 H61 … U19 O496.46.295.25.1
A2 N1 … U19 H399.654.399.999.9
U3 H3 … A18 N199.9119.899.899.9
U3 O4 … A18 H6196.46.387.02.3
C4 H41 … G17 O697.27.997.011.4
C4 N3 … G17 H199.9199.899.044.0
C4 O2 … G17 H2199.649.899.757.0
C5 H41 … G16 O699.649.899.328.4
C5 N3 … G16 H199.9199.999.999.9
C5 O2 … G16 H2198.918.599.018.9
U7 H2′ … U7 O3′35.40.333.40.4
G9 H2′ … G9 O3′45.50.444.60.4
A11 H2′ … A11 O3′37.10.446.00.4
C12 H2′ … U13 O2P58.53.7
U7 O2 … G9 H22 53.21.1
C12 H2′ … C12 O3′ 34.20.3
U13 H2′ … U13 O3′ 38.00.4
C15 H2′ … C15 O3′ 44.00.4
U19 H2′ … U19 O3′ 33.80.3
(B) Unusual direct CH … O hydrogen bonds
A2 H8 … A2 O5′49.60.545.20.5
U3 H6 … U3 O5′52.60.553.40.5
C4 H6 … C4 O5′48.10.554.40.5
C5 H6 … C5 O5′52.90.545.30.4
A6 H8 … A6 O5′58.50.544.80.4
G17 H8 … G17 O5′59.40.656.40.5
A18 H8 … A18 O5′48.20.556.90.5
U19 H6 … U19 O5′39.90.545.20.5
U7 H6 … U7 O5′63.60.6
C5 H2” … A6 O4′ 36.40.4
C12 H6 … C12 O5′ 54.60.5
G16 H8 … G16 O2P 64.20.9
(C) Water-mediated hydrogen bonds
A2 O1P … w … U3 O1P54.81.446.20.9
A2 O2P … w … U3 O1P58.92.735.02.4
U3 O1P … w … C4 O1P46.90.851.10.8
U3 O5′ … w … C4 O1P35.00.546.80.6
C4 O5′ … w … C5 O1P46.00.645.30.7
C5 O5′ … w … A6 O1P56.40.741.60.6
C5 O2P … w … A6 O1P44.11.548.41.8
A6 O2P … w … U7 O1P78.85.686.45.6
U7 O2P … w … U8 O1P64.22.655.24.0
U8 O2P … w … G9 O2P48.54.638.14.1
G9 O1P … w … C10 O1P61.41.947.21.3
A11 O5′ … w … C12 O1P60.31.463.80.8
C12 O2P … w … U13 O2′34.41.777.75.4
G16 N7 … w … G17 N749.11.245.50.8
G17 N7 … w … A18 H6242.00.650.40.7
G17 O2P … w … A18 O1P49.02.456.82.5
A18 O2P … w … U19 O1P52.12.955.02.2
U3 O2P … w … C4 O1P41.31.7
C5 O1P … w … A6 O1P46.80.6
G9 O1P … w … C10 H4244.80.8
G9 H22 … w … C10 O4′38.30.6
C10 O2P … w … C12 O1P50.63.1
C10 O2P … w … C12 O2P40.33.5
C12 O1P … w … U13 O2′43.73.1
C12 O2P … w … C14 O1P82.95.1
C15 O1P … w … G16 O1P45.66.2
A6 O1P … w … U7 O1P 35.20.8
U8 O3′ … w … G9 N7 39.00.8
C10 O2 … w … A11 N7 74.21.0
C10 O2P … w … A11 O2P 87.38.3
A11 N3 … w … A11 O2′ 40.60.6
C12 O1P … w … U13 O2P 52.51.4
C12 O2P … w … U13 O2P 55.41.9
A18 O5′ … w … U19 O1P 34.40.6
U19 O2P … w … U20 O1P 49.41.8
U20 O2P … w … U21 O4′ 35.81.3
Time cutoff 0.1ps, resolution 0.2ps.

In contrast to these, more hydrogen bonds were present within the RNA in the complex. Almost all of the hydrogen-bonding interactions between the five base pairs were 100% occupied, including the first one, A1U20. Only the occupancy of hydrogen bond between U3 O4 and A18 H61 was slightly lower. In addition to the three hydrogen bonds between the 2′-OH group and the O3′ atom of the same nucleotide present in the free state, four more such interactions appeared in the three loop nucleotides C12, U13, and C15, and in U19, with about one-third occupancy. Another hydrogen bond was found to connect the bases of U7 and G9. The hydrogen bond connecting the backbone of G12 and U13 in the free state disappeared in the complex. Besides these hydrogen bonds within the RNA, a large number of hydrogen bonds formed between the RNA and the U1A protein have been described above.

In contrast to the normal O(N)-H···O(N) hydrogen-bonding interactions, a number of unusual C-H···O hydrogen bonds (Wahl and Sundaralingam, 1997) were found in the RNA hairpin (Table 4B). Most of these hydrogen bonds were present between the base H6 or H8atom and the backbone O5′ atom within the same nucleotide with a C3′-endo sugar pucker. Most of these hydrogen bonds were formed in the double-helix stem, such as A2, U3, C4, C5, G17, A18, and U19, both in the free state and in complex, with about half occupancy and a very short average lifetime. In the complex, G16 had a C-H···O hydrogen bond between the H8atom and O2P instead of the O5′ atom of the backbone, with higher occupancy than the others, showing that the phosphate group of G16 was a little different from the others in the stem. Only a few C-H···O hydrogen bonds were present in the loop nucleotides, except A6 and U7 in the free conformation and A6 in the complex, which were extensions of the double-helix stem. Nevertheless, a C-H···O hydrogen bond was found in C12 of the loop in the complex. Another strange C-H···O hydrogen bond was found to connect the backbones of C5 and A6 in the complex, with about one-third occupancy.

The water-mediated hydrogen bonds were also analyzed from the production run (Table 4C). From Table 4C, we see that in both conformations most of the adjacent phosphate groups were bridged to form hydrogen bonds through water molecules with about half occupancy, especially in the double-helix stem. The 2′-OH group of U13 and the phosphate group of C12, the base of G17 and bases of G16 and A18 formed water-mediated hydrogen bonds, but no bridging water was seen between the backbones of G16 and G17. Nevertheless, differences were obvious between the two states, particularly in the loop region. In the free state, the backbone and base of G9 formed water-mediated hydrogen bonds with the base and backbone of C10, respectively. The phosphate groups of C10 and C12, and C12 and C14 also formed water-mediated hydrogen-bonding interactions. However, these interactions disappeared in the complex, and instead the sugar of U8 and base of G9, bases of C10 and A11, phosphate groups of C10 and A11, C12 and U13 formed water-mediated hydrogen bonds. No water-mediated C-H···O hydrogen bonds were found during the simulations.

Besides the hydrogen-bonding interactions, there were also some base stacking interactions within the RNA hairpin. In addition to the base stacking interactions in the double-helix stem, the bases of A6 and U7, G9 and C10, A11 and C12, and U13 and C14 were stacked separately on each other in pairs in the free state. In the complex, this situation was changed. A6 and U7 kept the stacking interaction together with the stem, the bases of C10, A11, and C12 underwent partially stacking interactions, and U13, C14, and C15 were also partially stacked. More importantly, the bases of G9CAC12 were buried in the hydrophobic core of the protein and formed stacking interactions with the protein separately.


Counterions and hydration

The distributions of water molecules and counterions around the complex during the simulation are shown in Figure 9A, according to the probability density map. Figure 9A illustrates that most of the water molecules with large probability were distributed uniformly around the folded domain of the protein and part of the loop of the RNA, whereas the water molecules around the two terminals of the protein and the RNA double-helix stem were more randomly distributed with low probability density. During the simulation the most probable positions of the sodium ions were close to the major groove of the RNA stem and outside of the protein C-terminal helix. The former region was the negative electrostatic region, and the latter one had two negative residues, Asp90 and Asp92.

Display large version of this figure
Figure 9
Stereo view of the sodium ions (yellow spheres) and water molecules (red spheres): distribution around (A) the RNA-protein complex and (B) the free RNA, according to the probability density map.

For the free RNA hairpin, the sodium counterions were initially placed 3.6Å from the phosphorous atoms. In the course of time, the counterions gradually moved away from the RNA atoms. In particular, four ions were detained in the major groove of the hairpin stem and stayed within 6.0Å of the phosphate groups, as seen from the most probable distribution region of the ions shown in Figure 9B. The probability densities of the counterions were higher (up to 45%) in the free state than in the complex (<30%). Figure 9B also shows the probability density map of water molecules around the free RNA, in which water molecules with a probability of more than 40% were distributed around the hairpin stem.

In both states of the RNA hairpin, most of the hydrogen-bonding donors and acceptors formed hydrogen bonds with water molecules during the simulations, indicating the high degree of hydration of the RNA hairpin. The water-mediated hydrogen-bonding interactions (listed in Table 4C) were only a part of the hydration. From the production runs, we found that almost all of the phosphate group oxygen atoms were hydrated with three water molecules in both states, except that in the complex O1P of C15 and both O1P and O2P of G16 formed no more than two hydrogen bonds to water molecules. For the sugar and base atoms, the hydration degrees are listed in Table 5, in which almost all atoms in the free RNA have at least one hydrogen bond to water 50% of the time. Apart from a few differences, most of the atoms in the double-helix stem had very similar hydrations in both states. All O2′ atoms had occupancies of ∼140%; that is, they were hydrated by about one or two water molecules. On the contrary, for the hairpin loop atoms, the bases of G9, C10, A11, and C12 had high occupancy for the free RNA, but low occupancy in the complex, because of the hydrophobic interactions with the protein.