Article Outline

Article Information

PubMed

Related Articles

  • …more

Copyright © 2008 The Biophysical Society. All rights reserved.
Biophysical Journal, Volume 94, Issue 7, 2470-2481, 1 April 2008

doi:10.1529/biophysj.107.117622

Biophysical Theory and Modeling

Mutations Affecting the Oligomerization Interface of G-Protein-Coupled Receptors Revealed by a Novel De Novo Protein Design Framework

Martin S. Taylor*Ho K. Fung*Rohit Rajgaria*Marta FilizolaHarel Weinstein§ and Christodoulos A. Floudas*Go To Corresponding Author 

* Department of Chemical Engineering, Princeton University, Princeton, New Jersey
Department of Structural and Chemical Biology, Mount Sinai School of Medicine, New York, New York
Department of Physiology and Biophysics, Weill Medical College of Cornell University, New York, New York
§ HRH Prince Alwaleed Bin Talal Bin Abdulaziz Alsaud Institute for Computational Biomedicine, Weill Medical College of Cornell University, New York, New York

Address reprint requests to Christodoulos A. Floudas, Dept. of Chemical Engineering, Princeton University, Princeton, NJ 08544. Tel.: 609-258-4595; Fax: 609-258-0211.

Abstract

Specific functional and pharmacological properties have recently been ascribed to G-protein-coupled receptor (GPCR) dimers/oligomers. Because the association of two identical or two distinct GPCR monomers seems to be required to elicit receptor function, it is necessary to understand the exact nature of this interaction. We present here a novel method for de novo protein design and its application to the prediction of mutations that can stabilize or destabilize a GPCR dimer while maintaining the monomer's native fold. To test the efficacy of this new method, the dimer of the single-spanned transmembrane domain of glycophorin A was used as a model system. Experimental data from mutagenesis of the helix-helix interface are compared with computational predictions at that interface, and the model's results are found to be consistent with the experimental findings. A flexible template was developed for the rhodopsin homodimer at atomic resolution and used to predict sets of three and five mutations. The results are found to be consistent across eight case studies, with favored mutations at each position. Mutation sets predicted to be the most disruptive at the dimerization interface are found to be less specific to the flexible template than sets predicted to be less disruptive.

Introduction

Compelling evidence indicates that G-protein-coupled receptors (GPCRs) form multimeric complexes with distinct pharmacological and functional properties (for recent review articles, see 1,2,3,4,5,6,7,8,9,10). Although most of this evidence comes from in vitro experiments, recent studies using animal models support a specific role for GPCR oligomerization in vivo and in human pathologies. Accordingly, understanding the basis of protein-protein interaction in GPCR oligomerization will significantly enhance our understanding of the molecular mechanisms underlying GPCR cellular function, with the promise of new and improved therapeutics targeting the complex structures and their mechanisms.

Both computational and experimental efforts have been made to identify the interface of GPCR oligomers, but the specific molecular determinants required for stable protein-protein interaction are still unknown. Rhodopsin is the only GPCR whose native organization in rows of dimers has been demonstrated directly using data from atomic force microscopy 11. Based on these data and the crystallographic monomeric structure of rhodopsin 12, a three-dimensional model of rhodopsin oligomers was proposed 13. Specifically, this model consisted of intradimeric interfaces involving transmembrane (TM) helices TM4 and TM5, and the second intracellular loop, whereas helices TM1 and TM2 and the third intracellular loop were involved in the formation of rhodopsin dimer rows. Although it is possible that some GPCRs use different oligomerization interfaces to achieve functional selectivity, a systematic application of an enhanced correlated mutation-analysis-based approach 14,15 to several rhodopsinlike GPCRs that are known to form homo-oligomers identified TM1 and TM4 most often as putative interfaces of dimerization/oligomerization. Cross-linking studies of substituted cysteine residues in TM4 and TM5 of rhodopsin 16, and in TM4 of dopamine D2 receptor, 17, further supported the involvement of these two helices in the dimerization/oligomerization of GPCRs. The recent demonstration that the putative dopamine D2 receptor homodimerization interface involving TM4 is related to function, and that activation of this receptor requires changes at this interface 18, further justifies the need for an improved understanding of the exact nature of the interaction between GPCR monomers. The final goal is to suggest specific mutations that may affect the interaction between GPCR monomers, and thereby either disrupt cross talk between receptor subunits, or promote specific signaling cascades.

To this end, and to reduce the overwhelmingly large number of mutagenesis and cross-linking experiments that would otherwise be required to obtain the desired structural insight, we developed a novel two-stage framework for computational de novo protein design. The goal of this method is to discover mutations that would stabilize or destabilize the interfaces of a GPCR dimer while maintaining the monomer's native fold to the maximum extent possible. To test the efficacy of this new method, we used the dimer of the single-spanned transmembrane domain of Glycophorin A as a model system, and compared the predictions to experimental data from mutagenesis studies 19,20,21. The protein design framework was then applied to the TM4,5-TM4,5 dimer of the prototypic GPCR rhodopsin, and used to predict sets of three and five simultaneous lipid-exposed mutations that are likely to disrupt the dimerization interface of rhodopsin with minimal changes in the structural integrity of the monomers.


Theory

Novel two-stage framework for de novo protein design

The framework for de novo protein design applied to the investigation of the dimerization/oligomerization interface of rhodopsin involves two main stages. The methodology is outlined in Fig. 1. The first stage of the method, termed the “sequence selection” stage, begins with a high-resolution flexible template and uses a distance-dependent force field to select a rank-ordered list of amino acid sequences that are predicted to be of low energy in that template. As indicated in Fig. 1, these sequences are the input to the second stage, the “fold validation” stage, in which sequences are selected from a list of sequence positions that were rank-ordered by a criterion of relative level of specificity for the flexible template. This fold validation stage is outlined in Fig. 2.

Display large version of this figure
Figure 1
Overview of the de novo protein design method. The first stage of the method, the “sequence selection” stage, begins with a high-resolution flexible template and uses a distance-dependent force field to select a rank-ordered list of amino acid sequences that are predicted to be of low energy in that template. Fig. 1 shows that these sequences are the input to the second stage, the “fold validation” stage, in which sequences are selected from the rank-ordered list of sequence positions that were found to have the highest level of specificity for the flexible template. This fold validation stage is outlined in detail in Fig. 2.
Display large version of this figure
Figure 2
Overview of the method for fold validation.

Defining the flexible template

The starting point for any de novo protein design method is the definition of the template or backbone structure. The template is the de novo design method's representation of the desired three-dimensional structure, and therefore appropriate definition is critical. Early methods of de novo protein design assumed a rigid template, with the coordinates of all atoms fixed in space. This assumption was highly convenient because it significantly reduces the complexity of the problem 22. However, it has been observed that this unrealistic constraint renders some problems in de novo protein design untreatable without the use of flexible templates 23,24,25.

Methods have been created to incorporate template flexibility; two of the most popular are modeling atoms with smaller-than-natural atomic radii and considering a discrete set of templates and sampling among the results 26. Modeling atoms with smaller atomic radii, with typical reductions between 5 and 10%, allows steric overlapping of atoms due to backbone movements. However, this approach has a number of disadvantages including an overestimation of attractive forces between atoms and the possibility of overpacking atoms, particularly in the hydrophobic cores of target molecules 27. Considering a discrete set of templates and sampling among the results alleviates these disadvantages and allows for flexibility to be incorporated through the variations within the sets. Additionally, flexibility can be controlled across different regions of the protein. However, the various templates must somehow be combined into a meaningful three-dimensional structure, and the design problem must be solved for each template considered. This greatly increases the complexity and computational difficulty of the problem. There are several recent reviews of advances in de novo protein design 28,29,30.

Here, we apply two strategies to incorporate flexibility into the template. Perhaps the most elegant way to incorporate backbone flexibility is to allow variability within the template itself. Following the method of Klepeis and Floudas 26,31,32,33, we allow for backbone flexibility by incorporating a distance-dependent force field in the sequence selection stage. First, we represent the protein as a matrix of distances between α-carbons. Then, the force field considers pairwise interactions between residues as strictly distance-dependent, allowing rotational and torsional flexibility. Distances between residues are discretized into a set of bins rather than scoring their energy-continuous function. The bin sizes vary between 0.5 and 1Å, implicitly allowing backbone movements of similar magnitude. It is important to note that, because of the precision required by this method, a high-resolution structure must be used as a starting point.

In addition, when multiple structural models are available, such as the multiple NMR models of the glycophorin A dimer 34, a probability-weighted-average method is used to increase template flexibility by incorporating information from all models 26,27,33.


Developing a flexible template for the rhodopsin dimer

No high-resolution structure of the rhodopsin dimer is yet available. An atomic-level resolution model of a rhodopsin dimer with TM4 and TM5 at the dimerization interface was recently proposed (Protein Data Bank (PDB) identification code 1N3M) that uses atomic force microscopy data 13 and the crystallographic structure of rhodopsin 12. We have recently described the first 45-ns molecular dynamics simulations of this model in an equilibrated unit cell of hydrated palmitoyloleoyl phosphatidylcholine 35. The resulting energy-optimized average structure of the converged interval of these simulations (the last 17.5ns of the 45-ns simulation) was used to develop a flexible template for the rhodopsin dimer using the strategy outlined in the section above.


Developing mutation sets

Once a flexible template is defined, an appropriate mutation set is developed. The mutation set dictates which positions are considered for mutation and which residues are allowed in each position. As illustrated in Fig. 1, this selection is influenced by experimental results and the solvent-accessible surface area (SASA). In the general case, all 20 amino acids can be considered at all n positions, and the total combinatorial complexity of the problem is 20n. However, reducing the search space can be advantageous both to reduce computer time and to increase biological relevance. For example, when redesigning a protein with a particular binding region or catalytically active domain, successful designs often focus on the surrounding regions. Indeed, this was the case for compstatin, a synthetic peptide inhibitor of complement 3, whose activity was increased 45-fold by redesigning residues surrounding the binding loop 31,32,36,37,38,39.

Another simple and highly useful method for reducing the search space is to consider only subsets of residues at each position. A popular classification of this kind is to separate the residues based on their environment. This makes sense from a biochemical perspective for a number of reasons. Protein cores are typically composed of tight packings of hydrophobic, nonpolar amino acids. The surface of globular proteins is typically exposed to water, and, accordingly, residues at the surface are generally hydrophilic, polar amino acids. This idea has been used successfully by Harbury and co-workers in their design of α-helical bundle proteins with a right-handed superhelical twist 40, and, recently, by Hecht and co-workers to design a four-helix bundle with a novel fold 41. We introduce this idea in residue selection for our protocol, using the calculated SASA to classify residues into three categories: surface (SASA>50%), intermediate (20%<SASA<50%), and core (SASA<20%). SASA is readily calculated with the program NACCESS 42. In the general case, only hydrophilic residues (GNQHKRDESTP) are considered at surface positions, only hydrophobic residues (AVILMFYW) are considered at core positions, and all 20 amino acids except cysteine are considered at intermediate positions. Cysteine is excluded from the mutation sets because of its ability to cross-link. However, these guidelines can be modified; for example, it may be advantageous to alter the mutation set at a given position if the native residue does not fall into the category predicted by its SASA.

Implementing these approaches, we developed an appropriate mutation set at the dimerization interface of rhodopsin by selecting the following five positions as candidates for mutation: 4.41, 4.48, 4.55, 5.37, and 5.41. We chose these specific positions in TM4 and TM5 because they correspond to the closest symmetric interactions (distance between Cβ atoms <11Å) at the intradimeric interface of rhodopsin, as derived from atomic force microscopy data 13. These positions are identified by the “generic numbering system” adopted for GPCRs to refer, comparatively, to structurally cognate receptors 43. Briefly, in this generic numbering scheme, two numbers (N1 and N2) are assigned to amino acid residues in TMs. N1 refers to the TM number, whereas the N2 numbering is relative to the most conserved residue in each TM, which is assigned a value of 50. The other residues in the TM are numbered in relation to this conserved residue, with numbers decreasing toward the N-terminus and increasing toward the C-terminus.

Based on the SASA of native residues in the model (data not shown), two sets of residues were used for the chosen positions of mutations: 1), all residues except cysteine at the helix boundary positions 4.41 and 5.37; and 2), hydrophobic residues plus serine and threonine (AVILMFYWST) at positions 4.48, 4.55, and 5.41. Cysteine was excluded from the mutation sets because previous data show that mutations of lipid-exposed residues to cysteine may cause GPCR cross-linking 16,17, and serine and threonine were added to the standard hydrophobic set because of their prevalence on the lipid-facing surface of membrane proteins.


Energy function

The energy function used in this work is a modified version of the high-resolution (HR) force field developed recently by Rajgaria et al. 44. The HR force field is a knowledge-based distance-dependent potential that has been shown to be highly effective in discriminating native protein folds from highly similar sets of decoy structures at high resolution (root mean-square deviation (RMSDs)<3Å), and medium resolution (3Å>RMSD>10Å) 44. The HR force field contains eight distance bins and considers a contact to be any interaction where the distance is <9Å. It is the next generation version of the LKF force field used previously in this method 45. To better model the interactions across the dimerization interface of rhodopsin, a longer-range version of the HR force field was generated and designated the HR-12 force field. The HR-12 force field contains 12 bins, and includes interactions between residues with distances up to 13Å. The HR-12 force field performs similarly to the HR force field on all test cases (data not shown).

The energy measured by this type of HR force field is a distance-dependent approximation of the Gibbs free energy, since it is derived from energy-minimized native conformations in the PDB. Additionally, this type of force field is advantageous for de novo protein design because evaluation of contact energies requires only a table lookup. Hence it is very efficient, allowing for rapid evaluation of many candidate sequences.

The flexible template and mutation set are used to design sets of residues to fit the template structure. Sequence selection is guided by the distance-dependent force field in a mixed integer linear programming model. In the general case, residues are selected to minimize the energy of the structure within the template; the novel sequences generated are designed to be specific to that template. However, to disrupt the dimerization process, we focus on interactions across the dimerization interface and instead maximize the energy, disrupting interactions between the monomer units.


Sequence selection

The sequence selection formulation used in this work is based on the original formulation by Klepeis et al. 31,32. This model was recently improved by Fung et al. and proven to be totally equivalent to, but computationally more efficient than, the original model 26,27,33. The integer linear programming model then takes the form

Consider set i=1,…,n to define the residue positions along the template where n represents the total number of residues. At each position i, the set of mutations is represented by j{i}= 1,…,mi, where, for the general case where all mutations are allowed, mi=20i. The equivalent sets ki and lj are defined so that pairwise interactions can be represented, and k>i is required to ensure that all pairwise interactions are unique. Binary variables and are introduced to indicate the possible mutations at a given position. That is, when a particular amino acid (j or l) is active at a given position (i or k), variable or indicate this by taking the value of 1. To ensure that there is exactly one type of amino acid at each position, composition constraints require the sum of be equal to 1 for all positions i. Additionally, there are two sets of RLT constraints that are introduced to reduce the integrality gap. The model minimizes the energy function over the entire structure. It is important to note that the energy value then depends on the distance between the α-carbons at the two backbone positions, i and j, as well as the type of amino acids, k and l, at those positions. It should be noted that the binary variables can be relaxed into continuous variables as shown by Zhu 46.

To predict sets of mutations that disrupt the dimerization interface of rhodopsin, the objective function is maximized rather than minimized. Accordingly, residues of the highest energy are selected at the dimerization interface.


Increased flexibility and fidelity with multiple structural models

When multiple structural models are available, such as the multiple NMR models of the glycophorin A dimer, a probability-weighted-average method is used to incorporate information from all models 26,27,33. Because of protein flexibility, the distances across multiple structural models can differ; for a particular pair of residues, they often span a number of bins. This distribution of distances between residues is reflective of the residues’ conformational freedom. We therefore represent the pairwise energy contribution of each pair of residues as a sum of the contributions from all distances spanned, weighted by the probability of finding the residues at each distance.

This energy contribution of the objective function is then writtenwhere d is the current distance bin, b represents the total number of distance bins in the force field, and Ω is the weight given to that bin for the current pair of residues, defined as the fraction of structures in which the distance between residues xi and xk falls into bin d. This method increases the flexibility of the template by spanning multiple distances for each interaction. At the same time, the fidelity of the model is increased because the probability-weighted distribution is more representative of the interaction between two residues than any single distance can be.

To illustrate this method, imagine a protein for which 20 structural models are available. In these models, the distance between residues α and β falls into bin 2 eight times, bin 3 ten times, and bin 4 two times. The values of Ω for these bins are then 0.4, 0.5, and 0.1, respectively. If the distance-dependent force field has the values −6.0, −4.0, and −2.0 units, respectively, for the interaction between α and β over these bins, then the total contribution of the interaction between these residues would be −4.6 units.


Development of structure-based constraints

The mixed-integer linear programming model allows for straightforward incorporation of constraints to increase the biological relevance of the results. As seen in Fig. 1, experimental results and homology information are incorporated in this step, resulting in specifically targeted constraints. For example, charge can be maintained over the entire protein, within specific domains of the protein, or both. The number of simultaneous mutations can be limited, allowing a small number of mutations with maximal effect to be introduced across a large search space. Also, the composition of the protein or a region of the protein can be controlled by imposing minimum and maximum quantities of specific amino acids or groups of amino acids within the design. For example, the number of hydrophobic amino acids in existing β-strands can be bounded to enhance the formation of β-sheets.


Incorporation of knowledge-based weights

We incorporated three distinct sets of knowledge-based weights in the rhodopsin runs to illustrate how such knowledge-based information can be included in the new type of de novo protein design method we present here. Such information can be very useful in improving the specificity of the results. As outlined in Fig. 1, these weights allow knowledge from homology studies to be incorporated into the energy function, supplementing the energy function with structural data focused on the region, or even position, of interest. Incorporation into the energy function is straightforward; the objective function is multiplied by two additional terms The term represents the weight of amino acid j at position i in the protein backbone, and represents the same information for the second residue being considered in the pairwise energy function. Note that for positions not selected for mutation, the weight factors λ are given the value of 1. This prevents bias of some interactions over others based on weight assigned to native residues in nonmutated positions.

One set of knowledge-based input comes from two scales previously described for membrane proteins: the structure surface fraction (SF) scale, and the rhodopsin surface propensity (RhoSP) scale 47. These scales describe what types of residues one might expect to find on the surface of general membrane proteins (SF) and of rhodopsinlike proteins (RhoSP). They are based on the inside/outside-facing distribution of the residues on the template obtained from a bioinformatics procedure described in detail 47. The procedure yields an amino acid property scale (APS) that corresponds to the propensity of residues to be located on the lipid-oriented TM surface in membrane protein. The knowledge-based scale was shown to refine predictions based on conservation criteria alone 47. The APS-based prediction method is available on the web-accessible server ProperTM http://icb.med.cornell.edu/crt/ProperTM/ProperTM.xml.

To increase further the specificity of the structure-based information content, we also developed a position-dependent set of knowledge-based weights for class A GPCRs. These weights represent the probability of finding a residue at a given position across all class A GPCRs. An archive of GPCR sequence information was obtained from the GPCRDB 48 and a database of human class A GPCRs was built from these sequences. In total, there were 545 receptors used. Human receptors were selected to minimize bias for receptors heavily studied across species. Multiple pairwise alignments were then performed using CLUSTAL W 1.83 49. Taking bovine rhodopsin as a “gold standard” for class A GPCRs, each helix of each receptor was aligned with the corresponding helix in bovine rhodopsin. The probability of each residue occurring at each position was then calculated. Note that a residue was tallied only if it was found aligned with one of the residues in bovine rhodopsin; residues aligning with gaps were not tallied. The resulting weights for the positions mutated in this study are presented in Table 1. Full data for all positions are available in Supplementary Materials , Tables S1–S7).


Method for fold validation

As outlined by the sequence of procedural steps shown in Fig. 1, once a rank-ordered list of sequences has been selected into the template by the sequence selection stage, the sequences are further validated by a second stage, designed to predict the sequence's specificity for the flexible template. In the general case, the same template is used for both stages, and the second stage provides refinement of the results from sequence selection. However, to disrupt the dimerization of rhodopsin, the sequence selection stage maximizes the energy rather than minimizing it. Here, we use the crystal structure of the rhodopsin monomer in the second stage, and the resulting predictions are for sets of mutations that will destabilize the dimerization process of rhodopsin while remaining specific to the native fold of the monomer.

In the two-stage framework developed by Klepeis et al. 31,32, the second stage uses ab initio structure prediction techniques based on the αBB deterministic global optimization solver with an objective function of a full-atomistic force field over the set of independent dihedral angles that describe the configuration of the system 50,51,52,53,54,55,56,57,58,59,60,61,62,63,64. However, these computations are not currently feasible for proteins the size of rhodopsin. Here, we present an alternative method for fold specificity; this method is outlined in Fig. 2. Briefly, a conformation ensemble is generated by simulated annealing using the CYANA 2.1 package 65,66. The resulting hundreds of structures are subjected to local energy minimizations using TINKER 67 with the AMBER force field 68. The ensembles of structures from candidate sequences are then statistically compared to yield a template specificity factor, STemp. Details of this procedure are reported below.

First, upper and lower bounds on both the distances between α carbons and the ϕ and ψ angles between residues are extracted from the flexible template. When only one structural model is available, these bounds are defined parametrically, with distances of ±10% and angles of ±25°. We note that true backbone flexibility is incorporated into this method. The parametrically defined template allows for any possible values of distances and dihedral angles.

Then, for both the sequence of interest and the native sequence, an ensemble of random structures (conformers) is generated within the confines of the flexible template. This is done using the CYANA 2.1 software package for NMR structure refinement 65,66. CYANA 2.1 performs annealing calculations that simulate a rapid heating of the protein followed by a slow cooling in which high-temperature torsion dynamics and annealing torsion dynamics are performed. Violations of van der Waals radii and of the flexible template are minimized, thereby minimizing the energy of the target structures. Hundreds of these structures are generated within the confines of the flexible template; for this work we have generated 500 conformers for each sequence. For each conformer in the ensemble, local minimizations are then performed with the TINKER 67 package using the BFGS quasi-Newton optimization algorithm, as guided by gradients in the fully atomistic force field AMBER 68. AMBER is then used to evaluate the potential energy of the final conformer structure. The use of TINKER, AMBER, and CYANA is widespread and documented in the literature, even though there have been reports 69 discussing the inaccuracies of the parameters used within AMBER and suggesting improvements.

To analyze the results, a method similar to that used by Klepeis et al. 31,32 for ensemble comparison in fold validation was employed. First, the mean and standard deviation of both RMSD and AMBER energies were found for the native sequence. Upper bounds on both RMSD and energy were then established; for RMSD, the upper bound was selected as 1.5 standard deviations above the mean, in the energy the upper bound was selected as 2 standard deviations from the mean. A structure is considered to make a contribution to the ensemble only if its energy and RMSD both fall under these upper bounds. This is illustrated in Fig. 3.

Display large version of this figure
Figure 3
Illustration of upper bounds on RMSD and AMBER energy. Thick lines indicate the upper bounds. Data points in shaded regions are not considered in further calculations.

The specificity of each mutant sequence to the target template is then calculated relative to the native sequence using a Boltzmann distribution. Both the predicted energy of each conformer and its RMSD from the template structure are used in this calculation.

We define the “set native” as the set of all data points from the native sequence that are below both upper bounds, and “set novel” as the set of all data points from the novel sequence that meet the same criterion. The template specificity factor, STemp, is then calculated using Boltzmann probabilities:

The approximate criterion described here does not claim to be a rigorous calculation of stability or free energy. However, AMBER is designed to quantify the potential energy of a protein in a given three-dimensional conformation, and it provides a good approximation of the enthalpy of the protein when folded into the template. Combined with the large sampling of conformational space around the template, approximating an entropy calculation, the template specificity factor approximates the specificity of the fold within the confines of the flexible template.

In summary, the proposed approach consists of two stages. In the first stage, the sequence selection model minimizes the free energy, which is approximated as a distance-dependent force field derived from existing structures in the PDB. In the second stage, we perform a fold-specificity calculation via two protein-folding calculations, one around the flexible template and the other without template restrictions. Then, based on the ensembles of the generated conformers we rerank the predicted sequences based on the fold specificity. As a result, in stage 1, we aim for better stability, whereas in stage 2 we aim for better specificity, and the proposed approach combines the two in a unique way. This is similar to the “design-in/design-out” method described by Koehl and Levitt 70,71.



Results and discussion

Glycophorin A: a model system

To test the efficacy of the new method, we compare our results with the experimental data from the Fleming group on the dimerization of glycophorin A 19,20,21. Their recent studies have probed the relationship between structure, sequence, and stability of the dimerization of glycophorin A, a human erythrocyte protein with a single transmembrane domain containing a GxxxG motif that is known to form symmetric homodimers. The structure was solved with NMR and the coordinates for 20 models are deposited in the PDB file 1AFO. This information was used to construct the flexible template as described in Methods, and both stages of calculations (see Fig. 1, Stage I, to yield rankings and energies, and Stage II to yield Stemp values) were performed as described in Methods on all reported mutations.

Three case studies had been performed experimentally 19,20,21, examining three types of mutations. We repeat the studies in silico and compare our results to those found experimentally. We note that the calculations in the two-stage approach presented here relate to the experimentally determined ΔΔG of mutation, but they are not the same. The sequence selection stage minimizes the distance-dependent force field 44 in selection of mutations and the second stage provides a metric for structural specificity. As a result, our approach does not explicitly calculate ΔΔG for the dimer, but our results are consistent with those determined experimentally.

Single alanine mutations

In a first study, single alanine mutations were made along the 75–87 helix fragment of glycophorin A, including both “lipid-facing” residues (77, 78, 81, 85, and 86) with side chains pointing away from the dimerization interface, and “helix-facing” residues (75, 76, 79, 80, 83, 84, and 87) with side chains pointing into the interface 20. In these experiments, the lipid-facing alanine mutations were found to cause no statistically significant change in the energy of dimerization, with the exception of a moderate stabilizing effect in mutation I85A. In stark contrast with the lipid-facing residues, the helix-facing alanine mutations were found to have moderate to strong destabilizing effects, especially mutation G83A in the GxxxG motif (residues 79–83).

The results from computational studies of these mutants are presented in Table 2,Table 3. In the sequence selection, we find a strong stabilizing effect from mutation I85A. We also note a moderate stabilizing effect of mutation G86A, which does not contradict the experimental results. Whereas the sequence-selection phase fails to identify the stabilizing effect of the GxxxG motif, we find all other mutations destabilizing, in strong accord with the experimental results. Additionally, in the specificity calculations, we also find that all alanine mutations, including those that disrupt the GxxxG motif, produce monomers that are less specific to the template structure than the wild-type sequence.

A number of factors may explain the failure of the sequence-selection phase to identify the GxxxG motif as stabilizing. It has been shown that this complex motif is not required for the dimerization of the glycophorin A transmembrane domain, and that its presence is not sufficient for dimerization 21. Although the function of the glycines in this motif is still not fully understood, proposed mechanisms suggest that they may permit long-range interactions by allowing tighter helical packing. 19,72. Furthermore, where GxxxG motifs occur in the context of the membrane, they may facilitate packing of helical conformations distinct from those seen in soluble proteins 73. Therefore, we propose that the role of these glycines may be different from that played by any glycine used to derive the HR-12 force field. It is also possible that the distance-dependent approximation used in the sequence selection phase is not sufficient to model this distinct type of packing.


Single aliphatic mutations along the helix-facing residues

A second study carried out in Fleming's lab 21 explored the requirement of the GxxxG motif (residues 79–83) for glycophorin A dimerization using a series of single mutations of the “helix-facing” residues to large aliphatic residues and to glycines. This study also repeats mutations to alanine; however, these are discussed in the previous section and are therefore not repeated. The experiments again find that the GxxxG motif stabilizes the dimer but is not required for dimerization (Fig. 4). As noted previously, our method does not identify the GxxxG motif to be stabilizing in the sequence-selection phase, and therefore mutations that alter these glycines are almost all found to stabilize the dimerization interface. Accordingly, the results segregate those that disrupt the glycines in the GxxxG motif (Table 4) from those that do not (Table 5).

Display large version of this figure
Figure 4
Experimentally determined energies of dimerization for single aliphatic mutations. Reprinted from Doura et al. 21.

Experimentally, all but two point mutations along the helix-facing residues were found to be destabilizing. Mutation V80L was found to be moderately stabilizing, and mutation I76V was found to have no effect on the energy of dimerization. We find mutation V80L as the best stabilizing mutation in the sequence-selection phase (Rank 1, Energy −763 in Table 5) and in the specificity calculations we find it more specific to the template (STemp=1.1) than the wild-type sequence (STemp=1.0). Also, we find mutation I76V to be highly specific to the template. Mutations I76G and I76L are also found to be specific to the template.


Scanning double alanine mutations

In a third experimental study on glycophorin A 19, alanine scan double mutations were performed along the dimerization interface to further probe residue interactions. All double mutants that did not disrupt the glycines in the GxxxG motif were found to be destabilizing in these experiments, and the results were found to be more complex than could be predicted from coupling single mutations. In agreement with these results, we found that all but one mutant were moderately to significantly less specific to the template than wild-type receptor (see Table 6,Table 7, Stage 2 STemp). The sole exception to this trend was the double leucine mutant G79LG83L, in which both glycines in the GxxxG motif had been replaced by leucines. Despite the agreement of our second-stage results with experimental data, our method failed to recognize the GxxxG motif as stabilizing. In fact, mutations disrupting both glycines were found to be stabilizing relative to wild-type, as seen by their lower Stage 1 energies.