Copyright © 2008 The Biophysical Society. All rights reserved.
Biophysical Journal, Volume 94, Issue 5, 1575-1588, 1 March 2008
doi:10.1529/biophysj.107.119651
Biophysical Theory and Modeling
Lydia M. Contreras Martínez, Ernesto E. Borrero Quintana, Fernando A. Escobedo, and Matthew P. DeLisa
School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, New York
Address reprint requests to Fernando A. Escobedo, School of Chemical and Biomolecular Engineering, Cornell University, Ithaca, NY 14853. Tel.: 607-255-8243; Fax: 607-255-9166 or to Matthew P. DeLisa at the same address. Tel.: 607-254-8560; Fax: 607-255-9166.
Recent advances in molecular biology techniques have led to the development of many powerful research tools that have been key in providing detailed knowledge of the principles underlying highly specific interactions between cellular proteins. Of particular note is the protein fragment complementation assay (PCA), wherein a reporter protein is split into individual fragments that by themselves remain inactive but upon reassembly under the appropriate cellular conditions yield the original, properly folded and active protein structure. For example, the yeast two-hybrid system, based on the functional reconstitution of the split Gal-4 transcriptional activator 1, has facilitated the systematic determination of proteome-scale protein-protein interaction networks within numerous organisms, including humans 2, Drosophila melanogaster 3, Caenorhabditis elegans 4, Saccharomyces cerevisiae 5,6, vaccinia virus 7, and Escherichia coli bacteriophage T7 8.
The increasing interest in protein-protein interactions has motivated the search for additional split reporter proteins that can be used for different applications and in other systems besides yeast 9. Examples include split green fluorescent protein (GFP) and its spectral variants yellow FP and cyan FP 10,11, ubiquitin 12, murine dihydrofolate reductase (DHFR) 13, β-lactamase 14,15, and firefly luciferase 16. The use of these split proteins is highly convenient, since the reconstituted activity of each is directly measurable by fluorescence or other well-established enzymatic assays. Numerous successes notwithstanding 17,18, the usefulness of split proteins can be limited by the slow folding kinetics and formation of misfolded aggregates associated with the reassembly process of the fragments 11,17. For instance, whereas GFP activity can be detected in minutes, the two split fragments that result when the protein is dissected near the middle of the sequence fail to associate and reassemble when expressed in bacteria 11. A similar drawback has also been observed in other split systems like DHFR, β-lactamase, and ubiquitin, where folding is dramatically (or completely) inhibited upon protein fragmentation. In most cases, the addition of two interacting proteins to the split halves dramatically improves the kinetics of split protein reassembly, presumably by nucleating the reassembly reaction 11. However, even when fragments are each fused to strongly interacting leucine zippers (KD ≈ 1–20 μM), folding and activity of the reconstituted protein are achieved only after 1–2 days 19. This inefficiency hinders the effective application of these detection systems on biologically relevant timescales.
In an effort to increase the self-assembly efficiency of protein fragments in the absence of any interacting partners, a number of strategies have been employed, including 1), the identification of “permissive” split sites along the protein sequence using circular permutation 20,21, structure-guided design 14,22, or bioinformatic and theoretical analyses 23; and 2), the optimization of a target sequence for more efficient splitting/reassembly using directed evolution 24,25. In the majority of cases, split sites are selected in regions away from the catalytic site, in areas containing flexible loops that can typically tolerate amino acid insertions, or in linker regions that separate naturally occurring functional domains 17. However, given that a few key residues known as the folding nucleus provide a significant driving force in the folding of a protein 26,27,28, we hypothesized that the way in which this nucleus is distributed between fragments determines the reassembly efficiency of split proteins. In support of this notion, it has been observed that introduction of residues into the folding nucleus that lower its stability can dramatically slow the folding process 29.
To test our hypothesis, we have developed an on-lattice minimalist coarse-grained protein model to address how the reassembly kinetics, thermodynamic stability, and folding mechanism of a lattice model protein are affected upon splitting. Specifically, we designed several two-fragment systems derived from a well characterized 48-mer that is known to follow a nucleation-driven folding mechanism 30. Each of these split 48-mers was analyzed to determine the extent to which the reassembly process was impacted by differential partitioning of the folding nucleus between the two fragments. Our results suggest that a balanced distribution of folding nuclei amino acids between protein fragments is essential for efficient reassembly; this result was corroborated by the behavior observed for the reassembly process of a second set of split proteins derived from a 64-mer model protein. Collectively, these results provide new insights into the thermodynamic and kinetic aspects underlying protein fragment complementation and should prove extremely useful in the forward design and engineering of new split proteins.
To explore protein fragment complementation experimentally, three two-protein fragment systems (N-split, Mid-split, and C-split) were created by splitting a model 48-mer protein, namely 48-1 (TSKRQQPYPMSLGSPFIRIPMIGPRPRMRLLILLMGYPKRGRSGGGLF) 31, in three different locations (Fig. 1). Folded structures and a detailed thermodynamic and kinetic characterization for the parental 48-1 model protein sequence can be found elsewhere 31,32,33. In the N-split case, the sequence was split near the N-terminus between amino acids 16 and 17, creating one 16-residue fragment and a second 32-residue fragment. In the Mid-split case, the sequence was split in the middle between residues 24 and 25, creating two equal-sized fragments. In the C-split case, the sequence was split C-terminally between residues 32 and 33, creating one 32-residue fragment and a second 16-residue fragment. The N- and C-split systems were made symmetric so that both contained fragments of the same lengths (i.e., each system has one 16-mer and one 32-mer fragment). This was done to eliminate any effect on folding due to variations in chain size, since it was unclear at the outset how this might impact reassembly.
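The three split systems described above can be reproduced directly from the 48-1 sequence. The following is a minimal sketch; the names `SEQ_48_1`, `SPLIT_SITES`, and `split_sequence` are illustrative and do not come from the original work.

```python
# Sequence of the 48-1 model protein, copied from the text.
SEQ_48_1 = "TSKRQQPYPMSLGSPFIRIPMIGPRPRMRLLILLMGYPKRGRSGGGLF"

# The cut falls immediately after this residue (1-based numbering, as in the text).
SPLIT_SITES = {"N-split": 16, "Mid-split": 24, "C-split": 32}


def split_sequence(seq, cut_after):
    """Return the two fragments obtained by cutting after residue `cut_after`."""
    if not 0 < cut_after < len(seq):
        raise ValueError("cut must fall strictly inside the chain")
    return seq[:cut_after], seq[cut_after:]


# Hypothetical helper structure: one (first, second) fragment pair per system.
fragments = {name: split_sequence(SEQ_48_1, cut) for name, cut in SPLIT_SITES.items()}

for name, (first, second) in fragments.items():
    print(f"{name}: {len(first)}-mer {first} + {len(second)}-mer {second}")
```

The residues flanking each cut (F16/I17, P24/R25, and I32/L33) match the split pairs listed in Table 1.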
To model the folding process, we adopted an on-lattice minimalist protein model in which the configuration of each protein chain evolves according to a canonical Monte Carlo (MC) algorithm 34. Briefly, space was discretized into a three-dimensional cubic lattice. Proteins were represented as self-avoiding chains, where each bead represents an amino acid with the bonds between the amino acids having uniform length equal to the lattice spacing (σ). Amino acid interactions were simulated by a Miyazawa-Jernigan contact energy potential 35 that takes into account implicit solvent effects and side-chain character. Conformational sampling was performed through a set of MC moves based on the Verdier-Stockmayer algorithm that mimics the diffusive movement of the amino acids during the folding process and includes 1), tail moves of one of the end beads to one of the available four neighboring sites; 2), corner flips for beads characterized by a right angle between directions to both contour neighbors; and 3), crankshaft moves of bead pairs located at the bottom of a U-turn 36. Relative to Verdier-Stockmayer moves, translation of a randomly selected chain was attempted after each MC step with a priori probability ≤10⁻⁴, consisting of adding either +1 or −1 (randomly chosen) to a random axis coordinate of all segment positions. Although this choice of translational move probability has no impact on thermodynamic averages, it affects the apparent kinetic dynamics of the system; for this reason, we only considered relative comparisons of real time kinetics between simulated dynamics for the 48-mer and the split systems 37.
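To make the move set more concrete, the sketch below implements one of the three local moves (the corner flip) together with a Metropolis acceptance test. It is only an illustration under stated assumptions: the text says the sampling is canonical MC but does not spell out the acceptance rule, so the Metropolis criterion, the reduced units (k_B = 1), and all function names here are assumptions, and occupancy is checked against a single chain only.

```python
import math
import random


def is_corner(prev, cur, nxt):
    """True if the bonds prev->cur and cur->nxt are perpendicular (right angle)."""
    b1 = tuple(c - p for p, c in zip(prev, cur))
    b2 = tuple(n - c for c, n in zip(cur, nxt))
    return sum(x * y for x, y in zip(b1, b2)) == 0


def corner_flip(chain, i):
    """Propose a corner flip of interior bead i.

    `chain` is a list of integer (x, y, z) tuples. Returns the new chain,
    or None if bead i is not a corner or the target site is occupied.
    """
    if i <= 0 or i >= len(chain) - 1:
        return None
    prev, cur, nxt = chain[i - 1], chain[i], chain[i + 1]
    if not is_corner(prev, cur, nxt):
        return None
    # The flipped bead lands on the opposite corner of the unit square.
    new = tuple(p + n - c for p, c, n in zip(prev, cur, nxt))
    if new in chain:  # self-avoidance (this chain only; assumption)
        return None
    return chain[:i] + [new] + chain[i + 1:]


def metropolis_accept(delta_e, temperature, rng=random):
    """Metropolis criterion in reduced units (k_B = 1); assumed, not from the text."""
    if delta_e <= 0:
        return True
    return rng.random() < math.exp(-delta_e / temperature)


bent = [(0, 0, 0), (1, 0, 0), (1, 1, 0)]
print(corner_flip(bent, 1))
```

In a two-chain simulation the occupancy check would also have to include the sites of the other fragment.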
To capture the specific chain topology of the folded state, two main parameters were used: the native energy (Enat), which records the sum of the energies of all interresidue contacts, and the similarity parameter (Q), which represents the number of native contacts formed divided by the total number of native contacts that describes the folded structure of each system 38. According to this convention, Q=1 represents the native (folded) conformation and Q=0 represents the highly extended (unfolded) protein. As previously reported, the configuration corresponding to the folded state of the 48-mer structure was distinguished among all other visited configurations by the formation of 57 native contacts and a minimum energy value of −20.24 kBT 31,32,33.
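The similarity parameter Q can be computed from lattice coordinates as sketched below for a single chain. The helper names and the toy 6-bead conformations are hypothetical; for a split system, the pair of residues flanking the cut would additionally have to be allowed as a contact, which this sketch does not do.

```python
def contacts(chain):
    """Set of index pairs (i, j), j > i + 1, whose beads occupy adjacent lattice sites."""
    found = set()
    for i in range(len(chain)):
        for j in range(i + 2, len(chain)):
            # Manhattan distance of 1 means nearest neighbors on a cubic lattice.
            dist = sum(abs(a - b) for a, b in zip(chain[i], chain[j]))
            if dist == 1:
                found.add((i, j))
    return found


def similarity_q(chain, native_contacts):
    """Fraction of native contacts present in `chain`."""
    return len(contacts(chain) & native_contacts) / len(native_contacts)


# Toy example (not the 48-mer): a compact 6-bead "native" fold.
native = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 2, 0), (1, 2, 0)]
native_set = contacts(native)

partly_folded = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0), (0, 2, 0), (0, 3, 0)]
extended = [(k, 0, 0) for k in range(6)]

print(similarity_q(native, native_set), similarity_q(partly_folded, native_set),
      similarity_q(extended, native_set))
```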
To make the association event more likely to occur without unduly constraining the conformations of the individual chains, space was restricted to a cubic box of 12σ length units, corresponding to a volume fraction of chains of ∼3%. However, given the small size (3×4×4) of the folded structure, the small size of each chain (16–32 residues), and the small number of chains involved (two), this spatial restriction was closer to a diluted regime, since the chains had plenty of free space to move. It is also worth noting that by encaging the system, we essentially disregarded the diffusion process that needs to occur before the two chains come near each other; instead, we consider a restricted open space where the local environment was crowded enough to allow for interchain interactions but precluded the chains separating to an infinite distance.
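As a rough check on the ∼3% figure, the volume fraction can be estimated by assuming that each residue occupies exactly one lattice site of volume σ³ and that the two fragments together contribute 48 beads. Both are assumptions about how the value was obtained, not details stated in the text.

```latex
\phi = \frac{N_{\text{beads}}\,\sigma^{3}}{L^{3}}
     = \frac{48\,\sigma^{3}}{(12\,\sigma)^{3}}
     = \frac{48}{1728} \approx 0.028 \quad (\sim 3\%)
```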
To collect kinetic data, simulations were run up to the point where the native structure of the system was observed for the first time, and this time was recorded as the folding time. In the case where no folding was observed, simulations were run for a maximum of 5×10⁸ MC steps. Data from each simulation were obtained by taking the mean folding time (MFT) values over 500 independent runs in the canonical ensemble, each one starting from a different unfolded structure (Q≤0.2). Results were determined to be statistically invariant, since the data were not significantly affected when additional runs beyond 500 were included in each simulation.
The thermodynamics of the single and multichain systems were studied by employing replica exchange MC (REMC) sampling 36,39 combined with the multihistogram reweighting method (MHR) 40. REMC was used to alleviate problems related to the sampling of a rugged free-energy landscape, in which the polypeptide chains could be temporarily trapped at low temperature. Protein folding was simulated by running several parallel replicas (M), each at a different temperature (Ti). The reduced temperature, T, was normalized by the reference temperature, To, such that kBTo represented the energy unit pertinent to the system. Relative to Verdier-Stockmayer and translation moves, swap moves between systems of different temperatures were attempted after each MC step with a probability ≤0.05. In most calculations, the number of replicas was 9, with T ranging between 0.1 and 0.5. Details of the thermodynamic analysis are given in 32. By using the REMC-MHR method, data from all replicas were combined and analyzed, minimizing the error in the estimation of the density of state function [Ω(E)] and facilitating the calculation of thermodynamic quantities over a wide range of temperatures, such as the specific heat (Cv) via Eq. (1), and free energy via Eq. (2):
![]() | (1) |
![]() | (2) |
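The replica exchange step described above can be illustrated with the standard parallel-tempering swap criterion. This is a sketch rather than the authors' implementation: the acceptance rule shown is the textbook form, the reduced units (k_B = 1) are assumed, and the evenly spaced temperature ladder is an assumption, since the text gives only the number of replicas (9) and the range (0.1 to 0.5).

```python
import math


def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Probability of accepting a configuration swap between replicas i and j.

    Standard criterion: min(1, exp[(beta_i - beta_j) * (E_i - E_j)]),
    with beta = 1 / T in reduced units (k_B = 1).
    """
    delta = (1.0 / temp_i - 1.0 / temp_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0 else math.exp(delta)


# Assumed ladder: 9 evenly spaced reduced temperatures from 0.1 to 0.5.
N_REPLICAS = 9
temperatures = [0.1 + 0.05 * k for k in range(N_REPLICAS)]

# A cold replica holding a high-energy state always swaps with a warmer,
# lower-energy neighbor; the reverse swap is strongly suppressed.
print(swap_probability(-5.0, 0.1, -10.0, 0.2))
print(swap_probability(-10.0, 0.1, -5.0, 0.2))
```

In the scheme described in the text, such swaps would be attempted only occasionally (with probability ≤0.05 per MC step) between the ordinary chain moves.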
For this study, we chose the model 48-mer protein, 48-1, because its thermodynamic behavior, folding pathway, and transition state have been characterized in detail 31,32,33. The 48-1 sequence was originally designed by Shakhnovich and co-workers to model a well designed sequence that exhibits a stable, fast-folding structure and an all-or-none transition between clearly distinguishable native and unfolded states 31. To generate split lattice model proteins, we dissected the 48-1 sequence at three positions: between residues 16 and 17 (N-split), 24 and 25 (Mid-split), and 32 and 33 (C-split) (Figure 1a). The minimum-energy folded structure recovered from a large MC simulation for each of the N-, Mid-, and C-split systems (Figure 1b) was identical to that reached by the unsplit 48-1 chain (data not shown). However, whereas unsplit 48-1 was characterized by 57 native contacts, the folded state for all split cases was characterized by 58 native contacts, since the additional contact lost upon the excision of the full chain needed to reform between the last amino acid of the first fragment and the first amino acid of the second fragment. Additionally, as a result of this new native contact, the energy values for the N-, Mid-, and C-split systems were −20.43, −20.65, and −20.62kBT, respectively, compared to −20.24kBT for the unsplit 48-mer. It is also worth noting that the split sites for the N-, Mid-, and C-split systems were involved in five, three, and two total native contacts (including the split pair), respectively, that contributed locally to ∼8%, 5%, and 4%, respectively, of the total native energy.
More recently, it was shown that the 48-1 protein folds according to a classical nucleation mechanism, whereby a core of native contacts forms at an early stage of the process and causes the protein to rapidly collapse to more compact nativelike conformations that lead to the fast rearrangement of its residues into the final folded structure 34. These same authors reported that the nucleus was composed of several mostly hydrophobic amino acids that have >60% probability of forming native contacts in the transition-state intermediates; these residues (residues 13, 16, 17, 19–24, 26–31, and 34–47 in Figure 1a) form a core at the center of the folded structure. It is important to note that in the Mid- and C-split cases, folding nuclei residues are well distributed between fragments and participate in a significant number of interchain native contacts (InterC) as seen in Figure 1cd. In contrast, for the N-split case, the folding nuclei residues are disproportionately distributed between fragments and none of these are involved in interchain native contacts (Figure 1cd).
The effect of splitting on thermodynamics was studied by determining the transition temperature (Tmax) for the unsplit 48-1 and each multichain system. A plot of heat capacity as a function of temperature revealed a single, strong peak corresponding to the folding temperature (Tmax) for the 48-1, N-, Mid-, and C-split systems (Fig. 2), indicating a single-phase conformational transition. Relative to the single 48-mer chain, all of the two-fragment systems exhibited lower folding temperatures. Normalized transition temperatures were found to be Tmax/Tf=1 for the 48-mer, Tmax/Tf=0.956 for the C-split system, Tmax/Tf=0.937 for the Mid-split system, and Tmax/Tf=0.926 for the N-split system. Thus, whereas the unsplit 48-mer remained stable at a higher temperature, thermal denaturation occurred at lower temperatures when protein folding was reconstituted from multiple fragments. These data also suggest that thermal denaturation was dependent on the choice of split site, as evidenced by the difference in folding temperatures between the entirely symmetric N- and C-split systems.
Whereas we did not explicitly test the effect of protein concentration in this study, the decrease in thermal stability observed in the context of split fragments was consistent with the earlier observation that folding temperature decreased as the concentration of protein chains increased in a system designed to mimic protein aggregation 41. The observed decrease here was related to both 1), an increase in the frequency with which the protein's configurational energies were close to that of the unfolded state (Q≈0); and 2), a decrease in the frequency with which the multichain system explored nativelike configurations (Q≈1.0) during the folding process. For a more detailed analysis, let us assume a pseudoreaction of the following form for the unsplit 48-mer:
![]() | (3) |
![]() | (4) |
![]() | (5) |
![]() | (6) |
![]() | (7) |
![]() | (8) |
, since the number of molecules does not change upon folding (Δn=0) and the entropy is independent of the chain's center of mass. In contrast, the change of translational entropy upon folding for the split processes is given by (with Δn=−1 and assuming V≫1):
![]() | (9) |
is obtained; the fact that this change is always negative indicates that, relative to the folding process of the unsplit 48-mer, the folding process of the split systems results in an overall unfavorable entropic change (i.e.,
).
In addition to the entropic differences between the unsplit and split systems, the enthalpy change associated with the folding process (computed from the difference between the average configurational energy of the folded (EF) and unfolded (EU) states) of the split systems is also unfavorable relative to the enthalpy change associated with the folding of the single 48-mer chain. In this case, ΔE=EF−EU increases for the split proteins because the energy of the unfolded state decreases with the number of protein fragments. The lower energy of the unfolded state in split systems can be rationalized by the fact that protein fragmentation allows more freedom for some favorable contacts to form that are not able to form in the unsplit 48-mer (where all amino acids are connected). As shown in Fig. 3, the multichain system can sample configurations around the unfolded state for a range of energies that are not available for the unsplit system. In these plots, free energy landscapes for the unsplit and split chains are projected over the plane of native energy and the fractional nativeness. Note that the configurational energy refers to the total energy of the system (i.e., the sum of the configurational energies of chain 1 and chain 2, and that between the two chains). If we assume that the folded state has essentially the same average energy (EF) for the unsplit 48-mer and split systems, the difference in energy between these two processes is always positive, as shown below:
![]() | (10) |
The thermodynamic destabilization of the assembled split chains is also reflected by their higher free energies (ΔA) relative to the free energies observed in the case of the unsplit 48-mer (Fig. 3). ΔA is defined as the difference in free energy change between the folded state (AF) and the unfolded state (AU), i.e., ΔA=AF−AU. Using Eq. (10), the difference in free energy changes between the unsplit 48-mer and the split-chain systems can be found by Eq. (11) (i.e., Eq. (13)−Eq. (12)):
![]() | (11) |
![]() | (12) |
![]() | (13) |
To determine the effect of temperature on the relative folding kinetics of the different split protein systems, we calculated the mean folding time for the 48-mer and N-, Mid-, and C-split systems over a wide range of temperatures. The optimum temperature (Topt), defined as the temperature at which a given system folds fastest, was ∼0.23 for the 48-mer, 0.22 for N-split, 0.23 for Mid-split, and 0.22 for C-split (Fig. 4). The MFTs for the N- and Mid-split proteins were approximately three and two times slower, respectively, than that of the 48-mer at their corresponding Topts (Fig. 4). Importantly, the total number of independent runs where the native structure formed within the maximum simulation time (5×10⁸ MC steps) was 500 out of 500, or 100%, for each system. This percentage was defined as the folding frequency (FF). The apparent folding rate (AFR), defined as the ratio of FF to MFT at Topt, was determined to be 1.92×10⁻⁵ for N-split, 3.57×10⁻⁵ for Mid-split, and 6.37×10⁻⁵ for C-split.
The slower kinetics of fragment reassembly, relative to the folding of a single chain, is not entirely surprising. Intuitively, this can be partially explained by the fact that all the residues that need to come into contact to form the folded structure in a single chain are in closer proximity by virtue of their interconnectivity; this is strikingly different from the case of two unconnected chains, where residues that have to associate to enable the formation of native contacts can move independently in space. Thermodynamically, the increase in folding times for the split fragments relative to the folding time of the single 48-mer chain is also not surprising, since it can be argued that the reassembly of split fragments (represented by Eq. (4)) has a larger free-energy barrier (ΔA#=ATS−AU) and thus should be slower than the folding process for the unsplit 48-mer (represented by Eq. (3)). This conjecture follows if one assumes that the folding “transition state” (TS) is roughly independent of whether or not the protein is split, so that the “folded” state (F) on the right-hand sides of Eqs. (3) and (4) can be replaced by the TS. Although the assumption of TS isomorphism is not generally justified, since the TS should depend on the location of the splitting site, it is sensible to expect that the relative decrease of the free energy of the unfolded state (embodied by Eq. (13)) in any two-chain system will also tend to increase the barrier to folding (for the same underlying physical reasons).
Two aspects of the kinetic data shown in Fig. 4 are unexpected and intriguing: 1), the observation that a much smaller change in folding kinetics exists between the 48-mer and the C-split system (relative to the 48-mer and the other split systems), to the extent that there is no significant change in the folding times of these two systems at temperatures neighboring their respective Topts; and 2), the observation that at Topt the N-split folds 46% slower than the Mid-split and 70% slower than the C-split, despite the complete symmetry of the N- and C-split systems. These trends prevailed over most of the temperature range tested for each system. It is also worth noting that the fragmentation itself did not dramatically retard folding in the case of the C-split system. This can best be attributed to the spatial constrictions that were placed on this moderately confined system (3-D cage of size 12σ), where a crowded environment relative to open space was created to ensure association between the different fragments. Note that it has been previously shown that, relative to folding in open space, the folding kinetics of this particular unsplit 48-mer remain unchanged when confined within a cubic box of size >10σ unit length 32,33.
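The 46% and 70% figures appear to follow from the apparent folding rates quoted earlier if "slower" is read as the relative reduction in AFR; that reading is an interpretation, not something the text states explicitly. A short check under that assumption (the function and variable names are illustrative):

```python
# Apparent folding rates (AFR) at T_opt for the split systems, copied from the text.
AFR = {"N-split": 1.92e-5, "Mid-split": 3.57e-5, "C-split": 6.37e-5}


def percent_slower(rate_slow, rate_fast):
    """Relative reduction of `rate_slow` with respect to `rate_fast`, in percent."""
    return 100.0 * (1.0 - rate_slow / rate_fast)


n_vs_mid = percent_slower(AFR["N-split"], AFR["Mid-split"])
n_vs_c = percent_slower(AFR["N-split"], AFR["C-split"])

# Because FF is 100% for every system, AFR = FF / MFT implies that the ratio
# of mean folding times is simply the inverse ratio of the AFRs.
mft_ratio_n_to_c = AFR["C-split"] / AFR["N-split"]

print(f"N-split vs Mid-split: {n_vs_mid:.1f}% slower")
print(f"N-split vs C-split:   {n_vs_c:.1f}% slower")
print(f"MFT(N-split) / MFT(C-split) = {mft_ratio_n_to_c:.2f}")
```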
The differences in folding kinetics can be rationalized thermodynamically by comparing the differences in free-energy barriers observed between the 48-mer and the different split proteins. For instance, the similarity in folding kinetics between the unsplit 48-mer and the C-split system is reflected in Fig. 3. These data show that, although ΔA# is larger for the C-split than for the unsplit system, these two systems display approximately the same TS dividing surface. Likewise, the much slower folding kinetics of the Mid-split and especially the N-split system relative to the unsplit 48-mer is reflected by the displacement of the TS toward the folded state (i.e., toward states of lower configurational energy, which are more difficult to access). The shift in the transition-state dividing surface observed for the N- and Mid-split systems, but not the C-split, indicates that the reassembly of these systems takes place via a different folding mechanism that appears to be slower. Collectively, our kinetic data and thermodynamic analysis of free energies suggest that in this confined system, the degree of retardation observed as a result of having two separate fragments is modulated by the location of the splitting site with respect to the folding nucleus.
To further explore the differences underlying the observed trends in MFTs, we plotted the free-energy landscape of the 48-mer, N-, Mid-, and C-split systems at their respective Tmax as a function of the total contact energy and the similarity parameter Q (Fig. 5). In the case of the split-fragment systems, the parameter Q included native contacts that formed within the same chain (intrachain) as well as those formed between different chains (interchain) (Fig. 1, IntraC and InterC, respectively). The free energy was obtained from Eq. (2). Consistent with previous work, the 48-mer exhibited two free-energy minima corresponding to the unfolded (high energy, Q≈0) and folded (low energy, Q≈1) states that were connected by a relatively narrow passage wherein the transition state was identified as a saddle point (Figure 5a). The narrowness of the connecting region between the unfolded and folded states was characteristic of well designed proteins that exhibit a minimum number of misfolded (i.e., low-energy, low-Q-structure) states 42.
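A two-dimensional free-energy landscape of this kind can be approximated from sampled (energy, Q) pairs. The sketch below uses the generic relation A = −T ln P at a single temperature in reduced units (k_B = 1); this is an assumption for illustration and is simpler than the REMC-MHR procedure and Eq. (2) used in the text. The binned samples are made up.

```python
import math
from collections import Counter


def free_energy_surface(samples, temperature):
    """Map each visited (energy_bin, q_bin) state to A = -T ln P, shifted so min(A) = 0.

    `samples` is an iterable of already-binned, hashable (energy_bin, q_bin) pairs.
    """
    counts = Counter(samples)
    total = sum(counts.values())
    raw = {state: -temperature * math.log(n / total) for state, n in counts.items()}
    lowest = min(raw.values())
    return {state: a - lowest for state, a in raw.items()}


# Hypothetical binned trajectory: (energy bin, number of native contacts).
toy_samples = [(0, 0)] * 6 + [(-20, 57)] * 3 + [(-10, 28)] * 1
surface = free_energy_surface(toy_samples, temperature=0.2)

for state, a in sorted(surface.items()):
    print(state, round(a, 3))
```

States that are visited rarely (such as the intermediate bin in this toy data) receive a high free energy, which is how a saddle region between two minima would show up.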
The fact that the same lowest energy configuration state was observed in all the landscapes confirmed that all systems shared the same folded state (Table 1). Moreover, since the additional contact observed in the split systems was favorable, the total configurational energy of these systems decreased with respect to the unsplit 48-mer. It is also important to stress that this folded state remained unique and was only achieved by the reassembly of the two chains; this is implicitly suggested by Figure 5bd, where only one low-energy state with a large number of native contacts was observed. The absence of multiple local energy minima in a region of a large number of native contacts supports the observation that single fragments by themselves remained unstructured and high in energy relative to the state they formed upon assembly. These differences separated these landscapes from those observed in a multichain aggregation system 41, where the appearance of low-energy/high-Q states suggested that each chain folded independently and that the formation of interprotein contacts only inhibited their separate folding process and resulted in aggregated, high-energy/low-Q states.
| Table 1 Native contact pairs observed in the 48-mer folded structure |
| NC pair (i, j) | NC pair (i, j) | ||||||
|---|---|---|---|---|---|---|---|
| NC pair code | i | j | NC pair code | i | j | ||
| 1 | T1 | R4 | 31* | I19 | M28 | ||
| 2 | T1 | R40 | 32* | I19 | L30 | ||
| 3 | T1 | R42 | 33* | I19 | L34 | ||
| 4 | S2 | K39 | 34* | P20 | R27 | ||
| 5 | K3 | Q6 | 35* | P20 | M35 | ||
| 6 | K3 | Y8 | 36* | P20 | Y37 | ||
| 7 | R4 | P9 | 37* | M21 | P24 | ||
| 8 | R4 | P15 | 38* | M21 | P26 | ||
| 9 | Q5 | F16 | 39* | M21 | L30 | ||
| 10 | Q5 | R18 | 40 | I22 | L31 | ||
| 11 | Q5 | P20 | 41* | I22 | M35 | ||
| 12 | Q5 | R40 | 42 | I22 | L47 | ||
| 13 | Q6 | R27 | 43* | G23 | G36 | ||
| 14 | Q6 | K39 | 44 | G23 | G46 | ||
| 15 | P7 | R18 | 45* | P24 | Y37 | ||
| 16 | P7 | M28 | 46 | R25 | P38 | ||
| 17 | P9 | R18 | 47* | P26 | R29 | ||
| 18 | M10 | P15 | 48 | R27 | P38 | ||
| 19 | M10 | I17 | 49 | L31 | L34 | ||
| 20 | S11 | S14 | 50 | I32 | L47 | ||
| 21 | L12 | I17 | 51 | L33 | F48 | ||
| 22 | L12 | L33 | 52 | M35 | F48 | ||
| 23* | G13 | F16 | 53 | G36 | G41 | ||
| 24 | G13 | G44 | 54 | G36 | G45 | ||
| 25 | G13 | F48 | 55 | Y37 | R40 | ||
| 26 | S14 | S43 | 56 | G41 | G44 | ||
| 27 | P15 | R42 | 57 | G45 | F48 | ||
| 28 | F16 | M35 | 58† (N-split) | F16 | I17 | ||
| 29 | F16 | G41 | 58† (Mid-split) | P24 | R25 | ||
| 30* | I17 | L34 | 58† (C-split) | I32 | L33 | ||
| All native contacts (NC) found in the folded structure are listed. The pair code numbers for NCs (left column) correspond to the same numbering (1–58) used in Fig. 7 to represent NCs. The NC pair (i, j) describes an interaction between amino acids i and j, where i and j indicate the type of amino acid and its position in the 48-mer sequence (e.g., pair 1 describes an interaction between the threonine found at position 1 (T1) and the arginine found at position 4 (R4) in the unsplit 48-mer sequence). |
| * NCs that form the critical folding nuclei (listed in Table 2). † One extra contact describes the folded structure of the split systems as a result of the additional link that needs to form at the site of fragmentation. |
One striking difference between the folding landscape of the 48-mer (Figure 5a) and those of the split proteins (Figure 5, b–d) was that, in the split proteins, the free-energy minimum region neighboring the unfolded state spread across a wider range of low Q values, closer to the transition state region of the parent 48-mer protein. This observation was significant, since the extent to which this low-energy, misfolded (low energy/low Q) region was amplified directly correlated with the retardation observed in the kinetics of the reassembly process. That is, whereas the free-energy landscape of the 48-mer did not change significantly when splitting the protein C-terminally (Figure 5a versus d), a much more diffusive (i.e., broad and rough) passage from the unfolded to the folded state resulted when splitting the 48-mer near its N-terminus (compare Figure 5, a and b). These data suggest that the efficiency of the reassembly process was decreased by the entrapment of protein fragments in misfolded configurations. Given that slower folding kinetics and a diffusive free-energy landscape were observed for the N-split relative to the C-split system, we hypothesized that the shared distribution of critical core residues between the two fragments is essential for efficient reassembly. This hypothesis is supported by the observation that the distribution pattern of critical core residues is the primary difference between the N-split and C-split fragments.
The inefficiency in folding observed for the N-split relative to other systems could have resulted from lack of association between the two fragments (i.e., the fragments never came together) or, if they did associate, from an inability of the fragments to form productive interactions. Since the parameter Q includes both interchain and intrachain native contacts, the free-energy landscapes shown in Fig. 5 do not distinguish between misfolded configurations caused by unproductive interactions between the two fragments and those caused by unproductive interactions within individual fragments. To decouple these effects, we plotted contours of the number of interchain contacts as a function of the similarity parameter, Q, for the N-, Mid-, and C-split systems at their respective Tmax (see Fig. 1 in Supplementary Material). Two highly populated regions were observed in these landscapes. The first region, representing a large number of interchain contacts neighboring the folded state (high InterC, high Q), confirmed that access to the folded state was highly dependent on associations between chains. The second region represented a significant (but not high) number of interchain contacts neighboring the unfolded state (mid-InterC, low Q) and was much more populated for the N-split than for the Mid- and C-split systems. This observation suggests that although associations between fragments occurred for all the systems, these associations were less likely in the N-split case to result in productive interactions that would lead to the folded state. Taken together, these data support the notion that the efficiency of protein reassembly depends to a great extent on the site at which the protein is split.
Given that the formation of the critical nucleus is key for folding efficiency in the case of a classical nucleation folding mechanism, as is the case for the 48-mer 32, we next analyzed how the dissection of amino acids in the nucleus upon protein fragmentation affected reassembly and folding. Specifically, we plotted landscapes of the critical core residues (Table 2 and Figure 1d) as a function of the total number of native contacts (Q). In the N-split case, a region with a high number of critical contacts and a low number of total native contacts was observed (Figure 6a), but not in the case of the Mid- or C-split proteins (Figure 6, b and c). These data indicate that the more difficult transition to the folded state observed for the N-split protein stems from the formation of the full core in a single chain, which trapped the system in a region of highly misfolded states. Further analysis of folding “snapshots” of the N-split system during a typical folding trajectory suggests that intrachain formation of the core leads to preassembly of the largest fragment (chain 2) into a semistable structure that prevents the efficient incorporation of the smallest chain (chain 1) (Figure 6a). This type of isolated preassembled structure was clearly observed in the snapshots (Figure 6a, i and ii), where these chains exhibited minimum association with each other. In stark contrast, the sharing of the core between the two fragments in the Mid- and C-split systems resulted in transition-state structures of highly interacting fragments that more readily formed the rest of the native contacts, leading to efficient assembly of the folded structure (Figure 6, b and c). However, although these structural configurations were characterized by the formation of interchain native contacts, the part of the fragments that was away from the contact point between the two chains remained highly extended.
The structural patterns reflected in these snapshots were repeatedly observed throughout the 10–15 sets of data that we analyzed for each system (data not shown).
| Table 2 Distribution of native contacts forming the folding nuclei in split proteins |
| NC pair i | NC pair j | N-split InterC | N-split IntraC | Mid-split InterC | Mid-split IntraC | C-split InterC | C-split IntraC |
|---|---|---|---|---|---|---|---|
| P20 | M35 | | 2 | X | | X | |
| M21 | P24 | | 2 | | 1 | | 1 |
| I19 | L34 | | 2 | X | | X | |
| P20 | Y37 | | 2 | X | | X | |
| I19 | L30 | | 2 | X | | | 1 |
| I22 | M35 | | 2 | X | | X | |
| G23 | G36 | | 2 | X | | X | |
| P20 | R27 | | 2 | X | | | 1 |
| M21 | L30 | | 2 | X | | | 1 |
| M21 | P26 | | 2 | X | | | 1 |
| I19 | M28 | | 2 | X | | | 1 |
| G13 | F16 | | 1 | | 1 | | 1 |
| P24 | Y37 | | 2 | X | | X | |
| I17 | L34 | | 2 | X | | X | |
| P26 | R29 | | 2 | | 2 | | 1 |
| Most probable native contacts found in the transition-state ensemble (i.e., the folding core/nuclei) for folding of the 48-mer sequence at Tf=0.27 are listed in order of decreasing probability. The NC pair (i, j) describes the interacting pair, where i and j entries indicate the type of amino acid and its position in the unsplit 48-mer sequence. The distribution of these contacts in all the split proteins is marked as follows: interchain contacts (InterC), which involve interacting residues from both chains are marked by an “X” and intrachain contacts (IntraC), which involve interacting residues within the same chain, are marked by a number (1 or 2) that specifies the fragment where the interaction takes place. Fragments 1 and 2 for each system correspond to those illustrated in Fig. 1. |
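The InterC/IntraC marking in Table 2 follows directly from the split position. The sketch below illustrates the rule, assuming the split sites implied by contact 58 in Table 1 (between residues 16 and 17, 24 and 25, and 32 and 33 for the N-, Mid-, and C-split systems, respectively). The names `SPLIT_AFTER` and `classify_contact` are hypothetical helpers introduced for illustration.

```python
# Last residue of fragment 1 in each split system, read from contact 58 in Table 1.
SPLIT_AFTER = {"N-split": 16, "Mid-split": 24, "C-split": 32}

def classify_contact(i, j, split_after):
    """Return 'X' for an interchain contact, or '1'/'2' for the fragment
    that contains both residues of an intrachain contact."""
    frag_i = 1 if i <= split_after else 2
    frag_j = 1 if j <= split_after else 2
    return "X" if frag_i != frag_j else str(frag_i)

# Sequence positions of a few nucleus contacts taken from Table 2.
nucleus_contacts = [(20, 35), (21, 24), (19, 30), (13, 16), (26, 29)]
for i, j in nucleus_contacts:
    row = {name: classify_contact(i, j, cut) for name, cut in SPLIT_AFTER.items()}
    print((i, j), row)
```

For example, the pair P20–M35 lies entirely in fragment 2 of the N-split system but spans both fragments in the Mid- and C-split systems, matching the first row of the table.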
The simple thermodynamic model presented above (see Eqs. (5)) was used to rationalize the differences in behavior between the case where one of the two chains preassembles (such as the N-split case) and the case where both chains exhibit more cooperative folding behavior (such as the C-split case). For this analysis, we assume that the “unfolded” state is the one in which the two chain fragments have already collapsed or associated, if strongly inclined to do so. Based on the typical snapshots analyzed for the folding trajectory of the N-split case (Figure 6a), we assume that in the unfolded state, chain 1 (the small chain) has an open conformation (with ΔSConf/kB=−N1), chain 2 is prefolded (with ΔSConf→0), and the two chains tend to be separate (with ΔSTrans/kB=−ξ−lnV); in this case, the total (conformational and translational) entropy can be described as ΔSN-split/kB=−N1−ξ−lnV. Consistent with Fig. 6, for the C-split case, we assume that in the unfolded state both chains are not collapsed but tend to be associated (with ΔSTrans→0); in this case, the total entropy is purely conformational and can be described as ΔSC-split=−(N1+N2)kB. Given these expressions for entropy, the free-energy changes upon folding, for the N-split and C-split cases, can be described as
![]() | (14) |
![]() | (15) |
![]() | (16) |
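As a sketch of how the two entropy expressions given above enter this comparison, assume the standard relation ΔA = ΔE − TΔS, with ΔE the change in configurational energy upon folding. The exact form and sign conventions of Eqs. 14–16 may differ from what is written here.

```latex
\begin{align}
\Delta A &= \Delta E - T\,\Delta S \\
\Delta A_{\text{N-split}} &= \Delta E_{\text{N-split}} + k_B T\,(N_1 + \xi + \ln V) \\
\Delta A_{\text{C-split}} &= \Delta E_{\text{C-split}} + k_B T\,(N_1 + N_2) \\
\Delta A_{\text{N-split}} - \Delta A_{\text{C-split}} &=
  (\Delta E_{\text{N-split}} - \Delta E_{\text{C-split}})
  + k_B T\,(\xi + \ln V) - k_B T\,N_2
\end{align}
```

Under this assumption the difference has three contributions: an energetic term, a translational term from bringing separated chains together, and a conformational term, −kBTN2, that reflects the prefolding of chain 2.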
Note that this result is consistent with Fig. 3, where we observed that average unfolded-state configurational energies were lower in the case of the N-split than in the case of the C-split system. Additionally, since our simulation results showed that the folded N-split protein was less stable than the folded C-split protein, we conclude that ΔAN-split>ΔAC-split. Based on these results, the righthand side of Eq. (16) must be positive. In this case, it appears that the first two (positive) terms in the lefthand side of Eq. (16) dominate, so that ΔAN-split>ΔAC-split. It is important to note that this result indicates that the driving force for folding is smaller for the N-split system than for the C-split system. Note, however, the nontrivial interplay of the interactions: 1), the prefolding of a chain fragment favors folding on entropic grounds (since the unfolded states start at lower entropies, i.e., more ordered) but disfavors folding on energetic grounds (since unfolded states are found at lower energies, i.e., closer to the folded state); and 2), the interchain association favors folding on entropic grounds (by reducing translational entropy) but may disfavor it if the associated (unfolded) states are found at very low energies.
To obtain insight into the mechanism by which the two fragments assemble, we examined the order in which all native contacts formed over 500 different folding trajectories for each split system. It was observed that the first native contacts to form (i.e., the ones with longer contact waiting time, τf) are those corresponding to the critical core (Fig. 7). Although a precise folding mechanism for the split fragments cannot be inferred from these results alone (i.e., specific transition states are not identified), these data indicate that 1), the same critical core (Table 1) of native contacts seen for the parent protein forms even in the cases when the protein is split; and 2), early formation of this set of native contacts is critical to the folding pathway of the split fragments.
Inspection of these data also suggests that the assembly mechanism of the N-split differs significantly from that of the Mid- and C-split cases. For instance, two separate stages were observed in the reassembly process of the N-split protein (Figure 7a). During the first stage (at longer τf), a set of critical native contacts preassembled in the longer chain (chain 2), whereas the smaller chain (chain 1) remained completely unincorporated (no interchain contacts were formed) and unfolded (no native contacts were observed). Then, during a later stage (at shorter τf), the folding process was completed when the smaller chain was incorporated into this already preassembled structure to form the rest of the native contacts. It is important to note that the coassembly stage did not take place until a long time (relative to the total folding time) after the folding process had started. A much different folding process, closer to the one observed for the unsplit protein, was observed for the Mid- and C-split cases. In these systems, both chains coassembled from the beginning of the folding process and jointly proceeded to the folded state. The fact that folding for the N-split system was significantly inhibited (relative to the parent 48-mer protein and to the other two split protein systems) further supports the notion that folding is less efficient when one of the fragments (i.e., the nuclei-containing fragment) folds on its own. The mechanistic insight obtained by this analysis is consistent with our interpretation of the folding landscapes and snapshots shown in Fig. 6.
It is worth noting that in all the split protein cases, contact 58 (where each protein is split; see Table 1) was one of the very last native contacts to form in the folding process, as reflected by the very short τf associated with its formation (Fig. 7). Interestingly, all other native contacts that were locally affected upon protein fragmentation in each system also formed at relatively short τf, toward the very end of the folding process; these contacts included pair codes 9, 23, 28, and 29, pair codes 45 and 37, and pair code 50 for the N-, Mid-, and C-split systems, respectively (Table 1). Additionally, although contact 28 was one of the last to form in the N-split system, this contact was the first to form in both the Mid- and C-split systems. Most noteworthy are the observations that reattachment at (or near) the split site occurred late in all the split folding processes and that formation of interchain nuclei contacts occurred early in the cases of productive folding (i.e., the Mid- and C-split cases). This confirmed that efficient folding depends on the early “gluing” of the fragments specifically by the early interchain formation of folding nuclei contacts. Furthermore, productive folding appears to be independent of the early reconstitution of the original full-length 48-mer sequence, by reattachment of the fragments at the site where they were split.
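The ordering analysis described above can be illustrated with a short sketch. The definition of τf is not restated in this section, so the code assumes τf is the number of steps between the formation of a contact and the completion of folding, which is consistent with early contacts having long τf. The data layout and function names are hypothetical.

```python
from collections import defaultdict

def mean_waiting_times(trajectories):
    """Average tau_f per contact over a set of folding trajectories.

    Each trajectory is a pair (folding_step, {contact_code: formation_step}).
    Assumption: tau_f = folding_step - formation_step.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for folding_step, formed_at in trajectories:
        for code, step in formed_at.items():
            totals[code] += folding_step - step
            counts[code] += 1
    return {code: totals[code] / counts[code] for code in totals}

def formation_order(trajectories):
    """Contact codes sorted from earliest-forming (longest tau_f) to latest."""
    tau = mean_waiting_times(trajectories)
    return sorted(tau, key=tau.get, reverse=True)

# Toy numbers for illustration only; these are not simulation results.
toy = [(1000, {35: 100, 58: 990}), (2000, {35: 400, 58: 1900})]
print(formation_order(toy))
```

With this convention, a nucleus contact that forms early appears at the front of the ordering, and the split-site contact (code 58) appears at the end.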
To test whether a shared folding nucleus contributed to the reassembly efficiency of proteins other than the 48-mer, we analyzed a model 64-mer 41,43,44. It is important to note that, like the 48-mer, this 64-mer also folds according to a classical nucleation mechanism where the core of critical native contacts that forms at an early stage of the folding process is composed of residues 2, 3, and 24–37, which have >90% probability of forming native contacts in the transition-state intermediates 34. Also noteworthy is that in contrast to the folding nucleus of the 48-mer, the amino acid composition of the folding core of the 64-mer is only 50% hydrophobic, and its location is on the side (as opposed to the center) of the folded structure. Additionally, given the larger size of this sequence relative to the 48-mer, it exhibits a more complex and therefore slower pattern of folding, where 81 native contacts characterize the folded structure.
To evaluate the importance of the folding nucleus in the reconstitution of a split 64-mer, two symmetric two-fragment systems, each containing a 27-mer and a 37-mer fragment, were derived (Fig. 8). N-split64 was derived by splitting the 64-mer near the N-terminus of the sequence between residues 27 and 28, whereas C-split64 was derived by splitting the sequence toward the C-terminal end of the sequence between residues 37 and 38. The additional native contact that restores the amino acid connection lost upon excision in each fragmentation case changes the native energy corresponding to the parent 64-mer from −30.13kBT to −29.93kBT and −30.22kBT for the N-split64 and C-split64 systems, respectively. It is important to note that all of the 13 native core contacts of C-split64 form within the larger of the two chains, whereas 8 out of 13 core contacts (>60%) of N-split64 form between the two fragments, and only 5 out of 13 core contacts form within a single chain (two contacts in the shorter chain and three contacts in the longer chain; see Supplementary Material, Table 1S).
Given the distribution of core contacts for the N- and C-split64, we hypothesized that folding would be more efficient in the case of the N-split64 due to the higher number of interchain critical native contacts in this system relative to the C-split64. Indeed, the resulting MFT, calculated as the average over 100 simulations at T=0.22 (a temperature below the Tmax for the two systems) for protein reassembly within 5×10⁸ MC steps, was (3.10±0.48)×10⁸ for the N-split64 and (3.99±0.82)×10⁸ for the C-split64. Both split cases exhibited slower folding kinetics relative to that of the unsplit 64-mer (MFT=(1.38±0.08)×10⁸) at the same temperature. Moreover, the N-split64 protein was observed to reassemble in 66 out of 100 simulation trials (FF=66%) with an AFR of 2.13×10⁻⁷, whereas the C-split64 protein only reassembled 59 times out of 100 trials (FF=59%) with an AFR of 1.48×10⁻⁷, indicating a 31% decrease in folding for the C-split64 relative to the N-split64. It is also important to note that a decrease in thermal stability was observed upon fragmentation of the 64-mer, as reflected by the much lower Tmax of the N-split64 (Tmax=0.23) and C-split64 (Tmax=0.22), relative to that of the unsplit 64-mer (Tmax=0.27). Furthermore, a small and broad Cv peak was observed for the split systems, which implies an increase in near-native conformations. This effect suggests that their thermal transition is less cooperative 42. However, the split systems still follow a two-state mechanism, which is evidenced by the presence of a single Cv peak. Thus, the split 64-mer systems exhibited the same correlation between thermal stability and folding kinetics as was observed for the split 48-mer system.
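The reported rates can be checked with simple arithmetic. The sketch below assumes that AFR is the folding fraction (in percent) divided by the MFT; the definition itself is given elsewhere in the paper, and this form is used here only because it reproduces the reported values. The function name is hypothetical.

```python
def afr(folding_fraction_percent, mft_steps):
    """Assumed definition: folding fraction (percent) divided by mean folding time (MC steps)."""
    return folding_fraction_percent / mft_steps

afr_n_split64 = afr(66, 3.10e8)  # close to the reported 2.13e-7
afr_c_split64 = afr(59, 3.99e8)  # close to the reported 1.48e-7

# Relative decrease of the C-split64 rate with respect to the N-split64 rate.
relative_decrease = (afr_n_split64 - afr_c_split64) / afr_n_split64
print(f"{afr_n_split64:.3g} {afr_c_split64:.3g} {relative_decrease:.1%}")
```

The relative decrease evaluates to roughly 31%, in line with the figure quoted in the text.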
In this work, we used two relatively simple model systems to obtain insight about how the choice of split sites affects the thermodynamics and kinetics of protein reassembly and folding upon fragmentation. Specifically, we focused our studies on understanding how the splitting of critical native contacts, which are located in the critical core that leads to folding, contributes to productive folding. In general, our results showed that the folding process for different split fragment systems is slower relative to the case of an unsplit protein, consistent with experimental observations 10,11,17. Furthermore, the nature and magnitude of reassembly retardation were highly dependent on the distribution of the critical nuclei between the two split fragments. Strategic splitting of the critical core was shown to 1), prevent the permanent preassembly of an individual fragment that would otherwise inhibit the assembly of the two chains; and 2), drive the formation of interchain native contacts that lead to productive folding. The importance of a shared folding core was particularly evident from the slower folding kinetics that were observed in the N-split system, where the critical core was localized in a single fragment, as compared with the C-split system, where the critical core was more equally shared between the two fragments.
Although the folding mechanism and the transition states for the N-, Mid-, and C-split systems were not precisely characterized, we observed that the concentration of the core native contacts in a single fragment changed the folding mechanism from a cooperative coassembly process, where the two fragments fold together, to a two-step assembly process, where an individual chain preassembles and then forms interchain connections with the second chain. Coassembly was observed for the