Copyright © 2008 The Biophysical Society. All rights reserved.
Biophysical Journal, Volume 94, Issue 7, 2516-2528, 1 April 2008
doi:10.1529/biophysj.107.113225
Biophysical Theory and Modeling
E. Bura*, D.K. Klimov† and V. Barsegov‡
* Department of Statistics, George Washington University, Washington, District of Columbia 20052
† Department of Bioinformatics and Computational Biology, George Mason University, Manassas, Virginia 20110
‡ Department of Chemistry, University of Massachusetts, Lowell, Massachusetts 01854
Address reprint requests to V. Barsegov, Tel.: 978-934-3661; Fax: 978-934-3013.
Most mechanically active proteins perform their biological function in linear tandems of “head-to-tail” connected repeats. For example, ubiquitin (Ub), a naturally occurring multimer of identical Ub repeats, is involved in protein degradation and several signaling pathways 1,2. The giant protein titin plays a crucial role in muscle contraction and relaxation. Titin spans almost half of the muscle sarcomere and consists of ∼300 domains and 30,000 amino acids 3,4. There are two types of titin domains, immunoglobulin (Ig) and fibronectin (Fn) modules, which are linked in a tandem. The number of Ig domains varies from 37 to 90 in different titin molecules 5,6. Fibronectin is composed of ∼20 distinct Fn domains of type FnI–FnIII. FnIII contains multiple binding sites for integrin receptors of the extracellular matrix 5. In ddFLN, a dimeric filamin from Dictyostelium discoideum, and in human filamin (A protein), a single chain is composed of a rod-like tandem of several Ig domains 6,7,8. Filamins, which also form multidomain tandems, play an important role in cellular locomotion 6,7.
In single-molecule atomic force microscopy (AFM) experiments, the consecutive unfolding transitions of protein domains in a tandem or a polyprotein are analyzed by applying constant mechanical force (force-clamp mode) or time-dependent force (force-ramp) 9,10,11,12,13,14. In force-ramp AFM experiments, the force-induced unraveling of protein tandems results in sawtooth profiles of the unfolding forces, {f1, f2, …, fn}, which correspond to the unfolding of individual protein domains. In force-clamp AFM probes, the force-induced tension in the tandem chain results in the stepwise elongation of the tandem end-to-end distance, X. For the polyubiquitin chain (Ubn, 3<n<12), elongation of X in steps of ΔX≈20nm was used to identify the unfolding transitions in the individual Ub domains 15,16.
Current statistical analyses of forced unfolding data for protein tandems ((D)n) rely on the assumption that a), the forced unfolding transitions of individual domains (D) are mutually independent (uncorrelated), and that b), the recorded unfolding forces (force-ramp) and unfolding times (force-clamp) are realizations of the same probability density function (pdf) 13,14,16,17,18. Said differently, these analyses are based on the assumption that the unfolding times and forces form a set of independent identically distributed (iid) random variables. In our previous computer simulation studies of forced unfolding, hereafter referred to as Study 1 19, we tested the validity of the “iid assumption” in an experimentally accessible piconewton range of applied constant force. We showed that the uncorrelated forced unfolding transitions, observed for the model tandem S2–S2–S2, become correlated when the applied force is increased 19.
In a typical force-clamp AFM experiment on a tandem D1–D2–…–Dn, the recorded first, second, etc. forced unfolding times, t1:n, t2:n, …, tn:n, are ordered, i.e., t1:n≤t2:n≤…≤tn:n 19. Because any domain Di, i=1, 2, …, n, could have unfolded at any time, there is no direct correspondence between the observed ordered unfolding time data, {t1:n, t2:n, …, tn:n}, and the unobserved parent unfolding times {t1, t2, …, tn} for individual domains D1 (t1), D2 (t2), …, Dn (tn). The main goal of unfolding time data analysis is to characterize the forced unfolding times of individual domains. This is equivalent to inferring the parent unfolding time distributions for individual domains, ψ1(t) (D1), ψ2(t) (D2), …, ψn(t) (Dn), from the distributions of ordered time variates, t1:n, t2:n, …, tn:n. As we showed in Study 1 19, the connection between the ordered unfolding times and the parent densities is direct only when the unfolding times are iid, which is the case for the uncorrelated unfolding times of a homogeneous tandem (D)n; in that case, ψ(t)=ψ1(t)=ψ2(t)=…=ψn(t) can be estimated by combining all ordered time variates into a single histogram. However, when the parent distributions are nonidentical, i.e., ψ1(t)≠ψ2(t)≠…≠ψn(t) (heterogeneous tandem D1–D2–…–Dn), and/or the unfolding times t1, t2, …, tn are correlated (dependent), the relationship between the observed ordered time data and the unobserved parent time data is more complex, and data analysis based on the iid assumption is inappropriate. We will show in this study that when the unfolding times are correlated, the use of the iid assumption could result in an inaccurate description of protein unfolding. Hence, statistical tools for testing whether the iid assumption holds are much needed.
In the case of noninteracting domains, such as domains S2 in tandem S2–S2–S2 (Study 1), the emergence of correlations among the unfolding transitions is due to dynamic competition between the unfolding kinetics and tension propagation along the tandem chain 19. However, in wild-type protein tandems, correlations can also build up due to interdomain interactions. Recent experiments on tandems of I27–I28 repeats showed enhanced domain stabilization against applied pulling force, which causes the increase of the average unfolding force from 260 pN (for the tandem of domains I27) to 300 pN (for the tandem of I27–I28 repeats) 20. Similar domain stabilization effect has been reported for the tandem of FnIII domains 21. Also, recent force-ramp AFM measurements on the homogeneous tandems of fibrinogen, performed at a pulling speed of 1μm/s, revealed that the consecutive unfolding transitions are strongly correlated (A. Brown and J. Weisel, University of Pennsylvania Medical School, private communication, 2008). This behavior is most likely due to interaction between fibrinogen's αC-domains and its central region 22.
These experimental findings demonstrate the importance of the inter- and intramolecular protein-protein interactions and show that current AFM technology can be used to probe these interactions by analyzing correlated (dependent) forced unfolding transitions in protein tandems. In force-ramp AFM measurements on protein tandems, mutual independence between the unfolding transitions can be assessed by applying standard tests for independence, such as tests based on the Pearson correlation 23, the Spearman rank correlation coefficient 23, or Hoeffding's D statistic 24,25, to the recorded unfolding forces. In the case of force-clamp AFM measurements, however, the observed forced unfolding times are ordered. To assess independence of the parent unfolding times, one would have to use statistical tests designed to detect possible correlations of the unobserved unfolding time data by analyzing the observed ordered unfolding times. Yet such tests do not exist: standard tests for independence can only be applied to the unobserved parent unfolding times. In this study, we develop statistical tools for assessing 1), independence of the forced unfolding times and 2), equality of their (parent) pdfs from observed ordered time data. We illustrate the use of these tests by analyzing the unfolding times for a model of the homogeneous dimer S2–S2 and the heterogeneous dimer S2–S1 of connected domains S2 and S1.
To model correlated unfolding transitions and interdomain interactions in protein tandems, novel theoretical approaches that go beyond the iid assumption are needed. In Study 1, we introduced an order statistics-based approach to analyze the ordered unfolding transitions in protein tandems 19. The key elements of the order statistics formalism are the cumulative distribution function (cdf) of the r-th order statistic (r=1, …, n) in a tandem of length n, Φr:n(t)≡Prob(tr:n≤t), and the corresponding probability density function (pdf), ϕr:n(t)=dΦr:n(t)/dt. Because the order statistics cdfs and pdfs, Φr:n(t) and ϕr:n(t), depend on the parent cdfs and pdfs, Ψ(t) and ψ (t), order statistics-based theory can be used to infer Ψ(t) and ψ (t) from the ordered time data. In this study, we extend the use of order statistics to analyzing correlated unfolding transitions in model tandems S2–S2 and S2–S1, characterized by dependent and identically distributed (did) and dependent and nonidentically distributed (dnid) unfolding times, respectively. In our test studies, we use single domains S2 and S1, and the dimers S2–S2 and S2–S1 to represent protein tandems of short and long length, respectively. The order statistics-based analysis, presented here, can be performed by using experimental unfolding time data for homogeneous as well as heterogeneous tandems of any length. In AFM experiments on a tandem (D)N of length, say N=12, the unfolding data for short (long) tandems can be obtained by grouping together and analyzing separately the unfolding times for tandems of length n=1–3 (n=9–12). Because in a typical AFM experiment the cantilever tip randomly picks up a tandem of any length n, 1≤n≤N, this can always be done.
The rest of this study is organized as follows. First, we describe Langevin dynamics simulations of the forced unfolding for single domains S2 and S1, and tandems S2–S2 and S2–S1. Second, we model the unfolding time distributions for single domains S2 and S1. The models of forced unfolding for single domains are used to assess the prediction accuracy of the order statistics-based analysis. Third, we perform a preliminary analysis of the forced unfolding times for tandems S2–S2 and S2–S1. Because in computer simulations we can access the parent unfolding times, we use standard tests for independence, based on Spearman rank correlation coefficient and Hoeffding's D statistic, and the quantile-quantile (Q-Q) plots to probe, respectively, the independence of unfolding times and their distributional equality. This allows us to classify the forced unfolding times as iid, inid, did, and dnid random variables (Table 5, Study 1) 19. Next, we use these data to generate ordered time variates, as observed in force-clamp experiments. The ordered unfolding times are then used to assess the performance of proposed tests for independence of the unobserved (parent) unfolding times and equality of their (parent) distributions. Finally, the dependent (did and dnid) unfolding times are used to illustrate the order statistics-based analysis of correlated unfolding transitions in tandems S2–S2 and S2–S1.
We performed Langevin simulations of forced unfolding using coarse-grained models of the homogeneous dimer S2–S2 and the heterogeneous dimer S2–S1, formed by domains S2 and S1 (Fig. 1) 26,27. The off-lattice Cα-based coarse-grained model of protein tandems serves as a conceptual representation of the wild-type multidomain proteins 27,28,29,30.
The domains S2 and S1 consist of 46 hydrophobic (B), hydrophilic (L), and neutral (N) residues. Each residue is represented by a united-atom bead at the position of the Cα atom (Fig. 1). The distance between Cα carbons is a=3.8 Å. The tandems S2–S2 and S2–S1 are constructed by connecting domains S2 and S1 “head-to-tail” by a flexible linker of five Gly residues (Fig. 1) 19. The potential energy V=VBL+VBA+VDIH+VNB includes the bond-length potential VBL, bond-angle potential VBA, dihedral angle potential VDIH, and nonbonded potential VNB 26,30. The nonbonded interaction between a pair of B residues, which depends on the distance R between them, is given by
$$V_{BB}(R)=4\lambda\varepsilon_h\left[\left(\frac{a}{R}\right)^{12}-\left(\frac{a}{R}\right)^{6}\right],$$
where λ accounts for variation in the strength of hydrophobic interactions, and ɛh=1.25 kcal/mol is the average strength of hydrophobic contacts. In the native state, S2 and S1 form four-stranded β-barrels, stabilized by Q0=106 native contacts (6.8 Å cut-off), with potential energies of −85.5 kcal/mol and −88.0 kcal/mol, respectively. Interdomain interactions are limited to steric repulsion.
The forced unfolding kinetics are obtained by integrating the Langevin equations for each residue coordinate xj, subject to the total potential Vtot=V−fX, i.e., η dxj/dt=−∂Vtot/∂xj+gj(t), where η is the friction coefficient and gj is Gaussian white noise. The force f=f n, of magnitude f, is applied to the C- and N-termini of the tandem along the unit vector n in the direction of the end-to-end vector X (Fig. 1). Numerical integration is performed with a step size δt=0.05τL, where τL=(ma²/ɛh)^{1/2}=3 ps is the unit of time, and m≈3×10⁻²² g is the residue mass. The simulation temperature is Ts=0.69ɛh/kB, which is below the equilibrium folding temperature of S1 and S2, TF≈0.79ɛh/kB. Ts is defined as the temperature at which, on average, a fraction 0.7 of the native contacts is formed, 〈Q(Ts)〉≈0.7Q0. The unfolding time for domain S2 (or S1) is defined as the time at which all contacts are disrupted. Throughout this study, the unfolding times and rates are expressed in terms of the number of integration steps Ntot (t=Ntotδt).
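To make the integration scheme concrete, the following is a minimal sketch of an Euler-Maruyama step for an overdamped Langevin equation of the form η dx/dt=−∂Vtot/∂x+g(t) with Vtot=V−fx, written for a single coordinate. It is not the simulation code used in this study: the function names, the one-dimensional setting, and the noise amplitude (variance 2ηkBT per unit time, the usual fluctuation-dissipation choice) are assumptions made for illustration.

```python
import math
import random

def langevin_step(x, grad_v, f, eta, kT, dt, rng):
    """One Euler-Maruyama step of eta*dx/dt = -dV/dx + f + g(t).

    grad_v is a callable returning dV/dx; the constant force f enters
    through Vtot = V - f*x. The noise term assumes <g(t)g(t')> = 2*eta*kT*delta(t-t').
    """
    drift = (-grad_v(x) + f) / eta
    noise = math.sqrt(2.0 * kT * dt / eta) * rng.gauss(0.0, 1.0)
    return x + drift * dt + noise

def simulate(x0, grad_v, f, eta, kT, dt, n_steps, seed=0):
    """Propagate a single coordinate for n_steps and return its final value."""
    rng = random.Random(seed)
    x = x0
    for _ in range(n_steps):
        x = langevin_step(x, grad_v, f, eta, kT, dt, rng)
    return x
```

With the noise switched off (kT=0) and a harmonic potential V=x²/2, the coordinate relaxes to the force-balance point x=f, which is a quick sanity check of the drift term.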
To prepare the stage for the use of order statistics, we analyze the forced unfolding times for single S2 and S1 domains, and characterize their parent pdfs, ψS2(t) and ψS1(t). We also analyze the parent unfolding times for first (S21) and second (S22) domain in tandem S2–S2, and first (S21) domain and second (S12) domain in tandem S2–S1 for independence and equality of their parent pdfs. The tests used in this section should not be confused with the statistical tests for independence and distributional equality for ordered unfolding times introduced in the following section.
Histograms of the unfolding times for single S2 and S1 domains, obtained at constant force f=66 pN and f=88 pN, and corresponding nonparametric density estimates are presented in Fig. 2. A nonparametric density estimate provides a visual assessment of the distribution and fits the density by locally weighting the observations 19,31,32. In force-clamp AFM experiments on a protein tandem of length n, a suitable model for the parent unfolding time pdfs can be obtained by using trial densities for the distribution of the first unfolding times, ϕ1:n(t), and fitting ϕ1:n(t) to the histograms of the first unfolding times {t1:n} (see Eqs. (7) in the next section). Here, as in Study 1, we used the Gamma density to describe the parent unfolding time pdfs for single domains S2 and S1,
$$\psi(t)=\frac{k^{\alpha}\,t^{\alpha-1}e^{-kt}}{\Gamma(\alpha)} \tag{1}$$
Figure 2 (caption, partial). … used in the calculations is the default value used in the R software for statistical computing 36, where SD is the standard deviation and IQR is the interquartile range of the data 32. In the histograms presented here and in Figs. 4 and 7, the number of bins nb and the bandwidth bw are estimated as described above. In this figure and in Figs. 3–7, the time t is expressed in units of the number of integration steps Ntot (t=Ntot×0.15 ps).
Table 1 Maximum likelihood estimates and 95% standard errors for the dimensionless shape parameter α and unfolding rate k (in units of integration step) for single domains S2 and S1 obtained at f=66 pN and f=88 pN

| Force, pN | αS2 | αS1 | kS2 | kS1 |
|---|---|---|---|---|
| 66 | 0.98±0.09 | 4.62±0.18 | (2.27±0.26)×10⁻⁷ | (1.52±0.06)×10⁻⁵ |
| 88 | 2.02±0.14 | 7.03±0.23 | (3.84±0.31)×10⁻⁶ | (4.59±0.16)×10⁻⁵ |
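As an illustration of how the fitted parameters in Table 1 can be used, the sketch below evaluates a Gamma density and its mean. It assumes the density is parameterized by the shape α and the rate k (so that the mean unfolding time is α/k), which is how the table caption describes the two parameters; the function names are hypothetical, and the conversion of 0.15 ps per integration step is taken from the figure caption.

```python
import math

def gamma_pdf(t, alpha, k):
    """Gamma density with shape alpha and rate k, evaluated at t > 0."""
    log_pdf = (alpha * math.log(k) + (alpha - 1.0) * math.log(t)
               - k * t - math.lgamma(alpha))
    return math.exp(log_pdf)

def gamma_mean(alpha, k):
    """Mean of the Gamma density under the shape/rate parameterization."""
    return alpha / k

# Table 1 estimates for the single S2 domain at f = 66 pN.
alpha_s2, k_s2 = 0.98, 2.27e-7

mean_steps = gamma_mean(alpha_s2, k_s2)   # mean unfolding time, integration steps
mean_ps = mean_steps * 0.15               # same quantity in picoseconds
density_at_mean = gamma_pdf(mean_steps, alpha_s2, k_s2)
```

For α close to 1, as for S2 at 66 pN, the density is nearly exponential, so a single rate k already describes the unfolding kinetics well.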
The unfolding time histograms for domains S2 and S1 are shown in Fig. 4. The parent unfolding times for the first S21 domain (t1) and second S22 domain (t2) in tandem S2–S2, and unfolding times for the first S21 domain (t1), and second S12 domain (t2) in tandem S2–S1, were analyzed for independence and equality of their parent distributions.
In Study 1, we used the Spearman rank correlation coefficient 23,33, a nonparametric and scale-invariant measure of dependence. This measure detects linear and some nonlinear yet always monotonic relationships between two data sets {t1} and {t2}, when the sets either change in the same or in the opposite direction, i.e., when the values {t1} and {t2} both increase or decrease, or when the values {t1} always increase (decrease) while the values {t2} decrease (increase). Hoeffding's nonparametric test for independence, described in Appendix A 24,25, and its asymptotic equivalent 34 detect all dependence alternatives, including highly nonmonotonic relationships. The values of D range from −0.5 to 1, with larger D(t1, t2) values signifying stronger dependence between t1 and t2. In statistical data analyses, both tests of independence are typically carried out so that monotonic as well as nonmonotonic associations between two variables can be detected.
The values of D(t1, t2) and the Spearman rank correlations for the unfolding times {t1} and {t2} obtained at f=66 pN and f=88 pN for tandems S2–S2 and S2–S1 are reported in Table 2. The associated p-values for testing independence are given in parentheses. The threshold p-value, which represents the level of tolerance for rejecting the independence hypothesis, was set to 0.01 (in statistical hypothesis testing, the null is rejected if the p-value does not exceed the threshold). At f=66 pN, both dependence measures conclude that domains S21 and S22 in tandem S2–S2 and domains S21 and S12 in tandem S2–S1 unfold independently. In contrast, at f=88 pN, Hoeffding's test for independence finds the forced unfolding times for the same domains in the tandems S2–S2 and S2–S1 to be dependent. The Spearman rank correlation coefficient test, on the other hand, is not significant at level 0.01 for either tandem and does not detect dependence. Since Hoeffding's test is significant, the dependence between the unfolding times for the two domains in both tandems, obtained at f=88 pN, is nonmonotonic. This result supports our previous finding (Study 1) that increasing the magnitude of applied force, f, may result in dependent unfolding transitions 19.
Table 2 Preliminary analysis of the forced unfolding times

| Tandem | Hoeffding's D (f=66 pN) | Spearman correlation (f=66 pN) | Hoeffding's D (f=88 pN) | Spearman correlation (f=88 pN) |
|---|---|---|---|---|
| S2–S2 | 0.0003 (0.25) | −0.06 (0.15) | 0.0032 (0.01) | −0.03 (0.59) |
| S2–S1 | −6.08291×10⁻⁶ (0.37) | −0.05 (0.26) | 0.0043 (0.0052) | −0.10 (0.02) |

Hoeffding's D statistics and Spearman rank correlation coefficients of the unfolding times for domains S21 and S22 in tandem S2–S2, and domains S21 and S12 in tandem S2–S1, obtained at f=66 pN and f=88 pN. The numbers in parentheses are the p-values for testing for independence of the two variables.
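The difference between the two dependence measures can be seen in a small sketch of the Spearman rank correlation, computed here as the Pearson correlation of average ranks. This is a plain standard-library illustration with hypothetical function names and made-up data, not the analysis code behind Table 2; it shows that a strongly dependent but nonmonotonic relationship can give a Spearman coefficient of zero, which is the situation in which a test such as Hoeffding's is needed.

```python
import math

def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Made-up data: a monotonic and a nonmonotonic (quadratic) relationship.
t1 = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
rho_monotonic = spearman(t1, [v ** 3 for v in t1])
rho_nonmonotonic = spearman(t1, [v ** 2 for v in t1])
```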
Q-Q plots were used for the empirical assessment of the equality of the unfolding time pdfs for domains S2 and S1 in tandems S2–S2 and S2–S1 (Fig. 5). The Q-Q plot for the first S21 domain against the second S22 domain in tandem S2–S2, obtained at f=66 pN, shows that almost all data points fall on the reference line, indicating equality of the parent pdfs, i.e., $\psi_{S2_1}(t)=\psi_{S2_2}(t)$.
A small parallel deviation of the time quantiles from the reference line for the same domains, obtained at increased force f=88 pN, indicates only approximate distributional equality, i.e., $\psi_{S2_1}(t)\approx\psi_{S2_2}(t)$.
Indeed, the unfolding times of the S21 domain are consistently shorter than the unfolding times of the S22 domain by a small time constant, Δt≈0.4×10⁶. This can also be seen by comparing the unfolding time histograms (Fig. 4, a and b). This time difference (Δt) induces the dependence detected by Hoeffding's D statistic. The Q-Q plots for the first S21 domain against the second S12 domain in tandem S2–S1 strongly indicate lack of equality of the parent pdfs both at f=66 pN and f=88 pN, i.e., $\psi_{S2_1}(t)\neq\psi_{S1_2}(t)$ (Fig. 5, c and d). This can also be seen from the bimodal shape of the unfolding time density for the S2 domain (Fig. 4 c).
To summarize this section, we showed that the parent unfolding times for S2 domains in tandem S2–S2 are iid for f=66 pN and did for f=88 pN, whereas the parent unfolding times for S2 and S1 domains in tandem S2–S1 are inid for f=66 pN and dnid for f=88 pN.
We use simulated unfolding time data for model tandems S2–S2 and S2–S1 to assess the performance of the proposed tests for independence of the (parent) unfolding times and equality of the parent unfolding time pdfs from the ordered time data, t1:n≤t2:n≤…≤tn:n. To generate ordered time variates as observed in force-clamp AFM experiments, the unfolding times {t1} and {t2} for domains S21 ({t1}) and S22 ({t2}) in tandem S2–S2, and domains S21 ({t1}) and S12 ({t2}) in tandem S2–S1 were rearranged in increasing time order. That is, tmin<tmax, where tmin=min(t1, t2) and tmax=max(t1, t2) are the minimum and maximum unfolding times, respectively. The ordered variates from 500 runs for each dimer were grouped into ordered sets of the first {tmin}={t1:2}, and second {tmax}={t2:2} unfolding times.
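The rearrangement of parent times into ordered variates can be sketched as follows. The function name and the numerical values are hypothetical and serve only to show the bookkeeping: each run contributes one n-tuple of parent times, which is sorted, and the r-th entries of all runs are collected into the set {tr:n}.

```python
def order_statistics(runs):
    """Group parent unfolding times into sets of ordered variates.

    runs is a list of n-tuples (t_1, ..., t_n), one tuple per trajectory.
    The result is a list of n lists; entry r-1 holds every r-th ordered
    time t_{r:n} across the runs.
    """
    n = len(runs[0])
    ordered = [sorted(run) for run in runs]
    return [[times[r] for times in ordered] for r in range(n)]

# Made-up parent times (t1, t2) for three runs of a dimer.
runs = [(5.0, 2.0), (1.0, 4.0), (3.0, 3.5)]
t_min, t_max = order_statistics(runs)
```

After this step the labels of the domains are lost, which is exactly the information that is unavailable in a force-clamp experiment.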
A simple empirical test for assessing distributional equality of the parent unfolding time pdfs for individual domains in a tandem D1–D2– … −Dn can be based on a recurrence relation for order statistics 19. When the forced unfolding times are iid, the pdfs of the r-th and (r+1)-st unfolding times (order statistics) in a tandem of length n are related to the pdf of the r-th unfolding times in a tandem of length n−1 via the recurrence relation 34,35
$$r\,\phi_{r+1:n}(t)+(n-r)\,\phi_{r:n}(t)=n\,\phi_{r:n-1}(t) \tag{2}$$
Equation (2) also holds when the unfolding times are “exchangeable”, i.e., when they are identically distributed but could be dependent (did) 34,35, and when the parent unfolding time pdfs are identical in the sense that they have the same shape but may differ in the location of the peak, which quantifies the most probable unfolding time t*. This is the case for tandem S2–S2, at f=88 pN. Hence, Eq. (2) applies both when the parent unfolding times for domains Di and Dj are strictly identically distributed, and when the unfolding times for, say, domain Dj are “shifted” from the unfolding times for domain Di by a time constant Δt=|tj*−ti*|.
By applying Eq. (2) recursively, we can obtain the parent unfolding time pdf for a single domain D, ψ (t)≡ϕ1:1(t), i.e.,
![]() | (3) |
Equation (3) provides a means to infer the parent distribution for a domain in a tandem from the order statistics pdfs ϕr:n, 1≤r≤n, when the forced unfolding times are iid or did; that is, regardless of their dependence structure. In particular, Eq. (3) implies that when the unfolding times are identically distributed with common parent pdf ψ (t), then the latter can be obtained by “mixing” all the order statistics pdfs, ϕr:n, r=1, 2, …, n, with equal weight 1/n, i.e.,
$$\psi(t)=\frac{1}{n}\sum_{r=1}^{n}\phi_{r:n}(t) \tag{4}$$
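As a quick check of the equal-weight mixing rule, consider the simplest case of two iid unfolding times with common pdf ψ(t) and cdf Ψ(t). The expressions below are the standard order statistics densities for the minimum and the maximum of two iid variables; they are given here only as an illustration and are valid under the iid assumption.

```latex
\begin{align}
\phi_{1:2}(t) &= 2\,\psi(t)\,[1-\Psi(t)], \\
\phi_{2:2}(t) &= 2\,\psi(t)\,\Psi(t), \\
\tfrac{1}{2}\,[\phi_{1:2}(t)+\phi_{2:2}(t)] &= \psi(t)\,[1-\Psi(t)]+\psi(t)\,\Psi(t)=\psi(t).
\end{align}
```

The factors of Ψ(t) cancel, so mixing the first and second order statistics pdfs with weight 1/2 returns the parent pdf.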
A simple test for equality of the parent unfolding time pdfs for individual domains in a tandem can be constructed as follows. First, the ordered unfolding times, collected at a fixed force, are grouped into two time sets, one for a shorter tandem of length, say, n1=1–3, and the other for a longer tandem of length, say, n2=9–12. As noted in the introduction, in AFM experiments the cantilever tip randomly picks up a tandem of any length, so this separation is implementable in practice. The corresponding pdfs, $\psi_{n_1}(t)$ and $\psi_{n_2}(t)$, are estimated by using Eq. (4). Next, $\psi_{n_1}(t)$ and $\psi_{n_2}(t)$ are compared via a Q-Q plot. If the time quantiles for $\psi_{n_1}(t)$ and $\psi_{n_2}(t)$ fall close to (far from) the reference line, then the parent unfolding times for individual domains in tandems of length n1 and n2 are identically (nonidentically) distributed. The difference between the time quantiles for $\psi_{n_1}(t)$ and $\psi_{n_2}(t)$, if any, can be used as a signature of the distributional inequality of the parent pdfs.
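The above arguments lead us to the following computational algorithm:
1. Collect the ordered unfolding times $t_{1:n_1}, t_{2:n_1}, \ldots, t_{n_1:n_1}$ for a tandem of shorter length n1.
2. Generate a random number U, uniformly distributed on (0, 1).
3. If U∈(0, 1/n1), randomly select a point from the first order statistic, $t_{1:n_1}$. If U∈(1/n1, 2/n1), randomly select a point from the second order statistic, $t_{2:n_1}$, and so on.
4. Repeat Steps 2 and 3 M times to obtain a sample of size M from $\psi_{n_1}(t)$ (Eq. (4)).
5. Collect the ordered unfolding times $t_{1:n_2}, t_{2:n_2}, \ldots, t_{n_2:n_2}$ for a tandem of longer length n2, and repeat Steps 2–4 to obtain a sample of size M from $\psi_{n_2}(t)$.
6. Plot the time quantiles of $\psi_{n_1}(t)$ against the time quantiles of $\psi_{n_2}(t)$, and estimate the distance of the time quantiles from the reference line.
If the unfolding time quantiles fall close to the reference line, i.e., they are either aligned with or are parallel and close to the reference line, then Eq. (4) is satisfied and the parent unfolding times for individual domains (Ds) in a tandem D1–D2–…–Dn are identically distributed, regardless of whether they are dependent. Significant nonlinear divergence from the reference line would indicate their distributional inequality.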
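The sampling and comparison steps of the algorithm can be sketched in a few lines. This is an illustrative implementation under stated assumptions, not the authors' code: the function names, the default sample size M=1000, the 19 evenly spaced quantile levels, and the linear interpolation of empirical quantiles are all choices made here for the example.

```python
import random

def sample_mixture(order_stat_sets, m, rng):
    """Draw m points from the equal-weight mixture of order statistics (Eq. (4)).

    order_stat_sets[r] holds the observed (r+1)-th ordered unfolding times.
    """
    n = len(order_stat_sets)
    sample = []
    for _ in range(m):
        u = rng.random()               # U uniform on [0, 1)
        r = min(int(u * n), n - 1)     # U in [r/n, (r+1)/n) picks set r
        sample.append(rng.choice(order_stat_sets[r]))
    return sample

def quantiles(sample, probs):
    """Empirical quantiles by linear interpolation between sorted points."""
    s = sorted(sample)
    out = []
    for p in probs:
        pos = p * (len(s) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(s) - 1)
        out.append(s[lo] + (pos - lo) * (s[hi] - s[lo]))
    return out

def qq_points(short_sets, long_sets, m=1000, seed=0):
    """Pairs of (short-tandem quantile, long-tandem quantile) for a Q-Q plot."""
    rng = random.Random(seed)
    probs = [i / 20 for i in range(1, 20)]
    q_short = quantiles(sample_mixture(short_sets, m, rng), probs)
    q_long = quantiles(sample_mixture(long_sets, m, rng), probs)
    return list(zip(q_short, q_long))
```

Plotting the returned pairs against the line y=x gives the Q-Q plot; the vertical offset of the points from that line at the 50% level is one way to read off the time shift Δt.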
We tested the performance of the proposed algorithm by using ordered unfolding time data for tandems S2–S2 and S2–S1. For two-domain tandems, Eq. (4) becomes
$$\psi(t)=\frac{1}{2}\left[\phi_{1:2}(t)+\phi_{2:2}(t)\right] \tag{5}$$
The Q-Q plots of the time quantiles for single domain S2 versus the quantiles for tandem S2–S2, sampled from the mixture of the order statistics pdfs (Eq. (5)), are displayed in Fig. 6. At f=66 pN, the unfolding time quantiles run almost parallel to the reference line, indicating an approximate distributional equality (up to the time shift Δt) of the parent unfolding times for the first S21 domain and the second S22 domain, i.e., $\psi_{S2_1}(t)\approx\psi_{S2_2}(t)$. The time shift at the median (50% quantile) from the reference line is Δt≈3×10⁶ integration steps (Figure 7a). At f=88 pN, the time quantiles show a shorter time shift, Δt≈0.5×10⁶ integration steps, still running almost parallel to the reference line, which indicates an approximate distributional equality (up to Δt) of the parent unfolding times for S2 domains in S2–S2, i.e., $\psi_{S2_1}(t)\approx\psi_{S2_2}(t)$ (Figure 6b).
Figure 6 (caption, partial). … (Eq. (5)) for the ordered unfolding times t1:2 and t2:2, obtained at f=66 pN (a) and f=88 pN (b). (c and d) Q-Q plots of the unfolding times for single S1 domain versus the unfolding times for tandem S2–S1, generated by mixing the first and second order statistics pdfs for t1:2 and t2:2, obtained at f=66 pN (c) and f=88 pN (d).
The observed time shift Δt is due to the tension drop in the tandem chain, which occurs after the first unfolding transition in one of the two domains at time t=t1:2. The resulting chain elongation lowers the force-induced tension and the instantaneous force to a lower value, f′<f=66 pN, and hence it takes time Δt to ramp it back up to the initial level (f′→f). As a result, the time quantiles obtained for the longer tandem (S2–S2) are above the reference line, indicating prolonged unfolding for S2 domains in the tandem compared to a single S2 domain. Although in our case study we used a single S2 domain and the dimer S2–S2 to represent, respectively, the tandems of shorter and longer length, this algorithm can be used to analyze protein tandems of any length n1 and n2>n1. The Q-Q plots of the time quantiles for single domain S1 versus the quantiles for tandem S2–S1, sampled from the mixture of the order statistics pdfs (Eq. (5)), are also displayed in Fig. 6 for comparison. We observe much greater nonparallel divergence from the reference line with a larger time shift, Δt≈8×10⁶ integration steps (f=66 pN) and Δt≈1×10⁶ integration steps (f=88 pN), at the 50% quantile, compared with tandem S2–S2. Such strong nonlinear divergence indicates that the forced unfolding times for domains S2 and S1 in tandem S2–S1 are differently distributed both at f=66 pN and f=88 pN.
The results of the proposed test for distributional equality of the parent unfolding time pdfs, applied to the ordered unfolding times, agree with the results of preliminary data analysis, and confirm that the parent unfolding times, obtained at f=66 pN and f=88 pN, are identically distributed for tandem S2–S2 and nonidentically distributed for tandem S2–S1. The proposed algorithm can be used in statistical analyses of unfolding data available from force-clamp AFM measurements. In addition, for homogeneous tandems, the difference between the unfolding time quantiles for tandems of short and long length, parameterized by Δt, can be used to estimate the timescale of force-induced tension propagation along the tandem chain, τf. Indeed, there are n-1 intervals of dropped tension of duration Δt in a tandem of length n. When the pdfs for tandems of different length n1≠n2,
and
are compared via Q-Q plots, τf can be estimated as τf≈Δt/|n2−n1|.
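The estimate τf≈Δt/|n2−n1| can be sketched in a few lines. The following is a minimal illustration, not the procedure used in the study: the function names `quantile_shift` and `tension_propagation_time` are hypothetical, the quantile is computed by simple linear interpolation, and the sample data are synthetic (gamma-distributed times with an arbitrary constant delay added).

```python
import random


def quantile_shift(short_times, long_times, q=0.5):
    """Time shift between matching quantiles of two samples (long minus short)."""
    def quantile(sample, level):
        ordered = sorted(sample)
        # Linear interpolation between the two nearest order statistics.
        pos = level * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])
    return quantile(long_times, q) - quantile(short_times, q)


def tension_propagation_time(delta_t, n1, n2):
    """tau_f ~ delta_t / |n2 - n1| for tandems of lengths n1 != n2."""
    if n1 == n2:
        raise ValueError("tandem lengths must differ")
    return delta_t / abs(n2 - n1)


# Synthetic illustration only (ASSUMPTION: not data from the study):
# times for the longer tandem are drawn from the same gamma law plus a delay.
rng = random.Random(0)
single = [rng.gammavariate(2.5, 1.0) for _ in range(2000)]
tandem = [rng.gammavariate(2.5, 1.0) + 0.3 for _ in range(2000)]
shift = quantile_shift(single, tandem)          # median shift, Delta t
tau_f = tension_propagation_time(shift, n1=1, n2=2)
```

For the monomer versus dimer comparison (n1=1, n2=2) the denominator is one, so τf coincides with the median shift read off the Q-Q plot.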
In this section, we propose a permutation test for iid versus did parent unfolding times and an overlap fraction test for inid versus dnid unfolding times using the ordered unfolding times for tandems S2–S2 and S2–S1.
Let us assume that we record n ordered unfolding times sampled from the joint distribution Ψ(t1, …, tn) with joint pdf ψ(t1, …, tn), where, as before, ti denotes the unfolding time of the ith domain (i=1, …, n) in a tandem of length n. Suppose we observe the unfolding time order statistics, t1:n≤t2:n≤…≤tn:n, sampled from the joint distribution Ψ(t1, …, tn). We want to infer from the order statistics, t1:n≤t2:n≤…≤tn:n, whether the (unobserved) parent data, t1, t2, …, tn, are uncorrelated. Suppose now that the parent unfolding time data are indeed iid; that is, Ψ(t1, …, tn)=Ψ(t1)Ψ(t2)…Ψ(tn), where Ψ(t) is their common cdf, and ψ(t1, …, tn)=ψ(t1)ψ(t2)…ψ(tn), where ψ(t)=dΨ(t)/dt is their common pdf. This factorization implies that if the parent data were iid, then the order statistics, t1:n≤t2:n≤…≤tn:n, could have resulted from any permutation of the original data with equal probability. For example, the parent sample t1, t2, …, tn could have resulted in t1:n≤t2:n≤…≤tn:n with the same probability as the sample t1, t3, …, tn or the sample tn, t3, …, t1, and so on. The order in which the n-tuple (t1, …, tn) is arranged is irrelevant because all n! permutations of the n parent data points are equally likely to be observed, since they are independent realizations of the same distribution. Let us generalize the above arguments to M measurements. Suppose M ordered n-tuples,
are observed, i=1, …, M. If the parent unfolding time data were iid, the unfolding time order statistics obtained in the ith experiment,
could have resulted from any permutation of the parent data with equal probability. For each i=1, …, M, all n! permutations of the n data points are equally likely to be the parent sample of the observed order statistics. This leads to the following algorithm for testing pairwise independence:
1. For each i=1, …, M, randomly permute the ith ordered n-tuple, and let the result be the b-th permuted order statistics, where b is a permutation number.
2. Store the result in matrix Tb of dimension M×n, where the ith row of Tb holds the permuted n-tuple from the ith measurement.
3. Carry out the n(n−1)/2 pairwise tests for independence of all pairs of the n columns of Tb at a fixed significance level.
4. Repeat Steps 1–3 for b=1, …, B, where B is the number of replicates. Compute and store the fraction of rejections of the null hypothesis of independence.
In Step 3, both Spearman's rank correlation and Hoeffding's D statistic should be used so that most types of dependence are checked for 23,24,25. Both measures are based on test statistics with known asymptotic distributions, which allow the computation of the p-values for testing independence. If the parent unfolding time data are independent, the test for independence in Step 3 will not be significant. An illustration of the algorithm is given in Appendix B.
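The permutation algorithm can be sketched as follows. This is a simplified illustration under stated assumptions, not the implementation used in the study: only Spearman's rank correlation is included (Hoeffding's D is omitted), its p-value uses the large-sample normal approximation, and the function names `ranks`, `spearman_p_value`, and `permutation_test`, as well as the synthetic input data, are hypothetical.

```python
import math
import random
from itertools import combinations
from statistics import NormalDist


def ranks(values):
    """Average ranks (1-based); tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_p_value(x, y):
    """Two-sided p-value for Spearman's rho (normal approximation, an assumption)."""
    rx, ry = ranks(x), ranks(y)
    m = len(x)
    mean = (m + 1) / 2
    cov = sum((a - mean) * (b - mean) for a, b in zip(rx, ry))
    var_x = sum((a - mean) ** 2 for a in rx)
    var_y = sum((b - mean) ** 2 for b in ry)
    rho = cov / math.sqrt(var_x * var_y)
    z = rho * math.sqrt(m - 1)
    return 2 * (1 - NormalDist().cdf(abs(z)))


def permutation_test(ordered_tuples, n_replicates=500, alpha=0.05, seed=0):
    """Fraction of replicates in which no pairwise test rejects independence."""
    rng = random.Random(seed)
    n = len(ordered_tuples[0])
    not_rejected = 0
    for _ in range(n_replicates):
        # Permute each ordered n-tuple; the rows form the M x n matrix T_b.
        t_b = [rng.sample(row, n) for row in ordered_tuples]
        columns = list(zip(*t_b))
        # Test every pair of columns of T_b for independence.
        p_values = [spearman_p_value(columns[a], columns[b])
                    for a, b in combinations(range(n), 2)]
        if min(p_values) > alpha:
            not_rejected += 1
    return not_rejected / n_replicates


# Synthetic illustration only (ASSUMPTION: not data from the study).
rng = random.Random(1)
pairs = [sorted(rng.gammavariate(2.5, 1.0) for _ in range(2)) for _ in range(200)]
fraction_not_rejected = permutation_test(pairs, n_replicates=50)
```

The returned value is the fraction of replicates with p-values above the cutoff, so one minus it is the fraction of rejections of the null hypothesis of independence.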
Table 3 summarizes the results of the application of the permutation algorithm to the ordered unfolding times for tandem S2–S2. The entries are the fractions of p-values >0.05 over 500 replicates (B=500). We used a 5% cutoff, i.e., we assumed that if the obtained p-value≤0.05, then there exists statistically significant dependence among the parent unfolding times for domains S2. At f=66 pN, Hoeffding's test rejected independence only 100−99.6=0.4% of the time, thus providing strong support for the independence of unfolding times for the first S21 domain (t1) and second S22 domain (t2) in tandem S2–S2. The Spearman rank correlation coefficient also detected independence 100% of the time (Table 3). At f=88 pN, the fraction of the p-values exceeding 0.05 for Hoeffding's test is 0. That is, all 500 p-values for testing independence were highly significant, i.e., below the 5% cutoff, providing strong evidence for lack of independence between the parent unfolding times for the first S21 domain (t1) and the second S22 domain (t2) in tandem S2–S2. Thus, the permutation test for independence, applied to iid and did unfolding times for tandem S2–S2, recovers the results of the preliminary data analysis.
An empirical approach for deducing independence of the parent inid and dnid unfolding times can be based on the overlap fraction F(r, r+1;n), r=1, …, n−1, defined as the fraction of values shared by the r-th order statistic, tr:n, and the (r+1)-st order statistic, tr+1:n, in a heterogeneous tandem (D1–D2)n/2 of length n. That is,
![]() | (6) |
If F(r, r+1;n) is smaller than a threshold value F*, then the unfolding times for, say, domain D1, differ from the unfolding times of domain D2 in a consistent fashion. Since domains D1 and D2 have different (parent) pdfs, i.e.,
this would mean that unfolding of D1 domains does not affect unfolding of D2 domains, and that these domains unravel independently. For example, the forced unfolding of domain S1 occurs on a faster timescale compared to the unfolding of the S2 domain (Fig. 5). Hence, the first unfolding transitions (t1:2) occur more frequently for domain S1 as compared to the S2 domain, and the consecutive unfolding transitions t1:2 and t2:2 are separated in time (uncorrelated). On the other hand, large values of F(r, r+1;n), i.e., F(r, r+1;n)>F* would indicate mixing among the unfolding times for domains D1 and D2 and signify their dependence.
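One way to compute an overlap fraction from two samples of consecutive order statistics is sketched below. Because Eq. (6) is not reproduced in the text, the range-overlap definition used here is an assumption chosen for illustration (the share of pooled values that fall in the time window covered by both samples), and the function name `overlap_fraction` is hypothetical.

```python
def overlap_fraction(lower_stats, upper_stats):
    """Share of pooled values lying in the range common to both samples.

    ASSUMPTION: this range-overlap definition is a stand-in for Eq. (6),
    which is not reproduced in the text.
    """
    lo = max(min(lower_stats), min(upper_stats))
    hi = min(max(lower_stats), max(upper_stats))
    if lo > hi:
        return 0.0  # the two samples occupy disjoint time windows
    pooled = list(lower_stats) + list(upper_stats)
    shared = sum(1 for t in pooled if lo <= t <= hi)
    return shared / len(pooled)


# Synthetic illustration only: well-separated versus interleaved samples.
separated = overlap_fraction([1.0, 2.0, 3.0], [10.0, 11.0, 12.0])
interleaved = overlap_fraction([1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0])
```

Under this definition, well-separated first and second unfolding times give a value near zero, and strongly interleaved samples give a value that grows toward one.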
We applied the overlap fraction test to assess independence of the parent unfolding times for S21 domain (t1) and S12 domain (t2) in tandem S2–S1. We set the threshold value for the overlap fraction to F*=50%. For a heterogeneous tandem of length n=2, the heuristic argument that led to this choice follows along these lines: if there were perfect mixing, that is the first order statistic originated with equal probability from both domains, then the ordered pair
would be observed 50% of the time, and the ordered pair
would be observed 50% of the time as well, where
denotes the unfolding time of domain Di, i=1, 2. This would lead to no separation between the values of the two order statistics (they would fall in the same range) and the overlap fraction would be close to one. Lack of mixing would mean that, say, the pair
would be observed nearly always and the complement pair
would be observed almost never, so that the overlap fraction would be close to zero. Of course, because of sampling variability, the overlap fraction would never be exactly equal to zero or one but rather close to either value. The closeness would depend on the magnitude of correlations and the size of the sample. The cutoff of 50% is simply the midpoint of the unit interval. In principle, one can estimate the cutoff much more accurately using resampling methods; here we simply use this subjective cutoff. For this choice, values of F(1, 2;2)<50% would imply that one of the two domains S2 and S1 unfolds on a faster timescale, compared to the other domain. In the opposite case, i.e., when F(1, 2;n)>50%, we would conclude that S2 and S1 domains unfold on a similar timescale and that the unfolding times are correlated. We found that at f=66 pN, F(1, 2;n)=24%<50%, and at f=88 pN, F(1, 2;n)=61%>50%. Hence, we recover the results of the preliminary analysis for tandem S2–S1, namely that the parent unfolding times for S2 and S1 domains in the tandem are independent at f=66 pN, but dependent at f=88 pN.
For tandems of length larger than two, if the tandem is fully heterogeneous, i.e., all its domains are distinct, perfect mixing is equivalent to all permutations of the n-tuple (t1:n, …, tn:n) being equally likely, and the overlap fraction of any two order statistics would be close to one. In particular, the overlap fraction of any two consecutive order statistics, F(r, r+1;n), would also be close to one. In the other extreme of no mixing, the overlap fraction would be close to zero. Thus, even when the tandem consists of more than two domains, the midpoint cutoff of 50% can also be used. To conclude independence, all overlap fractions F(r, r+1;n), r=1, …, n−1, must be smaller than the cutoff. We plan to examine the more general case of tandems composed of a mix of the same and distinct domains in a separate study.
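The decision rule for a tandem of any length can be written compactly. This is a minimal sketch with a hypothetical function name, `classify_by_overlap`. The text does not say how a value exactly equal to the cutoff is classified, so treating it as dependent is an assumption.

```python
def classify_by_overlap(overlap_fractions, cutoff=0.5):
    """Return 'independent' only if every F(r, r+1; n) is below the cutoff.

    ASSUMPTION: a value exactly equal to the cutoff counts as 'dependent'.
    """
    if not overlap_fractions:
        raise ValueError("need at least one overlap fraction")
    if all(f < cutoff for f in overlap_fractions):
        return "independent"
    return "dependent"


# Overlap fractions reported in the text for tandem S2-S1 (n=2).
at_66_pN = classify_by_overlap([0.24])
at_88_pN = classify_by_overlap([0.61])
```

With the reported values, the rule classifies the S2–S1 unfolding times as independent at f=66 pN and dependent at f=88 pN, matching the conclusion stated in the text.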
The application of the test for distributional equality to ordered unfolding times obtained at f=88 pN revealed a pronounced time shift Δt≈0.5×10⁶ integration steps for tandem S2–S2 and Δt≈1×10⁶ integration steps for tandem S2–S1 (Fig. 6). As we argued before, the origin of Δt is a tension drop in the tandem chain, which accompanies each unfolding transition. As a result, every next unfolding transition (t2:n, t3:n, …, tn:n) after the first transition (t1:n) in a tandem of length n is delayed by Δt. This builds up correlations (dependence). However, the dependence structure, defined by the time shift Δt, is trivial and affects only the second (t2:n), third (t3:n), etc., unfolding transition, but does not affect the first transition (t1:n). Therefore, for correlated unfolding events characterized by did and dnid unfolding times with such trivial dependence, the first order statistic t1:n can be described by using the order statistics for iid and inid unfolding times (Study 1) 19.
To illustrate our approach, here we use previously generated ordered time variates, i.e., the first unfolding times, {tmin}={t1:2}, and second unfolding times, {tmax}={t2:2}, for tandems S2–S2 and S2–S1 of length n=2, to analyze did and dnid unfolding times for these tandems. Clearly, this approach can be generalized to a homogeneous ((D)n) and heterogeneous tandem ((D1–D2)n/2) of any length n. The first order statistics pdfs, ϕ1:2(t), for tandems S2–S2 and S2–S1 are given by
![]() | (7) |
![]() | (8) |
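For reference, the standard density of the minimum of two independent unfolding times can be written as below. Here ψ1, ψ2 and Ψ1, Ψ2 denote the parent pdfs and cdfs of the two domains, a notation introduced only for this illustration. Whether Eqs. (7) and (8) are written in exactly this form is an assumption, since the equations themselves are not reproduced in the text.

```latex
% Minimum of two independent, nonidentically distributed times (e.g., S2-S1):
\phi_{1:2}(t) = \psi_1(t)\,\bigl[1-\Psi_2(t)\bigr] + \psi_2(t)\,\bigl[1-\Psi_1(t)\bigr]

% Special case of identical parent distributions, \psi_1=\psi_2=\psi (e.g., S2-S2):
\phi_{1:2}(t) = 2\,\psi(t)\,\bigl[1-\Psi(t)\bigr]
```

The first term is the probability density that domain 1 unfolds at time t while domain 2 is still folded, and the second term is the same with the roles exchanged.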
Table 4. Numerical values of the shape parameter, α, and unfolding rate, k (in units of integration steps), for domains S21 and S22 in tandem S2–S2, and domains S21 and S12 in tandem S2–S1

| Tandem | α, first domain | α, second domain | k, first domain | k, second domain |
|---|---|---|---|---|
| S2–S2 | 2.5 | 2.6 | 2.8×10⁻⁶ | 2.9×10⁻⁶ |
| S2–S1 | 2.6 | 9.4 | 2.9×10⁻⁶ | 3.3×10⁻⁵ |

The first and second domains are S21 and S22 in tandem S2–S2, and S21 and S12 in tandem S2–S1. Values are obtained from the fit of the first order (min) statistics pdf, ϕ1:2(t)=ϕmin(t), to the histograms of the ordered unfolding times, t1:2, obtained at f=88 pN (Fig. 7).
The increased (decreased) values of α (k), inferred from the order statistics pdf ϕ1:2(t) for domains S2 and S1 in tandems S2–S2 and S2–S1 are due to the presence of a short linker, which tends to prolong the forced unfolding times of protein domains in tandems. We estimated the effect of linkers on the unfolding timescale for domains S2 in tandem S2–S2 by taking the difference between the average unfolding times
for domain S2 in tandem S2–S2 and the average unfolding time τS2 for single S2 domain, i.e.,
![]() | (9) |
were taken from Table 1 (Table 4). Applying Eq. (9) yields ΔτS2≈8.3ns. Although for the models of protein dimers connected by a short linker of five Gly residues this time is negligible compared to the average unfolding time of S2 domain in the dimer
and for a single S2 domain τS2≈0.08 μs, the effect of linkers may become more pronounced in long protein tandems, especially at a low force and/or for longer linkers. In force-clamp AFM experiments on a protein tandem of length n, the influence of linkers on the unfolding kinetics can be estimated by comparing the average first unfolding time (first order statistics) for a linker of a shorter length l1, τ1:n(l1), and a longer length l2>l1, τ1:n(l2). The ratio (τ1:n(l2)−τ1:n(l1))/(l2−l1) can then be used as an estimate for the unfolding time delay per unit length of the linker.
Let us now calculate the error in the estimates of the shape parameter, α, and unfolding rate, k, that we would make if we used the iid assumption in the analysis of did unfolding times for tandem S2–S2 obtained at f=88 pN. When the unfolding times are iid, the parent unfolding time pdf, ψ(t), is obtained by pooling all unfolding times into a single histogram, i.e.,
(Eq. (6) in Study 1) 19. For n=2, ψ(t)=ϕ1:2(t)/2+ϕ2:2(t)/2. By fitting the Gamma density (Eq. (1)) to the histogram of combined first and second unfolding times (t1:2 and t2:2), we obtain αS2=2.4 and kS2=2.2×10⁻⁶. The relative difference in the shape parameter αS2 and the unfolding rate kS2 between the estimates, obtained by using order statistics (αS2=2.55, kS2=2.85×10⁻⁶ (Table 4)) and by using the iid assumption, is small, ∼6% for αS2, but fairly large, ≈23%, for kS2. This comparison indicates that employing the iid assumption when the data are not iid may result in substantial estimation error of the forced unfolding rate.
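The quoted relative differences follow directly from the two sets of estimates. A short check is given below; the helper name `relative_difference` is hypothetical, and the exponent of the pooled-fit rate is taken to be −6, an assumption based on the reported 23% difference.

```python
def relative_difference(reference, estimate):
    """Absolute difference relative to the reference value."""
    return abs(estimate - reference) / abs(reference)


# Order statistics fit: averages of the S2-S2 entries in Table 4.
alpha_order, k_order = 2.55, 2.85e-6
# Pooled (iid) fit; ASSUMPTION: the exponent of k is -6.
alpha_iid, k_iid = 2.4, 2.2e-6

alpha_err = relative_difference(alpha_order, alpha_iid)  # about 0.06
k_err = relative_difference(k_order, k_iid)              # about 0.23
```

The shape parameter is fairly insensitive to the pooling, whereas the rate estimate shifts by roughly a quarter of its value.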
In our previous work (Study 1) 19, we proposed what to our knowledge is a new theory for describing the forced unfolding transitions in wild-type protein tandems and engineered polyproteins, available from force-clamp AFM experiments. The theory is inspired by the experimental AFM setup, in which only the ordered, i.e., first, second, etc., unfolding times in a tandem D1–D2– … −Dn of length n are recorded. Given the stochastic nature of unfolding, it is not possible to tell which domain Di (i=1, 2, …, n) has unfolded at any given time, t1:n, t2:n, …, tn:n. Order statistics overcomes this difficulty by analyzing ordered variates, and because the distributions of ordered unfolding times, ϕ1:n, ϕ2:n, …, ϕn:n, depend on the parent distributions for protein domains,
the order statistics-based theory can be used to infer the parent pdfs (ψ) from the order statistics pdfs (ϕ).
We showed in Study 1 19 that the iid assumption, that the (parent) unfolding times are independent (uncorrelated) and identically distributed (iid), may or may not hold depending on the tandem composition, the presence of interdomain interactions, and the magnitude of applied force. For example, in the heterogeneous tandems (D1–D2)n the unfolding times of nonidentical domains D1 and D2 are expected to be nonidentically distributed. Also, the domain stabilization effect, observed in the heterogeneous tandems of I27–I28 repeats of titin, in tandems of FnIII domains 20,21, and in the homogeneous tandems of fibrinogen, makes the forced unfolding transitions strongly correlated. We showed that in tandems with no interdomain interactions, such as the model trimers S2–S2–S2 and S2–S1–S2 (Study 1, 19) and dimers S2–S2 and S2–S1, analyzed here, the dynamic competition between tension propagation along the tandem chain and forced unfolding may couple the consecutive unfolding transitions at an elevated force level (f=88 pN). As we argued in Study 1, in force-clamp AFM experiments on protein tandems, the forced unfolding transitions can be characterized by four different types of unfolding times, namely iid, inid, did, or dnid unfolding times (Table 5 in Study 1) 19. Only when the parent unfolding times are iid, which is not known a priori, can conventional unfolding data analyses, in which the unfolding times are pooled together into a single histogram, be used. However, when the parent unfolding times are correlated and/or nonidentically distributed, i.e., when the unfolding data are did, inid, or dnid, this approach is inappropriate. To illustrate the latter, we showed that the use of iid assumption in analyzing dependent unfolding times results in large estimation errors for the forced unfolding rate.
To take advantage of the proposed formalism, the unfolding times must be first classified as iid or inid or did or dnid unfolding times. In this study, we developed statistical tests for assessing the independence of parent unfolding times and their distributional equality. These tests allow one to gain information on the unobserved (parent) unfolding times for individual domains by analyzing the observed ordered unfolding times. The tests can be used in statistical analysis of unfolding data available from force-clamp AFM measurements to assess the validity of the iid assumption and to classify the forced unfolding transitions. We assessed the performance of these tests against the results of computer simulations of forced unfolding for the model dimers S2–S2 and S2–S1. We recovered the results of preliminary analysis, namely that the parent unfolding times for the homogeneous dimer S2–S2 are iid at f=66 pN and did at f=88 pN, whereas the parent unfolding times for the heterogeneous dimer S2–S1 are inid at f=66 pN and dnid at f=88 pN, which validates the order statistics-based theory. Although in our studies we employed the dimers (n2=2) and single domains (n1=1) to represent protein tandems of longer and shorter length, the tests can be used to assess the validity of the iid assumption and to classify the forced unfolding transitions for tandems of arbitrary lengths n1 and n2>n1. The monomers and dimers serve as prototypes for tandems of short and long lengths as observed in force-clamp AFM probes on a protein tandem, (D)N, where unfolding data are available for tandems of different length, 1<n<N. For the convenience of the reader, in Fig. 8 we outline the main steps for testing the distributional equality of the parent unfolding times and their mutual independence. We also give reference to the relevant Eqs. (3) and (10) presented in Study 1 19, and Eqs. (7) and (8) in this study, which can be used to model the parent unfolding time distributions.
In tandems formed by the noninteracting domains, such as domains S2 and S1 in dimers S2–S2 and S2–S1, the dependence among the consecutive unfolding transitions can be induced by the dynamic competition between the force-induced tension propagation along the tandem chain and the forced unfolding kinetics. It is likely that the dynamic coupling between tension propagation and unfolding kinetics occurs in wild-type tandems and engineered polyproteins as well. As we showed in this study, in such a case the dependence structure between the consecutive unfolding transitions is rather trivial, namely that every next unfolding transition after the first one in a tandem of length n, i.e., the second (t2:n), third (t3:n), etc., are delayed by constant time Δt of dropped tension. The test for distributional equality can be used to estimate the timescale for tension propagation, τf. This can be done, e.g., by comparing the parent unfolding time pdfs,
and
generated by using recurrence relation 3 for tandems of different length n1 and n2>n1 via a Q-Q plot. Specifically, τf can be estimated from the time shift, Δt, as τf≈Δt/(n2−n1). For the tandem S2–S2, we found that τf≈0.5μs for f=66 pN and τf≈0.07μs for f=88 pN. Hence, a moderate 33% change in applied force shifts τf by an order of magnitude.
We showed that in protein tandems with no interdomain interaction, yet characterized by the correlated unfolding transitions with the constant time shift, the first unfolding events (t1:n) are unaffected by the tension drop. Because of this, the pdf of the first order statistic ϕ1:n(t), can be still described by the order statistics for independent random variables (iid and inid, Study 1 19). To illustrate this point, we modeled ϕ1:2(t), for tandems S2–S2 and S2–S1 by using Eqs. (3) and 10 of Study 1. The shape parameter, α, and unfolding rate, k, obtained from the fit of ϕ1:2(t) to the histograms of the first unfolding times (t1:2) for tandems S2–S2 and S2–S1 (Table 4) agree with the same quantities obtained for single domains S2 and S1 (Table 1), thus validating our theory. We also showed that due to the presence of flexible linkers, the unfold