Center for the Study of Systems Biology, School of Biology, Georgia Institute of Technology, Atlanta, Georgia 30318
Correspondence: Address reprint requests to Jeffrey Skolnick, Tel.: 404-407-8976; Fax: 404-385-7478; E-mail: skolnick{at}gatech.edu.
| ABSTRACT |
|---|
29 h of CPU time per sequence. Since homologous proteins are unlikely to require the same extent of conformational search as weakly/nonhomologous proteins, TASSER's parameters were optimized to reduce the required CPU time to ~17 min, while retaining TASSER's ability to improve structure quality. Using this optimized TASSER (TASSER-Lite), we find an average improvement in the aligned region of ~10% in root mean-square deviation from native over the initial template. Comparison of TASSER-Lite with the widely used comparative modeling tool MODELLER showed that TASSER-Lite yields final models that are closer to the native. TASSER-Lite is provided on the web at http://cssb.biology.gatech.edu/skolnick/webservice/tasserlite/index.html.
| INTRODUCTION |
|---|
In practice, homology modeling proceeds as follows: First, an evolutionarily related template protein is identified. Second, an alignment between the target and template sequences is constructed. Third, a three-dimensional model including loops in the unaligned regions is built (5). A variety of methods can be used to construct the protein's three-dimensional structure. One involves modeling by rigid-body assembly, as in COMPOSER (16,17). Another uses segment matching, which relies on the approximate positions of the conserved template atoms (18–20); a representative approach is SEGMOD. A third group of methods models the structure by satisfying spatial restraints obtained from the alignment, using either distance geometry or optimization techniques (21–23); such an approach is implemented in MODELLER (24), one of the most widely used comparative modeling tools. Despite improvements in homology modeling procedures, the ability to accurately predict the conformation of the intervening loops between the aligned regions has been rather limited (25,26). Moreover, the accuracy of the resulting model depends mainly on the template selection and on the accuracy of the alignment between the target and the template. Indeed, the resulting models (in the aligned regions) are generally closer to the template structure than to the native structure of the target sequence being modeled. This is an essential problem that must be addressed, and it forms the major focus of this work.
Recently, we developed a methodology, Threading/ASSembly/Refinement (TASSER) (27), for automated tertiary structure prediction that proceeds in a two-step fashion. First, we employ the threading algorithm PROSPECTOR_3 to provide continuous aligned fragments and predicted tertiary restraints (28). Second, TASSER uses the PROSPECTOR_3-provided fragments and tertiary restraints to assemble the structure under the influence of a knowledge-based force field. TASSER has been benchmarked on a comprehensive set of weakly/nonhomologous single-domain proteins (27) as well as on medium to larger sized, possibly multidomain, proteins (29). This benchmarking showed that TASSER could significantly refine the structures and provide final models that are often considerably closer to the native structure than the input templates, and that it could generate good predictions for the unaligned (loop) regions. Moreover, the performance of TASSER in CASP6 (30) was consistent with that of the benchmark.
Although TASSER often generates good models for weakly/nonhomologous proteins, the procedure is rather CPU intensive, requiring several CPU hours to days per sequence for a complete run. When the sequence identity between the target and template is >35%, viz. in the comparative modeling regime, the alignment to the template is usually good and such long simulations might not be required. However, TASSER's ability to refine proteins over their initial template alignment in this regime, where the initial alignments are in general quite good, has not been systematically explored. Thus, this study systematically benchmarks TASSER in the comparative modeling regime. The benchmark set consists of representative single-domain protein structures in the Protein Data Bank (PDB) (31), 41–200 residues in length, having a sequence identity ≥35% with respect to the templates. We optimize the run time parameters of TASSER so that a single calculation gives essentially the same results as the original procedure but does so in considerably less computer time. The resulting fast and effective search version of TASSER, TASSER-Lite, is a rapid comparative modeling tool that is readily applicable to large-scale comparative modeling.
| METHODS AND MATERIALS |
|---|
We constructed an initial data set from which the benchmark set was derived. Each member of the PDB template library has its own cluster, which consists of the PDB sequences having sequence identity >35%. Those PDB sequences that satisfy the criteria mentioned above were selected from each of these template clusters to form the initial data set. In addition, sequences having sequence identity ≥98% among the cluster members were removed from each template cluster to reduce redundancy. From the initial data set, sequences having two or more domains were identified using the protein domain parser (32), scrutinized manually, and removed from the data set. For the systematic analysis, sequences in the 35–90% sequence identity range are subdivided into six categories: 35–40%, 40–50%, 50–60%, 60–70%, 70–80%, and 80–90%. From this initial data set, one representative target per template cluster was selected to form the benchmark set, except for the 35–40% category, for which all members are included. The list of all proteins belonging to the six sets of clusters can be found at http://cssb.biology.gatech.edu/skolnick/files/tasserlite/tasserlite_data.html.
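The six sequence identity categories above can be expressed as a simple binning rule. The following is a minimal sketch, not the authors' code: the function name `identity_category` is hypothetical, and the handling of boundary values (lower edge inclusive, 90% included in the last bin) is an assumption, since the text does not say how values exactly on a boundary are assigned.

```python
from typing import Optional

# Bin edges (percent sequence identity) for the six categories named in the text.
BIN_EDGES = [35, 40, 50, 60, 70, 80, 90]


def identity_category(identity_pct: float) -> Optional[str]:
    """Return a label such as '35-40%', or None if outside the 35-90% range.

    Assumption: each bin includes its lower edge and excludes its upper edge,
    except the last bin, which also includes 90.
    """
    if identity_pct < BIN_EDGES[0] or identity_pct > BIN_EDGES[-1]:
        return None
    for lo, hi in zip(BIN_EDGES, BIN_EDGES[1:]):
        if lo <= identity_pct < hi:
            return f"{lo}-{hi}%"
    # Only identity_pct == 90 reaches this point.
    return f"{BIN_EDGES[-2]}-{BIN_EDGES[-1]}%"
```

For instance, `identity_category(37.5)` falls in the first category, while a value below 35 or above 90 returns `None`.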
Overview of TASSER
Since TASSER has been previously described (27,28,33–36), here we just outline its essentials. Structural templates for a target sequence are selected from a representative PDB library using our iterative threading procedure PROSPECTOR_3 (28), designed to identify homologous as well as analogous templates. The scoring function of PROSPECTOR_3 includes sequence profiles, secondary structure propensities from PSIPRED (37), and consensus contact predictions from the previous threading iterations. A target sequence is classified into one of three categories based on the confidence of the template identification and the likely alignment accuracy: "Easy", where both the template identification and alignments are likely to be quite accurate; "Medium", where the template is reasonable, viz., has a good structural alignment with the target structure, but the threading-based alignment may be quite inaccurate; and "Hard", where the template selection is likely incorrect.
Based on the threading template, the target sequences are split into continuous aligned regions and unaligned regions. For a given threading template, an initial full-length model is built by connecting the continuous template fragments (building blocks) by a random walk confined to lattice bond vectors. If a gap is too long to be spanned by the specified number of unaligned residues, a long Cα-Cα bond remains, and a spring-like force that acts to draw sequential fragments together is used until a physically reasonable bond length is achieved. Parallel hyperbolic Monte Carlo (MC) sampling (38) samples conformational space by rearranging the continuous fragments excised from the template. During assembly, building blocks are kept rigid and are off-lattice to retain their geometric accuracy; unaligned regions are modeled on a cubic lattice by an ab initio procedure and serve as linkage points for rigid body fragment rotations. Conformations are selected using an optimized force field, which includes knowledge-based statistical potentials describing short-range backbone correlations, pairwise interactions, hydrogen-bonding, secondary structure propensities from PSIPRED (37), and consensus contact restraints extracted from the PROSPECTOR_3-identified template alignments.
In a standard TASSER run, five MC runs (Nrun) are performed for each protein. Each MC simulation contains 40–50 replicas (Nrep), depending on the size of the protein, with each replica simulated at a different temperature. The number of MC steps, Nstep, before a temperature exchange or swap is performed is 200. The total number of such swaps, Nswap, is 1000. After each MC swap, the structures of the 16 lowest temperature replicas are stored. Finally, the structures generated in these 16 lowest temperature replicas for all five independent runs are submitted to an iterative clustering program, SPICKER (36). The final models are combined from the clustered structures and ranked by cluster density, and the five highest structural density clusters are selected. Thus, no knowledge of the native structure is used either in the generation of the models or in their selection. Solely for the purpose of subsequent analysis, the final model is the one among the top five cluster centroids that has the lowest root mean-square deviation (RMSD) from the native structure in the aligned region. We construct a detailed atomic model from the best cluster centroid model using PULCHRA (unpublished).
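The selection logic described above (rank clusters by density, keep the top five, and, for analysis only, pick the one closest to native) can be sketched as follows. This is an illustrative sketch under stated assumptions, not SPICKER or TASSER code: the `Cluster` type, its field names, and the function names are hypothetical, and the densities and RMSDs are assumed to have been computed elsewhere.

```python
from typing import List, NamedTuple


class Cluster(NamedTuple):
    name: str
    density: float         # structural density reported by the clustering step
    rmsd_to_native: float  # RMSD (Å) in the aligned region; analysis only


def top_clusters(clusters: List[Cluster], n: int = 5) -> List[Cluster]:
    """Rank clusters by density (highest first) and keep the top n."""
    return sorted(clusters, key=lambda c: c.density, reverse=True)[:n]


def rank_one_model(clusters: List[Cluster]) -> Cluster:
    """Blind choice: the highest-density cluster, using no native information."""
    return top_clusters(clusters, n=1)[0]


def best_model_for_analysis(clusters: List[Cluster]) -> Cluster:
    """Analysis-only choice: lowest RMSD to native among the top five."""
    return min(top_clusters(clusters), key=lambda c: c.rmsd_to_native)
```

Keeping the blind choice and the analysis-only choice as separate functions makes explicit that the native structure enters only in the second one.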
The set of parameters (Nrun, Nrep, Nstep, Nswap) described above is that of a standard TASSER simulation and was obtained by optimizing TASSER on a weakly/nonhomologous protein benchmark set of 1489 proteins (27). Since TASSER takes hours to days of CPU time with these parameters, our goal here is to develop TASSER into a reliable, fast comparative modeling tool, which we achieve by tuning its run time parameters. Although we found that the parameters Nrun, Nstep, and Nswap could be significantly reduced during the optimization, Nrep could not (data not shown).
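We have used the template modeling score (TM-score) (39) as one means of comparing the improvement over the initial template, which is defined as

$$\mathrm{TM\text{-}score} = \max\left[\frac{1}{L_N}\sum_{i=1}^{L_T}\frac{1}{1+\left(d_i/d_0\right)^2}\right],$$

where, following the definition in (39), $L_N$ is the length of the native structure, $L_T$ is the number of residues aligned to the template, $d_i$ is the distance between the $i$th pair of aligned residues, $d_0$ is a length-dependent scale that normalizes the distances, and the maximum is taken over spatial superpositions.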
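As an illustration of how a TM-score is evaluated, the sketch below computes the sum for one fixed superposition, using the commonly published form of the score: each aligned pair at distance d_i contributes 1/(1 + (d_i/d_0)^2), the sum is divided by the native length L_N, and d_0 = 1.24 (L_N - 15)^(1/3) - 1.8 Å. The function names are hypothetical, and the search over superpositions that maximizes the score is deliberately left out, so this returns a lower bound on the actual TM-score rather than the score itself.

```python
from typing import Optional, Sequence


def d0_scale(l_native: int) -> float:
    """Length-dependent distance scale d0 (Å) in its commonly published form.

    Assumption: only meaningful for l_native > 15; the benchmark proteins
    discussed here are 41-200 residues long.
    """
    if l_native <= 15:
        raise ValueError("d0 is not defined here for l_native <= 15")
    return 1.24 * (l_native - 15) ** (1.0 / 3.0) - 1.8


def tm_score_fixed(distances: Sequence[float], l_native: int,
                   d0: Optional[float] = None) -> float:
    """TM-score sum for ONE given superposition (no maximization).

    distances: Cα-Cα distances (Å) of the aligned residue pairs after
    superposition; unaligned residues simply contribute nothing.
    """
    if d0 is None:
        d0 = d0_scale(l_native)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / l_native
```

A perfect full-length match gives a value of 1, and a perfect match over half of the native residues gives 0.5, which shows how the score penalizes low coverage even when the aligned part is accurate.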
| RESULTS |
|---|
α-class, 248 targets are in the β-class, and 387 targets belong to either the α/β or α+β class. The rest are either peptides or membrane proteins, or could not be classified into any of the above classes.
In general, the RMSD between equivalent atoms in the model and the native structure is used to assess the quality of full-length models (41). For weakly/nonhomologous pairs of proteins, where only substructures of the target and template may be related, the RMSD is a poor measure of the quality of different initial templates because the alignment coverage could be very different even when the RMSD is the same (28,41,42). When the models are of low to moderate quality (say with an RMSD above 3 Å), the TM-score has a relatively good correlation between the initial template alignment and the final model (39). However, for very good full-length models without large local deviations, the RMSD is the more appropriate measure because of its greater sensitivity to details. Hence, in this work, the RMSD from native of the Cα atoms has been used to assess the quality of the structure template and the predicted full-length model.
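For concreteness, the Cα RMSD over a set of equivalent residues can be computed as below. This is a minimal sketch that assumes the two coordinate sets have already been optimally superimposed and are listed in the same residue order; the superposition step itself is not shown, and the function name `ca_rmsd` is hypothetical.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Coord], native: Sequence[Coord]) -> float:
    """RMSD (Å) between paired Cα coordinates.

    Assumption: coordinates are already superimposed and paired by index.
    """
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (mx - nx) ** 2 + (my - ny) ** 2 + (mz - nz) ** 2
        for (mx, my, mz), (nx, ny, nz) in zip(model, native)
    )
    return math.sqrt(squared / len(model))
```

Because every pair is squared before averaging, a single badly placed residue raises the RMSD noticeably, which is the sensitivity to detail referred to above.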
The threading results of PROSPECTOR_3 for the 901 targets are summarized in Table 1 under the columns labeled Tali. In the threading process, for each of the six categories (as mentioned in Methods), homologous templates with a sequence identity greater than the upper limit of the identity range are excluded from the template library. Among the 901 target sequences, PROSPECTOR_3 assigns 897 to the Easy set, with an average RMSD and TM-score to native of 2.1 Å and 0.86, respectively, and an average alignment coverage of 97% (Table 1). Four targets are classified as belonging to the Medium set. Analysis of these cases showed that they either are small proteins or have few secondary structures, which might have resulted in poor alignments and poor Z-scores. In further discussions, we focus on the Easy set of proteins. In general, and not surprisingly, PROSPECTOR_3 identifies better templates with increasing sequence identity, as shown by an average decrease in the RMSD of the template to native over the aligned region (Table 1). This is a minimal requirement for any acceptable threading algorithm.
10%. When we compare the improvement in the average RMSD of the final model (Mali) with respect to the initial template (Tali) across the different sequence identity ranges, it is evident from Table 1 that there is no relative improvement in the RMSD as the sequence identity increases. This suggests that when the sequence identity is high, the room for further structural improvement is reduced, and refinement by TASSER with respect to the initial template is limited, essentially because the distance between the target and template structures is below the inherent resolution of the TASSER potential.
In the above analysis, we have calculated the RMSD of the template or model to native with an a priori specified equivalence between pairs of residues provided by the threading method PROSPECTOR_3. To clarify the relationship between the threading alignments and the best structural alignments, we compare the above results with the RMSD calculated by finding the best structural alignment between the template/model and native using TM-align (43). We align the substructure identified by threading (using PROSPECTOR_3) to the native structure. The average RMSD of the 897 proteins in the Easy set, for the template aligned region to native, is 1.4 Å for the structural alignment (Table 1, column Taln, row TMalign A) in comparison to the 2.1 Å RMSD given by PROSPECTOR_3. The average RMSD of the template to native improves by 0.7 Å when we use the alignment provided by TM-align instead of the threading alignment; however, the average alignment coverage drops by 2% (from 97% to 95%) for the structural alignment. For the full-length final models (897 proteins in the Easy set), a similar calculation shows that the average RMSD of the final models evaluated in the aligned region is 1.5 Å with TM-align (Table 1, column Maln, row TMalign A), which is better than the 1.9 Å RMSD obtained without using the structural alignment. In Table 1 (row TMalign A), comparison of the average RMSD for the template (column Taln) with that of the final model (column Maln) for the higher sequence identity ranges shows marginal improvement in the RMSD for the model. This reflects the fact that models of this quality are at the limit of the resolution of TASSER.
Using the threading alignment of template to native and the structural alignment of template (threading aligned region) to native, we extracted the residues of the target sequence that are identically aligned, with respect to the template, by both threading and structural alignment. These common aligned residues cover ~95% of the threading aligned region. Thus, as would be expected, there is good agreement between the threading and structural alignments. The other ~5% of residues, which show disagreements in the alignment, are mostly in the loop regions at the start or end of the secondary structures and at the N- or C-termini of the protein. For these (~5% of the residues that are aligned in threading), the average shift per residue between the structural and threading alignments is 2.1 residues. Furthermore, using the set of residues that are aligned to the template by threading, we calculated the average RMSD between the final TASSER model and the native structure. The obtained average value is 1.9 Å. If we consider these residues in the structural alignment, 98.6% are aligned on average, with an average RMSD of 1.5 Å. Of the residues that contribute to the structural alignment, 97% are identical to those of the TASSER model. For the remaining 3%, the average shift in alignment from the TASSER model is 1.7 residues.
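The agreement statistics above (fraction of identically aligned residues and average shift for the rest) can be computed from two alignments as sketched here. This is an assumption-laden illustration, not the authors' analysis script: each alignment is represented as a dictionary from target residue index to template residue index, the fraction is taken relative to the threading-aligned residues, and the shift is the absolute difference in template index. The function name is hypothetical.

```python
from typing import Dict, Tuple


def compare_alignments(threading: Dict[int, int],
                       structural: Dict[int, int]) -> Tuple[float, float]:
    """Return (fraction identically aligned, mean shift of the disagreements).

    threading / structural: target residue index -> template residue index.
    The fraction is relative to the number of threading-aligned residues;
    the mean shift (in residues) is over residues aligned in both alignments
    but to different template positions.
    """
    if not threading:
        raise ValueError("threading alignment is empty")
    common = [r for r in threading if r in structural]
    identical = sum(1 for r in common if threading[r] == structural[r])
    shifts = [abs(threading[r] - structural[r])
              for r in common if threading[r] != structural[r]]
    mean_shift = sum(shifts) / len(shifts) if shifts else 0.0
    return identical / len(threading), mean_shift
```

In a toy case with four threading-aligned residues, two of which are shifted by one and two positions in the structural alignment, the function returns a fraction of 0.5 and a mean shift of 1.5 residues.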
Next, we used TM-align for the structural alignment of the full-length template or full-length model to native (for the Easy set) to see whether the alignment improves when all residues in the template are included, whether or not they are aligned by PROSPECTOR_3. The results are listed in Table 1 in the row TMalign F under the columns Taln and Maln for the template and final model, respectively. The structural alignment, using either the aligned region of the template or the full-length template to the native, results in an alignment coverage of ~95% and an average RMSD of 1.4 Å. This implies that including the unaligned region of the template does not improve the alignment compared to the one restricted to the threading aligned region. The threading alignment has apparently extracted the best portion of the template proteins. In a similar comparison for the final models, when we include the unaligned region in the structural alignment, the average RMSD of the full-length model increases by 0.1 Å (from 1.5 Å, considering only the aligned region, to 1.6 Å), and the average alignment coverage increases by ~3% (from 95% to 98%). We also looked at the standard deviation of the average RMSD from TM-align and from the direct superposition of the threading aligned region. In general, TM-align shows less variation than direct superposition of equivalent residues. Most sequences in the Medium set show a trend similar to that observed for the Easy set of proteins.
On average, a standard TASSER run needs ~29 h of CPU time on a 1.28-GHz PIII Pentium processor for sequences with lengths ranging between 41 and 200 residues. Longer sequences take more CPU time than shorter ones (a 200-residue protein needs ~74 h, whereas a 43-residue protein takes ~4 h). The clustering procedure, SPICKER, needs an additional average CPU time of ~47 min on a 1.28-GHz PIII Pentium processor for one sequence. Hence, with the parameters used here, TASSER is not suitable for fast comparative modeling. To reduce the simulation time, we next turn to the optimization of the run time parameters.
Over a broad initial RMSD range, TASSER can refine the structure over the template
We explored the RMSD as a function of the total number of MC steps, from 250 to 25000. In general, the RMSD decreases and then increases slightly after a certain number of MC steps (Fig. 1 A). We investigated the reason for the minimum in RMSD. The targets are divided into five bins of 1 Å based on the RMSD of the template to the native, ranging from 0 to 5 Å. The dependence of the average RMSD on total simulation time is shown in Fig. 1, B–F, for targets in the 35–40% and 80–90% sequence identity ranges. As shown in Fig. 1, B–F, except for the 0–1-Å bin (Fig. 1 B), the average RMSD of the final model (aligned region) to the native decreases with increasing number of MC steps and then reaches a plateau. For structures whose initial template has an RMSD from native in the range 0–1 Å, the RMSD does not improve; rather, it becomes worse. This is simply due to the inherent resolution of the TASSER potential, which is ~1.2 Å. About 16% of targets fall in this category, with, as would be expected, more such proteins in the high sequence identity ranges. The combined trend shown by targets in the 0–1-Å category and the other targets gives rise to the observed trend of an average RMSD decrease followed by a slight increase with the total number of MC steps, as in Fig. 1 A. Nevertheless, on average, the net trend is to improve the RMSD over the initial template alignment. A similar trend is observed for the other sequence identity ranges as well.
17 min of CPU time as compared to the original 29 h, and the average CPU time for clustering using SPICKER is reduced to ~7 min. Next, we examined the effect of reducing Nrun from 5 to 1. On average, the RMSD with Nrun = 1 is slightly worse, by ~2%, in comparison to Nrun = 5. Because this gives nearly the same result as Nrun = 5, Nrun is set to 1. This also reduced the CPU time for structure clustering from ~7 min (Nrun = 5) to 16 s (Nrun = 1). Thus, the optimized parameters for homologous sequences are Nrun = 1, Nstep = 25, and Nswap = 80, which on average require a CPU time of 17.26 min per sequence.
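The two parameter sets quoted in the text can be compared with a few lines of arithmetic. The sketch below is only illustrative: the `RunParams` class and its method are hypothetical names, Nrep is omitted because the text reports it could not be reduced, and counting the sampling effort as Nrun × Nstep × Nswap per replica is an assumption about how the quantities combine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunParams:
    n_run: int   # independent MC runs (Nrun)
    n_step: int  # MC steps between replica swaps (Nstep)
    n_swap: int  # number of swaps per run (Nswap)

    def steps_per_replica(self) -> int:
        """MC steps per replica summed over all runs (assumed product form)."""
        return self.n_run * self.n_step * self.n_swap


# Values quoted in the text for the standard and optimized protocols.
STANDARD = RunParams(n_run=5, n_step=200, n_swap=1000)
LITE = RunParams(n_run=1, n_step=25, n_swap=80)


def speedup(slow: float, fast: float) -> float:
    """Ratio of two costs given in the same units."""
    return slow / fast
```

With these values, `speedup(STANDARD.steps_per_replica(), LITE.steps_per_replica())` gives the reduction in sampling steps, and `speedup(29 * 60, 17)` gives the approximate reduction in CPU minutes (about 29 h versus about 17 min); the two ratios need not coincide, since CPU time also depends on factors other than the step count.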
Comparison of TASSER-Lite with MODELLER
We compared the TASSER-Lite refined models for the homologous sequences in the Easy set with those from the widely used homology modeling tool MODELLER (version 8v0) (14,22). We provided MODELLER with the same input alignment from PROSPECTOR_3, and five models were generated per sequence. The best model for MODELLER is the one with the lowest RMSD from the native structure in the aligned region. This criterion shows the upper bound of refinement for both procedures. A summary of the RMSD for the final models obtained using MODELLER and TASSER-Lite is given in Table 2. TASSER-Lite improves the RMSD in the aligned region by ~10%, whereas MODELLER improves it by ~1.2%. This is mainly because MODELLER produces models by optimally satisfying tertiary restraints, so the threading templates govern the final model. In contrast, TASSER allows movements in the relative orientation of template fragments that can generate a final model that is significantly different from the initial template. TASSER does not improve the RMSD (in the aligned region) with respect to the initial templates for high sequence identity targets, where the distance between the target and template structure is below the inherent resolution of the TASSER potential. As observed before in Fig. 1 B, TASSER's ability to improve over the initial templates for targets with an RMSD of the template to native in the 0–1-Å range is limited. The number of such cases increases in the high sequence identity ranges. For such targets, TASSER-Lite might not improve over the initial templates; however, it will result in final models within ~1 Å.
54% of very good templates with an initial 2–3-Å RMSD improve by at least 0.5 Å. Even for an initial RMSD of ~4–5 Å, 42% of the targets improve by at least 2 Å. However, as shown in Fig. 4 B, MODELLER does not show such an improvement in the RMSD. Furthermore, we compared the corresponding overall decrease in RMSD over the aligned region. Fig. 5 A shows the fraction of targets whose RMSD becomes worse by at least a given threshold, dworse, plotted against various initial RMSD values. The increase in RMSD is on average smaller for the TASSER models than for those generated by MODELLER (Fig. 5 B). This indicates that even when TASSER is unable to refine some models over their initial template, in general, it does not make the final models worse. An investigation of the 259 targets for which TASSER increased the RMSD over the aligned region of the final model relative to the initial template showed that in most cases (174) the native structures have extended tails, have a ligand bound, or are involved in a protein-protein interaction. The latter cases may need other partners to generate the native structure.
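The kind of statistic discussed above (the fraction of targets that improve by at least a given threshold, grouped by initial RMSD) can be tabulated as in the following sketch. The function name, the 1-Å bin width, and the input format of (initial RMSD, final RMSD) pairs are illustrative assumptions rather than a description of the authors' scripts.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def fraction_improved(pairs: Iterable[Tuple[float, float]],
                      d_improve: float,
                      bin_width: float = 1.0) -> Dict[int, float]:
    """Fraction of targets per initial-RMSD bin that improve by >= d_improve.

    pairs: (initial template RMSD, final model RMSD) in Å, aligned region.
    Bin b covers initial RMSD values in [b * bin_width, (b + 1) * bin_width).
    """
    totals: Dict[int, int] = defaultdict(int)
    hits: Dict[int, int] = defaultdict(int)
    for initial, final in pairs:
        b = int(initial // bin_width)
        totals[b] += 1
        if initial - final >= d_improve:
            hits[b] += 1
    return {b: hits[b] / totals[b] for b in sorted(totals)}
```

Swapping the condition to `final - initial >= d_worse` would give the complementary statistic, the fraction of targets that become worse by at least a threshold.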
79%) have the rank-one model as the selected (best) model. Further, we compared the average RMSD in the aligned region of the rank-one model with that of the best model (Table 2). On average, the RMSD in the aligned region is worse for the rank-one model (2.1 Å) than for the best model (1.9 Å). We calculated the RMSD difference (D) in the aligned region between the rank-one model and the best model. The average (standard deviation) of D is 0.2 Å (1.9 Å). The high standard deviation suggests that for some of the targets the difference D is large. For 21 targets, D > 3 Å. This provides a plausible explanation for the observed poorer average RMSD of the rank-one model, despite the fact that the average rank of the best model is 1.5.
Next, we considered the percentage of cases in which the RMSD shows an improvement in the aligned region over the initial template for the selected (best) model and the rank-one model. For the selected (best) model, this is observed in 61% of cases, whereas for the rank-one model it is seen in 57% of the cases. For 10% of the targets, the best model is not the rank-one model; however, even the rank-one model shows an improvement in the RMSD over the aligned region with respect to the initial template. This shows that the rank-one model also improves the RMSD with respect to the initial alignment. For both the rank-one model and the best model, in ~10% of the cases the RMSD of the final model remains invariant with respect to the initial template. A detailed table summarizing the results is provided at http://cssb.biology.gatech.edu/skolnick/files/tasserlite/tasserlite_data.html. Thus, the rank-one model is a reasonable choice for real-world protein structure prediction.
In all the above calculations, the cluster centroid structures were used. Subsequently, we generated full-atom models using PULCHRA and compared them with the cluster centroid models; they show an average deviation of 0.4 Å. This indicates that the above results also hold for the full-atom models generated by PULCHRA.
The accurate modeling of loops has been a long-standing problem in comparative modeling (25). Here, we compare the unaligned loop and tail regions generated by TASSER and MODELLER. An unaligned loop (tail) region is defined as a piece of continuous sequence in the middle (at the terminus) of a target protein that has no coordinate assignments in the PROSPECTOR_3 threading alignments. There are 712 unaligned loop regions ranging from 1 to 31 residues in length in the 897 proteins. Most loops (~97%) are ≤10 residues in length. We calculated two types of modeling errors for each loop (25): RMSDlocal (the RMSD between the native and model after direct superposition of the unaligned region) and RMSDglobal (the RMSD obtained after the superposition of up to five neighboring residues). The former measures the modeling accuracy of the local conformation of the loop, and the latter examines both the local conformation and the global orientation of the loop regions. RMSDlocal and RMSDglobal increase with increasing loop length in the final models of both the TASSER and MODELLER protocols. However, the average deviation of RMSDglobal from RMSDlocal is smaller for the TASSER models (0.8 Å) than for those generated using MODELLER (1.5 Å). For example, the average deviation of RMSDglobal from RMSDlocal for seven-residue loops is 0.9 Å for TASSER, whereas for MODELLER it is 1.7 Å. This suggests that the global loop orientations are relatively better predicted by TASSER.
There are 607 unaligned regions at either the N- or C-terminus, as given by the PROSPECTOR_3 alignment, with lengths ranging from 1 to 46 residues. Most tails (~94%) are shorter than or equal to 10 residues in length. On average, the RMSDglobal is ~14% greater than RMSDlocal in the final TASSER models, whereas for the same comparison using MODELLER the increase is ~23%, which suggests that TASSER better predicts the overall tail orientation than MODELLER. For example, the TASSER final model for a 20-residue tail in 1qkkA has an RMSDlocal of 2.3 Å and an RMSDglobal of 3.6 Å, whereas the same 20-residue tail model from MODELLER has an RMSDlocal and an RMSDglobal of 7.2 Å and 9.5 Å, respectively.
On average, the CPU time for MODELLER is ~1.8 min per sequence. Although TASSER requires more CPU time (~17 min), its final models are more accurate than those generated by MODELLER. Hence, such accurate models could be used for more precise protein function prediction, such as identification of ligand binding substrate specificity.
With the optimized condition of TASSER, we have a fast and efficient modeling tool referred to as TASSER-Lite. This tool is publicly available on the world wide web (http://cssb.biology.gatech.edu/skolnick/webservice/tasserlite/index.html) for use by the scientific community.
| CONCLUSIONS |
|---|
29 h to ~17 min for one sequence. Furthermore, on comparing TASSER-Lite with the widely used modeling tool MODELLER, we showed that TASSER performs, on average, better than MODELLER in improving both the aligned and unaligned regions of the targets. Hence, TASSER-Lite is an effective and fast modeling tool for homologous sequences.
| ACKNOWLEDGEMENTS |
|---|
Submitted on March 1, 2006; accepted for publication August 22, 2006.
| REFERENCES |
|---|
2. Baker, D., and A. Sali. 2001. Protein structure prediction and structural genomics. Science. 294:93–96.
3. Murzin, A. G. 2001. Progress in protein structure prediction. Nat. Struct. Biol. 8:110–112.
4. Guex, N., and M. C. Peitsch. 1997. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis. 18:2714–2723.
5. Sanchez, R., and A. Sali. 1997. Advances in comparative protein-structure modelling. Curr. Opin. Struct. Biol. 7:206–214.
6. Bowie, J. U., R. Luthy, and D. Eisenberg. 1991. A method to identify protein sequences that fold into a known three-dimensional structure. Science. 253:164–170.
7. Panchenko, A. R., A. Marchler-Bauer, and S. H. Bryant. 2000. Combination of threading potentials and sequence profiles improves fold recognition. J. Mol. Biol. 296:1319–1331.
8. Skolnick, J., and D. Kihara. 2001. Defrosting the frozen approximation: PROSPECTOR - a new approach to threading. Proteins. 42:319–331.
9. Pillardy, J., C. Czaplewski, A. Liwo, J. Lee, D. R. Ripoll, R. Kazmierkiewicz, S. Oldziej, W. J. Wedemeyer, K. D. Gibson, Y. A. Arnautova, J. Saunders, Y. J. Ye, and H. A. Scheraga. 2001. Recent improvements in prediction of protein structure by global optimization of a potential energy function. Proc. Natl. Acad. Sci. USA. 98:2329–2333.
10. Simons, K. T., C. Strauss, and D. Baker. 2001. Prospects for ab initio protein structural genomics. J. Mol. Biol. 306:1191–1199.
11. Kolinski, A., and J. Skolnick. 1998. Assembly of protein structure from sparse experimental data: an efficient Monte Carlo model. Proteins. 32:475–494.
12. Holm, L., and C. Sander. 1996. Mapping the protein universe. Science. 273:595–603.
13. Chothia, C., and A. M. Lesk. 1986. The relation between the divergence of sequence and structure in proteins. EMBO J. 5:823–826.
14. Marti-Renom, M. A., A. C. Stuart, A. Fiser, R. Sanchez, F. Melo, and A. Sali. 2000. Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. 29:291–325.
15. Zhang, Y., and J. Skolnick. 2005. The protein structure prediction problem could be solved using the current PDB library. Proc. Natl. Acad. Sci. USA. 102:1029–1034.
16. Blundell, T. L., B. L. Sibanda, M. J. Sternberg, and J. M. Thornton. 1987. Knowledge-based prediction of protein structures and the design of novel molecules. Nature. 326:347–352.
17. Srinivasan, N., and T. L. Blundell. 1993. An evaluation of the performance of an automated procedure for comparative modelling of protein tertiary structure. Protein Eng. 6:501–512.
18. Claessens, M., E. Van Cutsem, I. Lasters, and S. Wodak. 1989. Modelling the polypeptide backbone with spare parts from known protein structures. Protein Eng. 2:335–345.
19. Jones, T. A., and S. Thirup. 1986. Using known substructures in protein model building and crystallography. EMBO J. 5:819–822.
20. Levitt, M. 1992. Accurate modeling of protein conformation by automatic segment matching. J. Mol. Biol. 226:507–533.
21. Aszodi, A., and W. R. Taylor. 1996. Homology modeling by distance geometry. Fold. Des. 1:325–334.
22. Sali, A., and T. L. Blundell. 1993. Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol. 234:779–815.
23. Srinivasan, S., C. J. March, and S. Sudarsanam. 1993. An automated method for modeling proteins on known templates using distance geometry. Protein Sci. 2:277–289.
24. Sali, A., L. Potterton, F. Yuan, H. van Vlijmen, and M. Karplus. 1995. Evaluation of comparative protein modeling by MODELLER. Proteins. 23:318–326.
25. Fiser, A., R. K. Do, and A. Sali. 2000. Modeling of loops in protein structures. Protein Sci. 9:1753–1773.
26. Tramontano, A., and V. Morea. 2003. Exploiting evolutionary relationships for predicting protein structures. Biotechnol. Bioeng. 84:756–762.
27. Zhang, Y., and J. Skolnick. 2004. Automated structure prediction of weakly homologous proteins on a genomic scale. Proc. Natl. Acad. Sci. USA. 101:7594–7599.
28. Skolnick, J., D. Kihara, and Y. Zhang. 2004. Development and large scale benchmark testing of the PROSPECTOR_3 threading algorithm. Proteins. 56:502–518.
29. Zhang, Y., and J. Skolnick. 2004. Tertiary structure predictions on a comprehensive benchmark of medium to large size proteins. Biophys. J. 87:2647–2655.
30. Zhang, Y., A. K. Arakaki, and J. Skolnick. 2005. TASSER: an automated method for the prediction of protein tertiary structures in CASP6. Proteins. 61(Suppl. 7):91–98.
31. Berman, H. M., J. Westbrook, Z. Feng, G. Gilliland, T. N. Bhat, H. Weissig, I. N. Shindyalov, and P. E. Bourne. 2000. The Protein Data Bank. Nucleic Acids Res. 28:235–242.
32. Alexandrov, N., and I. Shindyalov. 2003. PDP: protein domain parser. Bioinformatics. 19:429–430.
33. Zhang, Y., A. Kolinski, and J. Skolnick. 2003. TOUCHSTONE II: a new approach to ab initio protein structure prediction. Biophys. J. 85:11451164.
34. Skolnick, J., Y. Zhang, A. K. Arakaki, A. Kolinski, M. Boniecki, A. Szilagyi, and D. Kihara. 2003. TOUCHSTONE: a unified approach to protein structure prediction. Proteins. 53(Suppl. 6):469479.[CrossRef][Medline]
35. Li, W., Y. Zhang, D. Kihara, Y. J. Huang, D. Zheng, G. T. Montelione, A. Kolinski, and J. Skolnick. 2003. TOUCHSTONEX: protein structure prediction with sparse NMR data. Proteins. 53:290306.[CrossRef][Medline]
36. Zhang, Y., and J. Skolnick. 2004. SPICKER: a clustering approach to identify near-native protein folds. J. Comput. Chem. 25:865871.[CrossRef][Medline]
37. Jones, D. T. 1999. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292:195202.[CrossRef][Medline]
38. Zhang, Y., D. Kihara, and J. Skolnick. 2002. Local energy landscape flattening: parallel hyperbolic Monte Carlo sampling of protein folding. Proteins. 48:192201.[CrossRef][Medline]
39. Zhang, Y., and J. Skolnick. 2004. Scoring function for automated assessment of protein structure template quality. Proteins. 57:702710.[CrossRef][Medline]
40. Murzin, A. G., S. E. Brenner, T. Hubbard, and C. Chothia. 1995. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247:536540.[CrossRef][Medline]
41. Kabsch, W. 1978. A solution for the best rotation to relate two sets of vectors. Acta Crystallogr. A32:922–923.
42. Siew, N., A. Elofsson, L. Rychlewski, and D. Fischer. 2000. MaxSub: an automated measure for the assessment of protein structure prediction quality. Bioinformatics. 16:776–785.
43. Zhang, Y., and J. Skolnick. 2005. TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res. 33:2302–2309.