1
Siew N, Fischer D. Unravelling the ORFan Puzzle. Comp Funct Genomics 2010; 4:432-41. [PMID: 18629076 PMCID: PMC2447361 DOI: 10.1002/cfg.311] [Citation(s) in RCA: 20] [Impact Index Per Article: 1.4] [Received: 05/07/2003] [Revised: 06/05/2003] [Accepted: 06/05/2003] [Indexed: 12/27/2022] Open
Abstract
ORFans are open reading frames (ORFs) with no detectable sequence similarity
to any other sequence in the databases. Each newly sequenced genome contains a
significant number of ORFans. Therefore, ORFans entail interesting evolutionary
puzzles. However, little can be learned about them using bioinformatics tools, and
their study seems to have been underemphasized. Here we present some of the
questions that the existence of so many ORFans has raised and review some of
the studies aimed at understanding ORFans, their functions and their origins. These
works have demonstrated that ORFans are an untapped source of research, requiring
further computational and experimental studies.
Affiliation(s)
- Naomi Siew
- Department of Chemistry, Ben Gurion University, Beer-Sheva 84105, Israel
2
3
Fischer D. Servers for protein structure prediction. Curr Opin Struct Biol 2006; 16:178-82. [PMID: 16546376 DOI: 10.1016/j.sbi.2006.03.004] [Citation(s) in RCA: 67] [Impact Index Per Article: 3.7] [Received: 01/23/2006] [Revised: 02/14/2006] [Accepted: 03/07/2006] [Indexed: 11/18/2022]
Abstract
The 1990s cultivated a generation of human protein structure predictors. As a result of structural genomics and genome sequencing projects, and significant improvements in the performance of protein structure prediction methods, a generation of automated servers has evolved in the past few years. Servers for close and distant homology modeling are now routinely used by many biologists, and have already been applied to the experimental structure determination process itself, and to the interpretation and annotation of genome sequences. Because dozens of servers are currently available, it is hard for a biologist to know which server(s) to use; however, the state of the art of these methods is now assessed through the LiveBench and CAFASP experiments. Meta-servers (servers that use the results of other autonomous servers to produce a consensus prediction) have proven to be the best performers, and are already challenging all but a handful of expert human predictors. The difference in performance of the top ten autonomous (non-meta) servers is small and hard to assess using relatively small test sets. Recent experiments suggest that servers will soon free humans from most of the burden of protein structure prediction.
Affiliation(s)
- Daniel Fischer
- Buffalo Center of Excellence in Bioinformatics, and Department of Computer Science and Engineering, State University of New York at Buffalo, Buffalo, NY 14260, USA.
4
Rychlewski L, Fischer D. LiveBench-8: the large-scale, continuous assessment of automated protein structure prediction. Protein Sci 2005; 14:240-5. [PMID: 15608124 PMCID: PMC2253323 DOI: 10.1110/ps.04888805] [Citation(s) in RCA: 64] [Impact Index Per Article: 3.4] [Indexed: 10/26/2022]
Abstract
We present the results of the evaluation of the latest LiveBench-8 experiment. These results provide a snapshot view of the state of the art in automated protein structure prediction, just before the 2004 CAFASP-4/CASP-6 experiments begin. The last CAFASP/CASP experiments demonstrated that automated meta-predictors entail a significant advance in the field, already challenging most human expert predictors. LiveBench-8 corroborates the superior performance of meta-predictors, which are able to produce useful predictions for over one-half of the test targets. More importantly, LiveBench-8 identifies a handful of recently developed autonomous (nonmeta) servers that perform at the very top, suggesting that further progress in the individual methods has recently been obtained.
5
Abstract
The performance of the 3DS3 and 3DS5 3D-SHOTGUN meta-predictors in CAFASP3 is reported. The 3D-SHOTGUN meta-predictors are fully automatic fold recognition servers that attempt to incorporate into the prediction process a number of successful strategies that human predictors often apply. Specifically, the input to 3D-SHOTGUN consists of the top five models predicted by a number of independent fold recognition servers, and its output consists of hybrid models assembled from the recurrent structural information in the input models. The resulting hybrid models are, on average, more accurate and more complete than the input models. When evaluated on a large set of prediction targets, the 3D-SHOTGUN servers show increased sensitivities and significantly better specificities. For CAFASP3, 3DS3 and 3DS5 used a preliminary implementation of the 3D-SHOTGUN method, which lacked a refinement step. Although this did not have a significant effect on the easier targets, for the hardest prediction targets, where the input models had significant structural conflicts, the 3D-SHOTGUN models contained a number of non-native-like features such as fragmentation and overlaps. The CAFASP3 evaluation identified the 3D-SHOTGUN meta-predictors within the top three most sensitive and most specific servers. A fully automated refinement step to the 3D-SHOTGUN method is currently being implemented, and preliminary results indicate that in addition to "cleaning up" such undesirable features, it is able to further increase the accuracy of the resulting models.
Affiliation(s)
- Daniel Fischer
- Bioinformatics, Department of Computer Science, Ben Gurion University, Beer-Sheva, Israel.
6
Sasson I, Fischer D. Modeling three-dimensional protein structures for CASP5 using the 3D-SHOTGUN meta-predictors. Proteins 2004; 53 Suppl 6:389-94. [PMID: 14579327 DOI: 10.1002/prot.10544] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.3] [Indexed: 11/10/2022]
Abstract
Full-atom models were generated for all CASP5 targets by using the fully automated 3D-SHOTGUN fold recognition meta-predictors (Fischer D, Proteins 2003;51:434-441). The 3D-SHOTGUN meta-predictors assemble hybrid 3D models by combining structural information of a number of independently generated, fold recognition models. At the time CASP5 took place, the 3D-SHOTGUN servers generated unrefined Cα-only models. Fischer's participation in CASP had three main goals. The first was to test the value of using 3D-SHOTGUN models as input to a refinement procedure. The second goal was to test whether human intervention could result in a better performance than that of the automated servers. The third goal was to evaluate which human procedures, not yet implemented within the 3D-SHOTGUN servers, can be implemented in the future. For CASP5, our group's predictions applied a very simple approach using the multiple parent option of the Modeller program (Sali and Blundell, J Mol Biol 1993;234:779-815). The input to Modeller was different combinations of the unrefined 3D-SHOTGUN models and the sequence-template alignments used by 3D-SHOTGUN's assembly step. Our evaluation of the accuracies of the refined versus the SHOTGUN models shows that the refined models were consistently slightly more accurate than SHOTGUN's. For a few targets, the manual use of the information from the CAFASP servers resulted in better human models. This manual intervention was particularly valuable in the identification of domains, still a difficult feature for automated servers. The CASP5 results indicate that 3D-SHOTGUN's hybrid models can be a valuable starting point for full-atom refinement and that the resulting refined models are, on average, more accurate than those produced by the servers. Thus, we conclude that our three goals were achieved. A preliminary automated version of the refinement procedure, named SHGUM, is now available.
Affiliation(s)
- Iris Sasson
- Bioinformatics, Department of Computer Science, Ben Gurion University, Beer-Sheva, Israel
7
Abstract
Singleton sequence ORFans are orphan ORFs (open reading frames) that have no detectable sequence similarity to any other sequence in the databases. ORFans are of particular interest not only as evolutionary puzzles but also because we can learn little about them using bioinformatics tools. Here, we present a first systematic analysis of singleton ORFans in the first 60 fully sequenced microbial genomes. We show that although ORFans have been underemphasized, the number of ORFans is steadily growing, currently accounting for 23,634 sequences. At the same time, the percentage of ORFans as a fraction of all sequences is slowly diminishing, and is currently about 14%. Short ORFans comprise about 61% of all ORFans. The abundance of short ORFans may be due to a yet unexplained artifact. The data also suggest that the number of longer ORFans may soon diminish as more genomes of closely related organisms become available. To better address the questions about the functions and origins of ORFans, we propose to focus further studies on the longer ORFans, with emphasis on three new types of ORFans: ORFan modules, paralogous ORFans, and orthologous ORFans. We conclude that the large number of ORFans reflects an intrinsic property of the genetic material not yet fully understood. Further computational and experimental studies aimed at understanding Nature's protein diversity should also include ORFans.
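The definition used in this abstract (an ORF with no detectable similarity to any other sequence) can be illustrated with a minimal sketch. It assumes a precomputed table of pairwise similarity hits; the `Orf` record, the function names, the E-value cutoff, and the 150-residue boundary for "short" ORFans are all hypothetical choices made for illustration and are not taken from the abstract.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    orf_id: str
    genome: str
    length: int  # length in residues


def find_singleton_orfans(orfs, hits, e_value_cutoff=1e-3):
    """Return the ORFs that have no significant hit to any other sequence.

    hits is an iterable of (query_id, subject_id, e_value) tuples.
    The cutoff is an assumed value, not one stated in the source.
    """
    has_match = set()
    for query_id, subject_id, e_value in hits:
        # Self-hits carry no information; weak hits are ignored.
        if query_id != subject_id and e_value <= e_value_cutoff:
            # Similarity is treated as symmetric: both partners are matched.
            has_match.add(query_id)
            has_match.add(subject_id)
    return [orf for orf in orfs if orf.orf_id not in has_match]


def summarize(orfs, orfans, short_cutoff=150):
    """Report ORFan counts and fractions; short_cutoff is an assumption."""
    short = [o for o in orfans if o.length < short_cutoff]
    return {
        "orfan_count": len(orfans),
        "orfan_fraction": len(orfans) / len(orfs) if orfs else 0.0,
        "short_fraction": len(short) / len(orfans) if orfans else 0.0,
    }


if __name__ == "__main__":
    orfs = [
        Orf("a", "g1", 100),
        Orf("b", "g1", 300),
        Orf("c", "g2", 120),
        Orf("d", "g2", 90),
    ]
    hits = [("a", "c", 1e-10), ("b", "b", 0.0), ("d", "b", 5.0)]
    orfans = find_singleton_orfans(orfs, hits)
    print([o.orf_id for o in orfans])
    print(summarize(orfs, orfans))
```

Because the classification depends entirely on the database searched and on the cutoff, the counts from such a procedure change as more genomes are added, which is consistent with the trend the abstract describes.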
Affiliation(s)
- Naomi Siew
- Department of Chemistry, Ben Gurion University, Beer-Sheva, Israel
8
Ginalski K, Rychlewski L. Detection of reliable and unexpected protein fold predictions using 3D-Jury. Nucleic Acids Res 2003; 31:3291-2. [PMID: 12824309 PMCID: PMC168910 DOI: 10.1093/nar/gkg503] [Citation(s) in RCA: 55] [Impact Index Per Article: 2.6] [Indexed: 11/14/2022] Open
Abstract
3D-Jury is a fully automated protein structure meta-prediction system accessible via the Meta Server interface (http://BioInfo.PL/Meta). It is one of the meta-predictors that made a dramatic, unprecedented impact on the recent CASP-5 experiment. 3D-Jury is comparable with other meta-servers, but it has the highest combined specificity and sensitivity. The method is also very simple and versatile and can be used to create meta-predictions even from sets of models produced by humans. An additional important and novel feature of the system is the high correlation between the reported confidence score and the accuracy of the model: the number of correctly predicted residues can be estimated directly from the prediction score. The high reliability of the method enables any biologist to submit a target of interest to the Meta Server and determine, with relatively high confidence, whether the target can be predicted by fold recognition methods while being unpredictable using standard approaches such as PSI-Blast. This can point to interesting relationships that could have been missed in annotations of proteins or genomes and provide valuable information for novel scientific discoveries.
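The general idea of consensus-based meta-prediction can be sketched as follows. This is not the 3D-Jury algorithm, whose details the abstract does not give; it is a simplified illustration in which each candidate model is scored by its average agreement with the other models. The function names, the 3.5 Å distance cutoff, and the assumption that all models are already superimposed in a common coordinate frame are hypothetical simplifications.

```python
import math


def shared_residues(model_a, model_b, cutoff=3.5):
    """Count residues present in both models whose C-alpha atoms lie
    within `cutoff` angstroms of each other.

    Each model maps residue number -> (x, y, z). The models are ASSUMED
    to be in a common frame already; no superposition is performed here.
    """
    count = 0
    for residue, xyz in model_a.items():
        other = model_b.get(residue)
        if other is not None and math.dist(xyz, other) <= cutoff:
            count += 1
    return count


def consensus_rank(models, cutoff=3.5):
    """Rank models by mean agreement (in residues) with all other models."""
    scores = {}
    for name, model in models.items():
        others = [m for n, m in models.items() if n != name]
        if not others:
            scores[name] = 0.0
            continue
        total = sum(shared_residues(model, o, cutoff) for o in others)
        scores[name] = total / len(others)
    # Highest consensus score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    models = {
        "s1": {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0)},
        "s2": {1: (0.5, 0.0, 0.0), 2: (4.0, 0.0, 0.0)},
        "s3": {1: (20.0, 0.0, 0.0), 2: (25.0, 0.0, 0.0), 3: (7.5, 0.0, 0.0)},
    }
    for name, score in consensus_rank(models):
        print(name, score)
```

Because the score in this sketch is expressed in residues, it loosely mirrors the abstract's remark that the number of correctly predicted residues can be estimated from the prediction score, although the relationship reported for 3D-Jury itself was established empirically.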
Affiliation(s)
- Krzysztof Ginalski
- Bioinformatics Laboratory, BioInfoBank Institute, ul. Limanowskiego 24A, 60-744 Poznan, Poland
9
Abstract
The summer of every even year is considered by the protein structure prediction community as the Olympic Games season, because in addition to a number of continuous benchmarking experiments such as LiveBench, much effort is invested in the blind prediction experiments CASP and CAFASP. Here we report the major advances registered in the field since the last Games of 2000, as measured by the recently completed LiveBench-4 experiment. These results provide a timely measure of the capabilities of current methods and of their expected performance in the upcoming CASP-5 and CAFASP-3 experiments. We also describe the initiation of the two new, community-wide experiments, PDB-CAFASP and MR-CAFASP. These new experiments extend the scope of previous efforts and may have important implications for structural genomics.
Affiliation(s)
- Daniel Fischer
- Bioinformatics, Department of Computer Science, Ben Gurion University, Beer-Sheva 84105, Israel.
10
Fischer D, Elofsson A, Rychlewski L, Pazos F, Valencia A, Rost B, Ortiz AR, Dunbrack RL. CAFASP2: the second critical assessment of fully automated structure prediction methods. Proteins 2002; Suppl 5:171-83. [PMID: 11835495 DOI: 10.1002/prot.10036] [Citation(s) in RCA: 86] [Impact Index Per Article: 3.9] [Indexed: 11/08/2022]
Abstract
The results of the second Critical Assessment of Fully Automated Structure Prediction (CAFASP2) are presented. The goals of CAFASP are to (i) assess the performance of fully automatic web servers for structure prediction, by using the same blind prediction targets as those used at CASP4, (ii) inform the community of users about the capabilities of the servers, (iii) allow human groups participating in CASP to use and analyze the results of the servers while preparing their nonautomated predictions for CASP, and (iv) compare the performance of the automated servers to that of the human-expert groups of CASP. More than 30 servers from around the world participated in CAFASP2, covering all categories of structure prediction. The category with the largest participation was fold recognition, where 24 CAFASP servers filed predictions along with 103 other CASP human groups. The CAFASP evaluation indicated that it is difficult to establish an exact ranking of the servers because the number of prediction targets was relatively small and the differences among many servers were also small. However, a group of roughly five "best" fold recognition servers could be identified. The CASP evaluation identified the same group of top servers, albeit with a slightly different relative order. Both evaluations ranked a semiautomated method named CAFASP-CONSENSUS, which filed predictions using the CAFASP results of the servers, above any of the individual servers. Although the predictions of the CAFASP servers were available to human CASP predictors before the CASP submission deadline, the CASP assessment identified only 11 human groups that performed better than the best server. Furthermore, about one fourth of the top 30 performing groups corresponded to automated servers. At least half of the top 11 groups corresponded to human groups that also had a server in CAFASP or to human groups that used the CAFASP results to prepare their predictions. In particular, the CAFASP-CONSENSUS group was ranked seventh. This shows that the automated predictions of the servers can be very helpful to human predictors. We conclude that as servers continue to improve, they will become increasingly important in any prediction process, especially when dealing with genome-scale prediction tasks. We expect that in the near future, the performance difference between humans and machines will continue to narrow and that fully automated structure prediction will become an effective companion and complement to experimental structural genomics.
Affiliation(s)
- D Fischer
- Bioinformatics, Department of Computer Science, Ben Gurion University, Beer-Sheva, Israel.