Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Punta M, Ofran Y. The rough guide to in silico function prediction, or how to use sequence and structure information to predict protein function. PLoS Comput Biol 2008;4:e1000160. [PMID: 18974821 PMCID: PMC2518264 DOI: 10.1371/journal.pcbi.1000160] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open

For:	Punta M, Ofran Y. The rough guide to in silico function prediction, or how to use sequence and structure information to predict protein function. PLoS Comput Biol 2008;4:e1000160. [PMID: 18974821 PMCID: PMC2518264 DOI: 10.1371/journal.pcbi.1000160] [Citation(s) in RCA: 66] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [What about the content of this article? (0)] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/22/2022] Open

Number

Cited by Other Article(s)

Cui XC, Zheng Y, Liu Y, Yuchi Z, Yuan YJ. AI-driven de novo enzyme design: Strategies, applications, and future prospects. Biotechnol Adv 2025;82:108603. [PMID: 40368118 DOI: 10.1016/j.biotechadv.2025.108603] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/02/2025] [Revised: 04/22/2025] [Accepted: 05/10/2025] [Indexed: 05/16/2025]

Schottlender G, Prieto JM, Clemente C, Schuster CD, Dumas V, Fernández Do Porto D, Martí MA. Bacterial cytochrome P450s: a bioinformatics odyssey of substrate discovery. Front Microbiol 2024;15:1343029. [PMID: 38384262 PMCID: PMC10879549 DOI: 10.3389/fmicb.2024.1343029] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/22/2023] [Accepted: 01/23/2024] [Indexed: 02/23/2024] Open

Russo ET, Barone F, Bateman A, Cozzini S, Punta M, Laio A. DPCfam: Unsupervised protein family classification by Density Peak Clustering of large sequence datasets. PLoS Comput Biol 2022;18:e1010610. [PMID: 36260616 PMCID: PMC9621593 DOI: 10.1371/journal.pcbi.1010610] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/15/2022] [Revised: 10/31/2022] [Accepted: 09/26/2022] [Indexed: 11/07/2022] Open

Raghavan V, Kraft L, Mesny F, Rigerte L. A simple guide to de novo transcriptome assembly and annotation. Brief Bioinform 2022;23:6514404. [PMID: 35076693 PMCID: PMC8921630 DOI: 10.1093/bib/bbab563] [Citation(s) in RCA: 53] [Impact Index Per Article: 17.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/09/2021] [Revised: 12/03/2021] [Accepted: 12/09/2021] [Indexed: 12/13/2022] Open

Russo ET, Laio A, Punta M. Density Peak clustering of protein sequences associated to a Pfam clan reveals clear similarities and interesting differences with respect to manual family annotation. BMC Bioinformatics 2021;22:121. [PMID: 33711918 PMCID: PMC7955657 DOI: 10.1186/s12859-021-04013-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/25/2020] [Accepted: 02/09/2021] [Indexed: 11/24/2022] Open

Abstract

BACKGROUND

The identification of protein families is of outstanding practical importance for in silico protein annotation and is at the basis of several bioinformatic resources. Pfam is possibly the most well known protein family database, built in many years of work by domain experts with extensive use of manual curation. This approach is generally very accurate, but it is quite time consuming and it may suffer from a bias generated from the hand-curation itself, which is often guided by the available experimental evidence.

RESULTS

We introduce a procedure that aims to identify automatically putative protein families. The procedure is based on Density Peak Clustering and uses as input only local pairwise alignments between protein sequences. In the experiment we present here, we ran the algorithm on about 4000 full-length proteins with at least one domain classified by Pfam as belonging to the Pseudouridine synthase and Archaeosine transglycosylase (PUA) clan. We obtained 71 automatically-generated sequence clusters with at least 100 members. While our clusters were largely consistent with the Pfam classification, showing good overlap with either single or multi-domain Pfam family architectures, we also observed some inconsistencies. The latter were inspected using structural and sequence based evidence, which suggested that the automatic classification captured evolutionary signals reflecting non-trivial features of protein family architectures. Based on this analysis we identified a putative novel pre-PUA domain as well as alternative boundaries for a few PUA or PUA-associated families. As a first indication that our approach was unlikely to be clan-specific, we performed the same analysis on the P53 clan, obtaining comparable results.

CONCLUSIONS

The clustering procedure described in this work takes advantage of the information contained in a large set of pairwise alignments and successfully identifies a set of putative families and family architectures in an unsupervised manner. Comparison with the Pfam classification highlights significant overlap and points to interesting differences, suggesting that our new algorithm could have potential in applications related to automatic protein classification. Testing this hypothesis, however, will require further experiments on large and diverse sequence datasets.

Collapse

Nethathe B, Abera A, Naidoo V. Expression and phylogeny of multidrug resistance protein 2 and 4 in African white backed vulture (Gyps africanus). PeerJ 2020;8:e10422. [PMID: 33344079 PMCID: PMC7718797 DOI: 10.7717/peerj.10422] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/22/2020] [Accepted: 11/02/2020] [Indexed: 11/20/2022] Open

Gulyaeva AA, Sigorskih AI, Ocheredko ES, Samborskiy DV, Gorbalenya AE. LAMPA, LArge Multidomain Protein Annotator, and its application to RNA virus polyproteins. Bioinformatics 2020;36:2731-2739. [PMID: 32003788 PMCID: PMC7203729 DOI: 10.1093/bioinformatics/btaa065] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/11/2019] [Revised: 01/02/2020] [Accepted: 01/23/2020] [Indexed: 12/28/2022] Open

Abstract

Motivation

To facilitate accurate estimation of statistical significance of sequence similarity in profile–profile searches, queries should ideally correspond to protein domains. For multidomain proteins, using domains as queries depends on delineation of domain borders, which may be unknown. Thus, proteins are commonly used as queries that complicate establishing homology for similarities close to cutoff levels of statistical significance.

Results

In this article, we describe an iterative approach, called LAMPA, LArge Multidomain Protein Annotator, that resolves the above conundrum by gradual expansion of hit coverage of multidomain proteins through re-evaluating statistical significance of hit similarity using ever smaller queries defined at each iteration. LAMPA employs TMHMM and HHsearch for recognition of transmembrane regions and homology, respectively. We used Pfam database for annotating 2985 multidomain proteins (polyproteins) composed of >1000 amino acid residues, which dominate proteomes of RNA viruses. Under strict cutoffs, LAMPA outperformed HHsearch-mediated runs using intact polyproteins as queries by three measures: number of and coverage by identified homologous regions, and number of hit Pfam profiles. Compared to HHsearch, LAMPA identified 507 extra homologous regions in 14.4% of polyproteins. This Pfam-based annotation of RNA virus polyproteins by LAMPA was also superior to RefSeq expert annotation by two measures, region number and annotated length, for 69.3% of RNA virus polyprotein entries. We rationalized the obtained results based on dependencies of HHsearch hit statistical significance for local alignment similarity score from lengths and diversities of query-target pairs in computational experiments.

Availability and implementation

LAMPA 1.0.0 R package is placed at github (https://github.com/Gorbalenya-Lab/LAMPA).

Supplementary information

Supplementary data are available at Bioinformatics online.

Collapse

Mishra S, Rastogi YP, Jabin S, Kaur P, Amir M, Khatoon S. A bacterial phyla dataset for protein function prediction. Data Brief 2019;28:105002. [PMID: 31921945 PMCID: PMC6950771 DOI: 10.1016/j.dib.2019.105002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2019] [Accepted: 12/06/2019] [Indexed: 11/29/2022] Open

Abstract

Protein function prediction has been the most worked upon and the most challenging problem for computational biologists. The vast majority of known proteins have yet not been characterised experimentally, and there is significant gap between their structures and functions. New un-annotated sequences are being added to the public protein databases (e.g. UniprotKB) at an enormous pace [1]. Such proteins with unknown functions might play key role in the metabolism, growth and development regulation. Thus, if functions of unknown proteins left undiscovered, researchers may skip important information(s). Based on their sequence, structure, evolutionary history, and their association with other proteins, tools of computational biology can provide insights into the function of proteins [2]. For proteins with well characterised close relatives, it is trivial to infer function. Orphan proteins without discernible sequence relatives present a greater challenge [3]. Here the task of experimental characterisation is blind and becomes unwieldy. It is highly unlikely that all known proteins will ever be completely experimentally characterised [4]. Thus, there is an emergent need to develop fast and accurate computational approaches to fulfil this requirement. Towards this end, we prepared a dataset for protein function prediction by extracting protein sequences and annotations of reviewed prokaryotic proteins (total count 323,719 as accessed on date March 10, 2019) belonging to 9 bacterial phyla Actinobacteria, Bacteroidetes, Chlamydiae, Cyanobacteria, Firmicutes, Fusobacteria, Proteobacteria, Spirochaetes and Tenericutes. Corresponding to the most frequent 1739 Gene Ontology (Molecular Function) terms, samples were filtered, and 171,212 proteins were retrieved for feature generation. The Dataset was generated by calculating the sequence, sub-sequence, physiochemical, annotation-based features for each 171,212 reviewed proteins using method in [10].

These features constitute a total of 9890 attributes for each sequence of protein along with 1739 Gene Ontology terms. Each protein sequence is assigned one or more of 1739 Gene Ontology (Molecular Function) term as its target label. The Dataset contains the Entry and Entry name of each sequence corresponding to UniprotKB Database. This dataset being huge in size (171,212 samples X 9890 features, 1739 classes with multiple values) and equipped with enough number of positive and negative samples of each 1739 class, is good for testing efficiency of any upcoming deep learning models [5]. We divided the full dataset of 171,212 reviewed proteins in the ratio 3:1 to form Train/Test dataset 1; train dataset with 128,409 samples and test dataset with 42,803 samples to facilitate training of a deep learning model. The train and test datasets are stratified to contain good proportion of each 1739 classes. We then prepared a dataset 2 of pathogenic unreviewed proteins of the 9 bacterial phyla each with 9890 features same as train/train dataset of reviewed proteins but without target labels in order to predict their functions using deep learning model proposed in [5].

Collapse

Sim EUH, Talwar SP. In silico evidence of de novo interactions between ribosomal and Epstein - Barr virus proteins. BMC Mol Cell Biol 2019;20:34. [PMID: 31416416 PMCID: PMC6694676 DOI: 10.1186/s12860-019-0219-y] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2019] [Accepted: 08/08/2019] [Indexed: 12/29/2022] Open

Hitch TCA, Clavel T. A proposed update for the classification and description of bacterial lipolytic enzymes. PeerJ 2019;7:e7249. [PMID: 31328034 PMCID: PMC6622161 DOI: 10.7717/peerj.7249] [Citation(s) in RCA: 34] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/07/2019] [Accepted: 06/03/2019] [Indexed: 11/23/2022] Open

Sánchez-Reyez A, Batista-García RA, Valdés-García G, Ortiz E, Perezgasga L, Zárate-Romero A, Pastor N, Folch-Mallol JL. A family 13 thioesterase isolated from an activated sludge metagenome: Insights into aromatic compounds metabolism. Proteins 2017;85:1222-1237. [DOI: 10.1002/prot.25282] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/26/2016] [Revised: 02/21/2017] [Accepted: 02/27/2017] [Indexed: 12/23/2022]

De-novo protein function prediction using DNA binding and RNA binding proteins as a test case. Nat Commun 2016;7:13424. [PMID: 27869118 PMCID: PMC5121330 DOI: 10.1038/ncomms13424] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/14/2016] [Accepted: 10/03/2016] [Indexed: 12/14/2022] Open

Kaushik R, Jayaram B. Structural difficulty index: a reliable measure for modelability of protein tertiary structures. Protein Eng Des Sel 2016;29:391-7. [PMID: 27334454 DOI: 10.1093/protein/gzw025] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/13/2016] [Accepted: 05/27/2016] [Indexed: 11/13/2022] Open

Rivera-Borroto OM, García-de la Vega JM, Marrero-Ponce Y, Grau R. Relational Agreement Measures for Similarity Searching of Cheminformatic Data Sets. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2016;13:158-67. [PMID: 26886740 DOI: 10.1109/tcbb.2015.2424435] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/27/2023]

GoFDR: A sequence alignment based method for predicting protein functions. Methods 2016;93:3-14. [DOI: 10.1016/j.ymeth.2015.08.009] [Citation(s) in RCA: 42] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/01/2015] [Revised: 07/27/2015] [Accepted: 08/11/2015] [Indexed: 01/01/2023] Open

Mudgal R, Sandhya S, Chandra N, Srinivasan N. De-DUFing the DUFs: Deciphering distant evolutionary relationships of Domains of Unknown Function using sensitive homology detection methods. Biol Direct 2015;10:38. [PMID: 26228684 PMCID: PMC4520260 DOI: 10.1186/s13062-015-0069-2] [Citation(s) in RCA: 32] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/10/2015] [Accepted: 07/20/2015] [Indexed: 12/23/2022] Open

Abstract

Background

In the post-genomic era where sequences are being determined at a rapid rate, we are highly reliant on computational methods for their tentative biochemical characterization. The Pfam database currently contains 3,786 families corresponding to “Domains of Unknown Function” (DUF) or “Uncharacterized Protein Family” (UPF), of which 3,087 families have no reported three-dimensional structure, constituting almost one-fourth of the known protein families in search for both structure and function.

Results

We applied a ‘computational structural genomics’ approach using five state-of-the-art remote similarity detection methods to detect the relationship between uncharacterized DUFs and domain families of known structures. The association with a structural domain family could serve as a start point in elucidating the function of a DUF. Amongst these five methods, searches in SCOP-NrichD database have been applied for the first time. Predictions were classified into high, medium and low- confidence based on the consensus of results from various approaches and also annotated with enzyme and Gene ontology terms. 614 uncharacterized DUFs could be associated with a known structural domain, of which high confidence predictions, involving at least four methods, were made for 54 families. These structure-function relationships for the 614 DUF families can be accessed on-line at http://proline.biochem.iisc.ernet.in/RHD_DUFS/. For potential enzymes in this set, we assessed their compatibility with the associated fold and performed detailed structural and functional annotation by examining alignments and extent of conservation of functional residues. Detailed discussion is provided for interesting assignments for DUF3050, DUF1636, DUF1572, DUF2092 and DUF659.

Conclusions

This study provides insights into the structure and potential function for nearly 20 % of the DUFs. Use of different computational approaches enables us to reliably recognize distant relationships, especially when they converge to a common assignment because the methods are often complementary. We observe that while pointers to the structural domain can offer the right clues to the function of a protein, recognition of its precise functional role is still ‘non-trivial’ with many DUF domains conserving only some of the critical residues. It is not clear whether these are functional vestiges or instances involving alternate substrates and interacting partners.

Reviewers

This article was reviewed by Drs Eugene Koonin, Frank Eisenhaber and Srikrishna Subramanian.

Electronic supplementary material

The online version of this article (doi:10.1186/s13062-015-0069-2) contains supplementary material, which is available to authorized users.

Collapse

Guna A, Butcher NJ, Bassett AS. Comparative mapping of the 22q11.2 deletion region and the potential of simple model organisms. J Neurodev Disord 2015;7:18. [PMID: 26137170 PMCID: PMC4487986 DOI: 10.1186/s11689-015-9113-x] [Citation(s) in RCA: 81] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Submit a Manuscript] [Subscribe] [Scholar Register] [Received: 02/23/2015] [Accepted: 05/26/2015] [Indexed: 01/18/2023] Open

Abstract

Background

22q11.2 deletion syndrome (22q11.2DS) is the most common micro-deletion syndrome. The associated 22q11.2 deletion conveys the strongest known molecular risk for schizophrenia. Neurodevelopmental phenotypes, including intellectual disability, are also prominent though variable in severity. Other developmental features include congenital cardiac and craniofacial anomalies. Whereas existing mouse models have been helpful in determining the role of some genes overlapped by the hemizygous 22q11.2 deletion in phenotypic expression, much remains unknown. Simple model organisms remain largely unexploited in exploring these genotype-phenotype relationships.

Methods

We first developed a comprehensive map of the human 22q11.2 deletion region, delineating gene content, and brain expression. To identify putative orthologs, standard methods were used to interrogate the proteomes of the zebrafish (D. rerio), fruit fly (D. melanogaster), and worm (C. elegans), in addition to the mouse. Spatial locations of conserved homologues were mapped to examine syntenic relationships. We systematically cataloged available knockout and knockdown models of all conserved genes across these organisms, including a comprehensive review of associated phenotypes.

Results

There are 90 genes overlapped by the typical 2.5 Mb deletion 22q11.2 region. Of the 46 protein-coding genes, 41 (89.1 %) have documented expression in the human brain. Identified homologues in the zebrafish (n = 37, 80.4 %) were comparable to those in the mouse (n = 40, 86.9 %) and included some conserved gene cluster structures. There were 22 (47.8 %) putative homologues in the fruit fly and 17 (37.0 %) in the worm involving multiple chromosomes. Individual gene knockdown mutants were available for the simple model organisms, but not for mouse. Although phenotypic data were relatively limited for knockout and knockdown models of the 17 genes conserved across all species, there was some evidence for roles in neurodevelopmental phenotypes, including four of the six mitochondrial genes in the 22q11.2 deletion region.

Conclusions

Simple model organisms represent a powerful but underutilized means of investigating the molecular mechanisms underlying the elevated risk for neurodevelopmental disorders in 22q11.2DS. This comparative multi-species study provides novel resources and support for the potential utility of non-mouse models in expression studies and high-throughput drug screening. The approach has implications for other recurrent copy number variations associated with neurodevelopmental phenotypes.

Electronic supplementary material

The online version of this article (doi:10.1186/s11689-015-9113-x) contains supplementary material, which is available to authorized users.

Collapse

Das S, Sillitoe I, Lee D, Lees JG, Dawson NL, Ward J, Orengo CA. CATH FunFHMMer web server: protein functional annotations using functional family assignments. Nucleic Acids Res 2015;43:W148-53. [PMID: 25964299 PMCID: PMC4489299 DOI: 10.1093/nar/gkv488] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/14/2015] [Accepted: 05/02/2015] [Indexed: 12/20/2022] Open

In-depth characterisation of the lamb meat proteome from longissimus lumborum. EUPA OPEN PROTEOMICS 2015. [DOI: 10.1016/j.euprot.2015.01.001] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 02/06/2023]

Chiang Z, Vastermark A, Punta M, Coggill PC, Mistry J, Finn RD, Saier MH. The complexity, challenges and benefits of comparing two transporter classification systems in TCDB and Pfam. Brief Bioinform 2015;16:865-72. [PMID: 25614388 PMCID: PMC4570203 DOI: 10.1093/bib/bbu053] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/17/2014] [Indexed: 01/04/2023] Open

Jiang Y, Clark WT, Friedberg I, Radivojac P. The impact of incomplete knowledge on the evaluation of protein function prediction: a structured-output learning perspective. ACTA ACUST UNITED AC 2015;30:i609-16. [PMID: 25161254 PMCID: PMC4147924 DOI: 10.1093/bioinformatics/btu472] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022]

Abstract

Motivation: The automated functional annotation of biological macromolecules is a problem of computational assignment of biological concepts or ontological terms to genes and gene products. A number of methods have been developed to computationally annotate genes using standardized nomenclature such as Gene Ontology (GO). However, questions remain about the possibility for development of accurate methods that can integrate disparate molecular data as well as about an unbiased evaluation of these methods. One important concern is that experimental annotations of proteins are incomplete. This raises questions as to whether and to what degree currently available data can be reliably used to train computational models and estimate their performance accuracy.

Results: We study the effect of incomplete experimental annotations on the reliability of performance evaluation in protein function prediction. Using the structured-output learning framework, we provide theoretical analyses and carry out simulations to characterize the effect of growing experimental annotations on the correctness and stability of performance estimates corresponding to different types of methods. We then analyze real biological data by simulating the prediction, evaluation and subsequent re-evaluation (after additional experimental annotations become available) of GO term predictions. Our results agree with previous observations that incomplete and accumulating experimental annotations have the potential to significantly impact accuracy assessments. We find that their influence reflects a complex interplay between the prediction algorithm, performance metric and underlying ontology. However, using the available experimental data and under realistic assumptions, our results also suggest that current large-scale evaluations are meaningful and almost surprisingly reliable.

Contact:predrag@indiana.edu

Supplementary information:Supplementary data are available at Bioinformatics online.

Collapse

Koskinen P, Törönen P, Nokso-Koivisto J, Holm L. PANNZER: high-throughput functional annotation of uncharacterized proteins in an error-prone environment. ACTA ACUST UNITED AC 2015;31:1544-52. [PMID: 25653249 DOI: 10.1093/bioinformatics/btu851] [Citation(s) in RCA: 90] [Impact Index Per Article: 9.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2014] [Accepted: 12/24/2014] [Indexed: 01/06/2023]

Naqvi AAT, Ahmad F, Hassan MI. Identification of functional candidates amongst hypothetical proteins of Mycobacterium leprae Br4923, a causative agent of leprosy. Genome 2015;58:25-42. [DOI: 10.1139/gen-2014-0178] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/22/2022]

An ontology for microbial phenotypes. BMC Microbiol 2014;14:294. [PMID: 25433798 PMCID: PMC4287307 DOI: 10.1186/s12866-014-0294-3] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.5] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/22/2014] [Accepted: 11/11/2014] [Indexed: 01/09/2023] Open

Exploring function prediction in protein interaction networks via clustering methods. PLoS One 2014;9:e99755. [PMID: 24972109 PMCID: PMC4074043 DOI: 10.1371/journal.pone.0099755] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/21/2014] [Accepted: 05/17/2014] [Indexed: 01/06/2023] Open

Pitkänen E, Jouhten P, Hou J, Syed MF, Blomberg P, Kludas J, Oja M, Holm L, Penttilä M, Rousu J, Arvas M. Comparative genome-scale reconstruction of gapless metabolic networks for present and ancestral species. PLoS Comput Biol 2014;10:e1003465. [PMID: 24516375 PMCID: PMC3916221 DOI: 10.1371/journal.pcbi.1003465] [Citation(s) in RCA: 66] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/07/2013] [Accepted: 12/18/2013] [Indexed: 12/12/2022] Open

Abstract

We introduce a novel computational approach, CoReCo, for comparative metabolic reconstruction and provide genome-scale metabolic network models for 49 important fungal species. Leveraging on the exponential growth in sequenced genome availability, our method reconstructs genome-scale gapless metabolic networks simultaneously for a large number of species by integrating sequence data in a probabilistic framework. High reconstruction accuracy is demonstrated by comparisons to the well-curated Saccharomyces cerevisiae consensus model and large-scale knock-out experiments. Our comparative approach is particularly useful in scenarios where the quality of available sequence data is lacking, and when reconstructing evolutionary distant species. Moreover, the reconstructed networks are fully carbon mapped, allowing their use in 13C flux analysis. We demonstrate the functionality and usability of the reconstructed fungal models with computational steady-state biomass production experiment, as these fungi include some of the most important production organisms in industrial biotechnology. In contrast to many existing reconstruction techniques, only minimal manual effort is required before the reconstructed models are usable in flux balance experiments. CoReCo is available at http://esaskar.github.io/CoReCo/.

Advances in next-generation sequencing technologies are revolutionizing molecular biology. Sequencing-enabled cost-effective characterization of microbial genomes is a particularly exciting development in metabolic engineering. There, considerable effort has been put to reconstructing genome-scale metabolic networks that describe the collection of hundreds to thousands of biochemical reactions available for a microbial cell. These network models are instrumental in understanding microbial metabolism and guiding metabolic engineering efforts to improve biochemical yields. We have developed a novel computational method, CoReCo, which bridges the growing gap between the availability of sequenced genomes and respective reconstructed metabolic networks. The method reconstructs genome-scale metabolic networks simultaneously for related microbial species. It utilizes the available sequencing data from these species to correct for incomplete and missing data. We used the method to reconstruct metabolic networks for a set of 49 fungal species providing the method protein sequence data and a phylogenetic tree describing the evolutionary relationships between the species. We demonstrate the applicability of the method by comparing a metabolic reconstruction of Saccharomyces cerevisiae to the manually curated, high-quality consensus network. We also provide an easy-to-use implementation of the method, usable both in single computer and distributed computing environments.

Collapse

Feiglin A, Ashkenazi S, Schlessinger A, Rost B, Ofran Y. Co-expression and co-localization of hub proteins and their partners are encoded in protein sequence. MOLECULAR BIOSYSTEMS 2014;10:787-94. [PMID: 24457447 DOI: 10.1039/c3mb70411d] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 01/23/2023]

Puggioni V, Dondi A, Folli C, Shin I, Rhee S, Percudani R. Gene Context Analysis Reveals Functional Divergence between Hypothetically Equivalent Enzymes of the Purine–Ureide Pathway. Biochemistry 2014;53:735-45. [DOI: 10.1021/bi4010107] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022]

Rahimi A, Madadkar-Sobhani A, Touserkani R, Goliaei B. Efficacy of function specific 3D-motifs in enzyme classification according to their EC-numbers. J Theor Biol 2013;336:36-43. [PMID: 23871713 DOI: 10.1016/j.jtbi.2013.07.003] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/08/2013] [Revised: 06/05/2013] [Accepted: 07/02/2013] [Indexed: 11/28/2022]

Diniz MC, Pacheco ACL, Farias KM, de Oliveira DM. The eukaryotic flagellum makes the day: novel and unforeseen roles uncovered after post-genomics and proteomics data. Curr Protein Pept Sci 2013;13:524-46. [PMID: 22708495 PMCID: PMC3499766 DOI: 10.2174/138920312803582951] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/24/2011] [Revised: 05/22/2012] [Accepted: 05/23/2012] [Indexed: 12/21/2022]

Cock PJA, Grüning BA, Paszkiewicz K, Pritchard L. Galaxy tools and workflows for sequence analysis with applications in molecular plant pathology. PeerJ 2013;1:e167. [PMID: 24109552 PMCID: PMC3792188 DOI: 10.7717/peerj.167] [Citation(s) in RCA: 112] [Impact Index Per Article: 9.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/17/2013] [Accepted: 08/30/2013] [Indexed: 12/28/2022] Open

Malhotra A, Creer S, Harris JB, Stöcklin R, Favreau P, Thorpe RS. Predicting function from sequence in a large multifunctional toxin family. Toxicon 2013;72:113-25. [PMID: 23831284 DOI: 10.1016/j.toxicon.2013.06.019] [Citation(s) in RCA: 11] [Impact Index Per Article: 0.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/04/2013] [Revised: 06/21/2013] [Accepted: 06/26/2013] [Indexed: 11/30/2022]

Bromberg Y. Chapter 15: disease gene prioritization. PLoS Comput Biol 2013;9:e1002902. [PMID: 23633938 PMCID: PMC3635969 DOI: 10.1371/journal.pcbi.1002902] [Citation(s) in RCA: 47] [Impact Index Per Article: 3.9] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/29/2022] Open

Mistry J, Finn RD, Eddy SR, Bateman A, Punta M. Challenges in homology search: HMMER3 and convergent evolution of coiled-coil regions. Nucleic Acids Res 2013;41:e121. [PMID: 23598997 PMCID: PMC3695513 DOI: 10.1093/nar/gkt263] [Citation(s) in RCA: 1032] [Impact Index Per Article: 86.0] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/31/2022] Open

Nam HJ, Han SK, Bowie JU, Kim S. Rampant exchange of the structure and function of extramembrane domains between membrane and water soluble proteins. PLoS Comput Biol 2013;9:e1002997. [PMID: 23555228 PMCID: PMC3605051 DOI: 10.1371/journal.pcbi.1002997] [Citation(s) in RCA: 8] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/18/2012] [Accepted: 02/04/2013] [Indexed: 11/19/2022] Open

Kolodny R, Kosloff M. From Protein Structure to Function via Computational Tools and Approaches. Isr J Chem 2013. [DOI: 10.1002/ijch.201200078] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/08/2023]

Fang H, Gough J. A domain-centric solution to functional genomics via dcGO Predictor. BMC Bioinformatics 2013;14 Suppl 3:S9. [PMID: 23514627 PMCID: PMC3584936 DOI: 10.1186/1471-2105-14-s3-s9] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open

Abstract

Background

Computational/manual annotations of protein functions are one of the first routes to making sense of a newly sequenced genome. Protein domain predictions form an essential part of this annotation process. This is due to the natural modularity of proteins with domains as structural, evolutionary and functional units. Sometimes two, three, or more adjacent domains (called supra-domains) are the operational unit responsible for a function, e.g. via a binding site at the interface. These supra-domains have contributed to functional diversification in higher organisms. Traditionally functional ontologies have been applied to individual proteins, rather than families of related domains and supra-domains. We expect, however, to some extent functional signals can be carried by protein domains and supra-domains, and consequently used in function prediction and functional genomics.

Results

Here we present a domain-centric Gene Ontology (dcGO) perspective. We generalize a framework for automatically inferring ontological terms associated with domains and supra-domains from full-length sequence annotations. This general framework has been applied specifically to primary protein-level annotations from UniProtKB-GOA, generating GO term associations with SCOP domains and supra-domains. The resulting 'dcGO Predictor', can be used to provide functional annotation to protein sequences. The functional annotation of sequences in the Critical Assessment of Function Annotation (CAFA) has been used as a valuable opportunity to validate our method and to be assessed by the community. The functional annotation of all completely sequenced genomes has demonstrated the potential for domain-centric GO enrichment analysis to yield functional insights into newly sequenced or yet-to-be-annotated genomes. This generalized framework we have presented has also been applied to other domain classifications such as InterPro and Pfam, and other ontologies such as mammalian phenotype and disease ontology. The dcGO and its predictor are available at http://supfam.org/SUPERFAMILY/dcGO including an enrichment analysis tool.

Conclusions

As functional units, domains offer a unique perspective on function prediction regardless of whether proteins are multi-domain or single-domain. The 'dcGO Predictor' holds great promise for contributing to a domain-centric functional understanding of genomes in the next generation sequencing era.

Collapse

MAURER-STROH SEBASTIAN, GAO HE, HAN HAO, BAETEN LIES, SCHYMKOWITZ JOOST, ROUSSEAU FREDERIC, ZHANG LOUXIN, EISENHABER FRANK. MOTIF DISCOVERY WITH DATA MINING IN 3D PROTEIN STRUCTURE DATABASES: DISCOVERY, VALIDATION AND PREDICTION OF THE U-SHAPE ZINC BINDING ("HUF-ZINC") MOTIF. J Bioinform Comput Biol 2013;11:1340008. [DOI: 10.1142/s0219720013400088] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]

Radivojac P, Clark WT, Oron TR, Schnoes AM, Wittkop T, Sokolov A, Graim K, Funk C, Verspoor K, Ben-Hur A, Pandey G, Yunes JM, Talwalkar AS, Repo S, Souza ML, Piovesan D, Casadio R, Wang Z, Cheng J, Fang H, Gough J, Koskinen P, Törönen P, Nokso-Koivisto J, Holm L, Cozzetto D, Buchan DWA, Bryson K, Jones DT, Limaye B, Inamdar H, Datta A, Manjari SK, Joshi R, Chitale M, Kihara D, Lisewski AM, Erdin S, Venner E, Lichtarge O, Rentzsch R, Yang H, Romero AE, Bhat P, Paccanaro A, Hamp T, Kaßner R, Seemayer S, Vicedo E, Schaefer C, Achten D, Auer F, Boehm A, Braun T, Hecht M, Heron M, Hönigschmid P, Hopf TA, Kaufmann S, Kiening M, Krompass D, Landerer C, Mahlich Y, Roos M, Björne J, Salakoski T, Wong A, Shatkay H, Gatzmann F, Sommer I, Wass MN, Sternberg MJE, Škunca N, Supek F, Bošnjak M, Panov P, Džeroski S, Šmuc T, Kourmpetis YAI, van Dijk ADJ, ter Braak CJF, Zhou Y, Gong Q, Dong X, Tian W, Falda M, Fontana P, Lavezzo E, Di Camillo B, Toppo S, Lan L, Djuric N, Guo Y, Vucetic S, Bairoch A, Linial M, Babbitt PC, Brenner SE, Orengo C, Rost B, et alRadivojac P, Clark WT, Oron TR, Schnoes AM, Wittkop T, Sokolov A, Graim K, Funk C, Verspoor K, Ben-Hur A, Pandey G, Yunes JM, Talwalkar AS, Repo S, Souza ML, Piovesan D, Casadio R, Wang Z, Cheng J, Fang H, Gough J, Koskinen P, Törönen P, Nokso-Koivisto J, Holm L, Cozzetto D, Buchan DWA, Bryson K, Jones DT, Limaye B, Inamdar H, Datta A, Manjari SK, Joshi R, Chitale M, Kihara D, Lisewski AM, Erdin S, Venner E, Lichtarge O, Rentzsch R, Yang H, Romero AE, Bhat P, Paccanaro A, Hamp T, Kaßner R, Seemayer S, Vicedo E, Schaefer C, Achten D, Auer F, Boehm A, Braun T, Hecht M, Heron M, Hönigschmid P, Hopf TA, Kaufmann S, Kiening M, Krompass D, Landerer C, Mahlich Y, Roos M, Björne J, Salakoski T, Wong A, Shatkay H, Gatzmann F, Sommer I, Wass MN, Sternberg MJE, Škunca N, Supek F, Bošnjak M, Panov P, Džeroski S, Šmuc T, Kourmpetis YAI, van Dijk ADJ, ter Braak CJF, Zhou Y, Gong Q, Dong X, Tian W, Falda M, Fontana P, Lavezzo E, Di Camillo B, Toppo S, Lan L, Djuric N, Guo Y, Vucetic S, Bairoch A, Linial M, Babbitt PC, Brenner SE, Orengo C, Rost B, Mooney SD, Friedberg I. A large-scale evaluation of computational protein function prediction. Nat Methods 2013;10:221-7. [PMID: 23353650 PMCID: PMC3584181 DOI: 10.1038/nmeth.2340] [Show More Authors] [Citation(s) in RCA: 625] [Impact Index Per Article: 52.1] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2012] [Accepted: 12/10/2012] [Indexed: 01/03/2023]

Konopka BM, Nebel JC, Kotulska M. Quality assessment of protein model-structures based on structural and functional similarities. BMC Bioinformatics 2012;13:242. [PMID: 22998498 PMCID: PMC3526563 DOI: 10.1186/1471-2105-13-242] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/19/2012] [Accepted: 09/14/2012] [Indexed: 11/10/2022] Open

Abstract

Background

Experimental determination of protein 3D structures is expensive, time consuming and sometimes impossible. A gap between number of protein structures deposited in the World Wide Protein Data Bank and the number of sequenced proteins constantly broadens. Computational modeling is deemed to be one of the ways to deal with the problem. Although protein 3D structure prediction is a difficult task, many tools are available. These tools can model it from a sequence or partial structural information, e.g. contact maps. Consequently, biologists have the ability to generate automatically a putative 3D structure model of any protein. However, the main issue becomes evaluation of the model quality, which is one of the most important challenges of structural biology.

Results

GOBA - Gene Ontology-Based Assessment is a novel Protein Model Quality Assessment Program. It estimates the compatibility between a model-structure and its expected function. GOBA is based on the assumption that a high quality model is expected to be structurally similar to proteins functionally similar to the prediction target. Whereas DALI is used to measure structure similarity, protein functional similarity is quantified using standardized and hierarchical description of proteins provided by Gene Ontology combined with Wang's algorithm for calculating semantic similarity. Two approaches are proposed to express the quality of protein model-structures. One is a single model quality assessment method, the other is its modification, which provides a relative measure of model quality. Exhaustive evaluation is performed on data sets of model-structures submitted to the CASP8 and CASP9 contests.

Conclusions

The validation shows that the method is able to discriminate between good and bad model-structures. The best of tested GOBA scores achieved 0.74 and 0.8 as a mean Pearson correlation to the observed quality of models in our CASP8 and CASP9-based validation sets. GOBA also obtained the best result for two targets of CASP8, and one of CASP9, compared to the contest participants. Consequently, GOBA offers a novel single model quality assessment program that addresses the practical needs of biologists. In conjunction with other Model Quality Assessment Programs (MQAPs), it would prove useful for the evaluation of single protein models.

Collapse

MUC16/CA125 in the context of modular proteins with an annotated role in adhesion-related processes: in silico analysis. Int J Mol Sci 2012;13:10387-10400. [PMID: 22949868 PMCID: PMC3431866 DOI: 10.3390/ijms130810387] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Revised: 07/23/2012] [Accepted: 08/09/2012] [Indexed: 11/25/2022] Open

Capriotti E, Nehrt NL, Kann MG, Bromberg Y. Bioinformatics for personal genome interpretation. Brief Bioinform 2012;13:495-512. [PMID: 22247263 PMCID: PMC3404395 DOI: 10.1093/bib/bbr070] [Citation(s) in RCA: 56] [Impact Index Per Article: 4.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/19/2011] [Revised: 11/08/2011] [Indexed: 01/02/2023] Open

Klie S, Nikoloski Z. The Choice between MapMan and Gene Ontology for Automated Gene Function Prediction in Plant Science. Front Genet 2012;3:115. [PMID: 22754563 PMCID: PMC3384976 DOI: 10.3389/fgene.2012.00115] [Citation(s) in RCA: 55] [Impact Index Per Article: 4.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/02/2012] [Accepted: 06/05/2012] [Indexed: 12/23/2022] Open

Pires DEV, de Melo-Minardi RC, dos Santos MA, da Silveira CH, Santoro MM, Meira W. Cutoff Scanning Matrix (CSM): structural classification and function prediction by protein inter-residue distance patterns. BMC Genomics 2011;12 Suppl 4:S12. [PMID: 22369665 PMCID: PMC3287581 DOI: 10.1186/1471-2164-12-s4-s12] [Citation(s) in RCA: 46] [Impact Index Per Article: 3.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/27/2022] Open

Abstract

Background

The unforgiving pace of growth of available biological data has increased the demand for efficient and scalable paradigms, models and methodologies for automatic annotation. In this paper, we present a novel structure-based protein function prediction and structural classification method: Cutoff Scanning Matrix (CSM). CSM generates feature vectors that represent distance patterns between protein residues. These feature vectors are then used as evidence for classification. Singular value decomposition is used as a preprocessing step to reduce dimensionality and noise. The aspect of protein function considered in the present work is enzyme activity. A series of experiments was performed on datasets based on Enzyme Commission (EC) numbers and mechanistically different enzyme superfamilies as well as other datasets derived from SCOP release 1.75.

Results

CSM was able to achieve a precision of up to 99% after SVD preprocessing for a database derived from manually curated protein superfamilies and up to 95% for a dataset of the 950 most-populated EC numbers. Moreover, we conducted experiments to verify our ability to assign SCOP class, superfamily, family and fold to protein domains. An experiment using the whole set of domains found in last SCOP version yielded high levels of precision and recall (up to 95%). Finally, we compared our structural classification results with those in the literature to place this work into context. Our method was capable of significantly improving the recall of a previous study while preserving a compatible precision level.

Conclusions

We showed that the patterns derived from CSMs could effectively be used to predict protein function and thus help with automatic function annotation. We also demonstrated that our method is effective in structural classification tasks. These facts reinforce the idea that the pattern of inter-residue distances is an important component of family structural signatures. Furthermore, singular value decomposition provided a consistent increase in precision and recall, which makes it an important preprocessing step when dealing with noisy data.

Collapse

The roles and evolutionary patterns of intronless genes in deuterostomes. Comp Funct Genomics 2011;2011:680673. [PMID: 21860604 PMCID: PMC3155783 DOI: 10.1155/2011/680673] [Citation(s) in RCA: 40] [Impact Index Per Article: 2.9] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/23/2010] [Revised: 04/13/2011] [Accepted: 06/22/2011] [Indexed: 12/26/2022] Open

Clark WT, Radivojac P. Analysis of protein function and its prediction from amino acid sequence. Proteins 2011;79:2086-96. [PMID: 21671271 DOI: 10.1002/prot.23029] [Citation(s) in RCA: 93] [Impact Index Per Article: 6.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2010] [Revised: 02/15/2011] [Accepted: 03/03/2011] [Indexed: 01/02/2023]

Flores CL, Gancedo C. Unraveling moonlighting functions with yeasts. IUBMB Life 2011;63:457-62. [PMID: 21491559 DOI: 10.1002/iub.454] [Citation(s) in RCA: 23] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/11/2011] [Accepted: 02/22/2011] [Indexed: 01/21/2023]

Pritchard L, Birch P. A systems biology perspective on plant-microbe interactions: biochemical and structural targets of pathogen effectors. PLANT SCIENCE : AN INTERNATIONAL JOURNAL OF EXPERIMENTAL PLANT BIOLOGY 2011;180:584-603. [PMID: 21421407 DOI: 10.1016/j.plantsci.2010.12.008] [Citation(s) in RCA: 32] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Subscribe] [Scholar Register] [Received: 11/01/2010] [Revised: 12/13/2010] [Accepted: 12/15/2010] [Indexed: 05/22/2023]

Sendiña-Nadal I, Ofran Y, Almendral JA, Buldú JM, Leyva I, Li D, Havlin S, Boccaletti S. Unveiling protein functions through the dynamics of the interaction network. PLoS One 2011;6:e17679. [PMID: 21408013 PMCID: PMC3052369 DOI: 10.1371/journal.pone.0017679] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/27/2010] [Accepted: 02/05/2011] [Indexed: 01/02/2023] Open

Jaeger S, Sers CT, Leser U. Combining modularity, conservation, and interactions of proteins significantly increases precision and coverage of protein function prediction. BMC Genomics 2010;11:717. [PMID: 21171995 PMCID: PMC3017542 DOI: 10.1186/1471-2164-11-717] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/06/2010] [Accepted: 12/20/2010] [Indexed: 11/10/2022] Open