Reference Citation Analysis

For: King OD, Foulger RE, Dwight SS, White JV, Roth FP. Predicting gene function from patterns of annotation. Genome Res 2003;13:896-904. [PMID: 12695322 PMCID: PMC430892 DOI: 10.1101/gr.440803] [Citation(s) in RCA: 104] [Impact Index Per Article: 5.0] [Indexed: 11/25/2022]

Cited by Other Article(s)

Wang S, Pan C, Sheng H, Yang M, Yang C, Feng X, Hu C, Ma Y. Construction of a molecular regulatory network related to fat deposition by multi-tissue transcriptome sequencing of Jiaxian red cattle. iScience 2023;26:108346. [PMID: 38026203 PMCID: PMC10665818 DOI: 10.1016/j.isci.2023.108346] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/02/2023] [Revised: 09/26/2023] [Accepted: 10/23/2023] [Indexed: 12/01/2023]

Di Persia L, Lopez T, Arce A, Milone DH, Stegmayer G. exp2GO: Improving Prediction of Functions in the Gene Ontology With Expression Data. IEEE/ACM Trans Comput Biol Bioinform 2023;20:999-1008. [PMID: 35417352 DOI: 10.1109/tcbb.2022.3167245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]

Chen Y, Qin Y, Fu Y, Gao Z, Deng Y. Integrated Analysis of Bulk RNA-Seq and Single-Cell RNA-Seq Unravels the Influences of SARS-CoV-2 Infections to Cancer Patients. Int J Mol Sci 2022;23:ijms232415698. [PMID: 36555339 PMCID: PMC9779348 DOI: 10.3390/ijms232415698] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/22/2022] [Revised: 12/02/2022] [Accepted: 12/06/2022] [Indexed: 12/14/2022]

Abstract

Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a highly contagious and pathogenic coronavirus that emerged in late 2019 and caused a pandemic of respiratory illness termed coronavirus disease 2019 (COVID-19). Cancer patients are more susceptible to SARS-CoV-2 infection. The treatment of cancer patients infected with SARS-CoV-2 is more complicated, and these patients are at higher risk of a poor prognosis than other populations. Patients infected with SARS-CoV-2 are prone to rapid development of acute respiratory distress syndrome (ARDS), of which pulmonary fibrosis (PF) is considered a sequela. Both ARDS and PF contribute to poor prognosis in COVID-19 patients. However, the molecular mechanisms linking COVID-19, ARDS and PF in COVID-19 patients with cancer are not well understood. In this study, the common differentially expressed genes (DEGs) between COVID-19 patients with and without cancer were identified. Based on the common DEGs, a series of analyses were performed, including Gene Ontology (GO) and pathway analysis, protein-protein interaction (PPI) network construction and hub gene extraction, transcription factor (TF)-DEG regulatory network construction, TF-DEG-miRNA coregulatory network construction and drug molecule identification. The candidate drug molecules (e.g., Tamibarotene CTD 00002527) identified in this study might be helpful for effective therapy in COVID-19 patients with cancer. In addition, the common DEGs among ARDS, PF and COVID-19 patients with and without cancer are TNFSF10 and IFITM2. These two genes may serve as potential therapeutic targets in the treatment of COVID-19 patients with cancer. Changes in the expression levels of TNFSF10 and IFITM2 in CD14+/CD16+ monocytes may affect the immune response of COVID-19 patients. Specifically, changes in the expression level of TNFSF10 in monocytes can be considered an immune signature in COVID-19 patients with hematologic cancer.
Targeting N⁶-methyladenosine (m6A) pathways (e.g., the METTL3/SERPINA1 axis) to restrict SARS-CoV-2 replication has therapeutic potential for COVID-19 patients.

Processes in DNA damage response from a whole-cell multi-omics perspective. iScience 2022;25:105341. [PMID: 36339253 PMCID: PMC9633746 DOI: 10.1016/j.isci.2022.105341] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/16/2021] [Revised: 08/10/2022] [Accepted: 10/10/2022] [Indexed: 11/09/2022]

Fung KW, Xu J, Ameye F, Burelle L, MacNeil J. Evaluation of the International Classification of Health Interventions (ICHI) in the coding of common surgical procedures. J Am Med Inform Assoc 2021;29:43-51. [PMID: 34643710 DOI: 10.1093/jamia/ocab220] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/28/2021] [Accepted: 10/27/2021] [Indexed: 11/12/2022]

Abstract

OBJECTIVE

To evaluate the International Classification of Health Interventions (ICHI) in the clinical and statistical use cases.

MATERIALS AND METHODS

We identified the 300 most frequently performed surgical procedures, as represented by their display names in an electronic health record. For comparison with existing coding systems, we coded the procedures in ICHI, SNOMED CT, International Classification of Diseases (ICD)-10-PCS, and CCI (Canadian Classification of Health Interventions), using postcoordination (modification of existing codes by adding other codes) when applicable. Failure analysis was done for cases where full representation was not achieved. The ICHI encoding was further evaluated for adequacy to support statistical reporting by the Organisation for Economic Co-operation and Development (OECD) and European Union (EU) categories of surgical procedures.

RESULTS

After deduplication, 229 distinct procedures remained. Without postcoordination, ICHI achieved full representation in 52.8%. A further 19.2% could be fully represented with postcoordination. SNOMED CT was the best performing overall, with 94.3% full representation without postcoordination, and 99.6% with postcoordination. Failure analysis showed that "method" and "target" constituted most of the missing information for ICHI encoding. For all OECD/EU surgical categories, ICHI coding was adequate to support statistical reporting. One OECD/EU category ("Hip replacement, secondary") required postcoordination for correct assignment.

CONCLUSION

In the clinical use case of capturing information in the electronic health record, ICHI was outperformed by the clinically oriented procedure coding systems (SNOMED CT and CCI), but was comparable to ICD-10-PCS. Postcoordination could be an effective and efficient means of improving coverage. ICHI is generally adequate for the collection of international statistics.

Zhao Y, Wang J, Guo M, Zhang X, Yu G. Cross-Species Protein Function Prediction with Asynchronous-Random Walk. IEEE/ACM Trans Comput Biol Bioinform 2021;18:1439-1450. [PMID: 31562099 DOI: 10.1109/tcbb.2019.2943342] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/10/2023]

Moro G, Masseroli M. Gene function finding through cross-organism ensemble learning. BioData Min 2021;14:14. [PMID: 33579334 PMCID: PMC7879670 DOI: 10.1186/s13040-021-00239-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/15/2020] [Accepted: 01/10/2021] [Indexed: 11/12/2022]

Abstract

Background

Structured biological information about genes and proteins is a valuable resource for improving the discovery and understanding of complex biological processes via machine learning algorithms. Gene Ontology (GO) controlled annotations describe, in a structured form, features and functions of genes and proteins of many organisms. However, such valuable annotations are not always reliable and are sometimes incomplete, especially for rarely studied organisms. Here, we present GeFF (Gene Function Finder), a novel cross-organism ensemble learning method able to reliably predict new GO annotations of a target organism from the GO annotations of another, evolutionarily related and better-studied source organism.

Results

Using a supervised method, GeFF predicts unknown annotations from random perturbations of existing annotations. The perturbation consists of randomly deleting a fraction of known annotations to produce a reduced annotation set. The key idea is to train a supervised machine learning algorithm on the reduced annotation set to predict, that is, to rebuild, the original annotations. The resulting prediction model, in addition to accurately rebuilding the original known annotations for an organism from their perturbed version, also effectively predicts new unknown annotations for the organism. Moreover, the prediction model can discover new unknown annotations in different target organisms without retraining.

We combined our novel method with different ensemble learning approaches and compared them to each other and to an equivalent single-model technique. We tested the method with five different organisms using their GO annotations: Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum. The outcomes demonstrate the effectiveness of the cross-organism ensemble approach, which can be customized with a trade-off between the desired number of predicted new annotations and their precision.

A Web application to browse both the input annotations used and the predicted ones, choosing the ensemble prediction method to use, is publicly available at http://tiny.cc/geff/.
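
The perturbation step described above can be sketched in Python. This is a minimal illustration under stated assumptions, not GeFF's implementation. Annotations are modeled as a set of (gene, term) pairs, and the helper names `perturb_annotations` and `training_rows` are hypothetical. The abstract does not specify GeFF's feature representation, so each training row here carries only a single input bit. A real model would need richer features, such as the gene's other annotations.

```python
import random


def perturb_annotations(annotations, fraction, seed=0):
    """Randomly delete a fraction of known (gene, term) annotation pairs.

    Returns (reduced, deleted): the perturbed set used as model input and
    the held-out pairs the model should learn to rebuild.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)       # seeded so the perturbation is reproducible
    pairs = sorted(annotations)     # fixed order so sampling is deterministic
    n_delete = round(len(pairs) * fraction)
    deleted = set(rng.sample(pairs, n_delete))
    reduced = set(pairs) - deleted
    return reduced, deleted


def training_rows(reduced, original):
    """One row per (gene, term) cell: (gene, term, input_bit, target_bit).

    input_bit says whether the pair survived the perturbation; target_bit
    says whether it was in the original set, i.e. what the model must rebuild.
    """
    genes = sorted({g for g, _ in original})
    terms = sorted({t for _, t in original})
    return [
        (g, t, int((g, t) in reduced), int((g, t) in original))
        for g in genes
        for t in terms
    ]


# Hypothetical toy annotation set.
known = {("g1", "GO:A"), ("g1", "GO:B"), ("g2", "GO:A"), ("g2", "GO:C")}
reduced, deleted = perturb_annotations(known, fraction=0.5, seed=1)
rows = training_rows(reduced, known)
```

The seeded `random.Random` keeps the deletion reproducible. This matters when several models are compared on the same perturbation.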

Conclusions

Our novel cross-organism ensemble learning method provides reliable predicted novel gene annotations, i.e., functions, ranked according to an associated likelihood value. They are valuable both for speeding up annotation curation, by focusing it on the prioritized new annotations predicted, and for complementing the known annotations available.

Zhao Y, Wang J, Chen J, Zhang X, Guo M, Yu G. A Literature Review of Gene Function Prediction by Modeling Gene Ontology. Front Genet 2020;11:400. [PMID: 32391061 PMCID: PMC7193026 DOI: 10.3389/fgene.2020.00400] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Received: 01/02/2020] [Accepted: 03/30/2020] [Indexed: 12/14/2022]

Cohen I, David EO, Netanyahu NS. Supervised and Unsupervised End-to-End Deep Learning for Gene Ontology Classification of Neural In Situ Hybridization Images. Entropy (Basel) 2019;21:e21030221. [PMID: 33266936 PMCID: PMC7514702 DOI: 10.3390/e21030221] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Received: 03/27/2018] [Revised: 10/22/2018] [Accepted: 12/19/2018] [Indexed: 11/16/2022]

Hadarovich A, Anishchenko I, Tuzikov AV, Kundrotas PJ, Vakser IA. Gene ontology improves template selection in comparative protein docking. Proteins 2018;87:245-253. [PMID: 30520123 DOI: 10.1002/prot.25645] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/26/2018] [Revised: 10/21/2018] [Accepted: 11/29/2018] [Indexed: 02/06/2023]

Zhao Y, Fu G, Wang J, Guo M, Yu G. Gene function prediction based on Gene Ontology Hierarchy Preserving Hashing. Genomics 2018;111:334-342. [PMID: 29477548 DOI: 10.1016/j.ygeno.2018.02.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.5] [Received: 11/20/2017] [Revised: 02/02/2018] [Accepted: 02/16/2018] [Indexed: 12/27/2022]

Protein Function Prediction Using Deep Restricted Boltzmann Machines. Biomed Res Int 2017;2017:1729301. [PMID: 28744460 PMCID: PMC5506480 DOI: 10.1155/2017/1729301] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 03/30/2017] [Accepted: 05/30/2017] [Indexed: 11/17/2022]

Fortelny N, Butler GS, Overall CM, Pavlidis P. Protease-Inhibitor Interaction Predictions: Lessons on the Complexity of Protein-Protein Interactions. Mol Cell Proteomics 2017;16:1038-1051. [PMID: 28385878 PMCID: PMC5461536 DOI: 10.1074/mcp.m116.065706] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 11/21/2016] [Revised: 03/24/2017] [Indexed: 01/18/2023]

Abstract

Protein interactions shape proteome function and thus biology. Identification of protein interactions is a major goal in molecular biology, but biochemical methods, although improving, remain limited in coverage and accuracy. Whereas computational predictions can guide biochemical experiments, low validation rates of predictions remain a major limitation. Here, we investigated computational methods for predicting a specific type of interaction: the inhibitory interactions between proteases and their inhibitors. Proteases generate thousands of proteoforms that dynamically shape the functional state of proteomes. Despite the important regulatory role of proteases, knowledge of their inhibitors remains largely incomplete, with the vast majority of proteases lacking an annotated inhibitor. To link inhibitors to their target proteases on a large scale, we applied computational methods to predict inhibitory interactions between proteases and their inhibitors based on complementary data, including coexpression, phylogenetic similarity, structural information, co-annotation, and colocalization, and also surveyed general protein interaction networks for potential inhibitory interactions. In testing nine predicted interactions biochemically, we validated the inhibition of kallikrein 5 by serpin B12. Despite the use of a wide array of complementary data, we found a high false positive rate of computational predictions in biochemical follow-up. Based on a protease-specific definition of true negatives derived from the biochemical classification of proteases and inhibitors, we analyzed the prediction accuracy of individual features and thereby identified feature-specific limitations, which also affected general protein interaction prediction methods. Interestingly, proteases were often not coexpressed with most of their functional inhibitors, contrary to what is commonly assumed and extrapolated predominantly from cell culture experiments. Predictions of inhibitory interactions were indeed more challenging than predictions of nonproteolytic and noninhibitory interactions. In summary, we describe a novel and well-defined but difficult protein interaction prediction task and thereby highlight limitations of computational interaction prediction methods.

Jiang B, Kloster K, Gleich DF, Gribskov M. AptRank: an adaptive PageRank model for protein function prediction on bi-relational graphs. Bioinformatics 2017;33:1829-1836. [PMID: 28200073 DOI: 10.1093/bioinformatics/btx029] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.6] [Received: 05/23/2016] [Accepted: 02/14/2017] [Indexed: 11/15/2022]

Yu G, Luo W, Fu G, Wang J. Interspecies gene function prediction using semantic similarity. BMC Syst Biol 2016;10:121. [PMID: 28155711 PMCID: PMC5260010 DOI: 10.1186/s12918-016-0361-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Indexed: 01/09/2023]

Abstract

BACKGROUND

Gene Ontology (GO) is a collaborative project that maintains and develops a controlled vocabulary (or terms) to describe the molecular function, biological roles and cellular location of gene products in a hierarchical ontology. GO also provides GO annotations that associate genes with GO terms. GO Consortium members independently and collaboratively annotate gene products with terms, mainly for the model organisms (or species) they are interested in. Because of experimental ethics, the research interests of biologists and limited resources, homologous genes from different species are currently annotated with different terms. These differences can be attributed more to incomplete annotation of genes than to functional differences between them.

RESULTS

Semantic similarity between genes is derived from the GO hierarchy and the annotations of genes. It is positively correlated with similarity derived from various types of biological data and has been applied to predict gene function. In this paper, we investigate whether it is possible to replenish the annotations of incompletely annotated genes by using semantic similarity between genes from two species with homology. For this investigation, we use three representative semantic similarity metrics to compute the similarity between genes from two species. Next, we determine the k nearest neighbor genes from the two species based on the chosen metric and then use the terms annotated to the k neighbors of a gene to replenish the annotations of that gene. We perform experiments on archived (from Jan-2014 to Jan-2016) GO annotations of four species (Human, Mouse, Danio rerio and Arabidopsis thaliana) to assess the contribution of semantic similarity between genes from different species. The experimental results demonstrate that: (1) semantic similarity between genes from homologous species contributes much more to the improved accuracy (by 53.22%) than genes from a single species alone or genes from two species with low homology; (2) GO annotations of genes from homologous species are complementary to each other.
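
The k-nearest-neighbor replenishment step can be sketched as follows. This is a hypothetical illustration, not the paper's code. It substitutes plain Jaccard overlap of GO term sets for the three semantic similarity metrics the study uses, which the abstract does not name. The gene and term identifiers are made up.

```python
def jaccard(a, b):
    """Stand-in similarity: overlap of two genes' GO term sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def replenish(gene, annotations, k=2):
    """Rank candidate GO terms for `gene` from its k most similar genes.

    annotations maps gene IDs (from either species) to sets of GO term IDs.
    Each candidate term is scored by the summed similarity of the neighbors
    that carry it; terms already annotated to `gene` are skipped.
    """
    own = annotations[gene]
    neighbors = sorted(
        ((jaccard(own, terms), other)
         for other, terms in annotations.items() if other != gene),
        key=lambda pair: (-pair[0], pair[1]),  # most similar first, ties by ID
    )[:k]
    scores = {}
    for sim, other in neighbors:
        if sim == 0.0:
            continue  # genes with no shared terms contribute nothing
        for term in annotations[other] - own:
            scores[term] = scores.get(term, 0.0) + sim
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical genes from two species sharing one annotation space.
annotations = {
    "human:GENE1": {"GO:0001", "GO:0002"},
    "mouse:Gene1": {"GO:0001", "GO:0002", "GO:0003"},
    "mouse:Gene2": {"GO:0002", "GO:0004"},
    "human:GENE3": {"GO:0005"},
}
result = replenish("human:GENE1", annotations, k=2)
print(result)
```

A natural next step would be to replace `jaccard` with a GO-aware semantic similarity measure. Neighbor selection and score aggregation would stay the same.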

CONCLUSIONS

Our study shows that semantic similarity-based interspecies gene function annotation from homologous species is more effective than traditional intraspecies approaches. This work can promote more research on semantic similarity-based function prediction across species.

Domeniconi G, Masseroli M, Moro G, Pinoli P. Cross-organism learning method to discover new gene functionalities. Comput Methods Programs Biomed 2016;126:20-34. [PMID: 26724853 DOI: 10.1016/j.cmpb.2015.12.002] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 06/04/2015] [Revised: 11/16/2015] [Accepted: 12/08/2015] [Indexed: 06/05/2023]

Abstract

BACKGROUND

Knowledge of gene and protein functions is paramount for the understanding of physiological and pathological biological processes, as well as for the development of new drugs and therapies. Analyses for biomedical knowledge discovery greatly benefit from the availability of gene and protein functional feature descriptions expressed through controlled terminologies and ontologies, i.e., of gene and protein biomedical controlled annotations. In recent years, several databases of such annotations have become available; yet these valuable annotations are incomplete, include errors, and only some of them represent highly reliable human-curated information. Computational techniques able to reliably predict new gene or protein annotations with an associated likelihood value are thus paramount.

METHODS

Here, we propose a novel cross-organism learning approach to reliably predict new functionalities for the genes of an organism based on the known controlled annotations of the genes of another, evolutionarily related and better-studied organism. We leverage a new representation of the annotation discovery problem and a random perturbation of the available controlled annotations so that supervised algorithms can predict unknown gene annotations with good accuracy. Taking advantage of the numerous gene annotations available for a well-studied organism, our cross-organism learning method creates and trains better prediction models, which can then be applied to predict new gene annotations of a target organism.

RESULTS

We tested and compared our method with the equivalent single organism approach on different gene annotation datasets of five evolutionarily related organisms (Homo sapiens, Mus musculus, Bos taurus, Gallus gallus and Dictyostelium discoideum). Results show both the usefulness of the perturbation method of available annotations for better prediction model training and a great improvement of the cross-organism models with respect to the single-organism ones, without influence of the evolutionary distance between the considered organisms. The generated ranked lists of reliably predicted annotations, which describe novel gene functionalities and have an associated likelihood value, are very valuable both to complement available annotations, for better coverage in biomedical knowledge discovery analyses, and to quicken the annotation curation process, by focusing it on the prioritized novel annotations predicted.

Chicco D, Masseroli M. Ontology-Based Prediction and Prioritization of Gene Functional Annotations. IEEE/ACM Trans Comput Biol Bioinform 2016;13:248-260. [PMID: 27045825 DOI: 10.1109/tcbb.2015.2459694] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Indexed: 06/05/2023]

Yu G, Fu G, Wang J, Zhu H. Predicting Protein Function via Semantic Integration of Multiple Networks. IEEE/ACM Trans Comput Biol Bioinform 2016;13:220-232. [PMID: 26800544 DOI: 10.1109/tcbb.2015.2459713] [Citation(s) in RCA: 25] [Impact Index Per Article: 3.1] [Indexed: 06/05/2023]

Masseroli M, Canakoglu A, Ceri S. Integration and Querying of Genomic and Proteomic Semantic Annotations for Biomedical Knowledge Extraction. IEEE/ACM Trans Comput Biol Bioinform 2016;13:209-219. [PMID: 27045824 DOI: 10.1109/tcbb.2015.2453944] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 06/05/2023]

Glass K, Girvan M. Finding New Order in Biological Functions from the Network Structure of Gene Annotations. PLoS Comput Biol 2015;11:e1004565. [PMID: 26588252 PMCID: PMC4654495 DOI: 10.1371/journal.pcbi.1004565] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.0] [Received: 01/06/2015] [Accepted: 09/23/2015] [Indexed: 11/19/2022]

Yang W, Dierking K, Schulenburg H. WormExp: a web-based application for a Caenorhabditis elegans-specific gene expression enrichment analysis. Bioinformatics 2015;32:943-5. [DOI: 10.1093/bioinformatics/btv667] [Citation(s) in RCA: 65] [Impact Index Per Article: 7.2] [Received: 07/15/2015] [Accepted: 11/06/2015] [Indexed: 11/13/2022]

Jennen DGJ, van Leeuwen DM, Hendrickx DM, Gottschalk RWH, van Delft JHM, Kleinjans JCS. Bayesian Network Inference Enables Unbiased Phenotypic Anchoring of Transcriptomic Responses to Cigarette Smoke in Humans. Chem Res Toxicol 2015;28:1936-48. [PMID: 26360787 DOI: 10.1021/acs.chemrestox.5b00145] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 11/28/2022]

Abstract

Microarray-based transcriptomic analysis has been shown to offer the opportunity to study the effects of human exposure to, e.g., chemical carcinogens at the whole-genome level, thus yielding broad-ranging molecular information on possible carcinogenic effects. Since genes do not operate individually but rather through concerted interactions, analyzing and visualizing networks of genes should provide important mechanistic information, especially upon connecting them to functional parameters, such as those derived from measurements of biomarkers for exposure and carcinogenic risk. Conventional methods such as hierarchical clustering and correlation analyses are frequently used to address these complex interactions but are limited because they do not provide directional causal dependence relationships. Therefore, our aim was to apply Bayesian network inference for the phenotypic anchoring of modified gene expression. We investigated a use case on transcriptomic responses to cigarette smoking in humans, in association with plasma cotinine levels as biomarkers of exposure and aromatic DNA-adducts in blood cells as biomarkers of carcinogenic risk. Many of the genes that appear in the Bayesian networks surrounding plasma cotinine, and to a lesser extent around aromatic DNA-adducts, hold biologically relevant functions in inducing severe adverse effects of smoking. In conclusion, this study shows that Bayesian network inference enables unbiased phenotypic anchoring of transcriptomic responses. Furthermore, in all inferred Bayesian networks several dependencies were found that point to known as well as new relationships between the expression of specific genes, cigarette smoke exposure, DNA-damaging effects, and smoking-related diseases, in particular associated with apoptosis, DNA repair, and tumor suppression, as well as with autoimmunity.

Yu G, Zhu H, Domeniconi C, Liu J. Predicting protein function via downward random walks on a gene ontology. BMC Bioinformatics 2015;16:271. [PMID: 26310806 PMCID: PMC4551531 DOI: 10.1186/s12859-015-0713-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 02/14/2015] [Accepted: 08/20/2015] [Indexed: 12/24/2022]

Abstract

Background

High-throughput bio-techniques accumulate ever-increasing amounts of genomic and proteomic data. These data are far from being functionally characterized, despite advances in the functional annotation of genes (or their protein products). Because of the limitations of experimental techniques and research bias in biology, regularly updated functional annotation databases, e.g., the Gene Ontology (GO), are far from complete. Given the importance of protein functions for biological studies and drug design, proteins should be annotated more comprehensively and precisely.

Results

We proposed downward Random Walks (dRW) to predict missing (or new) functions of partially annotated proteins. In particular, we apply downward random walks with restart on the GO directed acyclic graph, along with the available functions of a protein, to estimate the probability of missing functions. To further boost the prediction accuracy, we extend dRW to dRW-kNN. dRW-kNN computes the semantic similarity between proteins based on their functional annotations; it then predicts functions based on the functions estimated by dRW, together with the functions associated with the k nearest proteins. Our proposed models can predict two kinds of missing functions: (i) those that are missing for a protein but associated with other proteins of interest; (ii) those that are not available for any protein of interest but exist in the GO hierarchy. Experimental results on the proteins of Yeast and Human show that dRW and dRW-kNN can replenish functions more accurately than other related approaches, especially for sparse functions associated with no more than 10 proteins.
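
A random walk with restart confined to downward (parent-to-child) edges can be sketched as below. It is a generic sketch under stated assumptions, not dRW's exact formulation. Edges are unweighted, and probability that reaches a leaf term is returned to the protein's annotated terms. The restart probability of 0.3 is an arbitrary illustrative value. The dRW-kNN extension is not shown, and the term IDs in the usage example are hypothetical.

```python
def downward_rwr(children, seeds, restart=0.3, max_iter=200, tol=1e-12):
    """Random walk with restart that only moves parent -> child on a GO DAG.

    children: dict mapping a GO term to its direct child terms.
    seeds: terms already annotated to the protein (the restart set).
    Returns a dict term -> visiting probability (values sum to 1).
    """
    seeds = set(seeds)
    if not seeds:
        raise ValueError("need at least one annotated term to restart from")
    nodes = set(children) | seeds
    for kids in children.values():
        nodes.update(kids)
    p0 = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: restart * p0[n] for n in nodes}
        leaf_mass = 0.0
        for n in nodes:
            kids = children.get(n, [])
            if kids:
                share = (1.0 - restart) * p[n] / len(kids)
                for c in kids:
                    nxt[c] += share
            else:
                leaf_mass += (1.0 - restart) * p[n]
        # Assumption: a walker at a leaf cannot move further down, so its
        # mass is sent back to the seed terms instead of being lost.
        for n in nodes:
            nxt[n] += leaf_mass * p0[n]
        delta = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if delta < tol:
            break
    return p


def missing_functions(p, seeds):
    """Candidate new terms: reached, not yet annotated, ranked by probability."""
    return sorted((t for t in p if t not in seeds and p[t] > 0),
                  key=lambda t: (-p[t], t))


# Hypothetical toy DAG: edges point from parent to child.
go_children = {"GO:root": ["GO:a", "GO:b"], "GO:a": ["GO:c"], "GO:b": ["GO:c", "GO:d"]}
probs = downward_rwr(go_children, {"GO:a"})
print(missing_functions(probs, {"GO:a"}))
```

Because the walk only moves downward, a protein annotated to `GO:a` can only gain more specific terms beneath it (here `GO:c`). It never gains siblings such as `GO:d`.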

Conclusion

The empirical study shows that the semantic similarity between GO terms and the ontology hierarchy play important roles in predicting protein function. The proposed dRW and dRW-kNN can serve as tools for replenishing functions of partially annotated proteins.

Foulger RE, Osumi-Sutherland D, McIntosh BK, Hulo C, Masson P, Poux S, Le Mercier P, Lomax J. Representing virus-host interactions and other multi-organism processes in the Gene Ontology. BMC Microbiol 2015;15:146. [PMID: 26215368 PMCID: PMC4517558 DOI: 10.1186/s12866-015-0481-x] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.4] [Received: 12/17/2014] [Accepted: 07/10/2015] [Indexed: 01/25/2023]

Masseroli M, Canakoglu A, Quigliatti M. Detection of gene annotations and protein-protein interaction associated disorders through transitive relationships between integrated annotations. BMC Genomics 2015;16 Suppl 6:S5. [PMID: 26046679 PMCID: PMC4460591 DOI: 10.1186/1471-2164-16-s6-s5] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.7] [Indexed: 11/16/2022]

Abstract

Background

Increasingly large amounts of heterogeneous and valuable controlled biomolecular annotations are available, but they are far from exhaustive and are scattered across many databases. Several annotation integration and prediction approaches have been proposed, but these issues are still unsolved. We previously created a Genomic and Proteomic Knowledge Base (GPKB) that efficiently integrates many distributed biomolecular annotation and interaction data of several organisms, including 32,956,102 gene annotations, 273,522,470 protein annotations and 277,095 protein-protein interactions (PPIs).

Results

By comprehensively leveraging the transitive relationships defined by the numerous association data integrated in GPKB, we developed a software procedure that effectively detects and supplements consistent biomolecular annotations not present in the integrated sources. Following a set of defined logic rules, it does so only when the semantic type of the data and of their relationships, as well as the cardinality of the relationships, allow the identification of annotations compliant with molecular biology. Thanks to the controlled consistency and quality enforced on the data integrated in GPKB, and to the procedures used to avoid error propagation during their automatic processing, we could reliably identify many annotations, which we integrated in GPKB. They comprise 3,144 gene-to-pathway and 21,942 gene-to-biological-function annotations of many organisms, and 1,027 candidate associations between 317 genetic disorders and 782 human PPIs. The overall estimated recall and precision of our approach were 90.56% and 96.61%, respectively. Co-functional evaluation of genes with known function showed high functional similarity between genes with newly detected and known annotations to the same pathway; also considering the newly detected gene functional annotations enhanced this functional similarity, which resembled that existing between genes known to be annotated to the same pathway. Strong evidence was also found in the literature for the candidate associations detected between cystic fibrosis and the PPIs between the CFTR_HUMAN, DERL1_HUMAN, RNF5_HUMAN, AHSA1_HUMAN and GOPC_HUMAN proteins, and between the CHIP_HUMAN and HSP7C_HUMAN proteins.
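
Transitive detection of this kind can be illustrated with a toy rule. The sketch below is hypothetical and is not GPKB's rule set. It derives gene-to-pathway annotations through gene-to-protein and protein-to-pathway associations. It also applies one made-up cardinality constraint (propagate only through a protein linked to exactly one gene) as a stand-in for the paper's logic rules. All identifiers are invented.

```python
from collections import defaultdict


def infer_gene_pathways(gene_to_protein, protein_to_pathway, known):
    """Derive gene -> pathway pairs via gene -> protein -> pathway links.

    gene_to_protein, protein_to_pathway: iterables of (source, target) pairs.
    known: set of (gene, pathway) pairs already present in the sources.
    Returns only the new pairs that the transitive rule supports.
    """
    proteins_of = defaultdict(set)
    genes_of = defaultdict(set)
    for gene, protein in gene_to_protein:
        proteins_of[gene].add(protein)
        genes_of[protein].add(gene)

    pathways_of = defaultdict(set)
    for protein, pathway in protein_to_pathway:
        pathways_of[protein].add(pathway)

    inferred = set()
    for gene, proteins in proteins_of.items():
        for protein in proteins:
            # Hypothetical cardinality rule: skip proteins linked to more
            # than one gene, treating that mapping as ambiguous.
            if len(genes_of[protein]) != 1:
                continue
            for pathway in pathways_of[protein]:
                if (gene, pathway) not in known:
                    inferred.add((gene, pathway))
    return inferred


gene_to_protein = [("G1", "P1"), ("G2", "P2"), ("G3", "P2")]
protein_to_pathway = [("P1", "PW1"), ("P2", "PW2")]
print(infer_gene_pathways(gene_to_protein, protein_to_pathway, known=set()))
```

Here only G1 gains PW1. P2 maps to two genes, so the rule declines to propagate PW2.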

Conclusions

Although the identified gene annotations and PPI-genetic disorder candidate associations require biological validation, our approach intrinsically provides in silico evidence for them based on available data. All identified and integrated annotations are publicly available within GPKB (http://www.bioinformatics.deib.polimi.it/GPKB/), offering a valuable resource that can foster new biomedical and molecular knowledge discoveries.

Computational algorithms to predict Gene Ontology annotations. BMC Bioinformatics 2015;16 Suppl 6:S4. [PMID: 25916950 PMCID: PMC4416163 DOI: 10.1186/1471-2105-16-s6-s4] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.4] [Indexed: 11/30/2022]

Abstract

Background

Gene function annotations, which are associations between a gene and a term of a controlled vocabulary describing gene functional features, are of paramount importance in modern biology. Datasets of these annotations, such as those provided by the Gene Ontology Consortium, are used to design novel biological experiments and interpret their results. Despite their importance, these sources of information have some known issues: they are incomplete, since biological knowledge is far from definitive and evolves rapidly, and they may contain erroneous annotations. Since curating novel annotations is costly in both money and time, computational tools that can reliably predict likely annotations, and thus speed up the discovery of new gene annotations, are very useful.

Methods

We used a set of computational algorithms and weighting schemes to infer novel gene annotations from a set of known ones. Following the latent semantic analysis approach, we implemented two popular algorithms (Latent Semantic Indexing and Probabilistic Latent Semantic Analysis) and propose a novel method, the Semantic IMproved Latent Semantic Analysis, which adds a clustering step on the set of considered genes. Furthermore, we propose improving these algorithms by weighting the annotations in the input set.
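
Latent Semantic Indexing, applied to annotation prediction, scores every gene-term pair by a low-rank reconstruction of the binary gene x term annotation matrix; unannotated pairs with high reconstructed scores become candidate annotations. The sketch below is a minimal pure-Python illustration under stated assumptions: a small dense 0/1 matrix, and power iteration with deflation standing in for a library SVD. It does not include the SIM clustering step or the weighting schemes described in the paper.

```python
import random


def top_eigenvectors(M, k, iters=500, tol=1e-9, seed=0):
    """Top-k eigenvectors of a symmetric PSD matrix via power iteration + deflation."""
    rng = random.Random(seed)
    n = len(M)
    M = [row[:] for row in M]  # work on a copy; it is deflated in place
    vectors = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]
        for _ in range(iters):
            w = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sum(x * x for x in w) ** 0.5
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue estimate for the converged vector
        lam = sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))
        if lam <= tol:
            break  # remaining spectrum is numerically zero
        vectors.append(v)
        for i in range(n):  # deflate: M <- M - lam * v v^T
            for j in range(n):
                M[i][j] -= lam * v[i] * v[j]
    return vectors


def lsi_scores(A, k):
    """Rank-k LSI reconstruction A V_k V_k^T of a 0/1 gene x term matrix A."""
    n_terms = len(A[0])
    # Term-term co-annotation matrix A^T A; its eigenvectors are A's right singular vectors.
    AtA = [[sum(row[i] * row[j] for row in A) for j in range(n_terms)]
           for i in range(n_terms)]
    V = top_eigenvectors(AtA, k)
    scores = []
    for row in A:
        # Project the gene's annotation profile onto the top-k term subspace.
        coeffs = [sum(row[j] * v[j] for j in range(n_terms)) for v in V]
        scores.append([sum(c * v[j] for c, v in zip(coeffs, V)) for j in range(n_terms)])
    return scores


A = [[1, 1, 1],  # gene 1: terms T0, T1, T2
     [1, 1, 1],  # gene 2
     [1, 1, 0],  # gene 3: lacks T2, which co-occurs with T0 and T1
     [0, 0, 1]]  # gene 4
S = lsi_scores(A, k=1)
```

With k = 1, the missing T2 entry of gene 3 receives a clearly higher score than the missing T0 entry of gene 4, so a threshold between them would propose only the first as a novel annotation. Choosing k (and the threshold) is the main tuning decision in this family of methods.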

Results

We tested our methods and their weighted variants on the Gene Ontology annotation sets of genes from three model organisms (Bos taurus, Danio rerio and Drosophila melanogaster). The methods proved able to predict novel gene annotations, and the weighting procedures led to a valuable improvement, although the results varied with the size of the input annotation set and with the algorithm considered.

Conclusions

Of the three methods considered, the Semantic IMproved Latent Semantic Analysis provides the best results. In particular, when coupled with a proper weighting policy, it is able to predict a significant number of novel annotations, demonstrating that it can be a helpful tool in supporting scientists in the curation of gene functional annotations.

Youngs N, Penfold-Brown D, Bonneau R, Shasha D. Negative example selection for protein function prediction: the NoGO database. PLoS Comput Biol 2014;10:e1003644. [PMID: 24922051 PMCID: PMC4055410 DOI: 10.1371/journal.pcbi.1003644] [Citation(s) in RCA: 29] [Impact Index Per Article: 2.9] [Received: 11/24/2013] [Accepted: 04/08/2014] [Indexed: 12/28/2022]

Abstract

Negative examples – genes that are known not to carry out a given protein function – are rarely recorded in genome and proteome annotation databases, such as the Gene Ontology database. Negative examples are required, however, for several of the most powerful machine learning methods for integrative protein function prediction. Most protein function prediction efforts have relied on a variety of heuristics for the choice of negative examples. Determining the accuracy of methods for negative example prediction is itself a non-trivial task, given that the Open World Assumption as applied to gene annotations rules out many traditional validation metrics. We present a rigorous comparison of these heuristics, utilizing a temporal holdout, and a novel evaluation strategy for negative examples. We add to this comparison several algorithms adapted from Positive-Unlabeled learning scenarios in text classification, which are the current state-of-the-art methods for generating negative examples in low-density annotation contexts. Lastly, we present two novel algorithms of our own construction, one based on empirical conditional probability, and the other using topic modeling applied to genes and annotations. We demonstrate that our algorithms achieve significantly fewer incorrect negative example predictions than the current state of the art, using multiple benchmarks covering multiple organisms. Our methods may be applied to generate negative examples for any type of method that deals with protein function, and to this end we provide a database of negative examples in several well-studied organisms, for general use (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html).

Many machine learning methods have been applied to the task of predicting the biological function of proteins from a variety of available data. To achieve meaningful predictions, most of these methods require negative examples (proteins that are known not to perform a function), but negative examples are often unavailable. In addition, past heuristic methods for negative example selection suffer from a high error rate. Here, we rigorously compare two novel algorithms against past heuristics, as well as against some algorithms adapted from a similar task in text classification. Through this comparison, performed on several different benchmarks, we demonstrate that our algorithms make significantly fewer mistakes when predicting negative examples. We also provide a database of negative examples for general use in machine learning for protein function prediction (The NoGO database, available at: bonneaulab.bio.nyu.edu/nogo.html).
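
One simple reading of the empirical conditional-probability idea is sketched below; it is an illustrative assumption, not the NoGO algorithm. For a target term t, each gene not annotated to t is scored by the highest empirical P(t | s) over its known terms s, estimated from co-annotation counts; genes whose known functions make t unlikely (score at or below a threshold) are proposed as negatives. Unannotated genes are skipped, since under the Open World Assumption they carry no evidence either way. The function names and threshold are hypothetical.

```python
from collections import defaultdict


def cooccurrence_counts(annotations):
    """Count genes per term and genes per ordered (s, t) term pair."""
    term_count = defaultdict(int)
    pair_count = defaultdict(int)
    for terms in annotations.values():
        for s in terms:
            term_count[s] += 1
            for t in terms:
                if s != t:
                    pair_count[(s, t)] += 1
    return term_count, pair_count


def negative_candidates(annotations, term, threshold):
    """Genes whose known terms make `term` empirically unlikely.

    Score = max over the gene's terms s of P(term | s), estimated from counts;
    genes scoring at or below `threshold` are returned as putative negatives.
    """
    term_count, pair_count = cooccurrence_counts(annotations)
    negatives = []
    for gene, terms in annotations.items():
        if term in terms or not terms:
            continue  # skip positives and genes with no annotations at all
        score = max(pair_count[(s, term)] / term_count[s] for s in terms)
        if score <= threshold:
            negatives.append(gene)
    return sorted(negatives)


ann = {"g1": {"A", "B"}, "g2": {"A", "B"}, "g3": {"A"},
       "g4": {"C"}, "g5": {"C", "D"}}
print(negative_candidates(ann, "B", threshold=0.1))
```

Here g3 is kept out of the negative set because its term A co-occurs with B in two of the three A-annotated genes, while g4 and g5 carry only terms never seen alongside B.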

Kuppuswamy U, Ananthasubramanian S, Wang Y, Balakrishnan N, Ganapathiraju MK. Predicting gene ontology annotations of orphan GWAS genes using protein-protein interactions. Algorithms Mol Biol 2014;9:10. [PMID: 24708602 PMCID: PMC4124845 DOI: 10.1186/1748-7188-9-10] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 01/06/2013] [Accepted: 03/11/2014] [Indexed: 01/30/2023]

Abstract

Background

The number of genome-wide association studies (GWAS) has increased rapidly in the past couple of years, resulting in the identification of genes associated with different diseases. The next step in translating these findings into biomedically useful information is to elucidate the mechanism of action of these genes. However, GWAS often implicate genes whose functions are currently unknown; for example, MYEOV, ANKLE1, TMEM45B and ORAOV1 have been found to be associated with breast cancer, but their molecular function is unknown.

Results

We carried out Bayesian inference of Gene Ontology (GO) term annotations of genes by employing the directed acyclic graph structure of GO and the network of protein-protein interactions (PPIs). The approach is based on the premise that two proteins that interact biophysically are in physical proximity to each other, possess complementary molecular functions, and play roles in related biological processes. Predicted GO terms were ranked according to their relative association scores, and the approach was evaluated quantitatively by plotting precision versus recall, and F-scores (the harmonic mean of precision and recall) versus varying thresholds. Precisions of ~58% and ~40% were obtained for protein localization and function, respectively, at a threshold of ~30 (the top 30 GO terms in the ranked list). A comparison with function prediction that incorporates semantic similarity among ontology nodes into a k-nearest neighbor classifier confirmed that our results compared favorably.
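
The threshold-based evaluation above can be sketched as follows. The Bayesian scoring itself is not reproduced; the sketch assumes a ranked list of predicted terms is already available and computes precision, recall and F-score (the harmonic mean of precision and recall) when the top k terms are kept. The term identifiers are hypothetical placeholders.

```python
def precision_recall_f(ranked_terms, true_terms, k):
    """Precision, recall and F-score of the top-k ranked predictions."""
    top = ranked_terms[:k]
    hits = sum(1 for term in top if term in true_terms)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(true_terms) if true_terms else 0.0
    f_score = (2 * precision * recall / (precision + recall)
               if precision + recall > 0 else 0.0)
    return precision, recall, f_score


def threshold_sweep(ranked_terms, true_terms, thresholds):
    """Evaluate the ranked list at several top-k cut-offs."""
    return {k: precision_recall_f(ranked_terms, true_terms, k) for k in thresholds}


ranked = ["T1", "T2", "T3", "T4"]  # hypothetical predicted terms, best first
truth = {"T1", "T3", "T9"}          # hypothetical held-out annotations
for k, (p, r, f) in threshold_sweep(ranked, truth, [1, 2, 4]).items():
    print(k, round(p, 3), round(r, 3), round(f, 3))
```

Raising k typically trades precision for recall, which is why the study reports precision at a fixed cut-off (~30 terms) alongside the full curves.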

Conclusions

This approach was applied to predict the cellular component and molecular function GO terms of all human proteins that have interacting partners possessing at least one known GO annotation. The list of predictions is available at http://severus.dbmi.pitt.edu/engo/GOPRED.html. We present the algorithm, evaluations and the results of the computational predictions, especially for genes that GWAS have associated with diseases, which are of translational interest.

Glass K, Girvan M. Annotation enrichment analysis: an alternative method for evaluating the functional properties of gene sets. Sci Rep 2014;4:4191. [PMID: 24569707 PMCID: PMC3935204 DOI: 10.1038/srep04191] [Citation(s) in RCA: 43] [Impact Index Per Article: 4.3] [Received: 04/16/2013] [Accepted: 01/28/2014] [Indexed: 12/18/2022]

Suárez-Obando F, Camacho Sánchez J. [Standards in Medical Informatics: Fundamentals and Applications]. Rev Colomb Psiquiatr 2013;42:295-302. [PMID: 26572951 DOI: 10.1016/s0034-7450(13)70023-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/05/2012] [Accepted: 01/23/2013] [Indexed: 06/05/2023]

Genome-wide gene expression profiling of stress response in a spinal cord clip compression injury model. BMC Genomics 2013;14:583. [PMID: 23984903 PMCID: PMC3846681 DOI: 10.1186/1471-2164-14-583] [Citation(s) in RCA: 39] [Impact Index Per Article: 3.5] [Received: 03/27/2013] [Accepted: 08/13/2013] [Indexed: 12/23/2022]

Abstract

Background

The aneurysm clip impact-compression model of spinal cord injury (SCI) is a standard injury model in animals that closely mimics the primary mechanism of most human injuries: acute impact and persisting compression. Its histopathological and behavioural outcomes closely resemble those of human SCI. To understand the distinct molecular events underlying this injury model, we analyzed global mRNA abundance changes during the acute, subacute and chronic stages of a moderate-to-severe injury to the rat spinal cord.

Results

Time-series expression analyses clustered the majority of deregulated transcripts into eight statistically significant expression profiles. Systematic application of Gene Ontology (GO) enrichment pathway analysis allowed inference of the biological processes participating in SCI pathology. Temporal analysis identified events specific to, and common between, the acute, subacute and chronic time-points. Processes common to all phases of injury include blood coagulation, cellular extravasation, leukocyte cell-cell adhesion, the integrin-mediated signaling pathway, cytokine production and secretion, neutrophil chemotaxis, phagocytosis, response to hypoxia and reactive oxygen species, angiogenesis, apoptosis, inflammatory processes and ossification. Importantly, various elements of adaptive and induced innate immune responses not only span the acute and subacute phases but also persist throughout the chronic phase of SCI. Induced innate responses, such as Toll-like receptor signaling, are more active during the acute phase but persist throughout the chronic phase. However, adaptive immune response processes such as B and T cell activation, proliferation and migration, T cell differentiation, B and T cell receptor-mediated signaling, and B cell- and immunoglobulin-mediated immune responses become more significant during the chronic phase.
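
The abstract does not state which enrichment statistic was used. A common choice for testing whether a GO term is over-represented in a cluster of deregulated genes is the one-sided hypergeometric (Fisher) test, sketched here as an assumption, with no multiple-testing correction applied.

```python
from math import comb


def enrichment_p_value(N, K, n, k):
    """One-sided hypergeometric p-value P(X >= k).

    N: genes in the background, K: background genes annotated to the GO term,
    n: genes in the deregulated cluster, k: cluster genes annotated to the term.
    """
    upper = min(K, n)
    # Sum the probabilities of observing k or more annotated genes in the cluster.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy example: 10 background genes, 5 annotated, cluster of 5 with all 5 annotated.
print(enrichment_p_value(10, 5, 5, 5))
```

`math.comb` returns 0 when the lower argument exceeds the upper one, so impossible configurations contribute nothing to the tail. In practice, the p-values for many GO terms would then be adjusted for multiple testing.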

Conclusions

This analysis showed that, surprisingly, the diverse series of molecular events that occur in the acute and subacute stages persist into the chronic stage of SCI. The strong agreement between our results and previous findings suggests that our analytical approach will be useful in revealing other biological processes and genes contributing to SCI pathology.

Hodgins KA, Lai Z, Nurkowski K, Huang J, Rieseberg LH. The molecular basis of invasiveness: differences in gene expression of native and introduced common ragweed (Ambrosia artemisiifolia) in stressful and benign environments. Mol Ecol 2013;22:2496-510. [PMID: 23294156 DOI: 10.1111/mec.12179] [Citation(s) in RCA: 48] [Impact Index Per Article: 4.4] [Received: 09/19/2012] [Revised: 11/14/2012] [Accepted: 11/21/2012] [Indexed: 11/28/2022]

Zhang XF, Dai DQ. A framework for incorporating functional interrelationships into protein function prediction algorithms. IEEE/ACM Trans Comput Biol Bioinform 2012;9:740-53. [PMID: 22084148 DOI: 10.1109/tcbb.2011.148] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.3] [Indexed: 05/31/2023]

Abbaraju NV, Boutaghou MN, Townley IK, Zhang Q, Wang G, Cole RB, Rees BB. Analysis of tissue proteomes of the Gulf killifish, Fundulus grandis, by 2D electrophoresis and MALDI-TOF/TOF mass spectrometry. Integr Comp Biol 2012;52:626-35. [PMID: 22537935 DOI: 10.1093/icb/ics063] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.4] [Indexed: 01/02/2023]

Iacucci E, Zingg HH, Perkins TJ. Methods for determining the statistical significance of enrichment or depletion of Gene Ontology classifications under weighted membership. Front Genet 2012;3:24. [PMID: 22375144 PMCID: PMC3284693 DOI: 10.3389/fgene.2012.00024] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/04/2011] [Accepted: 02/06/2012] [Indexed: 11/20/2022]

Abstract

High-throughput molecular biology studies, such as microarray assays of gene expression, two-hybrid experiments for detecting protein interactions, or ChIP-Seq experiments for transcription factor binding, often result in an “interesting” set of genes – say, genes that are co-expressed or bound by the same factor. One way of understanding the biological meaning of such a set is to consider what processes or functions, as defined in an ontology, are over-represented (enriched) or under-represented (depleted) among genes in the set. Usually, the significance of enrichment or depletion scores is based on simple statistical models and on the membership of genes in different classifications. We consider the more general problem of computing p-values for arbitrary integer additive statistics, or weighted membership functions. Such membership functions can be used to represent, for example, prior knowledge on the role of certain genes or classifications, differential importance of different classifications or genes to the experimenter, hierarchical relationships between classifications, or different degrees of interestingness or evidence for specific genes. We describe a generic dynamic programming algorithm that can compute exact p-values for arbitrary integer additive statistics. We also describe several optimizations for important special cases, which can provide orders-of-magnitude speedups in the computations. We apply our methods to datasets describing oxidative phosphorylation and parturition and compare p-values based on computations of several different statistics for measuring enrichment. We find major differences between p-values resulting from these statistics, and that some statistics recover “gold standard” annotations of the data better than others. Our work establishes a theoretical and algorithmic basis for far richer notions of enrichment or depletion of gene sets with respect to gene ontologies than has previously been available.
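
A dynamic program of the kind described above can be sketched under one simple null model, which is an assumption here rather than the paper's exact formulation: the n genes of the "interesting" set are a uniformly random subset of the N genes, and each gene carries a fixed non-negative integer membership weight. The statistic S is the total weight of the chosen genes, and its exact tail probability is obtained by counting subsets. The paper's special-case optimizations are not included.

```python
from fractions import Fraction
from math import comb


def exact_upper_p(weights, n, observed):
    """Exact P(S >= observed) for S = total weight of a uniformly random n-subset.

    weights: one non-negative integer per gene (its membership weight).
    """
    if not 0 <= n <= len(weights):
        raise ValueError("n must be between 0 and the number of genes")
    if any(w < 0 for w in weights):
        raise ValueError("weights must be non-negative integers")
    max_sum = sum(weights)
    # ways[j][s] = number of j-gene subsets (of genes seen so far) with weight sum s
    ways = [[0] * (max_sum + 1) for _ in range(n + 1)]
    ways[0][0] = 1
    for w in weights:
        # Descending j so each gene is counted at most once per subset.
        for j in range(n, 0, -1):
            for s in range(w, max_sum + 1):
                ways[j][s] += ways[j - 1][s - w]
    hits = sum(ways[n][s] for s in range(max(observed, 0), max_sum + 1))
    return Fraction(hits, comb(len(weights), n))


# 0/1 weights reduce to the classic hypergeometric tail.
print(exact_upper_p([1, 1, 1, 0, 0], n=2, observed=2))
```

With 0/1 weights the statistic is just the number of annotated genes in the set, so the result matches the hypergeometric p-value (3/10 here); integer weights greater than 1 express the graded membership the authors describe. The table has (n + 1) x (sum of weights + 1) cells and is updated once per gene.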

Glass K, Ott E, Losert W, Girvan M. Implications of functional similarity for gene regulatory interactions. J R Soc Interface 2012;9:1625-36. [PMID: 22298814 DOI: 10.1098/rsif.2011.0585] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.6] [Indexed: 11/12/2022]

Abstract

If one gene regulates another, those two genes are likely to be involved in many of the same biological functions. Conversely, shared biological function may be suggestive of the existence and nature of a regulatory interaction. With this in mind, we develop a measure of functional similarity between genes based on annotations made to the Gene Ontology in which the magnitude of their functional relationship is also indicative of a regulatory relationship. In contrast to other measures that have previously been used to quantify the functional similarity between genes, our measure scales the strength of any shared functional annotation by the frequency of that function's appearance across the entire set of annotations. We apply our method to both Escherichia coli and Saccharomyces cerevisiae gene annotations and find that the strength of our scaled similarity measure is more predictive of known regulatory interactions than previously published measures of functional similarity. In addition, we observe that the strength of the scaled similarity measure is correlated with the structural importance of links in the known regulatory network. By contrast, other measures of functional similarity are not indicative of any structural importance in the regulatory network. We therefore conclude that adequately adjusting for the frequency of shared biological functions is important in the construction of a functional similarity measure aimed at elucidating the existence and nature of regulatory interactions. We also compare the performance of the scaled similarity with a high-throughput method for determining regulatory interactions from gene expression data and observe that the ontology-based approach identifies a different subset of regulatory interactions compared with the gene expression approach. We show that combining predictions from the scaled similarity with those from the reconstruction algorithm leads to a significant improvement in the accuracy of the reconstructed network.
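
The idea of scaling each shared annotation by how often that function appears can be sketched with a simple inverse-frequency weighting. This is a hypothetical form chosen for illustration; the exact scaling used by Glass et al. may differ. A shared rare term contributes more than a shared ubiquitous term, whereas the unscaled baseline counts both equally.

```python
from collections import Counter


def term_frequencies(annotations):
    """Number of genes annotated to each term."""
    return Counter(t for terms in annotations.values() for t in terms)


def scaled_similarity(annotations, freq, g1, g2):
    """Shared terms weighted by 1 / (number of genes carrying the term)."""
    shared = annotations[g1] & annotations[g2]
    return sum(1.0 / freq[t] for t in shared)


def shared_count(annotations, g1, g2):
    """Unscaled baseline: plain count of shared terms."""
    return len(annotations[g1] & annotations[g2])


ann = {"a": {"common", "rare"}, "b": {"common", "rare"},
       "c": {"common"}, "d": {"common"}}
freq = term_frequencies(ann)
print(scaled_similarity(ann, freq, "a", "b"), scaled_similarity(ann, freq, "c", "d"))
```

Precomputing `freq` once keeps each pairwise comparison proportional to the number of shared terms, which matters when scoring all gene pairs of a genome.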

Genomic annotation prediction based on integrated information. Computational Intelligence Methods for Bioinformatics and Biostatistics 2012. [DOI: 10.1007/978-3-642-35686-5_20] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.5] [Indexed: 01/11/2023]

Quantification of protein group coherence and pathway assignment using functional association. BMC Bioinformatics 2011;12:373. [PMID: 21929787 PMCID: PMC3189934 DOI: 10.1186/1471-2105-12-373] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 02/16/2011] [Accepted: 09/19/2011] [Indexed: 11/11/2022]

Abstract

Background

Genomics and proteomics experiments produce a large amount of data awaiting functional elucidation. An important step in analyzing such data is to identify functional units, which consist of proteins that play coherent roles in carrying out a function. Importantly, functional coherence is not identical to functional similarity. For example, proteins in the same pathway may not share the same Gene Ontology (GO) terms, but they work in a coordinated fashion so that the intended function can be performed. Thus, simply applying existing functional similarity measures might not be the best way to identify functional units in omics data.

Results

We have designed two scores for quantifying functional coherence by considering the association of GO terms observed in two biological contexts: co-occurrences in protein annotations and co-mentions in the literature indexed in PubMed. The counted co-occurrences of GO terms were normalized in a way similar to how the statistical amino acid contact potential is computed in the protein structure prediction field. We demonstrate that the developed scores can identify functionally coherent protein sets, i.e. proteins in the same pathways, co-localized proteins and protein complexes, with statistically significant score values and better accuracy than existing functional similarity scores. The scores are also capable of detecting protein pairs that interact with each other. We further show that the functional coherence scores can accurately assign proteins to their respective pathways.
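
A contact-potential-style normalization compares how often two terms are observed together with how often they would co-occur if independent, then takes the logarithm of the ratio. The sketch below applies that idea to GO term co-occurrence in protein annotations; it is an illustrative log-odds form under that assumption, not necessarily the paper's exact formula, and it does not cover the PubMed co-mention score.

```python
import math
from collections import Counter
from itertools import combinations


def term_pair_log_odds(annotations):
    """log(observed / expected) co-occurrence of GO term pairs across proteins.

    expected assumes independence: count(a) * count(b) / number_of_proteins.
    Pairs that never co-occur are omitted (their log-odds would be -inf).
    """
    n_proteins = len(annotations)
    single = Counter()
    pair = Counter()
    for terms in annotations.values():
        single.update(terms)
        pair.update(frozenset(p) for p in combinations(sorted(terms), 2))
    scores = {}
    for p, observed in pair.items():
        a, b = tuple(p)
        expected = single[a] * single[b] / n_proteins
        scores[p] = math.log(observed / expected)
    return scores


ann = {"p1": {"X", "Y"}, "p2": {"X", "Y"}, "p3": {"Z"}, "p4": {"Z"}}
print(term_pair_log_odds(ann))
```

Positive scores mark term pairs that co-occur more often than chance, which is the kind of association a functional-coherence score would reward even when the two proteins share no identical GO term.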

Conclusion

We have developed two scores that quantify the functional coherence of sets of proteins. The scores reflect the actual associations of GO terms observed either in protein annotations or in the literature. We have shown that they can accurately distinguish biologically relevant groups of proteins from random ones and have good discriminative power for detecting interacting pairs of proteins. The scores were further successfully applied to assign proteins to pathways.

Hester SD, Johnstone AF, Boyes WK, Bushnell PJ, Shafer TJ. Acute toluene exposure alters expression of genes in the central nervous system associated with synaptic structure and function. Neurotoxicol Teratol 2011;33:521-9. [DOI: 10.1016/j.ntt.2011.07.008] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.4] [Received: 04/19/2011] [Revised: 07/07/2011] [Accepted: 07/20/2011] [Indexed: 10/17/2022]

Deo RC, MacRae CA. The zebrafish: scalable in vivo modeling for systems biology. Wiley Interdiscip Rev Syst Biol Med 2010;3:335-46. [PMID: 20882534 DOI: 10.1002/wsbm.117] [Citation(s) in RCA: 24] [Impact Index Per Article: 1.7] [Indexed: 01/23/2023]

Yu H, Huang J, Qiao N, Green CD, Han JDJ. Evaluating diabetes and hypertension disease causality using mouse phenotypes. BMC Syst Biol 2010;4:97. [PMID: 20642857 PMCID: PMC2917432 DOI: 10.1186/1752-0509-4-97] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.2] [Received: 03/03/2010] [Accepted: 07/20/2010] [Indexed: 01/11/2023]

Bogdanov P, Singh AK. Molecular function prediction using neighborhood features. IEEE/ACM Trans Comput Biol Bioinform 2010;7:208-17. [PMID: 20431141 DOI: 10.1109/tcbb.2009.81] [Citation(s) in RCA: 35] [Impact Index Per Article: 2.5] [Indexed: 05/29/2023]

Holmans P. Statistical methods for pathway analysis of genome-wide data for association with complex genetic traits. Adv Genet 2010;72:141-79. [PMID: 21029852 DOI: 10.1016/b978-0-12-380862-2.00007-2] [Citation(s) in RCA: 78] [Impact Index Per Article: 5.6] [Indexed: 12/12/2022]

Done B, Khatri P, Done A, Draghici S. Predicting novel human gene ontology annotations using semantic analysis. IEEE/ACM Trans Comput Biol Bioinform 2010;7:91-9. [PMID: 20150671 PMCID: PMC3712327 DOI: 10.1109/tcbb.2008.29] [Citation(s) in RCA: 16] [Impact Index Per Article: 1.1] [Indexed: 05/16/2023]

What we can learn about Escherichia coli through application of Gene Ontology. Trends Microbiol 2009;17:269-78. [PMID: 19576778 DOI: 10.1016/j.tim.2009.04.004] [Citation(s) in RCA: 13] [Impact Index Per Article: 0.9] [Received: 11/12/2008] [Revised: 03/04/2009] [Accepted: 04/08/2009] [Indexed: 11/21/2022]

Pandey G, Myers CL, Kumar V. Incorporating functional inter-relationships into protein function prediction algorithms. BMC Bioinformatics 2009;10:142. [PMID: 19435516 PMCID: PMC2693438 DOI: 10.1186/1471-2105-10-142] [Citation(s) in RCA: 56] [Impact Index Per Article: 3.7] [Received: 10/15/2008] [Accepted: 05/12/2009] [Indexed: 11/10/2022]

Abstract

Background

Functional classification schemes (e.g. the Gene Ontology) that serve as the basis for annotation efforts in several organisms are often the source of gold standard information for computational efforts at supervised protein function prediction. While successful function prediction algorithms have been developed, few previous efforts have used more than the protein-to-functional-class label information provided by such knowledge bases. For instance, the Gene Ontology not only captures protein annotations to a set of functional classes, but also arranges these classes in a DAG-based hierarchy that captures rich inter-relationships between different classes. These inter-relationships present both opportunities, such as the potential for additional training examples for small classes from larger related classes, and challenges, such as harder-to-learn distinctions between similar GO terms, for standard classification-based approaches.

Results

We propose a method to enhance the performance of classification-based protein function prediction algorithms by exploiting the inter-relationships between the functional classes that constitute functional classification schemes. Using a standard measure for evaluating the semantic similarity between nodes in an ontology, we quantify and incorporate these inter-relationships into the k-nearest neighbor classifier. We present experiments on several large genomic data sets, each of which is used for the modeling and prediction of over a hundred classes from the GO Biological Process ontology. The results show that this incorporation produces more accurate predictions for a large number of the functional classes considered, and that the classes that benefit most from this approach are those containing the fewest members. In addition, we show how our proposed framework can be used to integrate information from the entire GO hierarchy to improve the accuracy of predictions made over a set of base classes. Finally, we provide qualitative and quantitative evidence that this incorporation of functional inter-relationships enables the discovery of interesting biology in the form of novel functional annotations for several yeast proteins, such as Sna4, Rtn1 and Lin1.
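
One way to let class inter-relationships enter a k-nearest neighbor classifier is sketched below as an illustrative scheme, not Pandey et al.'s exact formulation. Pairwise class similarities (which the paper derives from a semantic similarity measure) are assumed to be supplied in a dictionary; a neighbor annotated to a related class contributes partial credit equal to the best similarity, so small classes can borrow evidence from larger related ones. With an empty similarity table it reduces to the plain kNN vote fraction. Finding the nearest neighbors is assumed to have been done already.

```python
def label_credit(target, labels, class_sim):
    """Credit a neighbour gives to `target`: 1 for an exact label, else best similarity."""
    best = 0.0
    for c in labels:
        if c == target:
            return 1.0
        s = class_sim.get((target, c), class_sim.get((c, target), 0.0))
        best = max(best, s)
    return best


def knn_class_scores(neighbor_labels, classes, class_sim):
    """Average credit over the k nearest neighbours for each functional class.

    neighbor_labels: list of label sets, one per nearest neighbour (k = len(list)).
    class_sim: hypothetical symmetric similarity between classes, in [0, 1].
    """
    k = len(neighbor_labels)
    return {c: sum(label_credit(c, labels, class_sim) for labels in neighbor_labels) / k
            for c in classes}


neighbors = [{"big"}, {"big"}, {"other"}]  # labels of the 3 nearest neighbours
sim = {("small", "big"): 0.6}              # hypothetical class similarity
print(knn_class_scores(neighbors, ["small", "big"], sim))
```

Here the class "small" receives a non-zero score even though no neighbor is annotated to it directly, which is the effect the authors report as most beneficial for classes with few members.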

Conclusion

We implemented and evaluated a methodology for incorporating interrelationships between functional classes into a standard classification-based protein function prediction algorithm. Our results show that this incorporation can help improve the accuracy of such algorithms, and help uncover novel biology in the form of previously unknown functional annotations. The complete source code, a sample data set and the additional files for this paper are available free of charge for non-commercial use at http://www.cs.umn.edu/vk/gaurav/functionalsimilarity/.

Fontana P, Cestaro A, Velasco R, Formentin E, Toppo S. Rapid annotation of anonymous sequences from genome projects using semantic similarities and a weighting scheme in gene ontology. PLoS One 2009;4:e4619. [PMID: 19247487 PMCID: PMC2645684 DOI: 10.1371/journal.pone.0004619] [Citation(s) in RCA: 29] [Impact Index Per Article: 1.9] [Received: 11/09/2008] [Accepted: 01/09/2009] [Indexed: 11/22/2022]

Sam LT, Mendonça EA, Li J, Blake J, Friedman C, Lussier YA. PhenoGO: an integrated resource for the multiscale mining of clinical and biological data. BMC Bioinformatics 2009;10 Suppl 2:S8. [PMID: 19208196 PMCID: PMC2646241 DOI: 10.1186/1471-2105-10-s2-s8] [Citation(s) in RCA: 27] [Impact Index Per Article: 1.8] [Indexed: 11/18/2022]

Taher L, Ovcharenko I. Variable locus length in the human genome leads to ascertainment bias in functional inference for non-coding elements. Bioinformatics 2009;25:578-84. [PMID: 19168912 PMCID: PMC2647827 DOI: 10.1093/bioinformatics/btp043] [Citation(s) in RCA: 21] [Impact Index Per Article: 1.4] [Indexed: 12/31/2022]

Sidhu AS, Bellgard MI, Dillon TS. Classification of information about proteins. Bioinformatics 2009. [DOI: 10.1007/978-0-387-92738-1_12] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/30/2022]