1
Pons C. Qarles: a web server for the quick characterization of large sets of genes. NAR Genom Bioinform 2025; 7:lqaf030. [PMID: 40160219 PMCID: PMC11954521 DOI: 10.1093/nargab/lqaf030] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/28/2024] [Revised: 03/05/2025] [Accepted: 03/14/2025] [Indexed: 04/02/2025]
Abstract
The characterization of gene sets is a recurring task in computational biology. Identifying specific properties of a hit set compared to a reference set can reveal biological roles and mechanisms, and can lead to the prediction of new hits. However, collecting the features to evaluate can be time consuming, and implementing an informative but compact graphical representation of the multiple comparisons can be challenging, particularly for bench scientists. Here, I present Qarles (quick characterization of large sets of genes), a web server that annotates Saccharomyces cerevisiae gene sets by querying a database of 31 features widely used by the yeast community and that identifies their specific properties, providing publication-ready figures and reliable statistics. Qarles has a deliberately simple user interface with all the functionality in a single web page and a fast response time to facilitate adoption by the scientific community. Qarles provides a rich and compact graphical output, including up to five gene set comparisons across 31 features in a single dotplot, and interactive boxplots to enable the identification of outliers. Qarles can also predict new hit genes by using a random forest trained on the selected features. The web server is freely available at https://qarles.org.
Affiliation(s)
- Carles Pons
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology (BIST), 08028 Barcelona, Catalonia, Spain
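The Qarles abstract describes identifying properties that distinguish a hit set from a reference set. As a minimal sketch of that kind of comparison for a single numeric feature, the snippet below computes a rank-based effect size: the probability that a randomly chosen hit gene has a higher feature value than a randomly chosen reference gene. The abstract does not say which statistics Qarles uses, so the choice of measure, the function name `auc_effect_size`, and the feature values are all assumptions made for illustration.

```python
from itertools import product


def auc_effect_size(hit_values, reference_values):
    """Probability that a random hit value exceeds a random reference value.

    Ties count as 0.5. A result of 0.5 means no difference between the sets;
    values near 1.0 mean the hit set tends to have higher feature values.
    """
    if not hit_values or not reference_values:
        raise ValueError("both sets must be non-empty")
    wins = 0.0
    for h, r in product(hit_values, reference_values):
        if h > r:
            wins += 1.0
        elif h == r:
            wins += 0.5
    return wins / (len(hit_values) * len(reference_values))


# Hypothetical feature values (for example, an expression-level score per gene).
hits = [5.1, 6.3, 7.0, 4.8]
reference = [3.2, 4.9, 2.7, 5.0, 3.9]
effect = auc_effect_size(hits, reference)
print(f"effect size: {effect:.2f}")
```

Repeating this calculation per feature would give one value per feature and gene set, which is the kind of grid a dotplot summarizes.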
2
Tasnina N, Murali TM. ICoN: integration using co-attention across biological networks. Bioinformatics Advances 2024; 5:vbae182. [PMID: 39801779 PMCID: PMC11723530 DOI: 10.1093/bioadv/vbae182] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/10/2024] [Revised: 09/24/2024] [Accepted: 11/14/2024] [Indexed: 01/16/2025]
Abstract
Motivation: Molecular interaction networks are powerful tools for studying cellular functions. Integrating diverse types of networks enhances performance in downstream tasks such as gene module detection and protein function prediction. The challenge lies in extracting meaningful protein feature representations due to varying levels of sparsity and noise across these heterogeneous networks.
Results: We propose ICoN, a novel unsupervised graph neural network model that takes multiple protein-protein association networks as inputs and generates a feature representation for each protein that integrates the topological information from all the networks. A key contribution of ICoN is exploiting a mechanism called "co-attention" that enables cross-network communication during training. The model also incorporates a denoising training technique, introducing perturbations to each input network and training the model to reconstruct the original network from its corrupted version. Our experimental results demonstrate that ICoN surpasses individual networks across three downstream tasks: gene module detection, gene coannotation prediction, and protein function prediction. Compared to existing unsupervised network integration models, ICoN exhibits superior performance across the majority of downstream tasks and shows enhanced robustness against noise. This work introduces a promising approach for effectively integrating diverse protein-protein association networks, aiming to achieve a biologically meaningful representation of proteins.
Availability and implementation: The ICoN software is available under the GNU Public License v3 at https://github.com/Murali-group/ICoN.
Affiliation(s)
- Nure Tasnina
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
- T M Murali
- Department of Computer Science, Virginia Tech, Blacksburg, VA 24061, United States
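The ICoN abstract mentions a denoising scheme in which each input network is perturbed and the model learns to reconstruct the original. The sketch below illustrates only the corruption step on an undirected edge list, using one simple perturbation: removing a fraction of edges and adding the same number of previously absent ones. The abstract does not specify ICoN's actual perturbation, so this edge-swapping rule, the function name `corrupt_network`, and the toy network are assumptions for illustration.

```python
import random


def corrupt_network(edges, nodes, swap_fraction, seed=0):
    """Return a perturbed copy of an undirected edge set.

    A fraction of the edges is removed and the same number of edges that
    were absent from the original network is added, so the edge count is
    preserved. The original edge set would serve as the reconstruction
    target during denoising training.
    """
    rng = random.Random(seed)
    ordered_nodes = sorted(nodes)
    original = {tuple(sorted(e)) for e in edges}
    n_swap = int(len(original) * swap_fraction)
    removed = set(rng.sample(sorted(original), n_swap))
    kept = original - removed
    # Candidate node pairs that are not edges in the original network.
    absent = [
        (u, v)
        for i, u in enumerate(ordered_nodes)
        for v in ordered_nodes[i + 1:]
        if (u, v) not in original
    ]
    added = set(rng.sample(absent, min(n_swap, len(absent))))
    return kept | added


# Toy network with hypothetical protein identifiers.
nodes = ["A", "B", "C", "D", "E"]
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
noisy = corrupt_network(edges, nodes, swap_fraction=0.5, seed=1)
print(sorted(noisy))
```

Fixing the seed keeps the corruption reproducible; in training one would typically draw a fresh perturbation for each pass over the data.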
3
Pons C, van Leeuwen J. Meta-analysis of dispensable essential genes and their interactions with bypass suppressors. Life Sci Alliance 2024; 7:e202302192. [PMID: 37918966 PMCID: PMC10622647 DOI: 10.26508/lsa.202302192] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/31/2023] [Revised: 10/24/2023] [Accepted: 10/25/2023] [Indexed: 11/04/2023]
Abstract
Genes have been historically classified as essential or non-essential based on their requirement for viability. However, genomic mutations can sometimes bypass the requirement for an essential gene, challenging the binary classification of gene essentiality. Such dispensable essential genes represent a valuable model for understanding the incomplete penetrance of loss-of-function mutations often observed in natural populations. Here, we compiled data from multiple studies on essential gene dispensability in Saccharomyces cerevisiae to comprehensively characterize these genes. In analyses spanning different evolutionary timescales, dispensable essential genes exhibited distinct phylogenetic properties compared with other essential and non-essential genes. Integration of interactions with suppressor genes that can bypass the gene essentiality revealed the high functional modularity of the bypass suppression network. Furthermore, dispensable essential and bypass suppressor gene pairs reflected simultaneous changes in the mutational landscape of S. cerevisiae strains. Importantly, species in which dispensable essential genes were non-essential tended to carry bypass suppressor mutations in their genomes. Overall, our study offers a comprehensive view of dispensable essential genes and illustrates how their interactions with bypass suppressors reflect evolutionary outcomes.
Affiliation(s)
- Carles Pons
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Jolanda van Leeuwen
- Center for Integrative Genomics, Bâtiment Génopode, University of Lausanne, Lausanne, Switzerland
4
Zheng R, Huang Z, Deng L. Large-scale predicting protein functions through heterogeneous feature fusion. Brief Bioinform 2023:bbad243. [PMID: 37401369 DOI: 10.1093/bib/bbad243] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 03/12/2023] [Revised: 05/18/2023] [Accepted: 06/12/2023] [Indexed: 07/05/2023]
Abstract
As the volume of protein sequence and structure data grows rapidly, the functions of the overwhelming majority of proteins cannot be experimentally determined. Automated annotation of protein function at a large scale is becoming increasingly important. Existing computational prediction methods are typically based on expanding the relatively small number of experimentally determined functions to large collections of proteins with various clues, including sequence homology, protein-protein interaction, gene co-expression, etc. Although there has been some progress in protein function prediction in recent years, the development of accurate and reliable solutions still has a long way to go. Here we exploit AlphaFold predicted three-dimensional structural information, together with other non-structural clues, to develop a large-scale approach termed PredGO to annotate Gene Ontology (GO) functions for proteins. We use a pre-trained language model, geometric vector perceptrons and attention mechanisms to extract heterogeneous features of proteins and fuse these features for function prediction. The computational results demonstrate that the proposed method outperforms other state-of-the-art approaches for predicting GO functions of proteins in terms of both coverage and accuracy. The improvement of coverage is because the number of structures predicted by AlphaFold is greatly increased, and on the other hand, PredGO can extensively use non-structural information for functional prediction. Moreover, we show that over 205 000 (~100%) entries in UniProt for human are annotated by PredGO, over 186 000 (~90%) of which are based on predicted structure. The webserver and database are available at http://predgo.denglab.org/.
Affiliation(s)
- Rongtao Zheng
- School of Computer Science and Engineering, Central South University, 410000 Changsha, China
- Zhijian Huang
- School of Computer Science and Engineering, Central South University, 410000 Changsha, China
- Lei Deng
- School of Computer Science and Engineering, Central South University, 410000 Changsha, China
5
Akdemir D, Somo M, Isidro-Sanchéz J. An Expectation-Maximization Algorithm for Combining a Sample of Partially Overlapping Covariance Matrices. Axioms 2023; 12:161. [PMID: 37284612 PMCID: PMC10243021 DOI: 10.3390/axioms12020161] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/08/2023]
Abstract
The generation of unprecedented amounts of data brings new challenges in data management, but also an opportunity to accelerate the identification of processes of multiple science disciplines. One of these challenges is the harmonization of high-dimensional unbalanced and heterogeneous data. In this manuscript, we propose a statistical approach to combine incomplete and partially-overlapping pieces of covariance matrices that come from independent experiments. We assume that the data are a random sample of partial covariance matrices sampled from Wishart distributions and we derive an expectation-maximization algorithm for parameter estimation. We demonstrate the properties of our method by (i) using simulation studies and (ii) using empirical datasets. In general, being able to make inferences about the covariance of variables not observed in the same experiment is a valuable tool for data analysis since covariance estimation is an important step in many statistical applications, such as multivariate analysis, principal component analysis, factor analysis, and structural equation modeling.
Affiliation(s)
- Deniz Akdemir
- Center of International Bone Marrow Transplantation Research, Minneapolis, MN 55401-1206, USA
- Julio Isidro-Sanchéz
- Centro de Biotecnologia y Genómica de Plantas, Instituto Nacional de Investigación y Tecnologia Agraria y Alimentaria, Universidad Politécnica de Madrid, 28223, Madrid, Spain
6
Forster DT, Li SC, Yashiroda Y, Yoshimura M, Li Z, Isuhuaylas LAV, Itto-Nakama K, Yamanaka D, Ohya Y, Osada H, Wang B, Bader GD, Boone C. BIONIC: biological network integration using convolutions. Nat Methods 2022; 19:1250-1261. [PMID: 36192463 PMCID: PMC11236286 DOI: 10.1038/s41592-022-01616-x] [Citation(s) in RCA: 28] [Impact Index Per Article: 9.3] [Received: 12/04/2020] [Accepted: 08/16/2022] [Indexed: 01/21/2023]
Abstract
Biological networks constructed from varied data can be used to map cellular function, but each data type has limitations. Network integration promises to address these limitations by combining and automatically weighting input information to obtain a more accurate and comprehensive representation of the underlying biology. We developed a deep learning-based network integration algorithm that incorporates a graph convolutional network framework. Our method, BIONIC (Biological Network Integration using Convolutions), learns features that contain substantially more functional information compared to existing approaches. BIONIC has unsupervised and semisupervised learning modes, making use of available gene function annotations. BIONIC is scalable in both size and quantity of the input networks, making it feasible to integrate numerous networks on the scale of the human genome. To demonstrate the use of BIONIC in identifying new biology, we predicted and experimentally validated essential gene chemical-genetic interactions from nonessential gene profiles in yeast.
Affiliation(s)
- Duncan T Forster
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Sheena C Li
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- RIKEN Center for Sustainable Resource Science, Wako, Saitama, Japan
- Yoko Yashiroda
- RIKEN Center for Sustainable Resource Science, Wako, Saitama, Japan
- Mami Yoshimura
- RIKEN Center for Sustainable Resource Science, Wako, Saitama, Japan
- Zhijian Li
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Kaori Itto-Nakama
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
- Daisuke Yamanaka
- Laboratory for Immunopharmacology of Microbial Products, School of Pharmacy, Tokyo University of Pharmacy and Life Sciences, Hachioji, Tokyo, Japan
- Yoshikazu Ohya
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, The University of Tokyo, Kashiwa, Japan
- Collaborative Research Institute for Innovative Microbiology, The University of Tokyo, Tokyo, Japan
- Hiroyuki Osada
- RIKEN Center for Sustainable Resource Science, Wako, Saitama, Japan
- Bo Wang
- Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada
- Peter Munk Cardiac Center, University Health Network, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Ontario, Canada
- Gary D Bader
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- Department of Computer Science, University of Toronto, Toronto, Ontario, Canada
- The Lunenfeld-Tanenbaum Research Institute, Sinai Health System, Toronto, Ontario, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Charles Boone
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario, Canada
- The Donnelly Centre, University of Toronto, Toronto, Ontario, Canada
- RIKEN Center for Sustainable Resource Science, Wako, Saitama, Japan
7
Lee AJ, Reiter T, Doing G, Oh J, Hogan DA, Greene CS. Using genome-wide expression compendia to study microorganisms. Comput Struct Biotechnol J 2022; 20:4315-4324. [PMID: 36016717 PMCID: PMC9396250 DOI: 10.1016/j.csbj.2022.08.012] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/10/2022] [Revised: 08/07/2022] [Accepted: 08/07/2022] [Indexed: 11/30/2022]
Abstract
A gene expression compendium is a heterogeneous collection of gene expression experiments assembled from data collected for diverse purposes. The widely varied experimental conditions and genetic backgrounds across samples create a tremendous opportunity for gaining a systems-level understanding of the transcriptional responses that influence phenotypes. Variety in experimental design is particularly important for studying microbes, where the transcriptional responses integrate many signals and demonstrate plasticity across strains, including responses to which nutrients are available and which microbes are present. Advances in high-throughput measurement technology have made it feasible to construct compendia for many microbes. In this review, we discuss how these compendia are constructed and analyzed to reveal transcriptional patterns.
Affiliation(s)
- Alexandra J. Lee
- Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA, USA
- Taylor Reiter
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Denver, CO, USA
- Georgia Doing
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Julia Oh
- The Jackson Laboratory for Genomic Medicine, Farmington, CT, USA
- Deborah A. Hogan
- Department of Microbiology and Immunology, Geisel School of Medicine, Dartmouth, Hanover, NH, USA
- Casey S. Greene
- Department of Biochemistry and Molecular Genetics, University of Colorado School of Medicine, Denver, CO, USA
8
Mansoor M, Nauman M, Ur Rehman H, Benso A. Gene Ontology GAN (GOGAN): a novel architecture for protein function prediction. Soft Comput 2022. [DOI: 10.1007/s00500-021-06707-z] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/21/2022]
9
De Vito R, Bellio R, Trippa L, Parmigiani G. Bayesian multistudy factor analysis for high-throughput biological data. Ann Appl Stat 2021. [DOI: 10.1214/21-aoas1456] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 11/19/2022]
Affiliation(s)
- Ruggero Bellio
- Department of Economics and Statistics, University of Udine
- Lorenzo Trippa
- Department of Data Science, Dana Farber Cancer Institute
10
Lin CX, Li HD, Deng C, Guan Y, Wang J. TissueNexus: a database of human tissue functional gene networks built with a large compendium of curated RNA-seq data. Nucleic Acids Res 2021; 50:D710-D718. [PMID: 34850130 PMCID: PMC8728275 DOI: 10.1093/nar/gkab1133] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 08/12/2021] [Revised: 10/10/2021] [Accepted: 11/18/2021] [Indexed: 01/02/2023]
Abstract
Mapping gene interactions within tissues/cell types plays a crucial role in understanding the genetic basis of human physiology and disease. Tissue functional gene networks (FGNs) are essential models for mapping complex gene interactions. We present TissueNexus, a database of 49 human tissue/cell line FGNs constructed by integrating heterogeneous genomic data. We adopted an advanced machine learning approach for data integration because Bayesian classifiers, which are the main approach used for constructing existing tissue gene networks, cannot capture the interaction and nonlinearity of genomic features well. A total of 1,341 RNA-seq datasets containing 52,087 samples were integrated for all of these networks. Because the tissue label for RNA-seq data may be annotated with different names or be missing, we performed intensive hand-curation to improve quality. We further developed a user-friendly database for network search, visualization, and functional analysis. We illustrate the application of TissueNexus in prioritizing disease genes. The database is publicly available at https://www.diseaselinks.com/TissueNexus/.
Affiliation(s)
- Cui-Xiang Lin
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Hong-Dong Li
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Chao Deng
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109, USA
- Jianxin Wang
- School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, P.R. China
- Hunan Provincial Key Lab on Bioinformatics, Central South University, Changsha, Hunan 410083, P.R. China
11
Van Dyke K, Lutz S, Mekonnen G, Myers CL, Albert FW. Trans-acting genetic variation affects the expression of adjacent genes. Genetics 2021; 217:6126816. [PMID: 33789351 DOI: 10.1093/genetics/iyaa051] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 10/13/2020] [Accepted: 12/16/2020] [Indexed: 11/13/2022]
Abstract
Gene expression differences among individuals are shaped by trans-acting expression quantitative trait loci (eQTLs). Most trans-eQTLs map to hotspot locations that influence many genes. The molecular mechanisms perturbed by hotspots are often assumed to involve "vertical" cascades of effects in pathways that can ultimately affect the expression of thousands of genes. Here, we report that trans-eQTLs can affect the expression of adjacent genes via "horizontal" mechanisms that extend along a chromosome. Genes affected by trans-eQTL hotspots in the yeast Saccharomyces cerevisiae were more likely to be located next to each other than expected by chance. These paired hotspot effects tended to occur at adjacent genes that also show coexpression in response to genetic and environmental perturbations, suggesting shared mechanisms. Physical proximity and shared chromatin state, in addition to regulation of adjacent genes by similar transcription factors, were independently associated with paired hotspot effects among adjacent genes. Paired effects of trans-eQTLs can occur at neighboring genes even when these genes do not share a common function. This phenomenon could result in unexpected connections between regulatory genetic variation and phenotypes.
Affiliation(s)
- Krisna Van Dyke
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
- Sheila Lutz
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
- Gemechu Mekonnen
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
- Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Frank W Albert
- Department of Genetics, Cell Biology, and Development, University of Minnesota, Minneapolis, MN 55455, USA
12
Quintero E, Isla J, Jordano P. Methodological overview and data-merging approaches in the study of plant–frugivore interactions. Oikos 2021. [DOI: 10.1111/oik.08379] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 12/27/2022]
Affiliation(s)
- Jorge Isla
- Estación Biológica de Doñana, CSIC, Sevilla, Spain
- Pedro Jordano
- Estación Biológica de Doñana, CSIC, Sevilla, Spain
- Dept Biología Vegetal y Ecología, Univ. de Sevilla, Sevilla, Spain
13
Costanzo M, Hou J, Messier V, Nelson J, Rahman M, VanderSluis B, Wang W, Pons C, Ross C, Ušaj M, San Luis BJ, Shuteriqi E, Koch EN, Aloy P, Myers CL, Boone C, Andrews B. Environmental robustness of the global yeast genetic interaction network. Science 2021; 372:eabf8424. [PMID: 33958448 DOI: 10.1126/science.abf8424] [Citation(s) in RCA: 29] [Impact Index Per Article: 7.3] [Received: 11/23/2020] [Accepted: 03/30/2021] [Indexed: 12/18/2022]
Abstract
Phenotypes associated with genetic variants can be altered by interactions with other genetic variants (GxG), with the environment (GxE), or both (GxGxE). Yeast genetic interactions have been mapped on a global scale, but the environmental influence on the plasticity of genetic networks has not been examined systematically. To assess environmental rewiring of genetic networks, we examined 14 diverse conditions and scored 30,000 functionally representative yeast gene pairs for dynamic, differential interactions. Different conditions revealed novel differential interactions, which often uncovered functional connections between distantly related gene pairs. However, the majority of observed genetic interactions remained unchanged in different conditions, suggesting that the global yeast genetic interaction network is robust to environmental perturbation and captures the fundamental functional architecture of a eukaryotic cell.
Affiliation(s)
- Michael Costanzo
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
- Jing Hou
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
- Vincent Messier
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
- Justin Nelson
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
- Program in Biomedical Informatics and Computational Biology, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
- Mahfuzur Rahman
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
- Program in Biomedical Informatics and Computational Biology, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
- Benjamin VanderSluis
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
- Wen Wang
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
- Carles Pons
- Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute for Science and Technology, Barcelona, Spain
- Catherine Ross
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
- Matej Ušaj
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
- Bryan-Joseph San Luis
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
- Emira Shuteriqi
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
- Elizabeth N Koch
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
- Patrick Aloy
- Institute for Research in Biomedicine (IRB Barcelona), Barcelona Institute for Science and Technology, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
- Program in Biomedical Informatics and Computational Biology, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
- Charles Boone
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
- RIKEN Center for Sustainable Resource Science, Wako, Saitama, Japan
- Brenda Andrews
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, ON M5S 3E1, Canada
14
Meldal BHM, Pons C, Perfetto L, Del-Toro N, Wong E, Aloy P, Hermjakob H, Orchard S, Porras P. Analysing the yeast complexome-the Complex Portal rising to the challenge. Nucleic Acids Res 2021; 49:3156-3167. [PMID: 33677561 PMCID: PMC8034636 DOI: 10.1093/nar/gkab077] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 11/03/2020] [Revised: 01/22/2021] [Accepted: 01/27/2021] [Indexed: 02/06/2023]
Abstract
The EMBL-EBI Complex Portal is a knowledgebase of macromolecular complexes providing persistent stable identifiers. Entries are linked to literature evidence and provide details of complex membership, function, structure and complex-specific Gene Ontology annotations. Data are freely available and downloadable in HUPO-PSI community standards and missing entries can be requested for curation. In collaboration with Saccharomyces Genome Database and UniProt, the yeast complexome, a compendium of all known heteromeric assemblies from the model organism Saccharomyces cerevisiae, was curated. This expansion of knowledge and scope has led to a 50% increase in curated complexes compared to the previously published dataset, CYC2008. The yeast complexome is used as a reference resource for the analysis of complexes from large-scale experiments. Our analysis showed that genes coding for proteins in complexes tend to have more genetic interactions, are co-expressed with more genes, are more multifunctional, localize more often in the nucleus, and are more often involved in nucleic acid-related metabolic processes and processes where large machineries are the predominant functional drivers. A comparison to genetic interactions showed that about 40% of expanded co-complex pairs also have genetic interactions, suggesting strong functional links between complex members.
Affiliation(s)
- Birgit H M Meldal
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Carles Pons
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, 08028 Barcelona, Catalonia, Spain
- Livia Perfetto
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Noemi Del-Toro
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Edith Wong
- Department of Genetics, Stanford University School of Medicine, Stanford, CA 94305-5477, USA
- Patrick Aloy
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, 08028 Barcelona, Catalonia, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), 08010 Barcelona, Catalonia, Spain
- Henning Hermjakob
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Sandra Orchard
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
- Pablo Porras
- European Molecular Biology Laboratory, European Bioinformatics Institute (EMBL-EBI), Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, UK
15
|
Parts L, Batté A, Lopes M, Yuen MW, Laver M, San Luis B, Yue J, Pons C, Eray E, Aloy P, Liti G, van Leeuwen J. Natural variants suppress mutations in hundreds of essential genes. Mol Syst Biol 2021; 17:e10138. [PMID: 34042294 PMCID: PMC8156963 DOI: 10.15252/msb.202010138] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.8] [Received: 11/24/2020] [Revised: 04/22/2021] [Accepted: 04/23/2021] [Indexed: 01/04/2023]
Abstract
The consequence of a mutation can be influenced by the context in which it operates. For example, loss of gene function may be tolerated in one genetic background, and lethal in another. The extent to which mutant phenotypes are malleable, the architecture of modifiers and the identities of causal genes remain largely unknown. Here, we measure the fitness effects of ~ 1,100 temperature-sensitive alleles of yeast essential genes in the context of variation from ten different natural genetic backgrounds and map the modifiers for 19 combinations. Altogether, fitness defects for 149 of the 580 tested genes (26%) could be suppressed by genetic variation in at least one yeast strain. Suppression was generally driven by gain-of-function of a single, strong modifier gene, and involved both genes encoding complex or pathway partners suppressing specific temperature-sensitive alleles, as well as general modifiers altering the effect of many alleles. The emerging frequency of suppression and range of possible mechanisms suggest that a substantial fraction of monogenic diseases could be managed by modulating other gene products.
Affiliation(s)
- Leopold Parts
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Wellcome Sanger Institute, Wellcome Genome Campus, Hinxton, UK
- Department of Computer Science, University of Tartu, Tartu, Estonia
- Amandine Batté
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Maykel Lopes
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Michael W Yuen
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Meredith Laver
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Bryan‐Joseph San Luis
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Jia‐Xing Yue
- University of Côte d’Azur, CNRS, INSERM, IRCAN, Nice, France
- Carles Pons
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Elise Eray
- Center for Integrative Genomics, University of Lausanne, Lausanne, Switzerland
- Patrick Aloy
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Gianni Liti
- University of Côte d’Azur, CNRS, INSERM, IRCAN, Nice, France
|
16
|
Henningsen EC, Omidvar V, Della Coletta R, Michno JM, Gilbert E, Li F, Miller ME, Myers CL, Gordon SP, Vogel JP, Steffenson BJ, Kianian SF, Hirsch CD, Figueroa M. Identification of Candidate Susceptibility Genes to Puccinia graminis f. sp. tritici in Wheat. Frontiers in Plant Science 2021; 12:657796. [PMID: 33968112 PMCID: PMC8097158 DOI: 10.3389/fpls.2021.657796] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/24/2021] [Accepted: 03/22/2021] [Indexed: 05/30/2023]
Abstract
Wheat stem rust disease caused by Puccinia graminis f. sp. tritici (Pgt) is a global threat to wheat production. Fast evolving populations of Pgt limit the efficacy of plant genetic resistance and constrain disease management strategies. Understanding molecular mechanisms that lead to rust infection and disease susceptibility could deliver novel strategies to deploy crop resistance through genetic loss of disease susceptibility. We used comparative transcriptome-based and orthology-guided approaches to characterize gene expression changes associated with Pgt infection in susceptible and resistant Triticum aestivum genotypes as well as the non-host Brachypodium distachyon. We targeted our analysis to genes with differential expression in T. aestivum and genes suppressed or not affected in B. distachyon and report several processes potentially linked to susceptibility to Pgt, such as cell death suppression and impairment of photosynthesis. We complemented our approach with a gene co-expression network analysis to identify wheat targets to deliver resistance to Pgt through removal or modification of putative susceptibility genes.
Affiliation(s)
- Eva C. Henningsen
- Department of Plant Pathology, University of Minnesota, St. Paul, MN, United States
- Vahid Omidvar
- Department of Plant Pathology, University of Minnesota, St. Paul, MN, United States
- Rafael Della Coletta
- Department of Agronomy and Plant Genetics, University of Minnesota, St. Paul, MN, United States
- Jean-Michel Michno
- Bioinformatics and Computational Biology Graduate Program, University of Minnesota, Minneapolis, MN, United States
- Erin Gilbert
- Department of Plant Pathology, University of Minnesota, St. Paul, MN, United States
- Feng Li
- Department of Plant Pathology, University of Minnesota, St. Paul, MN, United States
- Marisa E. Miller
- Department of Plant Pathology, University of Minnesota, St. Paul, MN, United States
- Chad L. Myers
- Bioinformatics and Computational Biology Graduate Program, University of Minnesota, Minneapolis, MN, United States
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, United States
- John P. Vogel
- Joint Genome Institute, Walnut Creek, CA, United States
- Department of Plant and Microbial Biology, University of California, Berkeley, Berkeley, CA, United States
- Brian J. Steffenson
- Department of Plant Pathology, University of Minnesota, St. Paul, MN, United States
- Shahryar F. Kianian
- Department of Plant Pathology, University of Minnesota, St. Paul, MN, United States
- USDA-ARS Cereal Disease Laboratory, St. Paul, MN, United States
- Cory D. Hirsch
- Department of Plant Pathology, University of Minnesota, St. Paul, MN, United States
- Melania Figueroa
- Commonwealth Scientific and Industrial Research Organisation, Agriculture and Food, Canberra, ACT, Australia
|
17
|
An integrated deep learning and dynamic programming method for predicting tumor suppressor genes, oncogenes, and fusion from PDB structures. Comput Biol Med 2021; 133:104323. [PMID: 33934067 DOI: 10.1016/j.compbiomed.2021.104323] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/17/2020] [Revised: 02/18/2021] [Accepted: 03/07/2021] [Indexed: 11/20/2022]
Abstract
Mutations in proto-oncogenes (ONGO) and the loss of regulatory function of tumor suppression genes (TSG) are the common underlying mechanism for uncontrolled tumor growth. While cancer is a heterogeneous complex of distinct diseases, finding the potentiality of the genes related functionality to ONGO or TSG through computational studies can help develop drugs that target the disease. This paper proposes a classification method that starts with a preprocessing stage to extract the feature map sets from the input 3D protein structural information. The next stage is a deep convolutional neural network stage (DCNN) that outputs the probability of functional classification of genes. We explored and tested two approaches: in Approach 1, all filtered and cleaned 3D-protein-structures (PDB) are pooled together, whereas in Approach 2, the primary structures and their corresponding PDBs are separated according to the genes' primary structural information. Following the DCNN stage, a dynamic programming-based method is used to determine the final prediction of the primary structures' functionality. We validated our proposed method using the COSMIC online database. For the ONGO vs TSG classification problem the AUROC of the DCNN stage for Approach 1 and Approach 2 DCNN are 0.978 and 0.765, respectively. The AUROCs of the final genes' primary structure functionality classification for Approach 1 and Approach 2 are 0.989, and 0.879, respectively. For comparison, the current state-of-the-art reported AUROC is 0.924. Our results warrant further study to apply the deep learning models to humans' (GRCh38) genes, for predicting their corresponding probabilities of functionality in the cancer drivers.
|
18
|
van Leeuwen J, Pons C, Tan G, Wang ZY, Hou J, Weile J, Gebbia M, Liang W, Shuteriqi E, Li Z, Lopes M, Ušaj M, Dos Santos Lopes A, van Lieshout N, Myers CL, Roth FP, Aloy P, Andrews BJ, Boone C. Systematic analysis of bypass suppression of essential genes. Mol Syst Biol 2020; 16:e9828. [PMID: 32939983 PMCID: PMC7507402 DOI: 10.15252/msb.20209828] [Citation(s) in RCA: 33] [Impact Index Per Article: 6.6] [Received: 07/01/2020] [Revised: 08/11/2020] [Accepted: 08/13/2020] [Indexed: 12/15/2022]
Abstract
Essential genes tend to be highly conserved across eukaryotes, but, in some cases, their critical roles can be bypassed through genetic rewiring. From a systematic analysis of 728 different essential yeast genes, we discovered that 124 (17%) were dispensable essential genes. Through whole-genome sequencing and detailed genetic analysis, we investigated the genetic interactions and genome alterations underlying bypass suppression. Dispensable essential genes often had paralogs, were enriched for genes encoding membrane-associated proteins, and were depleted for members of protein complexes. Functionally related genes frequently drove the bypass suppression interactions. These gene properties were predictive of essential gene dispensability and of specific suppressors among hundreds of genes on aneuploid chromosomes. Our findings identify yeast's core essential gene set and reveal that the properties of dispensable essential genes are conserved from yeast to human cells, correlating with human genes that display cell line-specific essentiality in the Cancer Dependency Map (DepMap) project.
Affiliation(s)
- Jolanda van Leeuwen
- Center for Integrative Genomics, Bâtiment Génopode, University of Lausanne, Lausanne, Switzerland
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Carles Pons
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Guihong Tan
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Zi Yang Wang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Jing Hou
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Jochen Weile
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Lunenfeld‐Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Marinella Gebbia
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Lunenfeld‐Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Wendy Liang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Ermira Shuteriqi
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Zhijian Li
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Maykel Lopes
- Center for Integrative Genomics, Bâtiment Génopode, University of Lausanne, Lausanne, Switzerland
- Matej Ušaj
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Natascha van Lieshout
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Lunenfeld‐Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota‐Twin Cities, Minneapolis, MN, USA
- Frederick P Roth
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Lunenfeld‐Tanenbaum Research Institute, Sinai Health System, Toronto, ON, Canada
- Department of Computer Science, University of Toronto, Toronto, ON, Canada
- Patrick Aloy
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, Barcelona, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Spain
- Brenda J Andrews
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Charles Boone
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
|
19
|
Kuzmin E, VanderSluis B, Nguyen Ba AN, Wang W, Koch EN, Usaj M, Khmelinskii A, Usaj MM, van Leeuwen J, Kraus O, Tresenrider A, Pryszlak M, Hu MC, Varriano B, Costanzo M, Knop M, Moses A, Myers CL, Andrews BJ, Boone C. Exploring whole-genome duplicate gene retention with complex genetic interaction analysis. Science 2020; 368:eaaz5667. [PMID: 32586993 PMCID: PMC7539174 DOI: 10.1126/science.aaz5667] [Citation(s) in RCA: 72] [Impact Index Per Article: 14.4] [Received: 09/19/2019] [Accepted: 05/06/2020] [Indexed: 12/25/2022]
Abstract
Whole-genome duplication has played a central role in the genome evolution of many organisms, including the human genome. Most duplicated genes are eliminated, and factors that influence the retention of persisting duplicates remain poorly understood. We describe a systematic complex genetic interaction analysis with yeast paralogs derived from the whole-genome duplication event. Mapping of digenic interactions for a deletion mutant of each paralog, and of trigenic interactions for the double mutant, provides insight into their roles and a quantitative measure of their functional redundancy. Trigenic interaction analysis distinguishes two classes of paralogs: a more functionally divergent subset and another that retained more functional overlap. Gene feature analysis and modeling suggest that evolutionary trajectories of duplicated genes are dictated by combined functional and structural entanglement factors.
Affiliation(s)
- Elena Kuzmin
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Benjamin VanderSluis
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Alex N Nguyen Ba
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada
- Center for Analysis of Evolution and Function, University of Toronto, Toronto, Ontario, Canada
- Wen Wang
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Elizabeth N Koch
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Matej Usaj
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Anton Khmelinskii
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, 69120 Heidelberg, Germany
- Oren Kraus
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Amy Tresenrider
- Department of Molecular and Cell Biology, University of California, Berkeley, CA, USA
- Michael Pryszlak
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Ming-Che Hu
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Brenda Varriano
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Michael Costanzo
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Michael Knop
- Zentrum für Molekulare Biologie der Universität Heidelberg (ZMBH), DKFZ-ZMBH Alliance, 69120 Heidelberg, Germany
- Cell Morphogenesis and Signal Transduction, German Cancer Research Center (DKFZ), 69120 Heidelberg, Germany
- Alan Moses
- Department of Cell and Systems Biology, University of Toronto, Toronto, Ontario, Canada
- Center for Analysis of Evolution and Function, University of Toronto, Toronto, Ontario, Canada
- Department of Ecology and Evolutionary Biology, University of Toronto, Toronto, Ontario, Canada
- Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN 55455, USA
- Brenda J Andrews
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Charles Boone
- Donnelly Centre, University of Toronto, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, Ontario M5S 3E1, Canada
|
20
|
Peng J, Xue H, Wei Z, Tuncali I, Hao J, Shang X. Integrating multi-network topology for gene function prediction using deep neural networks. Brief Bioinform 2020; 22:2096-2105. [PMID: 32249297 DOI: 10.1093/bib/bbaa036] [Citation(s) in RCA: 50] [Impact Index Per Article: 10.0] [Received: 12/02/2019] [Revised: 02/09/2020] [Accepted: 02/25/2020] [Indexed: 01/18/2023]
Abstract
MOTIVATION The emergence of abundant biological networks, which benefit from the development of advanced high-throughput techniques, contributes to describing and modeling complex internal interactions among biological entities such as genes and proteins. Multiple networks provide rich information for inferring the function of genes or proteins. To extract functional patterns of genes based on multiple heterogeneous networks, network embedding-based methods, aiming to capture non-linear and low-dimensional feature representation based on network biology, have recently achieved remarkable performance in gene function prediction. However, existing methods do not consider the shared information among different networks during the feature learning process. RESULTS Taking the correlation among the networks into account, we design a novel semi-supervised autoencoder method to integrate multiple networks and generate a low-dimensional feature representation. Then we utilize a convolutional neural network based on the integrated feature embedding to annotate unlabeled gene functions. We test our method on both yeast and human datasets and compare with three state-of-the-art methods. The results demonstrate the superior performance of our method. We not only provide a comprehensive analysis of the performance of the newly proposed algorithm but also provide a tool for extracting features of genes based on multiple networks, which can be used in the downstream machine learning task. AVAILABILITY DeepMNE-CNN is freely available at https://github.com/xuehansheng/DeepMNE-CNN. CONTACT jiajiepeng@nwpu.edu.cn; shang@nwpu.edu.cn; jianye.hao@tju.edu.cn.
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
- Hansheng Xue
- Key Laboratory of Big Data Storage and Management, Ministry of Industry and Information Technology, Northwestern Polytechnical University, Xi'an, 710072, China
- Zhongyu Wei
- Research School of Computer Science, Australian National University, Canberra, 2601, Australia
- Idil Tuncali
- School of Data Science, Fudan University, Shanghai, 200433, China
|
21
|
Mattiazzi Usaj M, Sahin N, Friesen H, Pons C, Usaj M, Masinas MPD, Shuteriqi E, Shkurin A, Aloy P, Morris Q, Boone C, Andrews BJ. Systematic genetics and single-cell imaging reveal widespread morphological pleiotropy and cell-to-cell variability. Mol Syst Biol 2020; 16:e9243. [PMID: 32064787 PMCID: PMC7025093 DOI: 10.15252/msb.20199243] [Citation(s) in RCA: 31] [Impact Index Per Article: 6.2] [Received: 09/18/2019] [Revised: 12/16/2019] [Accepted: 01/15/2020] [Indexed: 12/13/2022]
Abstract
Our ability to understand the genotype-to-phenotype relationship is hindered by the lack of detailed understanding of phenotypes at a single-cell level. To systematically assess cell-to-cell phenotypic variability, we combined automated yeast genetics, high-content screening and neural network-based image analysis of single cells, focussing on genes that influence the architecture of four subcellular compartments of the endocytic pathway as a model system. Our unbiased assessment of the morphology of these compartments-endocytic patch, actin patch, late endosome and vacuole-identified 17 distinct mutant phenotypes associated with ~1,600 genes (~30% of all yeast genes). Approximately half of these mutants exhibited multiple phenotypes, highlighting the extent of morphological pleiotropy. Quantitative analysis also revealed that incomplete penetrance was prevalent, with the majority of mutants exhibiting substantial variability in phenotype at the single-cell level. Our single-cell analysis enabled exploration of factors that contribute to incomplete penetrance and cellular heterogeneity, including replicative age, organelle inheritance and response to stress.
Affiliation(s)
- Nil Sahin
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Carles Pons
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, Barcelona, Catalonia, Spain
- Matej Usaj
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Aleksei Shkurin
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Patrick Aloy
- Institute for Research in Biomedicine (IRB Barcelona), The Barcelona Institute for Science and Technology, Barcelona, Catalonia, Spain
- Institució Catalana de Recerca i Estudis Avançats (ICREA), Barcelona, Catalonia, Spain
- Quaid Morris
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- Computational and Systems Biology Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Charles Boone
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
- RIKEN Centre for Sustainable Resource Science, Wako, Saitama, Japan
- Brenda J Andrews
- The Donnelly Centre, University of Toronto, Toronto, ON, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON, Canada
|
22
|
Hong J, Luo Y, Zhang Y, Ying J, Xue W, Xie T, Tao L, Zhu F. Protein functional annotation of simultaneously improved stability, accuracy and false discovery rate achieved by a sequence-based deep learning. Brief Bioinform 2019; 21:1437-1447. [PMID: 31504150 PMCID: PMC7412958 DOI: 10.1093/bib/bbz081] [Citation(s) in RCA: 111] [Impact Index Per Article: 18.5] [Received: 05/06/2019] [Revised: 05/27/2019] [Accepted: 06/10/2019] [Indexed: 11/12/2022]
Abstract
Functional annotation of protein sequence with high accuracy has become one of the most important issues in modern biomedical studies, and computational approaches of significantly accelerated analysis process and enhanced accuracy are greatly desired. Although a variety of methods have been developed to elevate protein annotation accuracy, their ability in controlling false annotation rates remains either limited or not systematically evaluated. In this study, a protein encoding strategy, together with a deep learning algorithm, was proposed to control the false discovery rate in protein function annotation, and its performances were systematically compared with that of the traditional similarity-based and de novo approaches. Based on a comprehensive assessment from multiple perspectives, the proposed strategy and algorithm were found to perform better in both prediction stability and annotation accuracy compared with other de novo methods. Moreover, an in-depth assessment revealed that it possessed an improved capacity of controlling the false discovery rate compared with traditional methods. All in all, this study not only provided a comprehensive analysis on the performances of the newly proposed strategy but also provided a tool for the researcher in the fields of protein function annotation.
Affiliation(s)
- Jiajun Hong
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yongchao Luo
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Yang Zhang
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China; School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Junbiao Ying
- College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
- Weiwei Xue
- School of Pharmaceutical Sciences, Chongqing University, Chongqing, China
- Tian Xie
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
- Lin Tao
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China
- Feng Zhu
- Key Laboratory of Elemene Class Anti-cancer Chinese Medicine of Zhejiang Province, School of Medicine, Hangzhou Normal University, Hangzhou, China; College of Pharmaceutical Sciences, Zhejiang University, Hangzhou, China
|
23
|
Pan Q, Wei J, Guo F, Huang S, Gong Y, Liu H, Liu J, Li L. Trait ontology analysis based on association mapping studies bridges the gap between crop genomics and Phenomics. BMC Genomics 2019; 20:443. [PMID: 31159731 PMCID: PMC6547493 DOI: 10.1186/s12864-019-5812-0] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Received: 11/04/2018] [Accepted: 05/20/2019] [Indexed: 01/08/2023]
Abstract
BACKGROUND Trait ontology (TO) analysis is a powerful system for functional annotation and enrichment analysis of genes. However, given the complexity of the molecular mechanisms underlying phenomes, only a few hundred gene-to-TO relationships in plants have been elucidated to date, limiting the pace of research in this "big data" era. RESULTS Here, we curated all the available trait associated sites (TAS) information from 79 association mapping studies of maize (Zea mays L.) and rice (Oryza sativa L.) lines with diverse genetic backgrounds and built a large-scale TAS-derived TO system for functional annotation of genes in various crops. Our TO system contains information for up to 18,042 genes (6345 in maize at the 25 k level and 11,697 in rice at the 50 k level), including gene-to-TO relationships, which covers over one fifth of the annotated gene sets for maize and rice. A comparison of Gene Ontology (GO) vs. TO analysis demonstrated that the TAS-derived TO system is an efficient alternative tool for gene functional annotation and enrichment analysis. We therefore combined information from the TO, GO, metabolic pathway, and co-expression network databases and constructed the TAS system, which is publicly available at http://tas.hzau.edu.cn . TAS provides a user-friendly interface for functional annotation of genes, enrichment analysis, genome-wide extraction of trait-associated genes, and crosschecking of different functional annotation databases. CONCLUSIONS TAS bridges the gap between genomic and phenomic information in crops. This easy-to-use tool will be useful for geneticists, biologists, and breeders in the agricultural community, as it facilitates the dissection of molecular mechanisms conferring agronomic traits in an easy, genome-wide manner.
Affiliation(s)
- Qingchun Pan
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Junfeng Wei
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Feng Guo
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Suiyong Huang
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Yong Gong
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Hao Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Jianxiao Liu
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
- Lin Li
- National Key Laboratory of Crop Genetic Improvement, Huazhong Agricultural University, Wuhan, 430070, China
|
24
|
Minor Isozymes Tailor Yeast Metabolism to Carbon Availability. mSystems 2019; 4:mSystems00170-18. [PMID: 30834327 PMCID: PMC6392091 DOI: 10.1128/msystems.00170-18] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.8] [Received: 08/15/2018] [Accepted: 01/21/2019] [Indexed: 11/23/2022]
Abstract
Isozymes are enzymes that differ in sequence but catalyze the same chemical reactions. Despite their apparent redundancy, isozymes are often retained over evolutionary time, suggesting that they contribute to fitness. We developed an unsupervised computational method for identifying environmental conditions under which isozymes are likely to make fitness contributions. This method analyzes published gene expression data to find specific experimental perturbations that induce differential isozyme expression. In yeast, we found that isozymes are strongly enriched in the pathways of central carbon metabolism and that many isozyme pairs show anticorrelated expression during the respirofermentative shift. Building on these observations, we assigned function to two minor central carbon isozymes, aconitase 2 (ACO2) and pyruvate kinase 2 (PYK2).
ACO2 is expressed during fermentation and proves advantageous when glucose is limiting. PYK2 is expressed during respiration and proves advantageous for growth on three-carbon substrates. PYK2’s deletion can be rescued by expressing the major pyruvate kinase only if that enzyme carries mutations mirroring PYK2’s allosteric regulation. Thus, central carbon isozymes help to optimize allosteric metabolic regulation under a broad range of potential nutrient conditions while requiring only a small number of transcriptional states. IMPORTANCE Gene duplication is one of the main evolutionary paths to new protein function. Typically, duplicated genes either accumulate mutations and degrade into pseudogenes or are retained and diverge in function. Some duplicated genes, however, show long-term persistence without apparently acquiring new function. An important class of isozymes consists of those that catalyze the same reaction in the same compartment, where knockout of one isozyme causes no known functional defect. Here we present an approach to assigning specific functional roles to seemingly redundant isozymes. First, gene expression data are analyzed computationally to identify conditions under which isozyme expression diverges. Then, knockouts are compared under those conditions. This approach revealed that the expression of many yeast isozymes diverges in response to carbon availability and that carbon source manipulations can induce fitness phenotypes for seemingly redundant isozymes. A driver of these fitness phenotypes is differential allosteric enzyme regulation, indicating isozyme divergence to achieve more-optimal control of metabolism.
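The idea of finding conditions under which isozyme expression diverges can be illustrated with a minimal sketch. This is not the authors' unsupervised method: it simply computes the Pearson correlation between two expression profiles and ranks conditions by absolute expression difference. The function names, condition labels, and log-ratio values are all hypothetical and made up for illustration.

```python
import numpy as np

def pearson(x, y):
    """Pearson correlation between two expression profiles."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = x - x.mean()
    yc = y - y.mean()
    return float((xc @ yc) / np.sqrt((xc @ xc) * (yc @ yc)))

def divergent_conditions(expr_a, expr_b, conditions, top=2):
    """Return the conditions with the largest absolute expression difference."""
    diff = np.abs(np.asarray(expr_a, dtype=float) - np.asarray(expr_b, dtype=float))
    order = np.argsort(-diff, kind="stable")[:top]
    return [conditions[i] for i in order]

# Hypothetical log2 expression ratios for a major and a minor isozyme
# (illustrative values only, not data from the study).
conditions = ["glucose_high", "glucose_low", "ethanol", "glycerol"]
major = [2.0, 1.0, -1.5, -2.0]
minor = [-1.8, -0.5, 1.6, 2.2]

r = pearson(major, minor)                              # strongly negative here
top = divergent_conditions(major, minor, conditions)   # candidate conditions
print(f"correlation = {r:.3f}; most divergent conditions: {top}")
```

In a workflow like the one described, conditions ranked this way would be candidates for the follow-up knockout comparisons.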
|
25
|
Kuzmin E, VanderSluis B, Wang W, Tan G, Deshpande R, Chen Y, Usaj M, Balint A, Mattiazzi Usaj M, van Leeuwen J, Koch EN, Pons C, Dagilis AJ, Pryszlak M, Wang ZY, Hanchard J, Riggi M, Xu K, Heydari H, San Luis BJ, Shuteriqi E, Zhu H, Van Dyk N, Sharifpoor S, Costanzo M, Loewith R, Caudy A, Bolnick D, Brown GW, Andrews BJ, Boone C, Myers CL. Systematic analysis of complex genetic interactions. Science 2018; 360:eaao1729. [PMID: 29674565 PMCID: PMC6215713 DOI: 10.1126/science.aao1729] [Citation(s) in RCA: 176] [Impact Index Per Article: 25.1] [Received: 06/25/2017] [Accepted: 02/23/2018] [Indexed: 12/11/2022]
Abstract
To systematically explore complex genetic interactions, we constructed ~200,000 yeast triple mutants and scored negative trigenic interactions. We selected double-mutant query genes across a broad spectrum of biological processes, spanning a range of quantitative features of the global digenic interaction network and tested for a genetic interaction with a third mutation. Trigenic interactions often occurred among functionally related genes, and essential genes were hubs on the trigenic network. Despite their functional enrichment, trigenic interactions tended to link genes in distant bioprocesses and displayed a weaker magnitude than digenic interactions. We estimate that the global trigenic interaction network is ~100 times as large as the global digenic network, highlighting the potential for complex genetic interactions to affect the biology of inheritance, including the genotype-to-phenotype relationship.
Affiliation(s)
- Elena Kuzmin
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Benjamin VanderSluis
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
| | - Wen Wang
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
| | - Guihong Tan
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Raamesh Deshpande
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
| | - Yiqun Chen
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Matej Usaj
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Attila Balint
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Department of Biochemistry, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Mojca Mattiazzi Usaj
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Jolanda van Leeuwen
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Elizabeth N Koch
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
| | - Carles Pons
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
| | - Andrius J Dagilis
- Department of Integrative Biology, 1 University Station C0990, University of Texas at Austin, Austin, TX 78712, USA
| | - Michael Pryszlak
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Zi Yang Wang
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Julia Hanchard
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Margot Riggi
- Department of Molecular Biology, University of Geneva, Geneva 1211, Switzerland
- Department of Biochemistry, University of Geneva, 1211 Geneva, Switzerland
- iGE3 (Institute of Genetics and Genomics of Geneva), 1211 Geneva, Switzerland
- Swiss National Centre for Competence in Research Programme Chemical Biology, 1211 Geneva, Switzerland
| | - Kaicong Xu
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
| | - Hamed Heydari
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Bryan-Joseph San Luis
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Ermira Shuteriqi
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Hongwei Zhu
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Nydia Van Dyk
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Sara Sharifpoor
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Michael Costanzo
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Robbie Loewith
- Department of Molecular Biology, University of Geneva, Geneva 1211, Switzerland
- iGE3 (Institute of Genetics and Genomics of Geneva), 1211 Geneva, Switzerland
- Swiss National Centre for Competence in Research Programme Chemical Biology, 1211 Geneva, Switzerland
| | - Amy Caudy
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Daniel Bolnick
- Department of Integrative Biology, 1 University Station C0990, University of Texas at Austin, Austin, TX 78712, USA
| | - Grant W Brown
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Department of Biochemistry, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Brenda J Andrews
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Charles Boone
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
- RIKEN Center for Sustainable Resource Science, Wako, Saitama, Japan
| | - Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
|
26
|
ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network. Molecules 2017; 22:molecules22101732. [PMID: 29039790 PMCID: PMC6151571 DOI: 10.3390/molecules22101732] [Citation(s) in RCA: 116] [Impact Index Per Article: 14.5] [Received: 08/30/2017] [Revised: 10/11/2017] [Accepted: 10/11/2017] [Indexed: 11/25/2022] Open
Abstract
With the development of next-generation sequencing techniques, it is fast and cheap to determine protein sequences but relatively slow and expensive to extract useful information from them because of the limitations of traditional biological experimental techniques. Protein function prediction has been a long-standing challenge in bridging the gap between the huge number of protein sequences and the known functions. In this paper, we propose a novel method that converts the protein function prediction problem into a language translation problem, from the newly proposed protein sequence language “ProLan” to the protein function language “GOLan”, and build a neural machine translation model based on recurrent neural networks to translate “ProLan” into “GOLan”. We blindly tested our method by participating in the third Critical Assessment of Function Annotation (CAFA 3) in 2016, and also evaluated its performance on selected proteins whose function was released after the CAFA competition. The good performance on the training and testing datasets demonstrates that the proposed method is a promising direction for protein function prediction. In summary, we propose, for the first time, a method that converts the protein function prediction problem into a language translation problem and applies a neural machine translation model to protein function prediction.
|
27
|
van Leeuwen J, Pons C, Mellor JC, Yamaguchi TN, Friesen H, Koschwanez J, Ušaj MM, Pechlaner M, Takar M, Ušaj M, VanderSluis B, Andrusiak K, Bansal P, Baryshnikova A, Boone CE, Cao J, Cote A, Gebbia M, Horecka G, Horecka I, Kuzmin E, Legro N, Liang W, van Lieshout N, McNee M, San Luis BJ, Shaeri F, Shuteriqi E, Sun S, Yang L, Youn JY, Yuen M, Costanzo M, Gingras AC, Aloy P, Oostenbrink C, Murray A, Graham TR, Myers CL, Andrews BJ, Roth FP, Boone C. Exploring genetic suppression interactions on a global scale. Science 2017; 354:aag0839. [PMID: 27811238 DOI: 10.1126/science.aag0839] [Citation(s) in RCA: 125] [Impact Index Per Article: 15.6] [Received: 05/09/2016] [Accepted: 10/04/2016] [Indexed: 12/21/2022]
Abstract
Genetic suppression occurs when the phenotypic defects caused by a mutation in a particular gene are rescued by a mutation in a second gene. To explore the principles of genetic suppression, we examined both literature-curated and unbiased experimental data, involving systematic genetic mapping and whole-genome sequencing, to generate a large-scale suppression network among yeast genes. Most suppression pairs identified novel relationships among functionally related genes, providing new insights into the functional wiring diagram of the cell. In addition to suppressor mutations, we identified frequent secondary mutations, in a subset of genes, that likely cause a delay in the onset of stationary phase, which appears to promote their enrichment within a propagating population. These findings allow us to formulate and quantify general mechanisms of genetic suppression.
Affiliation(s)
- Jolanda van Leeuwen
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Carles Pons
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA.,Institute for Research in Biomedicine (IRB Barcelona), the Barcelona Institute for Science and Technology, Barcelona, Catalonia, Spain
| | - Joseph C Mellor
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada.,Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, Ontario M5G 1X5, Canada
| | - Takafumi N Yamaguchi
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada.,Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, Ontario M5G 1X5, Canada.,Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Helena Friesen
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - John Koschwanez
- Department of Molecular and Cellular Biology, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA
| | - Mojca Mattiazzi Ušaj
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Maria Pechlaner
- Institute of Molecular Modeling and Simulation, University of Natural Resources and Life Sciences, Muthgasse 18, A-1190 Vienna, Austria
| | - Mehmet Takar
- Department of Biological Sciences, Vanderbilt University, 1161 21st Avenue South, Nashville, TN 37232, USA
| | - Matej Ušaj
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Benjamin VanderSluis
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
| | - Kerry Andrusiak
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada.,Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Pritpal Bansal
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada.,Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, Ontario M5G 1X5, Canada
| | - Anastasia Baryshnikova
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
| | - Claire E Boone
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Jessica Cao
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Atina Cote
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada.,Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, Ontario M5G 1X5, Canada
| | - Marinella Gebbia
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada.,Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, Ontario M5G 1X5, Canada
| | - Gene Horecka
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Ira Horecka
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Elena Kuzmin
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada.,Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Nicole Legro
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Wendy Liang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Natascha van Lieshout
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada.,Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, Ontario M5G 1X5, Canada.,Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Margaret McNee
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Bryan-Joseph San Luis
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Fatemeh Shaeri
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada.,Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, Ontario M5G 1X5, Canada
| | - Ermira Shuteriqi
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Song Sun
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Lu Yang
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Ji-Young Youn
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, Ontario M5G 1X5, Canada
| | - Michael Yuen
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Michael Costanzo
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Anne-Claude Gingras
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, Ontario M5G 1X5, Canada.,Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Patrick Aloy
- Institute for Research in Biomedicine (IRB Barcelona), the Barcelona Institute for Science and Technology, Barcelona, Catalonia, Spain.,Institució Catalana de Recerca i Estudis Avançats, Barcelona, Catalonia, Spain
| | - Chris Oostenbrink
- Institute of Molecular Modeling and Simulation, University of Natural Resources and Life Sciences, Muthgasse 18, A-1190 Vienna, Austria
| | - Andrew Murray
- Department of Molecular and Cellular Biology, Harvard University, 52 Oxford Street, Cambridge, MA 02138, USA
| | - Todd R Graham
- Department of Biological Sciences, Vanderbilt University, 1161 21st Avenue South, Nashville, TN 37232, USA
| | - Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA.,Canadian Institute for Advanced Research, 180 Dundas Street West, Toronto, Ontario M5G 1Z8, Canada
| | - Brenda J Andrews
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada.,Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Frederick P Roth
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada.,Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, 600 University Avenue, Toronto, Ontario M5G 1X5, Canada.,Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada.,Canadian Institute for Advanced Research, 180 Dundas Street West, Toronto, Ontario M5G 1Z8, Canada.,Department of Computer Science, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada
| | - Charles Boone
- Donnelly Centre for Cellular and Biomolecular Research, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada.,Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto, Ontario M5S 3E1, Canada.,Canadian Institute for Advanced Research, 180 Dundas Street West, Toronto, Ontario M5G 1Z8, Canada
|
28
|
Costanzo M, VanderSluis B, Koch EN, Baryshnikova A, Pons C, Tan G, Wang W, Usaj M, Hanchard J, Lee SD, Pelechano V, Styles EB, Billmann M, van Leeuwen J, van Dyk N, Lin ZY, Kuzmin E, Nelson J, Piotrowski JS, Srikumar T, Bahr S, Chen Y, Deshpande R, Kurat CF, Li SC, Li Z, Usaj MM, Okada H, Pascoe N, San Luis BJ, Sharifpoor S, Shuteriqi E, Simpkins SW, Snider J, Suresh HG, Tan Y, Zhu H, Malod-Dognin N, Janjic V, Przulj N, Troyanskaya OG, Stagljar I, Xia T, Ohya Y, Gingras AC, Raught B, Boutros M, Steinmetz LM, Moore CL, Rosebrock AP, Caudy AA, Myers CL, Andrews B, Boone C. A global genetic interaction network maps a wiring diagram of cellular function. Science 2017; 353:aaf1420. [PMID: 27708008 DOI: 10.1126/science.aaf1420] [Citation(s) in RCA: 841] [Impact Index Per Article: 105.1] [Indexed: 12/13/2022]
Abstract
We generated a global genetic interaction network for Saccharomyces cerevisiae, constructing more than 23 million double mutants, identifying about 550,000 negative and about 350,000 positive genetic interactions. This comprehensive network maps genetic interactions for essential gene pairs, highlighting essential genes as densely connected hubs. Genetic interaction profiles enabled assembly of a hierarchical model of cell function, including modules corresponding to protein complexes and pathways, biological processes, and cellular compartments. Negative interactions connected functionally related genes, mapped core bioprocesses, and identified pleiotropic genes, whereas positive interactions often mapped general regulatory connections among gene pairs, rather than shared functionality. The global network illustrates how coherent sets of genetic interactions connect protein complex and pathway modules to map a functional wiring diagram of the cell.
Affiliation(s)
- Michael Costanzo
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Benjamin VanderSluis
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA. Simons Center for Data Analysis, Simons Foundation, 160 Fifth Avenue, New York, NY 10010, USA
| | - Elizabeth N Koch
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
| | - Anastasia Baryshnikova
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
| | - Carles Pons
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
| | - Guihong Tan
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Wen Wang
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
| | - Matej Usaj
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Julia Hanchard
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1. Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Susan D Lee
- Department of Developmental, Molecular and Chemical Biology, Tufts University School of Medicine, Boston, MA 02111, USA
| | - Vicent Pelechano
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany
| | - Erin B Styles
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1. Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Maximilian Billmann
- Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ) and Heidelberg University, Heidelberg, Germany
| | - Jolanda van Leeuwen
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Nydia van Dyk
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Zhen-Yuan Lin
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto ON, Canada
| | - Elena Kuzmin
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1. Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Justin Nelson
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA. Program in Biomedical Informatics and Computational Biology, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
| | - Jeff S Piotrowski
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1. Chemical Genomics Research Group, RIKEN Center for Sustainable Resource Sciences (CSRS), Saitama, Japan
| | - Tharan Srikumar
- Princess Margaret Cancer Centre, University Health Network and Department of Medical Biophysics, University of Toronto, Toronto ON, Canada
| | - Sondra Bahr
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Yiqun Chen
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Raamesh Deshpande
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
| | - Christoph F Kurat
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Sheena C Li
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1. Chemical Genomics Research Group, RIKEN Center for Sustainable Resource Sciences (CSRS), Saitama, Japan
| | - Zhijian Li
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Mojca Mattiazzi Usaj
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Hiroki Okada
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba, Japan 277-8561
| | - Natasha Pascoe
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1. Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Bryan-Joseph San Luis
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Sara Sharifpoor
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Emira Shuteriqi
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Scott W Simpkins
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA. Program in Biomedical Informatics and Computational Biology, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
| | - Jamie Snider
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Harsha Garadi Suresh
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Yizhao Tan
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Hongwei Zhu
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Noel Malod-Dognin
- Computer Science Department, University College London, London WC1E 6BT, UK
| | - Vuk Janjic
- Department of Computing, Imperial College London, UK
| | - Natasa Przulj
- Computer Science Department, University College London, London WC1E 6BT, UK. School of Computing (RAF), Union University, Belgrade, Serbia
| | - Olga G Troyanskaya
- Simons Center for Data Analysis, Simons Foundation, 160 Fifth Avenue, New York, NY 10010, USA. Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
| | - Igor Stagljar
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1. Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1. Department of Biochemistry, University of Toronto, Toronto, ON, Canada
| | - Tian Xia
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA. School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China, 430074
| | - Yoshikazu Ohya
- Department of Integrated Biosciences, Graduate School of Frontier Sciences, University of Tokyo, Kashiwa, Chiba, Japan 277-8561
| | - Anne-Claude Gingras
- Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1. Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, Toronto ON, Canada
| | - Brian Raught
- Princess Margaret Cancer Centre, University Health Network and Department of Medical Biophysics, University of Toronto, Toronto ON, Canada
| | - Michael Boutros
- Division of Signaling and Functional Genomics, German Cancer Research Center (DKFZ) and Heidelberg University, Heidelberg, Germany
| | - Lars M Steinmetz
- European Molecular Biology Laboratory (EMBL), Genome Biology Unit, 69117 Heidelberg, Germany. Department of Genetics, School of Medicine and Stanford Genome Technology Center Stanford University, Palo Alto, CA 94304, USA
| | - Claire L Moore
- Department of Developmental, Molecular and Chemical Biology, Tufts University School of Medicine, Boston, MA 02111, USA
| | - Adam P Rosebrock
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1. Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Amy A Caudy
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1. Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1
| | - Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA. Program in Biomedical Informatics and Computational Biology, University of Minnesota-Twin Cities, 200 Union Street, Minneapolis, MN 55455, USA
| | - Brenda Andrews
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1. Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1.
| | - Charles Boone
- The Donnelly Centre, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1. Department of Molecular Genetics, University of Toronto, 160 College Street, Toronto ON, Canada M5S 3E1. Chemical Genomics Research Group, RIKEN Center for Sustainable Resource Sciences (CSRS), Saitama, Japan.
|
29
|
Abstract
A biological experiment is the most reliable way of assigning function to a protein. However, in the era of high-throughput sequencing, scientists are unable to carry out experiments to determine the function of every single gene product. Therefore, to gain insights into the activity of these molecules and guide experiments, we must rely on computational means to functionally annotate the majority of sequence data. To understand how well these algorithms perform, we have established a challenge involving a broad scientific community in which we evaluate different annotation methods according to their ability to predict the associations between previously unannotated protein sequences and Gene Ontology terms. Here we discuss the rationale, benefits, and issues associated with evaluating computational methods in an ongoing community-wide challenge.
|
30
|
Cho H, Berger B, Peng J. Compact Integration of Multi-Network Topology for Functional Analysis of Genes. Cell Syst 2016; 3:540-548.e5. [PMID: 27889536 DOI: 10.1016/j.cels.2016.10.017] [Citation(s) in RCA: 152] [Impact Index Per Article: 16.9] [Received: 06/09/2016] [Revised: 08/14/2016] [Accepted: 10/19/2016] [Indexed: 01/18/2023]
Abstract
The topological landscape of molecular or functional interaction networks provides a rich source of information for inferring functional patterns of genes or proteins. However, a pressing yet-unsolved challenge is how to combine multiple heterogeneous networks, each having different connectivity patterns, to achieve more accurate inference. Here, we describe the Mashup framework for scalable and robust network integration. In Mashup, the diffusion in each network is first analyzed to characterize the topological context of each node. Next, the high-dimensional topological patterns in individual networks are canonically represented using low-dimensional vectors, one per gene or protein. These vectors can then be plugged into off-the-shelf machine learning methods to derive functional insights about genes or proteins. We present tools based on Mashup that achieve state-of-the-art performance in three diverse functional inference tasks: protein function prediction, gene ontology reconstruction, and genetic interaction prediction. Mashup enables deeper insights into the structure of rapidly accumulating and diverse biological network data and can be broadly applied to other network science domains.
Affiliation(s)
- Hyunghoon Cho
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA
| | - Bonnie Berger
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Department of Mathematics, MIT, Cambridge, MA 02139, USA.
| | - Jian Peng
- Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, USA; Department of Computer Science, University of Illinois at Urbana-Champaign, Champaign, IL 61801, USA.
|
31
|
Li L, Briskine R, Schaefer R, Schnable PS, Myers CL, Flagel LE, Springer NM, Muehlbauer GJ. Co-expression network analysis of duplicate genes in maize (Zea mays L.) reveals no subgenome bias. BMC Genomics 2016; 17:875. [PMID: 27814670 PMCID: PMC5097351 DOI: 10.1186/s12864-016-3194-0] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.2] [Received: 06/15/2016] [Accepted: 10/22/2016] [Indexed: 01/08/2023] Open
Abstract
BACKGROUND Gene duplication is prevalent in many species and can result in coding and regulatory divergence. Gene duplications can be classified as whole genome duplication (WGD), tandem and inserted (non-syntenic). In maize, WGD resulted in the subgenomes maize1 and maize2, of which maize1 is considered the dominant subgenome. However, the landscape of co-expression network divergence of duplicate genes in maize is still largely uncharacterized. RESULTS To address the consequence of gene duplication on co-expression network divergence, we developed a gene co-expression network from RNA-seq data derived from 64 different tissues/stages of the maize reference inbred B73. WGD, tandem and inserted gene duplications exhibited distinct regulatory divergence. Inserted duplicate genes were more likely to be singletons in the co-expression networks, while WGD duplicate genes were likely to be co-expressed with other genes. Tandem duplicate genes were enriched in the co-expression pattern where co-expressed genes were nearly identical for the duplicates in the network. Older gene duplications exhibit more extensive co-expression variation than younger duplications. Overall, non-syntenic genes primarily from inserted duplications show more co-expression divergence. Also, such enlarged co-expression divergence is significantly related to duplication age. Moreover, subgenome dominance was not observed in the co-expression networks: maize1 and maize2 exhibit similar levels of intra-subgenome correlations. Intriguingly, the level of inter-subgenome co-expression was similar to the level of intra-subgenome correlations, genes from specific subgenomes were not likely to be enriched in co-expression network modules, and the hub genes were not predominantly from any specific subgenome in maize.
CONCLUSIONS Our work provides a comprehensive analysis of maize co-expression network divergence for three different types of gene duplications and identifies potential relationships between duplication types, duplication ages and co-expression consequences. Electronic supplementary material: The online version of this article (doi:10.1186/s12864-016-3194-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Lin Li
- Department of Agronomy and Plant Genetics, University of Minnesota, Saint Paul, MN, 55108, USA; National Key Laboratory of Crop Genetic Improvement, College of Plant Science and Technology, Huazhong Agricultural University, Wuhan, 430070, China
| | - Roman Briskine
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, 55455, USA
| | - Robert Schaefer
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, 55455, USA
| | | | - Chad L Myers
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, MN, 55455, USA
| | - Lex E Flagel
- Monsanto Company, Chesterfield, MO, 63017, USA; Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, MN, 55108, USA
| | - Nathan M Springer
- Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, MN, 55108, USA
| | - Gary J Muehlbauer
- Department of Agronomy and Plant Genetics, University of Minnesota, Saint Paul, MN, 55108, USA; Department of Plant and Microbial Biology, University of Minnesota, Saint Paul, MN, 55108, USA.
|
32
|
Lagani V, Karozou AD, Gomez-Cabrero D, Silberberg G, Tsamardinos I. A comparative evaluation of data-merging and meta-analysis methods for reconstructing gene-gene interactions. BMC Bioinformatics 2016; 17 Suppl 5:194. [PMID: 27294826 PMCID: PMC4905611 DOI: 10.1186/s12859-016-1038-1] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.3] [Indexed: 12/16/2022] Open
Abstract
BACKGROUND We address the problem of integratively analyzing multiple gene expression microarray datasets in order to reconstruct gene-gene interaction networks. Integrating multiple datasets is generally believed to provide increased statistical power and to lead to a better characterization of the system under study. However, the presence of systematic variation across different studies makes network reverse-engineering tasks particularly challenging. We contrast two approaches that have been frequently used in the literature for addressing systematic biases: meta-analysis methods, which first calculate appropriate statistics on single datasets and then summarize them, and data-merging methods, which directly analyze the pooled data after removing any biases. This comparative evaluation is performed on both synthetic and real data, the latter consisting of two manually curated microarray compendia comprising several E. coli and yeast studies, respectively. Furthermore, the reconstruction of the regulatory network of the transcription factor Ikaros in human Peripheral Blood Mononuclear Cells (PBMCs) is presented as a case study. RESULTS The meta-analysis and data-merging methods included in our experiments provided comparable performance on both synthetic and real data. Furthermore, both approaches outperformed (a) the naïve solution of merging data together ignoring possible biases, and (b) the results that are expected when only one dataset out of the available ones is analyzed in isolation. Using correlation statistics proved to be more effective than using p-values for correctly ranking candidate interactions. The results from the PBMC case study indicate that the findings of the present study generalize to different types of network reconstruction algorithms. CONCLUSIONS Ignoring the systematic variations that differentiate heterogeneous studies can produce results that are statistically indistinguishable from random guessing.
Meta-analysis and data-merging methods have proved equally effective in addressing this issue, and thus researchers may safely select the approach that best suits their specific application.
Affiliation(s)
- Vincenzo Lagani
- Institute of Computer Science, Foundation for Research and Technology – Hellas, Heraklion, Greece
- Computer Science Department, University of Crete, Heraklion, Greece
| | - Argyro D. Karozou
- Institute of Computer Science, Foundation for Research and Technology – Hellas, Heraklion, Greece
| | - David Gomez-Cabrero
- Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
- Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176 Stockholm, Sweden
- Science for Life Laboratory, 17121 Solna, Sweden
| | - Gilad Silberberg
- Unit of Computational Medicine, Department of Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
- Center for Molecular Medicine, Karolinska Institutet, 171 77 Stockholm, Sweden
- Unit of Clinical Epidemiology, Department of Medicine, Karolinska University Hospital, L8, 17176 Stockholm, Sweden
- Science for Life Laboratory, 17121 Solna, Sweden
| | - Ioannis Tsamardinos
- Institute of Computer Science, Foundation for Research and Technology – Hellas, Heraklion, Greece
- Computer Science Department, University of Crete, Heraklion, Greece
|
33
|
Abstract
The laboratory mouse is the primary mammalian species used for studying alternative splicing events. Recent studies have generated computational models to predict functions for splice isoforms in the mouse. However, the functional relationship network, describing the probability of splice isoforms participating in the same biological process or pathway, has not yet been studied in the mouse. Here we describe a rich genome-wide resource of mouse networks at the isoform level, which was generated using a unique framework that was originally developed to infer isoform functions. This network was built through integrating heterogeneous genomic and protein data, including RNA-seq, exon array, protein docking and pseudo-amino acid composition. Through simulation and cross-validation studies, we demonstrated the accuracy of the algorithm in predicting isoform-level functional relationships. We showed that this network enables the users to reveal functional differences of the isoforms of the same gene, as illustrated by literature evidence with Anxa6 (annexin a6) as an example. We expect this work will become a useful resource for the mouse genetics community to understand gene functions. The network is publicly available at: http://guanlab.ccmb.med.umich.edu/isoformnetwork.
|
34
|
Cao R, Cheng J. Integrated protein function prediction by mining function associations, sequences, and protein-protein and gene-gene interaction networks. Methods 2016; 93:84-91. [PMID: 26370280 PMCID: PMC4894840 DOI: 10.1016/j.ymeth.2015.09.011] [Citation(s) in RCA: 66] [Impact Index Per Article: 7.3] [Received: 03/31/2015] [Revised: 09/03/2015] [Accepted: 09/10/2015] [Indexed: 11/30/2022] Open
Abstract
MOTIVATIONS Protein function prediction is an important and challenging problem in bioinformatics and computational biology. Functionally relevant biological information such as protein sequences, gene expression, and protein-protein interactions has been used mostly separately for protein function prediction. One of the major challenges is how to effectively integrate multiple sources of both traditional and new information such as spatial gene-gene interaction networks generated from chromosomal conformation data together to improve protein function prediction. RESULTS In this work, we developed three different probabilistic scores (MIS, SEQ, and NET score) to combine protein sequence, function associations, and protein-protein interaction and spatial gene-gene interaction networks for protein function prediction. The MIS score is mainly generated from homologous proteins found by PSI-BLAST search, and also association rules between Gene Ontology terms, which are learned by mining the Swiss-Prot database. The SEQ score is generated from protein sequences. The NET score is generated from protein-protein interaction and spatial gene-gene interaction networks. These three scores were combined in a new Statistical Multiple Integrative Scoring System (SMISS) to predict protein function. We tested SMISS on the data set of 2011 Critical Assessment of Function Annotation (CAFA). The method performed substantially better than three base-line methods and an advanced method based on protein profile-sequence comparison, profile-profile comparison, and domain co-occurrence networks according to the maximum F-measure.
Affiliation(s)
- Renzhi Cao
- Computer Science Department, Informatics Institute, University of Missouri, Columbia, MO 65211, USA
| | - Jianlin Cheng
- Computer Science Department, Informatics Institute, University of Missouri, Columbia, MO 65211, USA.
|
35
|
Li HD, Omenn GS, Guan Y. A proteogenomic approach to understand splice isoform functions through sequence and expression-based computational modeling. Brief Bioinform 2016; 17:1024-1031. [PMID: 26740460 DOI: 10.1093/bib/bbv109] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 08/27/2015] [Revised: 11/03/2015] [Indexed: 01/23/2023] Open
Abstract
The products of multi-exon genes are a mixture of alternatively spliced isoforms, from which the translated proteins can have similar, different or even opposing functions. It is therefore essential to differentiate and annotate functions for individual isoforms. Computational approaches provide an efficient complement to expensive and time-consuming experimental studies. The input data of these methods range from DNA sequence, to RNA selection pressure, to expressed sequence tags, to full-length complementary DNA, to exon array, to RNA-seq expression, to proteomic data. Notably, RNA-seq technology generates quantitative profiling of transcript expression at the genome scale, with an unprecedented amount of expression data available for developing isoform function prediction methods. Integrative analysis of these data at different molecular levels enables a proteogenomic approach to systematically interrogate isoform functions. Here, we briefly review the state-of-the-art methods according to their input data sources, discuss their advantages and limitations and point out potential ways to improve prediction accuracies.
|
36
|
Gorenshteyn D, Zaslavsky E, Fribourg M, Park CY, Wong AK, Tadych A, Hartmann BM, Albrecht RA, García-Sastre A, Kleinstein SH, Troyanskaya OG, Sealfon SC. Interactive Big Data Resource to Elucidate Human Immune Pathways and Diseases. Immunity 2015; 43:605-14. [PMID: 26362267 DOI: 10.1016/j.immuni.2015.08.014] [Citation(s) in RCA: 46] [Impact Index Per Article: 4.6] [Received: 10/30/2014] [Revised: 04/24/2015] [Accepted: 06/25/2015] [Indexed: 12/21/2022]
Abstract
Many functionally important interactions between genes and proteins involved in immunological diseases and processes are unknown. The exponential growth in public high-throughput data offers an opportunity to expand this knowledge. To unlock human-immunology-relevant insight contained in the global biomedical research effort, including all public high-throughput datasets, we performed immunological-pathway-focused Bayesian integration of a comprehensive, heterogeneous compendium comprising 38,088 genome-scale experiments. The distillation of this knowledge into immunological networks of functional relationships between molecular entities (ImmuNet), and tools to mine this resource, are accessible to the public at http://immunet.princeton.edu. The predictive capacity of ImmuNet, established by rigorous statistical validation, is easily accessed by experimentalists to generate data-driven hypotheses. We demonstrate the power of this approach through the identification of unique host-virus interaction responses, and we show how ImmuNet complements genetic studies by predicting disease-associated genes. ImmuNet should be widely beneficial for investigating the mechanisms of the human immune system and immunological diseases.
Affiliation(s)
- Dmitriy Gorenshteyn
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
| | - Elena Zaslavsky
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Miguel Fribourg
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Christopher Y Park
- New York Genome Center, 101 Avenue of the Americas, New York, NY 10013, USA
| | - Aaron K Wong
- Simons Center for Data Analysis, Simons Foundation, New York, NY 10010, USA
| | - Alicja Tadych
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
| | - Boris M Hartmann
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Randy A Albrecht
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Adolfo García-Sastre
- Department of Microbiology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Global Health and Emerging Pathogens Institute, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA; Department of Medicine, Division of Infectious Diseases, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Steven H Kleinstein
- Departments of Pathology and Immunobiology, Yale School of Medicine, New Haven, CT 06520, USA; Interdepartmental Program in Computational Biology and Bioinformatics, Yale University, New Haven, CT 06511, USA
| | - Olga G Troyanskaya
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA; Simons Center for Data Analysis, Simons Foundation, New York, NY 10010, USA; Department of Computer Science, Princeton University, Princeton, NJ 08540, USA.
| | - Stuart C Sealfon
- Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA.
|
37
|
Zhu F, Panwar B, Guan Y. Algorithms for modeling global and context-specific functional relationship networks. Brief Bioinform 2015; 17:686-95. [PMID: 26254431 DOI: 10.1093/bib/bbv065] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 03/26/2015] [Indexed: 02/07/2023] Open
Abstract
Functional genomics has enormous potential to facilitate our understanding of normal and disease-specific physiology. In the past decade, intensive research efforts have been focused on modeling functional relationship networks, which summarize the probability of gene co-functionality relationships. Such modeling can be based on either expression data only or heterogeneous data integration. Numerous methods have been deployed to infer the functional relationship networks, while most of them target the global (non-context-specific) functional relationship networks. However, it is expected that functional relationships consistently reprogram under different tissues or biological processes. Thus, advanced methods have been developed targeting tissue-specific or developmental stage-specific networks. This article brings together the state-of-the-art functional relationship network modeling methods, emphasizes the need for heterogeneous genomic data integration and context-specific network modeling and outlines future directions for functional relationship networks.
|
38
|
Dong X, Yambartsev A, Ramsey SA, Thomas LD, Shulzhenko N, Morgun A. Reverse enGENEering of Regulatory Networks from Big Data: A Roadmap for Biologists. Bioinform Biol Insights 2015; 9:61-74. [PMID: 25983554 PMCID: PMC4415676 DOI: 10.4137/bbi.s12467] [Citation(s) in RCA: 35] [Impact Index Per Article: 3.5] [Received: 11/04/2014] [Revised: 02/16/2015] [Accepted: 02/17/2015] [Indexed: 12/29/2022] Open
Abstract
Omics technologies enable unbiased investigation of biological systems through massively parallel sequence acquisition or molecular measurements, bringing the life sciences into the era of Big Data. A central challenge posed by such omics datasets is how to transform these data into biological knowledge, for example, how to use these data to answer questions such as: Which functional pathways are involved in cell differentiation? Which genes should we target to stop cancer? Network analysis is a powerful and general approach to solve this problem consisting of two fundamental stages, network reconstruction, and network interrogation. Here we provide an overview of network analysis including a step-by-step guide on how to perform and use this approach to investigate a biological question. In this guide, we also include the software packages that we and others employ for each of the steps of a network analysis workflow.
Affiliation(s)
- Xiaoxi Dong
- College of Pharmacy, Oregon State University, Corvallis, OR, USA
| | - Anatoly Yambartsev
- Department of Statistics, Institute of Mathematics and Statistics, University of Sao Paulo, Sao Paulo, SP, Brazil
| | - Stephen A Ramsey
- School of Electrical Engineering and Computer Science, Department of Biomedical Sciences, Oregon State University, Corvallis, OR, USA; College of Veterinary Medicine, Department of Biomedical Sciences, Oregon State University, Corvallis, OR, USA
| | - Lina D Thomas
- Department of Statistics, Institute of Mathematics and Statistics, University of Sao Paulo, Sao Paulo, SP, Brazil
| | - Natalia Shulzhenko
- College of Veterinary Medicine, Department of Biomedical Sciences, Oregon State University, Corvallis, OR, USA
| | - Andrey Morgun
- College of Pharmacy, Oregon State University, Corvallis, OR, USA
|
39
|
Pelle KG, Oh K, Buchholz K, Narasimhan V, Joice R, Milner DA, Brancucci NM, Ma S, Voss TS, Ketman K, Seydel KB, Taylor TE, Barteneva NS, Huttenhower C, Marti M. Transcriptional profiling defines dynamics of parasite tissue sequestration during malaria infection. Genome Med 2015; 7:19. [PMID: 25722744 PMCID: PMC4342211 DOI: 10.1186/s13073-015-0133-7] [Citation(s) in RCA: 75] [Impact Index Per Article: 7.5] [Received: 11/13/2014] [Accepted: 01/15/2015] [Indexed: 01/23/2023] Open
Abstract
BACKGROUND During intra-erythrocytic development, late asexually replicating Plasmodium falciparum parasites sequester from peripheral circulation. This facilitates chronic infection and is linked to severe disease and organ-specific pathology including cerebral and placental malaria. Immature gametocytes - sexual stage precursor cells - likewise disappear from circulation. Recent work has demonstrated that these sexual stage parasites are located in the hematopoietic system of the bone marrow before mature gametocytes are released into the bloodstream to facilitate mosquito transmission. However, as sequestration occurs only in vivo and not during in vitro culture, the mechanisms by which it is regulated and enacted (particularly by the gametocyte stage) remain poorly understood. RESULTS We generated the most comprehensive P. falciparum functional gene network to date by integrating global transcriptional data from a large set of asexual and sexual in vitro samples, patient-derived in vivo samples, and a new set of in vitro samples profiling sexual commitment. We defined more than 250 functional modules (clusters) of genes that are co-expressed primarily during the intra-erythrocytic parasite cycle, including 35 during sexual commitment and gametocyte development. Comparing the in vivo and in vitro datasets allowed us, for the first time, to map the time point of asexual parasite sequestration in patients to 22 hours post-invasion, confirming previous in vitro observations on the dynamics of host cell modification and cytoadherence. Moreover, we were able to define the properties of gametocyte sequestration, demonstrating the presence of two circulating gametocyte populations: gametocyte rings between 0 and approximately 30 hours post-invasion and mature gametocytes after around 7 days post-invasion. 
CONCLUSIONS This study provides a bioinformatics resource for the functional elucidation of parasite life cycle dynamics and specifically demonstrates the presence of the gametocyte ring stages in circulation, adding significantly to our understanding of the dynamics of gametocyte sequestration in vivo.
Affiliation(s)
- Karell G Pelle
- Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA 02115 USA
| | - Keunyoung Oh
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115 USA
| | - Kathrin Buchholz
- Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA 02115 USA
| | - Vagheesh Narasimhan
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115 USA
| | - Regina Joice
- Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA 02115 USA
| | - Danny A Milner
- Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA 02115 USA; Department of Pathology, Brigham and Women's Hospital, Boston, MA 02115 USA
| | - Nicolas Mb Brancucci
- Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA 02115 USA; Swiss Tropical and Public Health Institute, 4051 Basel, Switzerland
| | - Siyuan Ma
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115 USA
| | - Till S Voss
- Swiss Tropical and Public Health Institute, 4051 Basel, Switzerland
| | - Ken Ketman
- Program in Cellular and Molecular Medicine, Children's Hospital, Boston, MA 02115 USA
| | - Karl B Seydel
- College of Osteopathic Medicine, Michigan State University, East Lansing, MI 48825 USA; Blantyre Malaria Project, University of Malawi College of Medicine, Blantyre 3, Malawi
| | - Terrie E Taylor
- College of Osteopathic Medicine, Michigan State University, East Lansing, MI 48825 USA; Blantyre Malaria Project, University of Malawi College of Medicine, Blantyre 3, Malawi
| | - Natasha S Barteneva
- Program in Cellular and Molecular Medicine, Children's Hospital, Boston, MA 02115 USA; Department of Pediatrics, Harvard Medical School, Boston, MA 02115 USA
| | - Curtis Huttenhower
- Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115 USA; The Broad Institute of Harvard and MIT, Cambridge, MA 02142 USA
| | - Matthias Marti
- Department of Immunology and Infectious Diseases, Harvard School of Public Health, Boston, MA 02115 USA
|
40
|
Mills CL, Beuning PJ, Ondrechen MJ. Biochemical functional predictions for protein structures of unknown or uncertain function. Comput Struct Biotechnol J 2015; 13:182-91. [PMID: 25848497 PMCID: PMC4372640 DOI: 10.1016/j.csbj.2015.02.003] [Citation(s) in RCA: 62] [Impact Index Per Article: 6.2] [Received: 12/03/2014] [Revised: 02/06/2015] [Accepted: 02/11/2015] [Indexed: 01/07/2023] Open
Abstract
With the exponential growth in the determination of protein sequences and structures via genome sequencing and structural genomics efforts, there is a growing need for reliable computational methods to determine the biochemical function of these proteins. This paper reviews the efforts to address the challenge of annotating the function at the molecular level of uncharacterized proteins. While sequence- and three-dimensional-structure-based methods for protein function prediction have been reviewed previously, the recent trends in local structure-based methods have received less attention. These local structure-based methods are the primary focus of this review. Computational methods have been developed to predict the residues important for catalysis and the local spatial arrangements of these residues can be used to identify protein function. In addition, the combination of different types of methods can help obtain more information and better predictions of function for proteins of unknown function. Global initiatives, including the Enzyme Function Initiative (EFI), COMputational BRidges to EXperiments (COMBREX), and the Critical Assessment of Function Annotation (CAFA), are evaluating and testing the different approaches to predicting the function of proteins of unknown function. These initiatives and global collaborations will increase the capability and reliability of methods to predict biochemical function computationally and will add substantial value to the current volume of structural genomics data by reducing the number of absent or inaccurate functional annotations.
Affiliation(s)
- Caitlyn L Mills
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, United States
- Penny J Beuning
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, United States
- Mary Jo Ondrechen
- Department of Chemistry and Chemical Biology, Northeastern University, Boston, MA 02115, United States
|
41
|
Lin D, Zhang J, Li J, He H, Deng HW, Wang YP. Integrative analysis of multiple diverse omics datasets by sparse group multitask regression. Front Cell Dev Biol 2014; 2:62. [PMID: 25364766 PMCID: PMC4209817 DOI: 10.3389/fcell.2014.00062] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.5] [Received: 06/21/2014] [Accepted: 10/01/2014] [Indexed: 01/10/2023] Open
Abstract
A variety of high-throughput genome-wide assays enable the exploration of genetic risk factors underlying complex traits. Although these studies have a remarkable impact on identifying susceptible biomarkers, they suffer from issues such as limited sample size and low reproducibility. Combining individual studies of different genetic levels/platforms has the promise to improve the power and consistency of biomarker identification. In this paper, we propose a novel integrative method, namely sparse group multitask regression, for integrating diverse omics datasets, platforms, and populations to identify risk genes/factors of complex diseases. This method combines multitask learning with sparse group regularization, which will: (1) treat the biomarker identification in each single study as a task and then combine them by multitask learning; (2) group variables from all studies for identifying significant genes; (3) enforce a sparse constraint on groups of variables to overcome the "small sample, but large variables" problem. We introduce two sparse group penalties, sparse group lasso and sparse group ridge, in our multitask model, and provide an effective algorithm for each model. In addition, we propose a significance test for the identification of potential risk genes. Two simulation studies are performed to evaluate the performance of our integrative method by comparing it with a conventional meta-analysis method. The results show that our sparse group multitask method significantly outperforms the meta-analysis method. In an application to our osteoporosis studies, 7 genes are identified as significant genes by our method and are found to have significant effects in three other independent studies for validation. The most significant gene, SOD2, has been identified in our previous osteoporosis study involving the same expression dataset. Several other genes such as TREML2, HTR1E, and GLO1 are shown to be novel susceptible genes for osteoporosis, as confirmed by other studies.
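As a rough illustration of the sparse group lasso idea named in this abstract, the sketch below evaluates a generic penalty that adds a group-wise L2 term to an element-wise L1 term. This is the textbook form, not necessarily the exact multitask formulation of the paper; the function name, the weights `lam_group` and `lam_l1`, and the toy coefficients are all assumptions made for illustration.

```python
from math import sqrt


def sparse_group_lasso_penalty(coeffs_by_group, lam_group, lam_l1):
    """Generic sparse group lasso penalty (illustrative, unweighted groups).

    coeffs_by_group: list of lists, one inner list of coefficients per group
                     (e.g. one group per gene, pooled across studies).
    lam_group:       weight of the group-wise L2 term (drives whole groups to zero).
    lam_l1:          weight of the element-wise L1 term (sparsity within groups).
    """
    # Sum of Euclidean norms, one norm per group.
    group_term = sum(sqrt(sum(b * b for b in group)) for group in coeffs_by_group)
    # Sum of absolute values over every coefficient.
    l1_term = sum(abs(b) for group in coeffs_by_group for b in group)
    return lam_group * group_term + lam_l1 * l1_term


# Hypothetical coefficients: one active group and one group shrunk to zero.
toy_coeffs = [[3.0, 4.0], [0.0, 0.0]]
penalty = sparse_group_lasso_penalty(toy_coeffs, lam_group=1.0, lam_l1=0.5)
```

Some formulations scale each group norm by the square root of the group size; that weighting is omitted here for brevity.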
Affiliation(s)
- Dongdong Lin
- Biomedical Engineering Department, Tulane University, New Orleans, LA, USA; Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- Jigang Zhang
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Jingyao Li
- Biomedical Engineering Department, Tulane University, New Orleans, LA, USA; Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA
- Hao He
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Hong-Wen Deng
- Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
- Yu-Ping Wang
- Biomedical Engineering Department, Tulane University, New Orleans, LA, USA; Center for Bioinformatics and Genomics, Tulane University, New Orleans, LA, USA; Department of Biostatistics and Bioinformatics, Tulane University, New Orleans, LA, USA
|
42
|
Rajasundaram D, Selbig J, Persson S, Klie S. Co-ordination and divergence of cell-specific transcription and translation of genes in arabidopsis root cells. Annals of Botany 2014; 114:1109-23. [PMID: 25149544 PMCID: PMC4195562 DOI: 10.1093/aob/mcu151] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.5] [Indexed: 05/05/2023]
Abstract
BACKGROUND AND AIMS A key challenge in biology is to systematically investigate and integrate the different levels of information available at the global and single-cell level. Recent studies have elucidated spatiotemporal expression patterns of root cell types in Arabidopsis thaliana, and genome-wide quantification of polysome-associated mRNA levels, i.e. the translatome, has also been obtained for corresponding cell types. Translational control has been increasingly recognized as an important regulatory step in protein synthesis. The aim of this study was to investigate coupled transcription and translation by use of publicly available root datasets. METHODS Using cell-type-specific datasets of the root transcriptome and translatome of arabidopsis, a systematic assessment was made of the degree of co-ordination and divergence between these two levels of cellular organization. The computational analysis considered correlation and variation of expression across cell types at both system levels, and also provided insights into the degree of co-regulatory relationships that are preserved between the two processes. KEY RESULTS The overall correlation of expression and translation levels of genes resembles an almost bimodal distribution (mean/median value of 0·08/0·12), with a second, less strongly pronounced 'mode' for negative Pearson's correlation coefficient values. The analysis conducted also confirms that previously identified key transcriptional activators of secondary cell wall development display highly conserved patterns of transcription and translation across the investigated cell types. Moreover, the biological processes that display conserved and divergent patterns based on the cell-type-specific expression and translation levels were identified. CONCLUSIONS In agreement with previous studies in animal cells, a large degree of uncoupling was found between the transcriptome and translatome. However, components and processes were also identified that are under co-ordinated transcriptional and translational control in plant root cells.
Affiliation(s)
- Dhivyaa Rajasundaram
- Institute of Biochemistry and Biology, University of Potsdam, Potsdam-Golm, 14476, Germany; Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany
- Joachim Selbig
- Institute of Biochemistry and Biology, University of Potsdam, Potsdam-Golm, 14476, Germany; Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany
- Staffan Persson
- Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany; ARC Centre of Excellence in Plant Cell Walls, School of Botany, University of Melbourne, Parkville, VIC 3010, Australia
- Sebastian Klie
- Max-Planck Institute of Molecular Plant Physiology, Potsdam-Golm, 14476, Germany; Targenomix GmbH, Potsdam-Golm, 14476, Germany
|
43
|
Zhu F, Shi L, Li H, Eksi R, Engel JD, Guan Y. Modeling dynamic functional relationship networks and application to ex vivo human erythroid differentiation. Bioinformatics 2014; 30:3325-33. [PMID: 25115705 DOI: 10.1093/bioinformatics/btu542] [Citation(s) in RCA: 9] [Impact Index Per Article: 0.8] [Indexed: 01/05/2023]
Abstract
MOTIVATION Functional relationship networks, which summarize the probability of co-functionality between any two genes in the genome, could complement the reductionist focus of modern biology for understanding diverse biological processes in an organism. One major limitation of the current networks is that they are static, while one might expect functional relationships to consistently reprogram during the differentiation of a cell lineage. To address this potential limitation, we developed a novel algorithm that leverages both differentiation stage-specific expression data and large-scale heterogeneous functional genomic data to model such dynamic changes. We then applied this algorithm to the time-course RNA-Seq data we collected for ex vivo human erythroid cell differentiation. RESULTS Through computational cross-validation and literature validation, we show that the resulting networks correctly predict the (de)-activated functional connections between genes during erythropoiesis. We identified known critical genes, such as HBD and GATA1, and functional connections during erythropoiesis using these dynamic networks, while the traditional static network was not able to provide such information. Furthermore, by comparing the static and the dynamic networks, we identified novel genes (such as OSBP2 and PDZK1IP1) that are potential drivers of erythroid cell differentiation. This novel method of modeling dynamic networks is applicable to other differentiation processes where time-course genome-scale expression data are available, and should assist in generating greater understanding of the functional dynamics at play across the genome during development. AVAILABILITY AND IMPLEMENTATION The network described in this article is available at http://guanlab.ccmb.med.umich.edu/stageSpecificNetwork.
Affiliation(s)
- Fan Zhu
- Department of Computational Medicine and Bioinformatics, Department of Cell and Developmental Biology, Department of Internal Medicine and Department of Computer Science and Engineering, University of Michigan, MI 48109, USA
- Lihong Shi
- Department of Computational Medicine and Bioinformatics, Department of Cell and Developmental Biology, Department of Internal Medicine and Department of Computer Science and Engineering, University of Michigan, MI 48109, USA
- Hongdong Li
- Department of Computational Medicine and Bioinformatics, Department of Cell and Developmental Biology, Department of Internal Medicine and Department of Computer Science and Engineering, University of Michigan, MI 48109, USA
- Ridvan Eksi
- Department of Computational Medicine and Bioinformatics, Department of Cell and Developmental Biology, Department of Internal Medicine and Department of Computer Science and Engineering, University of Michigan, MI 48109, USA
- James Douglas Engel
- Department of Computational Medicine and Bioinformatics, Department of Cell and Developmental Biology, Department of Internal Medicine and Department of Computer Science and Engineering, University of Michigan, MI 48109, USA
- Yuanfang Guan
- Department of Computational Medicine and Bioinformatics, Department of Cell and Developmental Biology, Department of Internal Medicine and Department of Computer Science and Engineering, University of Michigan, MI 48109, USA
|
44
|
Discovering functional modules across diverse maize transcriptomes using COB, the Co-expression Browser. PLoS One 2014; 9:e99193. [PMID: 24922320 PMCID: PMC4055606 DOI: 10.1371/journal.pone.0099193] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.1] [Received: 02/10/2014] [Accepted: 05/12/2014] [Indexed: 01/13/2023] Open
Abstract
Tools that provide improved ability to relate genotype to phenotype have the potential to accelerate breeding for desired traits and to improve our understanding of the molecular variants that underlie phenotypes. The availability of large-scale gene expression profiles in maize provides an opportunity to advance our understanding of complex traits in this agronomically important species. We built co-expression networks based on genome-wide expression data from a variety of maize accessions as well as an atlas of different tissues and developmental stages. We demonstrate that these networks reveal clusters of genes that are enriched for known biological function and contain extensive structure which has yet to be characterized. Furthermore, we found that co-expression networks derived from developmental or tissue atlases as compared to expression variation across diverse accessions capture unique functions. To provide convenient access to these networks, we developed a public, web-based Co-expression Browser (COB), which enables interactive queries of the genome-wide networks. We illustrate the utility of this system through two specific use cases: one in which gene-centric queries are used to provide functional context for previously characterized metabolic pathways, and a second where lists of genes produced by mapping studies are further resolved and validated using co-expression networks.
|
45
|
Tsiliki G, Vlachakis D, Kossida S. On integrating multi-experiment microarray data. Philosophical Transactions. Series A, Mathematical, Physical, and Engineering Sciences 2014; 372:20130136. [PMID: 24751870 PMCID: PMC3996576 DOI: 10.1098/rsta.2013.0136] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/03/2023]
Abstract
With the extensive use of microarray technology as a potential prognostic and diagnostic tool, the comparison and reproducibility of results obtained from the use of different platforms is of interest. The integration of those datasets can yield more informative results corresponding to numerous datasets and microarray platforms. We developed a novel integration technique for microarray gene-expression data derived by different studies for the purpose of a two-way Bayesian partition modelling which estimates co-expression profiles under subsets of genes and between biological samples or experimental conditions. The suggested methodology transforms disparate gene-expression data on a common probability scale to obtain inter-study-validated gene signatures. We evaluated the performance of our model using artificial data. Finally, we applied our model to six publicly available cancer gene-expression datasets and compared our results with well-known integrative microarray data methods. Our study shows that the suggested framework can relieve the limited sample size problem while reporting high accuracies by integrating multi-experiment data.
Affiliation(s)
- Sophia Kossida
- Bioinformatics and Medical Informatics Group, Biomedical Research Foundation, Academy of Athens, 4 Soranou Ephessiou 115 27, Greece
|
46
|
Computational prediction of protein function based on weighted mapping of domains and GO terms. BioMed Research International 2014; 2014:641469. [PMID: 24868539 PMCID: PMC4017789 DOI: 10.1155/2014/641469] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.3] [Received: 12/21/2013] [Accepted: 03/12/2014] [Indexed: 11/17/2022]
Abstract
In this paper, we propose a novel method, SeekFun, to predict protein function based on weighted mapping of domains and GO terms. Firstly, a weighted mapping of domains and GO terms is constructed according to GO annotations and domain composition of the proteins. The association strength between domain and GO term is weighted by symmetrical conditional probability. Secondly, the mapping is extended along the true paths of the terms based on GO hierarchy. Finally, the terms associated with resident domains are transferred to host protein and real annotations of the host protein are determined by association strengths. Our careful comparisons demonstrate that SeekFun outperforms the concerned methods on most occasions. SeekFun provides a flexible and effective way for protein function prediction. It benefits from the well-constructed mapping of domains and GO terms, as well as the reasonable strategy for inferring annotations of protein from those of its domains.
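The weighting step described in this abstract can be sketched with the usual definition of symmetrical conditional probability, SCP(d, t) = P(d, t)^2 / (P(d) P(t)), estimated from co-occurrence counts over annotated proteins. This is a minimal sketch under that assumption, not SeekFun's implementation; the function name, the input layout, and the toy domain and term labels are hypothetical.

```python
from collections import Counter
from itertools import product


def scp_weights(proteins):
    """Weight every co-occurring (domain, GO term) pair by SCP.

    proteins: dict mapping a protein id to a (domains, terms) pair of sets.
    Returns a dict {(domain, term): weight} with weights in (0, 1].
    """
    domain_count = Counter()
    term_count = Counter()
    pair_count = Counter()
    for domains, terms in proteins.values():
        domain_count.update(domains)
        term_count.update(terms)
        pair_count.update(product(domains, terms))
    # P(d,t)^2 / (P(d) * P(t)); the number of proteins cancels out,
    # so the ratio can be computed directly from raw counts.
    return {
        (d, t): (c * c) / (domain_count[d] * term_count[t])
        for (d, t), c in pair_count.items()
    }


# Hypothetical annotations: three proteins, two domains, two terms.
toy_proteins = {
    "prot1": ({"D1"}, {"term_a"}),
    "prot2": ({"D1", "D2"}, {"term_a", "term_b"}),
    "prot3": ({"D2"}, {"term_b"}),
}
weights = scp_weights(toy_proteins)
```

In this toy set, a domain and term that always appear together receive the maximum weight, while pairs that co-occur only in the multi-domain protein receive a lower one. The extension along GO true paths and the transfer to host proteins are not covered here.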
|
47
|
Fuxman Bass JI, Diallo A, Nelson J, Soto JM, Myers CL, Walhout AJM. Using networks to measure similarity between genes: association index selection. Nat Methods 2013; 10:1169-76. [PMID: 24296474 PMCID: PMC3959882 DOI: 10.1038/nmeth.2728] [Citation(s) in RCA: 164] [Impact Index Per Article: 13.7] [Received: 01/14/2013] [Accepted: 07/22/2013] [Indexed: 02/08/2023]
Abstract
Biological networks can be used to functionally annotate genes on the basis of interaction-profile similarities. Metrics known as association indices can be used to quantify interaction-profile similarity. We provide an overview of commonly used association indices, including the Jaccard index and the Pearson correlation coefficient, and compare their performance in different types of analyses of biological networks. We introduce the Guide for Association Index for Networks (GAIN), a web tool for calculating and comparing interaction-profile similarities and defining modules of genes with similar profiles.
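The two association indices named in this abstract can be illustrated with a small, self-contained sketch. It is not code from GAIN; the function names, gene labels, and partner sets are hypothetical, and the Pearson coefficient is computed on binary interaction profiles for simplicity.

```python
from math import sqrt


def jaccard_index(partners_a, partners_b):
    """Shared interaction partners divided by all partners of either gene."""
    a, b = set(partners_a), set(partners_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length numeric profiles."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # correlation is undefined for a constant profile
    return cov / sqrt(var_x * var_y)


# Hypothetical interaction partners for two genes.
partners = {"geneA": {"p1", "p2", "p3"}, "geneB": {"p2", "p3", "p4"}}
all_partners = sorted(set().union(*partners.values()))
# Binary profile: 1 if the gene interacts with the partner, else 0.
profile_a = [1 if p in partners["geneA"] else 0 for p in all_partners]
profile_b = [1 if p in partners["geneB"] else 0 for p in all_partners]

jaccard = jaccard_index(partners["geneA"], partners["geneB"])
pearson = pearson_correlation(profile_a, profile_b)
```

The Jaccard index ignores partners that neither gene has, whereas the Pearson coefficient on binary profiles also counts shared absences, so the two can rank the same gene pairs differently.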
Affiliation(s)
- Juan I Fuxman Bass
- [1] Program in Systems Biology, University of Massachusetts Medical School, Worcester, Massachusetts, USA. [2] Program in Molecular Medicine, University of Massachusetts Medical School, Worcester, Massachusetts, USA
|
48
|
Sekhon RS, Briskine R, Hirsch CN, Myers CL, Springer NM, Buell CR, de Leon N, Kaeppler SM. Maize gene atlas developed by RNA sequencing and comparative evaluation of transcriptomes based on RNA sequencing and microarrays. PLoS One 2013; 8:e61005. [PMID: 23637782 PMCID: PMC3634062 DOI: 10.1371/journal.pone.0061005] [Citation(s) in RCA: 133] [Impact Index Per Article: 11.1] [Received: 12/18/2012] [Accepted: 03/05/2013] [Indexed: 01/17/2023] Open
Abstract
Transcriptome analysis is a valuable tool for identification and characterization of genes and pathways underlying plant growth and development. We previously published a microarray-based maize gene atlas from the analysis of 60 unique spatially and temporally separated tissues from 11 maize organs [1]. To enhance the coverage and resolution of the maize gene atlas, we have analyzed 18 selected tissues representing five organs using RNA sequencing (RNA-Seq). For a direct comparison of the two methodologies, the same RNA samples originally used for our microarray-based atlas were evaluated using RNA-Seq. Both technologies produced similar transcriptome profiles as evident from high Pearson's correlation statistics ranging from 0.70 to 0.83, and from nearly identical clustering of the tissues. RNA-Seq provided enhanced coverage of the transcriptome, with 82.1% of the filtered maize genes detected as expressed in at least one tissue by RNA-Seq compared to only 56.5% detected by microarrays. Further, from the set of 465 maize genes that have been historically well characterized by mutant analysis, 427 show significant expression in at least one tissue by RNA-Seq compared to 390 by microarray analysis. RNA-Seq provided higher resolution for identifying tissue-specific expression as well as for distinguishing the expression profiles of closely related paralogs as compared to microarray-derived profiles. Co-expression analysis derived from the microarray and RNA-Seq data revealed that broadly similar networks result from both platforms, and that co-expression estimates are stable even when constructed from mixed data including both RNA-Seq and microarray expression data. The RNA-Seq information provides a useful complement to the microarray-based maize gene atlas and helps to further understand the dynamics of transcription during maize development.
Affiliation(s)
- Rajandeep S. Sekhon
- Department of Agronomy, University of Wisconsin, Madison, Wisconsin, United States of America
- Department of Energy Great Lakes Bioenergy Research Center, University of Wisconsin, Madison, Wisconsin, United States of America
- Roman Briskine
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, United States of America
- Candice N. Hirsch
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
- Department of Energy Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, Michigan, United States of America
- Chad L. Myers
- Department of Computer Science and Engineering, University of Minnesota, Minneapolis, Minnesota, United States of America
- Nathan M. Springer
- Microbial and Plant Genomics Institute, Department of Plant Biology, University of Minnesota, Saint Paul, Minnesota, United States of America
- C. Robin Buell
- Department of Plant Biology, Michigan State University, East Lansing, Michigan, United States of America
- Department of Energy Great Lakes Bioenergy Research Center, Michigan State University, East Lansing, Michigan, United States of America
- Natalia de Leon
- Department of Agronomy, University of Wisconsin, Madison, Wisconsin, United States of America
- Department of Energy Great Lakes Bioenergy Research Center, University of Wisconsin, Madison, Wisconsin, United States of America
- Shawn M. Kaeppler
- Department of Agronomy, University of Wisconsin, Madison, Wisconsin, United States of America
- Department of Energy Great Lakes Bioenergy Research Center, University of Wisconsin, Madison, Wisconsin, United States of America
|
49
|
Dowell KG, Simons AK, Wang ZZ, Yun K, Hibbs MA. Cell-type-specific predictive network yields novel insights into mouse embryonic stem cell self-renewal and cell fate. PLoS One 2013; 8:e56810. [PMID: 23468881 PMCID: PMC3585227 DOI: 10.1371/journal.pone.0056810] [Citation(s) in RCA: 10] [Impact Index Per Article: 0.8] [Received: 09/21/2012] [Accepted: 01/14/2013] [Indexed: 01/25/2023] Open
Abstract
Self-renewal, the ability of a stem cell to divide repeatedly while maintaining an undifferentiated state, is a defining characteristic of all stem cells. Here, we clarify the molecular foundations of mouse embryonic stem cell (mESC) self-renewal by applying a proven Bayesian network machine learning approach to integrate high-throughput data for protein function discovery. By focusing on a single stem-cell system, at a specific developmental stage, within the context of well-defined biological processes known to be active in that cell type, we produce a consensus predictive network that reflects biological reality more closely than those made by prior efforts using more generalized, context-independent methods. In addition, we show how machine learning efforts may be misled if the tissue specific role of mammalian proteins is not defined in the training set and circumscribed in the evidential data. For this study, we assembled an extensive compendium of mESC data: ∼2.2 million data points, collected from 60 different studies, under 992 conditions. We then integrated these data into a consensus mESC functional relationship network focused on biological processes associated with embryonic stem cell self-renewal and cell fate determination. Computational evaluations, literature validation, and analyses of predicted functional linkages show that our results are highly accurate and biologically relevant. Our mESC network predicts many novel players involved in self-renewal and serves as the foundation for future pluripotent stem cell studies. This network can be used by stem cell researchers (at http://StemSight.org) to explore hypotheses about gene function in the context of self-renewal and to prioritize genes of interest for experimental validation.
Affiliation(s)
- Karen G. Dowell
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Allen K. Simons
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Zack Z. Wang
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Johns Hopkins University, Baltimore, Maryland, United States of America
- Kyuson Yun
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Matthew A. Hibbs
- The Jackson Laboratory, Bar Harbor, Maine, United States of America
- Graduate School of Biomedical Sciences and Engineering, University of Maine, Orono, Maine, United States of America
- Trinity University, Department of Computer Science, San Antonio, Texas, United States of America
|
50
|
Radivojac P, Clark WT, Oron TR, Schnoes AM, Wittkop T, Sokolov A, Graim K, Funk C, Verspoor K, Ben-Hur A, Pandey G, Yunes JM, Talwalkar AS, Repo S, Souza ML, Piovesan D, Casadio R, Wang Z, Cheng J, Fang H, Gough J, Koskinen P, Törönen P, Nokso-Koivisto J, Holm L, Cozzetto D, Buchan DWA, Bryson K, Jones DT, Limaye B, Inamdar H, Datta A, Manjari SK, Joshi R, Chitale M, Kihara D, Lisewski AM, Erdin S, Venner E, Lichtarge O, Rentzsch R, Yang H, Romero AE, Bhat P, Paccanaro A, Hamp T, Kaßner R, Seemayer S, Vicedo E, Schaefer C, Achten D, Auer F, Boehm A, Braun T, Hecht M, Heron M, Hönigschmid P, Hopf TA, Kaufmann S, Kiening M, Krompass D, Landerer C, Mahlich Y, Roos M, Björne J, Salakoski T, Wong A, Shatkay H, Gatzmann F, Sommer I, Wass MN, Sternberg MJE, Škunca N, Supek F, Bošnjak M, Panov P, Džeroski S, Šmuc T, Kourmpetis YAI, van Dijk ADJ, ter Braak CJF, Zhou Y, Gong Q, Dong X, Tian W, Falda M, Fontana P, Lavezzo E, Di Camillo B, Toppo S, Lan L, Djuric N, Guo Y, Vucetic S, Bairoch A, Linial M, Babbitt PC, Brenner SE, Orengo C, Rost B, Mooney SD, Friedberg I. A large-scale evaluation of computational protein function prediction. Nat Methods 2013; 10:221-7. [PMID: 23353650 PMCID: PMC3584181 DOI: 10.1038/nmeth.2340] [Citation(s) in RCA: 621] [Impact Index Per Article: 51.8] [Received: 04/02/2012] [Accepted: 12/10/2012] [Indexed: 01/03/2023]
Abstract
Automated annotation of protein function is challenging. As the number of sequenced genomes rapidly grows, the overwhelming majority of protein products can only be annotated computationally. If computational predictions are to be relied upon, it is crucial that the accuracy of these methods be high. Here we report the results from the first large-scale community-based critical assessment of protein function annotation (CAFA) experiment. Fifty-four methods representing the state of the art for protein function prediction were evaluated on a target set of 866 proteins from 11 organisms. Two findings stand out: (i) today's best protein function prediction algorithms substantially outperform widely used first-generation methods, with large gains on all types of targets; and (ii) although the top methods perform well enough to guide experiments, there is considerable need for improvement of currently available tools.
Affiliation(s)
- Predrag Radivojac
- School of Informatics and Computing, Indiana University, Bloomington, Indiana, USA
|