1
|
Sarafinovska S, Koester SK, Fang LZ, Thorpe JW, Chaturvedi SM, Ji J, Jones EF, Selmanovic D, Kornbluth DJ, Barrett MR, Rurak GM, Maloney SE, Creed MC, Mitra RD, Dougherty JD. Single-Cell Resolution of Individual Variation in Hypothalamic Neurons Allows Targeted Manipulation Affecting Social Motivation. BIORXIV : THE PREPRINT SERVER FOR BIOLOGY 2025:2025.03.10.642464. [PMID: 40161710 PMCID: PMC11952468 DOI: 10.1101/2025.03.10.642464] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/02/2025]
Abstract
Despite decades of research, connecting molecular and cellular phenotypes to complex behavioral traits remains an elusive goal [1]. Social motivation exhibits individual trait variation [2], which we hypothesize is mediated by molecular and cellular variability across hypothalamic neurons. To test this, we generated single-nucleus RNA-sequencing profiles [3,4] of >120,000 neurons from tuberal hypothalamus and adjacent thalamus in 36 mice, balanced across sex and autism-associated mutation [5], with all mice assessed for social motivation [2]. First, we show that molecular activation patterns predict behavior across individuals: specifically, activation of paraventricular Agtr1a+ (angiotensin receptor 1a) neurons predicted reduced social behavior. Subsequent inhibition of AGTR1A with telmisartan, an FDA-approved antihypertensive [6], improved social orienting. Second, we show that natural variation in neuronal proportions, likely arising from stochastic developmental events [7], is sufficient to shape adult behavior even among genetically identical individuals: we identified multiple neuronal populations whose relative abundance predicted social reward-seeking behavior. Chemogenetic inhibition of one such population, Nxph4+ neurons of the postero-lateral hypothalamus [8], suppressed multiple aspects of social motivation. This work establishes proof-of-principle for an approach where single-cell genomics precisely maps neural substrates governing behavior. This approach revealed that stochastic variations in neuronal architecture deterministically influence social motivation, and enabled identification of therapeutically actionable targets with immediate translational potential for disorders with social deficits.
Affiliation(s)
- S Sarafinovska
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
| | - S K Koester
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
| | - L Z Fang
- Washington University Pain Center, Department of Anesthesiology, St. Louis, MO, USA
| | - J W Thorpe
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
| | - S M Chaturvedi
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
| | - J Ji
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
| | - E F Jones
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
| | - D Selmanovic
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
| | - D J Kornbluth
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
| | - M R Barrett
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
- Intellectual and Developmental Disabilities Research Center, Washington University School of Medicine, St. Louis, MO, 63110-1093, USA
| | - G M Rurak
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
| | - S E Maloney
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
- Intellectual and Developmental Disabilities Research Center, Washington University School of Medicine, St. Louis, MO, 63110-1093, USA
| | - M C Creed
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
- Washington University Pain Center, Department of Anesthesiology, St. Louis, MO, USA
| | - R D Mitra
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Edison Family Center for Genome Sciences and Systems Biology, Washington University School of Medicine, Saint Louis, MO, USA
| | - J D Dougherty
- Department of Genetics, Washington University School of Medicine, Saint Louis, MO, USA
- Department of Psychiatry, Washington University School of Medicine, Saint Louis, MO, USA
- Intellectual and Developmental Disabilities Research Center, Washington University School of Medicine, St. Louis, MO, 63110-1093, USA
|
2
|
Di Persia L, Lopez T, Arce A, Milone DH, Stegmayer G. exp2GO: Improving Prediction of Functions in the Gene Ontology With Expression Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:999-1008. [PMID: 35417352 DOI: 10.1109/tcbb.2022.3167245] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Computational methods for predicting gene function annotations aim to automatically find associations between a gene and a set of Gene Ontology (GO) terms describing its functions. Since manual curation of novel annotations and the corresponding wet-lab experimental validations are very time-consuming and costly procedures, there is a need for computational tools that can reliably predict likely annotations and boost the discovery of new gene functions. This work proposes a novel method for predicting annotations based on the inference of GO similarities from expression similarities. The novel method was benchmarked against other methods on several public biological datasets, obtaining the best comparative results. exp2GO effectively improved the prediction of GO annotations in comparison to state-of-the-art methods. Furthermore, the proposal was validated with a full genome case where it was capable of predicting relevant and accurate biological functions. The repository of this project with full data and code is available at https://github.com/sinc-lab/exp2GO.
|
3
|
Zhang Y, Duan L, Zheng H, Li-Ling J, Qin R, Chen Z, He C, Wang T. Mining Similar Aspects for Gene Similarity Explanation Based on Gene Information Network. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:1734-1746. [PMID: 33259307 DOI: 10.1109/tcbb.2020.3041559] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/12/2023]
Abstract
Analysis of gene similarity not only can provide information on the understanding of the biological roles and functions of a gene, but may also reveal the relationships among various genes. In this paper, we introduce a novel idea of mining similar aspects from a gene information network, i.e., for a given gene pair, we want to know in which aspects (meta paths) they are most similar from the perspective of the gene information network. We defined a similarity metric based on the set of meta paths connecting the query genes in the gene information network and used the rank of similarity of a gene pair in a meta path set to measure the similarity significance in that aspect. A minimal set of gene meta paths where the query gene pair ranks the highest is a similar aspect, and the similar aspect of a query gene pair is far from trivial. We proposed a novel method, SCENARIO, to investigate minimal similar aspects. Our empirical study on the gene information network, constructed from six public gene-related databases, verified that our proposed method is effective, efficient, and useful.
|
4
|
Zhao Y, Wang J, Guo M, Zhang X, Yu G. Cross-Species Protein Function Prediction with Asynchronous-Random Walk. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:1439-1450. [PMID: 31562099 DOI: 10.1109/tcbb.2019.2943342] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/10/2023]
Abstract
Protein function prediction is a fundamental task in the post-genomic era. Available functional annotations of proteins are incomplete, and the annotations of two homologous species are complementary to each other. However, how to effectively leverage mutually complementary annotations of different species to further boost prediction performance is still not well studied. In this paper, we propose a cross-species protein function prediction approach that performs an Asynchronous Random Walk on a heterogeneous network (AsyRW). AsyRW first constructs a heterogeneous network to integrate multiple functional association networks derived from different biological data, established homology relationships between proteins from different species, known annotations of proteins, and Gene Ontology (GO). To account for the intrinsic intra- and inter-species structures of proteins and that of GO, AsyRW quantifies the individual walk length of each network node using the gravity-like theory, and then performs an asynchronous random walk with the individual lengths to predict associations between proteins and GO terms. Experiments on annotations archived in different years show that individual walk lengths and the asynchronous random walk can effectively leverage the complementary annotations of different species, and that AsyRW performs significantly better than other related and competitive methods. The code of AsyRW is available at: http://mlda.swu.edu.cn/codes.php?name=AsyRW.
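The idea of giving each node its own walk length can be illustrated with a small sketch. This is not the AsyRW implementation: the function name `per_node_walk`, the toy graph, and the walk lengths are all hypothetical, and the lengths are simply supplied as input rather than derived from the gravity-like theory the abstract mentions.

```python
import numpy as np

def per_node_walk(adj, lengths):
    """Return a matrix whose row i is the distribution of a random walker
    that starts at node i and takes lengths[i] steps."""
    adj = np.asarray(adj, dtype=float)
    # Row-normalize the adjacency matrix into transition probabilities.
    # Assumption: every node has at least one edge (no zero rows).
    P = adj / adj.sum(axis=1, keepdims=True)
    n = adj.shape[0]
    out = np.zeros((n, n))
    for i, steps in enumerate(lengths):
        p = np.zeros(n)
        p[i] = 1.0                # walker starts at node i
        for _ in range(steps):    # node-specific number of steps
            p = p @ P
        out[i] = p
    return out

# Toy path graph 0 - 1 - 2 with hypothetical per-node walk lengths.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
scores = per_node_walk(adj, lengths=[1, 2, 0])
```

In a protein-GO setting, the nodes would cover both proteins and GO terms, and the entries of a protein's row that fall on GO-term nodes could be read as association scores; that mapping is left out of this sketch.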
|
5
|
Zhao Y, Wang J, Chen J, Zhang X, Guo M, Yu G. A Literature Review of Gene Function Prediction by Modeling Gene Ontology. Front Genet 2020; 11:400. [PMID: 32391061 PMCID: PMC7193026 DOI: 10.3389/fgene.2020.00400] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 01/02/2020] [Accepted: 03/30/2020] [Indexed: 12/14/2022] Open
Abstract
Annotating the functional properties of gene products, i.e., RNAs and proteins, is a fundamental task in biology. The Gene Ontology database (GO) was developed to systematically describe the functional properties of gene products across species, and to facilitate the computational prediction of gene function. As GO is routinely updated, it serves as the gold standard and main knowledge source in functional genomics. Many gene function prediction methods making use of GO have been proposed. But no literature review has summarized these methods and the possibilities for future efforts from the perspective of GO. To bridge this gap, we review the existing methods with an emphasis on recent solutions. First, we introduce the conventions of GO and the widely adopted evaluation metrics for gene function prediction. Next, we summarize current methods of gene function prediction that apply GO in different ways, such as using hierarchical or flat inter-relationships between GO terms, compressing massive GO terms and quantifying semantic similarities. Although many efforts have improved performance by harnessing GO, we conclude that there remain many largely overlooked but important topics for future research.
Affiliation(s)
- Yingwen Zhao
- College of Computer and Information Science, Southwest University, Chongqing, China
| | - Jun Wang
- College of Computer and Information Science, Southwest University, Chongqing, China
| | - Jian Chen
- State Key Laboratory of Agrobiotechnology and National Maize Improvement Center, China Agricultural University, Beijing, China
| | - Xiangliang Zhang
- CBRC, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
| | - Guoxian Yu
- College of Computer and Information Science, Southwest University, Chongqing, China
- CBRC, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
|
6
|
Yu G, Wang K, Fu G, Guo M, Wang J. NMFGO: Gene Function Prediction via Nonnegative Matrix Factorization with Gene Ontology. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:238-249. [PMID: 30059316 DOI: 10.1109/tcbb.2018.2861379] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Indexed: 06/08/2023]
Abstract
Gene Ontology (GO) is a controlled vocabulary of terms that describe the molecular functions, biological roles, and cellular locations of gene products (i.e., proteins and RNAs); it hierarchically organizes more than 43,000 GO terms via a directed acyclic graph. A gene is generally annotated with several of these GO terms. Therefore, accurately predicting the associations between genes and this massive set of terms is a difficult challenge. To address this challenge, we propose a matrix factorization based approach called NMFGO. NMFGO stores the available GO annotations of genes in a gene-term association matrix and adopts an ontological structure based taxonomic similarity measure to capture the GO hierarchy. Next, it factorizes the association matrix into two low-rank matrices via nonnegative matrix factorization regularized with the GO hierarchy. After that, it employs a semantic similarity based k nearest neighbor classifier in the subspace approximated by the low-rank matrices to predict gene functions. An empirical study on three model species (S. cerevisiae, H. sapiens, and A. thaliana) shows that NMFGO is robust to the input parameters and achieves significantly better prediction performance than GIC, TO, dRW-kNN, and NtN, which were re-implemented based on the instructions of the original papers. The supplementary file and demo codes of NMFGO are available at http://mlda.swu.edu.cn/codes.php?name=NMFGO.
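The factorization step can be sketched as follows. This is a minimal illustration, not NMFGO itself: it uses plain Lee-Seung multiplicative updates for the Frobenius objective, omits the GO-hierarchy regularizer and the kNN classifier described above, and the function name `nmf` and the toy gene-term matrix are hypothetical.

```python
import numpy as np

def nmf(A, rank, n_iter=200, eps=1e-9, seed=0):
    """Factorize a nonnegative matrix A (genes x terms) as W @ H using
    multiplicative updates that keep W and H nonnegative."""
    rng = np.random.default_rng(seed)
    n_genes, n_terms = A.shape
    W = rng.random((n_genes, rank)) + eps
    H = rng.random((rank, n_terms)) + eps
    for _ in range(n_iter):
        # eps in the denominators guards against division by zero.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Toy gene-term association matrix: rows are genes, columns are GO terms,
# and 1 marks a known annotation (made-up data for illustration only).
A = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
], dtype=float)

W, H = nmf(A, rank=2)
scores = W @ H  # low-rank reconstruction, usable as association scores
```

Entries of `scores` that are relatively large where `A` is zero are candidate missing annotations; how they are thresholded or ranked is a separate design choice not covered here.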
|
7
|
Wang D, Li J, Liu R, Wang Y. Optimizing gene set annotations combining GO structure and gene expression data. BMC SYSTEMS BIOLOGY 2018; 12:133. [PMID: 30598093 PMCID: PMC6311910 DOI: 10.1186/s12918-018-0659-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/25/2022]
Abstract
BACKGROUND With the rapid accumulation of genomic data, annotating and interpreting these data has become a challenging problem. As a representative approach, gene set enrichment analysis has been widely used to interpret large molecular datasets generated by biological experiments. The result of gene set enrichment analysis relies heavily on the quality and integrity of gene set annotations. Although several methods have been developed to annotate gene sets, there is still a lack of high-quality annotation methods. Here, we propose a novel method to improve annotation accuracy by combining the GO structure and gene expression data. RESULTS We propose a novel approach for optimizing gene set annotations to obtain more accurate annotation results. The proposed method filters inconsistent annotations using GO structure information and probabilistic gene set clusters calculated over a range of cluster sizes on multiple bootstrap-resampled datasets. The proposed method is employed to analyze p53 cell line, colon cancer, and breast cancer gene expression data. The experimental results show that the proposed method can filter out a number of annotations unrelated to the experimental data, increase gene set enrichment power, and reduce the inconsistency of annotations. CONCLUSIONS A novel gene set annotation optimization approach is proposed to improve the quality of gene annotations. Experimental results indicate that the proposed method effectively improves gene set annotation quality based on the GO structure and gene expression data.
Affiliation(s)
- Dong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, West Da-Zhi Street, Harbin, China
| | - Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, West Da-Zhi Street, Harbin, China
| | - Rui Liu
- School of Computer Science and Technology, Harbin Institute of Technology, West Da-Zhi Street, Harbin, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, West Da-Zhi Street, Harbin, China
|
8
|
Hadarovich A, Anishchenko I, Tuzikov AV, Kundrotas PJ, Vakser IA. Gene ontology improves template selection in comparative protein docking. Proteins 2018; 87:245-253. [PMID: 30520123 DOI: 10.1002/prot.25645] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 05/26/2018] [Revised: 10/21/2018] [Accepted: 11/29/2018] [Indexed: 02/06/2023]
Abstract
Structural characterization of protein-protein interactions is essential for our ability to study life processes at the molecular level. Computational modeling of protein complexes (protein docking) is important as the source of their structure and as a way to understand the principles of protein interaction. Rapidly evolving comparative docking approaches utilize target/template similarity metrics, which are often based on the protein structure. Although structural similarity generally yields good performance, other characteristics of the interacting proteins (eg, function, biological process, and localization) may improve the prediction quality, especially in the case of weak target/template structural similarity. For the ranking of a pool of models for each target, we tested scoring functions that quantify the similarity of Gene Ontology (GO) terms assigned to target and template proteins in three ontology domains: biological process, molecular function, and cellular component (GO-score). The scoring functions were tested in docking of bound, unbound, and modeled proteins. The results indicate that the combined structural and GO-terms functions improve the scoring, especially in the twilight zone of structural similarity, typical for protein models of limited accuracy.
Affiliation(s)
- Anna Hadarovich
- Computational Biology Program, The University of Kansas, Lawrence, Kansas; United Institute of Informatics Problems, National Academy of Sciences, Minsk, Belarus
| | - Ivan Anishchenko
- Computational Biology Program, The University of Kansas, Lawrence, Kansas
| | - Alexander V Tuzikov
- United Institute of Informatics Problems, National Academy of Sciences, Minsk, Belarus
| | - Petras J Kundrotas
- Computational Biology Program, The University of Kansas, Lawrence, Kansas
| | - Ilya A Vakser
- Computational Biology Program, The University of Kansas, Lawrence, Kansas; Department of Molecular Biosciences, The University of Kansas, Lawrence, Kansas
|
9
|
Zhao Y, Fu G, Wang J, Guo M, Yu G. Gene function prediction based on Gene Ontology Hierarchy Preserving Hashing. Genomics 2018; 111:334-342. [PMID: 29477548 DOI: 10.1016/j.ygeno.2018.02.008] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 11/20/2017] [Revised: 02/02/2018] [Accepted: 02/16/2018] [Indexed: 12/27/2022]
Abstract
Gene Ontology (GO) uses structured vocabularies (or terms) to describe the molecular functions, biological roles, and cellular locations of gene products in a hierarchical ontology. GO annotations associate genes with GO terms and indicate that the given gene products carry out the biological functions described by the relevant terms. However, predicting correct GO annotations for genes from the massive set of terms defined by GO is a difficult challenge. To address this challenge, we introduce a Gene Ontology Hierarchy Preserving Hashing (HPHash) based semantic method for gene function prediction. HPHash first measures the taxonomic similarity between GO terms. It then uses a hierarchy preserving hashing technique to keep the hierarchical order between GO terms, and to optimize a series of hashing functions that encode massive GO terms via compact binary codes. After that, HPHash utilizes these hashing functions to project the gene-term association matrix into a low-dimensional one and performs semantic similarity based gene function prediction in the low-dimensional space. Experimental results on three model species (Homo sapiens, Mus musculus and Rattus norvegicus) for interspecies gene function prediction show that HPHash performs better than other related approaches and that it is robust to the number of hash functions. In addition, we also take HPHash as a plugin for BLAST based gene function prediction. In these experiments, HPHash again significantly improves the prediction performance. The codes of HPHash are available at: http://mlda.swu.edu.cn/codes.php?name=HPHash.
Affiliation(s)
- Yingwen Zhao
- College of Computer and Information Science, Southwest University, Chongqing 400715, China
| | - Guangyuan Fu
- College of Computer and Information Science, Southwest University, Chongqing 400715, China
| | - Jun Wang
- College of Computer and Information Science, Southwest University, Chongqing 400715, China
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing 100044, China; Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing 100044, China.
| | - Guoxian Yu
- College of Computer and Information Science, Southwest University, Chongqing 400715, China.
|
10
|
HashGO: hashing gene ontology for protein function prediction. Comput Biol Chem 2017; 71:264-273. [DOI: 10.1016/j.compbiolchem.2017.09.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 09/07/2017] [Accepted: 09/25/2017] [Indexed: 10/18/2022]
|
11
|
Schulze S, Urzica E, Reijnders MJMF, van de Geest H, Warris S, Bakker LV, Fufezan C, Martins Dos Santos VAP, Schaap PJ, Peters SA, Hippler M. Identification of methylated GnTI-dependent N-glycans in Botryococcus brauni. THE NEW PHYTOLOGIST 2017; 215:1361-1369. [PMID: 28737213 DOI: 10.1111/nph.14713] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.5] [Received: 02/07/2017] [Accepted: 06/15/2017] [Indexed: 05/21/2023]
Abstract
In contrast to mammals and vascular plants, microalgae show a high diversity in the N-glycan structures of complex N-glycoproteins. Although homologues for β1,2-N-acetylglucosaminyltransferase I (GnTI), a key enzyme in the formation of complex N-glycans, have been identified in several algal species, GnTI-dependent N-glycans have not been detected so far. We have performed an N-glycoproteomic analysis of the hydrocarbon oil-accumulating green microalga Botryococcus braunii. The analysis of intact N-glycopeptides allowed the determination of N-glycan compositions. Furthermore, insights into the role of N-glycosylation in B. braunii were gained from functional annotation of the identified N-glycoproteins. In total, 517 unique N-glycosylated peptides have been identified, including intact N-glycopeptides that harbored N-acetylhexosamine (HexNAc) at the nonreducing end. Surprisingly, these GnTI-dependent N-glycans were also found to be modified with (di)methylated hexose. The identification of GnTI-dependent N-glycans in combination with N-glycan methylation in B. braunii revealed an uncommon type of N-glycan processing in this microalga.
Affiliation(s)
- Stefan Schulze
- Institute of Plant Biology and Biotechnology, University of Münster, Münster, 48143, Germany
| | - Eugen Urzica
- Institute of Plant Biology and Biotechnology, University of Münster, Münster, 48143, Germany
| | - Maarten J M F Reijnders
- Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, 6703 HB, the Netherlands
| | - Henri van de Geest
- Applied Bioinformatics, Wageningen University and Research Centre, Wageningen, 6708 PB, the Netherlands
| | - Sven Warris
- Applied Bioinformatics, Wageningen University and Research Centre, Wageningen, 6708 PB, the Netherlands
| | - Linda V Bakker
- Applied Bioinformatics, Wageningen University and Research Centre, Wageningen, 6708 PB, the Netherlands
| | - Christian Fufezan
- Institute of Plant Biology and Biotechnology, University of Münster, Münster, 48143, Germany
| | - Vitor A P Martins Dos Santos
- Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, 6703 HB, the Netherlands
- LifeGlimmer GmbH, Berlin, 12163, Germany
| | - Peter J Schaap
- Laboratory of Systems and Synthetic Biology, Wageningen University, Wageningen, 6703 HB, the Netherlands
| | - Sander A Peters
- Applied Bioinformatics, Wageningen University and Research Centre, Wageningen, 6708 PB, the Netherlands
| | - Michael Hippler
- Institute of Plant Biology and Biotechnology, University of Münster, Münster, 48143, Germany
|
12
|
Yu G, Lu C, Wang J. NoGOA: predicting noisy GO annotations using evidences and sparse representation. BMC Bioinformatics 2017; 18:350. [PMID: 28732468 PMCID: PMC5521088 DOI: 10.1186/s12859-017-1764-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/02/2017] [Accepted: 07/14/2017] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Owing to resource limitations, only a small portion of annotations are manually checked by curators, and the others are electronically inferred. Although quality control techniques have been applied to ensure the quality of annotations, the community consistently reports that there are still considerable noisy (or incorrect) annotations. Given the wide application of annotations, how to identify noisy annotations is an important but seldom studied open problem. RESULTS We introduce a novel approach called NoGOA to predict noisy annotations. First, NoGOA applies sparse representation on the gene-term association matrix to reduce the impact of noisy annotations, and takes advantage of sparse representation coefficients to measure the semantic similarity between genes. Second, it preliminarily predicts noisy annotations of a gene based on aggregated votes from the semantic neighborhood genes of that gene. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived in different periods, and then weights entries of the association matrix via the estimated ratios and propagates weights to ancestors of direct annotations using the GO hierarchy. Finally, it integrates the evidence-weighted association matrix and the aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods and that removing noisy annotations improves the performance of gene function prediction. CONCLUSIONS The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations.
Codes and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA.
Affiliation(s)
- Guoxian Yu
- College of Computer and Information Sciences, Southwest University, Chongqing, China.
| | - Chang Lu
- College of Computer and Information Sciences, Southwest University, Chongqing, China
| | - Jun Wang
- College of Computer and Information Sciences, Southwest University, Chongqing, China
|
13
|
Zou X, Wang G, Yu G. Protein Function Prediction Using Deep Restricted Boltzmann Machines. BIOMED RESEARCH INTERNATIONAL 2017; 2017:1729301. [PMID: 28744460 PMCID: PMC5506480 DOI: 10.1155/2017/1729301] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 03/30/2017] [Accepted: 05/30/2017] [Indexed: 11/17/2022]
Abstract
Accurately annotating the biological functions of proteins is one of the key tasks in the postgenome era. Many machine learning based methods have been applied to predict functional annotations of proteins, but this task has rarely been addressed with deep learning techniques. Deep learning techniques have recently been successfully applied to a wide range of problems, such as video, image, and natural language processing. Inspired by these successful applications, we investigate deep restricted Boltzmann machines (DRBM), a representative deep learning technique, to predict the missing functional annotations of partially annotated proteins. Experimental results on Homo sapiens, Saccharomyces cerevisiae, Mus musculus, and Drosophila show that DRBM achieves better performance than other related methods across different evaluation metrics, and it also runs faster than the compared methods.
Affiliation(s)
- Xianchun Zou
- College of Computer and Information Science, Southwest University, Chongqing, China
| | - Guijun Wang
- College of Computer and Information Science, Southwest University, Chongqing, China
| | - Guoxian Yu
- College of Computer and Information Science, Southwest University, Chongqing, China
|
14
|
Jiang B, Kloster K, Gleich DF, Gribskov M. AptRank: an adaptive PageRank model for protein function prediction on bi-relational graphs. Bioinformatics 2017; 33:1829-1836. [PMID: 28200073 DOI: 10.1093/bioinformatics/btx029] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.3] [Received: 05/23/2016] [Accepted: 02/14/2017] [Indexed: 11/15/2022] Open
Affiliation(s)
- Biaobin Jiang
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
| | - Kyle Kloster
- Department of Mathematics, Purdue University, West Lafayette, IN, USA
| | - David F Gleich
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
| | - Michael Gribskov
- Department of Biological Sciences, Purdue University, West Lafayette, IN, USA
- Department of Computer Science, Purdue University, West Lafayette, IN, USA
|
15
Abstract
BACKGROUND Gene Ontology (GO) is a collaborative project that maintains and develops a controlled vocabulary (or terms) to describe the molecular function, biological roles and cellular location of gene products in a hierarchical ontology. GO also provides GO annotations that associate genes with GO terms. Members of the GO consortium independently and collaboratively annotate gene products with terms, mainly for the model organisms (or species) they are interested in. Due to experimental ethics, the research interests of biologists and resource limitations, homologous genes from different species are currently annotated with different terms. These differences can be attributed more to incomplete annotations of genes than to functional differences between them. RESULTS Semantic similarity between genes is derived from the GO hierarchy and the annotations of genes. It is positively correlated with the similarity derived from various types of biological data and has been applied to predict gene function. In this paper, we investigate whether it is possible to replenish annotations of incompletely annotated genes by using semantic similarity between genes from two species with homology. For this investigation, we utilize three representative semantic similarity metrics to compute similarity between genes from two species. Next, we determine the k nearest neighbor genes from the two species based on the chosen metric and then use the terms annotated to the k neighbors of a gene to replenish the annotations of that gene. We perform experiments on archived (from Jan-2014 to Jan-2016) GO annotations of four species (Human, Mouse, Danio rerio and Arabidopsis thaliana) to assess the contribution of semantic similarity between genes from different species.
The experimental results demonstrate that: (1) semantic similarity between genes from homologous species contributes much more to the improved accuracy (by 53.22%) than genes from a single species alone or genes from two species with low homology; (2) GO annotations of genes from homologous species are complementary to each other. CONCLUSIONS Our study shows that semantic similarity based interspecies gene function annotation from homologous species is more effective than traditional intraspecies approaches. This work can promote more research on semantic similarity based function prediction across species.
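The neighbor-based replenishment step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the Jaccard overlap of term sets stands in for the GO semantic similarity metrics the paper uses, and the function names, the `min_votes` threshold, and the toy term IDs (`T1`, `T2`, ...) are all assumptions made for illustration.

```python
from collections import Counter


def jaccard(a, b):
    """Stand-in similarity between two term sets (not a GO semantic metric)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def replenish(target_terms, candidates, k=2, min_votes=2, sim=jaccard):
    """Suggest terms for one gene from its k most similar candidate genes.

    target_terms: set of term IDs currently annotated to the gene.
    candidates:   dict mapping gene name -> set of term IDs (any species).
    Returns the sorted list of terms carried by at least `min_votes` of the
    k nearest neighbors that the target gene does not already have.
    """
    # Rank candidates by decreasing similarity; break ties by gene name.
    ranked = sorted(candidates, key=lambda g: (-sim(target_terms, candidates[g]), g))
    neighbors = ranked[:k]
    # Count how many neighbors carry each term the target gene lacks.
    votes = Counter(t for g in neighbors for t in candidates[g] - target_terms)
    return sorted(t for t, n in votes.items() if n >= min_votes)


# Toy data: a human gene and three hypothetical mouse genes.
human_gene = {"T1", "T2"}
mouse_genes = {
    "mouse_A": {"T1", "T2", "T3"},
    "mouse_B": {"T1", "T3"},
    "mouse_C": {"T9"},
}
suggested = replenish(human_gene, mouse_genes, k=2, min_votes=2)
print(suggested)
```

Here the two nearest neighbors are `mouse_A` and `mouse_B`, both of which carry `T3`, so `T3` is the only suggested term. Passing a different `sim` callable is where a real GO semantic similarity measure would plug in.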
Affiliation(s)
- Guoxian Yu, College of Computer and Information Sciences, Southwest University, Chongqing, China
- Wei Luo, College of Computer and Information Sciences, Southwest University, Chongqing, China
- Guangyuan Fu, College of Computer and Information Sciences, Southwest University, Chongqing, China
- Jun Wang, College of Computer and Information Sciences, Southwest University, Chongqing, China
16
Tian Z, Wang C, Guo M, Liu X, Teng Z. SGFSC: speeding the gene functional similarity calculation based on hash tables. BMC Bioinformatics 2016; 17:445. [PMID: 27814675 PMCID: PMC5096311 DOI: 10.1186/s12859-016-1294-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/04/2016] [Accepted: 10/19/2016] [Indexed: 12/23/2022]
Abstract
Background In recent years, many measures of gene functional similarity have been proposed and widely used in all kinds of essential research. These methods are mainly divided into two categories: pairwise approaches and group-wise approaches. However, a common problem with these methods is their time consumption, especially when measuring the gene functional similarities of a large number of gene pairs. The problem of computational efficiency for pairwise approaches is even more prominent because they are dependent on the combination of semantic similarity. Therefore, the efficient measurement of gene functional similarity remains a challenging problem. Results To speed current gene functional similarity calculation methods, a novel two-step computing strategy is proposed: (1) establish a hash table for each method to store essential information obtained from the Gene Ontology (GO) graph and (2) measure gene functional similarity based on the corresponding hash table. There is no need to traverse the GO graph repeatedly for each method with the help of the hash table. The analysis of time complexity shows that the computational efficiency of these methods is significantly improved. We also implement a novel Speeding Gene Functional Similarity Calculation tool, namely SGFSC, which is bundled with seven typical measures using our proposed strategy. Further experiments show the great advantage of SGFSC in measuring gene functional similarity on the whole genomic scale. Conclusions The proposed strategy is successful in speeding current gene functional similarity calculation methods. SGFSC is an efficient tool that is freely available at http://nclab.hit.edu.cn/SGFSC. The source code of SGFSC can be downloaded from http://pan.baidu.com/s/1dFFmvpZ.
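The two-step strategy in this abstract (precompute a hash table from the GO graph once, then answer similarity queries by lookup) can be sketched as follows. This is an illustration under stated assumptions, not SGFSC itself: the toy graph, the function names, and the ancestor-set Jaccard score are hypothetical and are not one of the seven measures bundled with the tool.

```python
def build_ancestor_table(parents):
    """Step 1: map each term to the set of itself plus all its ancestors.

    parents: dict mapping term -> list of direct parent terms (a DAG).
    The graph is traversed once; results are memoized in a dict (hash table).
    """
    table = {}

    def visit(term):
        if term not in table:
            acc = {term}
            for p in parents.get(term, ()):
                acc |= visit(p)
            table[term] = acc
        return table[term]

    for term in parents:
        visit(term)
    return table


def term_similarity(table, a, b):
    """Step 2: score two terms using only table lookups, no graph traversal.

    Uses the Jaccard overlap of ancestor sets as a stand-in measure.
    """
    sa, sb = table[a], table[b]
    return len(sa & sb) / len(sa | sb)


# Toy ontology: C has two parents (A and B); D has one parent (A).
toy_parents = {
    "root": [],
    "A": ["root"],
    "B": ["root"],
    "C": ["A", "B"],
    "D": ["A"],
}
ancestor_table = build_ancestor_table(toy_parents)
score = term_similarity(ancestor_table, "C", "D")
print(score)
```

Once the table exists, each query costs only two dictionary lookups and a set intersection, which is the source of the speed-up the abstract describes for large numbers of gene pairs.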
Affiliation(s)
- Zhen Tian, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Chunyu Wang, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Maozu Guo, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Xiaoyan Liu, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Zhixia Teng, School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China; Department of Information Management and Information System, Northeast Forestry University, Harbin, 150001, People's Republic of China
17
Fu G, Wang J, Yang B, Yu G. NegGOA: negative GO annotations selection using ontology structure. Bioinformatics 2016; 32:2996-3004. [PMID: 27318205 DOI: 10.1093/bioinformatics/btw366] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 02/25/2016] [Accepted: 06/01/2016] [Indexed: 11/14/2022]
Abstract
MOTIVATION Predicting the biological functions of proteins is one of the key challenges in the post-genomic era. Computational models have demonstrated the utility of applying machine learning methods to predict protein function. Most prediction methods explicitly require a set of negative examples, that is, proteins that are known not to carry out a particular function. However, Gene Ontology (GO) almost always provides only the knowledge that proteins carry out a particular function, and the functional annotations of proteins are incomplete. GO structurally organizes tens of thousands of GO terms, and a protein is annotated with several (or dozens) of these terms. For these reasons, the negative examples of a protein can greatly help in distinguishing the true positive examples of the protein within such a large candidate GO space. RESULTS In this paper, we present a novel approach (called NegGOA) to select negative examples. Specifically, NegGOA takes advantage of the ontology structure, the available annotations and the potentiality of additional annotations of a protein to choose negative examples of the protein. We compare NegGOA with other negative example selection algorithms and find that NegGOA produces far fewer false negatives than they do. We incorporate the selected negative examples into an efficient function prediction model to predict the functions of proteins in Yeast, Human, Mouse and Fly. NegGOA also demonstrates better accuracy than the compared algorithms across various evaluation metrics. In addition, NegGOA suffers less from incomplete annotations of proteins than the compared methods. AVAILABILITY AND IMPLEMENTATION The Matlab and R codes are available at https://sites.google.com/site/guoxian85/neggoa. CONTACT gxyu@swu.edu.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Guangyuan Fu, College of Computer and Information Science, Southwest University, Chongqing 400715, China
- Jun Wang, College of Computer and Information Science, Southwest University, Chongqing 400715, China
- Bo Yang, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Guoxian Yu, College of Computer and Information Science, Southwest University, Chongqing 400715, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
18
Peng L, Liao B, Zhu W, Li Z, Li K. Predicting Drug-Target Interactions With Multi-Information Fusion. IEEE J Biomed Health Inform 2015; 21:561-572. [PMID: 26731781 DOI: 10.1109/jbhi.2015.2513200] [Citation(s) in RCA: 56] [Impact Index Per Article: 5.6] [Indexed: 11/09/2022]
Abstract
Identifying potential associations between drugs and targets is a critical prerequisite for modern drug discovery and repurposing. However, predicting these associations is difficult because of the limitations of existing computational methods. Most models only consider chemical structures and protein sequences, and other models are oversimplified. Moreover, datasets used for analysis contain only true-positive interactions, and experimentally validated negative samples are unavailable. To overcome these limitations, we developed a semi-supervised learning framework called NormMulInf, based on collaborative filtering theory, that uses labeled and unlabeled interaction information. The proposed method initially determines similarity measures, such as similarities among samples and local correlations among the labels of the samples, by integrating biological information. The similarity information is then integrated into a robust principal component analysis model, which is solved using augmented Lagrange multipliers. Experimental results on four classes of drug-target interaction networks suggest that the proposed approach can accurately classify and predict drug-target interactions. Some of the predicted interactions are reported in public databases. The proposed method can also predict possible targets for new drugs and can be used to determine whether atropine may interact with alpha1B- and beta1-adrenergic receptors. Furthermore, the developed technique identifies potential drugs for new targets and can be used to assess whether olanzapine and propiomazine may target 5HT2B. Finally, the proposed method can potentially address limitations on studies of multitarget drugs and multidrug targets.