1
|
de Siqueira Santos S, Yang H, Galeano A, Paccanaro A. Host centric drug repurposing for viral diseases. PLoS Comput Biol 2025; 21:e1012876. [PMID: 40173200 PMCID: PMC12052139 DOI: 10.1371/journal.pcbi.1012876] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2024] [Revised: 05/05/2025] [Accepted: 02/14/2025] [Indexed: 04/04/2025] Open
Abstract
Computational approaches to drug repurposing for viral diseases have mainly focused on a small number of antivirals that directly target pathogens (virus-centric therapies). In this work, we combine ideas from collaborative filtering and network medicine to make predictions on a much larger set of drugs that could be repurposed for host-centric therapies, which aim to interfere with host cell factors required by a pathogen. Our idea is to create matrices quantifying the perturbation that drugs and viruses induce on human protein interaction networks. We then decompose these matrices to learn embeddings of drugs, viruses, and proteins in a low-dimensional space. Predictions of host-centric antivirals are obtained by taking the dot product between the corresponding drug and virus representations. Our approach is general and can be applied systematically to any compound with known targets and any virus whose host proteins are known. We show that our predictions have high accuracy and that the embeddings contain meaningful biological information that may provide insights into the underlying biology of viral infections. Our approach can integrate different types of information, does not rely on known drug-virus associations, and can be applied to new viral diseases and drugs.
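The scoring step described in this abstract, a dot product between a drug representation and a virus representation, can be sketched in a few lines of Python. The embedding values and the names `drug_embeddings`, `virus_embeddings`, and `rank_drugs` are invented for illustration; the matrix decomposition that would produce real embeddings is not reproduced here.

```python
# Hypothetical 3-dimensional embeddings. In the paper these would be learned by
# decomposing drug- and virus-perturbation matrices; here they are made up.
drug_embeddings = {
    "drug_A": [0.9, 0.1, 0.0],
    "drug_B": [0.2, 0.7, 0.1],
    "drug_C": [0.0, 0.1, 0.8],
}
virus_embeddings = {
    "virus_X": [1.0, 0.0, 0.0],
    "virus_Y": [0.0, 0.5, 0.5],
}


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_drugs(virus):
    """Score every drug against one virus and sort from highest to lowest score."""
    v = virus_embeddings[virus]
    scores = {name: dot(vec, v) for name, vec in drug_embeddings.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for virus in virus_embeddings:
        print(virus, rank_drugs(virus))
```

A higher score is read here as a stronger predicted association; how scores are thresholded or validated is left open in this sketch.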
Affiliation(s)
| | - Haixuan Yang
- School of Mathematical & Statistical Sciences, University of Galway, Galway, Ireland
| | - Aldo Galeano
- Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil
| | - Alberto Paccanaro
- Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway, University of London, Egham Hill, Egham, United Kingdom
|
2
|
Gu Z. simona: a comprehensive R package for semantic similarity analysis on bio-ontologies. BMC Genomics 2024; 25:869. [PMID: 39285315 PMCID: PMC11406866 DOI: 10.1186/s12864-024-10759-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/17/2024] [Accepted: 09/02/2024] [Indexed: 09/19/2024] Open
Abstract
BACKGROUND: Bio-ontologies are key to structuring complex biological information for effective data integration and knowledge representation. Semantic similarity analysis on bio-ontologies quantitatively assesses the degree of similarity between biological concepts based on the semantics encoded in ontologies. It plays an important role in the structured and meaningful interpretation and integration of complex data from multiple biological domains. RESULTS: We present simona, a novel R package for semantic similarity analysis on general bio-ontologies. Simona implements infrastructure for ontology analysis by offering efficient data structures, fast ontology traversal methods, and elegant visualizations. Moreover, it provides a robust toolbox supporting over 70 methods for semantic similarity analysis. With simona, we conducted a benchmark of current semantic similarity methods. The results demonstrate that methods cluster according to their mathematical methodologies, which can guide researchers in selecting appropriate methods. Additionally, we compared annotation-based and topology-based methods, revealing that semantic similarities based solely on ontology topology can efficiently reveal semantic similarity structures, facilitating analysis of less-studied organisms and other ontologies. CONCLUSIONS: Simona offers a versatile interface and efficient implementation for processing, visualization, and semantic similarity analysis on bio-ontologies. We believe that simona will serve as a robust tool for uncovering relationships and enhancing the interoperability of biological knowledge systems.
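To make the idea of a topology-based semantic similarity concrete, the sketch below computes a Lin-style similarity on a toy ontology, with information content derived only from descendant counts. It is written in Python and does not use or imitate the simona R API; the toy `parents` map, the particular information-content formula, and all function names are assumptions chosen for illustration.

```python
import math

# Toy ontology as a child -> parents map (hypothetical terms).
parents = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "d": ["a"],
    "e": ["b"],
}


def ancestors(term):
    """Return the term itself plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """Topology-only IC: -log(fraction of terms that are the term or its descendants)."""
    n_desc = sum(1 for t in parents if term in ancestors(t))
    return -math.log(n_desc / len(parents))


def lin_similarity(a, b):
    """Lin similarity: 2 * IC(most informative common ancestor) / (IC(a) + IC(b))."""
    common = ancestors(a) & ancestors(b)
    mica_ic = max(information_content(t) for t in common)
    denom = information_content(a) + information_content(b)
    return 2 * mica_ic / denom if denom > 0 else 1.0


if __name__ == "__main__":
    print(lin_similarity("c", "d"))  # siblings under "a"
    print(lin_similarity("c", "e"))  # share only the root
```

Because no gene annotations are needed, a measure of this kind can be computed for any ontology, which is the practical advantage the abstract attributes to topology-based methods.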
Affiliation(s)
- Zuguang Gu
- Molecular Precision Oncology Program, National Center for Tumor Diseases (NCT), Im Neuenheimer Feld 280, Heidelberg, 69120, Germany.
|
3
|
Caniza H, Cáceres JJ, Torres M, Paccanaro A. LanDis: the disease landscape explorer. Eur J Hum Genet 2024; 32:461-465. [PMID: 38200084 PMCID: PMC10999415 DOI: 10.1038/s41431-023-01511-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 07/13/2023] [Revised: 11/01/2023] [Accepted: 11/23/2023] [Indexed: 01/12/2024] Open
Abstract
From a network medicine perspective, a disease is the consequence of perturbations on the interactome. These perturbations tend to appear in a specific neighbourhood on the interactome, the disease module, and modules related to phenotypically similar diseases tend to be located in close-by regions. We present LanDis, a freely available web-based interactive tool ( https://paccanarolab.org/landis ) that allows domain experts, medical doctors and the larger scientific community to graphically navigate the interactome distances between the modules of over 44 million pairs of heritable diseases. The map-like interface provides detailed comparisons between pairs of diseases together with supporting evidence. Every disease in LanDis is linked to relevant entries in OMIM and UniProt, providing a starting point for in-depth analysis and an opportunity for novel insight into the aetiology of diseases as well as differential diagnosis.
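As a rough illustration of what an interactome distance between two disease modules can look like, the Python sketch below averages, over the genes of both modules, the shortest-path distance to the closest gene of the other module. The abstract does not state which distance LanDis uses, so this particular measure, the toy network, and the names `module_distance` and `closest_sum` are assumptions.

```python
from collections import deque

# Toy interactome: a simple chain p1 - p2 - p3 - p4 - p5 (hypothetical proteins).
edges = [("p1", "p2"), ("p2", "p3"), ("p3", "p4"), ("p4", "p5")]
adjacency = {}
for u, v in edges:
    adjacency.setdefault(u, set()).add(v)
    adjacency.setdefault(v, set()).add(u)


def distances_from(source):
    """Breadth-first search: shortest-path length from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def closest_sum(module_a, module_b):
    """Sum, over genes in module_a, of the distance to the nearest gene in module_b."""
    total = 0
    for gene in module_a:
        dist = distances_from(gene)
        total += min(dist.get(other, float("inf")) for other in module_b)
    return total


def module_distance(module_a, module_b):
    """Symmetric mean closest distance between two gene sets."""
    total = closest_sum(module_a, module_b) + closest_sum(module_b, module_a)
    return total / (len(module_a) + len(module_b))


if __name__ == "__main__":
    print(module_distance({"p1", "p2"}, {"p4", "p5"}))  # far-apart modules
    print(module_distance({"p1", "p2"}, {"p2", "p3"}))  # overlapping modules
```

Modules that overlap or sit in adjacent neighbourhoods get small values, which matches the abstract's statement that phenotypically similar diseases tend to lie in close-by regions.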
Affiliation(s)
- Horacio Caniza
- Universidad Paraguayo Alemana de Ciencias Aplicadas, Facultad de Ciencias de la Ingeniería, San Lorenzo, Paraguay
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK
| | - Juan J Cáceres
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK
| | - Mateo Torres
- Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil
| | - Alberto Paccanaro
- Department of Computer Science, Centre for Systems and Synthetic Biology, Royal Holloway University of London, Egham, UK.
- Escola de Matemática Aplicada, Fundação Getúlio Vargas, Rio de Janeiro, Brazil.
|
4
|
Hao DC, Chen H, Xiao PG, Jiang T. A Global Analysis of Alternative Splicing of Dichocarpum Medicinal Plants, Ranunculales. Curr Genomics 2022; 23:207-216. [PMID: 36777007 PMCID: PMC9878827 DOI: 10.2174/1389202923666220527112929] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/28/2022] [Revised: 04/19/2022] [Accepted: 04/26/2022] [Indexed: 11/22/2022] Open
Abstract
Background: Multiple isoforms are often generated from a single gene via alternative splicing (AS) in plants, which significantly increases the functional diversity of the plant genome. Although gene functions are well studied, the specific functions of isoforms are little known; accurate prediction of isoform functions is therefore highly desirable. Methods: Here we perform the first global analysis of AS in Dichocarpum, a medicinal genus of Ranunculales, using full-length transcriptome datasets of five Chinese endemic Dichocarpum taxa. Multiple software tools were used to identify AS events, gene function was annotated based on seven databases, and the protein-coding sequence of each AS isoform was translated into an amino acid sequence. The self-developed software DIFFUSE was used to predict the functions of AS isoforms. Results: Among 8,485 genes with AS events, genes with two isoforms were the most numerous (6,038), followed by those with three and four isoforms. Retained intron (RI, 551) was the predominant type among 1,037 AS events, followed by alternative 3' splice sites and alternative 5' splice sites. DIFFUSE was effective in predicting the functions of Dichocarpum isoforms, which had not previously been characterized. Compared with sequence alignment-based database annotations, DIFFUSE performed better in differentiating isoform functions. The DIFFUSE predictions for the terms GO:0003677 (DNA binding) and GO:0010333 (terpene synthase activity) agreed with the biological features of the transcript isoforms. Conclusion: Numerous AS events were identified for the first time from full-length transcriptome datasets of five Dichocarpum taxa, and the functions of AS isoforms were successfully predicted by DIFFUSE. The global analysis of Dichocarpum AS events and the prediction of isoform functions can help in understanding the metabolic regulation of medicinal taxa and in their pharmaceutical exploration.
Affiliation(s)
- Da-Cheng Hao
- Biotechnology Institute, School of Environment and Chemical Engineering, Dalian Jiaotong University, Dalian 116028, China; Institute of Molecular Plant Sciences, University of Edinburgh, Edinburgh EH9 3BF, UK; Address correspondence to these authors at the School of Environment and Chemical Engineering, Dalian Jiaotong University, Dalian 116028, China; Tel: 0086-411-84572552; and Department of Computer Science and Engineering, University of California, Riverside, CA, USA; Tel/Fax: 001-951-827-2991
| | - Hao Chen
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA; Department of Computational Biology, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA; These authors contributed equally to this work.
| | - Pei-Gen Xiao
- Institute of Medicinal Plant Development, Chinese Academy of Medical Sciences, Beijing 100193, China
| | - Tao Jiang
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA; Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing 100084, China; Address correspondence to these authors at the School of Environment and Chemical Engineering, Dalian Jiaotong University, Dalian 116028, China; Tel: 0086-411-84572552; and Department of Computer Science and Engineering, University of California, Riverside, CA, USA; Tel/Fax: 001-951-827-2991
|
5
|
Lastra-Díaz JJ, Lara-Clares A, Garcia-Serrano A. HESML: a real-time semantic measures library for the biomedical domain with a reproducible survey. BMC Bioinformatics 2022; 23:23. [PMID: 34991460 PMCID: PMC8734250 DOI: 10.1186/s12859-021-04539-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2020] [Accepted: 12/15/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND: Ontology-based semantic similarity measures based on SNOMED-CT, MeSH, and Gene Ontology are being extensively used in many applications in biomedical text mining and genomics respectively, which has encouraged the development of semantic measures libraries based on the aforementioned ontologies. However, current state-of-the-art semantic measures libraries have performance and scalability drawbacks that stem from ontology representations based on relational databases or naive in-memory graph representations. Likewise, a recent reproducible survey on word similarity shows that one hybrid IC-based measure which integrates a shortest-path computation sets the state of the art in the family of ontology-based semantic measures. However, the lack of an efficient shortest-path algorithm for real-time computation prevents both the practical use of this measure in any application and the use of any other path-based semantic similarity measure. RESULTS: To bridge the two aforementioned gaps, this work introduces for the first time an updated version of the HESML Java software library especially designed for the biomedical domain, which implements the most efficient and scalable ontology representation reported in the literature, together with a new method for approximating Dijkstra's algorithm on taxonomies, called Ancestors-based Shortest-Path Length (AncSPL), which allows the real-time computation of any path-based semantic similarity measure. CONCLUSIONS: We introduce a set of reproducible benchmarks showing that HESML outperforms by several orders of magnitude the current state-of-the-art libraries in the three aforementioned biomedical ontologies, as well as the real-time performance and approximation quality of the new AncSPL shortest-path algorithm. Likewise, we show that AncSPL scales linearly with the dimension of the common ancestor subgraph regardless of the ontology size. Path-based measures based on the new AncSPL algorithm are up to six orders of magnitude faster than their exact implementation in large ontologies like SNOMED-CT and GO. Finally, we provide a detailed reproducibility protocol and dataset as supplementary material to allow the exact replication of all our experiments and results.
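The idea behind an ancestors-based shortest-path approximation can be sketched as follows, assuming (from the method's name and this abstract, not from the HESML source) that the search is restricted to the subgraph induced by the two concepts and their ancestors. The sketch is in Python rather than HESML's Java, uses unweighted edges with breadth-first search instead of Dijkstra's algorithm, and every name in it is hypothetical.

```python
from collections import deque

# Toy taxonomy as a child -> parents map (hypothetical concepts).
# "x" is a shared child of "c" and "e", so the exact c-e path (via x) has length 2.
parents = {
    "root": [],
    "a": ["root"],
    "b": ["root"],
    "c": ["a"],
    "e": ["b"],
    "x": ["c", "e"],
}


def ancestors_inclusive(term):
    """Return the term plus all of its ancestors."""
    seen = {term}
    stack = [term]
    while stack:
        for parent in parents[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def ancestor_restricted_path_length(a, b):
    """Shortest path between a and b using only edges among their ancestors."""
    allowed = ancestors_inclusive(a) | ancestors_inclusive(b)
    # Ancestor sets are closed upward, so every parent of an allowed node is allowed.
    adjacency = {term: set() for term in allowed}
    for child in allowed:
        for parent in parents[child]:
            adjacency[child].add(parent)
            adjacency[parent].add(child)
    dist = {a: 0}
    queue = deque([a])
    while queue:
        node = queue.popleft()
        if node == b:
            return dist[node]
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return None  # not connected inside the ancestor subgraph


if __name__ == "__main__":
    print(ancestor_restricted_path_length("c", "e"))
    print(ancestor_restricted_path_length("x", "c"))
```

In this toy taxonomy the restricted search returns 4 for `c` and `e` (through the root) although a path of length 2 exists through their shared child `x`, which shows why such a method is an approximation: the cost depends only on the size of the ancestor subgraph, at the price of ignoring paths through descendants.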
Affiliation(s)
- Juan J. Lastra-Díaz
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
| | - Alicia Lara-Clares
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
| | - Ana Garcia-Serrano
- NLP & IR Research Group, E.T.S.I. Informática, Universidad Nacional de Educación a Distancia (UNED), C/Juan del Rosal 16, 28040 Madrid, Spain
|
6
|
Saxena R, Bishnoi R, Singla D. Gene Ontology: application and importance in functional annotation of the genomic data. Bioinformatics 2022. [DOI: 10.1016/b978-0-323-89775-4.00015-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022] Open
|
7
|
Zheng F, Kelly MR, Ramms DJ, Heintschel ML, Tao K, Tutuncuoglu B, Lee JJ, Ono K, Foussard H, Chen M, Herrington KA, Silva E, Liu S, Chen J, Churas C, Wilson N, Kratz A, Pillich RT, Patel DN, Park J, Kuenzi B, Yu MK, Licon K, Pratt D, Kreisberg JF, Kim M, Swaney DL, Nan X, Fraley SI, Gutkind JS, Krogan NJ, Ideker T. Interpretation of cancer mutations using a multiscale map of protein systems. Science 2021; 374:eabf3067. [PMID: 34591613 PMCID: PMC9126298 DOI: 10.1126/science.abf3067] [Citation(s) in RCA: 32] [Impact Index Per Article: 8.0] [Indexed: 12/14/2022]
Abstract
A major goal of cancer research is to understand how mutations distributed across diverse genes affect common cellular systems, including multiprotein complexes and assemblies. Two challenges—how to comprehensively map such systems and how to identify which are under mutational selection—have hindered this understanding. Accordingly, we created a comprehensive map of cancer protein systems integrating both new and published multi-omic interaction data at multiple scales of analysis. We then developed a unified statistical model that pinpoints 395 specific systems under mutational selection across 13 cancer types. This map, called NeST (Nested Systems in Tumors), incorporates canonical processes and notable discoveries, including a PIK3CA-actomyosin complex that inhibits phosphatidylinositol 3-kinase signaling and recurrent mutations in collagen complexes that promote tumor proliferation. These systems can be used as clinical biomarkers and implicate a total of 548 genes in cancer evolution and progression. This work shows how disparate tumor mutations converge on protein assemblies at different scales.
Affiliation(s)
- Fan Zheng
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Marcus R. Kelly
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Dana J. Ramms
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, USA
- Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
| | - Marissa L. Heintschel
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
| | - Kai Tao
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, 97239, USA
- Center for Spatial Systems Biomedicine, Oregon Health and Science University, Portland, OR, 97201, USA
| | - Beril Tutuncuoglu
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
| | - John J. Lee
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Keiichiro Ono
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Helene Foussard
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
| | - Michael Chen
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Kari A. Herrington
- Department of Biochemistry and Biophysics Center for Advanced Light Microscopy at UCSF, University of California San Francisco, San Francisco, CA, 94158, USA
| | - Erica Silva
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Sophie Liu
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Jing Chen
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Christopher Churas
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Nicholas Wilson
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Anton Kratz
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Rudolf T. Pillich
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Devin N. Patel
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Jisoo Park
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Brent Kuenzi
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Michael K. Yu
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Katherine Licon
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Dexter Pratt
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
| | - Jason F. Kreisberg
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
| | - Minkyu Kim
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
| | - Danielle L. Swaney
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
| | - Xiaolin Nan
- Department of Biomedical Engineering, Oregon Health and Science University, Portland, OR, 97239, USA
- Center for Spatial Systems Biomedicine, Oregon Health and Science University, Portland, OR, 97201, USA
- Knight Cancer Early Detection Advanced Research Center, Oregon Health and Science University, Portland, OR, 97201, USA
| | - Stephanie I. Fraley
- Department of Bioengineering, University of California San Diego, La Jolla, CA 92093, USA
| | - J. Silvio Gutkind
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Moores Cancer Center, University of California San Diego, La Jolla, CA 92093, USA
- Department of Pharmacology, University of California San Diego, La Jolla, CA 92093, USA
| | - Nevan J. Krogan
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
- Department of Cellular and Molecular Pharmacology, University of California San Francisco, CA 94158, USA
- The J. David Gladstone Institutes, San Francisco, CA 94158, USA
- Quantitative Biosciences Institute, University of California San Francisco, San Francisco, CA, 94158, USA
| | - Trey Ideker
- Division of Genetics, Department of Medicine, University of California San Diego, La Jolla, CA 92093, USA
- Cancer Cell Map Initiative (CCMI), La Jolla and San Francisco, CA, USA
|
8
|
Nguyen QH, Le DH. Similarity Calculation, Enrichment Analysis, and Ontology Visualization of Biomedical Ontologies using UFO. Curr Protoc 2021; 1:e115. [PMID: 33900688 DOI: 10.1002/cpz1.115] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 01/27/2023]
Abstract
The rapid growth of biomedical ontologies in recent years has proved useful in various applications. In this article, we propose two main-function protocols, term-related and entity-related, covering the three most common ontology analyses: similarity calculation, enrichment analysis, and ontology visualization, each of which can be done by separate methods. Many previously developed tools implementing those methods run on different platforms and implement only a limited number of similarity calculation and enrichment analysis methods for a specific type of biomedical ontology, although any type could in principle be accepted. Moreover, methods have distinct advantages depending on the application; thus, the more methods a tool offers, the better the decisions users can make. The protocol here implements all the analyses above using an advanced and popular tool called UFO. UFO is a Cytoscape app that unifies most of the semantic similarity measures for between-term and between-entity similarity calculation for biomedical ontologies in OBO format; it can calculate the similarity between two sets of entities, weight imported entity networks, and generate functional similarity networks. The complete protocol can be performed in 30 min and is designed for use by biologists with no prior bioinformatics training. © 2021 Wiley Periodicals LLC. Basic Protocol: Running UFO using a list of input Gene Ontology, Disease Ontology, or Human Phenotype Ontology data.
Affiliation(s)
- Quang-Huy Nguyen
- Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam
| | - Duc-Hau Le
- Department of Computational Biomedicine, Vingroup Big Data Institute, Hanoi, Vietnam; School of Computer Science and Engineering, Thuyloi University, Hanoi, Vietnam
|
9
|
Shaw D, Chen H, Xie M, Jiang T. DeepLPI: a multimodal deep learning method for predicting the interactions between lncRNAs and protein isoforms. BMC Bioinformatics 2021; 22:24. [PMID: 33461501 PMCID: PMC7814738 DOI: 10.1186/s12859-020-03914-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/15/2020] [Accepted: 11/30/2020] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND: Long non-coding RNAs (lncRNAs) regulate diverse biological processes via interactions with proteins. Since the experimental methods to identify these interactions are expensive and time-consuming, many computational methods have been proposed. Although these computational methods have achieved promising prediction performance, they neglect the fact that a gene may encode multiple protein isoforms and that different isoforms of the same gene may interact differently with the same lncRNA. RESULTS: In this study, we propose a novel method, DeepLPI, for predicting the interactions between lncRNAs and protein isoforms. Our method uses sequence and structure data to extract intrinsic features and expression data to extract topological features. To combine these different data, we adopt a hybrid framework that integrates a multimodal deep learning neural network and a conditional random field. To overcome the lack of known interactions between lncRNAs and protein isoforms, we apply a multiple instance learning (MIL) approach. In our experiment on human lncRNA-protein interactions in the NPInter v3.0 database, DeepLPI improved the prediction performance by 4.7% in terms of AUC and 5.9% in terms of AUPRC over the state-of-the-art methods. Our further correlation analyses between interacting lncRNAs and protein isoforms also illustrated that their co-expression information helped predict the interactions. Finally, we give some examples where DeepLPI was able to outperform the other methods in predicting mouse lncRNA-protein interactions and novel human lncRNA-protein interactions. CONCLUSION: Our results demonstrate that the use of isoforms and MIL contributed significantly to the improvement of performance in predicting lncRNA and protein interactions. We believe that such an approach would find more applications in predicting other functional roles of RNAs and proteins.
Affiliation(s)
- Dipan Shaw
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
| | - Hao Chen
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
| | - Minzhu Xie
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
| | - Tao Jiang
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
- Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China
|
10
|
Nepomuceno-Chamorro IA, Nepomuceno JA, Galván-Rojas JL, Vega-Márquez B, Rubio-Escudero C. Using prior knowledge in the inference of gene association networks. APPL INTELL 2020. [DOI: 10.1007/s10489-020-01705-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/23/2022]
|
11
|
SabziNezhad A, Jalili S. DPCT: A Dynamic Method for Detecting Protein Complexes From TAP-Aware Weighted PPI Network. Front Genet 2020; 11:567. [PMID: 32676097 PMCID: PMC7333736 DOI: 10.3389/fgene.2020.00567] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.6] [Received: 02/25/2020] [Accepted: 05/11/2020] [Indexed: 12/13/2022] Open
Abstract
Detecting protein complexes from protein-protein interaction (PPI) networks is essential to discovering the rules of the cellular world. A large amount of PPI data is available, generated by high-throughput experiments. The enormous size of these data motivates the use of computational rather than experimental methods to detect protein complexes. In past years, many researchers have presented algorithms to detect protein complexes, most of which use static PPI networks. Recent research has shown that cellular systems are dynamic, so the PPI network is not static over time. In this paper, we introduce DPCT to detect protein complexes from dynamic PPI networks. In the proposed method, TAP and GO data are used to build a weighted PPI network and to reduce the noise in the PPI data. Gene expression data are also used to derive dynamic subnetworks from the PPI network. A memetic algorithm is used to bicluster the gene expression data and to create a dynamic subnetwork for each bicluster. Experimental results show that DPCT detects protein complexes more accurately than state-of-the-art detection algorithms. The source code and datasets of DPCT can be found at https://github.com/alisn72/DPCT.
Affiliation(s)
- Ali SabziNezhad
- Computer Engineering Department, Tarbiat Modares University, Tehran, Iran
| | - Saeed Jalili
- Computer Engineering Department, Tarbiat Modares University, Tehran, Iran
|
12
|
Shaw D, Chen H, Jiang T. DeepIsoFun: a deep domain adaptation approach to predict isoform functions. Bioinformatics 2020; 35:2535-2544. [PMID: 30535380 DOI: 10.1093/bioinformatics/bty1017] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/29/2018] [Revised: 11/07/2018] [Accepted: 12/08/2018] [Indexed: 11/14/2022] Open
Abstract
MOTIVATION: Isoforms are mRNAs produced from the same gene locus by alternative splicing and may have different functions. Although gene functions have been studied extensively, little is known about the specific functions of isoforms. Recently, some computational approaches based on multiple instance learning have been proposed to predict isoform functions from annotated gene functions and expression data, but their performance is far from satisfactory, primarily due to the lack of labeled training data. To improve the performance on this problem, we propose a novel deep learning method, DeepIsoFun, that combines multiple instance learning with domain adaptation. The latter technique helps to transfer the knowledge of gene functions to the prediction of isoform functions and provides additional labeled training data. Our model is trained on a deep neural network architecture so that it can adapt to different expression distributions associated with different gene ontology terms. RESULTS: We evaluated the performance of DeepIsoFun on three expression datasets of human and mouse collected from SRA studies at different times. On each dataset, DeepIsoFun performed significantly better than the existing methods. In terms of area under the receiver operating characteristic curve, our method achieved at least a 26% improvement, and in terms of area under the precision-recall curve, at least a 10% improvement over the state-of-the-art methods. In addition, we study the divergence of the functions predicted by our method for isoforms from the same gene and the overall correlation between expression similarity and the similarity of predicted functions. AVAILABILITY AND IMPLEMENTATION: https://github.com/dls03/DeepIsoFun/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
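The multiple instance learning setting mentioned in this abstract treats each gene as a bag of isoforms, with function labels known only at the gene level. Under the standard MIL assumption that a bag is positive when at least one instance is positive, a gene-level score can be taken as the maximum over its isoform scores. The Python sketch below shows only that aggregation step; the scores, the 0.5 threshold, and the function name are invented, and the abstract does not say that DeepIsoFun aggregates in exactly this way.

```python
# Hypothetical per-isoform scores for one GO term (values are made up).
isoform_scores = {
    "gene1": {"gene1.iso1": 0.12, "gene1.iso2": 0.91, "gene1.iso3": 0.40},
    "gene2": {"gene2.iso1": 0.05, "gene2.iso2": 0.08},
}


def gene_level_prediction(gene, threshold=0.5):
    """Aggregate isoform scores into a gene score by max pooling.

    Returns (gene score, highest-scoring isoform, whether the gene is called positive).
    """
    scores = isoform_scores[gene]
    witness = max(scores, key=scores.get)
    bag_score = scores[witness]
    return bag_score, witness, bag_score >= threshold


if __name__ == "__main__":
    for gene in isoform_scores:
        print(gene, gene_level_prediction(gene))
```

The highest-scoring isoform is the natural candidate for carrying the gene's annotated function, which is how gene-level labels can supervise isoform-level predictions.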
Collapse
Affiliation(s)
- Dipan Shaw
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA
| | - Hao Chen
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA
| | - Tao Jiang
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA.,Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China
| |
|
13
|
Vascon S, Frasca M, Tripodi R, Valentini G, Pelillo M. Protein function prediction as a graph-transduction game. Pattern Recognit Lett 2020. [DOI: 10.1016/j.patrec.2018.04.002] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 10/17/2022]
|
14
|
Yassaee Meybodi F, Emdadi A, Rezvan A, Eslahchi C. CAMND: Comparative analysis of metabolic network decomposition based on previous and two new criteria, a web based application. Biosystems 2019; 189:104081. [PMID: 31838143 DOI: 10.1016/j.biosystems.2019.104081] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/04/2018] [Revised: 10/06/2019] [Accepted: 11/26/2019] [Indexed: 11/29/2022]
Abstract
Metabolic networks can model the behavior of metabolism in the cell. Since analyzing whole metabolic networks is not easy, network modulation is an important issue to be investigated. Decomposing metabolic networks is a strategy to obtain better insight into metabolic functions. Additionally, decomposing these networks facilitates using computational methods, which are very slow when applied to the original genome-scale network. Several methods have been proposed for decomposing metabolic networks. Therefore, it is necessary to evaluate these methods by suitable criteria. In this study, we introduce a web server package for decomposing metabolic networks with 10 different methods, 9 datasets and the ability to compute 12 criteria, to evaluate and compare the results of different methods using ten previously defined and two new criteria, which are based on ChEBI ontology and Co-expression_of_Enzymes information. This package visualizes the obtained modules via "gephi" software. The package lets the user examine whether two metabolites or reactions are in the same module or not. The functionality of the package can be easily extended to include new datasets and criteria. It also has the ability to compare the results of novel methods with the results of previously developed methods. The package is implemented in Python and is available at http://eslahchilab.ir/softwares/dmn.
Affiliation(s)
- Fatemeh Yassaee Meybodi
- Department of Computer Science, Faculty of Mathematics, Shahid-Beheshti University, GC, Tehran 1983963113, Iran
| | - Akram Emdadi
- Department of Computer Science, Faculty of Mathematics, Shahid-Beheshti University, GC, Tehran 1983963113, Iran
| | - Abolfazl Rezvan
- Department of Computer Science, Faculty of Mathematics, Shahid-Beheshti University, GC, Tehran 1983963113, Iran
| | - Changiz Eslahchi
- Department of Computer Science, Faculty of Mathematics, Shahid-Beheshti University, GC, Tehran 1983963113, Iran; School of Biological Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran 193955746, Iran.
| |
|
15
|
Chen H, Shaw D, Zeng J, Bu D, Jiang T. DIFFUSE: predicting isoform functions from sequences and expression profiles via deep learning. Bioinformatics 2019; 35:i284-i294. [PMID: 31510699 PMCID: PMC6612874 DOI: 10.1093/bioinformatics/btz367] [Citation(s) in RCA: 28] [Impact Index Per Article: 4.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/25/2023] Open
Abstract
MOTIVATION Alternative splicing generates multiple isoforms from a single gene, greatly increasing the functional diversity of a genome. Although gene functions have been well studied, little is known about the specific functions of isoforms, making accurate prediction of isoform functions highly desirable. However, the existing approaches to predicting isoform functions are far from satisfactory due to at least two reasons: (i) unlike genes, isoform-level functional annotations are scarce. (ii) The information of isoform functions is concealed in various types of data including isoform sequences, co-expression relationship among isoforms, etc. RESULTS In this study, we present a novel approach, DIFFUSE (Deep learning-based prediction of IsoForm FUnctions from Sequences and Expression), to predict isoform functions. To integrate various types of data, our approach adopts a hybrid framework by first using a deep neural network (DNN) to predict the functions of isoforms from their genomic sequences and then refining the prediction using a conditional random field (CRF) based on co-expression relationship. To overcome the lack of isoform-level ground truth labels, we further propose an iterative semi-supervised learning algorithm to train both the DNN and CRF together. Our extensive computational experiments demonstrate that DIFFUSE could effectively predict the functions of isoforms and genes. It achieves an average area under the receiver operating characteristics curve of 0.840 and area under the precision-recall curve of 0.581 over 4184 GO functional categories, which are significantly higher than the state-of-the-art methods. We further validate the prediction results by analyzing the correlation between functional similarity, sequence similarity, expression similarity and structural similarity, as well as the consistency between the predicted functions and some well-studied functional features of isoform sequences. 
AVAILABILITY AND IMPLEMENTATION https://github.com/haochenucr/DIFFUSE. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Hao Chen
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA
| | - Dipan Shaw
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA
| | - Jianyang Zeng
- Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
| | - Dongbo Bu
- Key Lab of Intelligent Information Process, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
- University of Chinese Academy of Sciences, Beijing, China
| | - Tao Jiang
- Department of Computer Science and Engineering, University of California, Riverside, CA, USA
- Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China
| |
|
16
|
Xue H, Peng J, Shang X. Predicting disease-related phenotypes using an integrated phenotype similarity measurement based on HPO. BMC SYSTEMS BIOLOGY 2019; 13:34. [PMID: 30953559 PMCID: PMC6449884 DOI: 10.1186/s12918-019-0697-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
Background Improving the efficiency of disease diagnosis based on phenotype ontology is a critical yet challenging research area. Recently, Human Phenotype Ontology (HPO)-based semantic similarity has been effectively and widely used to identify causative genes and diseases. However, current phenotype similarity measurements consider only the annotations and hierarchy structure of HPO, neglecting the definition description of phenotype terms. Results In this paper, we propose a novel phenotype similarity measurement, termed DisPheno, which adequately incorporates the definition of phenotype terms in addition to HPO structure and annotations to measure the similarity between phenotype terms. DisPheno also integrates phenotype term associations into phenotype-set similarity measurement using gene and disease annotations of phenotype terms. Conclusions Compared with five existing state-of-the-art methods, DisPheno shows great performance in HPO-based phenotype semantic similarity measurement and improves the efficiency of disease identification, especially on noisy patient datasets.
Affiliation(s)
- Hansheng Xue
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.,School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
| | - Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
| |
|
17
|
Wang D, Li J, Liu R, Wang Y. Optimizing gene set annotations combining GO structure and gene expression data. BMC SYSTEMS BIOLOGY 2018; 12:133. [PMID: 30598093 PMCID: PMC6311910 DOI: 10.1186/s12918-018-0659-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/25/2022]
Abstract
BACKGROUND With the rapid accumulation of genomic data, it has become a challenging issue to annotate and interpret these data. As a representative approach, gene set enrichment analysis has been widely used to interpret large molecular datasets generated by biological experiments. The result of gene set enrichment analysis heavily relies on the quality and integrity of gene set annotations. Although several methods were developed to annotate gene sets, there is still a lack of high quality annotation methods. Here, we propose a novel method to improve the annotation accuracy through combining the GO structure and gene expression data. RESULTS We propose a novel approach for optimizing gene set annotations to get more accurate annotation results. The proposed method filters the inconsistent annotations using GO structure information and probabilistic gene set clusters calculated by a range of cluster sizes over multiple bootstrap resampled datasets. The proposed method is employed to analyze p53 cell lines, colon cancer and breast cancer gene expression data. The experimental results show that the proposed method can filter a number of annotations unrelated to experimental data, increase gene set enrichment power, and decrease the inconsistency of annotations. CONCLUSIONS A novel gene set annotation optimization approach is proposed to improve the quality of gene annotations. Experimental results indicate that the proposed method effectively improves gene set annotation quality based on the GO structure and gene expression data.
Affiliation(s)
- Dong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, West Da-Zhi Street, Harbin, China
| | - Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, West Da-Zhi Street, Harbin, China
| | - Rui Liu
- School of Computer Science and Technology, Harbin Institute of Technology, West Da-Zhi Street, Harbin, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, West Da-Zhi Street, Harbin, China
| |
|
18
|
Duong D, Ahmad WU, Eskin E, Chang KW, Li JJ. Word and Sentence Embedding Tools to Measure Semantic Similarity of Gene Ontology Terms by Their Definitions. J Comput Biol 2018; 26:38-52. [PMID: 30383443 DOI: 10.1089/cmb.2018.0093] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/26/2023] Open
Abstract
The gene ontology (GO) database contains GO terms that describe biological functions of genes. Previous methods for comparing GO terms have relied on the fact that GO terms are organized into a tree structure. Under this paradigm, the locations of two GO terms in the tree dictate their similarity score. In this article, we introduce two new solutions for this problem by focusing instead on the definitions of the GO terms. We apply neural network-based techniques from the natural language processing (NLP) domain. The first method does not rely on the GO tree, whereas the second indirectly depends on the GO tree. In our first approach, we compare two GO definitions by treating them as two unordered sets of words. The word similarity is estimated by a word embedding model that maps words into an N-dimensional space. In our second approach, we account for the word-ordering within a sentence. We use a sentence encoder to embed GO definitions into vectors and estimate how likely one definition entails another. We validate our methods in two ways. In the first experiment, we test the model's ability to differentiate a true protein-protein network from a randomly generated network. In the second experiment, we test the model in identifying orthologs from randomly matched genes in human, mouse, and fly. In both experiments, a hybrid of NLP and GO tree-based method achieves the best classification accuracy.
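As a rough illustration of the first approach described in the abstract above (definitions compared as unordered sets of words, with word similarity taken from an embedding model), the following Python sketch uses a symmetric best-match average over cosine similarities. The aggregation rule, the function names and the toy two-dimensional vectors are assumptions made for illustration; the abstract does not specify them.

```python
import math

# Toy 2-dimensional embeddings (hypothetical values, not from a trained model).
TOY_EMBEDDINGS = {
    "protein": [1.0, 0.0],
    "kinase": [0.8, 0.6],
    "binding": [0.0, 1.0],
    "activity": [0.6, 0.8],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match_average(words_a, words_b, emb):
    """For each word in words_a, take its best match in words_b, then average."""
    scores = [max(cosine(emb[a], emb[b]) for b in words_b) for a in words_a]
    return sum(scores) / len(scores)


def definition_similarity(def_a, def_b, emb=TOY_EMBEDDINGS):
    """Compare two definitions as unordered word sets (word order is ignored)."""
    words_a = {w for w in def_a.lower().split() if w in emb}
    words_b = {w for w in def_b.lower().split() if w in emb}
    if not words_a or not words_b:
        return 0.0  # no known words to compare
    forward = best_match_average(words_a, words_b, emb)
    backward = best_match_average(words_b, words_a, emb)
    return 0.5 * (forward + backward)
```

Because the definitions are reduced to sets, reordering the words of a definition leaves the score unchanged, which is the property the second, sentence-encoder approach is meant to relax.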
Affiliation(s)
- Dat Duong
- 1 Department of Computer Science, University of California, Los Angeles, California
| | - Wasi Uddin Ahmad
- 1 Department of Computer Science, University of California, Los Angeles, California
| | - Eleazar Eskin
- 1 Department of Computer Science, University of California, Los Angeles, California.,2 Department of Human Genetics, University of California, Los Angeles, California
| | - Kai-Wei Chang
- 1 Department of Computer Science, University of California, Los Angeles, California
| | - Jingyi Jessica Li
- 2 Department of Human Genetics, University of California, Los Angeles, California.,3 Department of Statistics, University of California, Los Angeles, California
| |
|
19
|
Peng J, Xue H, Hui W, Lu J, Chen B, Jiang Q, Shang X, Wang Y. An online tool for measuring and visualizing phenotype similarities using HPO. BMC Genomics 2018; 19:571. [PMID: 30367579 PMCID: PMC6101067 DOI: 10.1186/s12864-018-4927-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND The Human Phenotype Ontology (HPO) is one of the most popular bioinformatics resources. Recently, HPO-based phenotype semantic similarity has been effectively applied to model patient phenotype data. However, the existing tools are revised based on the Gene Ontology (GO)-based term similarity. The design of the models is not optimized for the unique features of HPO. In addition, existing tools only allow HPO terms as input and only provide pure text-based outputs. RESULTS We present PhenoSimWeb, a web application that allows researchers to measure HPO-based phenotype semantic similarities using four approaches borrowed from GO-based similarity measurements. We also provide an approach that considers the unique properties of HPO. In addition, PhenoSimWeb allows text that describes phenotypes as input, since clinical phenotype data is always in text. PhenoSimWeb also provides a graphic visualization interface to visualize the resulting phenotype network. CONCLUSIONS PhenoSimWeb is an easy-to-use and functional online application. Researchers can use it to calculate phenotype similarity conveniently, predict phenotype associated genes or diseases, and visualize the network of phenotype interactions. PhenoSimWeb is available at http://120.77.47.2:8080.
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
| | - Hansheng Xue
- Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055 China
| | - Weiwei Hui
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
| | - Junya Lu
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
| | - Bolin Chen
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
| | - Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001 China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
| | - Yadong Wang
- Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055 China
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001 China
| |
|
20
|
Nikdelfaz O, Jalili S. Disease genes prediction by HMM based PU-learning using gene expression profiles. J Biomed Inform 2018; 81:102-111. [PMID: 29571901 DOI: 10.1016/j.jbi.2018.03.006] [Citation(s) in RCA: 12] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/15/2017] [Revised: 11/22/2017] [Accepted: 03/12/2018] [Indexed: 12/24/2022]
Abstract
Predicting disease candidate genes from the human genome is a crucial part of current biomedical research. According to observations, diseases with the same phenotype have similar biological characteristics, and genes associated with these diseases tend to share common functional properties. Therefore, by applying machine learning methods, new disease genes are predicted based on previous ones. In recent studies, some semi-supervised learning methods, called Positive-Unlabeled Learning (PU-Learning), are used for predicting disease candidate genes. In this study, a novel method is introduced to predict disease candidate genes through gene expression profiles by learning hidden Markov models. In order to evaluate the proposed method, it is applied to a mixed part of 398 disease genes from three disease types and 12001 unlabeled genes. Compared to the other methods in the literature, the experimental results indicate a significant improvement in favor of the proposed method.
Affiliation(s)
- Ozra Nikdelfaz
- Tarbiat Modares University, Computer Engineering Department, Islamic Republic of Iran.
| | - Saeed Jalili
- Tarbiat Modares University, Computer Engineering Department, Islamic Republic of Iran.
| |
|
21
|
Rezvan A, Eslahchi C. Comparison of different approaches for identifying subnetworks in metabolic networks. J Bioinform Comput Biol 2017; 15:1750025. [DOI: 10.1142/s0219720017500251] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/18/2022]
Abstract
A metabolic network model provides a computational framework for studying the metabolism of a cell at the system level. The organization of metabolic networks has been investigated in different studies. One of the organization aspects considered in these studies is the decomposition of a metabolic network. The decompositions produced by different methods are very different and there is no comprehensive evaluation framework to compare the results with each other. In this study, these methods are first reviewed and compared. Then they are applied to six different metabolic network models and the results are evaluated and compared based on two existing and two newly proposed criteria. Results show that no single method can beat others in all criteria, but it seems that the methods introduced by Guimera and Amaral and Verwoerd do better among metabolite-based methods and the method introduced by Sridharan et al. does better among reaction-based ones. Also, the methods are applied to several artificial networks, each constructed from merging a few KEGG pathways. Then, their capability to recover those pathways is compared. Results show that among metabolite-based methods, the method of Guimera and Amaral does better again; however, no notable difference between the performances of reaction-based methods was detected.
Affiliation(s)
- Abolfazl Rezvan
- Department of Computer Science, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
| | - Changiz Eslahchi
- Department of Computer Science, Faculty of Mathematical Sciences, Shahid Beheshti University, Tehran, Iran
| |
|
22
|
Mazandu GK, Chimusa ER, Mulder NJ. Gene Ontology semantic similarity tools: survey on features and challenges for biological knowledge discovery. Brief Bioinform 2017; 18:886-901. [PMID: 27473066 DOI: 10.1093/bib/bbw067] [Citation(s) in RCA: 33] [Impact Index Per Article: 4.1] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/27/2016] [Indexed: 01/02/2023] Open
Abstract
Gene Ontology (GO) semantic similarity tools enable retrieval of semantic similarity scores, which incorporate biological knowledge embedded in the GO structure for comparing or classifying different proteins or list of proteins based on their GO annotations. This facilitates a better understanding of biological phenomena underlying the corresponding experiment and enables the identification of processes pertinent to different biological conditions. Currently, about 14 tools are available, which may play an important role in improving protein analyses at the functional level using different GO semantic similarity measures. Here we survey these tools to provide a comprehensive view of the challenges and advances made in this area to avoid redundant effort in developing features that already exist, or implementing ideas already proven to be obsolete in the context of GO. This helps researchers, tool developers, as well as end users, understand the underlying semantic similarity measures implemented through knowledge of pertinent features of, and issues related to, a particular tool. This should empower users to make appropriate choices for their biological applications and ensure effective knowledge discovery based on GO annotations.
|
23
|
Peng J, Li Q, Shang X. Investigations on factors influencing HPO-based semantic similarity calculation. J Biomed Semantics 2017; 8:34. [PMID: 29297376 PMCID: PMC5763495 DOI: 10.1186/s13326-017-0144-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Background Although disease diagnosis has greatly benefited from next generation sequencing technologies, it is still difficult to make the right diagnosis purely based on sequencing technologies for many diseases with complex phenotypes and high genetic heterogeneity. Recently, calculating Human Phenotype Ontology (HPO)-based phenotype semantic similarity has contributed substantially to disease diagnosis. However, factors which affect the accuracy of HPO-based semantic similarity have not been evaluated systematically. Results In this study, we proposed a new framework called HPOFactor to evaluate these factors. Our model includes four components: (1) the size of the annotation set, (2) the evidence code of annotations, (3) the quality of annotations and (4) the coverage of annotations. Conclusions HPOFactor analyzes the four factors systematically based on two kinds of experiments: causative gene prediction and disease prediction. Furthermore, semantic similarity measurement could be designed based on the characteristics of these factors.
Affiliation(s)
- Jiajie Peng
- Northwestern Polytechnical University, 127 West Youyi Road, Xi'an, 710072, China
| | - Qianqian Li
- Northwestern Polytechnical University, 127 West Youyi Road, Xi'an, 710072, China
| | - Xuequn Shang
- Northwestern Polytechnical University, 127 West Youyi Road, Xi'an, 710072, China.
| |
|
24
|
Nap JP, Sanchez-Perez GF, van Dijk ADJ. Similarities between plant traits based on their connection to underlying gene functions. PLoS One 2017; 12:e0182097. [PMID: 28797052 PMCID: PMC5552327 DOI: 10.1371/journal.pone.0182097] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/18/2017] [Accepted: 07/12/2017] [Indexed: 11/19/2022] Open
Abstract
Understanding of phenotypes and their genetic basis is a major focus in current plant biology. Large amounts of phenotype data are being generated, both for macroscopic phenotypes such as size or yield, and for molecular phenotypes such as expression levels and metabolite levels. More insight in the underlying genetic and molecular mechanisms that influence phenotypes will enable a better understanding of how various phenotypes are related to each other. This will be a major step forward in understanding plant biology, with immediate value for plant breeding and academic plant research. Currently the genetic basis of most phenotypes remains however to be discovered, and the relatedness of different traits is unclear. We here present a novel approach to connect phenotypes to underlying biological processes and molecular functions. These connections define similarities between different types of phenotypes. The approach starts by using Quantitative Trait Locus (QTL) data, which are abundantly available for many phenotypes of interest. Overrepresentation analysis of gene functions based on Gene Ontology term enrichment across multiple QTL regions for a given phenotype, be it macroscopic or molecular, results in a small set of biological processes and molecular functions for each phenotype. Subsequently, similarity between different phenotypes can be defined in terms of these gene functions. Using publicly available rice data as example, a close relationship with defined molecular phenotypes is demonstrated for many macroscopic phenotypes. This includes for example a link between 'leaf senescence' and 'aspartic acid', as well as between 'days to maturity' and 'choline'. Relationships between macroscopic and molecular phenotypes may result in more efficient marker-assisted breeding and are likely to direct future research aimed at a better understanding of plant phenotypes.
Affiliation(s)
- Jan-Peter Nap
- Applied Bioinformatics, Wageningen University & Research, Droevendaalsesteeg 1, PB Wageningen, The Netherlands
| | - Gabino F. Sanchez-Perez
- Applied Bioinformatics, Wageningen University & Research, Droevendaalsesteeg 1, PB Wageningen, The Netherlands
- Laboratory of Bioinformatics, Wageningen University & Research, Droevendaalsesteeg 1, PB Wageningen, The Netherlands
| | - Aalt D. J. van Dijk
- Applied Bioinformatics, Wageningen University & Research, Droevendaalsesteeg 1, PB Wageningen, The Netherlands
- Laboratory of Bioinformatics, Wageningen University & Research, Droevendaalsesteeg 1, PB Wageningen, The Netherlands
- Biometris, Wageningen University & Research, Droevendaalsesteeg 1, PB Wageningen, The Netherlands
| |
|
25
|
Yu G, Lu C, Wang J. NoGOA: predicting noisy GO annotations using evidences and sparse representation. BMC Bioinformatics 2017; 18:350. [PMID: 28732468 PMCID: PMC5521088 DOI: 10.1186/s12859-017-1764-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/02/2017] [Accepted: 07/14/2017] [Indexed: 01/11/2023] Open
Abstract
BACKGROUND Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Due to resource limitations, only a small portion of annotations are manually checked by curators, and the others are electronically inferred. Although quality control techniques have been applied to ensure the quality of annotations, the community consistently reports that there are still considerable noisy (or incorrect) annotations. Given the wide application of annotations, however, how to identify noisy annotations is an important yet seldom studied open problem. RESULTS We introduce a novel approach called NoGOA to predict noisy annotations. First, NoGOA applies sparse representation on the gene-term association matrix to reduce the impact of noisy annotations, and takes advantage of sparse representation coefficients to measure the semantic similarity between genes. Secondly, it preliminarily predicts noisy annotations of a gene based on aggregated votes from semantic neighborhood genes of that gene. Next, NoGOA estimates the ratio of noisy annotations for each evidence code based on direct annotations in GOA files archived on different periods, and then weights entries of the association matrix via estimated ratios and propagates weights to ancestors of direct annotations using GO hierarchy. Finally, it integrates evidence-weighted association matrix and aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods and removing noisy annotations improves the performance of gene function prediction. CONCLUSIONS The comparative study justifies the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations.
Code and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA.
Affiliation(s)
- Guoxian Yu
- College of Computer and Information Sciences, Southwest University, Chongqing, China.
| | - Chang Lu
- College of Computer and Information Sciences, Southwest University, Chongqing, China
| | - Jun Wang
- College of Computer and Information Sciences, Southwest University, Chongqing, China
| |
|
26
|
A cytosolic Ezh1 isoform modulates a PRC2–Ezh1 epigenetic adaptive response in postmitotic cells. Nat Struct Mol Biol 2017; 24:444-452. [DOI: 10.1038/nsmb.3392] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.4] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/16/2016] [Accepted: 02/24/2017] [Indexed: 12/13/2022]
|
27
|
Exploring Approaches for Detecting Protein Functional Similarity within an Orthology-based Framework. Sci Rep 2017; 7:381. [PMID: 28336965 PMCID: PMC5428484 DOI: 10.1038/s41598-017-00465-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/08/2016] [Accepted: 02/28/2017] [Indexed: 11/21/2022] Open
Abstract
Protein functional similarity based on gene ontology (GO) annotations serves as a powerful tool when comparing proteins on a functional level in applications such as protein-protein interaction prediction, gene prioritization, and disease gene discovery. Functional similarity (FS) is usually quantified by combining the GO hierarchy with an annotation corpus that links genes and gene products to GO terms. One large group of algorithms involves calculation of GO term semantic similarity (SS) between all the terms annotating the two proteins, followed by a second step, described as “mixing strategy”, which involves combining the SS values to yield the final FS value. Due to the variability of protein annotation caused e.g. by annotation bias, this value cannot be reliably compared on an absolute scale. We therefore introduce a similarity z-score that takes into account the FS background distribution of each protein. For a selection of popular SS measures and mixing strategies we demonstrate moderate accuracy improvement when using z-scores in a benchmark that aims to separate orthologous cases from random gene pairs and discuss in this context the impact of annotation corpus choice. The approach has been implemented in Frela, a fast high-throughput public web server for protein FS calculation and interpretation.
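The z-score idea in the abstract above can be sketched as follows. This is only a minimal illustration under stated assumptions: the abstract does not give the exact formula, so standardizing a pair's FS value against each protein's own background FS distribution and then averaging the two z-scores is an assumed choice, and the function names `z_score` and `pair_z_score` are hypothetical.

```python
from statistics import mean, pstdev


def z_score(value, background):
    """Standardize `value` against a background list of FS values."""
    mu = mean(background)
    sigma = pstdev(background)  # population standard deviation
    if sigma == 0:
        return 0.0  # degenerate background: no spread to scale by
    return (value - mu) / sigma


def pair_z_score(fs_ab, background_a, background_b):
    """Assumed combination: average of the z-scores seen from each protein.

    background_a holds FS values between protein A and a reference set of
    proteins; background_b holds the same for protein B.
    """
    return 0.5 * (z_score(fs_ab, background_a) + z_score(fs_ab, background_b))
```

With this kind of rescaling, a raw FS value of 0.8 counts as strong evidence for a protein whose background values are typically low, and as weak evidence for a heavily annotated protein whose background values are typically high.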
|
28
|
Abstract
Gene Ontology-based semantic similarity (SS) allows the comparison of GO terms or entities annotated with GO terms, by leveraging the ontology structure and properties and annotation corpora. In the last decade the number and diversity of SS measures based on GO has grown considerably, and their applications range from functional coherence evaluation to protein interaction prediction and disease gene prioritization. Understanding how SS measures work, what issues can affect their performance and how they compare to each other in different evaluation settings is crucial to gain a comprehensive view of this area and choose the most appropriate approaches for a given application. In this chapter, we provide a guide to understanding and selecting SS measures for biomedical researchers. We present a straightforward categorization of SS measures and describe the main strategies they employ. We discuss the intrinsic and external issues that affect their performance, and how these can be addressed. We summarize comparative assessment studies, highlighting the top measures in different settings, and compare different implementation strategies and their use. Finally, we discuss some of the extant challenges and opportunities, namely the increased semantic complexity of GO and the need for fast and efficient computation, pointing the way towards the future generation of SS measures.
Affiliation(s)
- Catia Pesquita
  - LaSIGE, Faculdade de Ciências, Universidade de Lisboa, Edifício C6, Piso 3, Campo Grande, 1749-016, Lisbon, Portugal
|
29
|
Peng J, Li H, Liu Y, Juan L, Jiang Q, Wang Y, Chen J. InteGO2: a web tool for measuring and visualizing gene semantic similarities using Gene Ontology. BMC Genomics 2016; 17 Suppl 5:530. [PMID: 27586009 PMCID: PMC5009821 DOI: 10.1186/s12864-016-2828-6] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Indexed: 11/10/2022]
Abstract
BACKGROUND: The Gene Ontology (GO) has been used in high-throughput omics research as a major bioinformatics resource. The hierarchical structure of GO provides users a convenient platform for biological information abstraction and hypothesis testing. Computational methods have been developed to identify functionally similar genes. However, none of the existing measurements take into account all the rich information in GO. Similarly, using these existing methods, web-based applications have been constructed to compute gene functional similarities and to provide pure text-based outputs. Without a graphical visualization interface, the results are difficult to interpret.
RESULTS: We present InteGO2, a web tool that allows researchers to calculate the GO-based gene semantic similarities using seven widely used GO-based similarity measurements. Also, we provide an integrative measurement that synergistically integrates all the individual measurements to improve the overall performance. Using HTML5 and cytoscape.js, we provide a graphical interface in InteGO2 to visualize the resulting gene functional association networks.
CONCLUSIONS: InteGO2 is an easy-to-use HTML5 based web tool. With it, researchers can measure gene or gene product functional similarity conveniently, and visualize the network of functional interactions in a graphical interface. InteGO2 can be accessed via http://mlg.hit.edu.cn:8089/.
Affiliation(s)
- Jiajie Peng
  - School of Computer Science, Northwestern Polytechnical University, Xi'an, China
  - Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, 48824, MI, USA
- Hongxiang Li
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Yongzhuang Liu
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Liran Juan
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Qinghua Jiang
  - School of Life Science and Technology, Harbin Institute of Technology, Harbin, China
- Yadong Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jin Chen
  - Department of Energy Plant Research Laboratory, Michigan State University, East Lansing, 48824, MI, USA
  - Department of Computer Science and Engineering, Michigan State University, East Lansing, 48824, MI, USA
|
30
|
Fu G, Wang J, Yang B, Yu G. NegGOA: negative GO annotations selection using ontology structure. Bioinformatics 2016; 32:2996-3004. [PMID: 27318205 DOI: 10.1093/bioinformatics/btw366] [Citation(s) in RCA: 23] [Impact Index Per Article: 2.6] [Received: 02/25/2016] [Accepted: 06/01/2016] [Indexed: 11/14/2022]
Abstract
MOTIVATION: Predicting the biological functions of proteins is one of the key challenges in the post-genomic era. Computational models have demonstrated the utility of applying machine learning methods to predict protein function. Most prediction methods explicitly require a set of negative examples, that is, proteins that are known not to carry out a particular function. However, Gene Ontology (GO) almost always only provides the knowledge that proteins carry out a particular function, and functional annotations of proteins are incomplete. GO structurally organizes tens of thousands of GO terms, and a protein is annotated with several (or dozens) of these terms. For these reasons, the negative examples of a protein can greatly help distinguish true positive examples of the protein from such a large candidate GO space.
RESULTS: In this paper, we present a novel approach (called NegGOA) to select negative examples. Specifically, NegGOA takes advantage of the ontology structure, available annotations and potentiality of additional annotations of a protein to choose negative examples of the protein. We compare NegGOA with other negative example selection algorithms and find that NegGOA produces far fewer false negatives than they do. We incorporate the selected negative examples into an efficient function prediction model to predict the functions of proteins in Yeast, Human, Mouse and Fly. NegGOA also demonstrates improved accuracy over these competing algorithms across various evaluation metrics. In addition, NegGOA suffers less from incomplete annotations of proteins than these competing methods.
AVAILABILITY AND IMPLEMENTATION: The Matlab and R codes are available at https://sites.google.com/site/guoxian85/neggoa
CONTACT: gxyu@swu.edu.cn
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Guangyuan Fu
  - College of Computer and Information Science, Southwest University, Chongqing 400715, China
- Jun Wang
  - College of Computer and Information Science, Southwest University, Chongqing 400715, China
- Bo Yang
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
- Guoxian Yu
  - College of Computer and Information Science, Southwest University, Chongqing 400715, China
  - Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun 130012, China
|
31
|
Gazestani VH, Nikpour N, Mehta V, Najafabadi HS, Moshiri H, Jardim A, Salavati R. A Protein Complex Map of Trypanosoma brucei. PLoS Negl Trop Dis 2016; 10:e0004533. [PMID: 26991453 PMCID: PMC4798371 DOI: 10.1371/journal.pntd.0004533] [Citation(s) in RCA: 14] [Impact Index Per Article: 1.6] [Received: 09/21/2015] [Accepted: 02/20/2016] [Indexed: 12/27/2022]
Abstract
The functions of the majority of trypanosomatid-specific proteins are unknown, hindering our understanding of the biology and pathogenesis of Trypanosomatida. While protein-protein interactions are highly informative about protein function, a global map of protein interactions and complexes is still lacking for these important human parasites. Here, benefiting from in-depth biochemical fractionation, we systematically interrogated the co-complex interactions of more than 3354 protein groups in procyclic life stage of Trypanosoma brucei, the protozoan parasite responsible for human African trypanosomiasis. Using a rigorous methodology, our analysis led to identification of 128 high-confidence complexes encompassing 716 protein groups, including 635 protein groups that lacked experimental annotation. These complexes correlate well with known pathways as well as for proteins co-expressed across the T. brucei life cycle, and provide potential functions for a large number of previously uncharacterized proteins. We validated the functions of several novel proteins associated with the RNA-editing machinery, identifying a candidate potentially involved in the mitochondrial post-transcriptional regulation of T. brucei. Our data provide an unprecedented view of the protein complex map of T. brucei, and serve as a reliable resource for further characterization of trypanosomatid proteins. The presented results in this study are available at: www.TrypsNetDB.org. Due to high evolutionary divergence of trypanosomatid pathogens from other eukaryotes, accurate prediction of functional roles for most of their proteins is not feasible based on homology-based approaches. Although protein co-complex maps provide a compelling tool for the functional annotation of proteins, as subunits of a complex are expected to be involved in similar biological processes, the current knowledge about these maps is still rudimentary. 
Here, we systematically examined the protein co-complex membership of more than one third of the T. brucei proteome using two orthogonal fractionation approaches. A high-confidence network of co-complex relationships predicts the network context of 866 proteins, including many hypothetical and experimentally unannotated proteins. To our knowledge, this study presents the largest proteomics-based interaction map of trypanosomatid parasites to date, providing a useful resource for formulating new biological hypotheses and further experimental leads.
Affiliation(s)
- Vahid H. Gazestani
  - Institute of Parasitology, McGill University, Ste. Anne de Bellevue, Quebec, Canada
- Najmeh Nikpour
  - Institute of Parasitology, McGill University, Ste. Anne de Bellevue, Quebec, Canada
- Vaibhav Mehta
  - Institute of Parasitology, McGill University, Ste. Anne de Bellevue, Quebec, Canada
  - Department of Biochemistry, McGill University, Montreal, Quebec, Canada
- Hamed S. Najafabadi
  - Institute of Parasitology, McGill University, Ste. Anne de Bellevue, Quebec, Canada
  - McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada
- Houtan Moshiri
  - Institute of Parasitology, McGill University, Ste. Anne de Bellevue, Quebec, Canada
  - Department of Biochemistry, McGill University, Montreal, Quebec, Canada
- Armando Jardim
  - Institute of Parasitology, McGill University, Ste. Anne de Bellevue, Quebec, Canada
  - Centre for Host-Parasite Interactions, Institute of Parasitology, McGill University, Ste. Anne de Bellevue, Quebec, Canada
- Reza Salavati
  - Institute of Parasitology, McGill University, Ste. Anne de Bellevue, Quebec, Canada
  - Department of Biochemistry, McGill University, Montreal, Quebec, Canada
  - McGill Centre for Bioinformatics, McGill University, Montreal, Quebec, Canada
|
32
|
Yu G, Fu G, Wang J, Zhu H. Predicting Protein Function via Semantic Integration of Multiple Networks. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2016; 13:220-232. [PMID: 26800544 DOI: 10.1109/tcbb.2015.2459713] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 06/05/2023]
Abstract
Determining the biological functions of proteins is one of the key challenges in the post-genomic era. The rapid accumulation of large volumes of proteomic and genomic data drives the development of computational models for automatically predicting protein function at large scale. Recent approaches focus on integrating multiple heterogeneous data sources, and they often get better results than methods that use a single data source alone. In this paper, we investigate how to integrate multiple biological data sources with biological knowledge, i.e., Gene Ontology (GO), for protein function prediction. We propose a method, called SimNet, to Semantically integrate multiple functional association Networks derived from heterogeneous data sources. SimNet first uses GO annotations of proteins to capture the semantic similarity between proteins and introduces a semantic kernel based on the similarity. Next, SimNet constructs a composite network, obtained as a weighted summation of individual networks, and aligns the network with the kernel to get the weights assigned to individual networks. Then, it applies a network-based classifier on the composite network to predict protein function. Experimental results on heterogeneous proteomic data sources of Yeast, Human, Mouse, and Fly show that SimNet not only achieves better (or comparable) results than other related competitive approaches, but also takes much less time. The Matlab codes of SimNet are available at https://sites.google.com/site/guoxian85/simnet.
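The "weighted summation aligned with a kernel" step can be pictured with a simplified sketch. It is not the SimNet optimization: as a stand-in, each network's weight is assumed proportional to its normalized Frobenius alignment with the semantic kernel, and all function names (`alignment`, `composite_network`) are hypothetical.

```python
from math import sqrt


def frobenius_inner(a, b):
    """Frobenius inner product of two equally sized square matrices (nested lists)."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def alignment(network, kernel):
    """Cosine-style alignment between a network and the semantic kernel."""
    denom = sqrt(frobenius_inner(network, network) * frobenius_inner(kernel, kernel))
    return frobenius_inner(network, kernel) / denom if denom > 0 else 0.0


def composite_network(networks, kernel):
    """Weight each network by its (non-negative) alignment and sum them.

    Assumption: weights are normalized alignments, a heuristic substitute for
    the weights that SimNet learns by optimization.
    """
    scores = [max(alignment(w, kernel), 0.0) for w in networks]
    total = sum(scores)
    n = len(networks)
    weights = [s / total for s in scores] if total > 0 else [1.0 / n] * n
    size = len(kernel)
    composite = [
        [sum(wt * net[i][j] for wt, net in zip(weights, networks)) for j in range(size)]
        for i in range(size)
    ]
    return weights, composite


kernel = [[0, 1], [1, 0]]  # toy semantic kernel over two proteins
net_good = [[0, 1], [1, 0]]  # network that agrees with the kernel
net_bad = [[1, 0], [0, 1]]  # network with no overlap with the kernel
weights, composite = composite_network([net_good, net_bad], kernel)
```

In this toy case the network that matches the kernel receives all the weight, so the composite equals that network; with real data the weights would spread across the sources.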
|
33
|
Faisal FE, Meng L, Crawford J, Milenković T. The post-genomic era of biological network alignment. EURASIP Journal on Bioinformatics & Systems Biology 2015; 2015:3. [PMID: 28194172 PMCID: PMC5270500 DOI: 10.1186/s13637-015-0022-9] [Citation(s) in RCA: 37] [Impact Index Per Article: 3.7] [Received: 01/21/2015] [Accepted: 05/18/2015] [Indexed: 11/10/2022]
Abstract
Biological network alignment aims to find regions of topological and functional (dis)similarities between molecular networks of different species. Then, network alignment can guide the transfer of biological knowledge from well-studied model species to less well-studied species between conserved (aligned) network regions, thus complementing valuable insights that have already been provided by genomic sequence alignment. Here, we review computational challenges behind the network alignment problem, existing approaches for solving the problem, ways of evaluating their alignment quality, and the approaches' biomedical applications. We discuss recent innovative efforts of improving the existing view of network alignment. We conclude with open research questions in comparative biological network research that could further our understanding of principles of life, evolution, disease, and therapeutics.
Affiliation(s)
- Fazle E Faisal
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
  - Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
  - ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
- Lei Meng
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
- Joseph Crawford
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
  - Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
  - ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
- Tijana Milenković
  - Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556 USA
  - Interdisciplinary Center for Network Science and Applications, University of Notre Dame, Notre Dame, IN, 46556 USA
  - ECK Institute for Global Health, University of Notre Dame, Notre Dame, IN, 46556 USA
|
34
|
Mazandu GK, Chimusa ER, Mbiyavanga M, Mulder NJ. A-DaGO-Fun: an adaptable Gene Ontology semantic similarity-based functional analysis tool. Bioinformatics 2015; 32:477-9. [PMID: 26476781 DOI: 10.1093/bioinformatics/btv590] [Citation(s) in RCA: 20] [Impact Index Per Article: 2.0] [Received: 06/11/2015] [Accepted: 10/08/2015] [Indexed: 01/01/2023]
Abstract
SUMMARY: Gene Ontology (GO) semantic similarity measures are being used for biological knowledge discovery based on GO annotations by integrating biological information contained in the GO structure into data analyses. To empower users to quickly compute, manipulate and explore these measures, we introduce A-DaGO-Fun (ADaptable Gene Ontology semantic similarity-based Functional analysis). It is a portable software package integrating all known GO information content-based semantic similarity measures and relevant biological applications associated with these measures. A-DaGO-Fun has the advantage not only of handling datasets from the current high-throughput genome-wide applications, but also allowing users to choose the most relevant semantic similarity approach for their biological applications and to adapt a given module to their needs.
AVAILABILITY AND IMPLEMENTATION: A-DaGO-Fun is freely available to the research community at http://web.cbio.uct.ac.za/ITGOM/adagofun. It is implemented in Linux using Python under free software (GNU General Public Licence).
CONTACT: gmazandu@cbio.uct.ac.za or Nicola.Mulder@uct.ac.za
SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Gaston K Mazandu
  - Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
  - African Institute for Mathematical Sciences (AIMS), Cape Town, South Africa and Cape Coast, Ghana
- Emile R Chimusa
  - Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Mamana Mbiyavanga
  - Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
- Nicola J Mulder
  - Computational Biology Group, Department of Integrative Biomedical Sciences, Institute of Infectious Disease and Molecular Medicine, University of Cape Town, Cape Town, South Africa
|
35
|
Nepomuceno JA, Troncoso A, Nepomuceno-Chamorro IA, Aguilar-Ruiz JS. Integrating biological knowledge based on functional annotations for biclustering of gene expression data. Computer Methods and Programs in Biomedicine 2015; 119:163-80. [PMID: 25843807 DOI: 10.1016/j.cmpb.2015.02.010] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.3] [Received: 07/22/2014] [Revised: 02/17/2015] [Accepted: 02/27/2015] [Indexed: 05/06/2023]
Abstract
Gene expression data analysis is based on the assumption that co-expressed genes imply co-regulated genes. This assumption is being reformulated because the co-expression of a group of genes may be the result of an independent activation with respect to the same experimental condition and not due to the same regulatory regime. For this reason, traditional techniques have recently been improved with the use of prior biological knowledge from open-access repositories together with gene expression data. Biclustering is an unsupervised machine learning technique that searches for patterns in gene expression data matrices. A scatter search-based biclustering algorithm that integrates biological information is proposed in this paper. In addition to the gene expression data matrix, the input of the algorithm is only a direct annotation file that relates each gene to a set of terms from a biological repository where genes are annotated. Two different biological measures, FracGO and SimNTO, are proposed to integrate this information by adding it to the fitness function optimized in the scatter search scheme. The measure FracGO is based on biological enrichment and SimNTO is based on the overlap among GO annotations of pairs of genes. Experimental results evaluate the proposed algorithm for two datasets and show that the algorithm performs better when biological knowledge is integrated. Moreover, the analysis and comparison between the two different biological measures is presented, and it is concluded that the differences depend on both the data source and how the annotation file has been built when GO is used. It is also shown that the proposed algorithm obtains a greater number of enriched biclusters than other classical biclustering algorithms typically used as benchmarks, and an analysis of the overlap among biclusters reveals that the biclusters obtained have low overlap. The proposed methodology is a general-purpose algorithm which allows the integration of biological information from several sources and can be extended to other biclustering algorithms based on the optimization of a merit function.
Affiliation(s)
- Juan A Nepomuceno
  - Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, 41012 Seville, Spain
- Alicia Troncoso
  - Department of Computer Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013 Seville, Spain
- Isabel A Nepomuceno-Chamorro
  - Departamento de Lenguajes y Sistemas Informáticos, Universidad de Sevilla, Avd. Reina Mercedes s/n, 41012 Seville, Spain
- Jesús S Aguilar-Ruiz
  - Department of Computer Engineering, Pablo de Olavide University, Ctra. Utrera km. 1, 41013 Seville, Spain
|