Reference Citation Analysis

For: Teng Z, Guo M, Liu X, Dai Q, Wang C, Xuan P. Measuring gene functional similarity based on group-wise comparison of GO terms. Bioinformatics 2013;29:1424-32. [PMID: 23572412 DOI: 10.1093/bioinformatics/btt160] [Citation(s) in RCA: 72] [Impact Index Per Article: 6.0] [Indexed: 11/13/2022]

Cited by Other Article(s)

Mongardi S, Cascianelli S, Masseroli M. Biologically weighted LASSO: enhancing functional interpretability in gene expression data analysis. Bioinformatics 2024;40:btae605. [PMID: 39412436 PMCID: PMC11639179 DOI: 10.1093/bioinformatics/btae605] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/14/2024] [Revised: 09/12/2024] [Accepted: 10/14/2024] [Indexed: 11/01/2024]

Jiang H, Wang Y, Yin C, Pan H, Chen L, Feng K, Chang Y, Sun H. SLIVER: Unveiling large scale gene regulatory networks of single-cell transcriptomic data through causal structure learning and modules aggregation. Comput Biol Med 2024;178:108690. [PMID: 38879931 DOI: 10.1016/j.compbiomed.2024.108690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Revised: 05/19/2024] [Accepted: 06/01/2024] [Indexed: 06/18/2024]

Abstract

Prevalent Gene Regulatory Network (GRN) construction methods rely on generalized correlation analysis. However, in biological systems, regulation is essentially a causal relationship that cannot be adequately captured by correlation alone, so it is more reasonable to infer GRNs from a causal perspective. Existing causal discovery algorithms typically model causal relationships with Directed Acyclic Graphs (DAGs), but they often require traversing the entire network. As a result, their computational demands grow sharply with the number of nodes, which limits them to small networks of one or two hundred nodes or fewer. In this study, we propose the SLIVER (cauSaL dIscovery Via dimEnsionality Reduction) algorithm, which integrates a causal structural equation model with graph decomposition. SLIVER introduces a set of factor nodes that serve as abstractions of different functional modules and integrate the regulatory relationships between genes according to their functions or pathways, thus reducing the GRN to the product of two low-dimensional matrices. We then employ the structural causal model (SCM) to learn the GRN within the gene node space, enforce the DAG constraint in the low-dimensional space, and guide each factor to aggregate different functions through cosine similarity. We evaluate SLIVER on 12 real single-cell transcriptomic datasets and show that it outperforms 12 other widely used methods in both GRN inference performance and computational resource usage. Analysis of the gene information integrated by the factor nodes also supports a biological interpretation of these nodes in GRNs. Finally, we apply SLIVER to scRNA-seq data from type 2 diabetes mellitus to capture transcriptional regulatory structural changes in β cells under high insulin demand.
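
The abstract states that SLIVER factors the GRN into two low-dimensional matrices and enforces the DAG constraint in the low-dimensional space. The Python sketch below illustrates one reason such a constraint can be cheap; it is an assumption for illustration, not SLIVER's published formulation. It uses the trace-of-exponential acyclicity score h(A) = tr(e^A) - dim(A) (the score behind NOTEARS-style methods, applied here directly to a nonnegative matrix rather than to an elementwise square) together with the identity tr((UV)^n) = tr((VU)^n). For a nonnegative gene-to-gene matrix W = UV, with U of size d×k and V of size k×d, h(UV) therefore equals h(VU), which needs only k×k products. The helper names and toy factors are hypothetical.

```python
from math import factorial

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Product of an (n x m) matrix and an (m x p) matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def trace(a: Matrix) -> float:
    return sum(a[i][i] for i in range(len(a)))


def acyclicity_score(a: Matrix, terms: int = 20) -> float:
    """Truncated h(A) = tr(exp(A)) - dim(A) = sum_{n>=1} tr(A^n) / n!.

    For a nonnegative A the exact value is zero iff the weighted directed
    graph encoded by A has no cycles (self-loops count as cycles).
    """
    score, power = 0.0, a
    for n in range(1, terms + 1):
        score += trace(power) / factorial(n)  # power holds A^n here
        power = matmul(power, a)
    return score


# Hypothetical toy model: d = 4 genes, k = 2 factor nodes.
U = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 0.0]]      # gene -> factor (d x k)
V_acyclic = [[0.0, 1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]  # factor -> gene (k x d)
V_cyclic = [[0.0, 1.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0]]

for V in (V_acyclic, V_cyclic):
    full = acyclicity_score(matmul(U, V))   # d x d gene graph
    small = acyclicity_score(matmul(V, U))  # k x k factor graph
    print(f"h(UV) = {full:.6f}, h(VU) = {small:.6f}")
```

Both evaluations report zero for the acyclic factors and the same positive value for the cyclic ones, while the second only multiplies 2×2 matrices. A truncated series keeps the sketch dependency-free; a practical implementation would use a matrix-exponential routine inside a gradient-based optimizer.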

Pati SK, Gupta MK, Banerjee A, Mallik S, Zhao Z. PPIGCF: A Protein-Protein Interaction-Based Gene Correlation Filter for Optimal Gene Selection. Genes (Basel) 2023;14:1063. [PMID: 37239423 DOI: 10.3390/genes14051063] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/08/2023] [Revised: 04/26/2023] [Accepted: 05/04/2023] [Indexed: 05/28/2023]

Kartheeswaran KP, Rayan AXA, Varrieth GT. Enhanced disease-disease association with information enriched disease representation. Math Biosci Eng 2023;20:8892-8932. [PMID: 37161227 DOI: 10.3934/mbe.2023391] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/11/2023]

Abstract

OBJECTIVE

Quantifying disease-disease association (DDA) helps in understanding disease relationships, discovering disease progression and identifying comorbidity. For effective calculation of DDA strength, the main challenge in integrating the various biomedical aspects of DDA is obtaining an information-rich disease representation.

MATERIALS AND METHODS

An enhanced, integrated DDA framework is developed that combines an enriched literature-based DDA representation with a concept-based one. The literature component of the proposed framework uses PubMed abstracts and includes an improved neural network model that classifies DDAs to produce an enhanced literature-based DDA representation. Similarly, the ontology component uses a proposed joint multi-source association embedding model built on the Disease Ontology (DO), UMLS, insurance claims, clinical notes and other sources.

RESULTS AND DISCUSSION

The resulting information-rich disease representation is evaluated on DDA datasets covering different aspects, such as Gene, Variant and Gene Ontology (GO), and on a human-rated benchmark dataset. The DDA scores calculated with the proposed method achieve high correlation, particularly on the gene-based dataset. The scores also show an improved correlation of 0.821 when evaluated on 213 human-rated disease pairs. In addition, the generated disease representation is shown to have a substantial effect on the correlation of DDA scores for different categories of disease pairs.
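
The evaluation described above compares computed DDA scores with reference scores, such as the 213 human-rated disease pairs. The abstract does not say which correlation coefficient was used; the Python sketch below assumes Pearson's r and uses hypothetical toy scores, so it shows only the form of the computation, not the paper's data or result.

```python
from math import sqrt


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length >= 2")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Hypothetical toy data: model DDA scores vs. human ratings for five disease pairs.
model_scores = [0.9, 0.7, 0.4, 0.2, 0.1]
human_ratings = [3.8, 3.1, 2.0, 1.5, 0.6]
print(round(pearson(model_scores, human_ratings), 3))
```

If the benchmark ratings are ordinal, a rank-based coefficient (Spearman's rho, i.e. Pearson's r on ranks) may be the more appropriate choice.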

CONCLUSION

The enhanced context- and semantics-based DDA framework provides an enriched disease representation, yielding highly correlated results across different DDA datasets. We also present a biological interpretation of disease pairs. The framework can also be used to derive the strength of other biomedical associations.

Tian Z, Fang H, Teng Z, Ye Y. GOGCN: Graph Convolutional Network on Gene Ontology for Functional Similarity Analysis of Genes. IEEE/ACM Trans Comput Biol Bioinform 2023;20:1053-1064. [PMID: 35687647 DOI: 10.1109/tcbb.2022.3181300] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Indexed: 05/04/2023]

Ismail E, Gad W, Hashem M. HEC-ASD: a hybrid ensemble-based classification model for predicting autism spectrum disorder disease genes. BMC Bioinformatics 2022;23:554. [PMID: 36544099 PMCID: PMC9768984 DOI: 10.1186/s12859-022-05099-7] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2022] [Accepted: 12/06/2022] [Indexed: 12/24/2022]

Tian Z, Peng X, Fang H, Zhang W, Dai Q, Ye Y. MHADTI: predicting drug-target interactions via multiview heterogeneous information network embedding with hierarchical attention mechanisms. Brief Bioinform 2022;23:bbac434. [PMID: 36242566 DOI: 10.1093/bib/bbac434] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 06/05/2022] [Revised: 08/19/2022] [Accepted: 09/08/2022] [Indexed: 12/14/2022]

Abstract

MOTIVATION

Discovering drug-target interactions (DTIs) is a crucial step in drug development, for example in identifying drug side effects and in drug repositioning. Because identifying DTIs through wet-lab experiments is time-consuming and costly, many computational approaches have been proposed and have become an efficient way to infer potential interactions. Although extensive effort has been invested in this task, prediction accuracy still needs to be improved. In particular, heterogeneous network-based approaches do not fully consider the complex structure and rich semantic information of these networks. Predicting DTIs efficiently therefore remains a challenge.

RESULTS

In this study, we develop a novel method, MHADTI, that uses Multiview heterogeneous information network embedding with Hierarchical Attention mechanisms to discover potential Drug-Target Interactions. First, MHADTI constructs different similarity networks for drugs and targets from their multisource information. Combined with the known DTI network, these yield three drug-target heterogeneous information networks (HINs) with different views. Second, MHADTI learns embeddings of drugs and targets from the multiview HINs with hierarchical attention mechanisms, comprising node-level, semantic-level and graph-level attention. Finally, MHADTI uses a multilayer perceptron to predict DTIs from the learned deep feature representations. The hierarchical attention mechanisms take into account the importance of nodes, meta-paths and graphs when learning the feature representations of drugs and targets, which makes the resulting embeddings more comprehensive. Extensive experiments show that MHADTI outperforms other state-of-the-art (SOTA) prediction models. Moreover, analysis of the predictions for selected drugs and targets of interest further indicates that MHADTI has advantages in discovering DTIs.
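
Of the three attention levels listed above, semantic-level attention is the one that weighs meta-paths against each other. The Python sketch below shows a common formulation of it (as used in heterogeneous graph attention networks): each meta-path receives a score q·tanh(Wz + b) averaged over nodes, the scores are softmax-normalized, and each node's embeddings are fused as a weighted sum. MHADTI's exact equations may differ; W, b and q would be learned in practice and are fixed, hypothetical values here.

```python
from math import exp, tanh

Vector = list[float]


def softmax(scores: list[float]) -> list[float]:
    top = max(scores)  # subtract the maximum for numerical stability
    exps = [exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def semantic_attention(path_embeddings: list[list[Vector]], w: list[Vector],
                       b: Vector, q: Vector) -> tuple[list[float], list[Vector]]:
    """Fuse per-meta-path node embeddings with semantic-level attention.

    path_embeddings[p][i] is the embedding of node i under meta-path p.
    Returns (beta, fused): one weight per meta-path and one fused
    embedding per node.
    """
    scores = []
    for nodes in path_embeddings:
        total = 0.0
        for z in nodes:
            hidden = [tanh(sum(wj * zj for wj, zj in zip(row, z)) + bi)
                      for row, bi in zip(w, b)]
            total += sum(qi * hi for qi, hi in zip(q, hidden))
        scores.append(total / len(nodes))  # mean importance of meta-path p
    beta = softmax(scores)
    n_nodes, dim = len(path_embeddings[0]), len(path_embeddings[0][0])
    fused = [[sum(beta[p] * path_embeddings[p][i][j] for p in range(len(beta)))
              for j in range(dim)] for i in range(n_nodes)]
    return beta, fused


# Hypothetical example: two meta-paths, two drugs, 2-dimensional embeddings.
paths = [
    [[1.0, 1.0], [1.0, 1.0]],  # embeddings under meta-path 0
    [[0.0, 0.0], [0.0, 0.0]],  # embeddings under meta-path 1
]
W = [[1.0, 0.0], [0.0, 1.0]]
beta, fused = semantic_attention(paths, W, b=[0.0, 0.0], q=[1.0, 1.0])
print(beta, fused)
```

The meta-path whose embeddings produce larger attention scores receives the larger weight, so the fused embedding leans toward it rather than averaging all views equally.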

AVAILABILITY AND IMPLEMENTATION

https://github.com/pxystudy/MHADTI.

Pati SK, Gupta MK, Shai R, Banerjee A, Ghosh A. Missing value estimation of microarray data using Sim-GAN. Knowl Inf Syst 2022. [DOI: 10.1007/s10115-022-01718-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/16/2022]

Tian Z, Fang H, Ye Y, Zhu Z. A novel gene functional similarity calculation model by utilizing the specificity of terms and relationships in gene ontology. BMC Bioinformatics 2022;23:47. [PMID: 35057740 PMCID: PMC8772239 DOI: 10.1186/s12859-022-04557-6] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/20/2021] [Accepted: 01/03/2022] [Indexed: 11/18/2022]

Pesaranghader A, Matwin S, Sokolova M, Grenier JC, Beiko RG, Hussin J. OUP accepted manuscript. Bioinformatics 2022;38:3051-3061. [PMID: 35536192 PMCID: PMC9154256 DOI: 10.1093/bioinformatics/btac304] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 09/21/2021] [Revised: 02/12/2022] [Indexed: 11/24/2022]

Paul M, Anand A. A New Family of Similarity Measures for Scoring Confidence of Protein Interactions Using Gene Ontology. IEEE/ACM Trans Comput Biol Bioinform 2022;19:19-30. [PMID: 34029194 DOI: 10.1109/tcbb.2021.3083150] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/12/2023]

Nourani E. GoVec: Gene Ontology Representation Learning Using Weighted Heterogeneous Graph and Meta-Path. J Comput Biol 2021;28:1196-1207. [PMID: 34847734 DOI: 10.1089/cmb.2021.0069] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 12/30/2022]

An Efficient Parallelized Ontology Network-Based Semantic Similarity Measure for Big Biomedical Document Clustering. Comput Math Methods Med 2021;2021:7937573. [PMID: 34795792 PMCID: PMC8594978 DOI: 10.1155/2021/7937573] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/27/2021] [Accepted: 10/11/2021] [Indexed: 01/03/2023]

Chen Q, Li Y, Tan K, Qiao Y, Pan S, Jiang T, Chen YPP. Network-based methods for gene function prediction. Brief Funct Genomics 2021;20:249-257. [PMID: 33686431 DOI: 10.1093/bfgp/elab006] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 11/02/2020] [Revised: 01/25/2021] [Accepted: 01/26/2021] [Indexed: 12/23/2022]

Zhou G, Wang J, Zhang X, Guo M, Yu G. Predicting functions of maize proteins using graph convolutional network. BMC Bioinformatics 2020;21:420. [PMID: 33323113 PMCID: PMC7739465 DOI: 10.1186/s12859-020-03745-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Indexed: 11/27/2022]

Abstract

Background

Maize (Zea mays ssp. mays L.) is the most widely grown and highest-yielding crop in the world, as well as an important model organism for fundamental research on gene function. The functions of maize proteins are annotated using the Gene Ontology (GO), which has more than 40,000 terms organized in a directed acyclic graph (DAG). Accurately annotating a maize protein with the relevant GO terms from such a large pool of candidates is a major challenge. Several deep learning models have been proposed to predict protein function, but their effectiveness is unsatisfactory. One major reason is that they make inadequate use of the GO hierarchy.

Results

To use the knowledge encoded in the GO hierarchy, we propose a deep Graph Convolutional Network (GCN)-based model, DeepGOA, to predict GO annotations of proteins. DeepGOA first quantifies the correlations (edges) between GO terms and updates the edge weights of the DAG by leveraging GO annotations and the hierarchy, then learns the semantic representations and latent inter-relations of GO terms by applying a GCN to the updated DAG. Meanwhile, a Convolutional Neural Network (CNN) is used to learn feature representations of amino acid sequences with respect to the semantic representations. DeepGOA then computes the dot product of the two representations, which allows the whole network to be trained end-to-end. Extensive experiments show that DeepGOA can effectively integrate GO structural information with amino acid information and annotate proteins accurately.
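
The two-branch design described above can be sketched as follows: one graph-convolution layer propagates GO term features over the weighted GO graph, and each term's representation is scored against a protein representation by a dot product followed by a sigmoid. This is a minimal Python illustration under stated assumptions, not DeepGOA's implementation: the edge weights, feature matrices and protein vector are hypothetical, the CNN branch is replaced by a fixed vector, and edges are treated as undirected for the usual symmetric normalization D^(-1/2)(A + I)D^(-1/2).

```python
from math import exp, sqrt

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetrize the weighted DAG, add self-loops, apply D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[max(adj[i][j], adj[j][i]) + (1.0 if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj_norm: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One graph-convolution layer: ReLU(A_norm @ H @ W)."""
    return [[max(0.0, v) for v in row] for row in matmul(matmul(adj_norm, h), w)]


def annotation_scores(term_repr: Matrix, protein_repr: list[float]) -> list[float]:
    """Sigmoid of the dot product between each GO term and the protein vector."""
    return [1.0 / (1.0 + exp(-sum(t * p for t, p in zip(row, protein_repr))))
            for row in term_repr]


# Hypothetical 3-term GO fragment: terms 1 and 2 are children of term 0.
adj = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.5, 0.0, 0.0]]  # child -> parent weights
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # one-hot term features
weights = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.6]]  # would be learned in practice
adj_norm = normalize_adjacency(adj)
terms = gcn_layer(adj_norm, features, weights)
protein = [0.8, -0.5]  # stand-in for the CNN sequence representation
print(annotation_scores(terms, protein))
```

Because the final score is a plain dot product, gradients flow into both the GO-graph branch and the sequence branch, which is what makes end-to-end training possible.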

Conclusions

Experiments on the maize PH207 inbred line and a human protein sequence dataset show that DeepGOA outperforms state-of-the-art deep learning-based methods. The ablation study shows that the GCN can exploit the knowledge in GO and boost performance. Code and datasets are available at http://mlda.swu.edu.cn/codes.php?name=DeepGOA.

Zhong X, Rajapakse JC. Graph embeddings on gene ontology annotations for protein-protein interaction prediction. BMC Bioinformatics 2020;21:560. [PMID: 33323115 PMCID: PMC7739483 DOI: 10.1186/s12859-020-03816-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 10/05/2020] [Accepted: 10/13/2020] [Indexed: 01/15/2023]

Parraga-Alava J, Inostroza-Ponta M. Influence of the GO-based semantic similarity measures in multi-objective gene clustering algorithm performance. J Bioinform Comput Biol 2020;18:2050038. [PMID: 33148094 DOI: 10.1142/s0219720020500389] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/18/2022]

Li JK, Li L, Li W, Wang Z, Gao F, Hu FY, Zhang S, Qu SF, Huang J, Wang LS, Wu JH, Chen F. Panel-based targeted exome sequencing reveals novel candidate susceptibility loci for age-related cataracts in Chinese Cohort. Mol Genet Genomic Med 2020;8:e1218. [PMID: 32337810 PMCID: PMC7336732 DOI: 10.1002/mgg3.1218] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 12/24/2019] [Revised: 02/05/2020] [Accepted: 02/25/2020] [Indexed: 01/19/2023]

Abstract

BACKGROUND

Age-related cataract (ARC) is the most common blinding eye disease worldwide, and its age of onset is trending younger. However, the genetic factors and mechanisms underlying ARC are not fully understood. The aim of this study was to further clarify the relationship between ARC and genetic mechanisms in East Asian populations and to elucidate its pathogenesis.

METHODS

The study enrolled 191 patients with sporadic cataract and 208 healthy controls from the eastern provinces of China, with an average age of about 60 years. All participants underwent a comprehensive ophthalmic clinical examination, and peripheral blood samples were collected for genomic DNA extraction. Mutations in 792 candidate genes were screened by targeted capture and high-throughput sequencing to improve understanding of the disease.

RESULTS

We identified novel candidate susceptibility genes that may act as susceptibility factors increasing the incidence of age-related cataracts. Three novel loci are significantly associated with age-related cataracts: rs129882 in DBH (p = 5.27E-07, odds ratio = 3.9), rs1800280 in DMD (p = 2.85E-06, odds ratio = 1.4) and rs2871776 in ATP13A2 (p = 4.18E-05, odds ratio = 0.04). Gene-gene interaction analysis revealed that the most significant interactions were between DBH and TUB (rs17847537 in TUB and rs129882 in DBH, p = 2.12E-14) and between DBH and DMD (rs1800280 in DMD and rs129882 in DBH, p = 2.12E-14). Pathway analysis shows that the most significant processes are response to light stimulus (adjusted p = 5.56E-03), response to radiation (adjusted p = 5.56E-03) and response to abiotic stimulus (adjusted p = 5.56E-03). eQTL analysis shows that DBH rs129882 could regulate the expression of DBH mRNA in various tissues, including the retina.
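
The odds ratios above compare the odds of carrying an allele in cases with the odds in controls: values above 1 mean the allele is more common among cases, and values below 1 (such as 0.04 for rs2871776) mean it is less common. The Python sketch below computes an allelic odds ratio and a Woolf (log-based) 95% confidence interval from a 2×2 table. The counts are hypothetical, and the abstract does not state whether the reported odds ratios came from this simple calculation or from an adjusted model.

```python
from math import exp, log, sqrt


def odds_ratio(case_alt: int, case_ref: int, ctrl_alt: int, ctrl_ref: int,
               z: float = 1.96) -> tuple[float, float, float]:
    """Allelic odds ratio with a Woolf confidence interval.

    Arguments are allele counts in a 2x2 table (cases/controls x alt/ref).
    All counts must be positive; z = 1.96 gives an approximate 95% interval.
    """
    or_ = (case_alt * ctrl_ref) / (case_ref * ctrl_alt)
    se = sqrt(1 / case_alt + 1 / case_ref + 1 / ctrl_alt + 1 / ctrl_ref)
    return or_, exp(log(or_) - z * se), exp(log(or_) + z * se)


# Hypothetical counts, not data from the study.
estimate, low, high = odds_ratio(case_alt=30, case_ref=70, ctrl_alt=10, ctrl_ref=90)
print(f"OR = {estimate:.2f} (95% CI {low:.2f}-{high:.2f})")
```

The interval is symmetric on the log scale, which is why an odds ratio far below 1 can still be highly significant.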

CONCLUSION

Our study indicates that the rs129882 and rs1800280 loci are associated with age-related cataracts, expanding the gene map of the disease.

Affiliation(s)

Jian-Kang Li Dept of Computer Science, City University of Hong Kong, Kowloon, Hong Kong; BGI-Shenzhen, Shenzhen, China; Guangdong Provincial Key Laboratory of Human Disease Genomics, Shenzhen Key Laboratory of Genomics, BGI-Shenzhen, Shanghai, China
Li-Li Li National Institutes for Food and Drug Control (NIFDC), Beijing, P. R. China
Wei Li BGI-Shenzhen, Shenzhen, China; Guangdong Provincial Key Laboratory of Human Disease Genomics, Shenzhen Key Laboratory of Genomics, BGI-Shenzhen, Shanghai, China; BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China
Zi-Wei Wang BGI-Shenzhen, Shenzhen, China; BGI Education Center, University of Chinese Academy of Sciences, Shenzhen, China
Feng-Juan Gao Eye Institute, Eye and ENT Hospital, College of Medicine, Fudan University, Shanghai, China; Shanghai Key Laboratory of Visual Impairment and Restoration, Science and Technology Commission of Shanghai Municipality, Shanghai, China; Key Laboratory of Myopia, Ministry of Health, Shanghai, China
Fang-Yuan Hu Eye Institute, Eye and ENT Hospital, College of Medicine, Fudan University, Shanghai, China; Shanghai Key Laboratory of Visual Impairment and Restoration, Science and Technology Commission of Shanghai Municipality, Shanghai, China; Key Laboratory of Myopia, Ministry of Health, Shanghai, China
Sheng-Hai Zhang Eye Institute, Eye and ENT Hospital, College of Medicine, Fudan University, Shanghai, China; Shanghai Key Laboratory of Visual Impairment and Restoration, Science and Technology Commission of Shanghai Municipality, Shanghai, China; Key Laboratory of Myopia, Ministry of Health, Shanghai, China
Shou-Fang Qu National Institutes for Food and Drug Control (NIFDC), Beijing, P. R. China
Jie Huang National Institutes for Food and Drug Control (NIFDC), Beijing, P. R. China
Lu-Sheng Wang Dept of Computer Science, City University of Hong Kong, Kowloon, Hong Kong; BGI-Shenzhen, Shenzhen, China
Ji-Hong Wu Eye Institute, Eye and ENT Hospital, College of Medicine, Fudan University, Shanghai, China; Shanghai Key Laboratory of Visual Impairment and Restoration, Science and Technology Commission of Shanghai Municipality, Shanghai, China; Key Laboratory of Myopia, Ministry of Health, Shanghai, China
Fang Chen BGI-Shenzhen, Shenzhen, China; Guangdong Provincial Key Laboratory of Human Disease Genomics, Shenzhen Key Laboratory of Genomics, BGI-Shenzhen, Shanghai, China

Zhao Y, Wang J, Chen J, Zhang X, Guo M, Yu G. A Literature Review of Gene Function Prediction by Modeling Gene Ontology. Front Genet 2020;11:400. [PMID: 32391061 PMCID: PMC7193026 DOI: 10.3389/fgene.2020.00400] [Citation(s) in RCA: 48] [Impact Index Per Article: 9.6] [Received: 01/02/2020] [Accepted: 03/30/2020] [Indexed: 12/14/2022]

GO2Vec: transforming GO terms and proteins to vector representations via graph embeddings. BMC Genomics 2019;20:918. [PMID: 31874639 PMCID: PMC8424702 DOI: 10.1186/s12864-019-6272-2] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 10/29/2019] [Accepted: 11/12/2019] [Indexed: 12/13/2022]

Cockroft NT, Cheng X, Fuchs JR. STarFish: A Stacked Ensemble Target Fishing Approach and its Application to Natural Products. J Chem Inf Model 2019;59:4906-4920. [PMID: 31589422 DOI: 10.1021/acs.jcim.9b00489] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.3] [Indexed: 12/22/2022]

Abstract

Target fishing is the process of identifying the protein target of a bioactive small molecule. To do so experimentally requires a significant investment of time and resources, which can be expedited with a reliable computational target fishing model. The development of computational target fishing models using machine learning has become very popular over the last several years because of the increased availability of large amounts of public bioactivity data. Unfortunately, the applicability and performance of such models for natural products has not yet been comprehensively assessed. This is, in part, due to the relative lack of bioactivity data available for natural products compared to synthetic compounds. Moreover, the databases commonly used to train such models do not annotate which compounds are natural products, which makes the collection of a benchmarking set difficult. To address this knowledge gap, a data set composed of natural product structures and their associated protein targets was generated by cross-referencing 20 publicly available natural product databases with the bioactivity database ChEMBL. This data set contains 5589 compound-target pairs for 1943 unique compounds and 1023 unique targets. A synthetic data set comprising 107 190 compound-target pairs for 88 728 unique compounds and 1907 unique targets was used to train k-nearest neighbors, random forest, and multilayer perceptron models. The predictive performance of each model was assessed by stratified 10-fold cross-validation and benchmarking on the newly collected natural product data set. Strong performance was observed for each model during cross-validation with area under the receiver operating characteristic (AUROC) scores ranging from 0.94 to 0.99 and Boltzmann-enhanced discrimination of receiver operating characteristic (BEDROC) scores from 0.89 to 0.94. 
When tested on the natural product data set, performance dramatically decreased with AUROC scores ranging from 0.70 to 0.85 and BEDROC scores from 0.43 to 0.59. However, the implementation of a model stacking approach, which uses logistic regression as a meta-classifier to combine model predictions, dramatically improved the ability to correctly predict the protein targets of natural products and increased the AUROC score to 0.94 and BEDROC score to 0.73. This stacked model was deployed as a web application, called STarFish, and has been made available for use to aid in target identification for natural products.
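
The stacking step described above feeds the probabilities produced by the base models (k-nearest neighbors, random forest and multilayer perceptron) into a logistic regression meta-classifier. The Python sketch below fits such a meta-classifier by batch gradient descent on hypothetical base-model outputs; it illustrates the technique and is not STarFish's code. In a real pipeline the meta-classifier should be trained on out-of-fold predictions, so that it does not learn from base models that have already seen the same labels.

```python
from math import exp


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + exp(-z))


def fit_meta_classifier(base_preds: list[list[float]], labels: list[int],
                        lr: float = 0.5, epochs: int = 2000) -> tuple[list[float], float]:
    """Fit a logistic regression meta-classifier by batch gradient descent.

    base_preds[i] holds the probability each base model assigns to example i.
    """
    n, n_features = len(labels), len(base_preds[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in zip(base_preds, labels):
            # gradient of the log-loss: (predicted probability - label) * input
            err = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


def predict(weights: list[float], bias: float, x: list[float]) -> float:
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Hypothetical out-of-fold probabilities from three base models (kNN, RF, MLP).
X = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.6], [0.7, 0.6, 0.9],
     [0.2, 0.3, 0.1], [0.1, 0.2, 0.3], [0.3, 0.1, 0.2]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_meta_classifier(X, y)
print(predict(w, b, [0.85, 0.7, 0.8]), predict(w, b, [0.2, 0.25, 0.15]))
```

The learned weights indicate how much the meta-classifier trusts each base model, which is how stacking can outperform any single model when their errors differ.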

Yang Y, Fu X, Qu W, Xiao Y, Shen HB. MiRGOFS: a GO-based functional similarity measurement for miRNAs, with applications to the prediction of miRNA subcellular localization and miRNA-disease association. Bioinformatics 2018;34:3547-3556. [PMID: 29718114 DOI: 10.1093/bioinformatics/bty343] [Citation(s) in RCA: 45] [Impact Index Per Article: 7.5] [Received: 11/05/2017] [Accepted: 04/26/2018] [Indexed: 01/22/2023]

Xue H, Peng J, Shang X. Predicting disease-related phenotypes using an integrated phenotype similarity measurement based on HPO. BMC Syst Biol 2019;13:34. [PMID: 30953559 PMCID: PMC6449884 DOI: 10.1186/s12918-019-0697-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Indexed: 11/13/2022]

Rajakovich LJ, Pandelia ME, Mitchell AJ, Chang WC, Zhang B, Boal AK, Krebs C, Bollinger JM. A New Microbial Pathway for Organophosphonate Degradation Catalyzed by Two Previously Misannotated Non-Heme-Iron Oxygenases. Biochemistry 2019;58:1627-1647. [PMID: 30789718 PMCID: PMC6503667 DOI: 10.1021/acs.biochem.9b00044] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.8] [Indexed: 01/28/2023]

Abstract

The assignment of biochemical functions to hypothetical proteins is challenged by functional diversification within many protein structural superfamilies. This diversification, which is particularly common for metalloenzymes, renders functional annotations that are founded solely on sequence and domain similarities unreliable and often erroneous. Definitive biochemical characterization to delineate functional subgroups within these superfamilies will aid in improving bioinformatic approaches for functional annotation. We describe here the structural and functional characterization of two non-heme-iron oxygenases, TmpA and TmpB, which are encoded by a genomically clustered pair of genes found in more than 350 species of bacteria. TmpA and TmpB are functional homologues of a pair of enzymes (PhnY and PhnZ) that degrade 2-aminoethylphosphonate but instead act on its naturally occurring, quaternary ammonium analogue, 2-(trimethylammonio)ethylphosphonate (TMAEP). TmpA, an iron(II)- and 2-(oxo)glutarate-dependent oxygenase misannotated as a γ-butyrobetaine (γbb) hydroxylase, shows no activity toward γbb but efficiently hydroxylates TMAEP. The product, (R)-1-hydroxy-2-(trimethylammonio)ethylphosphonate [(R)-OH-TMAEP], then serves as the substrate for the second enzyme, TmpB. Contrary to its purported phosphohydrolytic activity, TmpB is an HD-domain oxygenase that uses a mixed-valent diiron cofactor to enact oxidative cleavage of the C-P bond of its substrate, yielding glycine betaine and phosphate. The high specificities of TmpA and TmpB for their N-trimethylated substrates suggest that they have evolved specifically to degrade TMAEP, which was not previously known to be subject to microbial catabolism. This study thus adds to the growing list of known pathways through which microbes break down organophosphonates to harvest phosphorus, carbon, and nitrogen in nutrient-limited niches.

Affiliation(s)

Lauren J. Rajakovich Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Present address: Department of Chemistry and Chemical Biology, Harvard University, Cambridge, Massachusetts 02138, United States
Maria-Eirini Pandelia Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Present address: Department of Biochemistry, Brandeis University, Waltham, Massachusetts 02453, United States
Andrew J. Mitchell Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Present address: Whitehead Institute for Biomedical Research, Cambridge, Massachusetts 02142
Wei-chen Chang Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Present address: Department of Chemistry, North Carolina State University, Raleigh, North Carolina 27695, United States
Bo Zhang Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Present address: REG Life Sciences, LLC, South San Francisco, California 94080
Amie K. Boal Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, United States
Carsten Krebs Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, United States
J. Martin Bollinger Department of Biochemistry and Molecular Biology, The Pennsylvania State University, University Park, Pennsylvania 16802, United States Department of Chemistry, The Pennsylvania State University, University Park, Pennsylvania 16802, United States

Handling Big Data Scalability in Biological Domain Using Parallel and Distributed Processing: A Case of Three Biological Semantic Similarity Measures. Biomed Res Int 2019;2019:6750296. [PMID: 30809545 PMCID: PMC6369486 DOI: 10.1155/2019/6750296] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Received: 09/12/2018] [Accepted: 01/13/2019] [Indexed: 11/30/2022]

Abstract

In biology, researchers need to compare genes or gene products using semantic similarity measures (SSMs). Continuous data growth and diversity in data characteristics constitute what is called big data, and current biological SSMs cannot handle big data; these measures therefore need to scale to large data volumes. We used parallel and distributed processing, splitting the data into multiple partitions and applying SSMs to each partition; this approach helped address big data scalability and computational problems. Our solution involves three steps: splitting the gene ontology (GO), clustering the data, and calculating semantic similarity. To test this method, split-GO and data clustering algorithms were defined and their performance was assessed for the first two steps. Three of the best-performing SSMs in biology [Resnik, Shortest Semantic Differentiation Distance (SSDD), and SORA] were enhanced with threaded parallel processing, which is used in the third step. Our results demonstrate that introducing threads into the SSMs reduced the time needed to calculate semantic similarity between gene pairs and improved the performance of all three SSMs. Average time was reduced by 24.51% for Resnik, 22.93% for SSDD, and 33.68% for SORA. Total time was reduced by 8.88% for Resnik, 23.14% for SSDD, and 39.27% for SORA. Using these threaded measures in the distributed system, combined with the split-GO and data clustering algorithms that split input data according to their similarity, reduced the average time more than dividing the input data equally. The time reduction grew as the number of splits increased: 24.1%, 39.2%, and 66.6% for Threaded SSDD and 33.0%, 78.2%, and 93.1% for Threaded SORA with 2, 3, and 4 slaves, respectively, and 92.04% for Threaded Resnik with four slaves.
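
The Resnik measure named above scores two terms by the information content (IC) of their most informative common ancestor, with IC(t) = log(N_root / N_t), where N_t counts annotations to t or to any of its descendants. The Python sketch below computes it on a hypothetical toy ontology and evaluates all term pairs in a thread pool, mirroring the "threaded" idea at a small scale. It is not the paper's implementation. Note also that in CPython, threads do not speed up pure-Python CPU-bound work because of the global interpreter lock, so a process pool or a native implementation would be needed for real speedups.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations
from math import log

# Hypothetical toy ontology: term -> parents (a small DAG rooted at "root").
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "C": ["A"], "D": ["A", "B"]}
# Hypothetical direct annotation counts per term.
DIRECT = {"root": 0, "A": 2, "B": 3, "C": 4, "D": 1}


def ancestors(term: str) -> set[str]:
    """The term itself plus every term reachable through parent links."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content() -> dict[str, float]:
    """IC(t) = log(N_root / N_t), with N_t counting annotations to t or descendants."""
    totals = {t: 0 for t in PARENTS}
    for term, count in DIRECT.items():
        for anc in ancestors(term):
            totals[anc] += count
    root_total = totals["root"]
    return {t: log(root_total / n) for t, n in totals.items() if n > 0}


IC = information_content()


def resnik(pair: tuple[str, str]) -> float:
    """Resnik similarity: IC of the most informative common ancestor."""
    common = ancestors(pair[0]) & ancestors(pair[1])
    return max(IC[t] for t in common)


def pairwise_resnik(terms: list[str], workers: int = 4) -> dict[tuple[str, str], float]:
    """Score every unordered term pair, distributing the pairs over a thread pool."""
    pairs = list(combinations(terms, 2))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(zip(pairs, pool.map(resnik, pairs)))


print(pairwise_resnik(["B", "C", "D"]))
```

Because each pair is scored independently, the work splits cleanly across workers or partitions, which is the property the threaded and distributed variants rely on.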

Acharya S, Saha S, Pradhan P. Novel symmetry-based gene-gene dissimilarity measures utilizing Gene Ontology: Application in gene clustering. Gene 2018;679:341-351. [PMID: 30184472 DOI: 10.1016/j.gene.2018.08.062] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.7] [Received: 06/11/2018] [Revised: 08/21/2018] [Accepted: 08/21/2018] [Indexed: 11/25/2022]

Liu W, Liu J, Rajapakse JC. Gene Ontology Enrichment Improves Performances of Functional Similarity of Genes. Sci Rep 2018;8:12100. [PMID: 30108262 PMCID: PMC6092333 DOI: 10.1038/s41598-018-30455-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 3.0] [Received: 11/23/2017] [Accepted: 07/25/2018] [Indexed: 12/23/2022]

Peng J, Xue H, Hui W, Lu J, Chen B, Jiang Q, Shang X, Wang Y. An online tool for measuring and visualizing phenotype similarities using HPO. BMC Genomics 2018;19:571. [PMID: 30367579 PMCID: PMC6101067 DOI: 10.1186/s12864-018-4927-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 11/29/2022]

Liang X, Zhu L, Huang DS. Optimization of Gene Set Annotations Using Robust Trace-Norm Multitask Learning. IEEE/ACM Trans Comput Biol Bioinform 2018;15:1016-1021. [PMID: 28391202 DOI: 10.1109/tcbb.2017.2690427] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 06/07/2023]

Peng J, Zhang X, Hui W, Lu J, Li Q, Liu S, Shang X. Improving the measurement of semantic similarity by combining gene ontology and co-functional network: a random walk based approach. BMC Syst Biol 2018;12:18. [PMID: 29560823 PMCID: PMC5861498 DOI: 10.1186/s12918-018-0539-0] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Indexed: 02/08/2023]

Peng J, Wang H, Lu J, Hui W, Wang Y, Shang X. Identifying term relations cross different gene ontology categories. BMC Bioinformatics 2017;18:573. [PMID: 29297309 PMCID: PMC5751813 DOI: 10.1186/s12859-017-1959-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Indexed: 11/24/2022]

Tian Z, Guo M, Wang C, Liu X, Wang S. Refine gene functional similarity network based on interaction networks. BMC Bioinformatics 2017;18:550. [PMID: 29297381 PMCID: PMC5751769 DOI: 10.1186/s12859-017-1969-1] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 01/07/2023]

HashGO: hashing gene ontology for protein function prediction. Comput Biol Chem 2017;71:264-273. [DOI: 10.1016/j.compbiolchem.2017.09.010] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 09/07/2017] [Accepted: 09/25/2017] [Indexed: 10/18/2022]

Acharya S, Saha S, Nikhil N. Unsupervised gene selection using biological knowledge: application in sample clustering. BMC Bioinformatics 2017;18:513. [PMID: 29166852 PMCID: PMC5700545 DOI: 10.1186/s12859-017-1933-0] [Citation(s) in RCA: 21] [Impact Index Per Article: 2.6] [Received: 08/08/2017] [Accepted: 11/08/2017] [Indexed: 11/10/2022]

Teng Z, Guo M, Liu X, Tian Z, Che K. Revealing protein functions based on relationships of interacting proteins and GO terms. J Biomed Semantics 2017;8:27. [PMID: 29297388 PMCID: PMC5763294 DOI: 10.1186/s13326-017-0139-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Indexed: 12/24/2022]

Peng J, Li Q, Shang X. Investigations on factors influencing HPO-based semantic similarity calculation. J Biomed Semantics 2017;8:34. [PMID: 29297376 PMCID: PMC5763495 DOI: 10.1186/s13326-017-0144-y] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/24/2022]

Yu G, Lu C, Wang J. NoGOA: predicting noisy GO annotations using evidences and sparse representation. BMC Bioinformatics 2017;18:350. [PMID: 28732468 PMCID: PMC5521088 DOI: 10.1186/s12859-017-1764-z] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Received: 05/02/2017] [Accepted: 07/14/2017] [Indexed: 01/11/2023]

Abstract

BACKGROUND

Gene Ontology (GO) is a community effort to represent functional features of gene products. GO annotations (GOA) provide functional associations between GO terms and gene products. Because of limited resources, only a small portion of annotations are manually checked by curators; the others are electronically inferred. Although quality control techniques have been applied to ensure annotation quality, the community consistently reports that a considerable number of noisy (or incorrect) annotations remain. Given how widely annotations are used, identifying noisy annotations is an important yet seldom-studied open problem.

RESULTS

We introduce a novel approach called NoGOA to predict noisy annotations. First, NoGOA applies sparse representation to the gene-term association matrix to reduce the impact of noisy annotations, and uses the sparse representation coefficients to measure the semantic similarity between genes. Second, it makes a preliminary prediction of a gene's noisy annotations based on aggregated votes from that gene's semantic neighborhood genes. Next, NoGOA estimates the ratio of noisy annotations for each evidence code from direct annotations in GOA files archived at different times, weights the entries of the association matrix by these estimated ratios, and propagates the weights to the ancestors of direct annotations using the GO hierarchy. Finally, it integrates the evidence-weighted association matrix and the aggregated votes to predict noisy annotations. Experiments on archived GOA files of six model species (H. sapiens, A. thaliana, S. cerevisiae, G. gallus, B. taurus and M. musculus) demonstrate that NoGOA achieves significantly better results than other related methods and that removing noisy annotations improves the performance of gene function prediction.
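
The preliminary voting step can be sketched roughly as follows, assuming a small 0/1 gene-by-term association matrix. This is a simplified stand-in rather than NoGOA itself: cosine similarity between annotation vectors replaces the sparse-representation coefficients, evidence-code weighting and GO propagation are omitted, and `k` and `threshold` are hypothetical parameters chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two 0/1 annotation vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(u)) * math.sqrt(sum(v))  # valid because entries are 0/1
    return dot / norm if norm else 0.0


def neighbor_votes(assoc, gene, k=2):
    """Similarity-weighted fraction of the gene's k nearest genes that carry each term."""
    others = [g for g in range(len(assoc)) if g != gene]
    sims = {g: cosine(assoc[gene], assoc[g]) for g in others}
    nbrs = sorted(others, key=lambda g: sims[g], reverse=True)[:k]
    total = sum(sims[g] for g in nbrs) or 1.0
    return [sum(sims[g] * assoc[g][t] for g in nbrs) / total
            for t in range(len(assoc[gene]))]


def flag_noisy(assoc, gene, k=2, threshold=0.5):
    """Return the gene's annotated terms whose neighbor support falls below threshold."""
    votes = neighbor_votes(assoc, gene, k)
    return [t for t, a in enumerate(assoc[gene]) if a and votes[t] < threshold]


# Toy matrix: rows are genes, columns are GO terms.
assoc = [
    [1, 1, 0, 1],  # gene 0: term 3 is shared by neither of its two closest genes
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
print(flag_noisy(assoc, gene=0))  # [3]
```

In this toy example gene 0's two nearest neighbors (genes 1 and 2) both lack term 3, so that annotation receives no support and is the only one flagged.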

CONCLUSIONS

The comparative study supports the effectiveness of integrating evidence codes with sparse representation for predicting noisy GO annotations. Code and datasets are available at http://mlda.swu.edu.cn/codes.php?name=NoGOA .

Exploring Approaches for Detecting Protein Functional Similarity within an Orthology-based Framework. Sci Rep 2017;7:381. [PMID: 28336965 PMCID: PMC5428484 DOI: 10.1038/s41598-017-00465-5] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 07/08/2016] [Accepted: 02/28/2017] [Indexed: 11/21/2022]

Shui Y, Cho YR. Alignment of PPI Networks Using Semantic Similarity for Conserved Protein Complex Prediction. IEEE Trans Nanobioscience 2017;15:380-389. [PMID: 28113907 DOI: 10.1109/tnb.2016.2555802] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 11/07/2022]

Pesquita C. Semantic Similarity in the Gene Ontology. Methods Mol Biol 2017;1446:161-173. [PMID: 27812942 DOI: 10.1007/978-1-4939-3743-1_12] [Citation(s) in RCA: 35] [Impact Index Per Article: 4.4] [Indexed: 12/03/2022]

Yu G, Luo W, Fu G, Wang J. Interspecies gene function prediction using semantic similarity. BMC Syst Biol 2016;10:121. [PMID: 28155711 PMCID: PMC5260010 DOI: 10.1186/s12918-016-0361-5] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.2] [Indexed: 01/09/2023]

Abstract

BACKGROUND

Gene Ontology (GO) is a collaborative project that maintains and develops a controlled vocabulary (or terms) describing the molecular functions, biological roles and cellular locations of gene products in a hierarchical ontology. GO also provides GO annotations that associate genes with GO terms. GO consortium members independently and collaboratively annotate gene products with terms, mainly for the model organisms (or species) they are interested in. Owing to experimental ethics, biologists' research interests and limited resources, homologous genes from different species are currently annotated with different terms. These differences are attributable more to incomplete annotation of the genes than to functional differences between them.

RESULTS

Semantic similarity between genes is derived from the GO hierarchy and the genes' annotations. It correlates positively with similarity derived from various types of biological data and has been applied to predict gene function. In this paper, we investigate whether the annotations of incompletely annotated genes can be replenished using semantic similarity between genes from two homologous species. To this end, we use three representative semantic similarity metrics to compute similarity between genes from the two species. Next, we determine the k nearest neighbor genes from the two species based on the chosen metric and then use the terms annotated to the k neighbors of a gene to replenish that gene's annotations. We perform experiments on archived (Jan-2014 to Jan-2016) GO annotations of four species (Human, Mouse, Danio rerio and Arabidopsis thaliana) to assess the contribution of semantic similarity between genes from different species. The experimental results demonstrate that: (1) semantic similarity between genes from homologous species contributes much more to the accuracy improvement (by 53.22%) than genes from a single species alone or genes from two species with low homology; (2) GO annotations of genes from homologous species are complementary to each other.
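
The k-nearest-neighbor replenishment step can be illustrated with a short Python sketch. It assumes each gene is represented as a set of GO term IDs and uses plain Jaccard overlap as a hypothetical stand-in for the three semantic similarity metrics studied in the paper; the gene and term names, the `min_score` cut-off and the similarity-weighted scoring are likewise illustrative assumptions, not the authors' method.

```python
def jaccard(a, b):
    """Jaccard overlap of two GO term sets (a simple stand-in similarity metric)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def replenish(gene_terms, other_species, k=2, min_score=0.5):
    """Suggest new terms for a gene from its k most similar genes in another species.

    gene_terms: set of terms currently annotated to the query gene.
    other_species: dict mapping gene IDs of the other species to their term sets.
    Returns {term: score}, where score is the similarity-weighted fraction of
    the k neighbors annotated with that term.
    """
    sims = {g: jaccard(gene_terms, terms) for g, terms in other_species.items()}
    neighbors = sorted(sims, key=sims.get, reverse=True)[:k]
    total = sum(sims[g] for g in neighbors) or 1.0
    scores = {}
    for g in neighbors:
        for term in other_species[g] - gene_terms:  # only terms the gene lacks
            scores[term] = scores.get(term, 0.0) + sims[g] / total
    return {t: s for t, s in scores.items() if s >= min_score}


# Hypothetical genes and terms.
human_gene = {"t1", "t2"}
mouse_genes = {"m1": {"t1", "t2", "t3"}, "m2": {"t1", "t3"}, "m3": {"t9"}}
print(replenish(human_gene, mouse_genes))  # {'t3': 1.0}
```

Both nearest mouse genes carry `t3`, so it is proposed with full support, while `t9` (annotated only to the dissimilar `m3`) is never considered.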

CONCLUSIONS

Our study shows that semantic-similarity-based interspecies gene function annotation using homologous species is more effective than traditional intraspecies approaches. This work may encourage further research on semantic-similarity-based function prediction across species.

Tian Z, Wang C, Guo M, Liu X, Teng Z. An improved method for functional similarity analysis of genes based on Gene Ontology. BMC Syst Biol 2016;10:119. [PMID: 28155727 PMCID: PMC5259995 DOI: 10.1186/s12918-016-0359-z] [Citation(s) in RCA: 17] [Impact Index Per Article: 1.9] [Indexed: 12/11/2022]

Tian Z, Wang C, Guo M, Liu X, Teng Z. SGFSC: speeding the gene functional similarity calculation based on hash tables. BMC Bioinformatics 2016;17:445. [PMID: 27814675 PMCID: PMC5096311 DOI: 10.1186/s12859-016-1294-0] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 05/04/2016] [Accepted: 10/19/2016] [Indexed: 12/23/2022]

Lu C, Wang J, Zhang Z, Yang P, Yu G. NoisyGOA: Noisy GO annotations prediction using taxonomic and semantic similarity. Comput Biol Chem 2016;65:203-211. [PMID: 27670689 DOI: 10.1016/j.compbiolchem.2016.09.005] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Received: 09/02/2016] [Accepted: 09/07/2016] [Indexed: 10/21/2022]

A path-based measurement for human miRNA functional similarities using miRNA-disease associations. Sci Rep 2016;6:32533. [PMID: 27585796 PMCID: PMC5009308 DOI: 10.1038/srep32533] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.0] [Received: 05/10/2016] [Accepted: 08/04/2016] [Indexed: 01/09/2023]

Yu G, Fu G, Wang J, Zhu H. Predicting Protein Function via Semantic Integration of Multiple Networks. IEEE/ACM Trans Comput Biol Bioinform 2016;13:220-232. [PMID: 26800544 DOI: 10.1109/tcbb.2015.2459713] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.8] [Indexed: 06/05/2023]

Pesaranghader A, Matwin S, Sokolova M, Beiko RG. simDEF: definition-based semantic similarity measure of gene ontology terms for functional similarity analysis of genes. Bioinformatics 2015;32:1380-7. [PMID: 26708333 DOI: 10.1093/bioinformatics/btv755] [Citation(s) in RCA: 22] [Impact Index Per Article: 2.2] [Received: 09/07/2015] [Accepted: 12/21/2015] [Indexed: 12/19/2022]

Cheng L, Li J, Hu Y, Jiang Y, Liu Y, Chu Y, Wang Z, Wang Y. Using Semantic Association to Extend and Infer Literature-Oriented Relativity Between Terms. IEEE/ACM Trans Comput Biol Bioinform 2015;12:1219-1226. [PMID: 26684460 DOI: 10.1109/tcbb.2015.2430289] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.4] [Indexed: 06/05/2023]

Yu G, Zhu H, Domeniconi C, Liu J. Predicting protein function via downward random walks on a gene ontology. BMC Bioinformatics 2015;16:271. [PMID: 26310806 PMCID: PMC4551531 DOI: 10.1186/s12859-015-0713-y] [Citation(s) in RCA: 18] [Impact Index Per Article: 1.8] [Received: 02/14/2015] [Accepted: 08/20/2015] [Indexed: 12/24/2022]

Abstract

Background

High-throughput biotechnologies generate an ever-increasing amount of genomic and proteomic data. These data are far from being functionally characterized, despite advances in the functional annotation of genes (or their protein products). Owing to the limitations of experimental techniques and research bias in biology, regularly updated functional annotation databases such as the Gene Ontology (GO) remain far from complete. Given the importance of protein functions for biological studies and drug design, proteins should be annotated more comprehensively and precisely.

Results

We proposed downward Random Walks (dRW) to predict missing (or new) functions of partially annotated proteins. Specifically, we apply downward random walks with restart on the GO directed acyclic graph, using the available functions of a protein, to estimate the probability of missing functions. To further boost prediction accuracy, we extend dRW to dRW-kNN. dRW-kNN computes the semantic similarity between proteins based on their functional annotations; it then predicts functions based on the functions estimated by dRW, together with the functions associated with the k nearest proteins. Our proposed models can predict two kinds of missing functions: (i) functions that are missing for a protein but associated with other proteins of interest; (ii) functions that are not available for any protein of interest but exist in the GO hierarchy. Experimental results on Yeast and Human proteins show that dRW and dRW-kNN replenish functions more accurately than other related approaches, especially for sparse functions associated with no more than 10 proteins.
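
A generic random walk with restart that only moves downward (from a GO term to its children) can be sketched in Python as below. This is an illustrative assumption, not the authors' exact dRW formulation: the toy DAG, the restart probability of 0.3, uniform splitting of mass among children, restarting uniformly at the protein's annotated terms, and letting leaf terms keep their own mass (so total probability is conserved) are all hypothetical choices.

```python
def downward_rwr(children, seeds, restart=0.3, tol=1e-12, max_iter=1000):
    """Random walk with restart that only moves from a GO term to its children.

    children: dict mapping every term to its list of child terms (a DAG).
    seeds: terms already annotated to the protein; the walk restarts there.
    Leaf terms keep their own mass (an assumption so that mass is conserved).
    """
    seeds = set(seeds)
    p0 = {t: (1.0 / len(seeds) if t in seeds else 0.0) for t in children}
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {t: restart * p0[t] for t in children}
        for t, mass in p.items():
            kids = children[t]
            if kids:
                for c in kids:  # spread the walking mass evenly over the children
                    nxt[c] += (1 - restart) * mass / len(kids)
            else:
                nxt[t] += (1 - restart) * mass
        delta = sum(abs(nxt[t] - p[t]) for t in children)
        p = nxt
        if delta < tol:
            break
    return p


# Hypothetical toy GO DAG and a protein annotated with "root" and "A".
CHILDREN = {"root": ["A", "B"], "A": ["A1", "A2"], "B": ["B1"],
            "A1": [], "A2": [], "B1": []}
seeds = {"root", "A"}
scores = downward_rwr(CHILDREN, seeds)
candidates = sorted((t for t in scores if t not in seeds), key=scores.get, reverse=True)
print(candidates)  # ['A1', 'A2', 'B1', 'B']
```

Ranking the non-seed terms by their stationary probability gives candidate missing functions; here the children of the annotated term `A` rank above terms reachable only through `B`.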

Conclusion

The empirical study shows that the semantic similarity between GO terms and the ontology hierarchy play important roles in predicting protein function. The proposed dRW and dRW-kNN can serve as tools for replenishing functions of partially annotated proteins.

Zhang SB, Lai JH. Semantic similarity measurement between gene ontology terms based on exclusively inherited shared information. Gene 2015;558:108-17. [DOI: 10.1016/j.gene.2014.12.062] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.0] [Received: 07/16/2014] [Revised: 12/15/2014] [Accepted: 12/24/2014] [Indexed: 11/25/2022]