1
|
Grady SK, Peterson KA, Murray SA, Baker EJ, Langston MA, Chesler EJ. A graph theoretical approach to experimental prioritization in genome-scale investigations. Mamm Genome 2024; 35:724-733. [PMID: 39191873 PMCID: PMC11522061 DOI: 10.1007/s00335-024-10066-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/28/2024] [Accepted: 08/14/2024] [Indexed: 08/29/2024]
Abstract
The goal of systems biology is to gain a network level understanding of how gene interactions influence biological states, and ultimately inform upon human disease. Given the scale and scope of systems biology studies, resource constraints often limit researchers when validating genome-wide phenomena and potentially lead to an incomplete understanding of the underlying mechanisms. Further, prioritization strategies are often biased towards known entities (e.g. previously studied genes/proteins with commercially available reagents), and other technical issues that limit experimental breadth. Here, heterogeneous biological information is modeled as an association graph to which a high-performance minimum dominating set solver is applied to maximize coverage across the graph, and thus increase the breadth of experimentation. First, we tested our model on retrieval of existing gene functional annotations and demonstrated that minimum dominating set returns more diverse terms when compared to other computational methods. Next, we utilized our heterogenous network and minimum dominating set solver to assist in the process of identifying understudied genes to be interrogated by the International Mouse Phenotyping Consortium. Using an unbiased algorithmic strategy, poorly studied genes are prioritized from the remaining thousands of genes yet to be characterized. This method is tunable and extensible with the potential to incorporate additional user-defined prioritizing information. The minimum dominating set approach can be applied to any biological network in order to identify a tractable subset of features to test experimentally or to assist in prioritizing candidate genes associated with human disease.
Collapse
Affiliation(s)
- Stephen K Grady
- Graduate School of Genome Science and Technology, University of Tennessee, Knoxville, TN, USA.
| | | | | | - Erich J Baker
- Department of Computer Science, Baylor University, Waco, TX, USA
| | - Michael A Langston
- Department of Electrical Engineering and Computer Science, University of Tennessee, Knoxville, TN, USA
| | | |
Collapse
|
2
|
Jagodnik KM, Shvili Y, Bartal A. HetIG-PreDiG: A Heterogeneous Integrated Graph Model for Predicting Human Disease Genes based on gene expression. PLoS One 2023; 18:e0280839. [PMID: 36791052 PMCID: PMC9931161 DOI: 10.1371/journal.pone.0280839] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/12/2022] [Accepted: 01/10/2023] [Indexed: 02/16/2023] Open
Abstract
Graph analytical approaches permit identifying novel genes involved in complex diseases, but are limited by (i) inferring structural network similarity of connected gene nodes, ignoring potentially relevant unconnected nodes; (ii) using homogeneous graphs, missing gene-disease associations' complexity; (iii) relying on disease/gene-phenotype associations' similarities, involving highly incomplete data; (iv) using binary classification, with gene-disease edges as positive training samples, and non-associated gene and disease nodes as negative samples that may include currently unknown disease genes; or (v) reporting predicted novel associations without systematically evaluating their accuracy. Addressing these limitations, we develop the Heterogeneous Integrated Graph for Predicting Disease Genes (HetIG-PreDiG) model that includes gene-gene, gene-disease, and gene-tissue associations. We predict novel disease genes using low-dimensional representation of nodes accounting for network structure, and extending beyond network structure using the developed Gene-Disease Prioritization Score (GDPS) reflecting the degree of gene-disease association via gene co-expression data. For negative training samples, we select non-associated gene and disease nodes with lower GDPS that are less likely to be affiliated. We evaluate the developed model's success in predicting novel disease genes by analyzing the prediction probabilities of gene-disease associations. HetIG-PreDiG successfully predicts (Micro-F1 = 0.95) gene-disease associations, outperforming baseline models, and is validated using published literature, thus advancing our understanding of complex genetic diseases.
Collapse
Affiliation(s)
- Kathleen M. Jagodnik
- The School of Business Administration, Bar-Ilan University, Ramat Gan, Israel
- Department of Psychiatry, Harvard Medical School, Boston, MA, United States of America
- Department of Psychiatry, Massachusetts General Hospital, Boston, MA, United States of America
| | - Yael Shvili
- Department of Surgery A, Meir Medical Center, Kfar Sava, Israel
| | - Alon Bartal
- The School of Business Administration, Bar-Ilan University, Ramat Gan, Israel
- * E-mail:
| |
Collapse
|
3
|
Network-Based Methods for Approaching Human Pathologies from a Phenotypic Point of View. Genes (Basel) 2022; 13:genes13061081. [PMID: 35741843 PMCID: PMC9222217 DOI: 10.3390/genes13061081] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/13/2022] [Revised: 06/10/2022] [Accepted: 06/14/2022] [Indexed: 01/27/2023] Open
Abstract
Network and systemic approaches to studying human pathologies are helping us to gain insight into the molecular mechanisms of and potential therapeutic interventions for human diseases, especially for complex diseases where large numbers of genes are involved. The complex human pathological landscape is traditionally partitioned into discrete “diseases”; however, that partition is sometimes problematic, as diseases are highly heterogeneous and can differ greatly from one patient to another. Moreover, for many pathological states, the set of symptoms (phenotypes) manifested by the patient is not enough to diagnose a particular disease. On the contrary, phenotypes, by definition, are directly observable and can be closer to the molecular basis of the pathology. These clinical phenotypes are also important for personalised medicine, as they can help stratify patients and design personalised interventions. For these reasons, network and systemic approaches to pathologies are gradually incorporating phenotypic information. This review covers the current landscape of phenotype-centred network approaches to study different aspects of human diseases.
Collapse
|
4
|
Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022] Open
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, a lot of computational methods and tools/platforms have been developed to reveal intrinsic relationship between diseases, which can provide useful insights to the study of complex diseases, e.g. understanding molecular mechanisms of diseases and discovering new treatment of diseases. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms in organisms. Computational methods with different types of biomedical data from phenotype to genotype can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups in terms of the usages of biomedical data, including disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for computation and analysis of disease-disease associations. Finally, we give a discussion and summary on the research of disease-disease associations. This review provides a systematic overview for current disease association research, which could promote the development and applications of computational methods and tools/platforms for disease-disease associations.
Collapse
Affiliation(s)
- Ju Xiang
- School of Computer Science and Engineering, Central South University, China
| | - Jiashuai Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
| | - Yichao Zhao
- School of Computer Science and Engineering, Central South University, China
| | - Fang-Xiang Wu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
| | - Min Li
- Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
| |
Collapse
|
5
|
Vogt L. FAIR data representation in times of eScience: a comparison of instance-based and class-based semantic representations of empirical data using phenotype descriptions as example. J Biomed Semantics 2021; 12:20. [PMID: 34823588 PMCID: PMC8613519 DOI: 10.1186/s13326-021-00254-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/17/2021] [Accepted: 11/11/2021] [Indexed: 12/27/2022] Open
Abstract
BACKGROUND The size, velocity, and heterogeneity of Big Data outclasses conventional data management tools and requires data and metadata to be fully machine-actionable (i.e., eScience-compliant) and thus findable, accessible, interoperable, and reusable (FAIR). This can be achieved by using ontologies and through representing them as semantic graphs. Here, we discuss two different semantic graph approaches of representing empirical data and metadata in a knowledge graph, with phenotype descriptions as an example. Almost all phenotype descriptions are still being published as unstructured natural language texts, with far-reaching consequences for their FAIRness, substantially impeding their overall usability within the life sciences. However, with an increasing amount of anatomy ontologies becoming available and semantic applications emerging, a solution to this problem becomes available. Researchers are starting to document and communicate phenotype descriptions through the Web in the form of highly formalized and structured semantic graphs that use ontology terms and Uniform Resource Identifiers (URIs) to circumvent the problems connected with unstructured texts. RESULTS Using phenotype descriptions as an example, we compare and evaluate two basic representations of empirical data and their accompanying metadata in the form of semantic graphs: the class-based TBox semantic graph approach called Semantic Phenotype and the instance-based ABox semantic graph approach called Phenotype Knowledge Graph. Their main difference is that only the ABox approach allows for identifying every individual part and property mentioned in the description in a knowledge graph. This technical difference results in substantial practical consequences that significantly affect the overall usability of empirical data. The consequences affect findability, accessibility, and explorability of empirical data as well as their comparability, expandability, universal usability and reusability, and overall machine-actionability. Moreover, TBox semantic graphs often require querying under entailment regimes, which is computationally more complex. CONCLUSIONS We conclude that, from a conceptual point of view, the advantages of the instance-based ABox semantic graph approach outweigh its shortcomings and outweigh the advantages of the class-based TBox semantic graph approach. Therefore, we recommend the instance-based ABox approach as a FAIR approach for documenting and communicating empirical data and metadata in a knowledge graph.
Collapse
Affiliation(s)
- Lars Vogt
- TIB Leibniz Information Centre for Science and Technology, Welfengarten 1B, 30167, Hanover, Germany.
| |
Collapse
|
6
|
Chen J, Althagafi A, Hoehndorf R. Predicting candidate genes from phenotypes, functions and anatomical site of expression. Bioinformatics 2021; 37:853-860. [PMID: 33051643 PMCID: PMC8248315 DOI: 10.1093/bioinformatics/btaa879] [Citation(s) in RCA: 24] [Impact Index Per Article: 6.0] [Reference Citation Analysis] [Abstract] [Download PDF] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/30/2020] [Revised: 08/26/2020] [Accepted: 09/28/2020] [Indexed: 12/30/2022] Open
Abstract
Motivation Over the past years, many computational methods have been developed to
incorporate information about phenotypes for disease–gene
prioritization task. These methods generally compute the similarity between
a patient’s phenotypes and a database of gene-phenotype to find the
most phenotypically similar match. The main limitation in these methods is
their reliance on knowledge about phenotypes associated with particular
genes, which is not complete in humans as well as in many model organisms,
such as the mouse and fish. Information about functions of gene products and
anatomical site of gene expression is available for more genes and can also
be related to phenotypes through ontologies and machine-learning models. Results We developed a novel graph-based machine-learning method for biomedical
ontologies, which is able to exploit axioms in ontologies and other
graph-structured data. Using our machine-learning method, we embed genes
based on their associated phenotypes, functions of the gene products and
anatomical location of gene expression. We then develop a machine-learning
model to predict gene–disease associations based on the associations
between genes and multiple biomedical ontologies, and this model
significantly improves over state-of-the-art methods. Furthermore, we extend
phenotype-based gene prioritization methods significantly to all genes,
which are associated with phenotypes, functions or site of expression. Availability and implementation Software and data are available at https://github.com/bio-ontology-research-group/DL2Vec. Supplementary information Supplementary data
are available at Bioinformatics online.
Collapse
Affiliation(s)
- Jun Chen
- Computational Bioscience Research Center (CBRC), Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| | - Azza Althagafi
- Computational Bioscience Research Center (CBRC), Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia.,Computer Science Department, College of Computers and Information Technology, Taif University, Taif 26571, Saudi Arabia
| | - Robert Hoehndorf
- Computational Bioscience Research Center (CBRC), Computer, Electrical & Mathematical Sciences and Engineering (CEMSE) Division, King Abdullah University of Science and Technology (KAUST), Thuwal 23955, Saudi Arabia
| |
Collapse
|
7
|
Díaz-Santiago E, Jabato FM, Rojano E, Seoane P, Pazos F, Perkins JR, Ranea JAG. Phenotype-genotype comorbidity analysis of patients with rare disorders provides insight into their pathological and molecular bases. PLoS Genet 2020; 16:e1009054. [PMID: 33001999 PMCID: PMC7553355 DOI: 10.1371/journal.pgen.1009054] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/12/2019] [Revised: 10/13/2020] [Accepted: 08/16/2020] [Indexed: 12/15/2022] Open
Abstract
Genetic and molecular analysis of rare disease is made difficult by the small numbers of affected patients. Phenotypic comorbidity analysis can help rectify this by combining information from individuals with similar phenotypes and looking for overlap in terms of shared genes and underlying functional systems. However, few studies have combined comorbidity analysis with genomic data. We present a computational approach that connects patient phenotypes based on phenotypic co-occurence and uses genomic information related to the patient mutations to assign genes to the phenotypes, which are used to detect enriched functional systems. These phenotypes are clustered using network analysis to obtain functionally coherent phenotype clusters. We applied the approach to the DECIPHER database, containing phenotypic and genomic information for thousands of patients with heterogeneous rare disorders and copy number variants. Validity was demonstrated through overlap with known diseases, co-mention within the biomedical literature, semantic similarity measures, and patient cluster membership. These connected pairs formed multiple phenotype clusters, showing functional coherence, and mapped to genes and systems involved in similar pathological processes. Examples include claudin genes from the 22q11 genomic region associated with a cluster of phenotypes related to DiGeorge syndrome and genes related to the GO term anterior/posterior pattern specification associated with abnormal development. The clusters generated can help with the diagnosis of rare diseases, by suggesting additional phenotypes for a given patient and potential underlying functional systems. Other tools to find causal genes based on phenotype were also investigated. The approach has been implemented as a workflow, named PhenCo, which can be adapted to any set of patients for which phenomic and genomic data is available. Full details of the analysis, including the clusters formed, their constituent functional systems and underlying genes are given. Code to implement the workflow is available from GitHub. Although rare diseases each affect a small number of people, taken together they affect millions. Better diagnosis and understanding of the underlying mechanisms are needed. By combining phenotypic data for many rare disease patients, we can build clusters of comorbid phenotypes that tend to co-occur together. By using genomic information, we can supplement these clusters and look for related genes and functional systems, such as pathways and molecular mechanisms. We applied such an approach to thousands of rare disease patients from the DECIPHER resources. We were able to detect hundreds of pairs of comorbid phenotypes, and use them to build tens of phenotype clusters. By mapping genes to these phenotypes, based on data from the same patients, we were able to detect related genes and functional systems, such as genes mapping to the 22q11 genomic region underlying a cluster of phenotypes related to DiGeorge syndrome. To ensure that these clusters made sensible predictions, results were validated using literature co-mention, overlap with known disease and semantic similarity measures. These comorbidity patterns, along with their underlying molecular systems, can give important insights into disease mechanisms, moreover they can be used to direct differential-diagnosis of rare disease patients.
Collapse
Affiliation(s)
- Elena Díaz-Santiago
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
| | - Fernando M. Jabato
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
| | - Elena Rojano
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
| | - Pedro Seoane
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
- CIBER de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
| | | | - James R. Perkins
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
- CIBER de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
- The Biomedical Research Institute of Malaga (IBIMA), Malaga, Spain
- * E-mail:
| | - Juan A. G. Ranea
- Department of Molecular Biology and Biochemistry, University of Malaga, Malaga, Spain
- CIBER de Enfermedades Raras (CIBERER), ISCIII, Madrid, Spain
- The Biomedical Research Institute of Malaga (IBIMA), Malaga, Spain
| |
Collapse
|
8
|
Peng J, Guan J, Hui W, Shang X. A novel subnetwork representation learning method for uncovering disease-disease relationships. Methods 2020; 192:77-84. [PMID: 32946974 DOI: 10.1016/j.ymeth.2020.09.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/17/2020] [Revised: 08/20/2020] [Accepted: 09/07/2020] [Indexed: 12/12/2022] Open
Abstract
Analyzing disease-disease relationships plays an important role for understanding disease mechanisms and finding alternative uses for a drug. A disease is usually the result of abnormal state of multiple molecular process. Since biological networks can model the interplay of multiple molecular processes, network-based methods have been proposed to uncover the disease-disease relationships recently. Given a disease and a network, the disease could be represented as a subnetwork constructed by the disease genes involved in the given network, named disease subnetwork. Because it is difficult to learn the feature representation of disease subnetworks, most existing methods are unsupervised ones without using labeled information. To fill this gap, we propose a novel method named SubNet2vec to learn the feature vectors of diseases from their corresponding subnetwork in the biological network. By utilizing the feature representation of disease subnetwork, we can analyze disease-disease relationships in a supervised fashion. The evaluation results show that the proposed framework outperforms some state-of-the-art approaches in a large margin on disease-disease/disease-drug association prediction. The source code and data are available athttps://github.com/MedicineBiology-AI/SubNet2vec.git.
Collapse
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China.
| | - Jiaojiao Guan
- School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China.
| | - Weiwei Hui
- Vivo mobile communications (Hang Zhou) co. LTD, China.
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China.
| |
Collapse
|
9
|
Masoudi-Sobhanzadeh Y, Masoudi-Nejad A. Synthetic repurposing of drugs against hypertension: a datamining method based on association rules and a novel discrete algorithm. BMC Bioinformatics 2020; 21:313. [PMID: 32677879 PMCID: PMC7469914 DOI: 10.1186/s12859-020-03644-w] [Citation(s) in RCA: 11] [Impact Index Per Article: 2.2] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/21/2020] [Accepted: 07/06/2020] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND Drug repurposing aims to detect the new therapeutic benefits of the existing drugs and reduce the spent time and cost of the drug development projects. The synthetic repurposing of drugs may prove to be more useful than the single repurposing in terms of reducing toxicity and enhancing efficacy. However, the researchers have not given it serious consideration. To address the issue, a novel datamining method is introduced and applied to repositioning of drugs for hypertension (HT) which is a serious medical condition and needs some improved treatment plans to help treat it. RESULTS A novel two-step data mining method, which is based on the If-Then association rules as well as a novel discrete optimization algorithm, was introduced and applied to the synthetic repurposing of drugs for HT. The required data were also extracted from DrugBank, KEGG, and DrugR+ databases. The findings indicated that based on the different statistical criteria, the proposed method outperformed the other state-of-the-art approaches. In contrast to the previously proposed methods which had failed to discover a list on some datasets, our method could find a combination list for all of them. CONCLUSION Since the proposed synthetic method uses medications in small dosages, it might revive some failed drug development projects and put forward a suitable plan for treating different diseases such as COVID-19 and HT. It is also worth noting that applying efficient computational methods helps to produce better results.
Collapse
Affiliation(s)
- Yosef Masoudi-Sobhanzadeh
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| | - Ali Masoudi-Nejad
- Laboratory of Systems Biology and Bioinformatics (LBB), Institute of Biochemistry and Biophysics, University of Tehran, Tehran, Iran
| |
Collapse
|
10
|
Peng J, Zhu L, Wang Y, Chen J. Mining Relationships among Multiple Entities in Biological Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2020; 17:769-776. [PMID: 30872239 DOI: 10.1109/tcbb.2019.2904965] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/09/2023]
Abstract
Identifying topological relationships among multiple entities in biological networks is critical towards the understanding of the organizational principles of network functionality. Theoretically, this problem can be solved using minimum Steiner tree (MSTT) algorithms. However, due to large network size, it remains to be computationally challenging, and the predictive value of multi-entity topological relationships is still unclear. We present a novel solution called Cluster-based Steiner Tree Miner (CST-Miner) to instantly identify multi-entity topological relationships in biological networks. Given a list of user-specific entities, CST-Miner decomposes a biological network into nested cluster-based subgraphs, on which multiple minimum Steiner trees are identified. By merging all of them into a minimum cost tree, the optimal topological relationships among all the user-specific entities are revealed. Experimental results showed that CST-Miner can finish in nearly log-linear time and the tree constructed by CST-Miner is close to the global minimum.
Collapse
|
11
|
Peng J, Lu G, Xue H, Wang T, Shang X. TS-GOEA: a web tool for tissue-specific gene set enrichment analysis based on gene ontology. BMC Bioinformatics 2019; 20:572. [PMID: 31760951 PMCID: PMC6876092 DOI: 10.1186/s12859-019-3125-6] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/15/2022] Open
Abstract
BACKGROUND The Gene Ontology (GO) knowledgebase is the world's largest source of information on the functions of genes. Since the beginning of GO project, various tools have been developed to perform GO enrichment analysis experiments. GO enrichment analysis has become a commonly used method of gene function analysis. Existing GO enrichment analysis tools do not consider tissue-specific information, although this information is very important to current research. RESULTS In this paper, we built an easy-to-use web tool called TS-GOEA that allows users to easily perform experiments based on tissue-specific GO enrichment analysis. TS-GOEA uses strict threshold statistical method for GO enrichment analysis, and provides statistical tests to improve the reliability of the analysis results. Meanwhile, TS-GOEA provides tools to compare different experimental results, which is convenient for users to compare the experimental results. To evaluate its performance, we tested the genes associated with platelet disease with TS-GOEA. CONCLUSIONS TS-GOEA is an effective GO analysis tool with unique features. The experimental results show that our method has better performance and provides a useful supplement for the existing GO enrichment analysis tools. TS-GOEA is available at http://120.77.47.2:5678.
Collapse
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
| | - Guilin Lu
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
| | - Hansheng Xue
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
| | - Tao Wang
- School of Computer Science, Harbin Institute of Technology, Harbin, 150001 China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
| |
Collapse
|
12
|
Wang Z, Wang Y. Extracting a biologically latent space of lung cancer epigenetics with variational autoencoders. BMC Bioinformatics 2019; 20:568. [PMID: 31760935 PMCID: PMC6876071 DOI: 10.1186/s12859-019-3130-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 2.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/14/2022] Open
Abstract
Background Lung cancer is one of the most malignant tumors, causing over 1,000,000 deaths each year worldwide. Deep learning has brought success in many domains in recent years. DNA methylation, an epigenetic factor, is used for model training in many studies. There is an opportunity for deep learning methods to analyze the lung cancer epigenetic data to determine their subtypes for appropriate treatment. Results Here, we employ variational autoencoders (VAEs), an unsupervised deep learning framework, on 450K DNA methylation data of TCGA-LUAD and TCGA-LUSC to learn latent representations of the DNA methylation landscape. We extract a biologically relevant latent space of LUAD and LUSC samples. It is showed that the bivariate classifiers on the further compressed latent features could classify the subtypes accurately. Through clustering of methylation-based latent space features, we demonstrate that the VAEs can capture differential methylation patterns about subtypes of lung cancer. Conclusions VAEs can distinguish the original subtypes from manually mixed methylation data frame with the encoded features of latent space. Further applications about VAEs should focus on fine-grained subtypes identification for precision medicine.
Collapse
Affiliation(s)
- Zhenxing Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China.
| |
Collapse
|
13
|
Wang Y, Juan L, Peng J, Zang T, Wang Y. Prioritizing candidate diseases-related metabolites based on literature and functional similarity. BMC Bioinformatics 2019; 20:574. [PMID: 31760947 PMCID: PMC6876110 DOI: 10.1186/s12859-019-3127-4] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/18/2022] Open
Abstract
Background As the terminal products of cellular regulatory process, functional related metabolites have a close relationship with complex diseases, and are often associated with the same or similar diseases. Therefore, identification of disease related metabolites play a critical role in understanding comprehensively pathogenesis of disease, aiming at improving the clinical medicine. Considering that a large number of metabolic markers of diseases need to be explored, we propose a computational model to identify potential disease-related metabolites based on functional relationships and scores of referred literatures between metabolites. First, obtaining associations between metabolites and diseases from the Human Metabolome database, we calculate the similarities of metabolites based on modified recommendation strategy of collaborative filtering utilizing the similarities between diseases. Next, a disease-associated metabolite network (DMN) is built with similarities between metabolites as weight. To improve the ability of identifying disease-related metabolites, we introduce scores of text mining from the existing database of chemicals and proteins into DMN and build a new disease-associated metabolite network (FLDMN) by fusing functional associations and scores of literatures. Finally, we utilize random walking with restart (RWR) in this network to predict candidate metabolites related to diseases. Results We construct the disease-associated metabolite network and its improved network (FLDMN) with 245 diseases, 587 metabolites and 28,715 disease-metabolite associations. Subsequently, we extract training sets and testing sets from two different versions of the Human Metabolome database and assess the performance of DMN and FLDMN on 19 diseases, respectively. As a result, the average AUC (area under the receiver operating characteristic curve) of DMN is 64.35%. As a further improved network, FLDMN is proven to be successful in predicting potential metabolic signatures for 19 diseases with an average AUC value of 76.03%. Conclusion In this paper, a computational model is proposed for exploring metabolite-disease pairs and has good performance in predicting potential metabolites related to diseases through adequate validation. This result suggests that integrating literature and functional associations can be an effective way to construct disease associated metabolite network for prioritizing candidate diseases-related metabolites.
Collapse
Affiliation(s)
- Yongtian Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
| | - Liran Juan
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
| | - Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, People's Republic of China
| | - Tianyi Zang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China.
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China.
| |
Collapse
|
14
|
Zhan Q, Wang N, Jin S, Tan R, Jiang Q, Wang Y. ProbPFP: a multiple sequence alignment algorithm combining hidden Markov model optimized by particle swarm optimization with partition function. BMC Bioinformatics 2019; 20:573. [PMID: 31760933 PMCID: PMC6876095 DOI: 10.1186/s12859-019-3132-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND During procedures for conducting multiple sequence alignment, that is so essential to use the substitution score of pairwise alignment. To compute adaptive scores for alignment, researchers usually use Hidden Markov Model or probabilistic consistency methods such as partition function. Recent studies show that optimizing the parameters for hidden Markov model, as well as integrating hidden Markov model with partition function can raise the accuracy of alignment. The combination of partition function and optimized HMM, which could further improve the alignment's accuracy, however, was ignored by these researches. RESULTS A novel algorithm for MSA called ProbPFP is presented in this paper. It intergrate optimized HMM by particle swarm with partition function. The algorithm of PSO was applied to optimize HMM's parameters. After that, the posterior probability obtained by the HMM was combined with the one obtained by partition function, and thus to calculate an integrated substitution score for alignment. In order to evaluate the effectiveness of ProbPFP, we compared it with 13 outstanding or classic MSA methods. The results demonstrate that the alignments obtained by ProbPFP got the maximum mean TC scores and mean SP scores on these two benchmark datasets: SABmark and OXBench, and it got the second highest mean TC scores and mean SP scores on the benchmark dataset BAliBASE. ProbPFP is also compared with 4 other outstanding methods, by reconstructing the phylogenetic trees for six protein families extracted from the database TreeFam, based on the alignments obtained by these 5 methods. The result indicates that the reference trees are closer to the phylogenetic trees reconstructed from the alignments obtained by ProbPFP than the other methods. CONCLUSIONS We propose a new multiple sequence alignment method combining optimized HMM and partition function in this paper. The performance validates this method could make a great improvement of the alignment's accuracy.
Collapse
Affiliation(s)
- Qing Zhan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
| | - Nan Wang
- Department of Mathematics, Harbin Institute of Technology, Harbin, 150001, China
| | - Shuilin Jin
- Department of Mathematics, Harbin Institute of Technology, Harbin, 150001, China
| | - Renjie Tan
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
| | - Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, China.
| |
Collapse
|
15
|
Peng J, Wang X, Shang X. Combining gene ontology with deep neural networks to enhance the clustering of single cell RNA-Seq data. BMC Bioinformatics 2019; 20:284. [PMID: 31182005 PMCID: PMC6557741 DOI: 10.1186/s12859-019-2769-6] [Citation(s) in RCA: 30] [Impact Index Per Article: 5.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/24/2022] Open
Abstract
Background Single cell RNA sequencing (scRNA-seq) is applied to assay the individual transcriptomes of large numbers of cells. The gene expression at single-cell level provides an opportunity for better understanding of cell function and new discoveries in biomedical areas. To ensure that the single-cell based gene expression data are interpreted appropriately, it is crucial to develop new computational methods. Results In this article, we try to re-construct a neural network based on Gene Ontology (GO) for dimension reduction of scRNA-seq data. By integrating GO with both unsupervised and supervised models, two novel methods are proposed, named GOAE (Gene Ontology AutoEncoder) and GONN (Gene Ontology Neural Network) respectively. Conclusions The evaluation results show that the proposed models outperform some state-of-the-art dimensionality reduction approaches. Furthermore, incorporating with GO, we provide an opportunity to interpret the underlying biological mechanism behind the neural network-based model.
Collapse
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China.,Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China.,Centre for Multidisciplinary Convergence Computing, School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Xiaoyu Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China. .,Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China.
| |
Collapse
|
16
|
Xue H, Peng J, Shang X. Predicting disease-related phenotypes using an integrated phenotype similarity measurement based on HPO. BMC SYSTEMS BIOLOGY 2019; 13:34. [PMID: 30953559 PMCID: PMC6449884 DOI: 10.1186/s12918-019-0697-8] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 11/13/2022]
Abstract
Background Improving efficiency of disease diagnosis based on phenotype ontology is a critical yet challenging research area. Recently, Human Phenotype Ontology (HPO)-based semantic similarity has been affectively and widely used to identify causative genes and diseases. However, current phenotype similarity measurements just consider the annotations and hierarchy structure of HPO, neglecting the definition description of phenotype terms. Results In this paper, we propose a novel phenotype similarity measurement, termed as DisPheno, which adequately incorporates the definition of phenotype terms in addition to HPO structure and annotations to measure the similarity between phenotype terms. DisPheno also integrates phenotype term associations into phenotype-set similarity measurement using gene and disease annotations of phenotype terms. Conclusions Compared with five existing state-of-the-art methods, DisPheno shows great performance in HPO-based phenotype semantic similarity measurement and improves the efficiency of disease identification, especially on noisy patients dataset.
Collapse
Affiliation(s)
- Hansheng Xue
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.,School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, China
| | - Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
| |
Collapse
|
17
|
Peng J, Guan J, Shang X. Predicting Parkinson's Disease Genes Based on Node2vec and Autoencoder. Front Genet 2019; 10:226. [PMID: 31001311 PMCID: PMC6454041 DOI: 10.3389/fgene.2019.00226] [Citation(s) in RCA: 61] [Impact Index Per Article: 10.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/06/2018] [Accepted: 02/28/2019] [Indexed: 12/26/2022] Open
Abstract
Identifying genes associated with Parkinson's disease plays an extremely important role in the diagnosis and treatment of Parkinson's disease. In recent years, based on the guilt-by-association hypothesis, many methods have been proposed to predict disease-related genes, but few of these methods are designed or used for Parkinson's disease gene prediction. In this paper, we propose a novel prediction method for Parkinson's disease gene prediction, named N2A-SVM. N2A-SVM includes three parts: extracting features of genes based on network, reducing the dimension using deep neural network, and predicting Parkinson's disease genes using a machine learning method. The evaluation test shows that N2A-SVM performs better than existing methods. Furthermore, we evaluate the significance of each step in the N2A-SVM algorithm and the influence of the hyper-parameters on the result. In addition, we train N2A-SVM on the recent dataset and used it to predict Parkinson's disease genes. The predicted top-rank genes can be verified based on literature study.
Collapse
Affiliation(s)
| | | | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| |
Collapse
|
18
|
Cheng L, Zhuang H, Ju H, Yang S, Han J, Tan R, Hu Y. Exposing the Causal Effect of Body Mass Index on the Risk of Type 2 Diabetes Mellitus: A Mendelian Randomization Study. Front Genet 2019; 10:94. [PMID: 30891058 PMCID: PMC6413727 DOI: 10.3389/fgene.2019.00094] [Citation(s) in RCA: 43] [Impact Index Per Article: 7.2] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/21/2018] [Accepted: 01/29/2019] [Indexed: 12/17/2022] Open
Abstract
Introduction: High body mass index (BMI) is a positive associated phenotype of type 2 diabetes mellitus (T2DM). Abundant studies have observed this from a clinical perspective. Since the rapid increase in a large number of genetic variants from the genome-wide association studies (GWAS), common SNPs of BMI and T2DM were identified as the genetic basis for understanding their associations. Currently, their causality is beginning to blur. Materials and Methods: To classify it, a Mendelian randomisation (MR), using genetic instrumental variables (IVs) to explore the causality of intermediate phenotype and disease, was utilized here to test the effect of BMI on the risk of T2DM. In this article, MR was carried out on GWAS data using 52 independent BMI SNPs as IVs. The pooled odds ratio (OR) of these SNPs was calculated using inverse-variance weighted method for the assessment of 5 kg/m2 higher BMI on the risk of T2DM. The leave-one-out validation was conducted to identify the effect of individual SNPs. MR-Egger regression was utilized to detect potential pleiotropic bias of variants. Results: We obtained the high OR (1.470; 95% CI 1.170 to 1.847; P = 0.001), low intercept (0.004, P = 0.661), and small fluctuation of ORs {from -0.039 [(1.412 - 1.470) / 1.470)] to 0.075 [(1.568- 1.470) / 1.470)] in leave-one-out validation. Conclusion: We validate the causal effect of high BMI on the risk of T2DM. The low intercept shows no pleiotropic bias of IVs. The small alterations of ORs activated by removing individual SNPs showed no single SNP drives our estimate.
Collapse
Affiliation(s)
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - He Zhuang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Hong Ju
- Department of Information Engineering, Heilongjiang Biological Science and Technology Career Academy, Harbin, China
| | - Shuo Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Junwei Han
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Renjie Tan
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Yang Hu
- School of Life Sciences and Technology, Harbin Institute of Technology, Harbin, China
| |
Collapse
|
19
|
Xu L, Liang G, Liao C, Chen GD, Chang CC. k-Skip-n-Gram-RF: A Random Forest Based Method for Alzheimer's Disease Protein Identification. Front Genet 2019; 10:33. [PMID: 30809242 PMCID: PMC6379451 DOI: 10.3389/fgene.2019.00033] [Citation(s) in RCA: 53] [Impact Index Per Article: 8.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/10/2018] [Accepted: 01/17/2019] [Indexed: 11/18/2022] Open
Abstract
In this paper, a computational method based on machine learning technique for identifying Alzheimer's disease genes is proposed. Compared with most existing machine learning based methods, existing methods predict Alzheimer's disease genes by using structural magnetic resonance imaging (MRI) technique. Most methods have attained acceptable results, but the cost is expensive and time consuming. Thus, we proposed a computational method for identifying Alzheimer disease genes by use of the sequence information of proteins, and classify the feature vectors by random forest. In the proposed method, the gene protein information is extracted by adaptive k-skip-n-gram features. The proposed method can attain the accuracy to 85.5% on the selected UniProt dataset, which has been demonstrated by the experimental results.
Collapse
Affiliation(s)
- Lei Xu
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Guangmin Liang
- School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen, China
| | - Changrui Liao
- Key Laboratory of Optoelectronic Devices and Systems of Ministry of Education and Guangdong Province, College of Optoelectronic Engineering, Shenzhen University, Shenzhen, China
| | - Gin-Den Chen
- Department of Obstetrics and Gynecology, Chung Shan Medical University Hospital, Taichung, Taiwan
| | - Chi-Chang Chang
- School of Medical Informatics, Chung Shan Medical University, Taichung, Taiwan
- IT Office, Chung Shan Medical University Hospital, Taichung, Taiwan
| |
Collapse
|
20
|
Li Y, Niu M, Zou Q. ELM-MHC: An Improved MHC Identification Method with Extreme Learning Machine Algorithm. J Proteome Res 2019; 18:1392-1401. [DOI: 10.1021/acs.jproteome.9b00012] [Citation(s) in RCA: 42] [Impact Index Per Article: 7.0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 01/28/2023]
Affiliation(s)
- Yanjuan Li
- School of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
| | - Mengting Niu
- School of Information and Computer Engineering, Northeast Forestry University, Harbin 150040, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu 610054, China
- Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu 610054, China
| |
Collapse
|
21
|
Tian Z, Teng Z, Cheng S, Guo M. Computational drug repositioning using meta-path-based semantic network analysis. BMC SYSTEMS BIOLOGY 2018; 12:134. [PMID: 30598084 PMCID: PMC6311940 DOI: 10.1186/s12918-018-0658-7] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.4] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Indexed: 02/01/2023]
Abstract
BACKGROUND Drug repositioning is a promising and efficient way to discover new indications for existing drugs, which holds the great potential for precision medicine in the post-genomic era. Many network-based approaches have been proposed for drug repositioning based on similarity networks, which integrate multiple sources of drugs and diseases. However, these methods may simply view nodes as the same-typed and neglect the semantic meanings of different meta-paths in the heterogeneous network. Therefore, it is urgent to develop a rational method to infer new indications for approved drugs. RESULTS In this study, we proposed a novel methodology named HeteSim_DrugDisease (HSDD) for the prediction of drug repositioning. Firstly, we build the drug-drug similarity network and disease-disease similarity network by integrating the information of drugs and diseases. Secondly, a drug-disease heterogeneous network is constructed, which combines the drug similarity network, disease similarity network as well as the known drug-disease association network. Finally, HSDD predicts novel drug-disease associations based on the HeteSim scores of different meta-paths. The experimental results show that HSDD performs significantly better than the existing state-of-the-art approaches. HSDD achieves an AUC score of 0.8994 in the leave-one-out cross validation experiment. Moreover, case studies for selected drugs further illustrate the practical usefulness of HSDD. CONCLUSIONS HSDD can be an effective and feasible way to infer the associations between drugs and diseases using on meta-path-based semantic network analysis.
Collapse
Affiliation(s)
- Zhen Tian
- School of Information Engineering, Zhengzhou University, Zhengzhou, 450001, People's Republic of China
| | - Zhixia Teng
- School of information and computer engineering, Northeast Forestry, Harbin, 150001, People's Republic of China
| | - Shuang Cheng
- Institute of Materials, China Academy of Engineering Physics, Jiang You, 621907, Sichuan, China
| | - Maozu Guo
- School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, 100044, People's Republic of China. .,Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing, 100044, China.
| |
Collapse
|
22
|
Cheng L, Zhuang H, Yang S, Jiang H, Wang S, Zhang J. Exposing the Causal Effect of C-Reactive Protein on the Risk of Type 2 Diabetes Mellitus: A Mendelian Randomization Study. Front Genet 2018; 9:657. [PMID: 30619477 PMCID: PMC6306438 DOI: 10.3389/fgene.2018.00657] [Citation(s) in RCA: 57] [Impact Index Per Article: 8.1] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 09/14/2018] [Accepted: 12/03/2018] [Indexed: 12/21/2022] Open
Abstract
As a biomarker of inflammation, C-reactive protein (CRP) has attracted much attention due to its role in the incidence of type 2 diabetes mellitus (T2DM). Prospective studies have observed a positive correlation between the level of serum CRP and the incidence of T2DM. Recently, studies have reported that drugs for curing T2DM can also decrease the level of serum CRP. However, it is not yet clear whether high CRP levels cause T2DM. To evaluate this, we conducted a Mendelian randomization (MR) analysis using genetic variations as instrumental variables (IVs). Significantly associated single nucleotide polymorphisms (SNPs) of CRP were obtained from a genome-wide study and a replication study. Therein, 17,967 participants were utilized for the genome-wide association study (GWAS), and another 14,747 participants were utilized for the replication of identifying SNPs associated with CRP levels. The associations between SNPs and T2DM were from the DIAbetes Genetics Replication And Meta-analysis (DIAGRAM) consortium. After removing SNPs in linkage disequilibrium (LD) and T2DM-related SNPs, the four remaining CRP-related SNPs were deemed as IVs. To evaluate the pooled influence of these IVs on the risk of developing T2DM through CRP, the penalized robust inverse-variance weighted (IVW) method was carried out. The combined result (OR 1.114048; 95% CI 1.058656 to 1.172338; P = 0.024) showed that high levels of CRP significantly increase the risk of T2DM. In the subsequent analysis of the relationship between CRP and type 1 diabetes mellitus (T1DM), the pooled result (OR 1.017145; 95% CI 0.9066489 to 1.14225; P = 0.909) supported that CRP levels cannot determine the risk of developing T1DM.
Collapse
Affiliation(s)
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - He Zhuang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Shuo Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, China
| | - Huijie Jiang
- Department of Radiology, The Second Affiliated Hospital of Harbin Medical University, Harbin, China
| | - Song Wang
- Department of Radiology, Longhua Hospital, Shanghai University of Traditional Chinese Medicine, Shanghai, China
| | - Jun Zhang
- Heilongjiang Provincial Hospital, Harbin, China
| |
Collapse
|
23
|
Peng J, Xue H, Hui W, Lu J, Chen B, Jiang Q, Shang X, Wang Y. An online tool for measuring and visualizing phenotype similarities using HPO. BMC Genomics 2018; 19:571. [PMID: 30367579 PMCID: PMC6101067 DOI: 10.1186/s12864-018-4927-z] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/29/2022] Open
Abstract
BACKGROUND The Human Phenotype Ontology (HPO) is one of the most popular bioinformatics resources. Recently, HPO-based phenotype semantic similarity has been effectively applied to model patient phenotype data. However, the existing tools are revised based on the Gene Ontology (GO)-based term similarity. The design of the models are not optimized for the unique features of HPO. In addition, existing tools only allow HPO terms as input and only provide pure text-based outputs. RESULTS We present PhenoSimWeb, a web application that allows researchers to measure HPO-based phenotype semantic similarities using four approaches borrowed from GO-based similarity measurements. Besides, we provide a approach considering the unique properties of HPO. And, PhenoSimWeb allows text that describes phenotypes as input, since clinical phenotype data is always in text. PhenoSimWeb also provides a graphic visualization interface to visualize the resulting phenotype network. CONCLUSIONS PhenoSimWeb is an easy-to-use and functional online application. Researchers can use it to calculate phenotype similarity conveniently, predict phenotype associated genes or diseases, and visualize the network of phenotype interactions. PhenoSimWeb is available at http://120.77.47.2:8080.
Collapse
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
| | - Hansheng Xue
- Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055 China
| | - Weiwei Hui
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
| | - Junya Lu
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
| | - Bolin Chen
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
| | - Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, 150001 China
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
| | - Yadong Wang
- Department of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, 518055 China
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001 China
| |
Collapse
|
24
|
Peng J, Wang H, Lu J, Hui W, Wang Y, Shang X. Identifying term relations cross different gene ontology categories. BMC Bioinformatics 2017; 18:573. [PMID: 29297309 PMCID: PMC5751813 DOI: 10.1186/s12859-017-1959-3] [Citation(s) in RCA: 39] [Impact Index Per Article: 4.9] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 11/24/2022] Open
Abstract
Background The Gene Ontology (GO) is a community-based bioinformatics resource that employs ontologies to represent biological knowledge and describes information about gene and gene product function. GO includes three independent categories: molecular function, biological process and cellular component. For better biological reasoning, identifying the biological relationships between terms in different categories are important. However, the existing measurements to calculate similarity between terms in different categories are either developed by using the GO data only or only take part of combined gene co-function network information. Results We propose an iterative ranking-based method called CroGO2 to measure the cross-categories GO term similarities by incorporating level information of GO terms with both direct and indirect interactions in the gene co-function network. Conclusions The evaluation test shows that CroGO2 performs better than the existing methods. A genome-specific term association network for yeast is also generated by connecting terms with the high confidence score. The linkages in the term association network could be supported by the literature. Given a gene set, the related terms identified by using the association network have overlap with the related terms identified by GO enrichment analysis.
Collapse
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Honggang Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Junya Lu
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Weiwei Hui
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China
| | - Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China.
| | - Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, China.
| |
Collapse
|