1
Yang H, Fu H, Zhang M, Liu Y, He YO, Wang C, Cheng L. EnrichDO: a global weighted model for Disease Ontology enrichment analysis. Gigascience 2025; 14:giaf021. [PMID: 40139908 PMCID: PMC11945307 DOI: 10.1093/gigascience/giaf021] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/03/2024] [Revised: 12/18/2024] [Accepted: 02/14/2025] [Indexed: 03/29/2025]
Abstract
BACKGROUND Disease Ontology (DO) has been widely studied in biomedical research and clinical practice to describe the roles of genes. DO enrichment analysis is an effective means to discover associations between genes and diseases. Compared to hundreds of Gene Ontology (GO)-based enrichment analysis methods, however, DO-based methods are relatively scarce, and most current DO-based approaches are term-for-term and thus are unable to solve over-enrichment problems caused by the "true-path" rule. RESULTS Here, we describe a novel double-weighted model, EnrichDO, which leverages the latest annotations of the human genome with DO terms and integrates DO graph topology on a global scale. Compared to classic enrichment methods (mainly for GO) and existing DO-based enrichment tools, EnrichDO performs better in both GO and DO enrichment analysis cases. It can accurately identify more specific terms, without ignoring the truly associated parent terms, as shown in the Alzheimer's disease (AD) case (AD ranked first). Moreover, both a simulated test and a data perturbation test validate the accuracy and robustness of EnrichDO. Finally, EnrichDO is applied to other types of datasets to expand its application, including gene expression profile datasets, a host gene set of microorganisms, and hallmark gene sets. Based on the findings reported here, EnrichDO shows significant improvement via all experimental results. CONCLUSIONS EnrichDO provides an effective DO enrichment analysis model for gaining insight into the significance of a particular gene set in the context of disease. To increase the usability of EnrichDO, we have developed an R-based software package, which is freely available through Bioconductor (https://bioconductor.org/packages/release/bioc/html/EnrichDO.html) or at https://github.com/liangcheng-hrbmu/EnrichDO.
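For context, the "term-for-term" baseline that the abstract contrasts with EnrichDO is commonly implemented as a one-sided hypergeometric test applied to each ontology term independently. The sketch below illustrates only that classic baseline, not EnrichDO's double-weighted model; the function name, parameter names, and toy numbers are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, overlap):
    """One-sided hypergeometric p-value P(X >= overlap).

    population: number of background genes
    annotated:  background genes annotated to the term
    sample:     size of the query gene set
    overlap:    query genes annotated to the term
    (All names are hypothetical, chosen for this sketch.)
    """
    upper = min(annotated, sample)
    total = comb(population, sample)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(overlap, upper + 1)
    )
    return tail / total


# Toy numbers: 10 background genes, 5 annotated to the term,
# a query set of 5 genes, all 5 of which carry the annotation.
p_value = hypergeom_enrichment_p(population=10, annotated=5, sample=5, overlap=5)
```

Because each term is tested in isolation, a parent term inherits every gene of its descendants under the true-path rule and can appear enriched alongside them, which is the over-enrichment issue the abstract refers to.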
Affiliation(s)
- Haixiu Yang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
- Hongyu Fu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
- Meiyi Zhang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
- Yangyang Liu
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
- Yongqun Oliver He
- Unit for Laboratory Animal Medicine, University of Michigan Medical School, Ann Arbor, MI 48109, USA
- Chao Wang
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, Heilongjiang 150081, China
- National Health Commission (NHC) Key Laboratory of Molecular Probes and Targeted Diagnosis and Therapy, Harbin Medical University, Harbin 150028, China
2
Xiang J, Zhang J, Zhao Y, Wu FX, Li M. Biomedical data, computational methods and tools for evaluating disease-disease associations. Brief Bioinform 2022; 23:6522999. [PMID: 35136949 DOI: 10.1093/bib/bbac006] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 10/19/2021] [Revised: 01/04/2022] [Accepted: 01/05/2022] [Indexed: 12/12/2022]
Abstract
In recent decades, exploring potential relationships between diseases has been an active research field. With the rapid accumulation of disease-related biomedical data, a lot of computational methods and tools/platforms have been developed to reveal intrinsic relationship between diseases, which can provide useful insights to the study of complex diseases, e.g. understanding molecular mechanisms of diseases and discovering new treatment of diseases. Human complex diseases involve both external phenotypic abnormalities and complex internal molecular mechanisms in organisms. Computational methods with different types of biomedical data from phenotype to genotype can evaluate disease-disease associations at different levels, providing a comprehensive perspective for understanding diseases. In this review, available biomedical data and databases for evaluating disease-disease associations are first summarized. Then, existing computational methods for disease-disease associations are reviewed and classified into five groups in terms of the usages of biomedical data, including disease semantic-based, phenotype-based, function-based, representation learning-based and text mining-based methods. Further, we summarize software tools/platforms for computation and analysis of disease-disease associations. Finally, we give a discussion and summary on the research of disease-disease associations. This review provides a systematic overview for current disease association research, which could promote the development and applications of computational methods and tools/platforms for disease-disease associations.
Affiliation(s)
- Ju Xiang
- School of Computer Science and Engineering, Central South University, China
- Jiashuai Zhang
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Yichao Zhao
- School of Computer Science and Engineering, Central South University, China
- Fang-Xiang Wu
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, Hunan 410083, China
- Min Li
- Division of Biomedical Engineering and Department of Mechanical Engineering at University of Saskatchewan, Saskatoon, Canada
3
García Del Valle EP, Lagunes García G, Prieto Santamaría L, Zanin M, Menasalvas Ruiz E, Rodríguez-González A. DisMaNET: A network-based tool to cross map disease vocabularies. Computer Methods and Programs in Biomedicine 2021; 207:106233. [PMID: 34157517 DOI: 10.1016/j.cmpb.2021.106233] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 08/26/2020] [Accepted: 06/02/2021] [Indexed: 06/13/2023]
Abstract
BACKGROUND AND OBJECTIVES The growing integration of healthcare sources is improving our understanding of diseases. Cross-mapping resources such as UMLS play a very important role in this area, but their coverage is still incomplete. With the aim to facilitate the integration and interoperability of biological, clinical and literary sources in studies of diseases, we built DisMaNET, a system to cross-map terms from disease vocabularies by leveraging the power and interpretability of network analysis. METHODS First, we collected and normalized data from 8 disease vocabularies and mapping sources to generate our datasets. Next, we built DisMaNET by integrating the generated datasets into a Neo4j graph database. Then we exploited the query mechanisms of Neo4j to cross-map disease terms of different vocabularies with a relevance score metric and contrasted the results with some state-of-the-art solutions. Finally, we made our system publicly available for its exploitation and evaluation both through a graphical user interface and REST APIs. RESULTS DisMaNET contains almost half a million nodes and near nine hundred thousand edges, including hierarchical and mapping relationships. Its query capabilities enabled the detection of connections between disease vocabularies that are not present in major mapping sources such as UMLS and the Disease Ontology, even for rare diseases. Furthermore, DisMaNET was capable of obtaining more than 80% of the mappings with UMLS reported in MonDO and DisGeNET, and it was successfully exploited to resolve the missing mappings in the DISNET project. CONCLUSIONS DisMaNET is a powerful, intuitive and publicly available system to cross-map terms from different disease vocabularies. Our study proves that it is a competitive alternative to existing mapping systems, incorporating the potential of network analysis and the interpretability of the results through a visual interface as its main advantages. 
Expansion with new sources, versioning and the improvement of the search and scoring algorithms are envisioned as future lines of work.
Affiliation(s)
- Gerardo Lagunes García
- ETS de Ingenieros Informáticos. Universidad Politécnica de Madrid. Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, ETS Ingenieros Informáticos. Universidad Politécnica de Madrid. Pozuelo de Alarcón, Madrid, Spain
- Lucía Prieto Santamaría
- Centro de Tecnología Biomédica, ETS Ingenieros Informáticos. Universidad Politécnica de Madrid. Pozuelo de Alarcón, Madrid, Spain
- Massimiliano Zanin
- Instituto de Física Interdisciplinar y Sistemas Complejos IFISC (CSIC-UIB), Campus UIB, Palma de Mallorca, Spain
- Ernestina Menasalvas Ruiz
- ETS de Ingenieros Informáticos. Universidad Politécnica de Madrid. Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, ETS Ingenieros Informáticos. Universidad Politécnica de Madrid. Pozuelo de Alarcón, Madrid, Spain
- Alejandro Rodríguez-González
- ETS de Ingenieros Informáticos. Universidad Politécnica de Madrid. Boadilla del Monte, Madrid, Spain; Centro de Tecnología Biomédica, ETS Ingenieros Informáticos. Universidad Politécnica de Madrid. Pozuelo de Alarcón, Madrid, Spain
4
Ma J, Zhang L, Chen J, Song B, Zang C, Liu H. m7GDisAI: N7-methylguanosine (m7G) sites and diseases associations inference based on heterogeneous network. BMC Bioinformatics 2021; 22:152. [PMID: 33761868 PMCID: PMC7992861 DOI: 10.1186/s12859-021-04007-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 06/28/2020] [Accepted: 02/08/2021] [Indexed: 12/11/2022]
Abstract
Background Recent studies have confirmed that N7-methylguanosine (m7G) modification plays an important role in regulating various biological processes and has associations with multiple diseases. Wet-lab experiments are cost and time ineffective for the identification of disease-associated m7G sites. To date, tens of thousands of m7G sites have been identified by high-throughput sequencing approaches and the information is publicly available in bioinformatics databases, which can be leveraged to predict potential disease-associated m7G sites using a computational perspective. Thus, computational methods for m7G-disease association prediction are urgently needed, but none are currently available at present. Results To fill this gap, we collected association information between m7G sites and diseases, genomic information of m7G sites, and phenotypic information of diseases from different databases to build an m7G-disease association dataset. To infer potential disease-associated m7G sites, we then proposed a heterogeneous network-based model, m7G Sites and Diseases Associations Inference (m7GDisAI) model. m7GDisAI predicts the potential disease-associated m7G sites by applying a matrix decomposition method on heterogeneous networks which integrate comprehensive similarity information of m7G sites and diseases. To evaluate the prediction performance, 10 runs of tenfold cross validation were first conducted, and m7GDisAI got the highest AUC of 0.740(± 0.0024). Then global and local leave-one-out cross validation (LOOCV) experiments were implemented to evaluate the model’s accuracy in global and local situations respectively. AUC of 0.769 was achieved in global LOOCV, while 0.635 in local LOOCV. A case study was finally conducted to identify the most promising ovarian cancer-related m7G sites for further functional analysis. Gene Ontology (GO) enrichment analysis was performed to explore the complex associations between host gene of m7G sites and GO terms. 
The results showed that the disease-associated m7G sites identified by m7GDisAI and their host genes are consistently related to the pathogenesis of ovarian cancer, which may provide some clues to the pathogenesis of diseases. Conclusion The m7GDisAI web server can be accessed at http://180.208.58.66/m7GDisAI/, which provides a user-friendly interface to query disease-associated m7G sites. The list of the top 20 m7G sites predicted to be associated with 177 diseases can be obtained. Furthermore, detailed information about specific m7G sites and diseases is also shown. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04007-9.
Affiliation(s)
- Jiani Ma
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, 221116, China; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Lin Zhang
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, 221116, China; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Jin Chen
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, 221116, China; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Bowen Song
- Department of Biological Sciences, AI University Research Center, Xi'an Jiaotong-Liverpool University, Suzhou, 215123, China
- Chenxuan Zang
- Department of Biological Sciences, AI University Research Center, Xi'an Jiaotong-Liverpool University, Suzhou, 215123, China
- Hui Liu
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, 221116, China; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
5
Zhang L, Chen J, Ma J, Liu H. HN-CNN: A Heterogeneous Network Based on Convolutional Neural Network for m7G Site Disease Association Prediction. Front Genet 2021; 12:655284. [PMID: 33747055 PMCID: PMC7970120 DOI: 10.3389/fgene.2021.655284] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 01/18/2021] [Accepted: 02/15/2021] [Indexed: 12/24/2022]
Abstract
N7-methylguanosine (m7G) is a typical positively charged RNA modification, playing a vital role in transcriptional regulation. m7G can affect the biological processes of mRNA and tRNA and has associations with multiple diseases including cancers. Wet-lab experiments are cost and time ineffective for the identification of disease-related m7G sites. Thus, a heterogeneous network method based on Convolutional Neural Networks (HN-CNN) has been proposed to predict unknown associations between m7G sites and diseases. HN-CNN constructs a heterogeneous network with m7G site similarity, disease similarity, and disease-associated m7G sites to formulate features for m7G site-disease pairs. Next, a convolutional neural network (CNN) obtains multidimensional and irrelevant features prominently. Finally, XGBoost is adopted to predict the association between m7G sites and diseases. The performance of HN-CNN is compared with Naive Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), as well as Gradient Boosting Decision Tree (GBDT) through 10-fold cross-validation. The average AUC of HN-CNN is 0.827, which is superior to others.
Affiliation(s)
- Lin Zhang
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, China; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Jin Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Jiani Ma
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
- Hui Liu
- Engineering Research Center of Intelligent Control for Underground Space, Ministry of Education, China University of Mining and Technology, Xuzhou, China; School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, China
6
Peng J, Guan J, Hui W, Shang X. A novel subnetwork representation learning method for uncovering disease-disease relationships. Methods 2020; 192:77-84. [PMID: 32946974 DOI: 10.1016/j.ymeth.2020.09.002] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 06/17/2020] [Revised: 08/20/2020] [Accepted: 09/07/2020] [Indexed: 12/12/2022]
Abstract
Analyzing disease-disease relationships plays an important role in understanding disease mechanisms and finding alternative uses for a drug. A disease is usually the result of an abnormal state of multiple molecular processes. Since biological networks can model the interplay of multiple molecular processes, network-based methods have recently been proposed to uncover disease-disease relationships. Given a disease and a network, the disease can be represented as a subnetwork constructed from the disease genes involved in the given network, named the disease subnetwork. Because it is difficult to learn the feature representation of disease subnetworks, most existing methods are unsupervised and do not use labeled information. To fill this gap, we propose a novel method named SubNet2vec to learn the feature vectors of diseases from their corresponding subnetworks in the biological network. By utilizing the feature representation of disease subnetworks, we can analyze disease-disease relationships in a supervised fashion. The evaluation results show that the proposed framework outperforms some state-of-the-art approaches by a large margin on disease-disease/disease-drug association prediction. The source code and data are available at https://github.com/MedicineBiology-AI/SubNet2vec.git.
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China.
- Jiaojiao Guan
- School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China.
- Weiwei Hui
- Vivo mobile communications (Hang Zhou) co. LTD, China.
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China.
7
Cheng L, Zhao H, Wang P, Zhou W, Luo M, Li T, Han J, Liu S, Jiang Q. Computational Methods for Identifying Similar Diseases. Molecular Therapy. Nucleic Acids 2019; 18:590-604. [PMID: 31678735 PMCID: PMC6838934 DOI: 10.1016/j.omtn.2019.09.019] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 04/25/2019] [Revised: 09/11/2019] [Accepted: 09/12/2019] [Indexed: 02/01/2023]
Abstract
Although our knowledge of human diseases has increased dramatically, the molecular basis, phenotypic traits, and therapeutic targets of most diseases still remain unclear. An increasing number of studies have observed that similar diseases often are caused by similar molecules, can be diagnosed by similar markers or phenotypes, or can be cured by similar drugs. Thus, the identification of diseases similar to known ones has attracted considerable attention worldwide. To this end, the associations between diseases at the molecular, phenotypic, and taxonomic levels were used to measure the pairwise similarity in diseases. The corresponding performance assessment strategies for these methods involving the terms “category-based,” “simulated-patient-based,” and “benchmark-data-based” were thus further emphasized. Then, frequently used methods were evaluated using a benchmark-data-based strategy. To facilitate the assessment of disease similarity scores, researchers have designed dozens of tools that implement these methods for calculating disease similarity. Currently, disease similarity has been advantageous in predicting noncoding RNA (ncRNA) function and therapeutic drugs for diseases. In this article, we review disease similarity methods, evaluation strategies, tools, and their applications in the biomedical community. We further evaluate the performance of these methods and discuss the current limitations and future trends for calculating disease similarity.
Affiliation(s)
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Hengqiang Zhao
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China
- Pingping Wang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Wenyang Zhou
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Meng Luo
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Tianxin Li
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China
- Junwei Han
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin 150081, China.
- Shulin Liu
- Systemomics Center, College of Pharmacy, and Genomics Research Center (State-Province Key Laboratories of Biomedicine-Pharmaceutics of China), Harbin Medical University, Harbin, Heilongjiang, China; Department of Microbiology, Immunology and Infectious Diseases, University of Calgary, Calgary, AB, Canada.
- Qinghua Jiang
- School of Life Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, China.
8
PWCDA: Path Weighted Method for Predicting circRNA-Disease Associations. Int J Mol Sci 2018; 19:ijms19113410. [PMID: 30384427 PMCID: PMC6274797 DOI: 10.3390/ijms19113410] [Citation(s) in RCA: 62] [Impact Index Per Article: 8.9] [Received: 10/04/2018] [Revised: 10/25/2018] [Accepted: 10/26/2018] [Indexed: 12/22/2022]
Abstract
CircRNAs have particular biological structure and have proven to play important roles in diseases. It is time-consuming and costly to identify circRNA-disease associations by biological experiments. Therefore, it is appealing to develop computational methods for predicting circRNA-disease associations. In this study, we propose a new computational path weighted method for predicting circRNA-disease associations. Firstly, we calculate the functional similarity scores of diseases based on disease-related gene annotations and the semantic similarity scores of circRNAs based on circRNA-related gene ontology, respectively. To address missing similarity scores of diseases and circRNAs, we calculate the Gaussian Interaction Profile (GIP) kernel similarity scores for diseases and circRNAs, respectively, based on the circRNA-disease associations downloaded from circR2Disease database (http://bioinfo.snnu.edu.cn/CircR2Disease/). Then, we integrate disease functional similarity scores and circRNA semantic similarity scores with their related GIP kernel similarity scores to construct a heterogeneous network made up of three sub-networks: disease similarity network, circRNA similarity network and circRNA-disease association network. Finally, we compute an association score for each circRNA-disease pair based on paths connecting them in the heterogeneous network to determine whether this circRNA-disease pair is associated. We adopt leave one out cross validation (LOOCV) and five-fold cross validations to evaluate the performance of our proposed method. In addition, three common diseases, Breast Cancer, Gastric Cancer and Colorectal Cancer, are used for case studies. Experimental results illustrate the reliability and usefulness of our computational method in terms of different validation measures, which indicates PWCDA can effectively predict potential circRNA-disease associations.
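The Gaussian Interaction Profile (GIP) kernel mentioned in this abstract can be illustrated with a short sketch. The abstract does not give the formula, so the version below assumes the formulation commonly used in the association-prediction literature: each entity's interaction profile is its row (or column) of the binary association matrix, and the bandwidth is normalised by the mean squared norm of the profiles. The function name, the `gamma_prime` parameter, and the toy matrix are assumptions made for illustration.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Pairwise GIP kernel similarity for a list of binary interaction profiles.

    sim[i][j] = exp(-gamma * ||profile_i - profile_j||^2), where
    gamma = gamma_prime / (mean squared norm of all profiles).
    (Commonly used formulation, assumed here; not taken from the abstract.)
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(x * x for x in p) for p in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim


# Toy association matrix: rows are circRNAs, columns are diseases.
assoc = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 0],
]
circ_sim = gip_kernel(assoc)                                  # circRNA x circRNA
disease_sim = gip_kernel([list(col) for col in zip(*assoc)])  # disease x disease
```

In this kind of pipeline the kernel is typically used to fill in similarity scores for pairs that have no functional or semantic similarity available, which matches the role the abstract describes.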
9
Abstract
BACKGROUND Recently, measuring phenotype similarity has begun to play an important role in disease diagnosis, and researchers have started to develop phenotype similarity measures. However, existing methods ignore the interactions between phenotype-associated proteins, which may lead to inaccurate phenotype similarity. RESULTS We proposed a network-based method, PhenoNet, to calculate the similarity between phenotypes. We localized phenotypes in the network and calculated the similarity between phenotype-associated modules by modeling both the inter- and intra-similarity. CONCLUSIONS PhenoNet was evaluated on two independent evaluation datasets: gene ontology and gene expression data. The results show that PhenoNet performs better than the state-of-the-art methods on all evaluation tests.
Affiliation(s)
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Weiwei Hui
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
10
Hu Y, Zhao T, Zhang N, Zang T, Zhang J, Cheng L. Identifying diseases-related metabolites using random walk. BMC Bioinformatics 2018; 19:116. [PMID: 29671398 PMCID: PMC5907145 DOI: 10.1186/s12859-018-2098-1] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Indexed: 02/06/2023]
Abstract
Background Metabolites disrupted by an abnormal state of the human body are regarded as effects of disease. Compared with causes of disease such as genes, these markers are easier to capture for the prevention and diagnosis of metabolic diseases. A large number of metabolic markers of diseases remain to be explored, which motivates this work. Methods Existing metabolite-disease associations were extracted from the Human Metabolome Database (HMDB) as prior knowledge using the text mining tool NCBO Annotator. Next, we calculated the similarity of each pair of metabolites based on the similarity of their disease sets. Then, the similarities of all metabolite pairs were used to construct a weighted metabolite association network (WMAN). Subsequently, the network was used to predict novel metabolic markers of diseases using random walk. Results In total, 604 metabolites and 228 diseases were extracted from HMDB. Of the 604 metabolites, 453 were selected to construct the WMAN, where each metabolite is a node and the similarity of two metabolites is the weight of the edge linking them. The performance of the network was validated using the leave-one-out method, achieving an area under the receiver operating characteristic curve (AUC) of 0.7048. Further case studies identifying novel metabolites of diabetes mellitus were validated against recent studies. Conclusion In this paper, we presented a novel method for prioritizing metabolite-disease pairs. The superior performance validates its reliability for exploring novel metabolic markers of diseases.
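As an illustration of how a random walk can prioritize candidates on a weighted network, here is a minimal sketch. The abstract does not state which variant the authors used; random walk with restart is a common choice and is assumed here. The function name, the restart probability, and the four-node toy network are hypothetical.

```python
def random_walk_with_restart(weights, seeds, restart=0.5, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * T p + restart * p0 until convergence.

    weights: symmetric matrix of edge weights (list of lists)
    seeds:   set of node indices already linked to the disease
    T is the column-normalised weight matrix (assumed variant, see lead-in).
    """
    n = len(weights)
    col_sums = [sum(weights[i][j] for i in range(n)) for j in range(n)]
    trans = [
        [weights[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [1.0 / len(seeds) if i in seeds else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(trans[i][j] * p[j] for j in range(n)) + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between iterations is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy weighted network of four metabolites; node 0 is the known marker.
toy_weights = [
    [0.0, 0.9, 0.1, 0.0],
    [0.9, 0.0, 0.2, 0.0],
    [0.1, 0.2, 0.0, 0.5],
    [0.0, 0.0, 0.5, 0.0],
]
scores = random_walk_with_restart(toy_weights, seeds={0})
```

Non-seed nodes are then ranked by their steady-state score; in this toy network the metabolite most strongly linked to the seed (node 1) ranks above the distant one (node 3).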
Affiliation(s)
- Yang Hu
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Tianyi Zhao
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Ningyi Zhang
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China
- Tianyi Zang
- School of Life Science and Technology, Department of Computer Science and Technology, Harbin Institute of Technology, Harbin, 150001, People's Republic of China.
- Jun Zhang
- Department of rehabilitation, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, 150001, People's Republic of China.
- Liang Cheng
- College of Bioinformatics Science and Technology, Harbin Medical University, Harbin, 150001, China.
11
Hao X, Hao J, Wang L, Hou H. Effective norm emergence in cell systems under limited communication. BMC Bioinformatics 2018; 19:119. [PMID: 29671391 PMCID: PMC5907317 DOI: 10.1186/s12859-018-2097-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/24/2022]
Abstract
Background The cooperation of cells in biological systems is similar to that of agents in cooperative multi-agent systems. Research findings in multi-agent systems literature can provide valuable inspirations to biological research. The well-coordinated states in cell systems can be viewed as desirable social norms in cooperative multi-agent systems. One important research question is how a norm can rapidly emerge with limited communication resources. Results In this work, we propose a learning approach which can trade off the agents’ performance of coordinating on a consistent norm and the communication cost involved. During the learning process, the agents can dynamically adjust their coordination set according to their own observations and pick out the most crucial agents to coordinate with. In this way, our method significantly reduces the coordination dependence among agents. Conclusion The experiment results show that our method can efficiently facilitate the social norm emergence among agents, and also scale well to large-scale populations.
Affiliation(s)
- Xiaotian Hao
- School of Computer Science and Software, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350, China
- Jianye Hao
- School of Computer Science and Software, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350, China
- Li Wang
- School of Computer Science and Software, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350, China.
- Hanxu Hou
- School of Electrical Engineering and Intelligentization, Dongguan University of Technology, No. 1, university road, songshan lake district, dongguan, 221116, China.
12
Sun S, Sun X, Zheng Y. Higher-order partial least squares for predicting gene expression levels from chromatin states. BMC Bioinformatics 2018; 19:113. [PMID: 29671394 PMCID: PMC5907142 DOI: 10.1186/s12859-018-2100-y] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 01/23/2023]
Abstract
Background Extensive studies have shown that gene expression levels are strongly affected by chromatin mark combinations via at least two mechanisms, i.e., activation or repression. However, their combinatorial patterns are still unclear. To further understand the relationship between histone modifications and gene expression levels, we introduce a purely geometric higher-order representation, the tensor (also called a multidimensional array), which may capture more of the unknown interactions among chromatin states when predicting gene expression levels. Results The prediction models were learned from regions spanning 10k base pairs upstream and 10k base pairs downstream of the transcriptional start sites (TSSs) in three species (i.e., Human, Rhesus Macaque, and Chimpanzee) with five histone modifications (i.e., H3K4me1, H3K4me3, H3K27ac, H3K27me3, and Pol II). Experimental results demonstrate that the proposed method is more powerful at predicting gene expression levels than several other popular methods. Specifically, our method improves performance on two commonly used criteria, R and RMSE, by as much as 1.7% and 11%, respectively. Conclusions The overall aim of this work is to show that the higher-order representation is able to include more unknown interaction information between histone modifications across different species.
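The two evaluation criteria named in this abstract, R and RMSE, can be illustrated with a minimal sketch. It assumes R denotes the Pearson correlation coefficient between measured and predicted expression levels (the abstract does not spell this out), and the function names `pearson_r` and `rmse` are hypothetical, not taken from the paper.

```python
import math


def pearson_r(y_true, y_pred):
    """Pearson correlation between measured and predicted values.

    Assumption: this is the 'R' criterion referred to in the abstract.
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred))
    var_t = sum((t - mean_t) ** 2 for t in y_true)
    var_p = sum((p - mean_p) ** 2 for p in y_pred)
    return cov / math.sqrt(var_t * var_p)


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
```

A higher R and a lower RMSE both indicate better agreement between predicted and measured expression levels.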
Affiliation(s)
- Shiquan Sun
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, People's Republic of China; Department of Biostatistics, University of Michigan, Ann Arbor, 48109, MI, USA.
| | - Xifang Sun
- School of Science, Xi'an Shiyou University, Xi'an, 710065, Shaanxi, People's Republic of China
| | - Yan Zheng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, Shaanxi, People's Republic of China
| |
|
13
|
Tou H, Yao L, Wei Z, Zhuang X, Zhang B. Automatic infection detection based on electronic medical records. BMC Bioinformatics 2018; 19:117. [PMID: 29671399 PMCID: PMC5907141 DOI: 10.1186/s12859-018-2101-x] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Indexed: 01/20/2023] Open
Abstract
BACKGROUND Making accurate patient care decisions as early as possible is a constant challenge, especially for physicians in the emergency department. The increasing volume of electronic medical records (EMRs) opens new horizons for automatic diagnosis. In this paper, we propose to use machine learning approaches for automatic infection detection based on EMRs. Five categories of information are utilized for prediction: personal information, admission notes, vital signs, diagnostic test results, and medical image diagnoses. RESULTS Experimental results on a newly constructed EMR dataset from an emergency department show that machine learning models can achieve decent performance for infection detection, with an area under the receiver operating characteristic curve (AUC) of 0.88. Of the five types of information, the admission note in text form contributes the most, with an AUC of 0.87. CONCLUSIONS This study provides a state-of-the-art EMR processing system to automatically make medical decisions. It extracts five types of features associated with infection and achieves decent performance on automatic infection detection based on machine learning models.
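The AUC metric reported in this abstract can be illustrated with a minimal sketch. It uses the rank-based (Mann-Whitney) formulation: the AUC equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The function name `roc_auc` and the toy data are hypothetical; this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0/1 (1 = infection, 0 = no infection; illustrative)
    scores: iterable of predicted scores, higher means more likely positive
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Count positive/negative pairs ranked correctly; ties count as half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: three of the four positive/negative pairs are ordered correctly.
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints 0.75
```

The pairwise loop is quadratic in the number of cases, which is fine for illustration; a sort-based ranking would be the usual choice for large datasets.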
Affiliation(s)
- Huaixiao Tou
- School of Data Science, Fudan University, Shanghai, China
| | - Lu Yao
- Zhongshan Hospital Affiliated to Fudan University, Shanghai, China
| | - Zhongyu Wei
- School of Data Science, Fudan University, Shanghai, China.
| | - Xiahai Zhuang
- School of Data Science, Fudan University, Shanghai, China
| | - Bo Zhang
- Zhongshan Hospital Affiliated to Fudan University, Shanghai, China.
| |
|
14
|
Guo Y, Liu S, Li Z, Shang X. BCDForest: a boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data. BMC Bioinformatics 2018; 19:118. [PMID: 29671390 PMCID: PMC5907304 DOI: 10.1186/s12859-018-2095-4] [Citation(s) in RCA: 46] [Impact Index Per Article: 6.6] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND The classification of cancer subtypes is of great importance to cancer diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially deep learning-based approaches. Recently, the deep forest model has been proposed as an alternative to deep neural networks for learning hyper-representations by using cascaded ensembles of decision trees. It has been shown that the deep forest model has competitive or even better performance than deep neural networks to some extent. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small-sample-size, high-dimensional biology data. RESULTS In this paper, we propose a deep learning model, called BCDForest, to address cancer subtype classification on small-scale biology datasets; it can be viewed as a modification of the standard deep forest model. BCDForest differs from the standard deep forest model in two main contributions. First, a multi-class-grained scanning method is proposed to train multiple binary classifiers to encourage ensemble diversity. Meanwhile, the fitting quality of each classifier is considered in representation learning. Second, we propose a boosting strategy to emphasize more important features in cascade forests, thereby propagating the benefits of discriminative features among cascade layers to improve classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms state-of-the-art methods in cancer subtype classification. CONCLUSIONS The multi-class-grained scanning and boosting strategy in our model provide an effective solution to ease the overfitting challenge and improve the robustness of the deep forest model on small-scale data. Our model provides a useful approach to the classification of cancer subtypes by using deep learning on high-dimensional and small-scale biology data.
Affiliation(s)
- Yang Guo
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, 710072 People’s Republic of China
| | - Shuhui Liu
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, 710072 People’s Republic of China
| | - Zhanhuai Li
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, 710072 People’s Republic of China
| | - Xuequn Shang
- School of Computer Science and Engineering, Northwestern Polytechnical University, Xi’an, 710072 People’s Republic of China
| |
|