1
Si Y, Huang Z, Fang Z, Yuan Z, Huang Z, Li Y, Wei Y, Wu F, Yao YF. Global-local aware Heterogeneous Graph Contrastive Learning for multifaceted association prediction in miRNA-gene-disease networks. Brief Bioinform 2024; 25:bbae443. [PMID: 39256197 PMCID: PMC11387071 DOI: 10.1093/bib/bbae443] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/03/2024] [Revised: 08/11/2024] [Accepted: 08/30/2024] [Indexed: 09/12/2024] Open
Abstract
Unraveling the intricate network of associations among microRNAs (miRNAs), genes, and diseases is pivotal for deciphering molecular mechanisms, refining disease diagnosis, and crafting targeted therapies. Computational strategies, leveraging link prediction within biological graphs, present a cost-efficient alternative to high-cost empirical assays. However, while many methods excel at predicting specific associations, such as miRNA-disease associations (MDAs), miRNA-target interactions (MTIs), and disease-gene associations (DGAs), a holistic approach harnessing diverse data sources for multifaceted association prediction remains largely unexplored. The limited availability of high-quality data, as in vitro experiments to comprehensively confirm associations are often expensive and time-consuming, results in a sparse and noisy heterogeneous graph, hindering accurate prediction of these complex associations. To address this challenge, we propose a novel framework called Global-local aware Heterogeneous Graph Contrastive Learning (GlaHGCL). GlaHGCL combines global and local contrastive learning to improve node embeddings in the heterogeneous graph. In particular, global contrastive learning enhances the robustness of node embeddings against noise by aligning global representations of the original graph and its augmented counterpart. Local contrastive learning enforces representation consistency between functionally similar or connected nodes across diverse data sources, effectively leveraging data heterogeneity and mitigating the issue of data scarcity. The refined node representations are applied to downstream tasks, such as MDA, MTI, and DGA prediction. Experiments show GlaHGCL outperforming state-of-the-art methods, and case studies further demonstrate its ability to accurately uncover new associations among miRNAs, genes, and diseases. We have made the datasets and source code publicly available at https://github.com/Sue-syx/GlaHGCL.
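The abstract does not state which contrastive objective GlaHGCL uses. As a rough illustration of the general idea of aligning an embedding with its counterpart from an augmented view, the sketch below computes a generic InfoNCE-style loss in plain Python. The function names, the temperature value, and the toy embeddings are all assumptions made for this example and are not taken from the paper or its repository.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchors, positives, temperature=0.5):
    """Mean InfoNCE loss (illustrative, not the paper's exact objective).

    positives[i] is the positive sample for anchors[i]; every other
    positives[j] acts as a negative for that anchor.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for a stable log-sum-exp
        log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Hypothetical 2-d embeddings of three nodes in the original graph
# and in a lightly perturbed (augmented) view of it.
original = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
augmented = [[0.9, 0.1], [0.1, 0.9], [-0.9, -0.1]]

aligned_loss = info_nce(original, augmented)
# Pairing each node with the wrong view should give a higher loss.
shuffled_loss = info_nce(original, augmented[1:] + augmented[:1])
```

Minimizing such a loss pulls each node toward its own augmented view and away from the views of other nodes, which is one common way to make embeddings less sensitive to noisy edges.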
Affiliation(s)
- Yuxuan Si
  - Department of Ophthalmology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, East Qingchun Road, 310016 Zhejiang, China
  - College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, 310027 Zhejiang, China
- Zihan Huang
  - College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, 310027 Zhejiang, China
- Zhengqing Fang
  - Department of Ophthalmology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, East Qingchun Road, 310016 Zhejiang, China
  - College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, 310027 Zhejiang, China
- Zhouhang Yuan
  - Department of Ophthalmology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, East Qingchun Road, 310016 Zhejiang, China
  - College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, 310027 Zhejiang, China
- Zhengxing Huang
  - College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, 310027 Zhejiang, China
- Yingming Li
  - College of Information Science and Electronic Engineering, Zhejiang University, 38 Zheda Road, 310027 Zhejiang, China
- Ying Wei
  - College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, 310027 Zhejiang, China
- Fei Wu
  - College of Computer Science and Technology, Zhejiang University, 38 Zheda Road, 310027 Zhejiang, China
- Yu-Feng Yao
  - Department of Ophthalmology, Sir Run Run Shaw Hospital, Zhejiang University School of Medicine, East Qingchun Road, 310016 Zhejiang, China
  - Department of Ophthalmology, The Fourth Affiliated Hospital of Soochow University, 215000 Suzhou, China
2
Li ZW, Wang QK, Yuan CA, Han PY, You ZH, Wang L. Predicting MiRNA-Disease Associations by Graph Representation Learning Based on Jumping Knowledge Networks. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2629-2638. [PMID: 35925844 DOI: 10.1109/tcbb.2022.3196394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
A growing number of studies have shown that miRNAs are inextricably linked with many human diseases, and a great deal of effort has been spent on identifying their potential associations. Compared with traditional experimental methods, computational approaches have achieved promising results. In this article, we propose a graph representation learning method to predict miRNA-disease associations. Specifically, we first integrate the verified miRNA-disease associations with the similarity information of miRNAs and diseases to construct a miRNA-disease heterogeneous graph. Then, we apply a graph attention network to aggregate the neighbor information of nodes in each layer, and feed the hidden-layer representations into the structure-aware jumping knowledge network to obtain the global features of nodes. The output features of miRNAs and diseases are then concatenated and fed into a fully connected layer to score the potential associations. Through five-fold cross-validation, the average AUC, accuracy and precision values of our model are 93.30%, 85.18% and 88.90%, respectively. In addition, for three case studies of esophageal tumor, lymphoma and prostate tumor, 46, 45 and 45 of the top 50 miRNAs predicted by our model were confirmed by relevant databases. Overall, our method could provide a reliable alternative for miRNA-disease association prediction.
3
Chen M, Deng Y, Li Z, Ye Y, He Z. KATZNCP: a miRNA-disease association prediction model integrating KATZ algorithm and network consistency projection. BMC Bioinformatics 2023; 24:229. [PMID: 37268893 DOI: 10.1186/s12859-023-05365-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Accepted: 05/26/2023] [Indexed: 06/04/2023] Open
Abstract
BACKGROUND: Clinical studies have shown that miRNAs are closely related to human health. The study of potential associations between miRNAs and diseases will contribute to a profound understanding of the mechanism of disease development, as well as human disease prevention and treatment. MiRNA-disease associations predicted by computational methods are the best complement to biological experiments. RESULTS: In this research, a federated computational model KATZNCP was proposed on the basis of the KATZ algorithm and network consistency projection to infer potential miRNA-disease associations. In KATZNCP, a heterogeneous network was initially constructed by integrating the known miRNA-disease associations, integrated miRNA similarities, and integrated disease similarities; then, the KATZ algorithm was implemented in the heterogeneous network to obtain the estimated miRNA-disease prediction scores. Finally, the precise scores were obtained by the network consistency projection method as the final prediction results. KATZNCP achieved reliable predictive performance in leave-one-out cross-validation (LOOCV) with an AUC value of 0.9325, which was better than the state-of-the-art comparable algorithms. Furthermore, case studies of lung neoplasms and esophageal neoplasms demonstrated the excellent predictive performance of KATZNCP. CONCLUSION: A new computational model KATZNCP was proposed for predicting potential miRNA-disease associations based on KATZ and network consistency projection, which can effectively predict potential miRNA-disease interactions. Therefore, KATZNCP can be used to provide guidance for future experiments.
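The Katz measure mentioned in this abstract scores a pair of nodes by summing the walks between them, with longer walks damped by a factor beta per step. The sketch below is a minimal, truncated version on a toy three-node graph; the damping value, the walk-length cutoff, and the graph itself are assumptions for illustration, and the network consistency projection step of KATZNCP is not shown.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def katz_scores(adjacency, beta=0.1, max_length=3):
    """Truncated Katz index: sum over l = 1..max_length of beta**l * A**l."""
    n = len(adjacency)
    scores = [[0.0] * n for _ in range(n)]
    power = [row[:] for row in adjacency]  # holds A**l, starting at l = 1
    weight = beta                          # holds beta**l
    for _ in range(max_length):
        for i in range(n):
            for j in range(n):
                scores[i][j] += weight * power[i][j]
        power = matmul(power, adjacency)
        weight *= beta
    return scores


# Hypothetical toy graph: node 0 = miRNA m0, node 1 = miRNA m1, node 2 = disease d0.
# m0 and m1 are similar; m1 has a known association with d0; m0-d0 is unknown.
adjacency = [
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]
scores = katz_scores(adjacency)
candidate_score = scores[0][2]  # evidence for the unobserved m0-d0 pair
known_score = scores[1][2]      # the observed m1-d0 pair scores higher
```

The unobserved pair receives a nonzero score only through the two-step walk via the similar miRNA, which is the intuition behind using Katz on a similarity-augmented heterogeneous network.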
Affiliation(s)
- Min Chen
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421002, China
- Yingwei Deng
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421002, China
- Zejun Li
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421002, China
- Yifan Ye
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421002, China
- Ziyi He
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421002, China
4
Ma J, Qin T, Xiang J. Disease-gene prediction based on preserving structure network embedding. Front Aging Neurosci 2023; 15:1061892. [PMID: 36896421 PMCID: PMC9990751 DOI: 10.3389/fnagi.2023.1061892] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2022] [Accepted: 01/30/2023] [Indexed: 02/23/2023] Open
Abstract
Many diseases, such as Alzheimer's disease (AD) and Parkinson's disease (PD), are caused by abnormalities or mutations of related genes. Many computational methods based on the network relationship between diseases and genes have been proposed to predict potential pathogenic genes. However, how to effectively mine the disease-gene relationship network to better predict disease genes is still an open problem. In this paper, a disease-gene prediction method based on preserving structure network embedding (PSNE) is introduced. In order to predict pathogenic genes more effectively, a heterogeneous network with multiple types of bio-entities was constructed by integrating disease-gene associations, the human protein network, and disease-disease associations. Furthermore, the low-dimensional features of nodes extracted from the network were used to reconstruct a new disease-gene heterogeneous network. Compared with other advanced methods, PSNE proved more effective in disease-gene prediction. Finally, we applied the PSNE method to predict potential pathogenic genes for age-associated diseases such as AD and PD, and verified these predicted genes against the literature. Overall, this work provides an effective method for disease-gene prediction, and a series of high-confidence potential pathogenic genes of AD and PD which may be helpful for the experimental discovery of disease genes.
Affiliation(s)
- Jinlong Ma
  - School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China
- Tian Qin
  - School of Information Science and Engineering, Hebei University of Science and Technology, Shijiazhuang, China
- Ju Xiang
  - School of Computer and Communication Engineering, Changsha University of Science and Technology, Changsha, China
  - Department of Basic Medical Sciences, Changsha Medical University, Changsha, China
5
Jeong B, Lee J, Kim H, Gwak S, Kim YK, Yoo SY, Lee D, Choi JS. Multiple-Kernel Support Vector Machine for Predicting Internet Gaming Disorder Using Multimodal Fusion of PET, EEG, and Clinical Features. Front Neurosci 2022; 16:856510. [PMID: 35844227 PMCID: PMC9279895 DOI: 10.3389/fnins.2022.856510] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 01/17/2022] [Accepted: 05/31/2022] [Indexed: 11/22/2022] Open
Abstract
Internet gaming disorder (IGD) has become an important social and psychiatric issue in recent years. To prevent IGD and provide the appropriate intervention, an accurate prediction method for identifying IGD is necessary. In this study, we investigated machine learning methods of multimodal neuroimaging data including Positron Emission Tomography (PET), Electroencephalography (EEG), and clinical features to enhance prediction accuracy. Unlike the conventional methods which usually concatenate all features into one feature vector, we adopted a multiple-kernel support vector machine (MK-SVM) to classify IGD. We compared the prediction performance of standard machine learning methods such as SVM, random forest, and boosting with the proposed method in patients with IGD (N = 28) and healthy controls (N = 24). We showed that the prediction accuracy of the optimal MK-SVM using three kinds of modalities was much higher than other conventional machine learning methods, with the highest accuracy being 86.5%, the sensitivity 89.3%, and the specificity 83.3%. Furthermore, we deduced that clinical variables had the highest contribution to the optimal IGD prediction model and that the other two modalities were also indispensable. We found that more efficient integration of multimodal data through kernel combination could contribute to better performance of the prediction model. This study is a novel attempt to integrate each method from different sources and suggests that integrating each method, such as self-administrated reports, PET, and EEG, improves the prediction of IGD.
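The contrast drawn in this abstract, between concatenating all features into one vector and combining one kernel per modality, can be sketched as follows. This is a minimal illustration in plain Python, assuming RBF kernels and fixed weights; the feature values, the `gamma` settings, and the weights are hypothetical and are not the study's data or tuned parameters.

```python
import math


def rbf_kernel(samples, gamma):
    """Pairwise RBF kernel matrix exp(-gamma * squared distance)."""
    n = len(samples)
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(samples[i], samples[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices, one matrix per modality."""
    n = len(kernels[0])
    return [
        [sum(w * k[i][j] for w, k in zip(weights, kernels)) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical features for three subjects in each modality.
pet = [[0.1, 0.2], [0.4, 0.1], [0.9, 0.8]]
eeg = [[1.0], [1.5], [3.0]]
clinical = [[20.0, 1.0], [22.0, 0.0], [35.0, 1.0]]

kernels = [rbf_kernel(pet, 1.0), rbf_kernel(eeg, 0.5), rbf_kernel(clinical, 0.01)]
weights = [0.2, 0.3, 0.5]  # illustrative only; would normally be tuned
combined = combine_kernels(kernels, weights)
```

A non-negative weighted sum of valid kernels is itself a valid kernel, so a matrix like `combined` can be handed to any SVM implementation that accepts a precomputed kernel (for example scikit-learn's `SVC(kernel="precomputed")`). Each modality keeps its own scale and similarity measure instead of being flattened into one feature vector.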
Affiliation(s)
- Boram Jeong
  - Department of Statistics, Ewha Womans University, Seoul, South Korea
- Jiyoon Lee
  - Department of Psychiatry, Samsung Medical Center, Seoul, South Korea
- Heejung Kim
  - Department of Nuclear Medicine, SMG-SNU Boramae Medical Center, Seoul, South Korea
  - Institute of Radiation Medicine, Medical Research Center, Seoul National University, Seoul, South Korea
- Seungyeon Gwak
  - Department of Statistics, Ewha Womans University, Seoul, South Korea
- Yu Kyeong Kim
  - Department of Nuclear Medicine, SMG-SNU Boramae Medical Center, Seoul, South Korea
- So Young Yoo
  - Department of Psychiatry, SMG-SNU Boramae Medical Center, Seoul, South Korea
- Donghwan Lee
  - Department of Statistics, Ewha Womans University, Seoul, South Korea
- Jung-Seok Choi
  - Department of Psychiatry, Samsung Medical Center, Seoul, South Korea
  - *Correspondence: Jung-Seok Choi
6
Ding P, Ouyang W, Luo J, Kwoh CK. Heterogeneous information network and its application to human health and disease. Brief Bioinform 2021; 21:1327-1346. [PMID: 31566212 DOI: 10.1093/bib/bbz091] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 05/06/2019] [Revised: 06/29/2019] [Accepted: 06/30/2019] [Indexed: 12/11/2022] Open
Abstract
Molecular components with functional interdependencies in the human cell form a complicated biological network. Diseases are mostly caused by perturbations of multiple interacting biomolecules, rather than by an abnormality of a single biomolecule. Furthermore, new biological functions and processes could be revealed by discovering novel relationships between biological entities. Hence, more and more biologists focus on studying the complex biological system instead of individual biological components. The emergence of the heterogeneous information network (HIN) offers a promising way to systematically explore complicated and heterogeneous relationships between various molecules for apparently distinct phenotypes. In this review, we first present the basic definition of HIN and describe the biological system as a complex HIN. Then, we discuss the topological properties of HIN and how these can be applied to detect network motifs and functional modules. Afterwards, methodologies for discovering relationships between diseases and biomolecules are presented. Useful insights on how HIN aids in drug development and in exploring the human interactome are provided. Finally, we analyze the challenges and opportunities for uncovering combinatorial patterns in pharmacogenomics and for cell-type detection based on single-cell genomic data.
Affiliation(s)
- Pingjian Ding
  - School of Computer Science, University of South China, Hengyang, China
- Wenjue Ouyang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Jiawei Luo
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Chee-Keong Kwoh
  - School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
7
Dai Q, Chu Y, Li Z, Zhao Y, Mao X, Wang Y, Xiong Y, Wei DQ. MDA-CF: Predicting MiRNA-Disease associations based on a cascade forest model by fusing multi-source information. Comput Biol Med 2021; 136:104706. [PMID: 34371319 DOI: 10.1016/j.compbiomed.2021.104706] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/17/2021] [Revised: 07/26/2021] [Accepted: 07/26/2021] [Indexed: 01/17/2023]
Abstract
MicroRNAs (miRNAs) are significant regulators in various biological processes. They may become promising biomarkers or therapeutic targets, which provide a new perspective in diagnosis and treatment of multiple diseases. Since the experimental methods are always costly and resource-consuming, prediction of disease-related miRNAs using computational methods is in great need. In this study, we developed MDA-CF to identify underlying miRNA-disease associations based on a cascade forest model. In this method, multi-source information was integrated to represent miRNAs and diseases comprehensively, and the autoencoder was utilized for dimension reduction to obtain the optimal feature space. The cascade forest model was then employed for miRNA-disease association prediction. As a result, the average AUC of MDA-CF was 0.9464 on HMDD v3.2 in five-fold cross-validation. Compared with previous computational methods, MDA-CF performed better on HMDD v2.0 with an average AUC of 0.9258. Moreover, MDA-CF was implemented to investigate colon neoplasm, breast neoplasm, and gastric neoplasm, and 100%, 86%, 88% of the top 50 potential miRNAs were validated by authoritative databases. In conclusion, MDA-CF appears to be a reliable method to uncover disease-associated miRNAs. The source code of MDA-CF is available at https://github.com/a1622108/MDA-CF.
Affiliation(s)
- Qiuying Dai
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanyi Chu
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Zhiqi Li
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yusong Zhao
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Xueying Mao
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanjing Wang
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yi Xiong
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Dong-Qing Wei
  - State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
  - Peng Cheng Laboratory, Vanke Cloud City Phase I Building 8, Xili Street, Nanshan District, Shenzhen, Guangdong, 518055, China
8
Moturi S, Rao SNT, Vemuru S. Grey wolf assisted dragonfly-based weighted rule generation for predicting heart disease and breast cancer. Comput Med Imaging Graph 2021; 91:101936. [PMID: 34218121 DOI: 10.1016/j.compmedimag.2021.101936] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/05/2020] [Revised: 01/06/2021] [Accepted: 05/07/2021] [Indexed: 11/29/2022]
Abstract
Disease prediction plays a significant role in people's lives, as predicting the threat of diseases is necessary for citizens to live in a healthy manner. The current development of data mining schemes has offered several systems that focus on disease prediction. Even though disease prediction systems offer many advantages, there are still many challenges that might limit their realistic use, such as the efficiency of prediction and information protection. This paper intends to develop an improved disease prediction model, which includes three phases: Weighted Coalesce rule generation, Optimized feature extraction, and Classification. At first, Coalesce rule generation is carried out after data transformation that involves normalization and sequential labeling. Here, rule generation is done based on the weights (priority level) assigned to each attribute by the expert. The support of each rule is multiplied by the proposed weighted function, and the resultant weighted support is compared with the minimum support for selecting the rules. Further, the obtained rules are subjected to the optimal feature selection process. A hybrid classifier that merges Support Vector Machine (SVM) and Deep Belief Network (DBN) performs the classification, determining whether or not the patient is affected by the disease. The optimized feature selection process depends on a new hybrid optimization algorithm that links Grey Wolf Optimization (GWO) with the Dragonfly Algorithm (DA); hence, the presented model is termed Grey Wolf Levy Updated-DA (GWU-DA). Here, heart disease and breast cancer data are used, and the efficiency of the proposed model is validated by comparison with state-of-the-art models. From the analysis, the accuracy of the proposed GWU-DA model is 65.98%, 53.61%, 42.27%, 35.05%, 34.02%, 11.34%, 13.4%, 10.31%, 9.28% and 9.89% better than that of the CBA + CPAR, MKL + ANFIS, RF + EA, WCBA, IQR + KNN + PSO, NL-DA + SVM + DBN, AWFS-RA, HCS-RFRS, ADS-SM-DNN and OSSVM-HGSA models, respectively, at the 60th learning percentage.
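The rule-selection step described in this abstract (support multiplied by a weight, then compared with a minimum support) can be sketched as below. The abstract does not define the weighted function, so the sketch assumes it is the mean of the expert-assigned attribute weights in the rule; the attribute names, weights, records, and threshold are all hypothetical.

```python
def support(rule_items, transactions):
    """Fraction of transactions that contain every item of the rule."""
    matches = sum(1 for t in transactions if rule_items <= t)
    return matches / len(transactions)


def select_rules(rules, transactions, attribute_weights, min_support):
    """Keep rules whose weighted support reaches min_support.

    Assumption: the rule weight is the mean of its attributes' weights;
    the paper's actual weighted function is not given in the abstract.
    """
    selected = []
    for rule in rules:
        weight = sum(attribute_weights[a] for a in rule) / len(rule)
        weighted_support = support(rule, transactions) * weight
        if weighted_support >= min_support:
            selected.append((rule, weighted_support))
    return selected


# Hypothetical patient records expressed as sets of present attributes.
transactions = [
    {"chest_pain", "high_bp"},
    {"chest_pain", "high_bp", "smoker"},
    {"smoker"},
    {"chest_pain"},
]
attribute_weights = {"chest_pain": 0.9, "high_bp": 0.8, "smoker": 0.4}
rules = [frozenset({"chest_pain", "high_bp"}), frozenset({"smoker"})]

selected = select_rules(rules, transactions, attribute_weights, min_support=0.3)
```

Both candidate rules have the same raw support (0.5) in this toy data, but only the rule built from highly weighted attributes survives the threshold, which is the effect expert weighting is meant to have.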
Affiliation(s)
- Sireesha Moturi
  - Research Scholar, Computer Science and Engineering, KLEF, Green Fields, Vaddeswaram, Andhra Pradesh, 522502, India
- S N Tirumala Rao
  - Professor, Computer Science and Engineering, Narasaraopeta Engineering College, Narasaraopet, Guntur(Dt), Andhra Pradesh, India
- Srikanth Vemuru
  - Professor, Computer Science and Engineering, KLEF, Green Fields, Vaddeswaram, Andhra Pradesh, 522502, India
9
Peng W, Du J, Dai W, Lan W. Predicting miRNA-Disease Association Based on Modularity Preserving Heterogeneous Network Embedding. Front Cell Dev Biol 2021; 9:603758. [PMID: 34178973 PMCID: PMC8223753 DOI: 10.3389/fcell.2021.603758] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/07/2020] [Accepted: 03/23/2021] [Indexed: 12/12/2022] Open
Abstract
MicroRNAs (miRNAs) are a category of small non-coding RNAs that profoundly impact various biological processes related to human disease. Inferring the potential miRNA-disease associations benefits the study of human diseases, such as disease prevention, disease diagnosis, and drug development. In this work, we propose a novel heterogeneous network embedding-based method called MDN-NMTF (Module-based Dynamic Neighborhood Non-negative Matrix Tri-Factorization) for predicting miRNA-disease associations. MDN-NMTF constructs a heterogeneous network of disease similarity network, miRNA similarity network and a known miRNA-disease association network. After that, it learns the latent vector representation for miRNAs and diseases in the heterogeneous network. Finally, the association probability is computed by the product of the latent miRNA and disease vectors. MDN-NMTF not only successfully integrates diverse biological information of miRNAs and diseases to predict miRNA-disease associations, but also considers the module properties of miRNAs and diseases in the course of learning vector representation, which can maximally preserve the heterogeneous network structural information and the network properties. At the same time, we also extend MDN-NMTF to a new version (called MDN-NMTF2) by using modular information to improve the miRNA-disease association prediction ability. Our methods and the other four existing methods are applied to predict miRNA-disease associations in four databases. The prediction results show that our methods can improve the miRNA-disease association prediction to a high level compared with the four existing methods.
Affiliation(s)
- Wei Peng
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
  - Computer Technology Application Key Laboratory of Yunnan Province, Kunming University of Science and Technology, Kunming, China
- Jielin Du
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
- Wei Dai
  - Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
  - Computer Technology Application Key Laboratory of Yunnan Province, Kunming University of Science and Technology, Kunming, China
- Wei Lan
  - Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, China
10
Wang C, Sun K, Wang J, Guo M. Data fusion-based algorithm for predicting miRNA–Disease associations. Comput Biol Chem 2020; 88:107357. [DOI: 10.1016/j.compbiolchem.2020.107357] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 07/02/2020] [Revised: 07/24/2020] [Accepted: 08/05/2020] [Indexed: 11/30/2022]
11
Zhao Y, Wang CC, Chen X. Microbes and complex diseases: from experimental results to computational models. Brief Bioinform 2020; 22:5882184. [PMID: 32766753 DOI: 10.1093/bib/bbaa158] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 04/09/2020] [Revised: 06/19/2020] [Accepted: 06/22/2020] [Indexed: 12/13/2022] Open
Abstract
Studies have shown that the number of microbes in humans is almost 10 times that of cells. These microbes have been proven to play an important role in a variety of physiological processes, such as enhancing immunity, improving the digestion of gastrointestinal tract and strengthening metabolic function. In addition, in recent years, more and more research results have indicated that there are close relationships between the emergence of the human noncommunicable diseases and microbes, which provides a novel insight for us to further understand the pathogenesis of the diseases. An in-depth study about the relationships between diseases and microbes will not only contribute to exploring new strategies for the diagnosis and treatment of diseases but also significantly heighten the efficiency of new drugs development. However, applying the methods of biological experimentation to reveal the microbe-disease associations is costly and inefficient. In recent years, more and more researchers have constructed multiple computational models to predict microbes that are potentially associated with diseases. Here, we start with a brief introduction of microbes and databases as well as web servers related to them. Then, we mainly introduce four kinds of computational models, including score function-based models, network algorithm-based models, machine learning-based models and experimental analysis-based models. Finally, we summarize the advantages as well as disadvantages of them and set the direction for the future work of revealing microbe-disease associations based on computational models. We firmly believe that computational models are expected to be important tools in large-scale predictions of disease-related microbes.
Affiliation(s)
- Yan Zhao
  - School of Information and Control Engineering, China University of Mining
- Chun-Chun Wang
  - School of Information and Control Engineering, China University of Mining
- Xing Chen
  - School of Information and Control Engineering, China University of Mining
12
Yuan L, Guo F, Wang L, Zou Q. Prediction of tumor metastasis from sequencing data in the era of genome sequencing. Brief Funct Genomics 2020; 18:412-418. [PMID: 31204784 DOI: 10.1093/bfgp/elz010] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 01/28/2019] [Revised: 02/22/2019] [Accepted: 04/26/2019] [Indexed: 02/01/2023] Open
Abstract
Tumor metastasis is the key reason for the high mortality rate of tumors. A growing number of scholars have begun to pay attention to research on tumor metastasis and have achieved satisfactory results in this field. The advent of the era of sequencing has enabled us to study cancer metastasis at the molecular level, which is essential for understanding the molecular mechanism of metastasis, identifying diagnostic markers and therapeutic targets and guiding clinical decision-making. We reviewed the metastasis-related studies using sequencing data, covering detection of metastasis origin sites, determination of metastasis potential and identification of distal metastasis sites. These findings include the discovery of relevant markers and the presentation of prediction tools. Finally, we discussed the challenges of studying metastasis, considering the difficulty of obtaining metastatic cancer data, the complexity of tumor heterogeneity and the uncertainty of sample labels.
Affiliation(s)
- Linlin Yuan
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Fei Guo
  - College of Intelligence and Computing, Tianjin University, Tianjin, China
- Lei Wang
  - College of Computer Engineering & Applied Mathematics, Changsha University, Changsha, China
- Quan Zou
  - Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
  - Center for Informational Biology, University of Electronic Science and Technology of China, Chengdu, China
13
Wang C, Zhang J, Wang X, Han K, Guo M. Pathogenic Gene Prediction Algorithm Based on Heterogeneous Information Fusion. Front Genet 2020; 11:5. [PMID: 32117433 PMCID: PMC7010852 DOI: 10.3389/fgene.2020.00005] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 11/26/2019] [Accepted: 01/06/2020] [Indexed: 12/23/2022] Open
Abstract
Complex diseases seriously affect people's physical and mental health. The discovery of disease-causing genes has become a target of research. With the emergence of bioinformatics and the rapid development of biotechnology, to overcome the inherent difficulties of the long experimental period and high cost of traditional biomedical methods, researchers have proposed many gene prioritization algorithms that use a large amount of biological data to mine pathogenic genes. However, because the currently known gene-disease association matrix is still very sparse and lacks evidence that genes and diseases are unrelated, there are limits to the predictive performance of gene prioritization algorithms. Based on the hypothesis that functionally related gene mutations may lead to similar disease phenotypes, this paper proposes a PU induction matrix completion algorithm based on heterogeneous information fusion (PUIMCHIF) to predict candidate genes involved in the pathogenicity of human diseases. On the one hand, PUIMCHIF uses different compact feature learning methods to extract features of genes and diseases from multiple data sources, making up for the lack of sparse data. On the other hand, based on the prior knowledge that most of the unknown gene-disease associations are unrelated, we use the PU-Learning strategy to treat the unknown unlabeled data as negative examples for biased learning. The experimental results of the PUIMCHIF algorithm regarding the three indexes of precision, recall, and mean percentile ranking (MPR) were significantly better than those of other algorithms. In the top 100 global prediction analysis of multiple genes and multiple diseases, the probability of recovering true gene associations using PUIMCHIF reached 50% and the MPR value was 10.94%. The PUIMCHIF algorithm has higher priority than those from other methods, such as IMC and CATAPULT.
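The "biased learning" idea in this abstract, treating unlabeled gene-disease pairs as negatives but trusting them less than confirmed positives, is often expressed as a weighted reconstruction loss. The sketch below shows one such loss in plain Python. The squared-error form, the weight `alpha`, and the toy matrices are assumptions for illustration; the abstract does not give PUIMCHIF's exact objective.

```python
def biased_pu_loss(observed, predicted, alpha=0.2):
    """Weighted squared error for positive-unlabeled data (illustrative).

    Known associations (entries equal to 1) get weight 1.0; unlabeled
    pairs (entries equal to 0) are treated as negatives with the
    smaller weight alpha, since some of them may be undiscovered positives.
    """
    total = 0.0
    for observed_row, predicted_row in zip(observed, predicted):
        for y, p in zip(observed_row, predicted_row):
            weight = 1.0 if y == 1 else alpha
            total += weight * (y - p) ** 2
    return total


# Hypothetical 2 genes x 3 diseases association matrix and model scores.
observed = [[1, 0, 0], [0, 1, 0]]
predicted = [[0.9, 0.5, 0.0], [0.0, 0.6, 0.5]]

loss = biased_pu_loss(observed, predicted)
unbiased_loss = biased_pu_loss(observed, predicted, alpha=1.0)
```

With `alpha` below 1, a confident score on an unlabeled pair is penalized less than a missed known association, so the model is not forced to push every unobserved entry to zero.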
Affiliation(s)
- Chunyu Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Jie Zhang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Xueping Wang
  - School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Ke Han
  - School of Computer and Information Engineering, Harbin University of Commerce, Harbin, China
- Maozu Guo
  - School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture, Beijing, China
  - Beijing Key Laboratory of Intelligent Processing for Building Big Data, Beijing University of Civil Engineering and Architecture, Beijing, China
|
14
|
Xiong Y, Guo M, Ruan L, Kong X, Tang C, Zhu Y, Wang W. Heterogeneous network embedding enabling accurate disease association predictions. BMC Med Genomics 2019; 12:186. [PMID: 31865913 PMCID: PMC6927100 DOI: 10.1186/s12920-019-0623-3] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Indexed: 02/06/2023]
Abstract
BACKGROUND Identifying the complex biological mechanisms of various diseases is an important goal in biomedical research. Recently, the growing generation of tremendous amounts of data in genomics, epigenomics, metagenomics, proteomics, metabolomics, nutriomics, etc., has resulted in the rise of systematic biological means of exploring complex diseases. However, the disparity between the production of these data and our capability to analyze them has gradually widened. Furthermore, we observe that networks can represent many of the above-mentioned data, and that, based on the vector representations learned by network embedding methods, entities which are in close proximity but do not currently possess direct links are very likely to be related; they are therefore promising candidates for biological investigation. RESULTS We incorporate six public biological databases to construct a heterogeneous biological network containing three categories of entities (i.e., genes, diseases, miRNAs) and multiple types of edges (i.e., the known relationships). To tackle the inherent heterogeneity, we develop a heterogeneous network embedding model for mapping the network into a low-dimensional vector space in which the relationships between entities are preserved well. To assess the effectiveness of our method, we conduct gene-disease and miRNA-disease association predictions, the results of which show the superiority of our method over several state-of-the-art methods. Furthermore, many associations predicted by our method are verified in the latest real-world dataset. CONCLUSIONS We propose a novel heterogeneous network embedding method which can adequately take advantage of the abundant contextual information and structures of heterogeneous networks. Moreover, we illustrate the performance of the proposed method in directing studies in biology, which can assist in identifying new hypotheses in biological investigation.
Affiliation(s)
- Yun Xiong
  - Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China
  - Shanghai Institute for Advance Communication and Data Science, Fudan University, Shanghai, China
- Mengjie Guo
  - Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China
  - Shanghai Institute for Advance Communication and Data Science, Fudan University, Shanghai, China
- Lu Ruan
  - Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China
  - Shanghai Institute for Advance Communication and Data Science, Fudan University, Shanghai, China
- Xiangnan Kong
  - Department of Computer Science, Worcester Polytechnic Institute, Worcester, USA
- Chunlei Tang
  - Brigham and Women’s Hospital, Harvard Medical School, Boston, USA
- Yangyong Zhu
  - Shanghai Key Laboratory of Data Science, School of Computer Science, Fudan University, Shanghai, China
  - Shanghai Institute for Advance Communication and Data Science, Fudan University, Shanghai, China
- Wei Wang
  - Department of Computer Science, Scalable Analytics Institute (ScAi), University of California, Los Angeles, USA
|
15
|
Zhang Y, Chen M, Cheng X, Chen Z. LSGSP: a novel miRNA-disease association prediction model using a Laplacian score of the graphs and space projection federated method. RSC Adv 2019; 9:29747-29759. [PMID: 35531537 PMCID: PMC9071959 DOI: 10.1039/c9ra05554a] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.5] [Received: 07/18/2019] [Accepted: 09/09/2019] [Indexed: 12/31/2022]
Abstract
Many research findings have indicated that miRNAs (microRNAs) are involved in many important biological processes and that their mutations and disorders are closely related to diseases; determining the associations between human diseases and miRNAs is therefore key to understanding pathogenic mechanisms. Existing biological experimental methods for identifying miRNA-disease associations are usually expensive and time consuming. Therefore, the development of efficient and reliable computational methods for identifying disease-related miRNAs has become an important topic in the field of biological research in recent years. In this study, we developed a novel miRNA-disease association prediction model using a Laplacian score of the graphs and space projection federated method (LSGSP). This integrates experimentally validated miRNA-disease associations, disease semantic similarity scores, miRNA functional scores, and miRNA family information to build a new disease similarity network and miRNA similarity network, and then obtains the global similarities of these networks by calculating the Laplacian score of the graphs, based on which the miRNA-disease weighted network can be constructed through combination with the miRNA-disease Boolean network. Finally, the miRNA-disease score was obtained by projecting the miRNA space and disease space onto the miRNA-disease weighted network. Compared with several other state-of-the-art methods, using leave-one-out cross validation (LOOCV) to evaluate the accuracy of LSGSP on a benchmark dataset, a prediction dataset and a comparison dataset, LSGSP showed excellent predictive performance with high AUC values of 0.9221, 0.9745 and 0.9194, respectively. In addition, for prostate neoplasms and lung neoplasms, the consistencies between the top 50 predicted miRNAs (obtained from LSGSP) and the results (confirmed from the updated HMDD, miR2Disease, and dbDEMC databases) reached 96% and 100%, respectively. Similarly, for isolated diseases (diseases not associated with any miRNAs), the consistencies between the top 50 predicted miRNAs (obtained from LSGSP) and the results (confirmed from the above-mentioned three databases) reached 98% and 100%, respectively. These results further indicate that LSGSP can effectively predict potential associations between miRNAs and diseases.
Affiliation(s)
- Yi Zhang
  - School of Information Science and Engineering, Guilin University of Technology, 541004 Guilin, China
- Min Chen
  - School of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
- Xiaohui Cheng
  - School of Information Science and Engineering, Guilin University of Technology, 541004 Guilin, China
- Zheng Chen
  - School of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
|
16
|
Li C, Liu H, Hu Q, Que J, Yao J. A Novel Computational Model for Predicting microRNA-Disease Associations Based on Heterogeneous Graph Convolutional Networks. Cells 2019; 8:cells8090977. [PMID: 31455028 PMCID: PMC6769654 DOI: 10.3390/cells8090977] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 07/25/2019] [Revised: 08/22/2019] [Accepted: 08/23/2019] [Indexed: 01/13/2023]
Abstract
Identifying the interactions between diseases and microRNAs (miRNAs) can accelerate drug development, individualized diagnosis, and treatment for various human diseases. However, experimental methods are time-consuming and costly, so computational approaches to predict latent miRNA-disease interactions are attracting increased attention. Most previous studies have focused on designing complicated similarity-based methods to predict latent interactions between miRNAs and diseases. In this study, we propose a novel computational model, termed heterogeneous graph convolutional network for miRNA-disease associations (HGCNMDA), which is based on known human protein-protein interaction (PPI) and integrates four biological networks: miRNA-disease, miRNA-gene, disease-gene, and PPI network. HGCNMDA achieved reliable performance using leave-one-out cross-validation (LOOCV). HGCNMDA is then compared to three state-of-the-art algorithms based on five-fold cross-validation. HGCNMDA achieves an AUC of 0.9626 and an average precision of 0.9660, ahead of the competing algorithms. We further analyze the top-10 unknown interactions between miRNAs and diseases. In summary, HGCNMDA is a useful computational model for predicting miRNA-disease interactions.
Affiliation(s)
- Chunyan Li
  - School of Informatics, Xiamen University, Xiamen 361005, China
  - Graduate School, Yunnan Minzu University, Kunming 650504, China
- Hongju Liu
  - College of Information Technology and Computer Science, University of the Cordilleras, Baguio 2600, Philippines
- Qian Hu
  - School of Informatics, Xiamen University, Xiamen 361005, China
- Jinlong Que
  - School of Informatics, Xiamen University, Xiamen 361005, China
- Junfeng Yao
  - School of Informatics, Xiamen University, Xiamen 361005, China
|
17
|
Chen M, Zhang Y, Li A, Li Z, Liu W, Chen Z. Bipartite Heterogeneous Network Method Based on Co-neighbor for MiRNA-Disease Association Prediction. Front Genet 2019; 10:385. [PMID: 31080459 PMCID: PMC6497741 DOI: 10.3389/fgene.2019.00385] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.7] [Received: 11/28/2018] [Accepted: 04/10/2019] [Indexed: 12/22/2022]
Abstract
In recent years, miRNA variation and dysregulation have been found to be closely related to human tumors, and identifying miRNA-disease associations is helpful for understanding the mechanisms of disease or tumor development and is greatly significant for the prognosis, diagnosis, and treatment of human diseases. This article proposes a Bipartite Heterogeneous network link prediction method based on co-neighbor to predict miRNA-disease association (BHCN). According to the structural characteristics of the bipartite network, the concept of bipartite network co-neighbors is proposed, and the co-neighbors were used to represent the probability of association between disease and miRNA. To predict the isolated diseases and the new miRNA based on the association probability expressed by co-neighbors, we utilized the similarity between disease nodes and the similarity between miRNA nodes in heterogeneous networks to represent the association probability between disease and miRNA. The model's predictive performance was evaluated by leave-one-out cross validation (LOOCV) on different datasets. The AUC value of BHCN on the gold benchmark dataset was 0.7973, and the AUC obtained on the prediction dataset was 0.9349, which was better than that of the classic global algorithm. In the case study, we conducted predictive studies on breast neoplasms and colon neoplasms. Most of the top 50 predicted results were confirmed by three databases, namely, HMDD, miR2disease, and dbDEMC, with accuracy rates of 96% and 82%. In addition, BHCN can be used for predicting isolated diseases (without any known associated miRNAs) and new miRNAs (without any known associated diseases). In the isolated disease case study, the top 50 miRNAs predicted as potentially associated with breast neoplasms and colon neoplasms reached accuracies of 100% and 96%, respectively, thereby demonstrating the favorable predictive power of BHCN for potentially relevant miRNAs.
Affiliation(s)
- Min Chen
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
- Yi Zhang
  - School of Information Science and Engineering, Guilin University of Technology, Guilin, China
- Ang Li
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
- Zejun Li
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
- Wenhua Liu
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
- Zheng Chen
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, China
|
18
|
Predicting the associations between microbes and diseases by integrating multiple data sources and path-based HeteSim scores. Neurocomputing 2019. [DOI: 10.1016/j.neucom.2018.09.054] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 12/19/2022]
|
19
|
Xiong Y, Ruan L, Guo M, Tang C, Kong X, Zhu Y, Wang W. Predicting Disease-related Associations by Heterogeneous Network Embedding. 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM) 2018:548-555. [DOI: 10.1109/bibm.2018.8621538] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 01/03/2025]
|
20
|
Liu Y, Wang SL, Zhang JF. Prediction of Microbe-Disease Associations by Graph Regularized Non-Negative Matrix Factorization. J Comput Biol 2018; 25:1385-1394. [PMID: 30106318 DOI: 10.1089/cmb.2018.0072] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Indexed: 02/06/2023]
Abstract
More and more evidence shows that microbes play crucial roles in human health and disease. Exploring the relationship between microbes and diseases will help people to better understand the underlying pathogenesis and has important implications for disease diagnosis and prevention. However, very few associations between microbes and diseases are known. We proposed a new method called non-negative matrix factorization microbe-disease associations (NMFMDA), which used the Gaussian interaction profile kernel similarity measure to calculate microbial similarity and disease similarity, and applied a logistic function to regulate disease similarity. Based on the known microbe-disease associations, a graph-regularized non-negative matrix factorization model was then utilized to simultaneously identify potential microbe-disease associations. Moreover, fivefold cross-validation was utilized to evaluate the performance of our method. It reached an area under the receiver operating characteristic curve (AUC) of 0.8891, higher than other state-of-the-art methods. Finally, the case studies on three complex human diseases (i.e., asthma, inflammatory bowel disease, and colon cancer) demonstrated the good performance of our method. In summary, our method can be considered an effective computational model for predicting potential disease-microbe associations.
Affiliation(s)
- Yue Liu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Shu-Lin Wang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
- Jun-Feng Zhang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, Hunan 410082, China
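The NMFMDA abstract above relies on the Gaussian interaction profile (GIP) kernel similarity. As a rough illustration only, the sketch below computes it for the rows of a small binary association matrix, using the commonly cited form K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2), with gamma set to gamma' divided by the mean squared norm of the profiles. The function name `gip_kernel_similarity`, the parameter `gamma_prime`, and the toy matrix are assumptions made for this example and are not taken from the paper; the logistic adjustment of disease similarity mentioned in the abstract is not shown.

```python
import math


def gip_kernel_similarity(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity between the rows of
    a binary association matrix (hypothetical helper, for illustration)."""
    n = len(profiles)
    # Squared Euclidean norm of each interaction profile (row).
    sq_norms = [sum(v * v for v in row) for row in profiles]
    mean_sq = sum(sq_norms) / n
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty")
    # Bandwidth normalized by the average squared profile norm.
    gamma = gamma_prime / mean_sq
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d2 = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * d2)
    return sim


# Toy example: rows are microbes, columns are diseases (assumed data).
toy_profiles = [[1, 0, 1], [1, 0, 1], [0, 1, 0]]
toy_similarity = gip_kernel_similarity(toy_profiles)
```

Rows with identical interaction profiles receive similarity 1, and the similarity decays as the profiles diverge; the same function can be applied to the transposed matrix to obtain similarities for the other entity type.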
|
21
|
Global Similarity Method Based on a Two-tier Random Walk for the Prediction of microRNA-Disease Association. Sci Rep 2018; 8:6481. [PMID: 29691434 PMCID: PMC5915491 DOI: 10.1038/s41598-018-24532-7] [Citation(s) in RCA: 27] [Impact Index Per Article: 3.9] [Received: 01/31/2018] [Accepted: 04/03/2018] [Indexed: 12/15/2022]
Abstract
Mutation and dysregulation of microRNAs (miRNAs) are related to the occurrence and development of human diseases. Studies on disease-associated miRNAs have contributed to disease diagnosis and treatment. To address problems such as low prediction accuracy and the failure to predict relationships between new miRNAs and diseases, we design a Laplacian score of graphs to calculate the global similarity of networks and propose a Global Similarity method based on a Two-tier Random Walk for the prediction of miRNA-disease association (GSTRW) to reveal the correlation between miRNAs and diseases. This method is a global approach that can simultaneously predict the correlation between all diseases and miRNAs in the absence of negative samples. Experimental results reveal that this method is better than existing approaches in terms of overall prediction accuracy and ability to predict orphan diseases and novel miRNAs. A case study on GSTRW for breast cancer and colon cancer is also conducted, and the majority of miRNA-disease associations can be verified by our experiment. This study indicates that this method is feasible and effective.
|
22
|
Chen M, Peng Y, Li A, Li Z, Deng Y, Liu W, Liao B, Dai C. A novel information diffusion method based on network consistency for identifying disease related microRNAs. RSC Adv 2018; 8:36675-36690. [PMID: 35558942 PMCID: PMC9088870 DOI: 10.1039/c8ra07519k] [Citation(s) in RCA: 15] [Impact Index Per Article: 2.1] [Received: 09/09/2018] [Accepted: 10/17/2018] [Indexed: 12/27/2022]
Abstract
The abnormal expression of miRNAs is directly related to the development of human diseases. Predicting the potential candidate miRNAs associated with diseases can contribute to the detection, diagnosis, treatment and prevention of complex human diseases. Computational inference of the relationships between miRNAs and diseases is an effective supplement to biological experiments and is of great help in the prevention, treatment and prognosis of complex diseases. This paper proposes a novel information diffusion method based on network consistency (IDNC) for identifying disease related microRNAs. The model first synthesizes the miRNA family information and the miRNA function similarity to reconstruct the miRNA network, and reconstructs the disease network by using the known disease and miRNA-related information and the semantic score between diseases. Then the global similarity of the two networks is obtained by using the Laplacian score of graphs. The global similarity score is a measure of the similarity between diseases and miRNAs. The disease–miRNA relation network was reconstructed by integrating the global similarity relation. The network consistency diffusion seed is then obtained by combining the global similarity network with the reconstructed disease–miRNA association network. Thereafter, the stable diffusion spectrum is generated as the prediction score by using the restarted random walk algorithm. The AUC value obtained by performing LOOCV on the gold benchmark dataset is 0.8814. The AUC value obtained by performing LOOCV on the predictive dataset is 0.9512. Compared with other frontier methods, our method has higher accuracy, and case studies of breast neoplasms and colon neoplasms further illustrate that IDNC is valuable.
Affiliation(s)
- Min Chen
  - College of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
  - College of Information Science and Engineering
- Yan Peng
  - College of International Communication, Hunan Institute of Technology, 421002 Hengyang, China
- Ang Li
  - College of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
- Zejun Li
  - College of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
  - College of Information Science and Engineering
- Yingwei Deng
  - College of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
- Wenhua Liu
  - College of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
- Bo Liao
  - College of Information Science and Engineering, Hunan University, Changsha 410082, China
- Chengqiu Dai
  - College of Computer Science and Technology, Hunan Institute of Technology, 421002 Hengyang, China
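The IDNC abstract above derives its prediction scores from a restarted random walk. A minimal sketch of a generic random walk with restart follows, assuming the standard update p(t+1) = (1 - r) * W * p(t) + r * p(0) with a column-normalized transition matrix W. The function name `random_walk_with_restart`, the restart probability of 0.3, and the toy path graph are assumptions for illustration; the abstract does not specify how IDNC builds its network or seed vector, so this is not the authors' implementation.

```python
def random_walk_with_restart(adj, seed, restart=0.3, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until the L1 change
    falls below tol (generic sketch, not the IDNC code)."""
    n = len(adj)
    # Column-normalize: column j holds the transition probabilities out of node j.
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    W = [[adj[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
         for i in range(n)]
    total = sum(seed)
    p0 = [s / total for s in seed]  # seed distribution sums to 1
    p = p0[:]
    for _ in range(max_iter):
        nxt = [(1 - restart) * sum(W[i][j] * p[j] for j in range(n))
               + restart * p0[i] for i in range(n)]
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2, with the walk restarting at node 0 (assumed data).
toy_adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
toy_scores = random_walk_with_restart(toy_adj, seed=[1, 0, 0])
```

Nodes closer to the seed end up with larger stationary scores, which is why the stable distribution can be read as a ranking of candidate associations.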
|
23
|
Abstract
Computational manipulation of knowledge is an important, and often under-appreciated, aspect of biomedical Data Science. The first Data Science initiative from the US National Institutes of Health was entitled "Big Data to Knowledge (BD2K)." The main emphasis of the more than $200M allocated to that program has been on "Big Data;" the "Knowledge" component has largely been the implicit assumption that the work will lead to new biomedical knowledge. However, there is long-standing and highly productive work in computational knowledge representation and reasoning, and computational processing of knowledge has a role in the world of Data Science. Knowledge-based biomedical Data Science involves the design and implementation of computer systems that act as if they knew about biomedicine. There are many ways in which a computational approach might act as if it knew something: for example, it might be able to answer a natural language question about a biomedical topic, or pass an exam; it might be able to use existing biomedical knowledge to rank or evaluate hypotheses; it might explain or interpret data in light of prior knowledge, either in a Bayesian or other sort of framework. These are all examples of automated reasoning that act on computational representations of knowledge. After a brief survey of existing approaches to knowledge-based data science, this position paper argues that such research is ripe for expansion, and expanded application.
Affiliation(s)
- Lawrence E Hunter
  - Computational Bioscience, University of Colorado School of Medicine, Aurora, CO 80045, USA; ORCID: https://orcid.org/0000-0003-1455-3370
|
24
|
Li Z, Yuan X, Cui X, Liu X, Wang L, Zhang W, Lu Q, Zhu H. Optimal experimental conditions for Welan gum production by support vector regression and adaptive genetic algorithm. PLoS One 2017; 12:e0185942. [PMID: 29016652 PMCID: PMC5633192 DOI: 10.1371/journal.pone.0185942] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 06/30/2017] [Accepted: 09/21/2017] [Indexed: 11/19/2022]
Abstract
Welan gum is a novel microbial polysaccharide that is widely produced during microbial growth and metabolism under different external conditions. Welan gum can be used as a thickener, suspending agent, emulsifier, stabilizer, lubricant, film-forming agent and adhesive in agriculture. In recent years, finding optimal experimental conditions to maximize production has received growing attention. In this work, a hybrid computational method is proposed to optimize experimental conditions for producing Welan gum with data collected from experimental records. Support Vector Regression (SVR) is used to model the relationship between Welan gum production and experimental conditions, and then an adaptive Genetic Algorithm (AGA) is applied to search for optimized experimental conditions. As a result, a mathematical model for predicting the production of Welan gum from experimental conditions is obtained, which achieves an accuracy of 88.36%. In addition, a class of optimized experimental conditions is predicted to produce 31.65 g/L of Welan gum. Compared with the best result in chemical experiments, 30.63 g/L, the predicted production is an improvement of 3.3%. The results provide potential optimal experimental conditions to improve the production of Welan gum.
Affiliation(s)
- Zhongwei Li
  - College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, Shandong, China
- Xiang Yuan
  - College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, Shandong, China
- Xuerong Cui
  - College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, Shandong, China
- Xin Liu
  - College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, Shandong, China
- Leiquan Wang
  - College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, Shandong, China
- Weishan Zhang
  - College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, Shandong, China
- Qinghua Lu
  - College of Computer and Communication Engineering, China University of Petroleum, Qingdao 266580, Shandong, China
- Hu Zhu
  - College of Chemistry and Materials, Fujian Normal University, Fuzhou 350007, China
|
25
|
Proctor CJ, Goljanek-Whysall K. Using computer simulation models to investigate the most promising microRNAs to improve muscle regeneration during ageing. Sci Rep 2017; 7:12314. [PMID: 28951568 PMCID: PMC5614911 DOI: 10.1038/s41598-017-12538-6] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 04/26/2017] [Accepted: 09/05/2017] [Indexed: 01/17/2023]
Abstract
MicroRNAs (miRNAs) regulate gene expression through interactions with target sites within mRNAs, leading to enhanced degradation of the mRNA or inhibition of translation. Skeletal muscle expresses many different miRNAs with important roles in adulthood myogenesis (regeneration) and myofibre hypertrophy and atrophy, processes associated with muscle ageing. However, the large number of miRNAs and their targets mean that a complex network of pathways exists, making it difficult to predict the effect of selected miRNAs on age-related muscle wasting. Computational modelling has the potential to aid this process as it is possible to combine models of individual miRNA:target interactions to form an integrated network. As yet, no models of these interactions in muscle exist. We created the first model of miRNA:target interactions in myogenesis based on experimental evidence of individual miRNAs which were next validated and used to make testable predictions. Our model confirms that miRNAs regulate key interactions during myogenesis and can act by promoting the switch between quiescent/proliferating/differentiating myoblasts and by maintaining the differentiation process. We propose that a threshold level of miR-1 acts in the initial switch to differentiation, with miR-181 keeping the switch on and miR-378 maintaining the differentiation and miR-143 inhibiting myogenesis.
Affiliation(s)
- Carole J Proctor
  - MRC/Arthritis Research UK Centre for Musculoskeletal Ageing (CIMA), Institute of Cellular Medicine and Newcastle University Institute for Ageing, Newcastle University, Newcastle upon Tyne, UK
- Katarzyna Goljanek-Whysall
  - MRC/Arthritis Research UK Centre for Musculoskeletal Ageing (CIMA), Department of Musculoskeletal Biology, Institute of Ageing and Chronic Disease, University of Liverpool, Liverpool, UK
|
26
|
Integrative Pathway Analysis of Genes and Metabolites Reveals Metabolism Abnormal Subpathway Regions and Modules in Esophageal Squamous Cell Carcinoma. Molecules 2017; 22:molecules22101599. [PMID: 28937628 PMCID: PMC6151487 DOI: 10.3390/molecules22101599] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 08/22/2017] [Revised: 09/20/2017] [Accepted: 09/20/2017] [Indexed: 02/07/2023]
Abstract
Aberrant metabolism is one of the main driving forces in the initiation and development of esophageal squamous cell carcinoma (ESCC). Both genes and metabolites play important roles in metabolic pathways. Integrative pathway analysis of both genes and metabolites will thus help to interpret the underlying biological phenomena. Here, we performed integrative pathway analysis of gene and metabolite profiles by analyzing six gene expression profiles and seven metabolite profiles of ESCC. Multiple known and novel subpathways associated with ESCC, such as 'beta-Alanine metabolism', were identified via the cooperative use of differential genes, differential metabolites, and their positional importance information in pathways. Furthermore, a global ESCC-Related Metabolic (ERM) network was constructed and 31 modules were identified on the basis of clustering analysis in the ERM network. We found that the three modules located in the center regions of the ERM network, especially the core region of Module_1, primarily consisted of aldehyde dehydrogenase (ALDH) superfamily members, which contribute to the development of ESCC. For Module_4, pyruvate and the genes and metabolites in its adjacent region were clustered together and formed a core region within the module. Several prognostic genes, including GPT, ALDH1B1, ABAT, WBSCR22 and MDH1, appeared in the three center modules of the network, suggesting that they may serve as potential prognostic markers in ESCC.
|
27
|
Zou S, Zhang J, Zhang Z. A novel approach for predicting microbe-disease associations by bi-random walk on the heterogeneous network. PLoS One 2017; 12:e0184394. [PMID: 28880967 PMCID: PMC5589230 DOI: 10.1371/journal.pone.0184394] [Citation(s) in RCA: 38] [Impact Index Per Article: 4.8] [Received: 04/24/2017] [Accepted: 08/23/2017] [Indexed: 02/07/2023]
Abstract
Since the microbiome has a significant impact on human health and disease, microbe-disease associations can be utilized as a valuable resource for understanding disease pathogenesis and promoting disease diagnosis and prognosis. Accordingly, it is necessary for researchers to achieve a comprehensive and deep understanding of the associations between microbes and diseases. Nevertheless, to date, little work has been achieved in implementing novel human microbe-disease association prediction models. In this paper, we develop a novel computational model to predict potential microbe-disease associations by bi-random walk on the heterogeneous network (BiRWHMDA). The heterogeneous network was constructed by connecting the microbe similarity network and the disease similarity network via known microbe-disease associations. Microbe similarity and disease similarity were calculated by the Gaussian interaction profile kernel similarity measure; moreover, a logistic function was applied to regulate disease similarity. Additionally, leave-one-out cross validation and 5-fold cross validation were implemented to evaluate the predictive performance of our method; both cross validation methods performed well. The leave-one-out cross validation experiment results illustrate that our method outperforms other previously proposed methods. Furthermore, case studies on asthma and inflammatory bowel disease prove the favorable performance of our method. In conclusion, our method can be considered as an effective computational model for predicting novel microbe-disease associations.
Affiliation(s)
- Shuai Zou
- School of Information Science and Engineering, Central South University, Changsha, Hunan, China
- Jingpu Zhang
- School of Information Science and Engineering, Central South University, Changsha, Hunan, China
- Zuping Zhang
- School of Information Science and Engineering, Central South University, Changsha, Hunan, China
|
28
|
Su Y, Wang B, Cheng F, Zhang L, Zhang X, Pan L. An algorithm based on positive and negative links for community detection in signed networks. Sci Rep 2017; 7:10874. [PMID: 28883663 PMCID: PMC5589891 DOI: 10.1038/s41598-017-11463-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 03/23/2017] [Accepted: 08/24/2017] [Indexed: 12/14/2022]
Abstract
The community detection problem in networks has received a great deal of attention during the past decade. Most community detection algorithms take into account only positive links and are therefore not suitable for signed networks. In our work, we propose an algorithm based on random walks for community detection in signed networks. First, local maximum degree nodes, i.e. nodes with a larger degree than their neighbors, are identified, and the initial communities are detected based on them. Then, for each node we calculate the probability of being attracted into a community through positive links based on random walks, as well as the probability of being kept away from the community on the basis of negative links. If the former probability is larger than the latter, the node is added to the community; otherwise, the node cannot be added to any current community, and a new initial community may be identified. Finally, we use the community optimization method to merge similar communities. The proposed algorithm makes full use of both positive and negative links to enhance its performance. Experimental results on both synthetic and real-world signed networks demonstrate the effectiveness of the proposed algorithm.
Affiliation(s)
- Yansen Su
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, 230039, China
- Bangju Wang
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, 230039, China
- Fan Cheng
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, 230039, China
- Lei Zhang
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, 230039, China
- Xingyi Zhang
- Key Lab of Intelligent Computing and Signal Processing of Ministry of Education, School of Computer Science and Technology, Anhui University, Hefei, 230039, China
- Linqiang Pan
- Key Laboratory of Image Processing and Intelligent Control, School of Automation, Huazhong University of Science and Technology, Wuhan, 430074, China; School of Electric and Information Engineering, Zhengzhou University of Light Industry, Zhengzhou, 450002, Henan, China
|
29
|
Mugunga I, Ju Y, Liu X, Huang X. Computational prediction of human disease-related microRNAs by path-based random walk. Oncotarget 2017; 8:58526-58535. [PMID: 28938576 PMCID: PMC5601672 DOI: 10.18632/oncotarget.17226] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 02/08/2017] [Accepted: 03/22/2017] [Indexed: 01/09/2023]
Abstract
MicroRNAs (miRNAs) are a class of small, endogenous RNAs that are 21–25 nucleotides in length. In animals and plants, miRNAs target specific genes for degradation or translation repression. Discovering disease-related miRNA is fundamental for understanding the pathogenesis of diseases. The association between miRNA and a disease is mainly determined via biological investigation, which is complicated by increased biological information due to big data from different databases. Researchers have utilized different computational methods to harmonize experimental approaches to discover miRNA that articulates restrictively in specific environmental situations. In this work, we present a prediction model that is based on the theory of path features and random walk to obtain a relevancy score of miRNA-related disease. In this model, highly ranked scores are potential miRNA-disease associations. Features were extracted from positive and negative samples of miRNA-disease association. Then, we compared our method with other presented models using the five-fold cross-validation method, which obtained an area under the receiver operating characteristic curve of 88.6%. This indicated that our method has a better performance compared to previous methods and will help future biological investigations.
Affiliation(s)
- Israel Mugunga
- Department of Computer Science, Xiamen University, Xiamen, 361005, China
- Ying Ju
- Department of Computer Science, Xiamen University, Xiamen, 361005, China
- Xiangrong Liu
- Department of Computer Science, Xiamen University, Xiamen, 361005, China
- Xiaoyang Huang
- Department of Computer Science, Xiamen University, Xiamen, 361005, China
|
30
|
Yu H, Chen X, Lu L. Large-scale prediction of microRNA-disease associations by combinatorial prioritization algorithm. Sci Rep 2017; 7:43792. [PMID: 28317855 PMCID: PMC5357838 DOI: 10.1038/srep43792] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Received: 10/31/2016] [Accepted: 01/30/2017] [Indexed: 12/12/2022]
Abstract
Identification of the associations between microRNA molecules and human diseases from large-scale heterogeneous biological data is an important step for understanding the pathogenesis of diseases at the microRNA level. However, experimental verification of microRNA-disease associations is expensive and time-consuming. To overcome the drawbacks of conventional experimental methods, we present a combinatorial prioritization algorithm to predict microRNA-disease associations. Importantly, our method can be used to predict microRNAs (diseases) associated with diseases (microRNAs) that have no known associated microRNAs (diseases). The predictive performance of our proposed approach was evaluated and verified by internal cross-validations and external independent validations based on standard association datasets. The results demonstrate that our proposed method achieves impressive performance in predicting microRNA-disease associations, with an area under the receiver operating characteristic curve (AUC) of 86.93%, outperforming previous prediction methods. In particular, we observed that an ensemble-based method integrating the predictions of multiple algorithms can give more reliable and robust predictions than a single algorithm, with the AUC score improved to 92.26%. We applied our combinatorial prioritization algorithm to lung neoplasms and breast neoplasms and revealed their top 30 microRNA candidates, which are consistent with the published literature and databases.
Affiliation(s)
- Hua Yu
- State Key Laboratory of Plant Genomics, Institute of Genetic and Developmental Biology, Chinese Academy of Sciences, No. 1 West Beichen Road, Chaoyang District, Beijing, 100101, China
- Xiaojun Chen
- Key Lab of Agricultural Biotechnology of Ningxia, Agricultural Biotechnology Center, Ningxia Academy of Agriculture and Forestry Sciences, 590 Huanghe East Road, Jinfeng District, Yinchuan, Ningxia, 750002, China
- Lu Lu
- Beijing Computing Center, Beijing Academy of Science and Technology, Building 3 BeiKe Industrial park, Fengxian road 7, Haidian District, Beijing, 100094, China
|
31
|
Gu C, Liao B, Li X, Cai L, Chen H, Li K, Yang J. Network-based collaborative filtering recommendation model for inferring novel disease-related miRNAs. RSC Adv 2017. [DOI: 10.1039/c7ra09229f] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.0] [Indexed: 11/21/2022]
Abstract
According to the miRNA and disease similarity network, the unknown associations are predicted by combining the known miRNA-disease association network based on collaborative filtering recommendation algorithm.
Affiliation(s)
- Changlong Gu
- College of Information Science and Engineering, Hunan University, Changsha, China
- Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, China
- Xiaoying Li
- College of Information Science and Engineering, Hunan University, Changsha, China
- Lijun Cai
- College of Information Science and Engineering, Hunan University, Changsha, China
- Haowen Chen
- College of Information Science and Engineering, Hunan University, Changsha, China
- Keqin Li
- Department of Computer Science, State University of New York, New York 12561, USA
- Jialiang Yang
- Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York 10029, USA
|
32
|
Cai Y, Huang T. Systems genetics - deciphering the complex disease with a systems approach. Biochim Biophys Acta Gen Subj 2016; 1860:2611-2. [DOI: 10.1016/j.bbagen.2016.07.034] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/21/2022]
|
33
|
Gu C, Liao B, Li X, Li K. Network Consistency Projection for Human miRNA-Disease Associations Inference. Sci Rep 2016; 6:36054. [PMID: 27779232 PMCID: PMC5078764 DOI: 10.1038/srep36054] [Citation(s) in RCA: 68] [Impact Index Per Article: 7.6] [Received: 07/19/2016] [Accepted: 10/11/2016] [Indexed: 11/20/2022]
Abstract
Prediction and confirmation of the presence of disease-related miRNAs is beneficial for understanding disease mechanisms at the miRNA level. However, the use of experimental verification to identify disease-related miRNAs is expensive and time-consuming. Effective computational approaches used to predict miRNA-disease associations are highly specific. In this study, we develop the Network Consistency Projection for miRNA-Disease Associations (NCPMDA) method to reveal the potential associations between miRNAs and diseases. NCPMDA is a non-parametric universal network-based method that can simultaneously predict miRNA-disease associations for all diseases and does not require negative samples. NCPMDA can also confirm the presence of miRNAs in isolated diseases (diseases without any known miRNA association). Leave-one-out cross validation and case studies have shown that the predictive performance of NCPMDA is superior to that of previous methods.
Affiliation(s)
- Changlong Gu
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Bo Liao
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Xiaoying Li
- College of Information Science and Engineering, Hunan University, Changsha, Hunan 410082, China
- Keqin Li
- Department of Computer Science, State University of New York, New Paltz, New York 12561, USA
|
34
|
OAHG: an integrated resource for annotating human genes with multi-level ontologies. Sci Rep 2016; 6:34820. [PMID: 27703231 PMCID: PMC5050487 DOI: 10.1038/srep34820] [Citation(s) in RCA: 70] [Impact Index Per Article: 7.8] [Received: 08/19/2016] [Accepted: 09/20/2016] [Indexed: 01/04/2023]
Abstract
OAHG, an integrated resource, aims to establish a comprehensive functional annotation resource for human protein-coding genes (PCGs), miRNAs, and lncRNAs by multi-level ontologies involving Gene Ontology (GO), Disease Ontology (DO), and Human Phenotype Ontology (HPO). Many previous studies have focused on inferring putative properties and biological functions of PCGs and non-coding RNA genes from different perspectives. During the past several decades, a few databases have been designed to annotate the functions of PCGs, miRNAs, and lncRNAs, respectively. Some of the functional descriptions in these databases were mapped to standardized terminologies, such as GO, which can be helpful for further analysis. Despite these developments, there is no comprehensive resource recording the function of these three important types of genes. The current version of OAHG, release 1.0 (Jun 2016), integrates three ontologies involving GO, DO, and HPO, six gene functional databases and two interaction databases. Currently, OAHG contains 1,434,694 entries involving 16,929 PCGs, 637 miRNAs, 193 lncRNAs, and 24,894 terms of ontologies. During the performance evaluation, OAHG shows consistency with existing gene interactions and the structure of ontology. For example, terms with more similar structure could be associated with more associated genes (Pearson correlation γ2 = 0.2428, p < 2.2e-16).
|
35
|
Accurate Identification of Cancerlectins through Hybrid Machine Learning Technology. Int J Genomics 2016; 2016:7604641. [PMID: 27478823 PMCID: PMC4961832 DOI: 10.1155/2016/7604641] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.1] [Received: 04/18/2016] [Revised: 05/24/2016] [Accepted: 06/14/2016] [Indexed: 01/03/2023]
Abstract
Cancerlectins are cancer-related proteins that function as lectins. They have been identified through computational identification techniques, but these techniques have sometimes failed to identify proteins because of sequence diversity among the cancerlectins. Advanced machine learning identification methods, such as support vector machine and basic sequence features (n-gram), have also been used to identify cancerlectins. In this study, various protein fingerprint features and advanced classifiers, including ensemble learning techniques, were utilized to identify this group of proteins. We improved the prediction accuracy of the original feature extraction methods and classification algorithms by more than 10% on average. Our work provides a basis for the computational identification of cancerlectins and reveals the power of hybrid machine learning techniques in computational proteomics.
|