1
Zhang M, Wang J, Wang W, Yang G, Peng J. Predicting cell-type specific disease genes of diabetes with the biological network. Comput Biol Med 2024; 169:107849. [PMID: 38101116 DOI: 10.1016/j.compbiomed.2023.107849] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 08/24/2023] [Revised: 11/21/2023] [Accepted: 12/11/2023] [Indexed: 12/17/2023]
Abstract
Type 2 diabetes (T2D) is a chronic condition that can lead to serious complications such as heart disease, kidney disease, nerve damage, and blindness. Although T2D-related genes have been identified through genome-wide association studies (GWAS) and various computational methods, the biological mechanism of T2D at the cell-type level remains unclear. Exploring cell type-specific genes related to T2D is essential for understanding the cellular mechanisms underlying the disease. To address this issue, we introduce DiGCellNet (predicting Disease Genes with Cell type specificity based on biological Networks), a model that integrates a graph convolutional network (GCN) and multi-task learning (MTL) to predict T2D-associated cell type-specific genes from the biological network. Our work represents the first attempt to predict cell type-specific disease genes using GCN and MTL. We evaluate our approach by predicting genes specific to four cell types and show that DiGCellNet outperforms models that combine node embeddings with traditional machine learning algorithms. Moreover, DiGCellNet identifies CALM1 as a gene specific to the beta cell type in T2D, and this association is confirmed using an independent dataset. The code is available at https://github.com/23AIBox/23AIBox-DiGCellNet.
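The sketch below illustrates the two generic ingredients named above. First, it applies one graph-convolution step using the widely used symmetric normalization D^-1/2 (A + I) D^-1/2. Second, it feeds the shared node embeddings to one sigmoid output head per cell type, which is the hard-parameter-sharing form of multi-task learning. It is a sketch under stated assumptions, not DiGCellNet. The graph, features, weights, and the cell-type names "beta" and "alpha" are hypothetical toy values, and no training is shown. The authors' actual code is in the linked repository.

```python
import math

def normalize_adjacency(adj):
    """Return D^-1/2 (A + I) D^-1/2 for a dense, symmetric 0/1 adjacency matrix."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # degrees include the added self-loop
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def relu(m):
    return [[max(0.0, v) for v in row] for row in m]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gcn_multitask_forward(adj, features, shared_weight, task_heads):
    """One GCN layer shared by all tasks, then one linear + sigmoid head per task.

    task_heads maps a (hypothetical) cell-type name to a weight vector whose length
    equals the hidden dimension. Returns {task: [score for each node]}.
    """
    hidden = relu(matmul(matmul(normalize_adjacency(adj), features), shared_weight))
    return {
        task: [sigmoid(sum(h * w for h, w in zip(row, weights))) for row in hidden]
        for task, weights in task_heads.items()
    }

# Toy 4-gene path graph 0-1-2-3, 2 input features, 2 hidden units (all hypothetical).
adj = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
features = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [0.2, 0.8]]
shared_weight = [[1.0, -0.5], [0.3, 0.9]]
task_heads = {"beta": [1.2, -0.7], "alpha": [-0.4, 1.1]}
scores = gcn_multitask_forward(adj, features, shared_weight, task_heads)
print(scores)
```

All heads read the same GCN output, so gradient signal from every cell type would shape the shared layer during training. That sharing is the usual motivation for multi-task learning over independent per-cell-type models.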
Affiliation(s)
- Menghan Zhang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China; The National Engineering Laboratory for Integrated Aerospace-Ground-Ocean Big Data Application Technology, Xi'an, 710072, China
- Jingru Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China; The National Engineering Laboratory for Integrated Aerospace-Ground-Ocean Big Data Application Technology, Xi'an, 710072, China
- Wei Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China; The National Engineering Laboratory for Integrated Aerospace-Ground-Ocean Big Data Application Technology, Xi'an, 710072, China
- Guang Yang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China; The National Engineering Laboratory for Integrated Aerospace-Ground-Ocean Big Data Application Technology, Xi'an, 710072, China
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China; The National Engineering Laboratory for Integrated Aerospace-Ground-Ocean Big Data Application Technology, Xi'an, 710072, China; School of Computer Science, Research and Development Institute of Northwestern Polytechnical University in Shenzhen, Shenzhen, 518000, China.
2
Rahman A, Debnath T, Kundu D, Khan MSI, Aishi AA, Sazzad S, Sayduzzaman M, Band SS. Machine learning and deep learning-based approach in smart healthcare: Recent advances, applications, challenges and opportunities. AIMS Public Health 2024; 11:58-109. [PMID: 38617415 PMCID: PMC11007421 DOI: 10.3934/publichealth.2024004] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/19/2023] [Accepted: 12/18/2023] [Indexed: 04/16/2024] [Open Access]
Abstract
In recent years, machine learning (ML) and deep learning (DL) have become the leading approaches to challenges in intelligent healthcare applications, such as disease prediction, drug discovery, and medical image analysis. Given the current progress in ML and DL, both hold promising potential to support healthcare. This study offers an exhaustive survey of ML and DL for healthcare systems, concentrating on key state-of-the-art features, integration benefits, applications, prospects, and future directions. To conduct the research, we searched the most prominent journal and conference databases using distinct keywords to retrieve relevant scholarly publications. First, we concisely summarize the most recent, cutting-edge progress in ML- and DL-based analysis for smart healthcare. Next, we integrate the advancement of various services for ML and DL, including ML-healthcare, DL-healthcare, and ML-DL-healthcare. We then present ML- and DL-based applications in the healthcare industry. Finally, we highlight open research challenges and recommendations for further studies based on our observations.
Affiliation(s)
- Anichur Rahman
- Department of CSE, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka-1350
- Department of CSE, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Tanoy Debnath
- Department of CSE, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Department of CSE, Green University of Bangladesh, 220/D, Begum Rokeya Sarani, Dhaka -1207, Bangladesh
- Dipanjali Kundu
- Department of CSE, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka-1350
- Md. Saikat Islam Khan
- Department of CSE, Mawlana Bhashani Science and Technology University, Tangail, Bangladesh
- Airin Afroj Aishi
- Department of Computing and Information System, Daffodil International University, Savar, Dhaka, Bangladesh
- Sadia Sazzad
- Department of CSE, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka-1350
- Mohammad Sayduzzaman
- Department of CSE, National Institute of Textile Engineering and Research (NITER), Constituent Institute of the University of Dhaka, Savar, Dhaka-1350
- Shahab S. Band
- Department of Information Management, International Graduate School of Artificial Intelligence, National Yunlin University of Science and Technology, Taiwan
3
Uzuner D, İlgün A, Düz E, Bozkurt FB, Çakır T. Multilayer Analysis of RNA Sequencing Data in Alzheimer's Disease to Unravel Molecular Mysteries. ADVANCES IN NEUROBIOLOGY 2024; 41:219-246. [PMID: 39589716 DOI: 10.1007/978-3-031-69188-1_9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/27/2024]
Abstract
Alzheimer's disease (AD) is a complex disease, and numerous cellular events may be involved in its etiology. RNA-seq-based transcriptome data hold multilayer information content, which could be crucial in unraveling the molecular mysteries of AD. Such data enable quantification of gene expression levels, identification of genomic variants, and detection of splicing anomalies such as exon skipping and intron retention. Further integrating this information into protein-protein interaction networks and genome-scale metabolic models from the literature has the potential to decipher functional modules and affected mechanisms in complex scenarios such as AD. In this chapter, we review the application areas of the multilayer content of RNA-seq and the associated integrative approaches, with a special focus on AD.
Affiliation(s)
- Dilara Uzuner
- Department of Bioengineering, Gebze Technical University, Gebze, Kocaeli, Turkey
- Atılay İlgün
- Department of Bioengineering, Gebze Technical University, Gebze, Kocaeli, Turkey
- Elif Düz
- Department of Bioengineering, Gebze Technical University, Gebze, Kocaeli, Turkey
- Fatma Betül Bozkurt
- Department of Bioengineering, Gebze Technical University, Gebze, Kocaeli, Turkey
- Tunahan Çakır
- Department of Bioengineering, Gebze Technical University, Gebze, Kocaeli, Turkey.
4
Wang S, Fang X, Wen X, Yang C, Yang Y, Zhang T. Prioritization of risk genes for Alzheimer's disease: an analysis framework using spatial and temporal gene expression data in the human brain based on support vector machine. Front Genet 2023; 14:1190863. [PMID: 37867597 PMCID: PMC10587557 DOI: 10.3389/fgene.2023.1190863] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/02/2023] [Accepted: 09/26/2023] [Indexed: 10/24/2023] [Open Access]
Abstract
Background: Alzheimer's disease (AD) is a complex disorder whose risk is influenced by multiple genetic and environmental factors. In this study, an AD risk gene prediction framework based on spatial and temporal features of gene expression data (STGE) was proposed. Methods: Gene expression data from donors of different tissues and ages were used as model features. Human genes were classified into AD risk or non-risk sets based on information extracted from relevant databases. Support vector machine (SVM) models were constructed to capture the expression patterns of genes believed to contribute to AD risk. Results: Recursive feature elimination (RFE) was used for feature selection, reducing the 64 tissue-age features to 19. SVM models were built and evaluated using both the 19 selected features and the full feature set. The area under the curve (AUC) values of the SVM models based on the 19 selected features (0.740 [0.690-0.790]) and on the full feature set (0.730 [0.678-0.769]) were very similar. Fifteen genes were predicted to be AD risk genes with a probability greater than 90%. Conclusion: The proposed framework performed comparably to previous prediction methods based on protein-protein interaction (PPI) network properties. A list of 15 candidate AD risk genes was also generated to support further studies on the genetic etiology of AD.
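The RFE step can be illustrated with the classic SVM-RFE loop. You train a linear SVM, drop the feature with the smallest absolute weight, and repeat until the desired number of features remains. The sketch below is a minimal, dependency-free version that trains the SVM by full-batch subgradient descent without a bias term. The learning rate, regularization strength, epoch count, and toy data are hypothetical choices, not the study's settings. The study itself reduced 64 tissue-age features to 19.

```python
def train_linear_svm(X, y, lam=0.1, lr=0.1, epochs=200):
    """Full-batch subgradient descent on lam/2 * ||w||^2 + mean hinge loss.

    X is a list of feature rows and y holds labels in {-1, +1}. No bias term is
    fitted, which keeps the sketch short (it assumes roughly centred features).
    """
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [lam * wj for wj in w]  # gradient of the L2 penalty
        for xi, yi in zip(X, y):
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            if margin < 1:  # hinge loss is active for this sample
                for j in range(d):
                    grad[j] -= yi * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def svm_rfe(X, y, n_keep):
    """Recursive feature elimination: repeatedly drop the smallest-|weight| feature.

    Returns the original column indices of the surviving features.
    """
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        X_sub = [[row[j] for j in remaining] for row in X]
        w = train_linear_svm(X_sub, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        remaining.pop(weakest)
    return remaining


# Hypothetical toy data: column 0 separates the classes, columns 1-2 are noise.
X = [
    [1.0, 0.5, 0.25], [1.0, -0.5, -0.25],
    [2.0, 0.25, -0.5], [2.0, -0.25, 0.5],
    [-1.0, 0.5, -0.25], [-1.0, -0.5, 0.25],
    [-2.0, 0.25, 0.5], [-2.0, -0.25, -0.5],
]
y = [1, 1, 1, 1, -1, -1, -1, -1]
print(svm_rfe(X, y, n_keep=1))  # → [0]
```

The model is retrained after every removal. This matters because feature weights shift once correlated or redundant features disappear. In practice, libraries usually remove several features per step to save time.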
Affiliation(s)
- Shiyu Wang
- Department of Epidemiology and Biostatistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an, China
- Xixian Fang
- Department of Epidemiology and Biostatistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an, China
- Xiang Wen
- Hangzhou Institute for Advanced Study, University of Chinese Academy of Sciences, Beijing, China
- Congying Yang
- Department of Epidemiology and Biostatistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an, China
- Ying Yang
- Department of Epidemiology and Biostatistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an, China
- Tianxiao Zhang
- Department of Epidemiology and Biostatistics, School of Public Health, Xi’an Jiaotong University Health Science Center, Xi’an, China
- National Anti-Drug Laboratory Shaanxi Regional Center, Xi’an, China
5
Tu W, Ling G, Liu F, Hu F, Song X. GCSTI: A Single-Cell Pseudotemporal Trajectory Inference Method Based on Graph Compression. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2945-2958. [PMID: 37037234 DOI: 10.1109/tcbb.2023.3266109] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/19/2023]
Abstract
Single-cell pseudotemporal trajectory inference is an important way to explore developmental changes within cells. Because cells develop at uneven rates, changes in gene expression depend less on the time of data collection and more on a cell's "internal clock". Several strategies have been proposed to overcome the challenges of gene analysis and replicate biological developmental processes. However, given the size of single-cell datasets, locating relevant signposts usually requires clustering analysis or a sizable amount of a priori information. To this end, we propose GCSTI, a novel single-cell pseudotemporal trajectory inference method based on graph compression. GCSTI does not rely on a priori knowledge or clustering procedures and can handle trajectory inference for large networks stably and efficiently. In addition, we improve the method used to define pseudotime in order to obtain more trustworthy and useful trajectory inference results. Finally, we validate the efficacy and stability of GCSTI using datasets from human skeletal muscle myogenic cells and four simulated datasets.
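The abstract does not describe GCSTI's graph-compression procedure in enough detail to reproduce it. For orientation only, the sketch below shows a common and much simpler way to assign pseudotime, which is explicitly not GCSTI. It builds a k-nearest-neighbour graph over cells and takes each cell's shortest-path distance from a chosen root cell, rescaled to [0, 1]. The value of k, the Euclidean metric, the root choice, and the toy cells are all assumptions.

```python
import heapq
import math

def knn_graph(points, k):
    """Undirected k-nearest-neighbour graph with Euclidean edge weights."""
    n = len(points)
    graph = {i: {} for i in range(n)}
    for i in range(n):
        dists = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d  # symmetrize so the graph is undirected
    return graph

def pseudotime(points, root, k=2):
    """Dijkstra distance from the root cell, scaled so the farthest reachable cell is 1.

    Cells unreachable from the root get infinity.
    """
    graph = knn_graph(points, k)
    dist = {root: 0.0}
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    max_d = max(dist.values()) or 1.0
    return [dist.get(i, math.inf) / max_d for i in range(len(points))]

# Hypothetical cells sampled along a 1-D developmental path in a 2-D expression space.
cells = [(0.0, 0.0), (1.0, 0.1), (2.0, 0.0), (3.0, 0.2), (4.0, 0.1)]
print(pseudotime(cells, root=0))
```

Using graph distance rather than straight-line distance lets pseudotime follow curved trajectories through expression space. The weakness the GCSTI abstract points to is that choosing the root and handling large graphs usually needs extra prior information or clustering.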
6
Wu W, Zhang Y, Liu G, Chi Z, Zhang A, Miao S, Lin C, Xu Q, Zhang Y. Potential protective effects of Huanglian Jiedu Decoction against COVID-19-associated acute kidney injury: A network-based pharmacological and molecular docking study. Open Med (Wars) 2023; 18:20230746. [PMID: 37533739 PMCID: PMC10390755 DOI: 10.1515/med-2023-0746] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/05/2023] [Revised: 06/06/2023] [Accepted: 06/16/2023] [Indexed: 08/04/2023] [Open Access]
Abstract
Coronavirus disease 2019 (COVID-19) can cause damage to multiple organs. The kidney is one of the target organs of SARS-CoV-2, and infection can induce acute kidney injury (AKI). Huanglian Jiedu Decoction (HLJDD) is one of the recommended prescriptions for COVID-19 with severe complications. We used network pharmacology and molecular docking to explore the therapeutic and protective effects of HLJDD on COVID-19-associated AKI. Potential targets related to "HLJDD," "COVID-19," and "Acute Kidney Injury/Acute Renal Failure" were identified from several databases. A protein-protein interaction (PPI) network was constructed, and core targets were screened according to their degree values. The target genes were then subjected to Gene Ontology and Kyoto Encyclopedia of Genes and Genomes (KEGG) enrichment analysis. The bioactive components were docked with the core targets. A total of 65 active compounds and 85 targets common to the diseases and the drug were obtained. PPI network analysis showed that the core proteins mainly included JUN, RELA, and AKT1; functional analysis showed that these target genes were mainly involved in the lipid and atherosclerosis signaling pathway and the IL-17 signaling pathway. Molecular docking showed that JUN, RELA, and AKT1 had good binding activity with the active chemical components of HLJDD. In conclusion, HLJDD may serve as a potential therapeutic drug for COVID-19-associated AKI.
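Screening core targets "according to their degree values" amounts to ranking nodes of the PPI network by how many interaction partners each one has. The sketch below shows that step on a small, hypothetical edge list with placeholder gene names (G1 to G6). The top-k cut-off is an assumption, since the abstract does not state the threshold used.

```python
from collections import Counter

def core_targets_by_degree(edges, top_k):
    """Rank nodes of an undirected PPI network by degree and return the top_k.

    Duplicate edges (in either orientation) and self-loops are ignored; ties are
    broken alphabetically so the output is deterministic.
    """
    unique = {tuple(sorted(e)) for e in edges if e[0] != e[1]}
    degree = Counter()
    for a, b in unique:
        degree[a] += 1
        degree[b] += 1
    ranked = sorted(degree.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]

# Hypothetical interactions among placeholder genes (includes a duplicate and a self-loop).
edges = [("G1", "G2"), ("G1", "G3"), ("G1", "G4"), ("G2", "G3"),
         ("G3", "G5"), ("G4", "G5"), ("G2", "G1"), ("G6", "G6")]
print(core_targets_by_degree(edges, top_k=2))  # → [('G1', 3), ('G3', 3)]
```

Deduplicating edges before counting matters because interaction databases often list the same pair in both directions. Without that step, the degree of such nodes would be inflated.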
Affiliation(s)
- Weichu Wu
- Department of Urology, Shantou Central Hospital, Shantou, 515031, PR China
- Yonghai Zhang
- Department of Urology, Shantou Central Hospital, Shantou, 515031, PR China
- Guoyuan Liu
- Department of Urology, Shantou Central Hospital, Shantou, 515031, PR China
- Zepai Chi
- Department of Urology, Shantou Central Hospital, Shantou, 515031, PR China
- Aiping Zhang
- School of Integrative Medicine, Gansu University of Traditional Chinese Medicine, Lanzhou, 730000, PR China
- Shuying Miao
- Department of Urology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310003, China
- Chengchuang Lin
- Department of Traditional Chinese Medicine, Shantou Central Hospital, Shantou, 515031, PR China
- Qingchun Xu
- Department of Urology, Shantou Central Hospital, Shantou, 515031, PR China
- Yuanfeng Zhang
- Department of Urology, Shantou Central Hospital, Shantou, 515031, PR China
7
Shah E, Maji P. Multi-View Kernel Learning for Identification of Disease Genes. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:2278-2290. [PMID: 37027602 DOI: 10.1109/tcbb.2023.3247033] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/06/2023]
Abstract
Gene expression data sets and protein-protein interaction (PPI) networks are two heterogeneous data sources that have been studied extensively because they capture co-expression patterns among genes and their topological connections. Although they depict different traits of the data, both tend to group co-functional genes together. This phenomenon agrees with the basic assumption of multi-view kernel learning, namely that different views of the data share a similar inherent cluster structure. Based on this observation, a new multi-view kernel learning-based disease gene identification algorithm, termed DiGId, is put forward. A novel multi-view kernel learning approach is proposed that learns a consensus kernel, which efficiently captures the heterogeneous information of individual views and depicts the underlying inherent cluster structure. Low-rank constraints are imposed on the learned multi-view kernel so that it can be partitioned into k or fewer clusters. The learned joint cluster structure is used to curate a set of potential disease genes. Moreover, a novel approach is put forward to quantify the importance of each view. To demonstrate the effectiveness of the proposed approach in capturing the relevant information depicted by individual views, an extensive analysis is performed on four cancer-related gene expression data sets and a PPI network, considering different similarity measures.
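DiGId learns its consensus kernel under low-rank constraints, which is beyond a short example. The sketch below is a much simpler stand-in that only shows what combining views through kernels means. It builds a Gaussian (RBF) kernel for each view and forms a fixed convex combination of them. The view weights, the bandwidths (gamma), and the two toy views are hypothetical. Nothing here learns the weights or enforces a cluster structure.

```python
import math

def rbf_kernel(rows, gamma):
    """Gaussian kernel K[i][j] = exp(-gamma * ||x_i - x_j||^2)."""
    return [[math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z))) for z in rows]
            for x in rows]

def combine_views(kernels, weights):
    """Convex combination sum_v w_v * K_v of per-view kernels (weights must sum to 1)."""
    if abs(sum(weights) - 1.0) > 1e-9 or min(weights) < 0:
        raise ValueError("weights must be non-negative and sum to 1")
    n = len(kernels[0])
    return [[sum(w * K[i][j] for w, K in zip(weights, kernels)) for j in range(n)]
            for i in range(n)]

# Two hypothetical views of the same 3 genes (e.g. expression profiles and a second view).
view_1 = [[1.0, 2.0], [1.1, 2.1], [5.0, 0.0]]
view_2 = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0]]
consensus = combine_views([rbf_kernel(view_1, 0.5), rbf_kernel(view_2, 2.0)], [0.6, 0.4])
print(consensus)
```

A convex combination of valid kernels is itself a valid (positive semidefinite) kernel, which is why weighted sums are a common baseline for multi-view kernel methods. Methods such as DiGId go further and learn the combination.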
8
Zhang L, Lu D, Bi X, Zhao K, Yu G, Quan N. Predicting disease genes based on multi-head attention fusion. BMC Bioinformatics 2023; 24:162. [PMID: 37085750 PMCID: PMC10122338 DOI: 10.1186/s12859-023-05285-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/27/2022] [Accepted: 04/12/2023] [Indexed: 04/23/2023] [Open Access]
Abstract
BACKGROUND The identification of disease-related genes is of great significance for the diagnosis and treatment of human disease. Most studies have focused on developing efficient and accurate computational methods to predict disease-causing genes. Owing to the sparsity and complexity of biomedical data, developing an effective multi-feature fusion model to identify disease genes remains challenging. RESULTS This paper proposes an approach to predict pathogenic genes based on multi-head attention fusion (MHAGP). First, heterogeneous biological information networks of disease genes are constructed by integrating multiple biomedical knowledge databases. Second, two graph representation learning algorithms are used to capture feature vectors of gene-disease pairs from the network, and the features are fused through multi-head attention. Finally, a multi-layer perceptron is used to predict gene-disease associations. CONCLUSIONS The MHAGP model outperforms all other methods in comparative experiments. Case studies also show that MHAGP can predict genes potentially associated with diseases. In the future, more biological entity association data, such as gene-drug and disease phenotype-gene ontology associations, can be added to expand the information in heterogeneous biological networks and achieve more accurate predictions. In addition, because it is readily extensible, MHAGP can be applied to related tasks such as gene-drug and drug-disease association prediction.
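MHAGP fuses the two feature vectors of each gene-disease pair with multi-head attention. The sketch below is a minimal, parameter-free version of that idea, built on several assumptions:

- The two embeddings are treated as a two-token sequence.
- Each head attends over a disjoint slice of the feature dimensions using scaled dot-product attention.
- The attended tokens are mean-pooled into one fused vector.

Real implementations learn query, key, and value projections per head. Those projections are omitted here, and the embedding values are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def multi_head_fuse(tokens, n_heads):
    """Fuse equal-length vectors with parameter-free multi-head self-attention.

    Each head uses its own contiguous slice of the dimensions as query, key and value
    (no learned projections). Head outputs are written back to their slice, giving a
    full-width attended vector per token; the tokens are then averaged.
    """
    dim = len(tokens[0])
    if dim % n_heads:
        raise ValueError("dimension must be divisible by n_heads")
    d_head = dim // n_heads
    attended = [[0.0] * dim for _ in tokens]
    for h in range(n_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        sliced = [t[lo:hi] for t in tokens]
        for i, q in enumerate(sliced):
            scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d_head) for k in sliced]
            weights = softmax(scores)
            for j in range(d_head):
                attended[i][lo + j] = sum(w * v[j] for w, v in zip(weights, sliced))
    return [sum(col) / len(tokens) for col in zip(*attended)]

# Hypothetical embeddings of one gene-disease pair from two graph representation learners.
emb_a = [0.2, 0.8, -0.1, 0.5]
emb_b = [0.1, 0.7, 0.4, -0.3]
fused = multi_head_fuse([emb_a, emb_b], n_heads=2)
print(fused)
```

Splitting dimensions across heads lets each head weight the two sources differently for different parts of the representation. A single attention head would have to apply one mixing pattern everywhere.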
Affiliation(s)
- Linlin Zhang
- College of Software Engineering, Xinjiang University, Urumqi, China.
- Dianrong Lu
- College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Xuehua Bi
- Medical Engineering and Technology College, Xinjiang Medical University, Urumqi, China
- Kai Zhao
- College of Information Science and Engineering, Xinjiang University, Urumqi, China
- Guanglei Yu
- Medical Engineering and Technology College, Xinjiang Medical University, Urumqi, China
- Na Quan
- College of Information Science and Engineering, Xinjiang University, Urumqi, China
9
Jamali AA, Kusalik A, Wu FX. NMTF-DTI: A Nonnegative Matrix Tri-factorization Approach With Multiple Kernel Fusion for Drug-Target Interaction Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:586-594. [PMID: 34914594 DOI: 10.1109/tcbb.2021.3135978] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Indexed: 05/04/2023]
Abstract
Prediction of drug-target interactions (DTIs) plays a significant role in drug development and drug discovery. Although this task requires a large investment of time and cost, especially when performed experimentally, the results are not necessarily significant. Computational DTI prediction is a shortcut that reduces the risks of experimental methods. In this study, we propose an effective nonnegative matrix tri-factorization approach, referred to as NMTF-DTI, to predict interaction scores between drugs and targets. NMTF-DTI uses multiple kernels (similarity measures) for drugs and targets, together with Laplacian regularization, to boost prediction performance. The performance of NMTF-DTI is evaluated via cross-validation and compared with existing DTI prediction methods in terms of the area under the receiver operating characteristic (ROC) curve (AUC) and the area under the precision-recall curve (AUPR). We evaluate our method on four gold standard datasets against other state-of-the-art methods. Cross-validation and a separate, manually created dataset are used to set parameters. The results show that NMTF-DTI outperforms the competing methods. Moreover, the results of a case study also confirm the superiority of NMTF-DTI.
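At the core of NMTF-DTI is a nonnegative matrix tri-factorization of the drug-target interaction matrix, X ≈ U S Vᵀ with U, S, V ≥ 0. The sketch below shows only the plain Frobenius-norm version with standard multiplicative updates. Each update is a Lee-Seung-style step with the other two factors held fixed. The multiple-kernel fusion and Laplacian regularization described in the abstract are deliberately left out. The ranks, iteration count, and toy matrix are hypothetical.

```python
import random

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]

def transpose(m):
    return [list(col) for col in zip(*m)]

def multiplicative_update(factor, numer, denom, eps=1e-12):
    """Elementwise factor * numer / denom; keeps every entry nonnegative."""
    return [[f * n / (d + eps) for f, n, d in zip(fr, nr, dr)]
            for fr, nr, dr in zip(factor, numer, denom)]

def frobenius_error(x, u, s, v):
    approx = matmul(matmul(u, s), transpose(v))
    return sum((a - b) ** 2 for ra, rb in zip(x, approx) for a, b in zip(ra, rb))

def nmtf(x, k1, k2, iters=300, seed=0):
    """Plain NMTF: minimise ||X - U S V^T||_F^2 with U (m x k1), S (k1 x k2), V (n x k2) >= 0."""
    rng = random.Random(seed)
    m, n = len(x), len(x[0])
    u = [[rng.random() for _ in range(k1)] for _ in range(m)]
    s = [[rng.random() for _ in range(k2)] for _ in range(k1)]
    v = [[rng.random() for _ in range(k2)] for _ in range(n)]
    xt = transpose(x)
    for _ in range(iters):
        # U <- U * (X V S^T) / (U S V^T V S^T)
        vst = matmul(v, transpose(s))
        u = multiplicative_update(u, matmul(x, vst), matmul(u, matmul(transpose(vst), vst)))
        # V <- V * (X^T U S) / (V S^T U^T U S)
        us = matmul(u, s)
        v = multiplicative_update(v, matmul(xt, us), matmul(v, matmul(transpose(us), us)))
        # S <- S * (U^T X V) / (U^T U S V^T V)
        ut = transpose(u)
        s = multiplicative_update(
            s, matmul(matmul(ut, x), v),
            matmul(matmul(matmul(ut, u), s), matmul(transpose(v), v)))
    return u, s, v

# Hypothetical 4 drugs x 3 targets interaction matrix.
x = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
u, s, v = nmtf(x, k1=2, k2=2)
print(round(frobenius_error(x, u, s, v), 4))
```

The reconstructed matrix U S Vᵀ assigns a score to every drug-target pair, including unobserved ones. Those scores are what a DTI predictor would rank.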
10
Zhu Y, Zhang H, Yang Y, Zhang C, Ou-Yang L, Bai L, Deng M, Yi M, Liu S, Wang C. Discovery of pan-cancer related genes via integrative network analysis. Brief Funct Genomics 2022; 21:325-338. [PMID: 35760070 DOI: 10.1093/bfgp/elac012] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 02/01/2022] [Revised: 05/14/2022] [Accepted: 05/25/2022] [Indexed: 01/02/2023] [Open Access]
Abstract
Identification of cancer-related genes helps in understanding the pathogenesis of cancer, developing targeted drugs, and creating new diagnostic and therapeutic methods. Given the complexity of biological laboratory methods and the increasing availability of high-throughput data, many network-based methods have been proposed to identify cancer-related genes from a global perspective. Some studies have focused on tissue-specific cancer networks. However, cancers from different tissues may share common features, and such methods may ignore the differences and similarities across cancers during modeling. In this work, to make full use of the global information of the network, we first establish a pan-cancer network via a differential network algorithm; this network contains heterogeneous data not only across multiple cancer types but also between tumor and normal samples. Second, node representation vectors are learned by network embedding. In contrast to ranking analysis-based methods, with the help of integrative network analysis, we transform the cancer-related gene identification problem into a binary classification problem. The final results are obtained via ensemble classification. We further applied these methods to the most commonly used gene expression data involving six tissue-specific cancer types. As a result, an integrative pan-cancer network and several biologically meaningful results were obtained. For example, nine genes were ultimately identified as potential pan-cancer-related genes. Most of these genes have been reported in published studies, demonstrating our method's potential for identifying driver gene candidates for further experimental verification.
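The abstract builds its pan-cancer network with a differential network algorithm whose exact form it does not give. One simple and common way to score differential edges, used below purely as an illustrative assumption, works as follows. You compare gene-gene Pearson correlations between tumor and normal samples and keep the pairs whose correlation changes by more than a threshold. The genes A, B, and C, their expression values, and the 0.5 threshold are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation; assumes neither profile is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def differential_edges(tumor, normal, threshold):
    """Return {(g1, g2): delta} for gene pairs whose correlation shifts by more than threshold.

    tumor and normal map gene name -> expression values across that condition's samples;
    both must contain the same genes (sample counts may differ between conditions).
    """
    edges = {}
    for g1, g2 in combinations(sorted(tumor), 2):
        delta = pearson(tumor[g1], tumor[g2]) - pearson(normal[g1], normal[g2])
        if abs(delta) > threshold:
            edges[(g1, g2)] = delta
    return edges

# Hypothetical expression profiles: the A-B relationship flips between conditions.
tumor = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [1, 3, 2, 4]}
normal = {"A": [1, 2, 3, 4], "B": [8, 6, 4, 2], "C": [1, 3, 2, 4]}
print(differential_edges(tumor, normal, threshold=0.5))
```

Edges that are equally strong in both conditions (A-C here) drop out. This is the point of a differential network: it keeps only the rewiring associated with the disease state.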
Affiliation(s)
- Yuan Zhu
- School of Automation, China University of Geosciences, Lumo Road, 430074, Wuhan, China
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Lumo Road, 430074, Wuhan, China
- Engineering Research Center of Intelligent Technology for Geo-Exploration, Lumo Road, 430074, Wuhan, China
- Key Laboratory of Computational Neuroscience and Brain-Inspired Intelligence (Fudan University), Ministry of Education, Handan Road, 200433, Shanghai, China
- Houwang Zhang
- Electrical Engineering, City University of Hong Kong, Kowloon, 999077, Hong Kong, China
- Yuanhang Yang
- School of Mathematics and Physics, China University of Geosciences, Lumo Road, 430074, Wuhan, China
- Chaoyang Zhang
- School of Computing Sciences and Computer Engineering, The University of Southern Mississippi, Hattiesburg, USA
- Le Ou-Yang
- Guangdong Key Laboratory of Intelligent Information Processing and Shenzhen Key Laboratory of Media Security, Shenzhen University, Nanhai Avenue, 518060, Shenzhen, China
- Litai Bai
- School of Automation, China University of Geosciences, Lumo Road, 430074, Wuhan, China
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Lumo Road, 430074, Wuhan, China
- Engineering Research Center of Intelligent Technology for Geo-Exploration, Lumo Road, 430074, Wuhan, China
- Minghua Deng
- School of Mathematical Sciences, Peking University, No.5 Yiheyuan Road, 100871, Beijing, China
- Ming Yi
- School of Mathematics and Physics, China University of Geosciences, Lumo Road, 430074, Wuhan, China
- Song Liu
- School of Automation, China University of Geosciences, Lumo Road, 430074, Wuhan, China
- Hubei Key Laboratory of Advanced Control and Intelligent Automation for Complex Systems, Lumo Road, 430074, Wuhan, China
- Engineering Research Center of Intelligent Technology for Geo-Exploration, Lumo Road, 430074, Wuhan, China
- Chao Wang
- Hepatic Surgery Center, Institute of Hepato-Pancreato-Biliary Surgery, Department of Surgery, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Jiefang Avenue, 430030, Wuhan, China
11
Jamali AA, Tan Y, Kusalik A, Wu FX. NTD-DR: Nonnegative tensor decomposition for drug repositioning. PLoS One 2022; 17:e0270852. [PMID: 35862409 PMCID: PMC9302855 DOI: 10.1371/journal.pone.0270852] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 03/09/2022] [Accepted: 06/20/2022] [Indexed: 12/12/2022] [Open Access]
Abstract
Computational drug repositioning aims to identify potential applications of existing drugs for the treatment of diseases for which they were not designed. This approach can considerably accelerate the traditional drug discovery process by decreasing the time and cost of drug development. Tensor decomposition enables us to integrate multiple drug- and disease-related data sources to boost prediction performance. In this study, a nonnegative tensor decomposition for drug repositioning, NTD-DR, is proposed. To capture the hidden information in drug-target, drug-disease, and target-disease networks, NTD-DR uses these pairwise associations to construct a three-dimensional tensor representing drug-target-disease triplet associations and integrates them with similarity information of drugs, targets, and diseases to make predictions. We compare NTD-DR with recent state-of-the-art methods in terms of the area under the receiver operating characteristic (ROC) curve (AUC) and the area under the precision-recall curve (AUPR) and find that our method outperforms the competing methods. Moreover, case studies of five diseases also confirm the reliability of the predictions made by NTD-DR. Our proposed method identifies more known associations among the top 50 predictions than other methods. In addition, novel associations identified by NTD-DR are validated by literature analyses.
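NTD-DR turns three pairwise association networks into a drug-target-disease tensor. The abstract does not spell out the construction rule. One plausible rule, assumed here for illustration, marks entry (d, t, s) as 1 only when the drug-target, drug-disease, and target-disease associations are all known. The sketch builds that binary tensor from hypothetical entities and association lists. The decomposition itself and the integration of similarity information are not shown.

```python
def build_triplet_tensor(drugs, targets, diseases, drug_target, drug_disease, target_disease):
    """Binary tensor T[d][t][s] = 1 iff all three pairwise associations are present."""
    dt, ds, ts = set(drug_target), set(drug_disease), set(target_disease)
    return [[[1 if (d, t) in dt and (d, s) in ds and (t, s) in ts else 0
              for s in diseases]
             for t in targets]
            for d in drugs]

# Hypothetical entities and associations.
drugs = ["drugA", "drugB"]
targets = ["T1", "T2"]
diseases = ["D1", "D2"]
tensor = build_triplet_tensor(
    drugs, targets, diseases,
    drug_target=[("drugA", "T1"), ("drugB", "T2")],
    drug_disease=[("drugA", "D1"), ("drugB", "D1")],
    target_disease=[("T1", "D1"), ("T2", "D2")],
)
print(tensor)  # → [[[1, 0], [0, 0]], [[0, 0], [0, 0]]]
```

Only drugA-T1-D1 closes the triangle of known associations. For example, drugB-T2-D1 is excluded because no T2-D1 link is listed. A nonnegative tensor decomposition would then assign scores to such unclosed triplets.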
Affiliation(s)
- Ali Akbar Jamali
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Yuting Tan
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- School of Mathematics and Statistics, Huazhong Normal University, Wuhan, China
- Anthony Kusalik
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
12
Yu G, Yang Y, Yan Y, Guo M, Zhang X, Wang J. DeepIDA: Predicting Isoform-Disease Associations by Data Fusion and Deep Neural Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:2166-2176. [PMID: 33571094 DOI: 10.1109/tcbb.2021.3058801] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Indexed: 06/12/2023]
Abstract
Alternative splicing produces different isoforms from the same gene locus and is an important mechanism for regulating gene expression and proteome diversity. Although the prediction of gene (ncRNA)-disease associations has been extensively studied, few (or no) computational solutions have been proposed for predicting isoform-disease associations (IDAs) at a large scale, mainly owing to the lack of disease annotations for isoforms. However, increasing evidence confirms associations between diseases and isoforms, which can more precisely uncover the pathology of complex diseases. It is therefore highly desirable to predict IDAs. To bridge this gap, we propose a deep neural network-based solution (DeepIDA) that fuses multi-type genomics and transcriptomics data to predict IDAs. In particular, DeepIDA uses gene-isoform relations to dispatch gene-disease associations to isoforms. In addition, it uses two DNN sub-networks with different structures to capture nucleotide and expression features of isoforms, Gene Ontology data and miRNA target data, respectively. These two sub-networks are then merged in a dense layer to predict IDAs. Experimental results on public datasets show that DeepIDA effectively predicts IDAs, with an AUPRC (area under the precision-recall curve) of 0.9141, a macro F-measure of 0.9155, a G-mean of 0.9278, and a balanced accuracy of 0.9303 across 732 diseases, which are much higher than those of competing methods. A further study of sixteen isoform-disease association cases again corroborates the superiority of DeepIDA. The code of DeepIDA is available at http://mlda.swu.edu.cn/codes.php?name=DeepIDA.
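Several of the reported scores are simple functions of a binary confusion matrix:

- Balanced accuracy is the mean of sensitivity and specificity.
- G-mean is the geometric mean of sensitivity and specificity.
- Macro F-measure averages the F1 scores computed for each class.

The sketch below computes all three from hypothetical counts for a single binary task. It assumes no denominator is zero. The abstract does not say how DeepIDA aggregates these scores across its 732 diseases, so only the per-task formulas are shown.

```python
import math

def binary_metrics(tp, fp, tn, fn):
    """Balanced accuracy, G-mean and two-class macro F1 from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall of the positive class
    specificity = tn / (tn + fp)  # recall of the negative class
    precision_pos = tp / (tp + fp)
    precision_neg = tn / (tn + fn)
    f1_pos = 2 * precision_pos * sensitivity / (precision_pos + sensitivity)
    f1_neg = 2 * precision_neg * specificity / (precision_neg + specificity)
    return {
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "g_mean": math.sqrt(sensitivity * specificity),
        "macro_f1": (f1_pos + f1_neg) / 2,
    }

# Hypothetical counts for one disease.
print(binary_metrics(tp=8, fp=4, tn=6, fn=2))
```

Unlike plain accuracy, these metrics weight the positive and negative classes equally. This matters for association prediction, where known negatives usually far outnumber positives.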
13
Xie W, Zheng Z, Zhang W, Huang L, Lin Q, Wong KC. SRG-vote: Predicting miRNA-gene relationships via embedding and LSTM ensemble. IEEE J Biomed Health Inform 2022; 26:4335-4344. [PMID: 35471879 DOI: 10.1109/jbhi.2022.3169542] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/10/2022]
Abstract
Targeted therapy for one or a set of genes has made it possible to apply precision medicine to different patients, given the existence of tumor heterogeneity. However, how to regulate those genes is still an open problem. MicroRNAs are among the natural regulators of genes. Thus, a better understanding of the miRNA-gene interaction mechanism might contribute to future diagnosis, prevention, and cancer therapy. The interactions between microRNAs and genes play an essential role in molecular genetics. In vivo experiments validating these relationships are time-consuming, costly, and labor-intensive. With the development of high-throughput technology, large volumes of biological data have become available. However, extracting features from such large amounts of raw data and building mathematical models remains challenging. Machine learning and deep learning algorithms have become powerful tools for dealing with biological data. Inspired by this, we propose a model that combines feature/embedding extraction methods, deep learning algorithms, and a voting system. We use doc2vec to generate sequential embeddings from molecular sequences. Geometrical embeddings were generated with role2vec, GCN, and GMM from complex networks built from similarity and pairwise datasets. For the deep learning algorithms, we used LSTM and Bi-LSTM according to the different embeddings and features. Finally, we adopted a voting system to balance the results from different data sources. The results show that our voting system achieves a higher AUC than the existing benchmark. The case studies demonstrate that our model could reveal potential relationships between miRNAs and genes. The source code, features, and predictive results can be downloaded at https://github.com/Xshelton/SRG-vote.
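The abstract does not specify how SRG-vote's voting system combines its models. As a hedged illustration only, the sketch below shows two common ensemble rules: weighted soft voting over predicted probabilities, and majority (hard) voting over thresholded predictions. The function names, the 0.5 threshold and the toy probabilities are hypothetical.

```python
def soft_vote(prob_lists, weights=None):
    """Weighted average of per-pair probabilities from several models.

    prob_lists[m][i] is model m's probability that pair i is a true
    miRNA-gene relationship; all lists must have the same length.
    """
    if weights is None:
        weights = [1.0] * len(prob_lists)
    total = sum(weights)
    n_pairs = len(prob_lists[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_pairs)
    ]


def hard_vote(prob_lists, threshold=0.5):
    """Majority vote: a pair is positive only if more than half the models agree."""
    n_models = len(prob_lists)
    n_pairs = len(prob_lists[0])
    return [
        int(sum(probs[i] >= threshold for probs in prob_lists) * 2 > n_models)
        for i in range(n_pairs)
    ]


# Hypothetical outputs of three models (e.g. trained on different embeddings)
# for three candidate miRNA-gene pairs.
probs = [[0.9, 0.2, 0.6], [0.7, 0.4, 0.4], [0.8, 0.3, 0.3]]
print(soft_vote(probs))
print(hard_vote(probs))
```

Soft voting keeps each model's confidence and yields a continuous score suitable for computing AUC, whereas hard voting discards confidence; with an even number of models, a tie counts as negative in this sketch.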
14
Zhang Y, Chen L, Li S. CIPHER-SC: Disease-Gene Association Inference Using Graph Convolution on a Context-Aware Network With Single-Cell Data. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:819-829. [PMID: 32809944 DOI: 10.1109/tcbb.2020.3017547] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Indexed: 06/11/2023]
Abstract
Inference of disease-gene associations helps unravel the pathogenesis of diseases and contributes to their treatment. Although many machine learning-based methods have been developed to predict causative genes, accurate association inference remains challenging. One major reason is inaccurate feature selection and the error accumulation introduced by the commonly used multi-stage training architecture. In addition, existing methods do not incorporate cell-type-specific information and thus cannot study gene functions at a higher resolution. Therefore, we introduce single-cell transcriptome data and construct a context-aware network to integrate all data sources in an unbiased way. We then develop a graph convolution-based approach named CIPHER-SC to realize a complete end-to-end learning architecture. Our approach outperforms four state-of-the-art approaches in five-fold cross-validation on three distinct test sets, with a best AUC of 0.9501, demonstrating a stable ability either to predict novel genes or to predict with a genetic basis. The ablation study shows that our complete end-to-end design and unbiased data integration boost performance from 0.8727 to 0.9443 in AUC. The addition of single-cell data further improves prediction accuracy and enriches our results for cell-type-specific genes. These results confirm the ability of CIPHER-SC to discover reliable disease genes. Our implementation is available at http://github.com/YidingZhang117/CIPHER-SC.
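CIPHER-SC is built on graph convolution. For readers unfamiliar with the operation, the sketch below implements one generic graph-convolution layer with the widely used symmetric normalization, H' = ReLU(D^-1/2 (A + I) D^-1/2 H W). It is an illustrative assumption rather than CIPHER-SC's architecture: the context-aware network, the single-cell features and the end-to-end training are not reproduced, and the toy graph and fixed weights are hypothetical.

```python
import math


def matmul(x, y):
    """Dense matrix product of two list-of-lists matrices."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def normalize_adjacency(adj):
    """Symmetric normalization D^-1/2 (A + I) D^-1/2 with self-loops added."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]  # every degree is >= 1 thanks to the self-loops
    return [
        [a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
        for i in range(n)
    ]


def gcn_layer(adj, features, weight):
    """One graph-convolution layer: ReLU(A_norm @ H @ W).

    adj: n x n symmetric adjacency, features: n x f node features,
    weight: f x d weights (fixed here; learned during training in practice).
    """
    propagated = matmul(normalize_adjacency(adj), features)  # mix neighbour features
    return [[max(0.0, v) for v in row] for row in matmul(propagated, weight)]


# Toy 3-node path graph (gene0 - gene1 - gene2) with one feature per node.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0], [0.0], [2.0]]
weight = [[1.0, -1.0]]
print(gcn_layer(adj, features, weight))
```

Each layer lets a node aggregate information from its immediate neighbours, so stacking k layers propagates signals k hops across the network; this is what allows disease and gene nodes to share information in a graph-convolution model.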
15
Wang L, Wu M, Wu Y, Zhang X, Li S, He M, Zhang F, Wang Y, Li J. Prediction of the Disease Causal Genes Based on Heterogeneous Network and Multi-Feature Combination Method. Comput Biol Chem 2022; 97:107639. [DOI: 10.1016/j.compbiolchem.2022.107639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/10/2021] [Revised: 01/05/2022] [Accepted: 02/07/2022] [Indexed: 11/30/2022]
16
The Road to Personalized Medicine in Alzheimer’s Disease: The Use of Artificial Intelligence. Biomedicines 2022; 10:biomedicines10020315. [PMID: 35203524 PMCID: PMC8869403 DOI: 10.3390/biomedicines10020315] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/29/2021] [Revised: 01/21/2022] [Accepted: 01/24/2022] [Indexed: 02/05/2023]
Abstract
Dementia remains an extremely prevalent syndrome among older people and is a major cause of disability and dependency. Alzheimer’s disease (AD) accounts for the majority of dementia cases and is the most common neurodegenerative disease. Since age is the major risk factor for AD, increasing lifespan not only raises prevalence but also complicates diagnosis. Moreover, the lack of disease-modifying therapies is another constraint. A shift from a curative to a preventive approach is imminent, and we are moving towards personalized medicine, in which the best clinical intervention can be shaped for an individual patient at a given point. This new step in medicine requires the most recent tools and the analysis of enormous amounts of data, where artificial intelligence (AI) plays a critical role in depicting disease–patient dynamics, which is crucial for reaching early/optimal diagnosis, monitoring and intervention. Predictive models and algorithms are the key elements in this innovative field. In this review, we present an overview of relevant topics regarding the application of AI in AD, detailing the algorithms and their applications in drug discovery and biomarkers.
17
Wang S, Li J, Wang Y. M2PP: a novel computational model for predicting drug-targeted pathogenic proteins. BMC Bioinformatics 2022; 23:7. [PMID: 34983358 PMCID: PMC8728953 DOI: 10.1186/s12859-021-04522-9] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/15/2021] [Accepted: 12/07/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Detecting pathogenic proteins is a fundamental way to understand the mechanisms of diseases and resist their invasion, making pathogenic protein prediction an urgent problem. Genome-wide protein prediction may not necessarily help cure diseases rapidly, because developing new drugs specifically for a predicted pathogenic protein always requires major expenditures of time and money. To facilitate disease treatment, computational methods that predict pathogenic proteins already targeted by existing drugs should be explored. RESULTS In this study, we propose a novel computational model to predict drug-targeted pathogenic proteins, named M2PP. Three types of features, based on neighborhood similarity information, drug-inferred information and path information, were derived from our constructed heterogeneous network (including target proteins, diseases and drugs). Then, a random forest regression model was trained to score unconfirmed target-disease pairs. Five-fold cross-validation was used to evaluate the model's prediction performance, and M2PP achieved favorable results compared with other state-of-the-art methods. In addition, M2PP accurately predicted high-ranked pathogenic proteins for common diseases, with public biomedical literature as supporting evidence, indicating its excellent ability. CONCLUSIONS M2PP is an effective and accurate model for predicting drug-targeted pathogenic proteins and could facilitate future biological research.
Affiliation(s)
- Shiming Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001, China
- Jie Li
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001, China
- Yadong Wang
- School of Computer Science and Technology, Harbin Institute of Technology, Harbin, Heilongjiang, 150001, China
18
Chen X, Wang Y, Ma Y, Wang R, Zhao D. To explore the Radix Paeoniae Rubra-Flos Carthami herb pair's potential mechanism in the treatment of ischemic stroke by network pharmacology and molecular docking technology. Medicine (Baltimore) 2021; 100:e27752. [PMID: 34889224 PMCID: PMC8663872 DOI: 10.1097/md.0000000000027752] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/07/2021] [Accepted: 10/27/2021] [Indexed: 01/05/2023]
Abstract
To explore the potential mechanism of the Radix Paeoniae Rubra-Flos Carthami herb pair (RPR-FC) in treating ischemic stroke (IS) by network pharmacology and molecular docking technology. The Traditional Chinese Medicine Systems Pharmacology Database was used to screen the active components of the RPR-FC, and Cytoscape 3.8 software was used to construct a network map of its active components and their targets of action. The GeneCards and OMIM databases were used to identify disease targets of IS; the common targets were chosen as research targets and imported into the STRING database to construct a protein-protein interaction network map of these targets. R software was used to analyze the enrichment of GO terms and KEGG pathways and to explore the mechanisms of these targets. Molecular docking technology was used to verify that the RPR-FC components had good binding activity with their potential targets. A total of 44 active components, corresponding to 197 targets, were identified in the RPR-FC. There were 139 common targets between the herb pair and IS. GO functional enrichment analysis revealed 2253 biological process entries, 72 cellular component entries, and 183 molecular function entries. KEGG pathway enrichment analysis mainly implicated the NF-kappa B signaling pathway, the TNF signaling pathway, apoptosis, the MAPK signaling pathway, the PI3K-Akt signaling pathway, the VEGF signaling pathway, etc. The molecular docking results showed that the components docking well with key targets were quercetin, luteolin, kaempferol, and baicalein. The active components (quercetin, luteolin, kaempferol, and baicalein) of the RPR-FC and their targets act on proteins such as MAPK1, AKT1, VEGFA, and CASP3, which are closely related to IS. These targets are associated with the NF-kappa B signaling pathway, the MAPK signaling pathway, the PI3K-Akt signaling pathway, the VEGF signaling pathway, and other signaling pathways. These pathways are involved in the recovery of nerve function, angiogenesis, neuronal apoptosis, and the regulation of inflammatory factors, which may have a therapeutic effect on IS.
Affiliation(s)
- Xingyu Chen
- College of Traditional Chinese Medicine, Changchun University of Chinese Medicine, Changchun, China
- Yue Wang
- Department of Encephalopathy, The Affiliated Hospital to Changchun University of Chinese Medicine, Changchun, China
- Ying Ma
- College of Traditional Chinese Medicine, Changchun University of Chinese Medicine, Changchun, China
- Ruonan Wang
- College of Traditional Chinese Medicine, Changchun University of Chinese Medicine, Changchun, China
- Dexi Zhao
- Department of Encephalopathy, The Affiliated Hospital to Changchun University of Chinese Medicine, Changchun, China
19
Wang W, Han R, Zhang M, Wang Y, Wang T, Wang Y, Shang X, Peng J. A network-based method for brain disease gene prediction by integrating brain connectome and molecular network. Brief Bioinform 2021; 23:6415315. [PMID: 34727570 DOI: 10.1093/bib/bbab459] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 07/30/2021] [Revised: 09/18/2021] [Accepted: 10/07/2021] [Indexed: 12/27/2022]
Abstract
Brain disease gene identification is critical for revealing the biological mechanisms of brain diseases and developing drugs for them. To enhance the identification of brain disease genes, similarity-based computational methods, especially network-based methods, have been adopted to narrow down the search space. However, these network-based methods use only molecular networks, ignoring brain connectome data, which have been widely used in many brain-related studies. In our study, we propose a novel framework, named brainMI, that integrates brain connectome data and molecular-based gene association networks to predict brain disease genes. For a consistent representation of molecular-based network data and brain connectome data, brainMI first constructs a novel gene network, called the brain functional connectivity (BFC)-based gene network, based on resting-state functional magnetic resonance imaging data and brain region-specific gene expression data. Then, a multiple network integration method is proposed to learn low-dimensional features of genes by integrating the BFC-based gene network and existing protein-protein interaction networks. Finally, these features are used to predict brain disease genes with a support vector machine-based model. We evaluate brainMI on four brain diseases: Alzheimer's disease, Parkinson's disease, major depressive disorder and autism. brainMI achieves of 0.761, 0.729, 0.728 and 0.744 using the BFC-based gene network alone and enhances the molecular network-based performance by 6.3% on average. In addition, the results show that brainMI achieves higher performance in predicting brain disease genes than three existing state-of-the-art methods.
Affiliation(s)
- Wei Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Ruijiang Han
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Menghan Zhang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yuxian Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Tao Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Yongtian Wang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
- Jiajie Peng
- School of Computer Science, Northwestern Polytechnical University, Xi'an, 710072, China; Key Laboratory of Big Data Storage and Management, Northwestern Polytechnical University, Ministry of Industry and Information Technology, Xi'an, 710072, China
20
Optimal artificial neural network-based data mining technique for stress prediction in working employees. Soft Comput 2021. [DOI: 10.1007/s00500-021-06058-9] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 10/20/2022]
21
Molecular mechanisms of An-Chuan Granule for the treatment of asthma based on a network pharmacology approach and experimental validation. Biosci Rep 2021; 41:228000. [PMID: 33645621 PMCID: PMC7990088 DOI: 10.1042/bsr20204247] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/30/2020] [Revised: 02/25/2021] [Accepted: 02/26/2021] [Indexed: 12/12/2022]
Abstract
An-Chuan Granule (ACG), a traditional Chinese medicine (TCM) formula, is an effective treatment for asthma, but its pharmacological mechanism remains poorly understood. In the present study, network pharmacology was applied to explore the potential mechanism of ACG in the treatment of asthma. The tumor necrosis factor (TNF), Toll-like receptor (TLR), Th17 cell differentiation-related, nucleotide-binding oligomerization domain (NOD)-like receptor, and NF-kappaB pathways were identified as the most significant signaling pathways involved in the therapeutic effect of ACG on asthma. A mouse asthma model was established using ovalbumin (OVA) to verify the effect of ACG and the underlying mechanism. The results showed that ACG treatment not only attenuated the clinical symptoms but also reduced inflammatory cell infiltration, mucus secretion and MUC5AC production in the lung tissue of asthmatic mice. In addition, ACG treatment notably decreased inflammatory cell numbers in bronchoalveolar lavage fluid (BALF) and the levels of pro-inflammatory cytokines (including IL-6, IL-17, IL-23, TNF-alpha, IL-1beta and TGF-beta) in the lung tissue of asthmatic mice. ACG treatment also markedly down-regulated the expression of TLR4, p-P65, NLRP3, Caspase-1 and apoptosis-associated speck-like protein containing a CARD (ASC) in lung tissue. Further, ACG treatment decreased the expression of retinoic acid receptor-related orphan receptor γt (RORγt) in lung tissue but increased that of forkhead box P3 (Foxp3). In conclusion, these results demonstrate that ACG alleviates the severity of asthma in a ‘multi-compound and multi-target’ manner, which provides a basis for better understanding the application of ACG in the treatment of asthma.
22
Jamali AA, Kusalik A, Wu FX. MDIPA: a microRNA-drug interaction prediction approach based on non-negative matrix factorization. Bioinformatics 2021; 36:5061-5067. [PMID: 33212495 DOI: 10.1093/bioinformatics/btaa577] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/21/2020] [Revised: 05/27/2020] [Accepted: 06/11/2020] [Indexed: 02/02/2023]
Abstract
MOTIVATION Evidence has shown that microRNAs, one type of small biomolecule, regulate the expression level of genes and play an important role in the development or treatment of diseases. Drugs, as important chemical compounds, can interact with microRNAs and change their functions. The experimental identification of microRNA-drug interactions is time-consuming and expensive. Therefore, it is appealing to develop effective computational approaches for predicting microRNA-drug interactions. RESULTS In this study, a matrix factorization-based method, called the microRNA-drug interaction prediction approach (MDIPA), is proposed for predicting unknown interactions between microRNAs and drugs. Specifically, MDIPA uses experimentally validated interactions between drugs and microRNAs, drug similarity and microRNA similarity to predict undiscovered interactions. A path-based microRNA similarity matrix is constructed, while the structural information of drugs is used to establish a drug similarity matrix. To evaluate its performance, MDIPA is compared with four state-of-the-art prediction methods using an independent dataset and cross-validation. The results of both evaluation methods confirm the superior performance of MDIPA over the other methods. Finally, the results of molecular docking in a breast cancer case study confirm the efficacy of our approach. In conclusion, MDIPA can be effective in predicting potential microRNA-drug interactions. AVAILABILITY AND IMPLEMENTATION All code and data are freely available from https://github.com/AliJam82/MDIPA. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
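MDIPA is described as a non-negative matrix factorization (NMF) method. The sketch below shows only the plain NMF idea, under stated assumptions: a known microRNA-drug interaction matrix V is approximated as W H using Lee-Seung multiplicative updates, and the reconstructed values of unobserved pairs serve as interaction scores. MDIPA itself additionally uses the microRNA and drug similarity matrices, which this sketch omits; the toy matrix, rank and iteration count are hypothetical, and unknown pairs are simply treated as zeros.

```python
import random


def matmul(x, y):
    """Dense matrix product of two list-of-lists matrices."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def transpose(x):
    return [list(col) for col in zip(*x)]


def nmf(v, rank, iters=300, seed=0, eps=1e-9):
    """Approximate a non-negative m x n matrix V by W (m x rank) @ H (rank x n).

    Uses Lee-Seung multiplicative updates for the squared Frobenius loss;
    starting from positive values, W and H stay non-negative.
    """
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)  # H <- H * (W^T V) / (W^T W H)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)] for i in range(rank)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))  # W <- W * (V H^T) / (W H H^T)
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)] for i in range(m)]
    return w, h


# Hypothetical 4 microRNAs x 3 drugs; 1 = known interaction, 0 = unknown.
interactions = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
]
w, h = nmf(interactions, rank=2)
scores = matmul(w, h)  # a higher reconstructed value suggests a more likely interaction
for i, row in enumerate(interactions):
    for j, known in enumerate(row):
        if not known:
            print(f"miRNA {i} - drug {j}: score {scores[i][j]:.3f}")
```

The low rank forces microRNAs (and drugs) with similar interaction profiles to share latent factors, which is what lets the reconstruction assign non-zero scores to pairs that were never observed.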
Affiliation(s)
- Anthony Kusalik
- Division of Biomedical Engineering; Department of Computer Science
- Fang-Xiang Wu
- Division of Biomedical Engineering; Department of Computer Science; Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
23
Luo P, Chen B, Liao B, Wu F. Predicting disease-associated genes: Computational methods, databases, and evaluations. WIREs Data Mining and Knowledge Discovery 2021; 11. [DOI: 10.1002/widm.1383] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 07/28/2019] [Accepted: 06/13/2020] [Indexed: 09/09/2024]
Abstract
Complex diseases are associated with a set of genes (called disease genes), the identification of which can help scientists uncover the mechanisms of diseases and develop new drugs and treatment strategies. Because experimental identification techniques are costly and time-consuming, many computational algorithms have been proposed to predict disease genes. Although several recent reviews have discussed many computational methods, some focus on cancer driver genes while others focus on biomolecular networks, so each covers only a specific aspect of existing methods. In this review, we summarize existing methods and classify them into three categories based on their rationales. Then, the algorithms, biological data, and evaluation methods used in computational prediction are discussed. Finally, we highlight the limitations of existing methods and point out some future directions for improving these algorithms. This review could help investigators understand the principles of existing methods and thus develop new methods to advance the computational prediction of disease genes. This article is categorized under: Technologies > Machine Learning; Technologies > Prediction; Algorithmic Development > Biological Data Mining
Affiliation(s)
- Ping Luo
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Canada
- Princess Margaret Cancer Centre, University Health Network, Toronto, Canada
- Bolin Chen
- School of Computer Science and Technology, Northwestern Polytechnical University, China
- Bo Liao
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Fang-Xiang Wu
- Department of Mechanical Engineering and Department of Computer Science, University of Saskatchewan, Saskatoon, Canada
24
Su XR, You ZH, Hu L, Huang YA, Wang Y, Yi HC. An Efficient Computational Model for Large-Scale Prediction of Protein-Protein Interactions Based on Accurate and Scalable Graph Embedding. Front Genet 2021; 12:635451. [PMID: 33719344 PMCID: PMC7953052 DOI: 10.3389/fgene.2021.635451] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 11/30/2020] [Accepted: 01/25/2021] [Indexed: 11/23/2022]
Abstract
Protein–protein interaction (PPI) underlies the molecular mechanisms of living cells. Although traditional experiments can detect PPIs accurately, they are costly and time-consuming. As a result, computational methods have been used to predict PPIs to avoid these problems. Graphs, as important and pervasive data carriers, are considered the most suitable structure for representing biomedical entities and relationships. Although graph embedding is the most popular approach to graph representation learning, it usually suffers from high computational and memory costs, especially on large-scale graphs. Therefore, developing a framework that can accelerate graph embedding and improve the accuracy of embedding results is important for large-scale PPI prediction. In this paper, we propose a multi-level model, LPPI, to improve both the quality and speed of large-scale PPI prediction. First, basic protein information is collected as protein attributes, including positional gene sets, motif gene sets, and immunological signatures. Second, we construct a weighted graph by using protein attributes to calculate node similarity. Then GraphZoom is used to accelerate the embedding process by reducing the size of the weighted graph. Next, graph embedding methods are used to learn graph topology features from the reconstructed graph. Finally, a linear logistic regression (LR) model is used to predict the probability that two proteins interact. LPPI achieved high accuracies of 0.99997 and 0.9979 on the PPI network dataset and the GraphSAGE-PPI dataset, respectively. Our further results show that LPPI is promising for large-scale PPI prediction in both accuracy and efficiency, which may benefit the detection of other large-scale biomedical molecular interactions.
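LPPI's second step builds a weighted graph from protein attributes. The abstract does not name the similarity measure, so the sketch below assumes cosine similarity between attribute vectors and a k-nearest-neighbour rule for choosing edges; both choices, the function names and the toy attributes are hypothetical, and the later GraphZoom coarsening, embedding and logistic regression steps are not shown.

```python
import math


def cosine(a, b):
    """Cosine similarity of two attribute vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def similarity_graph(attrs, k):
    """Weighted k-nearest-neighbour graph over proteins.

    attrs maps a protein name to its attribute vector (e.g. 0/1 membership
    in gene sets). Each protein is linked to its k most similar proteins;
    edges are undirected and weighted by cosine similarity.
    """
    edges = {}
    names = list(attrs)
    for u in names:
        ranked = sorted(
            ((cosine(attrs[u], attrs[v]), v) for v in names if v != u),
            reverse=True,
        )
        for sim, v in ranked[:k]:
            if sim > 0:  # skip neighbours that share no attributes
                edges[tuple(sorted((u, v)))] = sim
    return edges


# Hypothetical binary attributes: membership in four gene sets.
proteins = {
    "P1": [1, 1, 0, 0],
    "P2": [1, 1, 1, 0],
    "P3": [0, 0, 1, 1],
    "P4": [0, 0, 0, 1],
}
print(similarity_graph(proteins, k=1))
```

Keeping only each node's top-k neighbours keeps the graph sparse, which matters for the large-scale setting the abstract targets, since a fully connected similarity graph grows quadratically with the number of proteins.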
Affiliation(s)
- Xiao-Rui Su
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China; University of Chinese Academy of Sciences, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
- Zhu-Hong You
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China; University of Chinese Academy of Sciences, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
- Lun Hu
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China; University of Chinese Academy of Sciences, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
- Yu-An Huang
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China
- Yi Wang
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China; University of Chinese Academy of Sciences, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
- Hai-Cheng Yi
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Ürümqi, China; University of Chinese Academy of Sciences, Beijing, China; Xinjiang Laboratory of Minority Speech and Language Information Processing, Ürümqi, China
25
Jin Z, Liu L, Gong D, Li L. Target Recognition of Industrial Robots Using Machine Vision in 5G Environment. Front Neurorobot 2021; 15:624466. [PMID: 33716703 PMCID: PMC7947910 DOI: 10.3389/fnbot.2021.624466] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 10/31/2020] [Accepted: 02/03/2021] [Indexed: 11/24/2022]
Abstract
This study aims to solve the problems of large positioning errors, low recognition speed, and low object recognition accuracy in industrial robot detection in a 5G environment. In the improved method, the convolutional neural network (CNN) model from deep learning (DL) is adopted for image convolution, pooling, and target classification, optimizing the industrial robot's visual recognition system. With bottled objects as targets, the improved Fast-RCNN target detection algorithm is verified; with small bottled objects in a complex environment as targets, the improved VGG-16 classification network based on the Hyper-Column scheme is verified. Finally, the constructed algorithm is compared with other advanced CNN algorithms through simulation analysis. The results show that both the Fast-RCNN algorithm and the improved VGG-16 classification network based on the Hyper-Column scheme can position and recognize the targets, with a recognition accuracy rate of 82.34%, significantly better than other advanced neural network algorithms. Therefore, the improved VGG-16 classification network based on the Hyper-Column scheme has good accuracy and effectiveness for target recognition and positioning, providing an experimental reference for the application and development of industrial robots.
Affiliation(s)
- Zhenkun Jin
- Department of Information Engineering, Wuhan Business University, Wuhan, China
- Lei Liu
- Graduate School, Gachon University, Seoul, South Korea
- Dafeng Gong
- Department of Information Technology, Wenzhou Polytechnic, Wenzhou, China
- Lei Li
- Huawei Technologies Co. Ltd., Shenzhen, China
26
Ding Y, Lei X, Liao B, Wu FX. Machine learning approaches for predicting biomolecule-disease associations. Brief Funct Genomics 2021; 20:273-287. [PMID: 33554238 DOI: 10.1093/bfgp/elab002] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 12/12/2022]
Abstract
Biomolecules, such as microRNAs, circRNAs, lncRNAs and genes, are functionally interdependent in human cells, and all play critical roles in diverse fundamental and vital biological processes. Dysregulation of such biomolecules can cause diseases. Identifying associations between biomolecules and diseases can uncover the mechanisms of complex diseases, which is conducive to their diagnosis, treatment, prognosis and prevention. Because biological experiments are time-consuming and costly, many computational association prediction methods have been proposed in the past few years. In this study, we provide a comprehensive review of machine learning-based approaches for predicting disease-biomolecule associations with multi-view data sources. First, we introduce some databases and general strategies for integrating multi-view data sources into prediction models. Second, we discuss several feature representation methods for machine learning-based prediction models. Third, we comprehensively review machine learning-based prediction approaches in three categories (basic machine learning methods, matrix completion-based methods and deep learning-based methods) and discuss their advantages and disadvantages. Finally, we provide some perspectives for further improving biomolecule-disease prediction methods.
Affiliation(s)
- Yulian Ding
- Division of Biomedical Engineering at the University of Saskatchewan
- Xiujuan Lei
- School of Computer Science at Shaanxi Normal University
- Bo Liao
- School of Mathematics and Statistics at Hainan Normal University, Haikou, China
- Fang-Xiang Wu
- College of Engineering and the Department of Computer Science at University of Saskatchewan
27
Mishra R, Li B. The Application of Artificial Intelligence in the Genetic Study of Alzheimer's Disease. Aging Dis 2020; 11:1567-1584. [PMID: 33269107 PMCID: PMC7673858 DOI: 10.14336/ad.2020.0312] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 01/02/2020] [Accepted: 03/12/2020] [Indexed: 12/13/2022]
Abstract
Alzheimer's disease (AD) is a neurodegenerative disease in which genetic factors contribute approximately 70% of etiological effects. Studies have found many significant genetic and environmental factors, but the pathogenesis of AD is still unclear. With the application of microarray and next-generation sequencing technologies, research using genetic data has grown explosively. Beyond conventional statistical methods for processing these data, artificial intelligence (AI) technology shows clear advantages in analyzing such complex data. This article first briefly reviews the application of AI technology in medicine and the current status of genetic research in AD. It then comprehensively reviews the application of AI in the genetic research of AD, including the diagnosis and prognosis of AD based on genetic data, the analysis of genetic variation, gene expression profiles and gene-gene interactions in AD, and knowledge base-driven genetic analysis of AD. Although many studies have yielded meaningful results, they are still at a preliminary stage. The main shortcomings include the limitations of the databases, the failure to use AI for a systematic biology analysis of multilevel databases, and the lack of a theoretical framework for interpreting the analysis results. Finally, we outline directions for future development. It is crucial to develop high-quality, comprehensive, large-sample, shared data resources; a multi-level systems biology AI analysis strategy is one development direction, and computational creativity may play a role in theory model building, verification, and the design of new intervention protocols for AD.
Affiliation(s)
- Rohan Mishra
- Washington Institute for Health Sciences, Arlington, VA 22203, USA
- Bin Li
- Washington Institute for Health Sciences, Arlington, VA 22203, USA
- Georgetown University Medical Center, Washington D.C. 20057, USA
28
Xiang J, Zhang NR, Zhang JS, Lv XY, Li M. PrGeFNE: Predicting disease-related genes by fast network embedding. Methods 2020; 192:3-12. [PMID: 32610158 DOI: 10.1016/j.ymeth.2020.06.015] [Citation(s) in RCA: 20] [Impact Index Per Article: 4.0] [Received: 04/30/2020] [Revised: 06/13/2020] [Accepted: 06/22/2020] [Indexed: 12/14/2022]
Abstract
Identifying disease-related genes is important for understanding the molecular mechanisms of diseases, as well as for their diagnosis and treatment. Many computational methods have been proposed to predict disease-related genes, but making full use of multi-source biological data to enhance disease-gene prediction is still challenging. In this paper, we proposed a novel method for predicting disease-related genes by using fast network embedding (PrGeFNE), which can integrate multiple types of associations related to diseases and genes. Specifically, we first constructed a heterogeneous network from phenotype-disease, disease-gene, protein-protein and gene-GO associations, and extracted low-dimensional representations of the nodes with a fast network embedding algorithm. Then, a dual-layer heterogeneous network was reconstructed from these low-dimensional representations, and network propagation was applied to the dual-layer heterogeneous network to predict disease-related genes. Through cross-validation and validation on newly added associations, we showed the important roles of different types of association data in enhancing disease-gene prediction, and confirmed the excellent performance of PrGeFNE by comparison with state-of-the-art algorithms. Furthermore, we developed a web tool that helps researchers search for candidate genes of different diseases predicted by PrGeFNE, along with GO and pathway enrichment analysis of the candidate gene set. This may be useful for investigating the molecular mechanisms of diseases as well as for their experimental validation. The web tool is available at http://bioinformatics.csu.edu.cn/prgefne/.
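The abstract does not say how the dual-layer network is rebuilt from the embeddings. One plausible reading, sketched here purely as an assumption, is a k-nearest-neighbour graph over the cosine similarity of the learned node vectors. The embedding values, `k`, and the helper names `cosine` and `knn_graph` are hypothetical and are not taken from PrGeFNE.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_graph(embeddings, k):
    """Hypothetical reconstruction step: link every node to its k most
    cosine-similar nodes and return an undirected edge set of sorted pairs."""
    edges = set()
    names = list(embeddings)
    for a in names:
        sims = sorted(
            ((cosine(embeddings[a], embeddings[b]), b) for b in names if b != a),
            reverse=True,
        )
        for _, b in sims[:k]:
            edges.add(tuple(sorted((a, b))))
    return edges


# Toy usage with made-up 2-D gene embeddings.
gene_vectors = {"g1": [1.0, 0.0], "g2": [0.9, 0.1], "g3": [0.0, 1.0]}
gene_layer = knn_graph(gene_vectors, k=1)
```

Building the same kind of graph over disease embeddings and adding the known disease-gene links would yield a two-layer graph on which a propagation step, such as random walk with restart, could then score candidate genes.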
Affiliation(s)
- Ju Xiang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China; Neuroscience Research Center & Department of Basic Medical Sciences, Changsha Medical University, Changsha, 410219 Hunan, China
- Ning-Rui Zhang
- College of Information Science and Engineering, Xinjiang University, Urumqi 830046, China
- Jia-Shuai Zhang
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
- Xiao-Yi Lv
- School of Software, Xinjiang University, Urumqi 830046, China
- Min Li
- School of Computer Science and Engineering, Central South University, Changsha 410083, China
29
Ding Y, Chen B, Lei X, Liao B, Wu FX. Predicting novel CircRNA-disease associations based on random walk and logistic regression model. Comput Biol Chem 2020; 87:107287. [PMID: 32446243 DOI: 10.1016/j.compbiolchem.2020.107287] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Received: 02/04/2020] [Accepted: 05/09/2020] [Indexed: 12/24/2022]
Abstract
Circular RNAs (circRNAs), a large group of small endogenous noncoding RNA molecules, have been shown to modulate protein-coding genes in the human genome. In recent years, many experimental studies have demonstrated that circRNAs are dysregulated in a number of diseases and can serve as biomarkers for disease diagnosis and prognosis. However, identifying circRNA-disease associations by biological experiments is expensive and time-consuming, and few computational models have been proposed for predicting novel circRNA-disease associations. In this study, we develop a computational model based on random walk and logistic regression (RWLR) to predict circRNA-disease associations. First, a circRNA-circRNA similarity network is constructed by calculating the functional similarity of circRNAs based on circRNA-related gene ontology. Then, a random walk with restart is run on the circRNA similarity network, and the features of each circRNA-disease pair are extracted from the results of the random walk and the circRNA-disease association matrix. Finally, a logistic regression model is used to predict novel circRNA-disease associations. Leave-one-out cross validation (LOOCV), five-fold cross validation (5CV) and ten-fold cross validation (10CV) are used to evaluate the prediction performance of RWLR against two recent methods, PWCDA and DWNN-RLS. The experimental results show that RWLR achieves higher AUC values under LOOCV, 5CV and 10CV than these two methods, which demonstrates that RWLR outperforms them. Moreover, case studies illustrate the reliability and effectiveness of RWLR for circRNA-disease association prediction.
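A minimal sketch of the random walk with restart step, written in plain Python. The restart probability of 0.7, the column normalization, and the `pair_features` construction are illustrative assumptions: the abstract does not specify RWLR's exact parameters or feature layout, and the final logistic regression step is left to any standard implementation.

```python
def random_walk_with_restart(W, seed, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p = (1 - r) * M p + r * p0 to its steady state.

    W: symmetric, non-negative circRNA-circRNA similarity matrix (list of lists).
    seed: index of the query circRNA (the restart node).
    restart: restart probability r; 0.7 is an illustrative value, not RWLR's.
    """
    n = len(W)
    # Column-normalize W so every non-empty column sums to 1 (transition matrix M).
    col_sums = [sum(W[i][j] for i in range(n)) for j in range(n)]
    M = [[W[i][j] / col_sums[j] if col_sums[j] > 0 else 0.0 for j in range(n)]
         for i in range(n)]
    p0 = [1.0 if i == seed else 0.0 for i in range(n)]
    p = p0[:]
    for _ in range(max_iter):
        p_next = [(1 - restart) * sum(M[i][j] * p[j] for j in range(n)) + restart * p0[i]
                  for i in range(n)]
        delta = sum(abs(a - b) for a, b in zip(p_next, p))  # L1 change
        p = p_next
        if delta < tol:
            break
    return p


def pair_features(rwr_profile, assoc, disease):
    """Hypothetical feature vector for a (circRNA, disease) pair: the RWR
    proximity of every circRNA to the query circRNA, kept only where that
    circRNA is a known associate of `disease` in the association matrix."""
    return [rwr_profile[k] * assoc[k][disease] for k in range(len(rwr_profile))]


# Toy usage: a 3-node path 0-1-2, walking from node 0.
W = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
profile = random_walk_with_restart(W, seed=0)
# Scores decay with distance from the seed: profile[0] > profile[1] > profile[2].
```

Because each iteration shrinks the L1 error by a factor of at most `1 - restart`, the loop converges quickly for any restart probability well above zero.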
Affiliation(s)
- Yulian Ding
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 1L5, Canada
- Bolin Chen
- School of Computer Science and Technology, Northwestern Polytechnical University, Xi'an 710072, China
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an 710119, China
- Bo Liao
- School of Mathematics and Statistics, Hainan Normal University, Haikou 571158, China
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK S7N 1L5, Canada; Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada; Department of Computer Science, University of Saskatchewan, Saskatoon, SK S7N 5A9, Canada
30
Xie W, Luo J, Pan C, Liu Y. SG-LSTM-FRAME: a computational frame using sequence and geometrical information via LSTM to predict miRNA-gene associations. Brief Bioinform 2020; 22:2032-2042. [PMID: 32181478 DOI: 10.1093/bib/bbaa022] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 11/29/2019] [Revised: 02/10/2020] [Accepted: 02/11/2020] [Indexed: 12/19/2022]
Abstract
MOTIVATION MicroRNAs (miRNAs) regulate target genes and are responsible for lethal diseases such as cancers. Accurately recognizing and identifying miRNA-gene pairs could help decipher the mechanisms by which miRNAs affect and regulate the development of cancers. Embedding methods and deep learning methods have shown excellent performance in traditional classification tasks in many scenarios, but few attempts have adapted and merged these two approaches for miRNA-gene relationship prediction. Hence, we proposed a novel computational framework. We first generated representational features for miRNAs and genes using both sequence and geometrical information and then leveraged a deep learning method to predict the associations. RESULTS We used long short-term memory (LSTM) to predict potential relationships and showed that our method outperformed other state-of-the-art methods: our framework SG-LSTM achieved an area under the curve (AUC) of 0.94, superior to the other methods. In the case study, we predicted the top 10 miRNA-gene relationships and recommended the top 10 potential genes for hsa-miR-335-5p for SG-LSTM-core. We also tested our model on a larger dataset, from which 14 668 698 miRNA-gene pairs were predicted. The top 10 unknown pairs were also listed. AVAILABILITY Our code can be downloaded from https://github.com/Xshelton/SG_LSTM. CONTACT luojiawei@hnu.edu.cn. SUPPLEMENTARY INFORMATION Supplementary data are available at Briefings in Bioinformatics online.
Affiliation(s)
- Weidun Xie
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China
- Chu Pan
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China
- Ying Liu
- College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, Hunan, China
31
Abstract
BACKGROUND Disease gene prediction is a critical and challenging task. Many computational methods have been developed to predict disease genes, which can reduce the money and time spent on experimental validation. Since proteins (products of genes) usually work together to achieve a specific function, biomolecular networks, such as the protein-protein interaction (PPI) network and gene co-expression networks, are widely used to predict disease genes by analyzing the relationships between known disease genes and other genes in the networks. However, existing methods commonly use a universal static PPI network, which ignores the fact that PPIs are dynamic and should differ between patients. RESULTS To address these issues, we develop an ensemble algorithm to predict disease genes from clinical sample-based networks (EdgCSN). The algorithm first constructs a single sample-based network for each case sample of the disease under study. Then, these single sample-based networks are merged into several fused networks based on the clustering results of the samples. After that, logistic models are trained with centrality features extracted from the fused networks, and an ensemble strategy is used to predict the final probability of each gene being disease-associated. EdgCSN is evaluated on breast cancer (BC), thyroid cancer (TC) and Alzheimer's disease (AD) and obtains AUC values of 0.970, 0.971 and 0.966, respectively, which are much better than those of the competing algorithms. Subsequent de novo validations also demonstrate the ability of EdgCSN to predict new disease genes. CONCLUSIONS In this study, we propose EdgCSN, an ensemble learning algorithm for predicting disease genes with models trained on centrality features extracted from clinical sample-based networks. Results of the leave-one-out cross validation show that EdgCSN performs much better than the competing algorithms in predicting BC-associated, TC-associated and AD-associated genes. De novo validations also show that EdgCSN is valuable for identifying new disease genes.
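The ensemble step can be illustrated with a toy sketch, assuming degree centrality as the only feature and an unweighted mean of the per-network probabilities. EdgCSN's actual centrality features, fitted coefficients, and combination rule are not given in the abstract, so `degree_centrality`, `ensemble_probability`, and the (bias, weight) pairs below are hypothetical.

```python
import math


def degree_centrality(edges, nodes):
    """Normalized degree centrality deg(v) / (n - 1) for an undirected,
    duplicate-free edge list."""
    deg = {v: 0 for v in nodes}
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    n = len(nodes)
    return {v: deg[v] / (n - 1) for v in nodes}


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def ensemble_probability(fused_networks, models, nodes, gene):
    """Average the per-network logistic predictions for one gene.

    fused_networks: one edge list per fused network.
    models: one (bias, weight) pair per fused network; hypothetical values
            assumed to have been fitted beforehand on known disease genes.
    """
    probs = []
    for edges, (bias, weight) in zip(fused_networks, models):
        c = degree_centrality(edges, nodes)[gene]
        probs.append(logistic(bias + weight * c))
    # Ensemble step (assumed): unweighted mean of the per-network probabilities.
    return sum(probs) / len(probs)


# Toy usage: two fused networks over three genes.
genes = ["A", "B", "C"]
fused = [[("A", "B"), ("A", "C")], [("A", "B")]]
score_a = ensemble_probability(fused, [(-1.0, 2.0), (-1.0, 2.0)], genes, "A")
```

A real pipeline would fit one logistic model per fused network on labelled genes and would typically use several centrality measures per gene rather than a single degree value.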
Affiliation(s)
- Ping Luo
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9 Canada
- Li-Ping Tian
- School of Information, Beijing Wuzi University, Beijing, 101149 China
- Bolin Chen
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710072 China
- Qianghua Xiao
- School of Mathematics and Physics, University of South China, Hengyang, 421001 China
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9 Canada
- Department of Computer Science, University of Saskatchewan, Saskatoon, S7N 5C9 Canada
- School of Mathematics and Statistics, Hainan Normal University, Haikou, 571158 China
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, S7N 5A9 Canada
32
Li C, Liu H, Hu Q, Que J, Yao J. A Novel Computational Model for Predicting microRNA-Disease Associations Based on Heterogeneous Graph Convolutional Networks. Cells 2019; 8:cells8090977. [PMID: 31455028 PMCID: PMC6769654 DOI: 10.3390/cells8090977] [Citation(s) in RCA: 31] [Impact Index Per Article: 5.2] [Received: 07/25/2019] [Revised: 08/22/2019] [Accepted: 08/23/2019] [Indexed: 01/13/2023]
Abstract
Identifying the interactions between diseases and microRNAs (miRNAs) can accelerate drug development, individualized diagnosis, and treatment for various human diseases. However, experimental methods are time-consuming and costly, so computational approaches to predict latent miRNA-disease interactions are attracting increasing attention. Most previous studies have focused on designing complicated similarity-based methods to predict latent interactions between miRNAs and diseases. In this study, we propose a novel computational model, termed heterogeneous graph convolutional network for miRNA-disease associations (HGCNMDA), which is based on the known human protein-protein interaction (PPI) network and integrates four biological networks: miRNA-disease, miRNA-gene, disease-gene, and PPI. HGCNMDA achieved reliable performance under leave-one-out cross-validation (LOOCV). HGCNMDA was then compared with three state-of-the-art algorithms using five-fold cross-validation, achieving an AUC of 0.9626 and an average precision of 0.9660, ahead of the other competing algorithms. We further analyze the top-10 unknown interactions between miRNAs and diseases. In summary, HGCNMDA is a useful computational model for predicting miRNA-disease interactions.
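For readers unfamiliar with graph convolution, here is a sketch of the standard single-graph propagation rule popularized by Kipf and Welling, ReLU(D^-1/2 (A + I) D^-1/2 H W). It is not HGCNMDA's heterogeneous architecture, whose layer definitions the abstract does not give; `gcn_layer` and the toy matrices are illustrative only.

```python
import math


def gcn_layer(A, H, W):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    A: n x n adjacency matrix of a single (homogeneous) graph.
    H: n x f node feature matrix; W: f x o weight matrix (learned in practice).
    """
    n = len(A)
    # Add self-loops so each node keeps its own features.
    A_hat = [[A[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in A_hat]
    # Symmetric normalization D^-1/2 A_hat D^-1/2.
    S = [[A_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    f, o = len(H[0]), len(W[0])
    # Aggregate neighbour features: S @ H.
    agg = [[sum(S[i][k] * H[k][c] for k in range(n)) for c in range(f)] for i in range(n)]
    # Linear transform followed by ReLU: max(0, agg @ W).
    return [[max(0.0, sum(agg[i][c] * W[c][m] for c in range(f))) for m in range(o)]
            for i in range(n)]


# Toy usage: two connected nodes with one scalar feature each.
out = gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]])
```

Stacking such layers lets each node's representation mix information from progressively larger neighbourhoods; a heterogeneous variant would additionally distinguish node and edge types.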
Affiliation(s)
- Chunyan Li
- School of Informatics, Xiamen University, Xiamen 361005, China
- Graduate School, Yunnan Minzu University, Kunming 650504, China
- Hongju Liu
- College of Information Technology and Computer Science, University of the Cordilleras, Baguio 2600, Philippines
- Qian Hu
- School of Informatics, Xiamen University, Xiamen 361005, China
- Jinlong Que
- School of Informatics, Xiamen University, Xiamen 361005, China
- Junfeng Yao
- School of Informatics, Xiamen University, Xiamen 361005, China
33
Luo P, Xiao Q, Wei PJ, Liao B, Wu FX. Identifying Disease-Gene Associations With Graph-Regularized Manifold Learning. Front Genet 2019; 10:270. [PMID: 31001321 PMCID: PMC6454152 DOI: 10.3389/fgene.2019.00270] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.7] [Received: 12/21/2018] [Accepted: 03/12/2019] [Indexed: 12/18/2022]
Abstract
Complex diseases are known to be associated with disease genes. Uncovering disease-gene associations is critical for the diagnosis, treatment, and prevention of diseases. Computational algorithms that effectively predict candidate disease-gene associations prior to experimental proof can greatly reduce the associated cost and time. Most existing methods are disease-specific: they can only predict genes associated with one disease at a time, and similarities among diseases are not used during prediction. Meanwhile, most methods predict new disease genes based on known associations, making them unable to predict disease genes for diseases without known associated genes. In this study, a manifold learning-based method is proposed for predicting disease-gene associations, assuming that the geodesic distance between any disease and its associated genes should be shorter than that of other, non-associated disease-gene pairs. The model maps diseases and genes into a lower-dimensional manifold based on the known disease-gene associations, disease similarity and gene similarity, and predicts new associations in terms of the geodesic distance between disease-gene pairs. In 3-fold cross-validation experiments, our method achieves area under the receiver operating characteristic (ROC) curve (AUC) scores of 0.882 and 0.854 for diseases with more than one known associated gene and diseases with only one known associated gene, respectively. Further de novo studies on lung cancer and bladder cancer also show that our model is capable of identifying new disease-gene associations.
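Several abstracts in this list report AUC values. As a reminder of what that number measures, here is a small sketch computing ROC AUC as the probability that a randomly chosen positive (associated) pair is scored above a randomly chosen negative pair, with ties counted as one half. The function name `roc_auc` and the toy data are hypothetical; the pairwise loop costs O(P*N) and is written for clarity, not scale.

```python
def roc_auc(labels, scores):
    """ROC AUC via the rank (Mann-Whitney) interpretation.

    labels: 1 for a known association, 0 for a non-association.
    scores: predicted scores, higher meaning more likely associated.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive correctly ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half a win
    return wins / (len(pos) * len(neg))


# Toy usage: one negative (0.8) outranks one positive (0.3).
auc = roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.3, 0.1])
```

In a k-fold setting such as the 3-fold cross-validation above, this value would be computed on each held-out fold (or on pooled held-out predictions) and then reported.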
Affiliation(s)
- Ping Luo
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Qianghua Xiao
- School of Mathematics and Physics, University of South China, Hengyang, China
- Pi-Jing Wei
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- College of Computer Science and Technology, Anhui University, Hefei, China
- Bo Liao
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
34
Luo P, Li Y, Tian LP, Wu FX. Enhancing the prediction of disease–gene associations with multimodal deep learning. Bioinformatics 2019; 35:3735-3742. [DOI: 10.1093/bioinformatics/btz155] [Citation(s) in RCA: 33] [Impact Index Per Article: 5.5] [Received: 10/29/2018] [Revised: 02/11/2019] [Accepted: 02/27/2019] [Indexed: 12/20/2022]
Abstract
Motivation
Computationally predicting disease genes helps scientists optimize the in-depth experimental validation and accelerates the identification of real disease-associated genes. Modern high-throughput technologies have generated a vast amount of omics data, and integrating them is expected to improve the accuracy of computational prediction. As an integrative model, multimodal deep belief net (DBN) can capture cross-modality features from heterogeneous datasets to model a complex system. Studies have shown its power in image classification and tumor subtype prediction. However, multimodal DBN has not been used in predicting disease–gene associations.
Results
In this study, we propose a method to predict disease–gene associations by multimodal DBN (dgMDL). Specifically, latent representations of protein-protein interaction networks and gene ontology terms are first learned by two DBNs independently. Then, a joint DBN is used to learn cross-modality representations from the two sub-models by taking the concatenation of their obtained latent representations as the multimodal input. Finally, disease–gene associations are predicted with the learned cross-modality representations. The proposed method is compared with two state-of-the-art algorithms in terms of 5-fold cross-validation on a set of curated disease–gene associations. dgMDL achieves an AUC of 0.969 which is superior to the competing algorithms. Further analysis of the top-10 unknown disease–gene pairs also demonstrates the ability of dgMDL in predicting new disease–gene associations.
Availability and implementation
Prediction results and a reference implementation of dgMDL in Python are available at https://github.com/luoping1004/dgMDL.
Supplementary information
Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Ping Luo
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Canada
- Yuanyuan Li
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Canada
- School of Mathematics and Physics, Wuhan Institute of Technology, Wuhan, China
- Li-Ping Tian
- School of Information, Beijing Wuzi University, Beijing, China
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, Canada
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, Canada
- Department of Computer Science, University of Saskatchewan, Saskatoon, Canada
35
Luo P, Ding Y, Lei X, Wu FX. deepDriver: Predicting Cancer Driver Genes Based on Somatic Mutations Using Deep Convolutional Neural Networks. Front Genet 2019; 10:13. [PMID: 30761181 PMCID: PMC6361806 DOI: 10.3389/fgene.2019.00013] [Citation(s) in RCA: 60] [Impact Index Per Article: 10.0] [Received: 08/02/2018] [Accepted: 01/11/2019] [Indexed: 12/16/2022]
Abstract
With advances in high-throughput technologies, millions of somatic mutations have been reported in the past decade. Identifying driver genes with oncogenic mutations from these data is a critical and challenging problem. Many computational methods have been proposed to predict driver genes. Among them, machine learning-based methods usually train a classifier with representations that concatenate various types of features extracted from different kinds of data. Although successful, simply concatenating different types of features may not be the best way to fuse these data. Noting that some types of data characterize the similarities between genes, this study proposes a deep learning-based method (deepDriver) that integrates them better with other data, and so improves the accuracy of driver gene prediction, by performing convolution on the mutation-based features of genes and their neighbors in the similarity networks. The method allows the convolutional neural network to learn information from mutation data and similarity networks simultaneously, which enhances the prediction of driver genes. deepDriver achieves AUC scores of 0.984 and 0.976 on breast cancer and colorectal cancer, respectively, which are superior to those of the competing algorithms. Further evaluations of the top 10 predictions also demonstrate that deepDriver is valuable for predicting new driver genes.
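A rough sketch of convolving over a gene together with its network neighbours, under stated assumptions: the neighbours are the top-k genes by similarity score, their mutation feature vectors are stacked as rows under the query gene, and one kernel spanning all rows slides along the feature axis (cross-correlation, as in most CNN libraries), followed by ReLU. deepDriver's actual input layout, kernel sizes and layers are not described in the abstract; `neighbor_matrix`, `conv_over_neighbors` and the toy data are hypothetical.

```python
def neighbor_matrix(gene, features, similarity, k):
    """Stack the gene's feature vector on top of those of its k most similar genes.

    features: dict gene -> list of mutation-based feature values.
    similarity: dict gene -> dict of other genes -> similarity score.
    """
    others = [g for g in similarity[gene] if g != gene]
    top = sorted(others, key=lambda g: similarity[gene][g], reverse=True)[:k]
    return [features[gene]] + [features[g] for g in top]


def conv_over_neighbors(matrix, kernel, bias=0.0):
    """Valid convolution (cross-correlation) along the feature axis with a
    kernel of shape (rows x width) that covers every row, then ReLU."""
    rows, cols = len(matrix), len(matrix[0])
    width = len(kernel[0])
    out = []
    for start in range(cols - width + 1):
        z = bias + sum(kernel[r][w] * matrix[r][start + w]
                       for r in range(rows) for w in range(width))
        out.append(max(0.0, z))
    return out


# Toy usage: gene G1 and its single most similar neighbour (G2).
feats = {"G1": [1.0, 0.0, 2.0], "G2": [0.0, 1.0, 1.0], "G3": [5.0, 5.0, 5.0]}
sims = {"G1": {"G2": 0.9, "G3": 0.1}}
fmap = conv_over_neighbors(neighbor_matrix("G1", feats, sims, k=1), [[1.0], [1.0]])
```

Because the kernel spans the gene and its neighbours at once, each output value mixes a gene's own mutation features with those of similar genes, which is the kind of joint use of mutation data and similarity networks the abstract describes.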
Affiliation(s)
- Ping Luo
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Yulian Ding
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, China
- Fang-Xiang Wu
- Division of Biomedical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Department of Mechanical Engineering, University of Saskatchewan, Saskatoon, SK, Canada
- Department of Computer Science, University of Saskatchewan, Saskatoon, SK, Canada
36
Luo P, Tian LP, Chen B, Xiao Q, Wu FX. Predicting Gene-Disease Associations with Manifold Learning. Bioinformatics Research and Applications 2018. [DOI: 10.1007/978-3-319-94968-0_26] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.6] [Indexed: 01/02/2023]