1
Fu L, Yao Z, Zhou Y, Peng Q, Lyu H. ACLNDA: an asymmetric graph contrastive learning framework for predicting noncoding RNA-disease associations in heterogeneous graphs. Brief Bioinform 2024; 25:bbae533. [PMID: 39441244 PMCID: PMC11497849 DOI: 10.1093/bib/bbae533] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/29/2024] [Revised: 08/27/2024] [Accepted: 10/08/2024] [Indexed: 10/25/2024] Open
Abstract
Noncoding RNAs (ncRNAs), including long noncoding RNAs (lncRNAs) and microRNAs (miRNAs), play crucial roles in gene expression regulation and are significant in disease associations and medical research. Accurate ncRNA-disease association prediction is essential for understanding disease mechanisms and developing treatments. Existing methods often focus on single tasks like lncRNA-disease associations (LDAs), miRNA-disease associations (MDAs), or lncRNA-miRNA interactions (LMIs), and fail to exploit heterogeneous graph characteristics. We propose ACLNDA, an asymmetric graph contrastive learning framework for analyzing heterophilic ncRNA-disease associations. It constructs inter-layer adjacency matrices from the original lncRNA, miRNA, and disease associations, and uses a Top-K approach to construct intra-layer similarity edges, forming a triple-layer heterogeneous graph. Unlike previous work, ACLNDA accounts for both node attribute features (ncRNA/disease) and node preference features (association): it employs an asymmetric yet simple graph contrastive learning framework that maximizes one-hop neighborhood context and two-hop similarity, extracting ncRNA-disease features without relying on graph augmentations or homophily assumptions, which reduces computational cost while preserving data integrity. The framework can be applied to LDA, MDA, and LMI prediction alike. Experimental results demonstrate superior performance over existing state-of-the-art baseline methods, showing its potential for providing insights into disease diagnosis and therapeutic target identification. The source code and data of ACLNDA are publicly available at https://github.com/AI4Bread/ACLNDA.
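The Top-K construction of intra-layer similarity edges mentioned in this abstract can be illustrated with a minimal sketch. It assumes a precomputed square similarity matrix per layer; the function name `top_k_similarity_edges`, the toy matrix, and the choice to treat edges as undirected are illustrative assumptions, not details taken from ACLNDA's code.

```python
from typing import List, Set, Tuple


def top_k_similarity_edges(sim: List[List[float]], k: int) -> Set[Tuple[int, int]]:
    """Keep, for every node, edges to its k most similar other nodes.

    `sim` is a square similarity matrix for one layer (e.g. lncRNA-lncRNA).
    Edges are returned as undirected pairs (i, j) with i < j (an assumption).
    """
    n = len(sim)
    edges: Set[Tuple[int, int]] = set()
    for i in range(n):
        # Rank all other nodes by similarity to node i (self-loops excluded).
        neighbours = sorted((j for j in range(n) if j != i),
                            key=lambda j: sim[i][j], reverse=True)
        for j in neighbours[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


# Toy lncRNA-lncRNA similarity matrix (values are made up for illustration).
lnc_sim = [
    [1.0, 0.9, 0.1, 0.2],
    [0.9, 1.0, 0.3, 0.1],
    [0.1, 0.3, 1.0, 0.8],
    [0.2, 0.1, 0.8, 1.0],
]
lnc_edges = top_k_similarity_edges(lnc_sim, k=1)
print(sorted(lnc_edges))  # [(0, 1), (2, 3)]
```

Running the same routine on each of the three similarity matrices (lncRNA, miRNA, disease) would give the intra-layer edges, while the known associations supply the inter-layer edges of the triple-layer graph.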
Affiliation(s)
- Laiyi Fu
  - School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China
  - Research Institute, Xi’an Jiaotong University, Zhejiang, Hangzhou, Zhejiang 311200, China
  - Sichuan Digital Economy Industry Development Research Institute, Chengdu, Sichuan 610036, China
- ZhiYuan Yao
  - School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China
- Yangyi Zhou
  - School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China
- Qinke Peng
  - School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China
- Hongqiang Lyu
  - School of Automation Science and Engineering, Xi’an Jiaotong University, Xi’an, Shaanxi 710049, China
2
Diao B, Luo J, Guo Y. A comprehensive survey on deep learning-based identification and predicting the interaction mechanism of long non-coding RNAs. Brief Funct Genomics 2024; 23:314-324. [PMID: 38576205 DOI: 10.1093/bfgp/elae010] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2023] [Revised: 02/25/2024] [Accepted: 03/14/2024] [Indexed: 04/06/2024] Open
Abstract
Long noncoding RNAs (lncRNAs) have been discovered to be extensively involved in eukaryotic epigenetic, transcriptional, and post-transcriptional regulatory processes with the advancements in sequencing technology and genomics research. Therefore, they play crucial roles in the body's normal physiology and various disease outcomes. Presently, numerous unknown lncRNA sequencing data require exploration. Establishing deep learning-based prediction models for lncRNAs provides valuable insights for researchers, substantially reducing time and costs associated with trial and error and facilitating the disease-relevant lncRNA identification for prognosis analysis and targeted drug development as the era of artificial intelligence progresses. However, most lncRNA-related researchers lack awareness of the latest advancements in deep learning models and model selection and application in functional research on lncRNAs. Thus, we elucidate the concept of deep learning models, explore several prevalent deep learning algorithms and their data preferences, conduct a comprehensive review of recent literature studies with exemplary predictive performance over the past 5 years in conjunction with diverse prediction functions, critically analyze and discuss the merits and limitations of current deep learning models and solutions, while also proposing prospects based on cutting-edge advancements in lncRNA research.
Affiliation(s)
- Biyu Diao
  - Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
- Jin Luo
  - Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
- Yu Guo
  - Department of Breast Surgery, The First Affiliated Hospital of Ningbo University, No. 59, Liuting Street, Haishu District, Ningbo 315000, China
3
Zhou L, Peng X, Zeng L, Peng L. Finding potential lncRNA-disease associations using a boosting-based ensemble learning model. Front Genet 2024; 15:1356205. [PMID: 38495672 PMCID: PMC10940470 DOI: 10.3389/fgene.2024.1356205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/18/2023] [Accepted: 02/01/2024] [Indexed: 03/19/2024] Open
Abstract
Introduction: Long non-coding RNAs (lncRNAs) have been used clinically as potential prognostic biomarkers for various types of cancer. Identifying associations between lncRNAs and diseases helps capture potential biomarkers and design efficient therapeutic options. Wet-lab experiments for identifying these associations are costly and laborious. Methods: We developed LDA-SABC, a novel boosting-based framework for lncRNA-disease association (LDA) prediction. LDA-SABC extracts LDA features based on singular value decomposition (SVD) and classifies lncRNA-disease pairs (LDPs) by incorporating LightGBM and AdaBoost into a convolutional neural network. Results: The performance of LDA-SABC was evaluated under five-fold cross-validation (CV) on lncRNAs, diseases, and LDPs. It clearly outperformed four other classical LDA inference methods (SDLDA, LDNFSGB, LDASR, and IPCAF) in terms of precision, recall, accuracy, F1 score, AUC, and AUPR. Given this prediction performance, we used LDA-SABC to find potential lncRNA biomarkers for lung cancer. The results suggested that 7SK and HULC could be related to non-small-cell lung cancer (NSCLC) and lung adenocarcinoma (LUAD), respectively. Conclusion: We hope that the proposed LDA-SABC method can help improve LDA identification.
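The three cross-validation settings named in this abstract (on lncRNAs, on diseases, and on LDPs) differ in what is held out. The sketch below shows one common reading: pair-level CV holds out individual pairs, while entity-level CV holds out every pair involving a held-out lncRNA or disease. The abstract does not say how the folds were built, so the round-robin split and the helper names `k_folds`, `cv_on_pairs`, and `cv_on_entities` are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # (lncRNA index, disease index)


def k_folds(items: Sequence, k: int = 5) -> List[list]:
    """Split items into k folds round-robin (deterministic, no shuffling)."""
    return [list(items[i::k]) for i in range(k)]


def cv_on_pairs(pairs: Sequence[Pair], k: int = 5):
    """Hold out one fold of lncRNA-disease pairs at a time."""
    for test in k_folds(pairs, k):
        held = set(test)
        yield [p for p in pairs if p not in held], test


def cv_on_entities(pairs: Sequence[Pair], n_entities: int, axis: int, k: int = 5):
    """Hold out every pair involving a held-out lncRNA (axis=0) or disease (axis=1)."""
    for held in k_folds(range(n_entities), k):
        held_set = set(held)
        train = [p for p in pairs if p[axis] not in held_set]
        test = [p for p in pairs if p[axis] in held_set]
        yield train, test


# Toy data: 10 lncRNAs x 5 diseases, every pair is a candidate LDP.
pairs = [(l, d) for l in range(10) for d in range(5)]
for train, test in cv_on_entities(pairs, n_entities=10, axis=0):
    # No lncRNA in the test fold was seen during training.
    assert not {l for l, _ in train} & {l for l, _ in test}
```

Entity-level splits are the stricter test, since the model must score pairs for an lncRNA or disease it has never seen in training.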
Affiliation(s)
- Liqian Zhou
  - School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan, China
- Xinhuai Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan, China
- Lijun Zeng
  - School of Computer Science, Hunan Institute of Technology, Hengyang, China
- Lihong Peng
  - School of Computer Science, Hunan University of Technology, Zhuzhou, Hunan, China
4
Ding X, Liang W, Xia H, Liu Y, Liu S, Xia X, Zhu X, Pei Y, Zhang D. Analysis of Immune and Prognostic-Related lncRNA PRKCQ-AS1 for Predicting Prognosis and Regulating Effect in Sepsis. J Inflamm Res 2024; 17:279-299. [PMID: 38229689 PMCID: PMC10790647 DOI: 10.2147/jir.s433057] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/28/2023] [Accepted: 12/07/2023] [Indexed: 01/18/2024] Open
Abstract
Background: Sepsis is a systemic inflammatory response syndrome caused by infection, with high mortality and severe harm. lncRNAs are potential prognostic markers and therapeutic targets. We therefore aimed to screen and analyze lncRNAs with prognostic potential in sepsis. Methods: Transcriptome sequencing and limma were used to screen dysregulated RNAs. Key RNAs were screened by correlation analysis, lncRNA-mRNA co-expression and weighted gene co-expression network analysis. Immune infiltration, gene set enrichment analysis and gene set variation analysis were used to analyze the immune correlation. Kaplan-Meier curves, receiver operating characteristic curves, Cox regression analysis and a nomogram were used to analyze the correlation between key RNAs and prognosis. A sepsis model was established by lipopolysaccharide-induced HUVEC injury, and cell viability and migration ability were then measured by cell counting kit-8 and wound healing assays. The levels of apoptosis-related proteins and inflammatory cytokines were detected by RT-qPCR and Western blot. Reactive oxygen species and superoxide dismutase were detected by commercial kits. Results: Fourteen key differentially expressed lncRNAs and 663 key differentially expressed genes were obtained. These lncRNAs were closely related to immune cells, especially T cell activation, immune response and inflammation. Subsequently, lncRNA PRKCQ-AS1 was identified as the regulator for further investigation in sepsis. RT-qPCR results showed that PRKCQ-AS1 expression was up-regulated in clinical samples and sepsis model cells, and that it was an independent prognostic factor in sepsis patients. Immune correlation analysis showed that PRKCQ-AS1 was involved in the immune response and inflammatory process of sepsis. Cell function tests confirmed that PRKCQ-AS1 could inhibit the viability of sepsis model cells and promote cell apoptosis, inflammatory damage and oxidative stress. Conclusion: We constructed immune-related lncRNA-mRNA regulatory networks in the progression of sepsis and confirmed that PRKCQ-AS1 is an important prognostic factor affecting the progression of sepsis and is involved in immune response.
Affiliation(s)
- Xian Ding
  - Department of Emergency, Third Affiliated Hospital of Naval Medical University, Shanghai, People’s Republic of China
- Wenqi Liang
  - Department of Emergency, Shanghai Changhai Hospital, Naval Medical University, Shanghai, People’s Republic of China
- Hongjuan Xia
  - Department of Emergency, Third Affiliated Hospital of Naval Medical University, Shanghai, People’s Republic of China
- Yuee Liu
  - Department of Emergency, Shanghai Changhai Hospital, Naval Medical University, Shanghai, People’s Republic of China
- Shuxiong Liu
  - Department of Emergency, Third Affiliated Hospital of Naval Medical University, Shanghai, People’s Republic of China
- Xinyu Xia
  - Department of Emergency, Third Affiliated Hospital of Naval Medical University, Shanghai, People’s Republic of China
- Xiaoli Zhu
  - Department of Emergency, Third Affiliated Hospital of Naval Medical University, Shanghai, People’s Republic of China
- Yongyan Pei
  - School of Chemistry and Chemical Engineering, Guangdong Pharmaceutical University, Zhongshan, People’s Republic of China
- Dewen Zhang
  - Longhua Clinical Medical College, Shanghai University of Traditional Chinese Medicine, Shanghai, People’s Republic of China
5
Yao D, Zhang B, Li X, Zhan X, Zhan X, Zhang B. Applying negative sample denoising and multi-view feature for lncRNA-disease association prediction. Front Genet 2024; 14:1332273. [PMID: 38264213 PMCID: PMC10803626 DOI: 10.3389/fgene.2023.1332273] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/02/2023] [Accepted: 12/22/2023] [Indexed: 01/25/2024] Open
Abstract
Increasing evidence indicates that mutations and dysregulation of long non-coding RNA (lncRNA) play a crucial role in the pathogenesis and prognosis of complex human diseases. Computational methods for predicting the association between lncRNAs and diseases have gained increasing attention. However, these methods face two key challenges: obtaining reliable negative samples and incorporating lncRNA-disease association (LDA) information from multiple perspectives. This paper proposes a method called NDMLDA, which combines multi-view feature extraction, unsupervised negative sample denoising, and a stacking ensemble classifier. First, an unsupervised method (K-means) is used to design a negative sample denoising module that alleviates sample imbalance and the impact of potential noise in the negative samples on model performance. Second, graph attention networks are employed to extract multi-view features of both lncRNAs and diseases, thereby enhancing the learning of association information between them. Finally, lncRNA-disease association prediction is implemented through a stacking ensemble classifier. Existing research datasets are integrated to evaluate performance, and 5-fold cross-validation is conducted on the integrated dataset. Experimental results demonstrate that NDMLDA achieves an AUC of 0.9907 and an AUPR of 0.9927, with a 5-fold cross-validation variance of less than 0.1%. These results outperform the baseline methods. Additionally, case studies further illustrate the model's potential in cancer diagnosis and precision medicine implementation.
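The abstract names K-means as the basis of the negative sample denoising module but does not give the selection rule. The following is a minimal sketch of one plausible rule, stated as an assumption: cluster the unlabeled pair features and keep, as reliable negatives, the cluster whose centroid lies farthest from the centroid of the known positives. The functions `kmeans` and `reliable_negatives`, the deterministic seeding, and the toy feature vectors are all hypothetical and are not taken from NDMLDA.

```python
import math
from typing import List, Sequence

Vec = Sequence[float]


def dist(a: Vec, b: Vec) -> float:
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points: List[Vec]) -> List[float]:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def kmeans(points: List[Vec], k: int, iters: int = 20) -> List[int]:
    """Plain Lloyd's k-means; the first k points seed the centroids (deterministic)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster empties
                centroids[c] = mean(members)
    return labels


def reliable_negatives(positives: List[Vec], unlabeled: List[Vec], k: int = 2) -> List[Vec]:
    """Return unlabeled samples in the cluster farthest from the positive centroid."""
    labels = kmeans(unlabeled, k)
    pos_centre = mean(positives)

    def cluster_dist(c: int) -> float:
        members = [p for p, lab in zip(unlabeled, labels) if lab == c]
        return dist(mean(members), pos_centre) if members else -1.0

    far = max(range(k), key=cluster_dist)
    return [p for p, lab in zip(unlabeled, labels) if lab == far]


# Toy 2-D pair features (made up): two unlabeled points resemble the positives.
positives = [(1.0, 1.0), (1.2, 0.9)]
unlabeled = [(1.1, 1.0), (5.0, 5.0), (0.9, 1.1), (5.2, 4.8)]
print(reliable_negatives(positives, unlabeled))  # [(5.0, 5.0), (5.2, 4.8)]
```

The unlabeled points that sit close to the positives are dropped from the negative set, which is the sense in which such a step reduces label noise and narrows the class imbalance.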
Affiliation(s)
- Dengju Yao
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Bo Zhang
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Xiangkui Li
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
- Xiaojuan Zhan
  - College of Computer Science and Technology, Heilongjiang Institute of Technology, Harbin, China
- Xiaorong Zhan
  - Department of Endocrinology and Metabolism, Hospital of South University of Science and Technology, Shenzhen, China
- Binbin Zhang
  - School of Computer Science and Technology, Harbin University of Science and Technology, Harbin, China
6
Wei J, Lu L, Shen T. Predicting drug-protein interactions by preserving the graph information of multi source data. BMC Bioinformatics 2024; 25:10. [PMID: 38177981 PMCID: PMC10768380 DOI: 10.1186/s12859-023-05620-6] [Citation(s) in RCA: 1] [Impact Index Per Article: 1.0] [Received: 10/13/2023] [Accepted: 12/15/2023] [Indexed: 01/06/2024] Open
Abstract
Examining potential drug-target interactions (DTIs) is a pivotal component of drug discovery and repurposing. Recently, there has been a significant rise in the use of computational techniques to predict DTIs. Nevertheless, previous investigations have predominantly concentrated on assessing either the connections between nodes or the consistency of the network's topological structure in isolation. Such one-sided approaches could severely hinder the accuracy of DTI predictions. In this study, we propose a novel method called TTGCN, which combines heterogeneous graph convolutional neural networks (GCN) and graph attention networks (GAT) to address the task of DTI prediction. TTGCN employs a two-tiered feature learning strategy, utilizing GAT and residual GCN (R-GCN) to extract drug and target embeddings from the diverse network, respectively. These drug and target embeddings are then fused through a mean-pooling layer. Finally, we employ an inductive matrix completion technique to forecast DTIs while preserving the network's node connectivity and topological structure. Our approach demonstrates superior performance in terms of area under the curve and area under the precision-recall curve in experimental comparisons, highlighting its significant advantages in predicting DTIs. Furthermore, case studies provide additional evidence of its ability to identify potential DTIs.
Affiliation(s)
- Jiahao Wei
  - School of Mathematical Sciences, Guizhou Normal University, Guiyang, 550025, China
- Linzhang Lu
  - School of Mathematical Sciences, Guizhou Normal University, Guiyang, 550025, China.
  - School of Mathematical Sciences, Xiamen University, Xiamen, 361005, China.
- Tie Shen
  - Key Laboratory of Information and Computing Science Guizhou Province, Guizhou Normal University, Guizhou, 550001, China.
7
Lu Z, Zhong H, Tang L, Luo J, Zhou W, Liu L. Predicting lncRNA-disease associations based on heterogeneous graph convolutional generative adversarial network. PLoS Comput Biol 2023; 19:e1011634. [PMID: 38019786 PMCID: PMC10686445 DOI: 10.1371/journal.pcbi.1011634] [Citation(s) in RCA: 7] [Impact Index Per Article: 3.5] [Received: 04/18/2023] [Accepted: 10/25/2023] [Indexed: 12/01/2023] Open
Abstract
There is a growing body of evidence indicating the crucial roles that long non-coding RNAs (lncRNAs) play in the development and progression of various diseases, including cancers, cardiovascular diseases, and neurological disorders. However, accurately predicting potential lncRNA-disease associations remains a challenge, as existing methods have limitations in extracting heterogeneous association information and handling sparse and unbalanced data. To address these issues, we propose a novel computational method, called HGC-GAN, which combines heterogeneous graph convolutional neural networks (GCN) and generative adversarial networks (GAN) to predict potential lncRNA-disease associations. Specifically, we construct a lncRNA-miRNA-disease heterogeneous network by integrating multiple association data and sequence information. The GCN-based generator is then employed to aggregate neighbor information of nodes and obtain node embeddings, which are used to predict lncRNA-disease associations. Meanwhile, the GAN-based discriminator is trained to distinguish between real and fake lncRNA-disease associations generated by the generator, enabling the generator to improve its ability to generate accurate lncRNA-disease associations gradually. Our experimental results demonstrate that HGC-GAN performs better in predicting potential lncRNA-disease associations, with AUC and AUPR values of 0.9591 and 0.9606, respectively, under 10-fold cross-validation. Moreover, our case study further confirms the effectiveness of HGC-GAN in predicting potential lncRNA-disease associations, even for novel lncRNAs without any known lncRNA-disease associations. Overall, our proposed method HGC-GAN provides a promising approach to predict potential lncRNA-disease associations and may have important implications for disease diagnosis, treatment, and drug development.
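For readers unfamiliar with how a GCN "aggregates neighbor information of nodes", the widely used layer-wise propagation rule is shown below. It is given as general background; whether HGC-GAN uses exactly this normalization is not stated in the abstract, so treat it as an assumption. Here $A$ is the adjacency matrix of the heterogeneous network, $H^{(l)}$ the node embeddings at layer $l$, $W^{(l)}$ a learnable weight matrix, and $\sigma$ a nonlinearity.

```latex
H^{(l+1)} = \sigma\!\left( \tilde{D}^{-1/2}\, \tilde{A}\, \tilde{D}^{-1/2}\, H^{(l)}\, W^{(l)} \right),
\qquad \tilde{A} = A + I,
\qquad \tilde{D}_{ii} = \sum_{j} \tilde{A}_{ij}
```

Adding the identity keeps each node's own features in the average, and the symmetric degree normalization stops high-degree nodes from dominating the aggregated embedding.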
Affiliation(s)
- Zhonghao Lu
  - School of Information, Yunnan Normal University, Yunnan, People’s Republic of China
- Hua Zhong
  - School of Information, Yunnan Normal University, Yunnan, People’s Republic of China
- Lin Tang
  - Key Laboratory of Educational Information for Nationalities Ministry of Education, Yunnan Normal University, Yunnan, People’s Republic of China
- Jing Luo
  - State Key Laboratory for Conservation and Utilization of Bio-resource in Yunnan, School of Life Sciences and School of Ecology and Environment, Yunnan University, Kunming, People’s Republic of China
- Wei Zhou
  - School of Software, Yunnan University, Kunming, People’s Republic of China
- Lin Liu
  - School of Information, Yunnan Normal University, Yunnan, People’s Republic of China
8
Zhang J, Lang M, Zhou Y, Zhang Y. Predicting RNA structures and functions by artificial intelligence. Trends Genet 2023; 40:S0168-9525(23)00229-9. [PMID: 39492264 DOI: 10.1016/j.tig.2023.10.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/25/2023] [Revised: 08/22/2023] [Accepted: 10/03/2023] [Indexed: 11/05/2024]
Abstract
RNA functions by interacting with its intended targets structurally. However, due to the dynamic nature of RNA molecules, RNA structures are difficult to determine experimentally or predict computationally. Artificial intelligence (AI) has revolutionized many biomedical fields and has been progressively utilized to deduce RNA structures, target binding, and associated functionality. Integrating structural and target binding information could also help improve the robustness of AI-based RNA function prediction and RNA design. Given the rapid development of deep learning (DL) algorithms, AI will provide an unprecedented opportunity to elucidate the sequence-structure-function relation of RNAs.
Affiliation(s)
- Jun Zhang
  - National Engineering Laboratory for Big Data System Computing Technology, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, 518060, China
- Mei Lang
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, Guangdong, 518106, China
- Yaoqi Zhou
  - Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, Guangdong, 518106, China.
- Yang Zhang
  - School of Science, Harbin Institute of Technology, Shenzhen, Guangdong, 518055, China.
9
Li F, Guo X, Bi Y, Jia R, Pitt ME, Pan S, Li S, Gasser RB, Coin LJ, Song J. Digerati - A multipath parallel hybrid deep learning framework for the identification of mycobacterial PE/PPE proteins. Comput Biol Med 2023; 163:107155. [PMID: 37356289 DOI: 10.1016/j.compbiomed.2023.107155] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 03/21/2023] [Revised: 06/05/2023] [Accepted: 06/07/2023] [Indexed: 06/27/2023]
Abstract
The genome of Mycobacterium tuberculosis contains a relatively high percentage (10%) of genes that are poorly characterised because of their highly repetitive nature and high GC content. Some of these genes encode proteins of the PE/PPE family, which are thought to be involved in host-pathogen interactions, virulence, and disease pathogenicity. Members of this family are genetically divergent and challenging to both identify and classify using conventional computational tools. Thus, advanced in silico methods are needed to efficiently identify proteins of this family for subsequent functional annotation. In this study, we developed the first deep learning-based approach, termed Digerati, for the rapid and accurate identification of PE and PPE family proteins. Digerati was built upon a multipath parallel hybrid deep learning framework, which combines multi-layer convolutional neural networks with bidirectional long short-term memory and a self-attention module to effectively learn the higher-order feature representations of PE/PPE proteins. Empirical studies demonstrated that Digerati achieved a significantly better performance (∼18-20%) than alignment-based approaches, including BLASTP, PHMMER, and HHsuite, in both prediction accuracy and speed. Digerati is anticipated to facilitate community-wide efforts to conduct high-throughput identification and analysis of PE/PPE family members. The webserver and source codes of Digerati are publicly available at http://web.unimelb-bioinfortools.cloud.edu.au/Digerati/.
Affiliation(s)
- Fuyi Li
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria, 3000, Australia.
- Xudong Guo
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Yue Bi
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, 3800, Australia
- Runchang Jia
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Miranda E Pitt
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria, 3000, Australia
- Shirui Pan
  - School of Information and Communication Technology, Griffith University, QLD, 4222, Australia
- Shuqin Li
  - College of Information Engineering, Northwest A&F University, Yangling, 712100, China
- Robin B Gasser
  - Melbourne Veterinary School, Faculty of Science, The University of Melbourne, VIC, 3010, Australia
- Lachlan Jm Coin
  - Department of Microbiology and Immunology, The Peter Doherty Institute for Infection and Immunity, The University of Melbourne, 792 Elizabeth Street, Melbourne, Victoria, 3000, Australia.
- Jiangning Song
  - Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Victoria, 3800, Australia.
10
Kim Y, Lee M. Deep Learning Approaches for lncRNA-Mediated Mechanisms: A Comprehensive Review of Recent Developments. Int J Mol Sci 2023; 24:10299. [PMID: 37373445 DOI: 10.3390/ijms241210299] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/24/2023] [Revised: 06/16/2023] [Accepted: 06/17/2023] [Indexed: 06/29/2023] Open
Abstract
This review paper provides an extensive analysis of the rapidly evolving convergence of deep learning and long non-coding RNAs (lncRNAs). Considering the recent advancements in deep learning and the increasing recognition of lncRNAs as crucial components in various biological processes, this review aims to offer a comprehensive examination of these intertwined research areas. The remarkable progress in deep learning necessitates thoroughly exploring its latest applications in the study of lncRNAs. Therefore, this review provides insights into the growing significance of incorporating deep learning methodologies to unravel the intricate roles of lncRNAs. By scrutinizing the most recent research spanning from 2021 to 2023, this paper provides a comprehensive understanding of how deep learning techniques are employed in investigating lncRNAs, thereby contributing valuable insights to this rapidly evolving field. The review is aimed at researchers and practitioners looking to integrate deep learning advancements into their lncRNA studies.
Affiliation(s)
- Yoojoong Kim
  - School of Computer Science and Information Engineering, The Catholic University of Korea, Bucheon 14662, Republic of Korea
- Minhyeok Lee
  - School of Electrical and Electronics Engineering, Chung-Ang University, Seoul 06974, Republic of Korea
11
Xuan P, Zhao Y, Cui H, Zhan L, Jin Q, Zhang T, Nakaguchi T. Semantic Meta-Path Enhanced Global and Local Topology Learning for lncRNA-Disease Association Prediction. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1480-1491. [PMID: 36173783 DOI: 10.1109/tcbb.2022.3209571] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Indexed: 05/04/2023]
Abstract
Since abnormal expression of long non-coding RNAs (lncRNAs) is associated with various human diseases, identifying disease-related lncRNAs helps reveal the pathogenesis of diseases. Existing methods for lncRNA-disease association prediction mainly focus on multi-sourced data related to lncRNAs and diseases. The rich semantic information of meta-paths, composed of multiple kinds of connections between lncRNA and disease nodes, is neglected. We propose a new prediction method, MGLDA, to encode and integrate the semantics of multiple meta-paths, the global topology of heterogeneous graph, and pairwise attributes of lncRNA and disease nodes. First, a tri-layer heterogeneous graph is constructed to associate multi-sourced data across the lncRNA, disease, and miRNA nodes. Afterwards, we establish multiple meta-paths connecting the lncRNA and disease nodes to derive and denote various semantics. Each meta-path contains its specific semantics formulated by an embedding strategy, and each embedding covers local topology formed by the diverse semantic connections among the lncRNA, disease, and miRNA nodes. We construct multiple graph convolutional autoencoders (GCA) with topology-level attention to learn global and multiple local topologies from the tri-layer graph and each meta-path, respectively. The topology-level attention mechanism can learn the importance of various global and local topologies for adaptive pairwise topology fusion. Finally, a convolutional autoencoder learns the attribute representations of lncRNA-disease pairs, which integrates the learnt detailed and representative pairwise features. Experimental results show that MGLDA outperforms other state-of-the-art prediction methods in comparison and retrieves more real lncRNA-disease associations in the top-ranked candidates. The ablation study also demonstrates the important contributions of the local and global topology learning, and pairwise attribute learning. 
Case studies on three diseases further demonstrate MGLDA's ability to identify potential disease-related lncRNAs.
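A meta-path in the tri-layer graph is a typed route between an lncRNA and a disease, for example lncRNA → miRNA → disease. The sketch below counts instances of that single meta-path for every lncRNA-disease pair. It is meant only to make the term concrete: the node names are placeholders, the function `metapath_counts` is hypothetical, and MGLDA itself encodes meta-paths with learned embeddings rather than raw counts.

```python
from typing import Dict, Set, Tuple


def metapath_counts(lnc_mi: Dict[str, Set[str]],
                    mi_dis: Dict[str, Set[str]]) -> Dict[Tuple[str, str], int]:
    """Count lncRNA -> miRNA -> disease path instances for each (lncRNA, disease) pair."""
    counts: Dict[Tuple[str, str], int] = {}
    for lnc, mirnas in lnc_mi.items():
        for mi in mirnas:
            # Every disease linked to this miRNA closes one path instance.
            for dis in mi_dis.get(mi, ()):
                counts[(lnc, dis)] = counts.get((lnc, dis), 0) + 1
    return counts


# Placeholder nodes and edges, made up for illustration.
lnc_mi = {"lncA": {"miR_x", "miR_y"}, "lncB": {"miR_y"}}
mi_dis = {"miR_x": {"disease_1"}, "miR_y": {"disease_1", "disease_2"}}

counts = metapath_counts(lnc_mi, mi_dis)
print(counts[("lncA", "disease_1")])  # 2: via miR_x and via miR_y
```

Pairs connected through many shared miRNAs get higher counts, which is one simple way such a path carries association evidence that a direct lncRNA-disease edge list would miss.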
12
Constructing discriminative feature space for LncRNA-protein interaction based on deep autoencoder and marginal fisher analysis. Comput Biol Med 2023; 157:106711. [PMID: 36924738 DOI: 10.1016/j.compbiomed.2023.106711] [Citation(s) in RCA: 5] [Impact Index Per Article: 2.5] [Received: 12/16/2022] [Revised: 01/26/2023] [Accepted: 02/26/2023] [Indexed: 03/04/2023]
Abstract
Long non-coding RNAs (lncRNAs) play important roles by regulating proteins in many biological processes and life activities. To uncover the molecular mechanisms of lncRNAs, it is necessary to identify their interactions with proteins. Recently, some machine learning methods were proposed to detect lncRNA-protein interactions according to the distribution of known interactions. The performance of these methods depends largely on (1) how exactly the distribution of known interactions is characterized by the feature space, and (2) how discriminative the feature space is for distinguishing lncRNA-protein interactions. Because the known interactions may be diverse and complex, it remains a challenge to construct a discriminative feature space for lncRNA-protein interactions. To resolve this problem, a novel method named DFRPI was developed based on a deep autoencoder and marginal Fisher analysis. First, initial features of lncRNA-protein interactions were extracted from the primary sequences and secondary structures of lncRNAs and proteins. Second, a deep autoencoder was used to learn encoding parameters of the initial features that describe the known interactions precisely. Next, marginal Fisher analysis was employed to optimize the encoding parameters so that the features characterize a discriminative feature space of lncRNA-protein interactions. Finally, a random forest-based predictor was trained on the discriminative feature space to detect lncRNA-protein interactions. In a series of experiments, our predictor achieved a precision of 0.920, recall of 0.916, accuracy of 0.918, MCC of 0.836, specificity of 0.920, sensitivity of 0.916 and AUC of 0.906, outperforming the compared methods for predicting lncRNA-protein interactions. These results suggest that the proposed method can generate a reasonable and effective feature space for distinguishing lncRNA-protein interactions accurately. The code and data are available on https://github.com/D0ub1e-D/DFRPI.
|
13
|
Shirvaliloo M. LncRNA H19 promotes tumor angiogenesis in smokers by targeting anti-angiogenic miRNAs. Epigenomics 2023; 15:61-73. [PMID: 36802727 DOI: 10.2217/epi-2022-0145] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Indexed: 02/22/2023] Open
Abstract
A key concept in drug discovery is the identification of candidate therapeutic targets such as long noncoding RNAs (lncRNAs), because of their extensive involvement in neoplasms and their susceptibility to smoking. Induced by exposure to cigarette smoke, lncRNA H19 targets and inactivates miR-29, miR-30a, miR-107, miR-140, miR-148b, miR-199a and miR-200, which control the rate of angiogenesis by inhibiting BiP, DLL4, FGF7, HIF1A, HIF1B, HIF2A, PDGFB, PDGFRA, VEGFA, VEGFB, VEGFC, VEGFR1, VEGFR2 and VEGFR3. Nevertheless, these miRNAs are often dysregulated in bladder cancer, breast cancer, colorectal cancer, glioma, gastric adenocarcinoma, hepatocellular carcinoma, meningioma, non-small-cell lung carcinoma, oral squamous cell carcinoma, ovarian cancer, prostate adenocarcinoma and renal cell carcinoma. As such, the present perspective article seeks to establish an evidence-based hypothetical model of how a smoking-related lncRNA known as H19 might aggravate angiogenesis by interfering with miRNAs that would otherwise regulate angiogenesis in a nonsmoking individual.
Affiliation(s)
- Milad Shirvaliloo
- Infectious & Tropical Diseases Research Center, Tabriz University of Medical Sciences, Tabriz, 15731, Iran
- Future Science Group, Unitec House, 2 Albert Place, London, N3 1QB, UK
|
14
|
Sheng N, Huang L, Lu Y, Wang H, Yang L, Gao L, Xie X, Fu Y, Wang Y. Data resources and computational methods for lncRNA-disease association prediction. Comput Biol Med 2023; 153:106527. [PMID: 36610216 DOI: 10.1016/j.compbiomed.2022.106527] [Citation(s) in RCA: 14] [Impact Index Per Article: 7.0] [Received: 09/20/2022] [Revised: 12/08/2022] [Accepted: 12/31/2022] [Indexed: 01/03/2023]
Abstract
There is increasing interest in deciphering potential disease pathogenesis through lncRNA-disease association (LDA) prediction, given the diverse functional roles of lncRNAs in genome regulation. Meanwhile, computational models and algorithms benefit systematic biology research and can even facilitate classical biological experimental procedures. In this review, we introduce representative diseases associated with lncRNAs, such as cancers, cardiovascular diseases, and neurological diseases. Current publicly available resources related to lncRNAs and diseases are also included. Furthermore, 64 computational methods for LDA prediction are divided into 5 groups: machine learning-based methods, network propagation-based methods, matrix factorization- and completion-based methods, deep learning-based methods, and graph neural network-based methods. The common evaluation methods and metrics in LDA prediction are also discussed, followed by the challenges and future trends in LDA prediction. Recent advances in LDA prediction approaches are summarized in the GitHub repository at https://github.com/sheng-n/lncRNA-disease-methods.
Affiliation(s)
- Nan Sheng
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Lan Huang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Yuting Lu
- School of Artificial Intelligence, Jilin University, Changchun, China
- Hao Wang
- Department of Hepatopancreatobiliary Surgery, Second Affiliated Hospital of Harbin Medical University, Harbin, China
- Lili Yang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China; Department of Obstetrics, The First Hospital of Jilin University, Changchun, China
- Ling Gao
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Xuping Xie
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China
- Yuan Fu
- Institute of Biological, Environmental and Rural Sciences, Aberystwyth University, Aberystwyth, Ceredigion, United Kingdom
- Yan Wang
- Key Laboratory of Symbol Computation and Knowledge Engineering of Ministry of Education, College of Computer Science and Technology, Jilin University, Changchun, China; School of Artificial Intelligence, Jilin University, Changchun, China
|
15
|
Zhang Z, Xu J, Wu Y, Liu N, Wang Y, Liang Y. CapsNet-LDA: predicting lncRNA-disease associations using attention mechanism and capsule network based on multi-view data. Brief Bioinform 2023; 24:6889447. [PMID: 36511221 DOI: 10.1093/bib/bbac531] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/24/2022] [Revised: 10/25/2022] [Accepted: 11/07/2022] [Indexed: 12/15/2022] Open
Abstract
Cumulative studies have shown that many long non-coding RNAs (lncRNAs) are crucial in a number of diseases. Predicting potential lncRNA-disease associations (LDAs) can facilitate disease prevention, diagnosis and treatment. Therefore, it is vital to develop practical computational methods for LDA prediction. In this study, we propose a novel predictor named capsule network (CapsNet)-LDA for LDA prediction. CapsNet-LDA first uses a stacked autoencoder to acquire informative low-dimensional representations of the lncRNA-disease pairs under multiple views; an attention mechanism then adaptively assigns importance weights to these representations, which are subsequently processed by a CapsNet-based architecture to predict LDAs. Unlike conventional convolutional neural networks (CNNs), which are restricted by their use of scalar neurons and pooling operations, CapsNets use vector neurons, which are more robust to complex combinations of features, and update parameters through dynamic routing. CapsNet-LDA is superior to five other state-of-the-art models on four benchmark datasets, four perturbed datasets and an independent test set in the comparison experiments, demonstrating that CapsNet-LDA has excellent performance and robustness against perturbation, as well as good generalization ability. The ablation studies verify the effectiveness of some modules of CapsNet-LDA. Moreover, the ability of multi-view data to improve performance is demonstrated. Case studies further indicate that CapsNet-LDA can accurately predict novel LDAs for specific diseases.
Affiliation(s)
- Zequn Zhang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
- Junlin Xu
- College of Information Science and Engineering, Hunan University, Changsha 410082, Hunan, China
- Yanan Wu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
- Niannian Liu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
- Yinglong Wang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
- Ying Liang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
|
16
|
Liang Q, Zhang W, Wu H, Liu B. LncRNA-disease association identification using graph auto-encoder and learning to rank. Brief Bioinform 2023; 24:6955271. [PMID: 36545805 DOI: 10.1093/bib/bbac539] [Citation(s) in RCA: 12] [Impact Index Per Article: 6.0] [Received: 08/20/2022] [Revised: 10/18/2022] [Accepted: 11/08/2022] [Indexed: 12/24/2022] Open
Abstract
Discovering the relationships between long non-coding RNAs (lncRNAs) and diseases is significant in the treatment, diagnosis and prevention of diseases. However, the currently identified lncRNA-disease associations are insufficient because wet-laboratory experiments are expensive and labor-intensive. Therefore, it is important to develop an efficient computational method for predicting potential lncRNA-disease associations. Previous methods showed that combining the lncRNA-disease associations predicted by different classification methods via a Learning to Rank (LTR) algorithm can be effective for predicting potential lncRNA-disease associations. However, when the classification results are incorrect, the ranking results will inevitably be affected. We propose the GraLTR-LDA predictor, based on biological knowledge graphs and a ranking framework, for predicting potential lncRNA-disease associations. First, homogeneous and heterogeneous graphs are constructed by integrating multi-source biological information. Then, GraLTR-LDA integrates a graph auto-encoder and an attention mechanism to extract embedded features from the constructed graphs. Finally, GraLTR-LDA incorporates the embedded features into the LTR via feature crossing statistical strategies to predict the priority order of diseases associated with query lncRNAs. Experimental results demonstrate that GraLTR-LDA outperforms the other state-of-the-art predictors and can effectively detect potential lncRNA-disease associations. Availability and implementation: Datasets and source codes are available at http://bliulab.net/GraLTR-LDA.
Affiliation(s)
- Qi Liang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Wenxiang Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Hao Wu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
|
17
|
Zhang W, Liu B. iSnoDi-LSGT: identifying snoRNA-disease associations based on local similarity constraints and global topological constraints. RNA 2022; 28:1558-1567. [PMID: 36192132 PMCID: PMC9670808 DOI: 10.1261/rna.079325.122] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 06/22/2022] [Accepted: 09/26/2022] [Indexed: 06/16/2023]
Abstract
Growing evidence shows that small nucleolar RNAs (snoRNAs) have important functions in various biological processes, and that their malfunction leads to the emergence and development of complex diseases. However, identifying snoRNA-disease associations remains challenging because biological experiments are time-consuming and costly. Therefore, efficient and economical methods for identifying snoRNA-disease associations are urgently needed. In this regard, we propose a computational method named iSnoDi-LSGT, which utilizes snoRNA sequence similarity and disease similarity as local similarity constraints. The iSnoDi-LSGT predictor further employs network embedding technology to extract topological features of snoRNAs and diseases, based on which snoRNA topological similarity and disease topological similarity are calculated as global topological constraints. To the best of our knowledge, iSnoDi-LSGT is the first computational method for snoRNA-disease association identification. The experimental results indicate that the iSnoDi-LSGT predictor can effectively predict unknown snoRNA-disease associations. The web server of the iSnoDi-LSGT predictor is freely available at http://bliulab.net/iSnoDi-LSGT.
Affiliation(s)
- Wenxiang Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China
|
18
|
Xuan P, Wang S, Cui H, Zhao Y, Zhang T, Wu P. Learning global dependencies and multi-semantics within heterogeneous graph for predicting disease-related lncRNAs. Brief Bioinform 2022; 23:6695267. [DOI: 10.1093/bib/bbac361] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/08/2022] [Revised: 07/18/2022] [Accepted: 08/05/2022] [Indexed: 11/14/2022] Open
Abstract
Motivation
Long noncoding RNAs (lncRNAs) play an important role in the occurrence and development of diseases. Predicting disease-related lncRNAs can deepen our understanding of disease pathogenesis. The existing methods mainly rely on multi-source data related to lncRNAs and diseases when predicting the associations between lncRNAs and diseases. There are interdependencies among node attributes in a heterogeneous graph composed of all lncRNAs, diseases and microRNAs. The meta-paths composed of various connections between them also contain rich semantic information. However, the existing methods neglect to integrate attribute information of intermediate nodes in meta-paths.
Results
We propose a novel association prediction model, GSMV, to learn and deeply integrate the global dependencies, semantic information of meta-paths and node-pair multi-view features related to lncRNAs and diseases. First, we formulate the global representations of the lncRNA and disease nodes by establishing a self-attention mechanism to capture and learn the global dependencies among node attributes. Second, starting from the lncRNA and disease nodes, respectively, multiple meta-paths are established to reveal different semantic information. Considering that each meta-path contains specific semantics and has multiple meta-path instances that contribute differently to revealing meta-path semantics, we design a graph neural network-based module which consists of a meta-path instance encoding strategy and two novel attention mechanisms. The proposed meta-path instance encoding strategy is used to learn the contextual connections between nodes within a meta-path instance. One of the two new attention mechanisms is at the meta-path instance level, which learns rich and informative meta-path instances. The other attention mechanism integrates various semantic information from multiple meta-paths to learn the semantic representation of lncRNA and disease nodes. Finally, a dilated convolution-based learning module with adjustable receptive fields is proposed to learn multi-view features of lncRNA-disease node pairs. The experimental results show that our method outperforms seven state-of-the-art methods for lncRNA-disease association prediction. Ablation experiments demonstrate the contributions of the proposed global representation learning, semantic information learning, pairwise multi-view feature learning and the meta-path instance encoding strategy. Case studies on three cancers further demonstrate our method’s ability to discover potential disease-related lncRNA candidates.
Contact
zhang@hlju.edu.cn or peiliangwu@ysu.edu.cn
Supplementary information
Supplementary data are available at Briefings in Bioinformatics online.
Affiliation(s)
- Ping Xuan
- School of Information Science and Engineering (School of Software), Yanshan University, Qinhuangdao 066004, China
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Shuai Wang
- School of Information Science and Engineering (School of Software), Yanshan University, Qinhuangdao 066004, China
- Hui Cui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Yue Zhao
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Tiangang Zhang
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Peiliang Wu
- School of Information Science and Engineering (School of Software), Yanshan University, Qinhuangdao 066004, China
|
19
|
Zhang Y, Ye F, Gao X. MCA-Net: Multi-Feature Coding and Attention Convolutional Neural Network for Predicting lncRNA-Disease Association. IEEE/ACM Trans Comput Biol Bioinform 2022; 19:2907-2919. [PMID: 34283719 DOI: 10.1109/tcbb.2021.3098126] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Indexed: 06/13/2023]
Abstract
With the advent of the era of big data, it is difficult to accurately predict the associations between lncRNAs and diseases with traditional biological experiments, which are time-consuming and subjective. In this paper, we propose a novel deep learning method for predicting lncRNA-disease associations using a multi-feature coding and attention convolutional neural network (MCA-Net). We first calculate six similarity features to extract different types of lncRNA and disease feature information. Second, a multi-feature coding method is proposed to construct the feature vectors of lncRNA-disease association samples by integrating the six similarity features. Furthermore, an attention convolutional neural network is developed to identify lncRNA-disease associations under 10-fold cross-validation. Finally, we evaluate the performance of MCA-Net from different perspectives, including the effects of the model parameters, distinct deep learning models, and the necessity of the attention mechanism. We also compare MCA-Net with several state-of-the-art methods on three publicly available datasets, i.e., LncRNADisease, Lnc2Cancer, and LncRNADisease2.0. The results show that MCA-Net outperforms the state-of-the-art methods on all three datasets. In addition, case studies on breast cancer and lung cancer further verify that MCA-Net is effective and accurate for lncRNA-disease association prediction.
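The 10-fold cross-validation protocol mentioned in this abstract can be sketched as follows. This is a generic illustration rather than the MCA-Net evaluation code: the function name `k_fold_indices`, the seed, and the sample count are assumptions introduced here.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # shuffle once, reproducibly
    folds = [indices[i::k] for i in range(k)]   # k disjoint, near-equal folds
    for i in range(k):
        test_idx = folds[i]
        # Every fold except the held-out one contributes to training.
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Hypothetical usage: 25 lncRNA-disease samples split into 10 folds.
for fold_id, (train_idx, test_idx) in enumerate(k_fold_indices(25, k=10)):
    print(fold_id, len(train_idx), len(test_idx))
```

Each sample is held out exactly once, so every association contributes to the reported test performance.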
|
20
|
Zhang W, Hou J, Liu B. iPiDA-LTR: Identifying piwi-interacting RNA-disease associations based on Learning to Rank. PLoS Comput Biol 2022; 18:e1010404. [PMID: 35969645 PMCID: PMC9410559 DOI: 10.1371/journal.pcbi.1010404] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 03/27/2022] [Revised: 08/25/2022] [Accepted: 07/18/2022] [Indexed: 12/01/2022] Open
Abstract
Piwi-interacting RNAs (piRNAs) are regarded as drug targets and biomarkers for the diagnosis and therapy of diseases. However, biological experiments cost substantial time and resources, and the existing computational methods only focus on identifying missing associations between known piRNAs and diseases. With the fast development of biological experiments, more and more piRNAs are detected. Therefore, identifying piRNA-disease associations for newly detected piRNAs has both theoretical value and practical significance for understanding the pathogenesis of diseases. In this study, the iPiDA-LTR predictor is proposed to identify associations between piRNAs and diseases based on Learning to Rank. The iPiDA-LTR predictor not only identifies the missing associations between known piRNAs and diseases, but also detects diseases associated with newly detected piRNAs. Experimental results demonstrate that iPiDA-LTR effectively predicts piRNA-disease associations, outperforming the other related methods.
Affiliation(s)
- Wenxiang Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Jialu Hou
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
|
21
|
Qiu XY, Wu H, Shao J. TALE-cmap: Protein function prediction based on a TALE-based architecture and the structure information from contact map. Comput Biol Med 2022; 149:105938. [DOI: 10.1016/j.compbiomed.2022.105938] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/26/2022] [Revised: 07/26/2022] [Accepted: 08/06/2022] [Indexed: 11/03/2022]
|
22
|
Liang Y, Zhang ZQ, Liu NN, Wu YN, Gu CL, Wang YL. MAGCNSE: predicting lncRNA-disease associations using multi-view attention graph convolutional network and stacking ensemble model. BMC Bioinformatics 2022; 23:189. [PMID: 35590258 PMCID: PMC9118755 DOI: 10.1186/s12859-022-04715-w] [Citation(s) in RCA: 24] [Impact Index Per Article: 8.0] [Received: 03/15/2022] [Accepted: 05/05/2022] [Indexed: 01/02/2023] Open
Abstract
BACKGROUND Many long non-coding RNAs (lncRNAs) have key roles in different human biologic processes and are closely linked to numerous human diseases, according to cumulative evidence. Predicting potential lncRNA-disease associations can help to detect disease biomarkers and perform disease analysis and prevention. Establishing effective computational methods for lncRNA-disease association prediction is critical. RESULTS In this paper, we propose a novel model named MAGCNSE to predict underlying lncRNA-disease associations. We first obtain multiple feature matrices from the multi-view similarity graphs of lncRNAs and diseases using a graph convolutional network. Then, weights are adaptively assigned to the different feature matrices of lncRNAs and diseases using the attention mechanism. Next, the final representations of lncRNAs and diseases are acquired by further extracting features from the multi-channel feature matrices of lncRNAs and diseases using a convolutional neural network. Finally, we employ a stacking ensemble classifier, consisting of multiple traditional machine learning classifiers, to make the final prediction. The results of ablation studies in both representation learning methods and classification methods demonstrate the validity of each module. Furthermore, we compare the overall performance of MAGCNSE with that of six other state-of-the-art models; the results show that it outperforms the other methods. Moreover, we verify the effectiveness of using multi-view data of lncRNAs and diseases. Case studies further reveal the outstanding ability of MAGCNSE in the identification of potential lncRNA-disease associations. CONCLUSIONS The experimental results indicate that MAGCNSE is a useful approach for predicting potential lncRNA-disease associations.
Affiliation(s)
- Ying Liang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Ze-Qun Zhang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Nian-Nian Liu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Ya-Nan Wu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
- Chang-Long Gu
- College of Information Science and Engineering, Hunan University, Changsha, China
- Ying-Long Wang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, China
|
23
|
Wu H, Liang Q, Zhang W, Zou Q, El-Latif Hesham A, Liu B. iLncDA-LTR: Identification of lncRNA-disease associations by learning to rank. Comput Biol Med 2022; 146:105605. [PMID: 35594681 DOI: 10.1016/j.compbiomed.2022.105605] [Citation(s) in RCA: 9] [Impact Index Per Article: 3.0] [Received: 04/01/2022] [Revised: 04/27/2022] [Accepted: 05/09/2022] [Indexed: 12/12/2022]
Abstract
Identifying the associations between lncRNAs and diseases is helpful for the treatment and diagnosis of complex diseases. The existing computational methods mainly focus on the identification of associations between known lncRNAs and known diseases. However, with the application of high-throughput sequencing in lncRNA research, more and more lncRNAs have been detected. Predicting diseases related to newly detected lncRNAs has not been fully explored. Therefore, there is an urgent need to develop powerful computational methods to predict diseases related to newly detected lncRNAs. In this paper, we propose a Learning to Rank (LTR)-based method called iLncDA-LTR to predict diseases related to newly detected lncRNAs. iLncDA-LTR treats this task as an information retrieval task. The newly detected lncRNAs and diseases are considered as queries and documents, respectively. For a given newly detected lncRNA (query), iLncDA-LTR integrates multiple sources of relevant information into LTR to predict candidate diseases associated with the query lncRNA. Experimental results show that iLncDA-LTR outperforms other existing state-of-the-art predictors on an independent dataset. The corresponding web server of iLncDA-LTR has been constructed as well (http://bliulab.net/iLncDA-LTR/).
Affiliation(s)
- Hao Wu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Qi Liang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Wenxiang Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Abd El-Latif Hesham
- Genetics Department, Faculty of Agriculture, Beni-Suef University, Beni-Suef, 62511, Egypt
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
|
24
|
He T, Li J, Wang P, Zhang Z. Artificial intelligence predictive system of individual survival rate for lung adenocarcinoma. Comput Struct Biotechnol J 2022; 20:2352-2359. [PMID: 35615023 PMCID: PMC9123088 DOI: 10.1016/j.csbj.2022.05.005] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 02/12/2022] [Revised: 05/05/2022] [Accepted: 05/05/2022] [Indexed: 12/24/2022] Open
Abstract
Background The current research aimed to develop an artificial intelligence predictive system for the individual survival rate of lung adenocarcinoma (LUAD). Methods Independent risk variables were identified by multivariate Cox regression. An artificial intelligence predictive system was constructed using three different data mining algorithms. Results Stage, PM, chemotherapy, PN, age, PT, sex, and radiation_surgery were determined as risk factors for LUAD patients. For the 12-month survival rate in the model cohort, the concordance indexes of the RFS, MTLR, and Cox models were 0.852, 0.821, and 0.835, respectively. For the 36-month survival rate in the model cohort, the concordance indexes of the RFS, MTLR, and Cox models were 0.901, 0.864, and 0.862, respectively. For the 60-month survival rate in the model cohort, the concordance indexes of the RFS, MTLR, and Cox models were 0.899, 0.874, and 0.866, respectively. The concordance indexes in the validation dataset were similar to those in the model dataset. Conclusions The current study designed an individualized survival predictive system, which could provide individual survival curves using three different artificial intelligence algorithms. This artificial intelligence predictive system could directly convey treatment benefits by comparing individual mortality risk curves under different treatments. This artificial intelligence predictive tool is available at https://zhangzhiqiao11.shinyapps.io/Artificial_Intelligence_Survival_Prediction_System_AI_E1001/.
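The concordance index used above to compare the survival models can be illustrated with a short sketch. This is a plain pairwise version of Harrell's C under assumed conventions (a higher risk score means shorter expected survival, and tied risk scores count as half-concordant). It is not the study's code, and the function name and toy data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ranked correctly by risk score.

    times  -- observed follow-up time per subject
    events -- 1 if the event (death) was observed, 0 if censored
    risks  -- predicted risk score (higher = expected to fail sooner)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when subject i had an observed event
            # strictly before subject j's observed time.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy cohort: months of follow-up, event flags, risk scores.
toy_times = [5, 10, 15, 20]
toy_events = [1, 1, 0, 1]
toy_risks = [0.9, 0.7, 0.4, 0.2]
print(concordance_index(toy_times, toy_events, toy_risks))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives context for the reported values between 0.821 and 0.901.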
|
25
|
Peng L, Tan J, Tian X, Zhou L. EnANNDeep: An Ensemble-based lncRNA-protein Interaction Prediction Framework with Adaptive k-Nearest Neighbor Classifier and Deep Models. Interdiscip Sci 2022; 14:209-232. [PMID: 35006529 DOI: 10.1007/s12539-021-00483-y] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 07/06/2021] [Revised: 09/14/2021] [Accepted: 09/15/2021] [Indexed: 01/08/2023]
Abstract
Prediction of lncRNA-protein interactions (LPIs) can deepen the understanding of many important biological processes. Artificial intelligence methods have reported many possible LPIs. However, most computational techniques were evaluated mainly on one dataset, which may produce prediction bias. More importantly, they were validated only under cross validation on lncRNA-protein pairs and did not consider the performance under cross validations on lncRNAs and proteins, and thus fail to find related proteins/lncRNAs for a new lncRNA/protein. Under an ensemble learning framework (EnANNDeep) composed of an adaptive k-nearest neighbor classifier and deep models, this study focuses on systematically finding underlying linkages between lncRNAs and proteins. First, five LPI-related datasets are arranged. Second, multiple source features are integrated to depict an lncRNA-protein pair. Third, an adaptive k-nearest neighbor classifier, a deep neural network, and a deep forest are designed to score unknown lncRNA-protein pairs, respectively. Finally, interaction probabilities from the three predictors are integrated based on a soft voting technique. Compared with five classical LPI identification models (SFPEL, PMDKN, CatBoost, PLIPCOM, and LPI-SKF) under fivefold cross validations on lncRNAs, proteins, and LPIs, EnANNDeep computes the best average AUCs of 0.8660, 0.8775, and 0.9166, respectively, and the best average AUPRs of 0.8545, 0.8595, and 0.9054, respectively, indicating its superior LPI prediction ability. Case study analyses indicate that SNHG10 may have dense linkage with Q15717. In the ensemble framework, the adaptive k-nearest neighbor classifier can separately pick the most appropriate k for each query lncRNA-protein pair. More importantly, deep models including the deep neural network and deep forest can effectively learn the representative features of lncRNAs and proteins.
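The soft voting step described in this abstract, which combines interaction probabilities from three predictors, can be sketched as a weighted average. This is only an illustration under the assumption of equal weights by default; the function name `soft_vote` and the probability values are hypothetical and do not come from EnANNDeep.

```python
def soft_vote(prob_lists, weights=None):
    """Combine per-pair probabilities from several predictors by weighted averaging."""
    n_models = len(prob_lists)
    if n_models == 0:
        raise ValueError("at least one predictor is required")
    if weights is None:
        weights = [1.0] * n_models              # equal weights: plain average
    if len(weights) != n_models:
        raise ValueError("one weight per predictor is required")
    n_pairs = len(prob_lists[0])
    if any(len(probs) != n_pairs for probs in prob_lists):
        raise ValueError("all predictors must score the same pairs")
    total = sum(weights)
    return [
        sum(w * probs[i] for w, probs in zip(weights, prob_lists)) / total
        for i in range(n_pairs)
    ]


# Hypothetical scores for two lncRNA-protein pairs from three predictors.
knn_probs = [0.9, 0.2]
dnn_probs = [0.6, 0.4]
forest_probs = [0.3, 0.6]
print(soft_vote([knn_probs, dnn_probs, forest_probs]))
```

Averaging probabilities rather than hard labels lets a confident predictor outweigh two uncertain ones, which is the usual motivation for soft voting.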
Affiliation(s)
- Lihong Peng: School of Computer Science, Hunan University of Technology, Zhuzhou, China; College of Life Sciences and Chemistry, Hunan University of Technology, Zhuzhou, China.
- Jingwei Tan: School of Computer Science, Hunan University of Technology, Zhuzhou, China.
- Xiongfei Tian: School of Computer Science, Hunan University of Technology, Zhuzhou, China.
- Liqian Zhou: School of Computer Science, Hunan University of Technology, Zhuzhou, China.
|
26
|
Wang L, Zhong C. gGATLDA: lncRNA-disease association prediction based on graph-level graph attention network. BMC Bioinformatics 2022; 23:11. [PMID: 34983363 PMCID: PMC8729153 DOI: 10.1186/s12859-021-04548-z] [Citation(s) in RCA: 20] [Impact Index Per Article: 6.7] [Received: 09/18/2021] [Accepted: 12/21/2021] [Indexed: 01/20/2023] Open
Abstract
Background: Long non-coding RNAs (lncRNAs) are related to human diseases through their regulation of gene expression. Identifying lncRNA-disease associations (LDAs) will contribute to the diagnosis, treatment, and prognosis of diseases. However, identifying LDAs through biological experiments is time-consuming, costly, and inefficient. Therefore, developing efficient and accurate computational methods for predicting LDAs is of great significance. Results: In this paper, we propose a novel computational method (gGATLDA) to predict LDAs based on a graph-level graph attention network. First, we extract the enclosing subgraph of each lncRNA-disease pair. Second, we construct feature vectors by integrating lncRNA similarity and disease similarity as node attributes in the subgraphs. Finally, we train a graph neural network (GNN) model by feeding it the subgraphs and feature vectors, and use the trained GNN model to predict potential lncRNA-disease association scores. The experimental results show that our method achieves a higher area under the receiver operating characteristic curve (AUC), area under the precision-recall curve (AUPR), accuracy, and F1-score than state-of-the-art methods in fivefold cross-validation. Case studies show that our method can effectively identify lncRNAs associated with breast cancer, gastric cancer, prostate cancer, and renal cancer. Conclusion: The experimental results indicate that our method is a useful approach for predicting potential LDAs.
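The first step described here, extracting the enclosing subgraph of an lncRNA-disease pair, can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the names `k_hop_neighbors` and `enclosing_subgraph`, the one-hop default, the toy graph, and the choice to hide the target link itself are all hypothetical.

```python
from collections import deque


def k_hop_neighbors(adj, start, hops):
    """Return every node within `hops` edges of `start` (inclusive), via BFS."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if dist == hops:
            continue  # do not expand beyond the hop limit
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return seen


def enclosing_subgraph(adj, lnc, dis, hops=1):
    """Induced subgraph on the union of both endpoints' k-hop neighbourhoods."""
    nodes = k_hop_neighbors(adj, lnc, hops) | k_hop_neighbors(adj, dis, hops)
    edges = {
        tuple(sorted((a, b)))
        for a in nodes
        for b in adj.get(a, ())
        if b in nodes
    }
    # Assumption: drop the target link so a model cannot read off the label.
    edges.discard(tuple(sorted((lnc, dis))))
    return nodes, edges


# Toy undirected lncRNA-disease graph (hypothetical identifiers).
toy_adj = {
    "L1": {"D1", "D2"}, "L2": {"D2"}, "L3": {"D3"},
    "D1": {"L1"}, "D2": {"L1", "L2"}, "D3": {"L3"},
}
nodes, edges = enclosing_subgraph(toy_adj, "L2", "D1", hops=1)
```

Each pair then becomes one small graph, which is what makes the downstream task a graph-level classification.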
Affiliation(s)
- Li Wang: School of Computer Science and Engineering, South China University of Technology, Guangzhou, China; School of Computer, Electronics and Information, Guangxi University, Nanning, China.
- Cheng Zhong: School of Computer, Electronics and Information, Guangxi University, Nanning, China; Key Laboratory of Parallel and Distributed Computing in Guangxi Colleges and Universities, Guangxi University, Nanning, China.
|
27
|
Gong Y, Zhu W, Sun M, Shi L. Bioinformatics Analysis of Long Non-coding RNA and Related Diseases: An Overview. Front Genet 2021; 12:813873. [PMID: 34956340 PMCID: PMC8692768 DOI: 10.3389/fgene.2021.813873] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Received: 11/12/2021] [Accepted: 11/26/2021] [Indexed: 12/30/2022] Open
Abstract
Long non-coding RNAs (lncRNAs) are usually located in the nucleus and cytoplasm of cells. The transcripts of lncRNAs are >200 nucleotides in length and do not encode proteins. Compared with small RNAs, lncRNAs have longer sequences, more complex spatial structures, and more diverse and complex mechanisms involved in the regulation of gene expression. LncRNAs are widely involved in the biological processes of cells, and in the occurrence and development of many human diseases. Many studies have shown that lncRNAs can induce the occurrence of diseases, and some lncRNAs undergo specific changes in tumor cells. Research into the roles of lncRNAs has covered the diagnosis of, for example, cardiovascular, cerebrovascular, and central nervous system diseases. The bioinformatics of lncRNAs has gradually become a research hotspot and has led to the discovery of a large number of lncRNAs and associated biological functions, and lncRNA databases and recognition models have been developed. In this review, the research progress of lncRNAs is discussed, and lncRNA-related databases and the mechanisms and modes of action of lncRNAs are described. In addition, disease-related lncRNA methods and the relationships between lncRNAs and human lung adenocarcinoma, rectal cancer, colon cancer, heart disease, and diabetes are discussed. Finally, the significance and existing problems of lncRNA research are considered.
Affiliation(s)
- Yuxin Gong: School of Mathematics and Statistics, Hainan Normal University, Haikou, China; Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China; Key Laboratory of Data Science and Smart Education, Hainan Normal University, Ministry of Education, Haikou, China.
- Wen Zhu: School of Mathematics and Statistics, Hainan Normal University, Haikou, China.
- Meili Sun: Beidahuang Industry Group General Hospital, Harbin, China.
- Lei Shi: Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China.
|
28
|
Xuan P, Zhan L, Cui H, Zhang T, Nakaguchi T, Zhang W. Graph Triple-Attention Network for Disease-related LncRNA Prediction. IEEE J Biomed Health Inform 2021; 26:2839-2849. [PMID: 34813484 DOI: 10.1109/jbhi.2021.3130110] [Citation(s) in RCA: 18] [Impact Index Per Article: 4.5] [Indexed: 11/06/2022]
Abstract
Abnormal expression of long non-coding RNAs (lncRNAs) is associated with various human diseases. Identifying disease-related lncRNAs can help clarify complex disease pathogeneses. The latest methods for lncRNA-disease association prediction rely on diverse data about lncRNAs and diseases. These methods, however, cannot adequately integrate the neighbour topological information of lncRNA and disease nodes. Moreover, more intrinsic features of lncRNA-disease node pairs can be explored to better predict the latent associations between lncRNAs and diseases. We developed a novel method, named GTAN, to predict the association propensities between lncRNAs and diseases. GTAN integrates various information about lncRNAs and diseases, including similarities, associations and interactions among lncRNAs, diseases and miRNAs, and exploits neighbour topology and attribute representations of a pair of lncRNA-disease nodes. In GTAN we adopted a graph neural network architecture with three attention mechanisms and multi-layer convolutional neural networks (CNNs). First, a neighbour-level self-attention mechanism is constructed to learn the importance of each neighbour for an lncRNA or disease node of interest. Second, topology-level attention is proposed to enhance contextual dependencies among multiple local topology representations of the lncRNA or disease node. An attention-enhanced graph neural network framework is then established to learn a topology representation of top-ranked neighbours for a pair of lncRNA-disease nodes. GTAN also has attribute-level attention to distinguish the contributions of the various attributes of the lncRNA-disease pair. Finally, an attribute representation is learned by the multi-layer CNN to integrate detailed features and representative features of the pair. Extensive experimental results demonstrated that GTAN outperformed state-of-the-art methods. The improved recall rates also showed GTAN's capacity for retrieving more actual lncRNA-disease associations among the top-ranked candidates. The ablation studies confirmed the important contributions of the three attention mechanisms. Case studies on lung cancer, prostate cancer and colon cancer further showed GTAN's ability to discover potential lncRNA candidates related to diseases.
|
29
|
Li Y, Wang R, Zhang S, Xu H, Deng L. LRGCPND: Predicting Associations between ncRNA and Drug Resistance via Linear Residual Graph Convolution. Int J Mol Sci 2021; 22:10508. [PMID: 34638849 PMCID: PMC8508984 DOI: 10.3390/ijms221910508] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Received: 08/24/2021] [Revised: 09/25/2021] [Accepted: 09/27/2021] [Indexed: 01/08/2023] Open
Abstract
Accurate inference of the relationship between non-coding RNAs (ncRNAs) and drug resistance is essential for understanding the complicated mechanisms of drug action and for clinical treatment. Traditional biological experiments are time-consuming, laborious, and small in scale. Although several databases provide relevant resources, no computational method for predicting this type of association has yet been developed. In this paper, we leverage verified association data on ncRNA and drug resistance to construct a bipartite graph and then develop a linear residual graph convolution approach for predicting associations between non-coding RNA and drug resistance (LRGCPND) without introducing or defining additional data. LRGCPND first aggregates the latent features of neighboring nodes in each graph convolutional layer. Next, it passes information between layers through a linear function. Finally, LRGCPND combines the embedding representations of all layers to complete the prediction. Comparison experiments demonstrate that LRGCPND performs more reliably than seven other state-of-the-art approaches, with an average AUC of 0.8987. Case studies illustrate that LRGCPND is an effective tool for inferring associations between ncRNA and drug resistance.
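The three steps in this abstract (neighbour aggregation per layer, a linear map between layers, and combining the embeddings of all layers) can be sketched roughly as below. The row normalisation with self-loops, the use of concatenation to combine layers, the helper names, and the toy two-node graph are assumptions for illustration; the abstract does not specify these details.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def row_normalize_with_self_loops(adj):
    """Add self-loops, then scale each row to sum to 1 (assumed normalisation)."""
    n = len(adj)
    out = []
    for i in range(n):
        row = [adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
        total = sum(row)
        out.append([v / total for v in row])
    return out


def linear_graph_conv(adj, embeddings, weights):
    """Stack linear propagation layers and concatenate every layer's output."""
    a_hat = row_normalize_with_self_loops(adj)
    layers = [embeddings]
    for w in weights:
        # Aggregate neighbours, then apply a purely linear map (no activation).
        layers.append(matmul(matmul(a_hat, layers[-1]), w))
    # Concatenate the per-layer embeddings of each node.
    return [sum((layer[i] for layer in layers), []) for i in range(len(adj))]


# Toy bipartite graph: node 0 is an ncRNA, node 1 a drug (hypothetical).
toy_adj = [[0, 1], [1, 0]]
initial = [[1.0, 0.0], [0.0, 1.0]]
identity = [[1.0, 0.0], [0.0, 1.0]]
final = linear_graph_conv(toy_adj, initial, [identity])
# Association score as the inner product of the two final embeddings.
score = sum(a * b for a, b in zip(final[0], final[1]))
```

Leaving out the nonlinearity keeps each layer a plain matrix product, and combining all layers lets shallow-layer signals reach the final score.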
Affiliation(s)
- Lei Deng: School of Computer Science and Engineering, Central South University, Changsha 410083, China; (Y.L.); (R.W.); (S.Z.); (H.X.)
|
30
|
Fan Y, Chen M, Pan X. GCRFLDA: scoring lncRNA-disease associations using graph convolution matrix completion with conditional random field. Brief Bioinform 2021; 23:6363052. [PMID: 34486019 DOI: 10.1093/bib/bbab361] [Citation(s) in RCA: 37] [Impact Index Per Article: 9.3] [Received: 05/25/2021] [Revised: 07/19/2021] [Accepted: 08/16/2021] [Indexed: 12/12/2022] Open
Abstract
Long noncoding RNAs (lncRNAs) play important roles in various biological regulatory processes and are closely related to the occurrence and development of diseases. Identifying lncRNA-disease associations is valuable for revealing the molecular mechanisms of diseases and exploring treatment strategies. Thus, it is necessary to predict lncRNA-disease associations computationally as a complement to biological experiments. In this study, we proposed a novel prediction method, GCRFLDA, based on graph convolutional matrix completion. GCRFLDA first constructed a graph using the available lncRNA-disease association information. Then, it constructed an encoder consisting of a conditional random field and an attention mechanism to learn efficient node embeddings, and a decoder layer to score lncRNA-disease associations. In GCRFLDA, Gaussian interaction profile kernel similarity and cosine similarity were fused as side information for lncRNA and disease nodes. Experimental results on four benchmark datasets show that GCRFLDA is superior to other existing methods. Moreover, we conducted case studies on four diseases and observed that 70 of 80 predicted associated lncRNAs were confirmed by the literature.
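The side information mentioned here combines Gaussian interaction profile (GIP) kernel similarity with cosine similarity. A minimal sketch follows, using the commonly used GIP definition with bandwidth normalised by the mean squared profile norm. The equal-weight average in `fuse`, the function names, and the toy association profiles are assumptions; the abstract does not say how the two similarities are fused.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """GIP kernel: exp(-gamma * ||p_i - p_j||^2), gamma scaled by mean norm."""
    n = len(profiles)
    # Assumes at least one nonzero profile, otherwise the mean norm is zero.
    mean_sq_norm = sum(sum(v * v for v in p) for p in profiles) / n
    gamma = gamma_prime / mean_sq_norm
    return [
        [
            math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j])))
            for j in range(n)
        ]
        for i in range(n)
    ]


def cosine_similarity(profiles):
    """Pairwise cosine similarity; zero-norm profiles get similarity 0."""
    norms = [math.sqrt(sum(v * v for v in p)) for p in profiles]
    n = len(profiles)
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] > 0 and norms[j] > 0:
                dot = sum(a * b for a, b in zip(profiles[i], profiles[j]))
                sim[i][j] = dot / (norms[i] * norms[j])
    return sim


def fuse(sim_a, sim_b, alpha=0.5):
    """Weighted average of two similarity matrices (fusion rule is assumed)."""
    return [
        [alpha * a + (1 - alpha) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(sim_a, sim_b)
    ]


# Rows are lncRNAs, columns are diseases (hypothetical association profiles).
lnc_profiles = [[1, 1, 0], [1, 1, 0], [0, 0, 1]]
fused = fuse(gip_kernel(lnc_profiles), cosine_similarity(lnc_profiles))
```

The same functions apply to disease nodes by passing the columns of the association matrix as profiles.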
Affiliation(s)
- Yongxian Fan: School of Computer Science and Information Security, Guilin University of Electronic Technology.
- Meijun Chen: Guilin University of Electronic Technology, Guilin 541004, China.
- Xiaoyong Pan: Department of Automation of Shanghai Jiao Tong University.
|
31
|
Lei X, Mudiyanselage TB, Zhang Y, Bian C, Lan W, Yu N, Pan Y. A comprehensive survey on computational methods of non-coding RNA and disease association prediction. Brief Bioinform 2020; 22:6042241. [PMID: 33341893 DOI: 10.1093/bib/bbaa350] [Citation(s) in RCA: 35] [Impact Index Per Article: 7.0] [Received: 08/28/2020] [Revised: 10/20/2020] [Accepted: 11/01/2020] [Indexed: 02/06/2023] Open
Abstract
Studies on the relationships between non-coding RNAs and diseases have been widely carried out in recent years. A large number of experimental methods and technologies for producing biological data have also been developed. However, because of the high labor cost and long production time of such experiments, computational methods, especially machine learning and deep learning methods, have received a lot of attention and are now commonly used to solve these problems. From a computational point of view, this survey mainly introduces three common non-coding RNAs, i.e. miRNAs, lncRNAs and circRNAs, and the related computational methods for predicting their associations with diseases. First, the mainstream databases for these three non-coding RNAs are introduced in detail. Then, we present several methods for calculating RNA similarity and disease similarity. Next, we investigate ncRNA-disease prediction methods in detail and classify them into five types: network propagation, recommender systems, matrix completion, machine learning and deep learning. Furthermore, we summarize the applications of these five types of computational methods in predicting the associations between diseases and miRNAs, lncRNAs and circRNAs, respectively. Finally, the advantages and limitations of the various methods are identified, and future research directions and challenges are discussed.
Affiliation(s)
- Xiujuan Lei: School of Computer Science, Shaanxi Normal University, Xi'an, China.
- Yuchen Zhang: School of Computer Science, Shaanxi Normal University, Xi'an, China.
- Chen Bian: School of Computer Science, Shaanxi Normal University, Xi'an, China.
- Wei Lan: School of Computer, Electronics and Information at Guangxi University, Nanning, China.
- Ning Yu: Department of Computing Sciences at the College at Brockport, State University of New York, Rochester, NY, USA.
- Yi Pan: Computer Science Department at Georgia State University, Atlanta, GA, USA.
|
32
|
Wei H, Ding Y, Liu B. iPiDA-sHN: Identification of Piwi-interacting RNA-disease associations by selecting high quality negative samples. Comput Biol Chem 2020; 88:107361. [PMID: 32916452 DOI: 10.1016/j.compbiolchem.2020.107361] [Citation(s) in RCA: 16] [Impact Index Per Article: 3.2] [Received: 07/05/2020] [Revised: 07/31/2020] [Accepted: 08/15/2020] [Indexed: 12/31/2022]
Abstract
As a large group of small non-coding RNAs (ncRNAs), Piwi-interacting RNAs (piRNAs) have been found to be associated with various diseases. Identifying disease-associated piRNAs can provide promising candidate molecular targets for drug design. Although a few computational ensemble methods have been developed for identifying piRNA-disease associations, the low-quality negative samples used during training, which may even include true positive associations, hinder improvements in predictive performance. In this study, we proposed a new computational predictor named iPiDA-sHN to predict potential piRNA-disease associations. iPiDA-sHN represented piRNA-disease pairs by incorporating piRNA sequence information, the known piRNA-disease association network, and the disease semantic graph. High-level features of piRNA-disease associations were extracted by a convolutional neural network (CNN). A two-step positive-unlabeled learning strategy based on a support vector machine (SVM) was employed to select high-quality negative samples from the unknown piRNA-disease pairs. Finally, the SVM predictor trained with the known piRNA-disease associations and the high-quality negative associations was used to predict new piRNA-disease associations. The experimental results showed that iPiDA-sHN achieved superior predictive ability compared with other state-of-the-art predictors.
Affiliation(s)
- Hang Wei: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China.
- Yuxin Ding: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China.
- Bin Liu: School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen, Guangdong 518055, China; School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China; Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing 100081, China.
|
33
|
Wei H, Xu Y, Liu B. iPiDi-PUL: identifying Piwi-interacting RNA-disease associations based on positive unlabeled learning. Brief Bioinform 2020; 22:5829704. [PMID: 32393982 DOI: 10.1093/bib/bbaa058] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 02/16/2020] [Revised: 03/15/2020] [Accepted: 03/24/2020] [Indexed: 12/20/2022] Open
Abstract
Accumulating research has revealed that Piwi-interacting RNAs (piRNAs) regulate the development of germ and stem cells and are closely associated with the progression of many diseases. As the number of detected piRNAs is increasing rapidly, it is important to identify new piRNA-disease associations computationally at low cost and to provide candidate piRNA targets for disease treatment. However, it is challenging to learn effective association patterns from the positive piRNA-disease associations and the large number of unknown piRNA-disease pairs. In this study, we proposed a computational predictor called iPiDi-PUL to identify piRNA-disease associations. iPiDi-PUL extracted the features of piRNA-disease associations from three biological data sources: piRNA sequence information, disease semantic terms, and the available piRNA-disease association network. Principal component analysis (PCA) was then performed on these features to extract the key features. The training datasets were constructed from known positive associations and negative associations selected from the unknown pairs. Random forest classifiers trained on these different training sets were merged via an ensemble learning approach to give the predictive results. Finally, a web server for iPiDi-PUL was established at http://bliulab.net/iPiDi-PUL to help researchers explore the diseases associated with newly discovered piRNAs.
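The training-set construction and ensemble step described in this abstract can be sketched in outline. The sketch covers only the sampling and score-merging logic. Drawing negatives uniformly at random, keeping each set balanced, and averaging classifier scores are assumptions; the function names and toy pairs are hypothetical, and the PCA and random forest training steps are omitted.

```python
import random


def build_training_sets(positives, unlabeled, n_sets=3, seed=0):
    """Pair the known positives with a different sample of unlabeled pairs per set."""
    rng = random.Random(seed)
    training_sets = []
    for _ in range(n_sets):
        # Assumption: negatives are drawn uniformly, one per positive.
        negatives = rng.sample(unlabeled, k=len(positives))
        samples = [(pair, 1) for pair in positives] + [(pair, 0) for pair in negatives]
        training_sets.append(samples)
    return training_sets


def ensemble_average(score_lists):
    """Merge per-classifier scores for the same pairs by simple averaging."""
    return [sum(col) / len(col) for col in zip(*score_lists)]


# Hypothetical piRNA-disease pairs.
known_positives = [("piR-1", "D1"), ("piR-2", "D2")]
unknown_pairs = [("piR-1", "D2"), ("piR-2", "D1"), ("piR-3", "D1"), ("piR-3", "D2")]
training_sets = build_training_sets(known_positives, unknown_pairs, n_sets=3, seed=0)

# Scores that three classifiers (one per training set) might assign to two pairs.
merged = ensemble_average([[0.2, 0.8], [0.4, 0.6], [0.3, 0.7]])
```

Training one classifier per sampled set and merging their outputs reduces the influence of any single set of mislabeled negatives.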
|