1
Xuan P, Qi X, Chen S, Gu J, Wang X, Cui H, Lu J, Zhang T. Subgraph Topology and Dynamic Graph Topology Enhanced Graph Learning and Pairwise Feature Context Relationship Integration for Predicting Disease-Related miRNAs. J Chem Inf Model 2025; 65:1631-1640. [PMID: 39865931 DOI: 10.1021/acs.jcim.4c01757] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 01/28/2025]
Abstract
As an increasing number of microRNAs (miRNAs) have become biomarkers of various human diseases, predicting candidate disease-related miRNAs helps facilitate early diagnosis. Most recent prediction models concentrate on learning features from the heterogeneous graph composed of miRNAs and diseases. However, they fail to fully exploit the subgraph structures consisting of multiple miRNA and disease nodes, and they do not completely integrate the context relationships among the pairwise features. We propose a prediction model, SFPred, to integrate and encode the local topologies from neighborhood subgraphs, the dynamically evolving heterogeneous graph topology, and the context among pairwise features. First, the importance of a miRNA (disease) node to another node is formulated according to the subgraphs composed of their neighbors. Second, the features of each miRNA (disease) node change continuously as the graph encoding of the miRNA-disease heterogeneous network deepens. A strategy based on a multi-layer perceptron (MLP) is designed to estimate the edge weights from the changed node features and form the dynamic graph topology. Third, considering the context relationships among the features of a pair of miRNA and disease nodes, a context-relationship-sensitive transformer is constructed to integrate these relationships. Finally, since the previous encoding layer of the transformer contains more detailed features of the node pair, we present a multiperspective residual strategy that supplements the following encoding layer with these detailed features from both the channel perspective and the feature perspective. Extensive experiments confirmed that SFPred outperforms eight state-of-the-art methods for the prediction of miRNA-disease associations, and the ablation experiments validate the effectiveness of the proposed innovations. The recall rates for the top-ranked candidate miRNAs related to the diseases and the case studies on three diseases indicate SFPred's ability to screen reliable candidates for subsequent biological experiments.
Affiliation(s)
- Ping Xuan
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
  - Department of Computer Science and Technology, Shantou University, Shantou 515063, China
- Xiaoying Qi
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Sentao Chen
  - Department of Computer Science and Technology, Shantou University, Shantou 515063, China
- Jing Gu
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Xiuju Wang
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Jun Lu
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Tiangang Zhang
  - School of Cyberspace Security, Hainan University, Haikou 570228, China
2
Kamble P, Nagar PR, Bhakhar KA, Garg P, Sobhia ME, Naidu S, Bharatam PV. Cancer pharmacoinformatics: Databases and analytical tools. Funct Integr Genomics 2024; 24:166. [PMID: 39294509 DOI: 10.1007/s10142-024-01445-5] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2024] [Revised: 08/26/2024] [Accepted: 09/03/2024] [Indexed: 09/20/2024]
Abstract
Cancer is a subject of extensive investigation, and the use of omics technology has generated substantial volumes of big data in cancer research. Numerous databases are being developed to manage and organize this data effectively. These databases encompass various domains such as genomics, transcriptomics, proteomics, metabolomics, immunology, and drug discovery. The application of computational tools to various core components of pharmaceutical sciences constitutes "Pharmacoinformatics", an emerging paradigm in rational drug discovery. The three major features of pharmacoinformatics are (i) structure modelling of putative drugs and targets, (ii) compilation of databases and analysis using statistical approaches, and (iii) use of artificial intelligence/machine learning algorithms for the discovery of novel therapeutic molecules. The development, updating, and analysis of databases using statistical approaches play a pivotal role in pharmacoinformatics. Multiple software tools are associated with oncoinformatics research. This review catalogs the databases and computational tools related to cancer drug discovery and highlights their potential implications in the pharmacoinformatics of cancer.
Affiliation(s)
- Pradnya Kamble
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
- Prinsa R Nagar
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
- Kaushikkumar A Bhakhar
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
- Prabha Garg
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
- M Elizabeth Sobhia
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
- Srivatsava Naidu
  - Center of Biomedical Engineering, Indian Institute of Technology Ropar, Rupnagar, Punjab, India
- Prasad V Bharatam
  - Department of Pharmacoinformatics, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
  - Department of Medicinal Chemistry, National Institute of Pharmaceutical Education and Research, S.A.S. Nagar, Punjab, India
3
Xuan P, Wang X, Cui H, Meng X, Nakaguchi T, Zhang T. Meta-Path Semantic and Global-Local Representation Learning Enhanced Graph Convolutional Model for Disease-Related miRNA Prediction. IEEE J Biomed Health Inform 2024; 28:4306-4316. [PMID: 38709611 DOI: 10.1109/jbhi.2024.3397003] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/08/2024]
Abstract
Dysregulation of miRNAs is closely related to the progression of various diseases, so identifying disease-related miRNAs is crucial. Most recently proposed methods are based on graph reasoning, but they do not fully exploit the topological structure composed of higher-order neighbor nodes or the global and local features of miRNA and disease nodes. We propose a prediction method, MDAP, to learn semantic features of miRNA and disease nodes based on various meta-paths, node features from the perspective of the entire heterogeneous network, and node pair attributes. First, for both the miRNA and disease nodes, node category-wise meta-paths were constructed to integrate the similarity and association connection relationships. Each target node has its specific neighbor nodes for each meta-path, and the neighbors of longer meta-paths constitute its higher-order neighbor topological structure. Second, we constructed a meta-path-specific graph convolutional network module to integrate the features of higher-order neighbors and their topology, and then learned the semantic representations of nodes. Third, for the entire miRNA-disease heterogeneous network, a global-aware graph convolutional autoencoder was built to learn the network-view feature representations of nodes. We also designed semantic-level and representation-level attentions to obtain informative semantic features and node representations. Finally, a strategy based on parallel convolutional-deconvolutional neural networks was designed to enhance the local feature learning for a pair of miRNA and disease nodes. The experimental results showed that MDAP outperformed other state-of-the-art methods, and the ablation experiments demonstrated the effectiveness of MDAP's major innovations. MDAP's ability to discover potential disease-related miRNAs was further analyzed through case studies on three diseases.
4
Wang X, Chen G, Hu H, Zhang M, Rao Y, Yue Z. PDDGCN: A Parasitic Disease-Drug Association Predictor Based on Multi-view Fusion Graph Convolutional Network. Interdiscip Sci 2024; 16:231-242. [PMID: 38294648 DOI: 10.1007/s12539-023-00600-z] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/29/2023] [Revised: 12/20/2023] [Accepted: 12/21/2023] [Indexed: 02/01/2024]
Abstract
The precise identification of associations between diseases and drugs is paramount for comprehending the etiology and mechanisms underlying parasitic diseases. Computational approaches are highly effective in discovering and predicting disease-drug associations. However, the majority of these approaches primarily rely on link-based methodologies within distinct biomedical bipartite networks. In this study, we reorganized a fundamental dataset of parasitic disease-drug associations using the latest databases, and proposed a prediction model called PDDGCN, based on a multi-view graph convolutional network. To begin with, we fused similarity networks with binary networks to establish multi-view heterogeneous networks. We utilized neighborhood information aggregation layers to refine node embeddings within each view of the multi-view heterogeneous networks, leveraging inter- and intra-domain message passing to aggregate information from neighboring nodes. Subsequently, we integrated multiple embeddings from each view and fed them into the ultimate discriminator. The experimental results demonstrate that PDDGCN outperforms five state-of-the-art methods and four compared machine learning algorithms. Additionally, case studies have substantiated the effectiveness of PDDGCN in identifying associations between parasitic diseases and drugs. In summary, the PDDGCN model has the potential to facilitate the discovery of potential treatments for parasitic diseases and advance our comprehension of the etiology in this field. The source code is available at https://github.com/AhauBioinformatics/PDDGCN .
Affiliation(s)
- Xiaosong Wang
  - School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Key Laboratory of Agricultural Sensors for Ministry of Agriculture and Rural Affairs, Anhui Agricultural University, Hefei, 230036, Anhui, People's Republic of China
- Guojun Chen
  - School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Key Laboratory of Agricultural Sensors for Ministry of Agriculture and Rural Affairs, Anhui Agricultural University, Hefei, 230036, Anhui, People's Republic of China
- Hang Hu
  - School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Key Laboratory of Agricultural Sensors for Ministry of Agriculture and Rural Affairs, Anhui Agricultural University, Hefei, 230036, Anhui, People's Republic of China
- Min Zhang
  - School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Key Laboratory of Agricultural Sensors for Ministry of Agriculture and Rural Affairs, Anhui Agricultural University, Hefei, 230036, Anhui, People's Republic of China
- Yuan Rao
  - School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Key Laboratory of Agricultural Sensors for Ministry of Agriculture and Rural Affairs, Anhui Agricultural University, Hefei, 230036, Anhui, People's Republic of China
- Zhenyu Yue
  - School of Information and Artificial Intelligence, Anhui Provincial Engineering Research Center for Beidou Precision Agriculture Information, Key Laboratory of Agricultural Sensors for Ministry of Agriculture and Rural Affairs, Anhui Agricultural University, Hefei, 230036, Anhui, People's Republic of China
5
Xuan P, Xiu J, Cui H, Zhang X, Nakaguchi T, Zhang T. Complementary feature learning across multiple heterogeneous networks and multimodal attribute learning for predicting disease-related miRNAs. iScience 2024; 27:108639. [PMID: 38303724 PMCID: PMC10831890 DOI: 10.1016/j.isci.2023.108639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 11/02/2023] [Accepted: 12/01/2023] [Indexed: 02/03/2024]
Abstract
Inferring latent disease-related miRNAs helps provide deep insight into disease pathogenesis. We propose a method, CMMDA, to encode and integrate the context relationship among multiple heterogeneous networks, the complementary information across these networks, and the pairwise multimodal attributes. We first established multiple heterogeneous networks according to the diverse disease similarities. A transformer-based feature representation that embeds the context relationship is formulated for each miRNA (disease) node. We designed a co-attention fusion mechanism to encode the complementary information among multiple networks. For a pair of miRNA and disease nodes, the pairwise attributes from multiple networks form a multimodal attribute embedding. A module based on depthwise separable convolution is constructed to enhance the encoding of the specific features from each modality. The experimental results and the ablation studies show CMMDA's superior performance and the effectiveness of its major innovations.
Affiliation(s)
- Ping Xuan
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
  - Department of Computer Science, Shantou University, Shantou 515063, China
- Jinshan Xiu
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIC 3083, Australia
- Xiaowen Zhang
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Toshiya Nakaguchi
  - Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
- Tiangang Zhang
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
6
Li ZW, Wang QK, Yuan CA, Han PY, You ZH, Wang L. Predicting MiRNA-Disease Associations by Graph Representation Learning Based on Jumping Knowledge Networks. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:2629-2638. [PMID: 35925844 DOI: 10.1109/tcbb.2022.3196394] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/15/2023]
Abstract
A growing number of studies have shown that miRNAs are inextricably linked with many human diseases, and a great deal of effort has been spent on identifying their potential associations. Compared with traditional experimental methods, computational approaches have achieved promising results. In this article, we propose a graph representation learning method to predict miRNA-disease associations. Specifically, we first integrate the verified miRNA-disease associations with the similarity information of miRNAs and diseases to construct a miRNA-disease heterogeneous graph. Then, we apply a graph attention network to aggregate the neighbor information of nodes in each layer, and feed the hidden-layer representations into the structure-aware jumping knowledge network to obtain the global features of nodes. The output features of miRNAs and diseases are then concatenated and fed into a fully connected layer to score the potential associations. Under five-fold cross-validation, the average AUC, accuracy and precision values of our model are 93.30%, 85.18% and 88.90%, respectively. In addition, for three case studies of esophageal tumor, lymphoma and prostate tumor, 46, 45 and 45 of the top 50 miRNAs predicted by our model were confirmed by relevant databases. Overall, our method could provide a reliable alternative for miRNA-disease association prediction.
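The layer-wise aggregation followed by a jumping knowledge step can be illustrated with a minimal sketch. It makes two assumptions that the abstract does not confirm: plain mean aggregation stands in for the graph attention network, and the jumping knowledge step is the simple concatenation variant. The function names `mean_aggregate` and `jumping_knowledge` are hypothetical.

```python
def mean_aggregate(adj, feats):
    """One propagation step: each node averages its own features with
    those of its neighbours (a stand-in for an attention layer)."""
    out = []
    for i, row in enumerate(adj):
        nbrs = [j for j, a in enumerate(row) if a or j == i]
        out.append([sum(feats[j][c] for j in nbrs) / len(nbrs)
                    for c in range(len(feats[0]))])
    return out


def jumping_knowledge(adj, feats, num_layers=2):
    """Run several propagation layers and concatenate every layer's
    output per node (concatenation-style jumping knowledge, assumed)."""
    layers = []
    h = feats
    for _ in range(num_layers):
        h = mean_aggregate(adj, h)
        layers.append(h)
    # Node i keeps the representation from every depth, not only the last.
    return [sum((layer[i] for layer in layers), []) for i in range(len(feats))]
```

Keeping every layer's output lets a downstream scoring layer see both near-neighbour and wider-neighbourhood information for each node.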
7
Woicik A, Zhang M, Xu H, Mostafavi S, Wang S. Gemini: memory-efficient integration of hundreds of gene networks with high-order pooling. Bioinformatics 2023; 39:i504-i512. [PMID: 37387142 DOI: 10.1093/bioinformatics/btad247] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 07/01/2023]
Abstract
MOTIVATION The exponential growth of genomic sequencing data has created ever-expanding repositories of gene networks. Unsupervised network integration methods are critical to learn informative representations for each gene, which are later used as features for downstream applications. However, these network integration methods must be scalable to account for the increasing number of networks and robust to an uneven distribution of network types within hundreds of gene networks. RESULTS To address these needs, we present Gemini, a novel network integration method that uses memory-efficient high-order pooling to represent and weight each network according to its uniqueness. Gemini then mitigates the uneven network distribution through mixing up existing networks to create many new networks. We find that Gemini leads to more than a 10% improvement in F1 score, 15% improvement in micro-AUPRC, and 63% improvement in macro-AUPRC for human protein function prediction by integrating hundreds of networks from BioGRID, and that Gemini's performance significantly improves when more networks are added to the input network collection, while Mashup and BIONIC embeddings' performance deteriorates. Gemini thereby enables memory-efficient and informative network integration for large gene networks and can be used to massively integrate and analyze networks in other domains. AVAILABILITY AND IMPLEMENTATION Gemini can be accessed at: https://github.com/MinxZ/Gemini.
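The idea of mixing existing networks to create new ones can be sketched as a convex combination of two adjacency matrices. This is an illustration only: drawing the mixing weight from a Beta distribution follows the generic mixup recipe and is an assumption here, not a detail taken from the abstract, and the names `mix_networks` and `sample_mixed_networks` are hypothetical.

```python
import random


def mix_networks(a, b, lam):
    """Convex combination lam * a + (1 - lam) * b of two adjacency
    matrices defined over the same node set."""
    return [[lam * x + (1 - lam) * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a, b)]


def sample_mixed_networks(networks, n_new, alpha=1.0, seed=0):
    """Create n_new synthetic networks by mixing random pairs.
    The Beta(alpha, alpha) weight is an assumed choice."""
    rng = random.Random(seed)
    mixed = []
    for _ in range(n_new):
        a, b = rng.sample(networks, 2)      # two distinct source networks
        lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
        mixed.append(mix_networks(a, b, lam))
    return mixed
```

Because each new network is a weighted average of two inputs, its edge weights stay within the range spanned by the originals.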
Affiliation(s)
- Addie Woicik
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Mingxin Zhang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Hanwen Xu
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Sara Mostafavi
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
- Sheng Wang
  - Paul G. Allen School of Computer Science and Engineering, University of Washington, Seattle, WA 98195, United States
8
Chen M, Deng Y, Li Z, Ye Y, He Z. KATZNCP: a miRNA-disease association prediction model integrating KATZ algorithm and network consistency projection. BMC Bioinformatics 2023; 24:229. [PMID: 37268893 DOI: 10.1186/s12859-023-05365-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/27/2022] [Accepted: 05/26/2023] [Indexed: 06/04/2023]
Abstract
BACKGROUND Clinical studies have shown that miRNAs are closely related to human health. The study of potential associations between miRNAs and diseases will contribute to a profound understanding of the mechanism of disease development, as well as human disease prevention and treatment. MiRNA-disease associations predicted by computational methods are the best complement to biological experiments. RESULTS In this research, a federated computational model, KATZNCP, was proposed on the basis of the KATZ algorithm and network consistency projection to infer potential miRNA-disease associations. In KATZNCP, a heterogeneous network was first constructed by integrating the known miRNA-disease associations, integrated miRNA similarities, and integrated disease similarities; then, the KATZ algorithm was applied to the heterogeneous network to obtain the estimated miRNA-disease prediction scores. Finally, the precise scores were obtained by the network consistency projection method as the final prediction results. KATZNCP achieved reliable predictive performance in leave-one-out cross-validation (LOOCV) with an AUC value of 0.9325, which was better than the state-of-the-art comparable algorithms. Furthermore, case studies of lung neoplasms and esophageal neoplasms demonstrated the excellent predictive performance of KATZNCP. CONCLUSION A new computational model, KATZNCP, was proposed for predicting potential miRNA-disease associations based on KATZ and network consistency projection, and it can effectively predict potential miRNA-disease interactions. Therefore, KATZNCP can be used to provide guidance for future experiments.
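The KATZ step on a heterogeneous network can be illustrated with a minimal sketch of the truncated Katz measure, which sums damped walk counts of increasing length. The block layout of the network, the damping factor `beta`, the walk-length cutoff `max_len`, and all function names are assumptions made for illustration; the network consistency projection step is not shown.

```python
def matmul(a, b):
    """Multiply two dense matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def build_heterogeneous(sim_m, assoc, sim_d):
    """Stack miRNA similarity SM, associations A and disease similarity SD
    into one block adjacency matrix [[SM, A], [A^T, SD]] (assumed layout)."""
    n_m, n_d = len(assoc), len(assoc[0])
    top = [sim_m[i] + assoc[i] for i in range(n_m)]
    bottom = [[assoc[i][j] for i in range(n_m)] + sim_d[j] for j in range(n_d)]
    return top + bottom


def katz_scores(sim_m, assoc, sim_d, beta=0.1, max_len=3):
    """Truncated Katz measure sum_{l=1..max_len} beta**l * H**l,
    returning only the miRNA-by-disease block of the result."""
    h = build_heterogeneous(sim_m, assoc, sim_d)
    n_m = len(assoc)
    size = len(h)
    total = [[0.0] * size for _ in range(size)]
    power = h
    for length in range(1, max_len + 1):
        if length > 1:
            power = matmul(power, h)  # H**length counts weighted walks
        for i in range(size):
            for j in range(size):
                total[i][j] += (beta ** length) * power[i][j]
    return [row[n_m:] for row in total[:n_m]]
```

With a single known association, a miRNA that is merely similar to the associated miRNA still receives a nonzero score through walks of length two, which is how the measure ranks unobserved pairs.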
Affiliation(s)
- Min Chen
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421002, China
- Yingwei Deng
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421002, China
- Zejun Li
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421002, China
- Yifan Ye
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421002, China
- Ziyi He
  - School of Computer Science and Technology, Hunan Institute of Technology, Hengyang, 421002, China
9
Li Z, Zhang Y, Bai Y, Xie X, Zeng L. IMC-MDA: Prediction of miRNA-disease association based on induction matrix completion. Math Biosci Eng 2023; 20:10659-10674. [PMID: 37322953 DOI: 10.3934/mbe.2023471] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Indexed: 06/17/2023]
Abstract
To comprehend the etiology and pathogenesis of many illnesses, it is essential to identify disease-associated microRNAs (miRNAs). However, current computational approaches face a number of challenges, such as the lack of "negative samples", that is, confirmed irrelevant miRNA-disease pairs, and poor performance in predicting miRNAs related to "isolated diseases", i.e. illnesses with no known associated miRNAs, which calls for novel computational methods. In this study, an inductive matrix completion model, referred to as IMC-MDA, was designed to predict associations between diseases and miRNAs. In IMC-MDA, the prediction score of each miRNA-disease pair is calculated by combining the known miRNA-disease associations with the integrated disease similarities and miRNA similarities. Based on LOOCV, IMC-MDA had an AUC of 0.8034, which shows better performance than previous methods. Furthermore, experiments have validated the prediction of disease-related miRNAs for three major human diseases: colon cancer, kidney cancer, and lung cancer.
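Inductive matrix completion in its generic form scores a pair as x^T W H^T y, where x and y are side-feature vectors (for example, similarity profiles) of the miRNA and the disease. The sketch below fits W and H by plain stochastic gradient descent on squared error. It is a generic illustration under stated assumptions, not the IMC-MDA implementation: unobserved pairs are treated as zeros, and the rank, learning rate, regularisation weight, and function names are all hypothetical.

```python
import random


def imc_score(x, y, w, h):
    """Score x^T W H^T y for feature vectors x (length p) and y (length q),
    with W of shape p-by-k and H of shape q-by-k."""
    k = len(w[0])
    xw = [sum(x[a] * w[a][c] for a in range(len(x))) for c in range(k)]
    yh = [sum(y[b] * h[b][c] for b in range(len(y))) for c in range(k)]
    return sum(u * v for u, v in zip(xw, yh))


def fit_imc(assoc, feat_m, feat_d, rank=2, lr=0.05, reg=0.01,
            epochs=500, seed=0):
    """Fit W and H by SGD on squared error over all pairs.
    Treating unobserved pairs as 0 is a simplifying assumption."""
    rng = random.Random(seed)
    p, q = len(feat_m[0]), len(feat_d[0])
    w = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(p)]
    h = [[rng.uniform(-0.1, 0.1) for _ in range(rank)] for _ in range(q)]
    for _ in range(epochs):
        for i, x in enumerate(feat_m):
            for j, y in enumerate(feat_d):
                # Project both feature vectors before updating anything.
                xw = [sum(x[a] * w[a][c] for a in range(p)) for c in range(rank)]
                yh = [sum(y[b] * h[b][c] for b in range(q)) for c in range(rank)]
                err = sum(u * v for u, v in zip(xw, yh)) - assoc[i][j]
                for a in range(p):
                    for c in range(rank):
                        w[a][c] -= lr * (err * x[a] * yh[c] + reg * w[a][c])
                for b in range(q):
                    for c in range(rank):
                        h[b][c] -= lr * (err * y[b] * xw[c] + reg * h[b][c])
    return w, h
```

Because the score depends only on feature vectors, a disease with no known associations can still be scored as long as its similarity profile is available, which is the property that makes the inductive form attractive for "isolated diseases".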
Affiliation(s)
- Zejun Li
  - School of Computer and Information Science, Hunan Institute of Technology, Hengyang 412002, China
- Yuxiang Zhang
  - School of Computer and Artificial Intelligence, Zhengzhou University, Zhengzhou, Henan, 450001, China
- Yuting Bai
  - College of Information Science and Engineering, Hunan University, Changsha 410082, Hunan, China
- Xiaohui Xie
  - School of Computer and Information Science, Hunan Institute of Technology, Hengyang 412002, China
- Lijun Zeng
  - School of Computer and Information Science, Hunan Institute of Technology, Hengyang 412002, China
10
Song BF, Xu LZ, Jiang K, Cheng F. MiR-124-3p inhibits tumor progression in prostate cancer by targeting EZH2. Funct Integr Genomics 2023; 23:80. [PMID: 36884182 PMCID: PMC9995421 DOI: 10.1007/s10142-023-00991-8] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 11/08/2022] [Revised: 02/05/2023] [Accepted: 02/16/2023] [Indexed: 03/09/2023]
Abstract
Prostate cancer (PCa) is a widespread cancer with significant morbidity and mortality rates. MicroRNAs (miRNAs) have been identified as important post-transcriptional modulators in various malignancies. This study investigated the effect of miR-124-3p on PCa cell proliferation, invasion, and apoptosis. EZH2 and miR-124-3p expression levels were measured in PCa tissues. PCa cell lines DU145 and PC3 were transfected with miR-124-3p inhibitors or analogs. The interaction between EZH2 and miR-124-3p was validated with a luciferase reporter assay. Cell viability and apoptosis were assessed by MTT assay and flow cytometry, respectively. Cell invasion was assessed using transwell assays. EZH2, AKT, and mTOR levels were assessed using qRT-PCR and western blotting. In clinical PCa specimens, miR-124-3p and EZH2 levels were inversely correlated. Further experiments demonstrated that EZH2 is a direct target of miR-124-3p. Furthermore, miR-124-3p overexpression reduced EZH2 levels, lowered cell viability and invasion, and promoted cell death, whereas miR-124-3p silencing had the opposite effect. Overexpression of miR-124-3p decreased the phosphorylation level of AKT and mTOR, whereas miR-124-3p downregulation produced the opposite result. Our findings indicate that miR-124-3p inhibits PCa cell proliferation and invasion while promoting apoptosis by targeting EZH2.
Affiliation(s)
- Bao-Feng Song
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, 430060, Hubei Province, People's Republic of China
- Li-Zhe Xu
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, 430060, Hubei Province, People's Republic of China
- Kun Jiang
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, 430060, Hubei Province, People's Republic of China
- Fan Cheng
  - Department of Urology, Renmin Hospital of Wuhan University, Wuhan, 430060, Hubei Province, People's Republic of China
11
Tian Z, Fang H, Teng Z, Ye Y. GOGCN: Graph Convolutional Network on Gene Ontology for Functional Similarity Analysis of Genes. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1053-1064. [PMID: 35687647 DOI: 10.1109/tcbb.2022.3181300] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
The measurement of gene functional similarity plays a critical role in numerous biological applications, such as gene clustering and the construction of gene similarity networks. However, most existing approaches still rely heavily on traditional computational strategies, which are not guaranteed to achieve satisfactory performance. In this study, we propose a novel computational approach called GOGCN to measure gene functional similarity by modeling the Gene Ontology (GO) through a Graph Convolutional Network (GCN). GOGCN is a graph-based approach that performs representation learning for the terms and relations in the GO graph. First, GOGCN employs a GCN-based knowledge graph embedding (KGE) model to learn vector representations (i.e., embeddings) for all entities (i.e., terms). Second, GOGCN calculates the semantic similarity between two terms based on their corresponding vector representations. Finally, GOGCN estimates gene functional similarity using a pairwise strategy. During representation learning, GOGCN promotes semantic interaction between terms through the GCN, thereby capturing the rich structural information of the GO graph. Experimental results on various datasets suggest that GOGCN is superior to other state-of-the-art approaches, which shows its reliability and effectiveness.
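The last two steps, term similarity from embeddings and a pairwise strategy for genes, can be illustrated as follows. The sketch assumes cosine similarity between term vectors and the best-match average as the pairwise strategy; the abstract does not name either choice, and the function names and term identifiers are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def gene_similarity(terms_a, terms_b, embeddings):
    """Best-match average over the terms annotating two genes:
    each term is matched to its most similar term on the other side,
    and the two directional averages are averaged (assumed strategy)."""
    if not terms_a or not terms_b:
        return 0.0
    sim = {(s, t): cosine(embeddings[s], embeddings[t])
           for s in terms_a for t in terms_b}
    best_a = sum(max(sim[(s, t)] for t in terms_b) for s in terms_a) / len(terms_a)
    best_b = sum(max(sim[(s, t)] for s in terms_a) for t in terms_b) / len(terms_b)
    return (best_a + best_b) / 2
```

For example, with placeholder embeddings `{"t1": [1, 0], "t2": [0, 1], "t3": [1, 0]}`, a gene annotated with `t1` and `t2` compared against a gene annotated with `t3` gets a similarity of 0.75.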
12
Zhao X, Wu J, Zhao X, Yin M. Multi-view contrastive heterogeneous graph attention network for lncRNA-disease association prediction. Brief Bioinform 2023; 24:6931723. [PMID: 36528809 DOI: 10.1093/bib/bbac548] [Citation(s) in RCA: 11] [Impact Index Per Article: 5.5] [Received: 08/16/2022] [Revised: 10/23/2022] [Accepted: 11/11/2022] [Indexed: 12/23/2022]
Abstract
MOTIVATION Exploring potential long noncoding RNA (lncRNA)-disease associations (LDAs) plays a critical role in understanding disease etiology and pathogenesis. Given the high cost of biological experiments, developing a computational method is a practical necessity to accelerate the experimental screening of candidate LDAs. However, because LDA datasets are highly sparse, many computational models can hardly exploit enough knowledge to learn comprehensive patterns of node representations. Moreover, although the metapath-based GNN has recently been introduced into LDA prediction, it discards intermediate nodes along the meta-path, which results in information loss. RESULTS This paper presents a new multi-view contrastive heterogeneous graph attention network (GAT) for lncRNA-disease association prediction, MCHNLDA for brevity. Specifically, MCHNLDA first leverages rich biological data sources on lncRNAs, genes and diseases to construct graphs for two views: a feature structural graph for the feature schema view and a lncRNA-gene-disease heterogeneous graph for the network topology view. Then, we design a cross-contrastive learning task to collaboratively guide the graph embeddings of the two views without relying on any labels. In this way, we can pull nodes with similar features and network topology closer together and push other nodes away. Furthermore, we propose a heterogeneous contextual GAT, in which a long short-term memory network is incorporated into the attention mechanism to effectively capture sequential structure information along the meta-path. Extensive experimental comparisons against several state-of-the-art methods show the effectiveness of the proposed framework. The code and data of the proposed framework are freely available at https://github.com/zhaoxs686/MCHNLDA.
Affiliation(s)
- Xiaosa Zhao
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Jun Wu
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Xiaowei Zhao
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
- Minghao Yin
  - School of Information Science and Technology, Northeast Normal University, Changchun 130117, China
| |
Collapse
|
13
|
Zhou F, Yin MM, Zhao JX, Shang J, Liu JX. A Method Based On Dual-Network Information Fusion to Predict MiRNA-Disease Associations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2023; 20:52-60. [PMID: 34882558 DOI: 10.1109/tcbb.2021.3133006] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 05/04/2023]
Abstract
MicroRNAs (miRNAs) are single-stranded small RNAs. An increasing number of studies have shown that miRNAs play a vital role in many important biological processes. However, experimental methods for identifying unknown miRNA-disease associations (MDAs) are time-consuming and costly, and only a small percentage of MDAs have been verified by researchers. Therefore, there is a great need for fast and efficient methods to predict novel MDAs. In this paper, a new computational method based on Dual-Network Information Fusion (DNIF) is developed to predict potential MDAs. Specifically, on the one hand, two enhanced sub-models are integrated to reconstruct an effective prediction framework; on the other hand, the prediction performance of the algorithm is improved by fully fusing multiple sources of omics data, including the validated miRNA-disease association network, miRNA functional similarity, disease semantic similarity and Gaussian interaction profile (GIP) kernel network associations. As a result, DNIF achieves excellent performance under 5-fold cross validation (average AUC of 0.9571). In case studies of three important human diseases, our model achieved satisfactory performance in predicting potential miRNAs for these diseases. The reliable experimental results demonstrate that DNIF could serve as an effective computational method to accelerate the identification of MDAs.
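The abstract names the Gaussian interaction profile (GIP) kernel without giving its formula. The sketch below uses the commonly cited definition, K(i, j) = exp(-gamma * ||IP(i) - IP(j)||^2) with gamma equal to a base bandwidth divided by the mean squared norm of the profiles; whether DNIF uses exactly this form is an assumption, and the function name, the `gamma_prime` parameter and the toy matrix are hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """GIP kernel similarity between the rows of a binary association matrix."""
    n = len(profiles)
    # The bandwidth is normalised by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    gamma = gamma_prime / mean_sq_norm
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Toy association matrix (rows: miRNAs, columns: diseases); values are made up.
associations = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
mirna_similarity = gip_kernel(associations)
```

Passing the transposed matrix would give the corresponding disease-disease similarity. Rows with identical interaction profiles get similarity 1, and the similarity decays as profiles diverge.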
|
14
|
Liao Q, Ye Y, Li Z, Chen H, Zhuo L. Prediction of miRNA-disease associations in microbes based on graph convolutional networks and autoencoders. Front Microbiol 2023; 14:1170559. [PMID: 37187536 PMCID: PMC10175670 DOI: 10.3389/fmicb.2023.1170559] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/21/2023] [Accepted: 03/21/2023] [Indexed: 05/17/2023] Open
Abstract
MicroRNAs (miRNAs) are short RNA molecular fragments that regulate gene expression by targeting and inhibiting the expression of specific RNAs. Because microRNAs affect many diseases in microbial ecology, it is necessary to predict microRNAs' association with diseases at the microbial level. To this end, we propose a novel model, termed GCNA-MDA, in which a dual autoencoder and a graph convolutional network (GCN) are integrated to predict miRNA-disease associations. The proposed method leverages autoencoders to extract robust representations of miRNAs and diseases and, at the same time, exploits the GCN to capture the topological information of miRNA-disease networks. To alleviate the impact of insufficient information in the original data, the association similarity and feature similarity data are combined to calculate a more complete initial basic vector for each node. The experimental results on the benchmark datasets demonstrate that, compared with existing representative methods, the proposed method achieves superior performance, with precision reaching up to 0.8982. These results demonstrate that the proposed method can serve as a tool for exploring miRNA-disease associations in microbial environments.
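To make concrete how a GCN captures network topology, the following sketch implements one standard graph-convolution layer, ReLU(D^-1/2 (A + I) D^-1/2 H W). It is a generic illustration rather than the GCNA-MDA code; the helper names, the two-node graph and the weight values are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def gcn_layer(adjacency, features, weights):
    """One graph-convolution layer with self-loops and symmetric normalisation."""
    n = len(adjacency)
    # Add self-loops so that each node keeps part of its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    # Symmetric normalisation: D^-1/2 (A + I) D^-1/2.
    normalised = [[a_hat[i][j] / math.sqrt(degree[i] * degree[j]) for j in range(n)]
                  for i in range(n)]
    hidden = matmul(matmul(normalised, features), weights)
    # ReLU non-linearity.
    return [[max(0.0, v) for v in row] for row in hidden]


# Toy graph with one miRNA node and one disease node joined by an edge.
adjacency = [[0.0, 1.0], [1.0, 0.0]]
features = [[1.0], [3.0]]
smoothed = gcn_layer(adjacency, features, [[1.0]])
```

With an identity-like weight, each node's output is the normalised average of its own and its neighbour's features, which is how repeated layers spread information along the miRNA-disease network.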
Affiliation(s)
- Qingquan Liao
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Yuxiang Ye
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
- Zihang Li
- School of Computing and Data Science, Xiamen University Malaysia, Sepang, Selangor, Malaysia
- Hao Chen
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- *Correspondence: Hao Chen
- Linlin Zhuo
- School of Data Science and Artificial Intelligence, Wenzhou University of Technology, Wenzhou, China
- Linlin Zhuo
|
15
|
Huang L, Zhang L, Chen X. Updated review of advances in microRNAs and complex diseases: experimental results, databases, webservers and data fusion. Brief Bioinform 2022; 23:6696143. [PMID: 36094095 DOI: 10.1093/bib/bbac397] [Citation(s) in RCA: 55] [Impact Index Per Article: 18.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 06/20/2022] [Revised: 07/19/2022] [Accepted: 08/15/2022] [Indexed: 12/14/2022] Open
Abstract
MicroRNAs (miRNAs) are gene regulators involved in the pathogenesis of complex diseases such as cancers, and thus serve as potential diagnostic markers and therapeutic targets. The prerequisite for designing effective miRNA therapies is accurate discovery of miRNA-disease associations (MDAs), which has attracted substantial research interest during the last 15 years, as reflected by more than 55 000 related entries available on PubMed. Abundant experimental data gathered from the wealth of literature could effectively support the development of computational models for predicting novel associations. In 2017, Chen et al. published the first-ever comprehensive review on MDA prediction, presenting various relevant databases, 20 representative computational models, and suggestions for building more powerful ones. In the current review, as the continuation of the previous study, we revisit miRNA biogenesis, detection techniques and functions; summarize recent experimental findings related to common miRNA-associated diseases; introduce recent updates of miRNA-relevant databases and novel database releases since 2017; present mainstream webservers and new webserver releases since 2017; and finally elaborate on how fusion of diverse data sources has contributed to accurate MDA prediction.
Affiliation(s)
- Li Huang
- Academy of Arts and Design, Tsinghua University, Beijing, 10084, China
- The Future Laboratory, Tsinghua University, Beijing, 10084, China
- Li Zhang
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Xing Chen
- School of Information and Control Engineering, China University of Mining and Technology, Xuzhou, 221116, China
- Artificial Intelligence Research Institute, China University of Mining and Technology, Xuzhou, 221116, China
|
16
|
Ma J, Zhang L, Li S, Liu H. BRPCA: Bounded Robust Principal Component Analysis to Incorporate Similarity Network for N7-Methylguanosine (m7G) Site-Disease Association Prediction. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3295-3306. [PMID: 34469307 DOI: 10.1109/tcbb.2021.3109055] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Recent studies have revealed that N7-methylguanosine (m7G) plays a pivotal role in various biological processes and disease pathogenesis. To date, transcriptome-wide m7G modification sites have been identified by high-throughput sequencing approaches, and some related information has been recorded in a few biological databases. However, the mechanism by which these sites act in disease remains uncharted. Wet experiments can help identify true m7G sites with high confidence, but finding the true ones among such a large number of sites is time-consuming and costly. Thus, computational methods are urgently needed to predict the associations between m7G sites and various diseases and thereby help uncover potential active sites for specific diseases. In this article, we propose a bounded robust principal component analysis (BRPCA) method to predict unknown m7G-disease associations based on similarity information. Importantly, BRPCA tolerates the noise and redundancy existing in association and similarity information. Moreover, a suitable bounded constraint is incorporated into BRPCA to ensure that the predicted association scores lie in a meaningful interval. Extensive experiments demonstrate the superiority and robustness of BRPCA.
|
17
|
Li L, Gao Z, Zheng CH, Qi R, Wang YT, Ni JC. Predicting miRNA-Disease Association Based on Improved Graph Regression. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:3604-3613. [PMID: 34757912 DOI: 10.1109/tcbb.2021.3127017] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/13/2023]
Abstract
Recently, as a growing number of associations between microRNAs (miRNAs) and diseases have been discovered, researchers have gradually realized that miRNAs are closely related to several complicated biological processes and human diseases. Hence, it is especially important to construct usable models to infer associations between miRNAs and diseases. In this study, we present Improved Graph Regression for miRNA-Disease Association Prediction (IGRMDA) to identify potential relationships between miRNAs and diseases. To reduce the inherent noise in the acquired biological datasets, we used a matrix decomposition algorithm to process miRNA functional similarity and disease semantic similarity and then combined them with existing similarity information to obtain the final miRNA similarity data and disease similarity data. Then, we applied the miRNA-disease association data, miRNA similarity data and disease similarity data to form the corresponding latent spaces. Furthermore, we performed an improved graph regression algorithm in these latent spaces, which included the miRNA-disease association space, the miRNA similarity space and the disease similarity space. Non-negative matrix factorization and partial least squares were used in the graph regression process to obtain important related attributes. Cross-validation experiments and case studies were also carried out to demonstrate the effectiveness of IGRMDA, and they showed that IGRMDA could predict potential associations between miRNAs and diseases.
|
18
|
Chen B, Zhang J, Wang T, Shao C, Miao L, Zhang S, Shang X. Investigating the evolution process of lung adenocarcinoma via random walk and dynamic network analysis. Front Genet 2022; 13:953801. [PMID: 36246662 PMCID: PMC9559577 DOI: 10.3389/fgene.2022.953801] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/26/2022] [Accepted: 09/05/2022] [Indexed: 11/30/2022] Open
Abstract
Lung adenocarcinoma (LUAD) is a typical disease regarded as having multi-stage progression. However, many existing methods often ignore the critical differences among these stages, thereby limiting their effectiveness for discovering key biological molecules and biological functions as signals at each stage. In this study, we propose a method to discover the evolution between biological molecules and biological functions by investigating the multi-stage biological molecules of LUAD. The method is based on the random walk algorithm and the Monte Carlo method to generate clusters as the modules, which were used as subgraphs of the differentiated biological molecules network in each stage. The connection between modules of adjacent stages is based on the measurement of the Jaccard coefficient. The online gene set enrichment analysis tool (DAVID) was used to obtain biological functions corresponding to the individual important modules. The core evolution network was constructed by combining the aforementioned two networks. Since the networks here are all dynamic, we also propose a strategy to visualize the dynamic information together in one network. Eventually, 12 core modules and 11 core biological functions were found through such evolutionary analyses. Among the core biological functions that we obtained, six functions are related to the disease, the biological function of neutrophil chemotaxis is not directly associated with LUAD but can serve as a predictor, two functions may serve as a predictive signal, and two functions need to be verified through more biological evidence. Compared with two alternative design methods, the method proposed in this study performed more efficiently.
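The step that connects modules of adjacent stages through the Jaccard coefficient can be sketched as follows. This is a minimal illustration under assumptions: the abstract does not state a cut-off, so the `threshold` parameter is hypothetical, as are the function names and the placeholder module contents (`g1`, `g2`, ...).

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two collections."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def link_modules(stage_a, stage_b, threshold=0.5):
    """Return (module_a, module_b, score) for pairs whose overlap meets the threshold."""
    links = []
    for name_a, members_a in stage_a.items():
        for name_b, members_b in stage_b.items():
            score = jaccard(members_a, members_b)
            if score >= threshold:
                links.append((name_a, name_b, score))
    return links


# Placeholder modules for two adjacent stages (identifiers are not real genes).
stage_1 = {"M1": {"g1", "g2", "g3"}, "M2": {"g7", "g8"}}
stage_2 = {"N1": {"g2", "g3", "g4"}, "N2": {"g9"}}
links = link_modules(stage_1, stage_2)
```

Here only `M1` and `N1` are linked, because they share two of four distinct members; chaining such links across all stages would give a module-level evolution network of the kind the abstract describes.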
Affiliation(s)
- Bolin Chen
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Jinlei Zhang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Teng Wang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Ci Shao
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Lijun Miao
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
- Shengli Zhang
- School of Information Technology, Minzu Normal University of Xingyi, Xingyi, China
- Xuequn Shang
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
|
19
|
Li M, Fan Y, Zhang Y, Lv Z. Using Sequence Similarity Based on CKSNP Features and a Graph Neural Network Model to Identify miRNA-Disease Associations. Genes (Basel) 2022; 13:1759. [PMID: 36292644 PMCID: PMC9602123 DOI: 10.3390/genes13101759] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/04/2022] [Revised: 09/25/2022] [Accepted: 09/26/2022] [Indexed: 01/12/2024] Open
Abstract
Many machine learning studies of the relationship between miRNAs and diseases optimize prediction results by establishing different models and pay less attention to the feature information contained in the miRNA sequence itself. This study focused on the impact of different feature information of miRNA sequences on the relationship between miRNAs and diseases. It was found that, when the same graph neural network was used and miRNA features based on the K-spacer nucleic acid pair composition (CKSNAP) were adopted, a better graph neural network prediction model of miRNA-disease relationships could be built (AUC = 93.71%), which was 0.15% greater than the best model in the literature based on the same benchmark dataset. The optimized model was also used to predict miRNAs related to lung tumors, esophageal tumors, and kidney tumors; 47, 47, and 37 of the top 50 miRNAs predicted for these three diseases, respectively, were consistent with descriptions in the wet experiment validation database (dbDEMC).
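A sequence descriptor of the CKSNAP type can be sketched as below: for each gap size k, it records how often each of the 16 nucleotide pairs occurs with exactly k residues in between. This follows the general definition of the descriptor and is not taken from the paper; the `max_gap` default, the normalisation by the number of valid pairs and the toy sequence are assumptions.

```python
from itertools import product

NUCLEOTIDES = "ACGU"
PAIRS = ["".join(p) for p in product(NUCLEOTIDES, repeat=2)]  # AA, AC, ..., UU


def cksnap(sequence, max_gap=2):
    """Frequencies of k-spaced nucleotide pairs for k = 0 .. max_gap.

    Returns 16 * (max_gap + 1) values, one block of 16 frequencies per gap.
    """
    sequence = sequence.upper().replace("T", "U")
    features = []
    for gap in range(max_gap + 1):
        counts = dict.fromkeys(PAIRS, 0)
        total = 0
        for i in range(len(sequence) - gap - 1):
            pair = sequence[i] + sequence[i + gap + 1]
            if pair in counts:  # skip pairs containing ambiguous symbols
                counts[pair] += 1
                total += 1
        features.extend(counts[p] / total if total else 0.0 for p in PAIRS)
    return features


# Toy sequence, not a real miRNA.
vector = cksnap("ACGU", max_gap=1)
```

For the toy sequence, the gap-0 block gives AC, CG and GU a frequency of 1/3 each, and the gap-1 block gives AG and CU a frequency of 1/2 each. Vectors of this kind could then be compared to derive a sequence similarity between miRNAs.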
Affiliation(s)
- Mingxin Li
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Yu Fan
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
- Yiting Zhang
- College of Biology, Southwest Jiaotong University, Chengdu 611756, China
- College of Biology, Georgia State University, Atlanta, GA 30302-3965, USA
- Zhibin Lv
- College of Biomedical Engineering, Sichuan University, Chengdu 610065, China
|
20
|
GamalEl Din SF, Motawi AT, Rashed LA, Elghobary H, Saad HM, Ismail MM, Abdel‐latif HF. Study of the role of microRNAs 16 and 135a in patients with lifelong premature ejaculation receiving fluoxetine daily for 3 months: A prospective case control study. Andrologia 2022; 54:e14549. [DOI: 10.1111/and.14549] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/09/2022] [Revised: 06/26/2022] [Accepted: 07/13/2022] [Indexed: 11/29/2022] Open
Affiliation(s)
- Sameh Fayek GamalEl Din
- Department of Andrology, Sexology and STDs, Kasr AlAiny Faculty of Medicine Cairo University Giza Egypt
- Ahmad Tarek Motawi
- Department of Andrology, Sexology and STDs, Kasr AlAiny Faculty of Medicine Cairo University Giza Egypt
- Laila Ahmed Rashed
- Biochemistry Department, Kasr AlAiny Faculty of Medicine Cairo University Giza Egypt
- Hany Elghobary
- Chemical Pathology Department, Kasr AlAiny Faculty of Medicine Cairo University Giza Egypt
- Hany Mohammed Saad
- Department of Andrology, Faculty of Medicine Suez Canal University Ismailia Egypt
- Hesham Fouad Abdel‐latif
- Department of Andrology, Sexology and STDs, Kasr AlAiny Faculty of Medicine Cairo University Giza Egypt
|
21
|
Zhang W, Hou J, Liu B. iPiDA-LTR: Identifying piwi-interacting RNA-disease associations based on Learning to Rank. PLoS Comput Biol 2022; 18:e1010404. [PMID: 35969645 PMCID: PMC9410559 DOI: 10.1371/journal.pcbi.1010404] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 03/27/2022] [Revised: 08/25/2022] [Accepted: 07/18/2022] [Indexed: 12/01/2022] Open
Abstract
Piwi-interacting RNAs (piRNAs) are regarded as drug targets and biomarkers for the diagnosis and therapy of diseases. However, biological experiments cost substantial time and resources, and the existing computational methods only focus on identifying missing associations between known piRNAs and diseases. With the fast development of biological experiments, more and more piRNAs are being detected. Therefore, the identification of piRNA-disease associations for newly detected piRNAs has significant theoretical value and practical significance for understanding the pathogenesis of diseases. In this study, the iPiDA-LTR predictor is proposed to identify associations between piRNAs and diseases based on Learning to Rank. The iPiDA-LTR predictor not only identifies the missing associations between known piRNAs and diseases, but also detects diseases associated with newly detected piRNAs. Experimental results demonstrate that iPiDA-LTR effectively predicts piRNA-disease associations, outperforming other related methods.
Affiliation(s)
- Wenxiang Zhang
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Jialu Hou
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
|
22
|
Yang M, Huang ZA, Gu W, Han K, Pan W, Yang X, Zhu Z. Prediction of biomarker-disease associations based on graph attention network and text representation. Brief Bioinform 2022; 23:6651308. [PMID: 35901464 DOI: 10.1093/bib/bbac298] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 04/03/2022] [Revised: 06/28/2022] [Accepted: 06/30/2022] [Indexed: 02/06/2023] Open
Abstract
MOTIVATION The associations between biomarkers and human diseases play a key role in understanding complex pathology and developing targeted therapies. Wet lab experiments for biomarker discovery are costly, laborious and time-consuming. Computational prediction methods can be used to greatly expedite the identification of candidate biomarkers. RESULTS Here, we present a novel computational model named GTGenie for predicting the biomarker-disease associations based on graph and text features. In GTGenie, a graph attention network is utilized to characterize diverse similarities of biomarkers and diseases from heterogeneous information resources. Meanwhile, a pretrained BERT-based model is applied to learn the text-based representation of biomarker-disease relation from biomedical literature. The captured graph and text features are then integrated in a bimodal fusion network to model the hybrid entity representation. Finally, inductive matrix completion is adopted to infer the missing entries for reconstructing relation matrix, with which the unknown biomarker-disease associations are predicted. Experimental results on HMDD, HMDAD and LncRNADisease data sets showed that GTGenie can obtain competitive prediction performance with other state-of-the-art methods. AVAILABILITY The source code of GTGenie and the test data are available at: https://github.com/Wolverinerine/GTGenie.
Affiliation(s)
- Minghao Yang
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518000, China
- Zhi-An Huang
- Center for Computer Science and Information Technology, City University of Hong Kong Dongguan Research Institute, Dongguan, China
- Wenhao Gu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518000, China
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
- Kun Han
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
- Wenying Pan
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
- Xiao Yang
- GeneGenieDx Corp, 160 E Tasman Dr, San Jose, CA 95134
- Zexuan Zhu
- College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, 518000, China
|
23
|
Ni J, Li L, Wang Y, Ji C, Zheng C. MDSCMF: Matrix Decomposition and Similarity-Constrained Matrix Factorization for miRNA-Disease Association Prediction. Genes (Basel) 2022; 13:1021. [PMID: 35741782 PMCID: PMC9223216 DOI: 10.3390/genes13061021] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/01/2022] [Revised: 06/01/2022] [Accepted: 06/02/2022] [Indexed: 11/16/2022] Open
Abstract
MicroRNAs (miRNAs) are small non-coding RNAs that are related to a number of complicated biological processes, and numerous studies have demonstrated that miRNAs are closely associated with many human diseases. In this study, we present a matrix decomposition and similarity-constrained matrix factorization (MDSCMF) to predict potential miRNA-disease associations. First of all, we utilized a matrix decomposition (MD) algorithm to get rid of outliers from the miRNA-disease association matrix. Then, miRNA similarity was determined by utilizing similarity kernel fusion (SKF) to integrate miRNA function similarity and Gaussian interaction profile (GIP) kernel similarity, and disease similarity was determined by utilizing SKF to integrate disease semantic similarity and GIP kernel similarity. Furthermore, we added L2 regularization terms and similarity constraint terms to non-negative matrix factorization to form a similarity-constrained matrix factorization (SCMF) algorithm, which was applied to make prediction. MDSCMF achieved AUC values of 0.9488, 0.9540, and 0.8672 based on fivefold cross-validation (5-CV), global leave-one-out cross-validation (global LOOCV), and local leave-one-out cross-validation (local LOOCV), respectively. Case studies on three common human diseases were also implemented to demonstrate the prediction ability of MDSCMF. All experimental results confirmed that MDSCMF was effective in predicting underlying associations between miRNAs and diseases.
Affiliation(s)
- Jiancheng Ni
- Network Information Center, Qufu Normal University, Qufu 273165, China
- Lei Li
- School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China
- Yutian Wang
- School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China
- Cunmei Ji
- School of Cyber Science and Engineering, Qufu Normal University, Qufu 273165, China
- Chunhou Zheng
- School of Artificial Intelligence, Anhui University, Hefei 230601, China
|
24
|
Xu H, Hu X, Yan X, Zhong W, Yin D, Gai Y. Exploring noncoding RNAs in thyroid cancer using a graph convolutional network approach. Comput Biol Med 2022; 145:105447. [DOI: 10.1016/j.compbiomed.2022.105447] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 01/13/2022] [Revised: 03/20/2022] [Accepted: 03/21/2022] [Indexed: 12/01/2022]
|
25
|
Zhao S, Pan Q, Zou Q, Ju Y, Shi L, Su X. Identifying and Classifying Enhancers by Dinucleotide-Based Auto-Cross Covariance and Attention-Based Bi-LSTM. COMPUTATIONAL AND MATHEMATICAL METHODS IN MEDICINE 2022; 2022:7518779. [PMID: 35422876 PMCID: PMC9005296 DOI: 10.1155/2022/7518779] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Download PDF] [Figures] [Subscribe] [Scholar Register] [Received: 12/20/2021] [Accepted: 03/12/2022] [Indexed: 11/17/2022]
Abstract
Enhancers are a class of noncoding DNA elements located near structural genes. In recent years, their identification and classification have been the focus of research in the field of bioinformatics. However, due to their high free scattering and position variability, although the performance of the prediction model has been continuously improved, there is still a lot of room for progress. In this paper, density-based spatial clustering of applications with noise (DBSCAN) was used to screen the physicochemical properties of dinucleotides to extract dinucleotide-based auto-cross covariance (DACC) features; then, the features are reduced by feature selection Python toolkit MRMD 2.0. The reduced features are input into the random forest to identify enhancers. The enhancer classification model was built by word2vec and attention-based Bi-LSTM. Finally, the accuracies of our enhancer identification and classification models were 77.25% and 73.50%, respectively, and the Matthews' correlation coefficients (MCCs) were 0.5470 and 0.4881, respectively, which were better than the performance of most predictors.
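The two metrics reported in this abstract, accuracy and the Matthews correlation coefficient (MCC), are both computed from the confusion matrix of a binary classifier. The sketch below shows the standard formulas; the function names and the counts are hypothetical and are not the paper's results.

```python
import math


def accuracy_from_counts(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc_from_counts(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when a marginal total is zero, a common convention.
    """
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0
    return (tp * tn - fp * fn) / denominator


# Hypothetical counts for a balanced test set of 200 sequences.
example_accuracy = accuracy_from_counts(tp=80, tn=70, fp=30, fn=20)
example_mcc = mcc_from_counts(tp=80, tn=70, fp=30, fn=20)
```

MCC ranges from -1 to 1 and stays near 0 for a classifier that guesses at random, which is why it is often reported alongside accuracy.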
Affiliation(s)
- Shulin Zhao
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Qingfeng Pan
- General Hospital of Heilongjiang Province Land Reclamation Bureau, Harbin, China
- Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, Zhejiang, China
- Ying Ju
- School of Informatics, Xiamen University, Xiamen, China
- Lei Shi
- Department of Spine Surgery, Changzheng Hospital, Naval Medical University, Shanghai, China
- Xi Su
- Foshan Maternal and Child Health Hospital, Foshan, Guangdong, China
|
26
|
Chen Y, Wang Y, Ding Y, Su X, Wang C. RGCNCDA: Relational graph convolutional network improves circRNA-disease association prediction by incorporating microRNAs. Comput Biol Med 2022; 143:105322. [PMID: 35217342 DOI: 10.1016/j.compbiomed.2022.105322] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 02/05/2022] [Revised: 02/11/2022] [Accepted: 02/13/2022] [Indexed: 12/21/2022]
Abstract
Recently, a large number of studies have indicated that circRNAs with covalently closed loops play important roles in biological processes and have potential as diagnostic biomarkers. Therefore, research on the circRNA-disease relationship is helpful in disease diagnosis and treatment. However, traditional biological verification methods require considerable labor and time costs. In this paper, we propose a new computational method (RGCNCDA) to predict circRNA-disease associations based on relational graph convolutional networks (R-GCNs). The method first integrates the circRNA similarity network, miRNA similarity network, disease similarity network and association networks among them to construct a global heterogeneous network. Then, it employs the random walk with restart (RWR) and principal component analysis (PCA) models to learn low-dimensional and high-order information from the global heterogeneous network as the topological features. Finally, a prediction model based on an R-GCN encoder and a DistMult decoder is built to predict the potential disease-associated circRNA. The predicted results demonstrate that RGCNCDA performs significantly better than the other six state-of-the-art methods in a 5-fold cross validation. Furthermore, the case study illustrates that RGCNCDA can effectively discover potential circRNA-disease associations.
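The DistMult decoder mentioned in this abstract scores a candidate link as a three-way product of the two node embeddings and a relation-specific vector. The sketch below shows that scoring function with a sigmoid on top; it is not the RGCNCDA code, and the function names and embedding values are hypothetical stand-ins for what an encoder would produce.

```python
import math


def distmult_score(subject, relation, obj):
    """DistMult score: sum over dimensions of subject * relation * object."""
    return sum(s * r * o for s, r, o in zip(subject, relation, obj))


def association_probability(subject, relation, obj):
    """Squash the DistMult score into (0, 1) with a sigmoid."""
    return 1.0 / (1.0 + math.exp(-distmult_score(subject, relation, obj)))


# Hypothetical 2-dimensional embeddings for a circRNA, a disease and the
# "associated with" relation.
circrna = [1.0, 2.0]
disease = [3.0, 4.0]
associated_with = [1.0, 0.5]

score = distmult_score(circrna, associated_with, disease)
probability = association_probability(circrna, associated_with, disease)
```

Because the relation acts as a diagonal matrix, the score is symmetric in its two entity arguments, which suits an undirected association such as circRNA-disease.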
Affiliation(s)
- Yaojia Chen
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Yanpeng Wang
- Beidahuang Industry Group General Hospital, Harbin, China
- Yijie Ding
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
- Xi Su
- Foshan Maternity & Child Healthcare Hospital, Southern Medical University, Foshan, China
- Chunyu Wang
- Faculty of Computing, Harbin Institute of Technology, Harbin, China
|
27
|
Cai L, Gao M, Ren X, Fu X, Xu J, Wang P, Chen Y. MILNP: Plant lncRNA-miRNA Interaction Prediction Based on Improved Linear Neighborhood Similarity and Label Propagation. FRONTIERS IN PLANT SCIENCE 2022; 13:861886. [PMID: 35401586 PMCID: PMC8990282 DOI: 10.3389/fpls.2022.861886] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Key Words] [Grants] [Track Full Text] [Figures] [Subscribe] [Scholar Register] [Received: 01/25/2022] [Accepted: 02/21/2022] [Indexed: 06/14/2023]
Abstract
Knowledge of the interactions between long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) is the basis of understanding various biological activities and designing new drugs. Computational methods for predicting lncRNA-miRNA interactions in plants have been lacking, and existing methods suffer from various limitations that affect their prediction accuracy and applicability. Research on plant lncRNA-miRNA interactions is still in its infancy. In this paper, we propose an accurate predictor, MILNP, for predicting plant lncRNA-miRNA interactions based on an improved linear neighborhood similarity measurement and a linear neighborhood propagation algorithm. Specifically, we propose a novel similarity measure based on linear neighborhood similarity from multiple similarity profiles of lncRNAs and miRNAs and derive more precise neighborhood ranges so as to escape the limits of the existing methods. We then simultaneously update the lncRNA-miRNA interactions predicted from both similarity matrices based on label propagation. We comprehensively evaluate MILNP on the latest plant lncRNA-miRNA interaction benchmark datasets. The results demonstrate the superior performance of MILNP over the most up-to-date methods. Moreover, MILNP can be applied to isolated plant lncRNAs (or miRNAs). Case studies suggest that MILNP can identify novel plant lncRNA-miRNA interactions, which are confirmed by classical tools. The implementation is available at https://github.com/HerSwain/gra/tree/MILNP.
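Label propagation over a similarity matrix, the final step named in this abstract, can be sketched with the usual iterative update F ← alpha * W * F + (1 - alpha) * Y. This is a generic version under assumptions: a plain row-normalised similarity stands in for MILNP's improved linear neighborhood similarity, only one similarity matrix is propagated, and the function name, `alpha` and the toy data are hypothetical.

```python
def label_propagation(similarity, labels, alpha=0.5, tol=1e-10, max_iter=1000):
    """Spread known interaction labels to similar rows until the scores settle."""
    n, m = len(similarity), len(labels[0])
    # Row-normalise the similarity matrix (all-zero rows stay zero).
    weights = []
    for row in similarity:
        total = sum(row)
        weights.append([v / total if total else 0.0 for v in row])
    scores = [[float(v) for v in row] for row in labels]
    for _ in range(max_iter):
        updated = [
            [alpha * sum(weights[i][k] * scores[k][j] for k in range(n))
             + (1 - alpha) * labels[i][j]
             for j in range(m)]
            for i in range(n)
        ]
        change = max(abs(updated[i][j] - scores[i][j])
                     for i in range(n) for j in range(m))
        scores = updated
        if change < tol:
            break
    return scores


# Toy example: three lncRNAs in a chain of similarity and one miRNA, with a
# single known interaction for the first lncRNA.
lncrna_similarity = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
known_interactions = [[1], [0], [0]]
predicted = label_propagation(lncrna_similarity, known_interactions)
```

The second and third lncRNAs receive non-zero scores that shrink with their distance from the labelled one, so entities without known interactions can still be ranked through their similarity to labelled neighbours.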
Affiliation(s)
- Xiangzheng Fu
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Peng Wang
  - College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
|
28
|
Zhang Y, Chen L, Li S. CIPHER-SC: Disease-Gene Association Inference Using Graph Convolution on a Context-Aware Network With Single-Cell Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2022; 19:819-829. [PMID: 32809944 DOI: 10.1109/tcbb.2020.3017547] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
Inference of disease-gene associations helps unravel the pathogenesis of diseases and contributes to their treatment. Although many machine learning-based methods have been developed to predict causative genes, accurate association inference remains challenging. One major reason is the inaccurate feature selection and accumulation of error brought by the commonly used multi-stage training architecture. In addition, the existing methods do not incorporate cell-type-specific information and thus fail to study gene functions at a higher resolution. Therefore, we introduce single-cell transcriptome data and construct a context-aware network to unbiasedly integrate all data sources. Then we develop a graph convolution-based approach named CIPHER-SC to realize a complete end-to-end learning architecture. Our approach outperforms four state-of-the-art approaches in five-fold cross-validations on three distinct test sets with the best AUC of 0.9501, demonstrating its stable ability either to predict the novel genes or to predict with genetic basis. The ablation study shows that our complete end-to-end design and unbiased data integration boost the performance from 0.8727 to 0.9443 in AUC. The addition of single-cell data further improves the prediction accuracy and enriches our results for cell-type-specific genes. These results confirm the ability of CIPHER-SC to discover reliable disease genes. Our implementation is available at http://github.com/YidingZhang117/CIPHER-SC.
|
29
|
Li Z, Zhong T, Huang D, You ZH, Nie R. Hierarchical graph attention network for miRNA-disease association prediction. Mol Ther 2022; 30:1775-1786. [PMID: 35121109 PMCID: PMC9077381 DOI: 10.1016/j.ymthe.2022.01.041] [Citation(s) in RCA: 43] [Impact Index Per Article: 14.3] [Reference Citation Analysis] [Abstract] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/25/2021] [Revised: 12/29/2021] [Accepted: 01/28/2022] [Indexed: 11/25/2022] Open
Abstract
Many biological studies show that the mutation and abnormal expression of microRNAs (miRNAs) could cause a variety of diseases. As important biomarkers for disease diagnosis, miRNAs help in understanding pathogenesis and could promote the identification, diagnosis and treatment of diseases. However, the pathogenic mechanism by which miRNAs affect these diseases has not been fully understood. Therefore, predicting the potential miRNA-disease associations is of great importance for the development of clinical medicine and drug research. In this study, we proposed a novel deep learning model based on a hierarchical graph attention network for predicting miRNA-disease associations (HGANMDA). Firstly, we constructed a miRNA-disease-lncRNA heterogeneous graph based on known miRNA-disease associations, miRNA-lncRNA associations and disease-lncRNA associations. Secondly, the node-layer attention was applied to learn the importance of neighbor nodes based on different meta-paths. Thirdly, the semantic-layer attention was applied to learn the importance of different meta-paths. Finally, a bilinear decoder was employed to reconstruct the connections between miRNAs and diseases. The extensive experimental results indicated that our model achieved good performance and satisfactory results in predicting miRNA-disease associations.
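The final step of this abstract, a bilinear decoder, can be sketched as follows. This is an illustrative form only, assuming the common definition score = sigmoid(m^T W d) for a miRNA embedding m, a disease embedding d, and a learnable matrix W. The abstract does not give the exact formulation used by HGANMDA, and the names and toy values below are hypothetical.

```python
import math


def bilinear_score(mirna_vec, weight, disease_vec):
    """Assumed bilinear decoder: sigmoid(m^T W d).

    mirna_vec:   embedding of one miRNA (length p).
    weight:      p x q matrix (learned in a real model, fixed here).
    disease_vec: embedding of one disease (length q).
    """
    raw = sum(
        mirna_vec[i] * weight[i][j] * disease_vec[j]
        for i in range(len(mirna_vec))
        for j in range(len(disease_vec))
    )
    return 1.0 / (1.0 + math.exp(-raw))


# Toy embeddings with an identity weight matrix: aligned vectors score
# above 0.5, orthogonal vectors score exactly 0.5.
identity = [[1.0, 0.0], [0.0, 1.0]]
aligned = bilinear_score([1.0, 0.0], identity, [1.0, 0.0])
orthogonal = bilinear_score([1.0, 0.0], identity, [0.0, 1.0])
```

With an identity weight matrix the decoder reduces to a sigmoid of the dot product; a trained W lets the model weight interactions between different embedding dimensions.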
|
30
|
Li X, Ai H, Li B, Zhang C, Meng F, Ai Y. MIMRDA: A Method Incorporating the miRNA and mRNA Expression Profiles for Predicting miRNA-Disease Associations to Identify Key miRNAs (microRNAs). Front Genet 2022; 13:825318. [PMID: 35154284 PMCID: PMC8829120 DOI: 10.3389/fgene.2022.825318] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Reference Citation Analysis] [Abstract] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 11/30/2021] [Accepted: 01/10/2022] [Indexed: 01/22/2023] Open
Abstract
Identifying cancer-related miRNAs (or microRNAs) that precisely target mRNAs is important for diagnosis and treatment of cancer. Creating novel methods to identify candidate miRNAs has become a pressing frontier of research in the field. One major obstacle lies in the integration of the state-of-the-art databases. Here, we introduce a novel method, MIMRDA, which incorporates the miRNA and mRNA expression profiles for predicting miRNA-disease associations to identify key miRNAs. As a proof-of-principle study, we use the MIMRDA method to analyze TCGA datasets of 20 types (BLCA, BRCA, CESE, CHOL, COAD, ESCA, HNSC, KICH, KIRC, KIRP, LIHC, LUAD, LUSC, PAAD, PRAD, READ, SKCM, STAD, THCA and UCEC) of cancer, which identified hundreds of top-ranked miRNAs. Some of them (Category 1) are endorsed by public databases including TCGA, miRTarBase, miR2Disease, HMDD, MISIM, ncDR and mTD; others (Category 2) are supported by literature evidence. miR-21 (representing Category 1) and miR-1258 (representing Category 2) display the excellent characteristics of biomarkers in multi-dimensional assessments focusing on function similarity analysis, overall survival analysis, and analysis of sensitivity or resistance to anti-cancer drugs. We compare the performance of the MIMRDA method with that of the Limma and SPIA packages, and estimate the accuracy of the MIMRDA method in classifying top-ranked miRNAs via the Random Forest simulation test. Our results indicate the superiority and effectiveness of the MIMRDA method, and recommend some top-ranked key miRNAs as potential biomarkers that warrant experimental validation.
Affiliation(s)
- Xianbin Li
  - State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Hannan Ai
  - State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
  - Department of Electrical and Computer Engineering, The Grainger College of Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, United States
  - National Center for Quality Supervision and Inspection of Automatic Equipment, National Center for Testing and Evaluation of Robots (Guangzhou), CRAT, SINOMACH-IT, Guangzhou, China
- Bizhou Li
  - State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Chaohui Zhang
  - State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Fanmei Meng
  - State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- Yuncan Ai
  - State Key Laboratory for Biocontrol, School of Life Sciences, Sun Yat-sen University, Guangzhou, China
- *Correspondence: Yuncan Ai; Hannan Ai
|
31
|
Predicting miRNA-Disease Association Based on Neural Inductive Matrix Completion with Graph Autoencoders and Self-Attention Mechanism. Biomolecules 2022; 12:biom12010064. [PMID: 35053212 PMCID: PMC8774034 DOI: 10.3390/biom12010064] [Citation(s) in RCA: 17] [Impact Index Per Article: 5.7] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 12/14/2021] [Revised: 12/29/2021] [Accepted: 12/31/2021] [Indexed: 02/06/2023] Open
Abstract
Many studies have clarified that microRNAs (miRNAs) are associated with many human diseases. Therefore, it is essential to predict potential miRNA-disease associations for disease pathogenesis and treatment. Numerous machine learning and deep learning approaches have been applied to this problem. In this paper, we propose a Neural Inductive Matrix completion-based method with Graph Autoencoders (GAE) and Self-Attention mechanism for miRNA-disease associations prediction (NIMGSA). Some of the previous works based on matrix completion ignore the importance of the label propagation procedure for inferring miRNA-disease associations, while others cannot integrate matrix completion and label propagation effectively. Unlike previous studies, NIMGSA unifies inductive matrix completion and label propagation via a neural network architecture, through the collaborative training of two graph autoencoders. This neural inductive matrix completion-based method is also an implementation of the self-attention mechanism for miRNA-disease associations prediction. This end-to-end framework can strengthen the robustness and precision of both matrix completion and label propagation. Cross validations indicate that NIMGSA outperforms current miRNA-disease prediction methods. Case studies demonstrate that NIMGSA is competent in detecting potential miRNA-disease associations.
|
32
|
Yu H, Dong W, Shi J. RANEDDI: Relation-aware network embedding for drug-drug interaction prediction. Inf Sci (N Y) 2022. [DOI: 10.1016/j.ins.2021.09.008] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Reference Citation Analysis] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Indexed: 12/26/2022]
|
33
|
Guo Y, Ju Y, Chen D, Wang L. Research on the Computational Prediction of Essential Genes. Front Cell Dev Biol 2021; 9:803608. [PMID: 34938741 PMCID: PMC8685449 DOI: 10.3389/fcell.2021.803608] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/28/2021] [Accepted: 11/22/2021] [Indexed: 11/19/2022] Open
Abstract
Genes, the nucleotide sequences that encode a polypeptide chain or functional RNA, are the basic genetic unit controlling biological traits. They are the guarantee of the basic structures and functions in organisms, and they store information related to biological factors and processes such as blood type, gestation, growth, and apoptosis. The environment and genetics jointly affect important physiological processes such as reproduction, cell division, and protein synthesis. Genes are related to a wide range of phenomena including growth, decline, illness, aging, and death. During the evolution of organisms, there is a class of genes that exist in a conserved form in multiple species. These genes are often located on the dominant strand of DNA and tend to have higher expression levels. The proteins they encode usually either perform very important functions or are responsible for maintaining and repairing these essential functions. Such genes are called persistent genes. Among them, the genes that are irreplaceable for the body's life activities are the essential genes. For example, when starch is the only source of energy, the genes related to starch digestion are essential genes. Without them, the organism will die because it cannot obtain enough energy to maintain basic functions. The function of the proteins encoded by these genes is thought to be fundamental to life. Nowadays, DNA can be extracted from blood, saliva, or tissue cells for genetic testing, and detailed genetic information can be obtained using the most advanced scientific instruments and technologies. The information gained from genetic testing is useful to assess the potential risks of disease, and to help determine the prognosis and development of diseases. Such information is also useful for developing personalized medication and providing targeted health guidance to improve the quality of life. Therefore, it is of great theoretical and practical significance to identify important and essential genes.
In this paper, the research status of essential genes and the essential genome database of bacteria are reviewed, the computational prediction method of essential genes based on communication coding theory is expounded, and the significance and practical application value of essential genes are discussed.
Affiliation(s)
- Yuxin Guo
  - Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, China
  - Key Laboratory of Computational Science and Application of Hainan Province, Haikou, China
  - Key Laboratory of Data Science and Intelligence Education, Hainan Normal University, Ministry of Education, Haikou, China
  - School of Mathematics and Statistics, Hainan Normal University, Haikou, China
- Ying Ju
  - School of Informatics, Xiamen University, Xiamen, China
- Dong Chen
  - College of Electrical and Information Engineering, Quzhou University, Quzhou, China
- Lihong Wang
  - Beidahuang Industry Group General Hospital, Harbin, China
|
34
|
RFLMDA: A Novel Reinforcement Learning-Based Computational Model for Human MicroRNA-Disease Association Prediction. Biomolecules 2021; 11:biom11121835. [PMID: 34944479 PMCID: PMC8699433 DOI: 10.3390/biom11121835] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 10/19/2021] [Revised: 12/01/2021] [Accepted: 12/02/2021] [Indexed: 11/23/2022] Open
Abstract
Numerous studies have confirmed that microRNAs play a crucial role in complex human diseases. Identifying the relationship between miRNAs and diseases is important for improving the treatment of complex diseases. However, traditional biological experiments have their limitations, and computational methods are urgently needed to predict unknown miRNA-disease associations. In this work, we use the Q-learning algorithm from reinforcement learning to propose the RFLMDA model, in which three submodels, CMF, NRLMF, and LapRLS, are fused via the Q-learning algorithm to obtain the optimal weight S. The performance of RFLMDA was evaluated through five-fold cross-validation and local validation. As a result, the optimal weight is obtained as S (0.1735, 0.2913, 0.5352), and the AUC is 0.9416. Comparison experiments with other methods show that the RFLMDA model has better performance. To better validate the predictive performance of RFLMDA, we use eight diseases for local verification and carry out case studies on three common human diseases. Consequently, all the top 50 miRNAs related to Colorectal Neoplasms and Breast Neoplasms have been confirmed. Among the top 50 miRNAs related to Colon Neoplasms, Gastric Neoplasms, Pancreatic Neoplasms, Kidney Neoplasms, Esophageal Neoplasms, and Lymphoma, we confirm 47, 41, 49, 46, 46 and 48 miRNAs, respectively.
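The fusion step can be sketched as a weighted sum of the three submodels' score matrices. This is an assumption about how the weight S is applied: the abstract reports only the weight values, and pairing them with CMF, NRLMF, and LapRLS in that order is a guess. The function name and toy matrices are hypothetical, and the Q-learning search that produced S is not reproduced here.

```python
# Weights reported in the abstract; their mapping to (CMF, NRLMF, LapRLS)
# in this order is an assumption.
RFLMDA_WEIGHTS = (0.1735, 0.2913, 0.5352)


def fuse_scores(score_matrices, weights=RFLMDA_WEIGHTS):
    """Element-wise weighted sum of equally shaped score matrices."""
    if len(score_matrices) != len(weights):
        raise ValueError("need exactly one weight per score matrix")
    rows = len(score_matrices[0])
    cols = len(score_matrices[0][0])
    return [
        [
            sum(w * matrix[i][j] for w, matrix in zip(weights, score_matrices))
            for j in range(cols)
        ]
        for i in range(rows)
    ]


# Toy 1 x 2 score matrices, one per submodel.
fused = fuse_scores([
    [[1.0, 0.0]],
    [[0.0, 1.0]],
    [[0.0, 0.0]],
])
```

The reported weights sum to 1, so the fused score stays on the same scale as the submodel scores.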
|
35
|
Nguyen VT, Le TTK, Nguyen TQV, Tran DH. Inferring miRNA-disease associations using collaborative filtering and resource allocation on a tripartite graph. BMC Med Genomics 2021; 14:225. [PMID: 34789252 PMCID: PMC8600685 DOI: 10.1186/s12920-021-01078-8] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Reference Citation Analysis] [Abstract] [Key Words] [MESH Headings] [Grants] [Track Full Text] [Download PDF] [Figures] [Journal Information] [Subscribe] [Scholar Register] [Received: 08/18/2021] [Accepted: 09/07/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Efficient and successful computational methods to infer potential miRNA-disease associations are urgently needed and have attracted many computer scientists in recent years. The reason is that miRNAs are involved in many important biological processes and it is tremendously expensive and time-consuming to do biological experiments to verify miRNA-disease associations. METHODS In this paper, we proposed a new method to infer miRNA-disease associations using collaborative filtering and resource allocation algorithms on a miRNA-disease-lncRNA tripartite graph. It combined the collaborative filtering algorithm of the CFNBC model, which addresses the problem of imbalanced data, with the association prediction method of the TPGLDA model, which builds on multiple types of known associations among multiple objects. RESULTS The experimental results showed that our proposed method achieved a reliable performance with Area Under the ROC Curve (AUC) and Area Under the Precision-Recall Curve (AUPR) values of 0.9788 and 0.9373, respectively, under fivefold cross-validation experiments. It outperformed some previous methods such as DCSMDA and TPGLDA. Furthermore, it demonstrated the ability to derive new associations between miRNAs and diseases, with 8, 19 and 14 new associations among the top 40 predicted associations in case studies of Prostatic Neoplasms, Heart Failure, and Glioma, respectively. All of these newly predicted associations have been confirmed by recent literature. Besides, it could discover new associations for new diseases (or miRNAs) without any known associations, as demonstrated in the case study of Open-angle glaucoma. CONCLUSION With its reliable performance in inferring new associations between miRNAs and diseases as well as in discovering new associations for new diseases (or miRNAs) without any known associations, our proposed method can be considered a powerful tool to infer miRNA-disease associations.
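The AUC reported here, and in several other abstracts in this list, can be read as the probability that a randomly chosen known association is scored above a randomly chosen unknown pair. A minimal sketch of that rank-based definition follows. It is a generic illustration rather than any paper's evaluation code, and the function name and toy values are hypothetical.

```python
def auc_by_ranking(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    scores: predicted association scores.
    labels: 1 for a known association, 0 otherwise.
    Ties between a positive and a negative count as half a correct pair.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(positives) * len(negatives))


perfect = auc_by_ranking([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0])
mixed = auc_by_ranking([0.9, 0.1, 0.8, 0.3], [1, 1, 0, 0])
```

The pairwise loop is quadratic in the number of samples, which is fine for an illustration; evaluation code for real datasets would normally sort the scores once instead.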
Affiliation(s)
- Van Tinh Nguyen
  - Faculty of Information Technology, Hanoi University of Industry, Hanoi, Vietnam
  - Faculty of Information Technology, Hanoi National University of Education, Hanoi, Vietnam
- Thi Tu Kien Le
  - Faculty of Information Technology, Hanoi National University of Education, Hanoi, Vietnam
- Tran Quoc Vinh Nguyen
  - Faculty of Information Technology, The University of Da Nang - University of Science and Education, Da Nang, Vietnam
- Dang Hung Tran
  - Faculty of Information Technology, Hanoi National University of Education, Hanoi, Vietnam
|
36
|
Graph convolutional network approach to discovering disease-related circRNA-miRNA-mRNA axes. Methods 2021; 198:45-55. [PMID: 34758394 DOI: 10.1016/j.ymeth.2021.10.006] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/30/2021] [Revised: 10/07/2021] [Accepted: 10/19/2021] [Indexed: 02/05/2023] Open
Abstract
Non-coding RNAs are gaining prominence in biology and medicine, as they play major roles in cellular homeostasis among which the circRNA-miRNA-mRNA axes are involved in a series of disease-related pathways, such as apoptosis, cell invasion and metastasis. Recently, many computational methods have been developed for the prediction of the relationship between ncRNAs and diseases, which can alleviate the time-consuming and labor-intensive exploration involved with biological experiments. However, these methods handle ncRNAs separately, ignoring the impact of the interactions among ncRNAs on the diseases. In this paper we present a novel approach to discovering disease-related circRNA-miRNA-mRNA axes from the disease-RNA information network. Our method, using graph convolutional network, learns the characteristic representation of each biological entity by propagating and aggregating local neighbor information based on the global structure of the network. The approach is evaluated using the real-world datasets and the results show that it outperforms other state-of-the-art baselines on most of the metrics.
|
37
|
Zhao X, Yang Y, Yin M. MHRWR: Prediction of lncRNA-Disease Associations Based on Multiple Heterogeneous Networks. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2577-2585. [PMID: 32086216 DOI: 10.1109/tcbb.2020.2974732] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
In the last few years, accumulating evidence has demonstrated that long non-coding RNAs (lncRNAs) participate in the regulation of target gene expression and play an important role in biological processes and human disease development. Thus, prediction of the associations between lncRNAs and diseases has become a hot research topic in the field of complex human diseases. Most existing methods consider the information of two networks (lncRNA, disease) while neglecting other networks. In this study, we designed a multi-layer network by integrating the similarity networks of lncRNAs, diseases and genes, and the known association networks of lncRNA-disease, lncRNA-gene, and disease-gene, and then we developed a model called MHRWR for predicting potential lncRNA-disease associations based on random walk with restart. The performance of MHRWR was evaluated on experimentally verified lncRNA-disease associations using leave-one-out cross validation. MHRWR obtained a reliable AUC value of 0.91344, which significantly outperformed some previous methods. To further validate the reproducibility of performance, we used MHRWR to verify related lncRNAs of colon cancer, colorectal cancer and lung adenocarcinoma in the case studies. The code of MHRWR is available at https://github.com/yangyq505/MHRWR.
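Random walk with restart, the core technique named in this abstract, can be sketched on a single small network. This is a generic illustration and not the MHRWR code: MHRWR runs on a multi-layer heterogeneous network, while this sketch assumes one homogeneous adjacency matrix. The function name, the restart probability, and the toy graph are hypothetical.

```python
def random_walk_with_restart(adjacency, seeds, restart=0.7, tol=1e-10, max_iter=1000):
    """Iterate p <- (1 - restart) * W p + restart * p0 until convergence.

    adjacency: n x n non-negative matrix (list of lists).
    seeds:     indices of the seed nodes; they share the restart mass equally.
    W is the column-normalized adjacency matrix. Columns with no edges are
    left as zeros, so this sketch assumes every node has at least one edge.
    """
    n = len(adjacency)
    col_sums = [sum(adjacency[i][j] for i in range(n)) for j in range(n)]
    transition = [
        [adjacency[i][j] / col_sums[j] if col_sums[j] else 0.0 for j in range(n)]
        for i in range(n)
    ]
    p0 = [0.0] * n
    for s in seeds:
        p0[s] = 1.0 / len(seeds)
    p = p0[:]
    for _ in range(max_iter):
        nxt = [
            (1 - restart) * sum(transition[i][j] * p[j] for j in range(n))
            + restart * p0[i]
            for i in range(n)
        ]
        # Stop once the L1 change between successive vectors is negligible.
        if sum(abs(a - b) for a, b in zip(nxt, p)) < tol:
            return nxt
        p = nxt
    return p


# Toy path graph 0 - 1 - 2 with node 0 as the only seed.
path_graph = [[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]]
steady_state = random_walk_with_restart(path_graph, seeds=[0])
```

The steady-state probabilities decrease with distance from the seed, which is why they can serve as relevance scores for candidate nodes.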
|
38
|
Zeng M, Lu C, Fei Z, Wu FX, Li Y, Wang J, Li M. DMFLDA: A Deep Learning Framework for Predicting lncRNA-Disease Associations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2353-2363. [PMID: 32248123 DOI: 10.1109/tcbb.2020.2983958] [Citation(s) in RCA: 38] [Impact Index Per Article: 9.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/11/2023]
Abstract
A growing amount of evidence suggests that long non-coding RNAs (lncRNAs) play important roles in the regulation of biological processes in many human diseases. However, the number of experimentally verified lncRNA-disease associations is very limited. Thus, various computational approaches are proposed to predict lncRNA-disease associations. Current matrix factorization-based methods cannot capture the complex non-linear relationship between lncRNAs and diseases, and traditional machine learning-based methods are not sufficiently powerful to learn the representation of lncRNAs and diseases. Considering these limitations in existing computational methods, we propose a deep matrix factorization model to predict lncRNA-disease associations (DMFLDA in short). DMFLDA uses a cascade of non-linear hidden layers to learn latent representation to represent lncRNAs and diseases. By using non-linear hidden layers, DMFLDA captures the more complex non-linear relationship between lncRNAs and diseases than traditional matrix factorization-based methods. In addition, DMFLDA learns features directly from the lncRNA-disease interaction matrix and thus can obtain more accurate representation learning for lncRNAs and diseases than traditional machine learning methods. The low dimensional representations of the lncRNAs and diseases are fused to estimate the new interaction value. To evaluate the performance of DMFLDA, we perform leave-one-out cross-validation and 5-fold cross-validation on known experimentally verified lncRNA-disease associations. The experimental results show that DMFLDA performs better than the existing methods. The case studies show that many predicted interactions of colorectal cancer, prostate cancer, and renal cancer have been verified by recent biomedical literature. The source code and datasets can be obtained from https://github.com/CSUBioGroup/DMFLDA.
|
39
|
Yi HC, You ZH, Guo ZH, Huang DS, Chan KCC. Learning Representation of Molecules in Association Network for Predicting Intermolecular Associations. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2546-2554. [PMID: 32070992 DOI: 10.1109/tcbb.2020.2973091] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/10/2023]
Abstract
A key aim of post-genomic biomedical research is to systematically understand molecules and their interactions in human cells. Multiple biomolecules coordinate to sustain life activities, and interactions between various biomolecules are interconnected. However, existing studies usually focus only on associations between two or very limited types of molecules. In this study, we propose a network representation learning based computational framework, MAN-SDNE, to predict any intermolecular associations. More specifically, we constructed a large-scale molecular association network of multiple biomolecules in humans by integrating associations among long non-coding RNA, microRNA, protein, drug, and disease, containing 6,528 molecular nodes and 105,546 associations of 9 kinds. Then, each node is represented by its network proximity and attribute features. Furthermore, these features are used to train a Random Forest classifier to predict intermolecular associations. MAN-SDNE achieves a remarkable performance with an AUC of 0.9552 and an AUPR of 0.9338 under five-fold cross-validation. To indicate the ability to predict specific types of interactions, a case study for predicting lncRNA-protein interactions using MAN-SDNE is also executed. Experimental results demonstrate that this work offers a systematic insight for understanding the synergistic associations between molecules and complex diseases and provides a network-based computational tool to systematically explore intermolecular interactions.
|
40
|
Zheng Y, Wang H, Ding Y, Guo F. CEPZ: A Novel Predictor for Identification of DNase I Hypersensitive Sites. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2021; 18:2768-2774. [PMID: 33481716 DOI: 10.1109/tcbb.2021.3053661] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Reference Citation Analysis] [Abstract] [MESH Headings] [Track Full Text] [Subscribe] [Scholar Register] [Indexed: 06/12/2023]
Abstract
DNase I hypersensitive sites (DHSs) have proven to be tightly associated with cis-regulatory elements, commonly indicating specific function on the chromatin structure. Thus, identifying DHSs plays a fundamental role in decoding gene regulatory behavior. While traditional experimental methods turn out to be time-consuming and expensive, computational techniques promise to be practical for discovering and analyzing regulatory factors. In this study, we applied an efficient model that considered composition information and physicochemical properties and effectively selected features with a boosting algorithm. CEPZ, our predictor, improved the Matthews correlation coefficient and accuracy to 0.7740 and 0.9113, respectively, making it more competitive than any previous predictor. This result suggests that it may become a useful tool for DHS research in the human and other complex genomes. Our research was anchored on the properties of dinucleotides, and we identified several dinucleotides with significant differences in their distribution between DHS and non-DHS samples, which are likely to have a special meaning in the chromatin structure. The datasets, feature sets and the relevant algorithm are available at https://github.com/YanZheng-16/CEPZ_DHS/.
|
41
|
Xuan P, Wang D, Cui H, Zhang T, Nakaguchi T. Integration of pairwise neighbor topologies and miRNA family and cluster attributes for miRNA-disease association prediction. Brief Bioinform 2021; 23:6385813. [PMID: 34634106 DOI: 10.1093/bib/bbab428] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 07/14/2021] [Revised: 09/01/2021] [Accepted: 09/19/2021] [Indexed: 12/14/2022] Open
Abstract
Identifying disease-related microRNAs (miRNAs) assists the understanding of disease pathogenesis. Existing research methods integrate multiple kinds of data related to miRNAs and diseases to infer candidate disease-related miRNAs. The attributes of miRNA nodes including their family and cluster belonging information, however, have not been deeply integrated. Besides, the learning of neighbor topology representation of a pair of miRNA and disease is a challenging issue. We present a disease-related miRNA prediction method by encoding and integrating multiple representations of miRNA and disease nodes learnt from the generative and adversarial perspective. We firstly construct a bilayer heterogeneous network of miRNA and disease nodes, and it contains multiple types of connections among these nodes, which reflect neighbor topology of miRNA-disease pairs, and the attributes of miRNA nodes, especially miRNA-related families and clusters. To learn enhanced pairwise neighbor topology, we propose a generative and adversarial model with a convolutional autoencoder-based generator to encode the low-dimensional topological representation of the miRNA-disease pair and multi-layer convolutional neural network-based discriminator to discriminate between the true and false neighbor topology embeddings. Besides, we design a novel feature category-level attention mechanism to learn the various importance of different features for final adaptive fusion and prediction. Comparison results with five miRNA-disease association methods demonstrated the superior performance of our model and technical contributions in terms of area under the receiver operating characteristic curve and area under the precision-recall curve. The results of recall rates confirmed that our model can find more actual miRNA-disease associations among top-ranked candidates. Case studies on three cancers further proved the ability to detect potential candidate miRNAs.
Affiliation(s)
- Ping Xuan
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Dong Wang
  - School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Hui Cui
  - Department of Computer Science and Information Technology, La Trobe University, Melbourne 3083, Australia
- Tiangang Zhang
  - School of Mathematical Science, Heilongjiang University, Harbin 150080, China
- Toshiya Nakaguchi
  - Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
|
42
|
Ding P, Ouyang W, Luo J, Kwoh CK. Heterogeneous information network and its application to human health and disease. Brief Bioinform 2021; 21:1327-1346. [PMID: 31566212 DOI: 10.1093/bib/bbz091] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.8] [Reference Citation Analysis] [Abstract] [Key Words] [Track Full Text] [Journal Information] [Subscribe] [Scholar Register] [Received: 05/06/2019] [Revised: 06/29/2019] [Accepted: 06/30/2019] [Indexed: 12/11/2022] Open
Abstract
The molecular components with functional interdependencies in human cells form a complicated biological network. Diseases are mostly caused by perturbations of groups of interacting biomolecules, rather than by an abnormality of a single biomolecule. Furthermore, new biological functions and processes could be revealed by discovering novel biological entity relationships. Hence, more and more biologists focus on studying the complex biological system instead of the individual biological components. The emergence of the heterogeneous information network (HIN) offers a promising way to systematically explore complicated and heterogeneous relationships between various molecules for apparently distinct phenotypes. In this review, we first present the basic definition of HIN and the biological system considered as a complex HIN. Then, we discuss the topological properties of HIN and how these can be applied to detect network motifs and functional modules. Afterwards, methodologies for discovering relationships between diseases and biomolecules are presented. Useful insights on how HIN aids in drug development and explores the human interactome are provided. Finally, we analyze the challenges and opportunities for uncovering combinatorial patterns among pharmacogenomics and cell-type detection based on single-cell genomic data.
Affiliation(s)
- Pingjian Ding
- School of Computer Science, University of South China, Hengyang, China
- Wenjue Ouyang
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Jiawei Luo
- College of Computer Science and Electronic Engineering, Hunan University, Changsha, China
- Chee-Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, Singapore, Singapore
|
43
|
Zheng K, You ZH, Wang L, Li YR, Zhou JR, Zeng HT. MISSIM: An Incremental Learning-Based Model With Applications to the Prediction of miRNA-Disease Association. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1733-1742. [PMID: 32749964 DOI: 10.1109/tcbb.2020.3013837] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Indexed: 06/11/2023]
Abstract
In the past few years, prediction models have shown remarkable performance in most biological association prediction tasks. These tasks traditionally use a fixed dataset, and the model, once trained, is deployed as is. Such models often encounter training issues, such as sensitivity to hyperparameter tuning and "catastrophic forgetting" when new data are added. However, with the development of biomedicine and the accumulation of biological data, new predictive models are required to face the challenge of adapting to change. To this end, we propose a computational approach based on the Broad Learning System (BLS) to predict potential disease-associated miRNAs; the model retains the ability to distinguish previously learned associations when it has to adapt to new data. In particular, we introduce incremental learning to the field of biological association prediction for the first time and propose a new method for quantifying sequence similarity. In the performance evaluation, the AUC in 5-fold cross-validation was 0.9400 +/- 0.0041. To better assess the effectiveness of MISSIM, we compared it with various classifiers and earlier prediction models; its performance is superior to that of the previous methods. In addition, case studies on identifying miRNAs associated with breast neoplasms, lung neoplasms and esophageal neoplasms show that 34, 36 and 35 of the top 40 associations predicted by MISSIM are confirmed by recent biomedical resources. These results provide convincing evidence that this approach has potential value and prospects for promoting biomedical research productivity.
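The evaluation figure quoted above (an AUC averaged over 5-fold cross-validation, reported with a spread) can be illustrated with a minimal sketch. This is not MISSIM's evaluation code: the function names `auc` and `summarize_folds` are hypothetical, and the use of the population standard deviation is an assumption, since the abstract does not say how the +/- value was computed.

```python
import statistics


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Fraction of (positive, negative) pairs ranked in the right order.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize_folds(fold_results):
    """fold_results: list of (labels, scores) pairs, one per held-out fold.

    Returns (mean AUC, population standard deviation of the per-fold AUCs).
    """
    aucs = [auc(labels, scores) for labels, scores in fold_results]
    return statistics.mean(aucs), statistics.pstdev(aucs)
```

In a real 5-fold run, `fold_results` would hold the held-out labels and predicted scores of each of the five folds; the pairwise AUC is quadratic in the number of samples, which is fine for a sketch but slow for large association matrices.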
|
44
|
Yi HC, You ZH, Huang DS, Kwoh CK. Graph representation learning in bioinformatics: trends, methods and applications. Brief Bioinform 2021; 23:6361044. [PMID: 34471921 DOI: 10.1093/bib/bbab340] [Citation(s) in RCA: 58] [Impact Index Per Article: 14.5] [Received: 06/24/2021] [Revised: 07/18/2021] [Accepted: 08/02/2021] [Indexed: 12/12/2022] Open
Abstract
A graph is a natural data structure for describing complex systems: it contains a set of objects and the relationships between them. Many real-life biomedical problems can be modeled as graph analytics tasks. Machine learning, especially deep learning, succeeds in a wide range of bioinformatics scenarios in which the data are represented in a Euclidean domain. However, rich relational information between biological elements is retained in non-Euclidean biomedical graphs, which are not learning-friendly for classic machine learning methods. Graph representation learning aims to embed a graph into a low-dimensional space while preserving graph topology and node properties. It bridges biomedical graphs and modern machine learning methods and has recently attracted widespread interest in both the machine learning and bioinformatics communities. In this work, we summarize the advances of graph representation learning and its representative applications in bioinformatics. To provide a comprehensive and structured analysis and perspective, we first categorize and analyze both graph embedding methods (homogeneous graph embedding, heterogeneous graph embedding, attribute graph embedding) and graph neural networks. Furthermore, we summarize their representative applications from the molecular level to the genomics, pharmaceutical and healthcare systems levels. Moreover, we provide open resource platforms and libraries for implementing these graph representation learning methods and discuss the challenges and opportunities of graph representation learning in bioinformatics. This work provides a comprehensive survey of emerging graph representation learning algorithms and their applications in bioinformatics. It is anticipated that it could bring valuable insights for researchers to contribute their knowledge to graph representation learning and future-oriented bioinformatics studies.
Affiliation(s)
- Hai-Cheng Yi
- Chinese Academy of Sciences, Xinjiang Technical Institute of Physics and Chemistry, Urumqi 830011, China; University of Chinese Academy of Sciences, Beijing 100049, China
- Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi'an 710129, China
- De-Shuang Huang
- Institute of Machine Learning and Systems Biology, School of Electronics and Information Engineering, Tongji University, Shanghai 201804, China
- Chee Keong Kwoh
- School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore
|
45
|
Dai Q, Chu Y, Li Z, Zhao Y, Mao X, Wang Y, Xiong Y, Wei DQ. MDA-CF: Predicting MiRNA-Disease associations based on a cascade forest model by fusing multi-source information. Comput Biol Med 2021; 136:104706. [PMID: 34371319 DOI: 10.1016/j.compbiomed.2021.104706] [Citation(s) in RCA: 19] [Impact Index Per Article: 4.8] [Received: 05/17/2021] [Revised: 07/26/2021] [Accepted: 07/26/2021] [Indexed: 01/17/2023]
Abstract
MicroRNAs (miRNAs) are significant regulators in various biological processes. They may become promising biomarkers or therapeutic targets, which provide a new perspective in diagnosis and treatment of multiple diseases. Since the experimental methods are always costly and resource-consuming, prediction of disease-related miRNAs using computational methods is in great need. In this study, we developed MDA-CF to identify underlying miRNA-disease associations based on a cascade forest model. In this method, multi-source information was integrated to represent miRNAs and diseases comprehensively, and the autoencoder was utilized for dimension reduction to obtain the optimal feature space. The cascade forest model was then employed for miRNA-disease association prediction. As a result, the average AUC of MDA-CF was 0.9464 on HMDD v3.2 in five-fold cross-validation. Compared with previous computational methods, MDA-CF performed better on HMDD v2.0 with an average AUC of 0.9258. Moreover, MDA-CF was implemented to investigate colon neoplasm, breast neoplasm, and gastric neoplasm, and 100%, 86%, 88% of the top 50 potential miRNAs were validated by authoritative databases. In conclusion, MDA-CF appears to be a reliable method to uncover disease-associated miRNAs. The source code of MDA-CF is available at https://github.com/a1622108/MDA-CF.
Affiliation(s)
- Qiuying Dai
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanyi Chu
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Zhiqi Li
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yusong Zhao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Xueying Mao
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yanjing Wang
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Yi Xiong
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China
- Dong-Qing Wei
- State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai, 200240, China; Peng Cheng Laboratory, Vanke Cloud City Phase I Building 8, Xili Street, Nanshan District, Shenzhen, Guangdong, 518055, China
|
46
|
SCMFMDA: Predicting microRNA-disease associations based on similarity constrained matrix factorization. PLoS Comput Biol 2021; 17:e1009165. [PMID: 34252084 PMCID: PMC8345837 DOI: 10.1371/journal.pcbi.1009165] [Citation(s) in RCA: 27] [Impact Index Per Article: 6.8] [Received: 04/05/2021] [Revised: 08/06/2021] [Accepted: 06/08/2021] [Indexed: 11/21/2022] Open
Abstract
miRNAs are small non-coding RNAs that are involved in a number of complicated biological processes. Numerous studies have suggested that miRNAs are closely associated with many human diseases. In this study, we proposed a computational model based on Similarity Constrained Matrix Factorization for miRNA-Disease Association Prediction (SCMFMDA). In order to effectively combine different disease and miRNA similarity data, we applied the similarity network fusion algorithm to obtain an integrated disease similarity (composed of disease functional similarity, disease semantic similarity and disease Gaussian interaction profile kernel similarity) and an integrated miRNA similarity (composed of miRNA functional similarity, miRNA sequence similarity and miRNA Gaussian interaction profile kernel similarity). In addition, L2 regularization terms and similarity constraint terms were added to the traditional Nonnegative Matrix Factorization algorithm to predict disease-related miRNAs. SCMFMDA achieved AUCs of 0.9675 and 0.9447 based on global leave-one-out cross validation and five-fold cross validation, respectively. Furthermore, case studies on two common human diseases were implemented to demonstrate the prediction accuracy of SCMFMDA. Experimental reports confirmed miRNAs among the top 50 predictions, which indicated that SCMFMDA was effective for predicting relationships between miRNAs and diseases.
Numerous studies have suggested that miRNAs are closely associated with many human diseases, so predicting potential associations between miRNAs and diseases can contribute to the diagnosis and treatment of diseases. Several models for discovering unknown miRNA-disease associations make the prediction more productive and effective. We proposed SCMFMDA to obtain more accurate prediction results by applying similarity network fusion to fuse multi-source disease and miRNA information and by using similarity constrained matrix factorization to make predictions based on biological information. Global leave-one-out cross validation and five-fold cross validation were applied to evaluate our model. SCMFMDA achieved AUCs of 0.9675 and 0.9447, which were clearly higher than those of previous computational models. Furthermore, we implemented case studies on significant human diseases including colon neoplasms and lung neoplasms, in which 47 and 46 of the top 50 predictions were confirmed by experimental reports. All results indicated that SCMFMDA could be regarded as an effective way to discover unverified miRNA-disease associations.
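The Gaussian interaction profile (GIP) kernel similarity named in the abstract can be sketched as follows. The code uses the commonly cited definition, `exp(-gamma * ||IP(i) - IP(j)||^2)` with the bandwidth `gamma` normalized by the mean squared norm of the interaction profiles. Whether SCMFMDA uses exactly this normalization is an assumption, and the name `gip_kernel` and the parameter `gamma_prime` are hypothetical.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel similarity.

    profiles: list of rows; row i is the binary interaction profile of
    entity i (e.g. one miRNA's known associations with every disease).
    Returns a symmetric similarity matrix as a list of lists.
    """
    n = len(profiles)
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("all interaction profiles are empty")
    # Bandwidth normalized by the average squared profile norm (assumed).
    gamma = gamma_prime / mean_sq_norm
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            sim[i][j] = math.exp(-gamma * sq_dist)
    return sim
```

Applied to the rows of a miRNA-disease association matrix this gives a miRNA similarity; applied to its columns (the transposed matrix) it gives a disease similarity. Two entities with identical profiles get similarity 1.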
|
47
|
Min X, Lu F, Li C. Sequence-Based Deep Learning Frameworks on Enhancer-Promoter Interactions Prediction. Curr Pharm Des 2021; 27:1847-1855. [PMID: 33234095 DOI: 10.2174/1381612826666201124112710] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 05/15/2020] [Revised: 07/29/2020] [Accepted: 08/06/2020] [Indexed: 11/22/2022]
Abstract
Enhancer-promoter interactions (EPIs) in the human genome are of great significance to transcriptional regulation, which tightly controls gene expression. Identification of EPIs can help us better decipher gene regulation and understand disease mechanisms. However, experimental methods to identify EPIs are constrained by funds, time, and manpower, while computational methods using DNA sequences and genomic features are viable alternatives. Deep learning methods have shown promising prospects in classification and have been used to identify EPIs. In this survey, we specifically focus on sequence-based deep learning methods and conduct a comprehensive review of the literature. First, we briefly introduce existing sequence-based frameworks for EPI prediction and their technical details. After that, we elaborate on the datasets, pre-processing methods, and evaluation strategies. Finally, we conclude with the challenges these methods face and suggest several future opportunities. We hope this review will provide a useful reference for further studies on enhancer-promoter interactions.
Affiliation(s)
- Xiaoping Min
- School of Informatics, Xiamen University, Xiamen 361005, China
- Fengqing Lu
- School of Informatics, Xiamen University, Xiamen 361005, China
- Chunyan Li
- Graduate School, Yunnan Minzu University, Kunming 650504, China
|
48
|
Peng W, Du J, Dai W, Lan W. Predicting miRNA-Disease Association Based on Modularity Preserving Heterogeneous Network Embedding. Front Cell Dev Biol 2021; 9:603758. [PMID: 34178973 PMCID: PMC8223753 DOI: 10.3389/fcell.2021.603758] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 09/07/2020] [Accepted: 03/23/2021] [Indexed: 12/12/2022] Open
Abstract
MicroRNAs (miRNAs) are a category of small non-coding RNAs that profoundly impact various biological processes related to human disease. Inferring potential miRNA-disease associations benefits the study of human diseases, including disease prevention, disease diagnosis, and drug development. In this work, we propose a novel heterogeneous network embedding-based method called MDN-NMTF (Module-based Dynamic Neighborhood Non-negative Matrix Tri-Factorization) for predicting miRNA-disease associations. MDN-NMTF constructs a heterogeneous network from a disease similarity network, a miRNA similarity network and a known miRNA-disease association network. After that, it learns the latent vector representations of miRNAs and diseases in the heterogeneous network. Finally, the association probability is computed as the product of the latent miRNA and disease vectors. MDN-NMTF not only successfully integrates diverse biological information on miRNAs and diseases to predict miRNA-disease associations, but also considers the module properties of miRNAs and diseases while learning the vector representations, which can maximally preserve the structural information and the properties of the heterogeneous network. At the same time, we also extend MDN-NMTF to a new version (called MDN-NMTF2) by using modular information to improve the miRNA-disease association prediction ability. Our methods and four other existing methods are applied to predict miRNA-disease associations in four databases. The prediction results show that our methods improve miRNA-disease association prediction compared with the four existing methods.
Affiliation(s)
- Wei Peng
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China; Computer Technology Application Key Laboratory of Yunnan Province, Kunming University of Science and Technology, Kunming, China
- Jielin Du
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China
- Wei Dai
- Faculty of Information Engineering and Automation, Kunming University of Science and Technology, Kunming, China; Computer Technology Application Key Laboratory of Yunnan Province, Kunming University of Science and Technology, Kunming, China
- Wei Lan
- Guangxi Key Laboratory of Multimedia Communications and Network Technology, Guangxi University, Nanning, China
|
49
|
Zhu Q, Fan Y, Pan X. Fusing Multiple Biological Networks to Effectively Predict miRNA-disease Associations. Curr Bioinform 2021. [DOI: 10.2174/1574893615999200715165335] [Citation(s) in RCA: 10] [Impact Index Per Article: 2.5] [Indexed: 11/22/2022]
Abstract
Background:
MicroRNAs (miRNAs) are a class of endogenous non-coding RNAs of about 22 nucleotides, and they play a significant role in a variety of complex biological processes. Many studies have shown that miRNAs are closely related to human diseases. Although biological experiments are reliable in identifying miRNA-disease associations, they are time-consuming and costly.
Objective:
Thus, computational methods are urgently needed to effectively predict miRNA-disease associations.
Methods:
In this paper, we proposed a novel method, BIRWMDA, based on a bi-random walk model to predict miRNA-disease associations. Specifically, in BIRWMDA, the similarity network fusion algorithm is used to combine multiple similarity matrices into a miRNA-miRNA similarity matrix and a disease-disease similarity matrix; the miRNA-disease associations are then predicted by the bi-random walk model.
Results:
To evaluate the performance of BIRWMDA, we ran leave-one-out cross-validation and 5-fold cross-validation, and the corresponding AUCs were 0.9303 and 0.9223 ± 0.00067, respectively. To further demonstrate the effectiveness of BIRWMDA, from the perspective of exploring disease-related miRNAs, we conducted three case studies of breast neoplasms, prostate neoplasms and gastric neoplasms, where 48, 50 and 50 of the top 50 predicted miRNAs were confirmed by the literature, respectively. From the perspective of exploring miRNA-related diseases, we conducted two case studies of hsa-mir-21 and hsa-mir-155, where 7 and 5 of the top 10 predicted diseases were confirmed by the literature, respectively.
Conclusion:
The fusion of multiple biological networks could effectively predict miRNA-disease associations. We expect BIRWMDA to serve as a biological tool for mining potential miRNA-disease associations.
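The bi-random walk idea described in the Methods can be sketched as below: association scores are propagated on the miRNA similarity network (left walk) and on the disease similarity network (right walk), each blended with the known associations, and the two results are averaged. This is a simplified illustration under stated assumptions, not BIRWMDA's implementation: row normalization of the similarity matrices, the restart weight `alpha`, the step counts and all function names are hypothetical choices, and the similarity network fusion step is omitted.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def row_normalize(m):
    """Scale each row to sum to 1 (all-zero rows stay zero)."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out


def blend(walked, prior, alpha):
    """alpha * walked + (1 - alpha) * prior, element-wise."""
    return [[alpha * w + (1 - alpha) * p for w, p in zip(wr, pr)]
            for wr, pr in zip(walked, prior)]


def bi_random_walk(mirna_sim, disease_sim, assoc, alpha=0.8,
                   left_steps=2, right_steps=2):
    """Score miRNA-disease pairs; assoc has miRNAs as rows, diseases as columns."""
    total = sum(map(sum, assoc))
    if total == 0:
        raise ValueError("association matrix has no known associations")
    prior = [[v / total for v in row] for row in assoc]
    m = row_normalize(mirna_sim)
    d = row_normalize(disease_sim)
    scores = prior
    for step in range(1, max(left_steps, right_steps) + 1):
        walks = []
        if step <= left_steps:   # walk on the miRNA similarity network
            walks.append(blend(matmul(m, scores), prior, alpha))
        if step <= right_steps:  # walk on the disease similarity network
            walks.append(blend(matmul(scores, d), prior, alpha))
        # Average the walks that were active in this step.
        scores = [[sum(vals) / len(walks) for vals in zip(*rows)]
                  for rows in zip(*walks)]
    return scores
```

With a toy input in which one miRNA has no known associations but is highly similar to a miRNA linked to a disease, the unassociated miRNA receives a higher score for that disease than for the others, which is the behaviour the case studies rely on when ranking candidate miRNAs.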
Affiliation(s)
- Qingqi Zhu
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Yongxian Fan
- School of Computer Science and Information Security, Guilin University of Electronic Technology, Guilin, China
- Xiaoyong Pan
- Institute of Image Processing and Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China
|
50
|
Zeng R, Cheng S, Liao M. 4mCPred-MTL: Accurate Identification of DNA 4mC Sites in Multiple Species Using Multi-Task Deep Learning Based on Multi-Head Attention Mechanism. Front Cell Dev Biol 2021; 9:664669. [PMID: 34041243 PMCID: PMC8141656 DOI: 10.3389/fcell.2021.664669] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 02/05/2021] [Accepted: 03/17/2021] [Indexed: 01/10/2023] Open
Abstract
DNA methylation is one of the most extensive epigenetic modifications. DNA 4mC modification plays a key role in regulating chromatin structure and gene expression. In this study, we proposed a generic 4mC computational predictor, namely, 4mCPred-MTL using multi-task learning coupled with Transformer to predict 4mC sites in multiple species. In this predictor, we utilize a multi-task learning framework, in which each task is to train species-specific data based on Transformer. Extensive experimental results show that our multi-task predictive model can significantly improve the performance of the model based on single task and outperform existing methods on benchmarking comparison. Moreover, we found that our model can sufficiently capture better characteristics of 4mC sites as compared to existing commonly used feature descriptors, demonstrating the strong feature learning ability of our model. Therefore, based on the above results, it can be expected that our 4mCPred-MTL can be a useful tool for research communities of interest.
Affiliation(s)
- Rao Zeng
- Department of Software Engineering, School of Informatics, Xiamen University, Xiamen, China
- Song Cheng
- Department of Thoracic Surgery, Heilongjiang Province Land Reclamation Headquarters General Hospital, Harbin, China
- Minghong Liao
- Department of Software Engineering, School of Informatics, Xiamen University, Xiamen, China
|