1
Li F, Zheng M, Jia J. Validate association of gene loci and establish genetic risk prediction models for late-onset Alzheimer's disease in Chinese populations. J Alzheimers Dis 2025:13872877251326283. [PMID: 40116671 DOI: 10.1177/13872877251326283] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 03/23/2025]
Abstract
BACKGROUND More than 60 independent single-nucleotide polymorphisms (SNPs) have been associated with Alzheimer's disease risk by genome-wide association studies in European populations. OBJECTIVE We aimed to confirm these SNPs in Chinese Han populations and investigate the utility of these genetic markers. METHODS Altogether, 1595 late-onset Alzheimer's disease (LOAD) patients and 2474 controls from the Chinese population were recruited. We tested the association of 68 SNPs with LOAD and established a polygenic risk score (PRS) prediction model using the significant SNPs. Meta-analyses of MS4A6A rs610932 and PICALM rs3851179 were performed. RESULTS In the Chinese population, 14 of the 68 SNPs were validated as significantly associated with LOAD (adjusted p < 0.05) after adjusting for age and sex. After stratification by APOE ε4 status, almost all SNPs remained markedly associated with LOAD in APOE ε4 noncarriers, whereas few loci remained associated in APOE ε4 carriers. The area under the receiver operating characteristic curve for distinguishing LOAD patients from normal subjects was 0.614 for the PRS and 0.689 for the PRS combined with APOE. In addition, a meta-analysis of East Asian populations that included this study confirmed that rs610932 and rs3851179 were significantly associated with LOAD (OR = 0.85, 95% CI = 0.74-0.97; OR = 0.87, 95% CI = 0.83-0.91). CONCLUSIONS Despite genetic heterogeneity, there are still common loci among different populations. A PRS based on AD risk-associated SNPs may supplement APOE for better assessing individual AD risk in Chinese populations. In addition, gene-gene and gene-environment interactions affect the impact of risk alleles in diverse populations.
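To make the PRS and AUC figures above more concrete, here is a minimal sketch. It assumes the common construction in which a PRS is the sum of effect-allele counts weighted by the natural log of each SNP's odds ratio; the abstract does not state the authors' exact weighting, so this is an illustration rather than their method. The two odds ratios are taken from the abstract in the order listed; the genotypes and the function names `polygenic_risk_score` and `auc` are hypothetical.

```python
import math

def polygenic_risk_score(allele_counts, odds_ratios):
    """Weighted sum of effect-allele counts (0, 1, or 2), with weights ln(OR)."""
    return sum(count * math.log(or_) for count, or_ in zip(allele_counts, odds_ratios))

def auc(case_scores, control_scores):
    """AUC as the probability that a random case outscores a random control
    (ties count one half); this is the normalized Mann-Whitney U statistic."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

# Odds ratios reported in the abstract for rs610932 and rs3851179.
ODDS_RATIOS = [0.85, 0.87]

# Hypothetical genotypes: effect-allele counts at the two SNPs per subject.
cases = [[0, 0], [1, 0], [0, 1]]
controls = [[2, 2], [1, 1], [0, 0]]

case_scores = [polygenic_risk_score(g, ODDS_RATIOS) for g in cases]
control_scores = [polygenic_risk_score(g, ODDS_RATIOS) for g in controls]
print(round(auc(case_scores, control_scores), 3))
```

Because both odds ratios are below 1, each copy of the effect allele lowers the score, so subjects carrying fewer copies rank as higher risk in this toy data.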
Affiliation(s)
- Fangyu Li
- Innovation Center for Neurological Disorders and Department of Neurology, Xuanwu Hospital, Capital Medical University, National Clinical Research Center for Geriatric Diseases, Beijing, China
- Menghan Zheng
- Innovation Center for Neurological Disorders and Department of Neurology, Xuanwu Hospital, Capital Medical University, National Clinical Research Center for Geriatric Diseases, Beijing, China
- Jianping Jia
- Innovation Center for Neurological Disorders and Department of Neurology, Xuanwu Hospital, Capital Medical University, National Clinical Research Center for Geriatric Diseases, Beijing, China
- Beijing Key Laboratory of Geriatric Cognitive Disorders, Beijing, China
- Clinical Center for Neurodegenerative Disease and Memory Impairment, Capital Medical University, Beijing, China
- Center of Alzheimer's Disease, Beijing Institute of Brain Disorders, Collaborative Innovation Center for Brain Disorders, Capital Medical University, Beijing, China
- Key Laboratory of Neurodegenerative Diseases, Ministry of Education, Beijing, China
2
Cao G, Chen D. Unveiling Long Non-coding RNA Networks from Single-Cell Omics Data Through Artificial Intelligence. Methods Mol Biol 2025; 2883:257-279. [PMID: 39702712 DOI: 10.1007/978-1-0716-4290-0_11] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 12/21/2024]
Abstract
Single-cell omics technologies have revolutionized the study of long non-coding RNAs (lncRNAs), offering unprecedented resolution in elucidating their expression dynamics, cell-type specificity, and associated gene regulatory networks (GRNs). Concurrently, the integration of artificial intelligence (AI) methodologies has significantly advanced our understanding of lncRNA functions and their implications in disease pathogenesis. This chapter discusses the progress in single-cell omics data analysis, emphasizing its pivotal role in unraveling the molecular mechanisms underlying cellular heterogeneity and the associated regulatory networks involving lncRNAs. Additionally, we provide a summary of single-cell omics resources and AI models for constructing single-cell gene regulatory networks (scGRNs). Finally, we discuss the challenges and prospects of exploring scGRNs in the context of lncRNA biology.
Affiliation(s)
- Guangshuo Cao
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China
- Dijun Chen
- State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing, China.
3
Zhang C, Li Y, Dong Y, Chen W, Yu C. Prediction of miRNA-disease associations based on PCA and cascade forest. BMC Bioinformatics 2024; 25:386. [PMID: 39701957 DOI: 10.1186/s12859-024-05999-w] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/04/2024] [Accepted: 11/26/2024] [Indexed: 12/21/2024] Open
Abstract
BACKGROUND As a key non-coding RNA molecule, miRNA profoundly affects the regulation of gene expression and is linked to the pathological processes of many human diseases. However, conventional experimental methods for validating miRNA-disease associations are laborious. Consequently, the development of efficient and reliable computational prediction models is crucial for the identification and validation of these associations. RESULTS In this research, we developed the PCACFMDA method to predict potential associations between miRNAs and diseases. To construct a multidimensional feature matrix, we consider the fused similarities of miRNAs and diseases together with the miRNA-disease pairs. We then use principal component analysis (PCA) to reduce data complexity and extract low-dimensional features. Subsequently, a tuned cascade forest is used to deeply mine the features and output prediction scores. The results of 5-fold cross-validation on the HMDD v2.0 database indicate that the PCACFMDA algorithm achieved an AUC of 98.56%. Additionally, we performed case studies on breast, esophageal and lung neoplasms. The findings revealed that the top 50 miRNAs most strongly linked to each disease have been validated. CONCLUSIONS Based on PCA and optimized cascade forests, we propose the PCACFMDA model for predicting undiscovered miRNA-disease associations. The experimental results demonstrate superior prediction performance and commendable stability. Consequently, PCACFMDA is a potent instrument for in-depth exploration of miRNA-disease associations.
Affiliation(s)
- Chuanlei Zhang
- Artificial Intelligence, Tianjin University of Science and Technology, Tianjin, 300457, China
- Yubo Li
- Artificial Intelligence, Tianjin University of Science and Technology, Tianjin, 300457, China
- Yinglun Dong
- Artificial Intelligence, Tianjin University of Science and Technology, Tianjin, 300457, China
- Wei Chen
- Computer Science, China University of Mining and Technology, Xuzhou, 221116, China
- Changqing Yu
- Electronic Information, Xijing University, Xi'an, 710123, China.
4
Jafari S, Motedayyen H, Javadi P, Jamali K, Moradi Hasan-Abad A, Atapour A, Sarab GA. The roles of lncRNAs and miRNAs in pancreatic cancer: a focus on cancer development and progression and their roles as potential biomarkers. Front Oncol 2024; 14:1355064. [PMID: 38559560 PMCID: PMC10978783 DOI: 10.3389/fonc.2024.1355064] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/13/2023] [Accepted: 02/27/2024] [Indexed: 04/04/2024] Open
Abstract
Pancreatic ductal adenocarcinoma (PDAC) is among the most penetrative malignancies affecting humans, with rising incidence and prevalence worldwide. This cancer is usually not diagnosed in the early stages. There is also no effective therapy against PDAC, and most patients have chemo-resistance. The combination of these factors gives PDAC a poor prognosis, and patients often do not live longer than six months. Because conventional therapies fail, the identification of key biomarkers is crucial for the early diagnosis, treatment, and prognosis of pancreatic cancer. ncRNAs are encoded by 65% of the human genome. There are different types of ncRNAs, which are classified based on their sequence lengths and functions. They play a vital role in replication, transcription, translation, and epigenetic regulation. They also participate in cellular processes such as proliferation, differentiation, metabolism, and apoptosis. Several studies have demonstrated the roles of ncRNAs as tumor suppressors or oncogenes in the growth of tumors in a variety of tissues, including the pancreas. This study discusses the key roles of some lncRNAs and miRNAs in the growth and advancement of pancreatic carcinoma. It covers not only their involvement in early identification, chemo-resistance, and prognostication, but also their roles as potential biomarkers for better management of PDAC patients.
Affiliation(s)
- Somayeh Jafari
- Department of Molecular Medicine, School of Medicine, Birjand University of Medical Sciences, Birjand, Iran
- Hossein Motedayyen
- Autoimmune Diseases Research Center, Kashan University of Medical Sciences, Kashan, Iran
- Parisa Javadi
- Department of Medical Nanotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
- Kazem Jamali
- Emergency Medicine Research Center, Shiraz University of Medical Sciences, Shiraz, Iran
- Trauma Research Center, Shahid Rajaee (Emtiaz) Trauma Hospital, Shiraz University of Medical Sciences, Shiraz, Iran
- Amin Moradi Hasan-Abad
- Autoimmune Diseases Research Center, Kashan University of Medical Sciences, Kashan, Iran
- Amir Atapour
- Department of Medical Biotechnology, School of Advanced Medical Sciences and Technologies, Shiraz University of Medical Sciences, Shiraz, Iran
- Gholamreza Anani Sarab
- Cellular and Molecular Research Center, Birjand University of Medical Sciences, Birjand, Iran
5
Xuan P, Xiu J, Cui H, Zhang X, Nakaguchi T, Zhang T. Complementary feature learning across multiple heterogeneous networks and multimodal attribute learning for predicting disease-related miRNAs. iScience 2024; 27:108639. [PMID: 38303724 PMCID: PMC10831890 DOI: 10.1016/j.isci.2023.108639] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/18/2023] [Revised: 11/02/2023] [Accepted: 12/01/2023] [Indexed: 02/03/2024] Open
Abstract
Inferring latent disease-related miRNAs helps provide deep insight into disease pathogenesis. We propose a method, CMMDA, to encode and integrate the context relationships among multiple heterogeneous networks, the complementary information across these networks, and the pairwise multimodal attributes. We first established multiple heterogeneous networks according to the diverse disease similarities. A feature representation embedding the context relationship is formulated for each miRNA (disease) node based on a transformer. We designed a co-attention fusion mechanism to encode the complementary information among multiple networks. For a pair of miRNA and disease nodes, the pairwise attributes from multiple networks form a multimodal attribute embedding. A module based on depthwise separable convolution is constructed to enhance the encoding of the specific features from each modality. The experimental results and the ablation studies demonstrate CMMDA's superior performance and the effectiveness of its major innovations.
Affiliation(s)
- Ping Xuan
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Department of Computer Science, Shantou University, Shantou 515063, China
- Jinshan Xiu
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Hui Cui
- Department of Computer Science and Information Technology, La Trobe University, Melbourne, VIC 3083, Australia
- Xiaowen Zhang
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- Toshiya Nakaguchi
- Center for Frontier Medical Engineering, Chiba University, Chiba 2638522, Japan
- Tiangang Zhang
- School of Computer Science and Technology, Heilongjiang University, Harbin 150080, China
- School of Mathematical Science, Heilongjiang University, Harbin 150080, China
6
Wang Y, Yao M, Li C, Yang K, Qin X, Xu L, Shi S, Yu C, Meng X, Xie C. Targeting ST8SIA6-AS1 counteracts KRAS G12C inhibitor resistance through abolishing the reciprocal activation of PLK1/c-Myc signaling. Exp Hematol Oncol 2023; 12:105. [PMID: 38104151 PMCID: PMC10724920 DOI: 10.1186/s40164-023-00466-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/25/2023] [Accepted: 12/03/2023] [Indexed: 12/19/2023] Open
Abstract
BACKGROUND KRASG12C inhibitors (KRASG12Ci) AMG510 and MRTX849 have shown promising efficacy in clinical trials and have been approved for the treatment of KRASG12C-mutant cancers. However, the emergence of therapy-related drug resistance limits their long-term potential. This study aimed to identify the critical mediators of this resistance and develop strategies to overcome it. METHODS Using RNA sequencing, RT-qPCR and immunoblotting, we identified and validated the upregulation of c-Myc activity and the amplification of the long noncoding RNA ST8SIA6-AS1 in KRASG12Ci-resistant cells. The regulatory axis ST8SIA6-AS1/Polo-like kinase 1 (PLK1)/c-Myc was investigated by bioinformatics, RNA fluorescence in situ hybridization, RNA immunoprecipitation, RNA pull-down and chromatin immunoprecipitation. Gain/loss-of-function assays, cell viability assays, xenograft models, and IHC staining were conducted to evaluate the anti-cancer effects of co-inhibiting the ST8SIA6-AS1/PLK1 pathway and KRAS both in vitro and in vivo. RESULTS KRASG12Ci sustainably decreased c-Myc levels in responsive cell lines but not in cell lines with intrinsic or acquired resistance to KRASG12Ci. PLK1 activation contributed to this ERK-independent c-Myc stability, and c-Myc in turn directly induced PLK1 transcription, forming a positive feedback loop and conferring resistance to KRASG12Ci. ST8SIA6-AS1 was significantly upregulated in resistant cells and facilitated the proliferation of KRASG12C-mutant cancers. ST8SIA6-AS1 bound to Aurora kinase A (Aurora A)/PLK1 and promoted Aurora A-mediated PLK1 phosphorylation. Concurrent targeting of KRAS and ST8SIA6-AS1/PLK1 signaling suppressed both ERK-dependent and -independent c-Myc expression, synergistically led to cell death and tumor regression, and overcame KRASG12Ci resistance. CONCLUSIONS Our study reveals that the ST8SIA6-AS1/PLK1/c-Myc axis confers both intrinsic and acquired resistance to KRASG12Ci and represents a promising therapeutic target for combination strategies with KRASG12Ci in the treatment of KRASG12C-mutant cancers.
Affiliation(s)
- Yafang Wang
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, People's Republic of China
- Mingyue Yao
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, People's Republic of China
- Division of Life Sciences and Medicine, The First Affiliated Hospital of USTC (Anhui Provincial Hospital), University of Science and Technology of China, Hefei, Anhui, China
- Drug Discovery and Development Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, People's Republic of China
- Cheng Li
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, People's Republic of China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Kexin Yang
- School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Lingang Laboratory, 319 Yueyang Road, Shanghai, 200031, China
- Xiaolong Qin
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, People's Republic of China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Lansong Xu
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, People's Republic of China
- Division of Life Sciences and Medicine, The First Affiliated Hospital of USTC (Anhui Provincial Hospital), University of Science and Technology of China, Hefei, Anhui, China
- Drug Discovery and Development Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, People's Republic of China
- Shangxuan Shi
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, People's Republic of China
- School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China
- Chengcheng Yu
- Drug Discovery and Development Center, Shanghai Institute of Materia Medica, Chinese Academy of Sciences, 555 Zuchongzhi Road, Shanghai, 201203, People's Republic of China
- Lingang Laboratory, 319 Yueyang Road, Shanghai, 200031, China
- Xiangjun Meng
- Gastroenterology, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, 200001, China
- China Center for Digestive Diseases Research and Clinical Translation of Shanghai Jiao Tong University, Shanghai, 200001, China
- China Shanghai Key Laboratory of Gut Microecology and Associated Major Diseases Research, Shanghai, 200001, China
- Chengying Xie
- Shanghai Institute for Advanced Immunochemical Studies, ShanghaiTech University, 393 Middle Huaxia Road, Shanghai, 201210, People's Republic of China.
- School of Life Science and Technology, ShanghaiTech University, Shanghai, 201210, China.
- Lingang Laboratory, 319 Yueyang Road, Shanghai, 200031, China.
7
Gong L, Chen J, Cui X, Liu Y. RPIPCM: A deep network model for predicting lncRNA-protein interaction based on sequence feature encoding. Comput Biol Med 2023; 165:107366. [PMID: 37633089 DOI: 10.1016/j.compbiomed.2023.107366] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/27/2023] [Revised: 07/29/2023] [Accepted: 08/12/2023] [Indexed: 08/28/2023]
Abstract
LncRNA-protein interaction plays an important regulatory role in biological processes. In this paper, we propose RPIPCM, a novel deep network model that uses sequence feature encoding of both RNA and protein to predict lncRNA-protein interactions (LPIs). A sliding-window negative sampling method is proposed to address the imbalance between positive and negative samples. Comparative experiments show that this negative sampling method effectively mitigates the data imbalance found in existing LPI research. Experimental results also show that the proposed sequence feature encoding method performs well in predicting LPIs on datasets of different sizes and types. On the animal-related RPI488 dataset, compared with the model that directly encodes the original sequence, the sequence feature encoding model increased accuracy by 1.02%, recall by 4.08%, and MCC by 1.67%. On the plant dataset ATH948, sequence feature-based encoding achieved 1.58% higher accuracy, 1.53% higher recall, 1.62% higher specificity, 1.62% higher precision, and 3.16% higher MCC than direct original sequence-based encoding. Compared with the latest prediction work on the ZEA22133 dataset, RPIPCM is more effective, with accuracy increased by 2.23%, recall by 1.78%, specificity by 2.67%, precision by 2.52%, and MCC by 4.43%, which also demonstrates the effectiveness and robustness of RPIPCM. In conclusion, RPIPCM, a deep network model based on sequence feature encoding, can automatically mine the hidden feature information of sequences in lncRNA-protein interactions without relying on external features or prior biomedical knowledge, and its low cost and high efficiency can provide a reference for biomedical researchers.
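The five metrics compared in this abstract (accuracy, recall, specificity, precision, and MCC) all derive from the binary confusion matrix. The sketch below uses their standard definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the paper.

```python
import math

def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    def ratio(num, den):
        # Return 0.0 when a denominator is empty instead of dividing by zero.
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "recall": ratio(tp, tp + fn),        # sensitivity / true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "precision": ratio(tp, tp + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }

# Hypothetical counts for a balanced interaction / non-interaction test set.
metrics = classification_metrics(tp=45, fp=5, tn=40, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

MCC uses all four cells of the confusion matrix, so a given change in predictions can move it by a different amount than it moves accuracy.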
Affiliation(s)
- Lejun Gong
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China.
- Jingmei Chen
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
- Xiong Cui
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
- Yang Liu
- School of Computer Science, Nanjing University of Posts and Telecommunications, Nanjing, 210023, China
8
Zhou Z, Du Z, Wei J, Zhuo L, Pan S, Fu X, Lian X. MHAM-NPI: Predicting ncRNA-protein interactions based on multi-head attention mechanism. Comput Biol Med 2023; 163:107143. [PMID: 37339574 DOI: 10.1016/j.compbiomed.2023.107143] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 04/04/2023] [Revised: 05/20/2023] [Accepted: 06/06/2023] [Indexed: 06/22/2023]
Abstract
Non-coding RNA (ncRNA) is a functional RNA molecule that plays a key role in various fundamental biological processes, such as gene regulation. Studying the connection between ncRNAs and proteins is therefore important for exploring the function of ncRNAs. Although many efficient and accurate methods have been developed, accurate prediction remains a major challenge. In our approach, we combine a multi-head attention mechanism with residual connections, allowing for the automatic learning of ncRNA and protein sequence features. Specifically, the proposed method projects node features into multiple spaces based on the multi-head attention mechanism, thereby obtaining different feature interaction patterns in these spaces. By stacking interaction layers, higher-order interaction modes can be derived, while the initial feature information is preserved through the residual connection. This strategy effectively leverages the sequence information of ncRNAs and proteins, enabling the capture of hidden high-order features. The final experimental results demonstrate the effectiveness of our method, with AUC values of 97.4%, 98.5%, and 94.8% achieved on the NPInter v2.0, RPI807, and RPI488 datasets, respectively. These results establish our method as a powerful tool for exploring the connection between ncRNAs and proteins. We have uploaded the implementation code on GitHub: https://github.com/ZZCrazy00/MHAM-NPI.
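The interaction layer described above (multi-head attention plus a residual connection, stacked for higher-order interactions) can be sketched in plain Python. This is a deliberately simplified illustration, not the MHAM-NPI implementation: it assumes that each head attends over a contiguous slice of the feature vector rather than a learned projection, and all names (`attention_head`, `multi_head_layer`) and input values are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_head(rows):
    """Scaled dot-product self-attention over a list of feature vectors."""
    d = len(rows[0])
    out = []
    for q in rows:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in rows]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, rows)) for j in range(d)])
    return out

def multi_head_layer(features, num_heads):
    """Slice each vector into num_heads parts (assumption: no learned weights),
    attend within each part, concatenate the heads, and add the input back
    as a residual connection."""
    dim = len(features[0])
    assert dim % num_heads == 0, "feature size must divide evenly across heads"
    step = dim // num_heads
    heads = [attention_head([row[h * step:(h + 1) * step] for row in features])
             for h in range(num_heads)]
    merged = [[x for head in heads for x in head[i]] for i in range(len(features))]
    return [[a + b for a, b in zip(row, new)] for row, new in zip(features, merged)]

# Hypothetical node features; stacking two layers gives higher-order interactions.
features = [[1.0, 0.0, 0.5, 0.5], [0.0, 1.0, 0.5, 0.5]]
stacked = multi_head_layer(multi_head_layer(features, 2), 2)
print(stacked)
```

The residual addition is what keeps the original features present in the output of every stacked layer, which is the property the abstract attributes to the residual connection.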
Affiliation(s)
- Zhecheng Zhou
- Wenzhou University of Technology, Wenzhou, 325000, China
- Zhenya Du
- Guangzhou Xinhua University, Guangzhou, 510520, China
- Jinhang Wei
- Wenzhou University of Technology, Wenzhou, 325000, China
- Linlin Zhuo
- Wenzhou University of Technology, Wenzhou, 325000, China; Hunan University, Changsha, 410000, China.
- Shiyao Pan
- Wenzhou University of Technology, Wenzhou, 325000, China
- Xinze Lian
- Wenzhou University of Technology, Wenzhou, 325000, China.
9
Du Z, Huang T, Uversky VN, Li J. Predicting TF Proteins by Incorporating Evolution Information Through PSSM. IEEE/ACM Trans Comput Biol Bioinform 2023; 20:1319-1326. [PMID: 35981062 DOI: 10.1109/tcbb.2022.3199758] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/04/2023]
Abstract
Transcription factors (TFs) are DNA-binding proteins involved in the regulation of gene expression. They exist in all organisms and activate or repress transcription by binding to specific DNA sequences. Traditionally, TFs have been identified by experimental methods that are time-consuming and costly. In recent years, various computational methods have been developed to identify TFs and overcome these limitations. However, there is room for further improvement in the predictive accuracy of these tools. We report here a novel computational tool, TFnet, that provides accurate and comprehensive TF predictions from protein sequences. The accuracy of these predictions is substantially better than that of existing TF predictors and methods. In particular, it significantly outperforms comparable methods when sequence similarity to other known sequences in the database drops below 40%. Ablation tests reveal that the high predictive performance stems from the innovative ways TFnet derives the sequence Position-Specific Scoring Matrix (PSSM) and encodes inputs.
10
Han Y, Zhang SW. ncRPI-LGAT: Prediction of ncRNA-Protein Interactions with Line Graph Attention Network Framework. Comput Struct Biotechnol J 2023; 21:2286-2295. [PMID: 37035546 PMCID: PMC10073990 DOI: 10.1016/j.csbj.2023.03.027] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 09/21/2022] [Revised: 03/11/2023] [Accepted: 03/16/2023] [Indexed: 03/19/2023] Open
Abstract
Identification of ncRNA-protein interactions (ncRPIs) through wet experiments is still time-consuming and costly. Although several computational approaches have been developed to predict ncRPIs using the structure and sequence information of ncRNAs and proteins, the prediction accuracy needs to be improved, and the results lack interpretability. In this work, we propose a novel computational method (called ncRPI-LGAT) to predict ncRNA-protein interactions by transforming the link prediction (i.e., subgraph classification) task into a node classification task in the line graph and introducing a Line Graph ATtention network framework. ncRPI-LGAT first extracts the ncRNA/protein attributes using node2vec, and then generates the local enclosing subgraph of a target ncRNA-protein pair with SEAL. Because using pooling operations in local enclosing subgraphs to learn a fixed-size feature vector for representing ncRNAs/proteins causes information loss, ncRPI-LGAT converts the local enclosing subgraphs into their corresponding line graphs, in which each node corresponds to an edge (i.e., an ncRNA-protein pair) of the local enclosing subgraph. Then, the attention mechanism-based graph neural network GATv2 is applied to these line graphs to efficiently learn the embedding features of the target nodes (i.e., ncRNA-protein pairs) by learning the significance of one ncRNA-protein pair to another. The embedding features of an ncRNA-protein pair obtained from multi-head attention are concatenated and then fed into a fully connected network to predict ncRPIs. Compared with other state-of-the-art methods in the 5CV test, ncRPI-LGAT shows superior performance on three benchmark datasets, demonstrating the effectiveness of our ncRPI-LGAT method in predicting ncRNA-protein interactions.
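The line-graph conversion at the centre of this method can be shown with a small sketch. It uses the standard definition of a line graph: each edge of the original graph becomes a node, and two such nodes are connected when the original edges share an endpoint. The function name `line_graph` and the toy ncRNA-protein edges are hypothetical; the node2vec, SEAL, and GATv2 steps are not reproduced here.

```python
from itertools import combinations

def line_graph(edges):
    """Build the line graph of an undirected graph given as a list of edges.

    Each original edge becomes a node; two nodes are linked when the
    corresponding original edges have an endpoint in common.
    """
    nodes = [tuple(e) for e in edges]
    links = [(a, b) for a, b in combinations(nodes, 2) if set(a) & set(b)]
    return nodes, links

# Hypothetical local enclosing subgraph of ncRNA-protein interactions.
subgraph = [
    ("ncRNA1", "proteinA"),
    ("ncRNA1", "proteinB"),
    ("ncRNA2", "proteinA"),
]

nodes, links = line_graph(subgraph)
for a, b in links:
    print(a, "<->", b)
```

After the conversion, predicting whether an ncRNA-protein pair interacts becomes a question about a single node of the line graph, which is how the abstract recasts link prediction as node classification.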
11
Ren ZH, You ZH, Zou Q, Yu CQ, Ma YF, Guan YJ, You HR, Wang XF, Pan J. DeepMPF: deep learning framework for predicting drug-target interactions based on multi-modal representation with meta-path semantic analysis. J Transl Med 2023; 21:48. [PMID: 36698208 PMCID: PMC9876420 DOI: 10.1186/s12967-023-03876-3] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Received: 08/16/2022] [Accepted: 01/05/2023] [Indexed: 01/26/2023] Open
Abstract
BACKGROUND Drug-target interaction (DTI) prediction has become a crucial prerequisite in drug design and drug discovery. However, traditional biological experiments are time-consuming and expensive, as the large genomic and chemical spaces contain abundant complex interactions. To alleviate this problem, many computational methods have been developed to complement biological experiments and narrow the search space to a preferred candidate domain. However, most previous approaches cannot fully consider the semantic information of association behavior based on several schemas to represent the complex structure of heterogeneous biological networks. Additionally, DTI prediction based on a single modality cannot satisfy the demand for prediction accuracy. METHODS We propose a multi-modal representation framework, 'DeepMPF', based on meta-path semantic analysis, which effectively utilizes heterogeneous information to predict DTI. Specifically, we first construct protein-drug-disease heterogeneous networks composed of three entities. Feature information is then obtained under three views: the sequence modality, the heterogeneous structure modality and the similarity modality. We propose six representative meta-path schemas to preserve the high-order nonlinear structure and capture hidden structural information of the heterogeneous network. Finally, DeepMPF generates highly representative comprehensive feature descriptors and calculates the probability of interaction through joint learning. RESULTS To evaluate the predictive performance of DeepMPF, comparison experiments are conducted on four gold-standard datasets. Our method obtains competitive performance on all datasets. We also explore the influence of different feature embedding dimensions, learning strategies and classification methods. Notably, the drug repositioning experiments on COVID-19 and HIV demonstrate that DeepMPF can be applied to real-world problems and help drug discovery. Further analysis with molecular docking experiments enhances the credibility of the drug candidates predicted by DeepMPF. CONCLUSIONS All the results demonstrate the effective predictive capability of DeepMPF for drug-target interactions. It can be utilized as a useful tool to prescreen the most promising drug candidates for a protein. The web server of the DeepMPF predictor is freely available at http://120.77.11.78/DeepMPF/, which can help relevant researchers with further study.
Collapse
Affiliation(s)
- Zhong-Hao Ren
- School of Information Engineering, Xijing University, Xi’an, 710100 China
| | - Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054 China
| | - Chang-Qing Yu
- School of Information Engineering, Xijing University, Xi’an, 710100 China
| | - Yan-Fang Ma
- Department of Galactophore, The Third People’s Hospital of Gansu Province, Lanzhou, 730020 China
| | - Yong-Jian Guan
- School of Information Engineering, Xijing University, Xi’an, 710100 China
| | - Hai-Ru You
- School of Computer Science, Northwestern Polytechnical University, Xi’an, 710129 China
| | - Xin-Fei Wang
- School of Information Engineering, Xijing University, Xi’an, 710100 China
| | - Jie Pan
- School of Information Engineering, Xijing University, Xi’an, 710100 China
| |
|
12
|
Zhang Z, Xu J, Wu Y, Liu N, Wang Y, Liang Y. CapsNet-LDA: predicting lncRNA-disease associations using attention mechanism and capsule network based on multi-view data. Brief Bioinform 2023; 24:6889447. [PMID: 36511221 DOI: 10.1093/bib/bbac531] [Citation(s) in RCA: 20] [Impact Index Per Article: 10.0] [Received: 07/24/2022] [Revised: 10/25/2022] [Accepted: 11/07/2022] [Indexed: 12/15/2022] Open
Abstract
Accumulating studies have shown that many long non-coding RNAs (lncRNAs) are crucial in a number of diseases. Predicting potential lncRNA-disease associations (LDAs) can facilitate disease prevention, diagnosis and treatment. Therefore, it is vital to develop practical computational methods for LDA prediction. In this study, we propose a novel predictor named capsule network (CapsNet)-LDA for LDA prediction. CapsNet-LDA first uses a stacked autoencoder to acquire informative low-dimensional representations of the lncRNA-disease pairs under multiple views; an attention mechanism then adaptively allocates importance weights to these representations, and they are subsequently processed by a CapsNet-based architecture to predict LDAs. Unlike conventional convolutional neural networks (CNNs), which are restricted by their use of scalar neurons and pooling operations, CapsNets use vector neurons, which are more robust to complex combinations of features, and use dynamic routing processes for updating parameters. CapsNet-LDA is superior to five other state-of-the-art models on four benchmark datasets, four perturbed datasets and an independent test set in the comparison experiments, demonstrating that CapsNet-LDA has excellent performance and robustness against perturbation, as well as good generalization ability. The ablation studies verify the effectiveness of some modules of CapsNet-LDA. Moreover, the ability of multi-view data to improve performance is demonstrated. Case studies further indicate that CapsNet-LDA can accurately predict novel LDAs for specific diseases.
Affiliation(s)
- Zequn Zhang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
| | - Junlin Xu
- College of Information Science and Engineering, Hunan University, Changsha 410082, Hunan, China
| | - Yanan Wu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
| | - Niannian Liu
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
| | - Yinglong Wang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
| | - Ying Liang
- College of Computer and Information Engineering, Jiangxi Agricultural University, Nanchang, 310045 Jiangxi, China
| |
|
13
|
Vora DS, Kalakoti Y, Sundar D. Computational Methods and Deep Learning for Elucidating Protein Interaction Networks. Methods Mol Biol 2023; 2553:285-323. [PMID: 36227550 DOI: 10.1007/978-1-0716-2617-7_15] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/16/2023]
Abstract
Protein interactions play a critical role in all biological processes, but experimental identification of protein interactions is a time- and resource-intensive process. The advances in next-generation sequencing and multi-omics technologies have greatly benefited large-scale predictions of protein interactions using machine learning methods. A wide range of tools have been developed to predict protein-protein, protein-nucleic acid, and protein-drug interactions. Here, we discuss the applications, methods, and challenges faced when employing the various prediction methods. We also briefly describe ways to overcome the challenges and prospective future developments in the field of protein interaction biology.
Affiliation(s)
- Dhvani Sandip Vora
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
| | - Yogesh Kalakoti
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India
| | - Durai Sundar
- Department of Biochemical Engineering and Biotechnology, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India.
- School of Artificial Intelligence, Indian Institute of Technology Delhi, Hauz Khas, New Delhi, India.
| |
|
14
|
Cui Z, Chen ZH, Zhang QH, Gribova V, Filaretov VF, Huang DS. RMSCNN: A Random Multi-Scale Convolutional Neural Network for Marine Microbial Bacteriocins Identification. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3663-3672. [PMID: 34699364 DOI: 10.1109/tcbb.2021.3122183] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Indexed: 06/13/2023]
Abstract
The abuse of traditional antibiotics has led to an increase in the resistance of bacteria and viruses. Similar to the function of antibacterial peptides, bacteriocins are more common as a kind of peptides produced by bacteria that have bactericidal or bacterial effects. More importantly, the marine environment is one of the most abundant resources for extracting marine microbial bacteriocins (MMBs). Identifying bacteriocins from marine microorganisms is a common goal for the development of new drugs. Effective use of MMBs will greatly alleviate the current antibiotic abuse problem. In this work, deep learning is used to identify meaningful MMBs. We propose a random multi-scale convolutional neural network method. In the scale setting, we set a random model to update the scale value randomly. The scale selection method can reduce the contingency caused by artificial setting under certain conditions, thereby making the method more extensive. The results show that the classification performance of the proposed method is better than the state-of-the-art classification methods. In addition, some potential MMBs are predicted, and some different sequence analyses are performed on these candidates. It is worth mentioning that after sequence analysis, the HNH endonucleases of different marine bacteria are considered as potential bacteriocins.
|
15
|
Li B, Tian Y, Tian Y, Zhang S, Zhang X. Predicting Cancer Lymph-Node Metastasis From LncRNA Expression Profiles Using Local Linear Reconstruction Guided Distance Metric Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3179-3189. [PMID: 35139024 DOI: 10.1109/tcbb.2022.3149791] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]
Abstract
Lymph-node metastasis is the most perilous state of cancer progression, and long non-coding RNA (lncRNA) has been confirmed to be an important genetic indicator in cancer prediction. However, lncRNA expression profiles are often characterized by many features and few samples, so an efficient method is urgently needed to handle such high-dimensional lncRNA data, which will aid clinical targeted treatment. Thus, in this study, a local linear reconstruction guided distance metric learning method is put forward to handle lncRNA data for determining cancer lymph-node metastasis. In the original locally linear embedding (LLE) approach, any point can be approximately linearly reconstructed from its nearest neighbors, from which a novel distance metric can be learned by imposing both nonnegativity and sum-to-one constraints on the reconstruction weights. Taking the defined distance metric and the supervised information of the lncRNA data into account, a local margin model is derived to find a low-dimensional subspace for lncRNA signature extraction. Finally, a classifier that also adopts the learned distance metric is constructed to predict cancer lymph-node metastasis. Several experiments on lncRNA data sets have been carried out, and the experimental results show the performance of the proposed method in comparison with other related dimensionality reduction methods and classical classifier models.
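The reconstruction step mentioned in this abstract can be illustrated with a minimal sketch of the standard LLE weight computation, which enforces only the sum-to-one constraint. The nonnegativity constraint and the distance metric learning described by the authors are not reproduced here, and the function names `solve_linear` and `lle_weights` and the regularization value are illustrative assumptions, not the paper's code.

```python
def solve_linear(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def lle_weights(point, neighbors, reg=1e-3):
    """Standard LLE weights: minimize ||x - sum_j w_j n_j||^2 with sum_j w_j = 1.

    Assumption: only the sum-to-one constraint is enforced; weights may be
    negative in general.
    """
    diffs = [[p - q for p, q in zip(point, nb)] for nb in neighbors]
    k = len(neighbors)
    # Local Gram matrix of the differences between the point and its neighbors.
    gram = [[sum(a * b for a, b in zip(diffs[i], diffs[j])) for j in range(k)]
            for i in range(k)]
    trace = sum(gram[i][i] for i in range(k))
    for i in range(k):
        # Small ridge term keeps the system solvable when the Gram matrix is singular.
        gram[i][i] += reg * trace if trace > 0 else reg
    w = solve_linear(gram, [1.0] * k)
    total = sum(w)
    return [wi / total for wi in w]


# A point halfway between two neighbors is reconstructed with equal weights.
weights = lle_weights([1.0, 1.0], [[0.0, 0.0], [2.0, 2.0]])
print(weights)
```

Adding the nonnegativity constraint turns the closed-form solve into a small constrained quadratic program, which is why it is left out of this sketch.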
|
16
|
Arora V, Sanguinetti G. De novo prediction of RNA-protein interactions with graph neural networks. RNA (New York, N.Y.) 2022; 28:1469-1480. [PMID: 36008134 PMCID: PMC9745830 DOI: 10.1261/rna.079365.122] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/15/2022] [Accepted: 08/17/2022] [Indexed: 06/15/2023]
Abstract
RNA-binding proteins (RBPs) are key co- and post-transcriptional regulators of gene expression, playing a crucial role in many biological processes. Experimental methods like CLIP-seq have enabled the identification of transcriptome-wide RNA-protein interactions for select proteins; however, the time- and resource-intensive nature of these technologies call for the development of computational methods to complement their predictions. Here, we leverage recent, large-scale CLIP-seq experiments to construct a de novo predictor of RNA-protein interactions based on graph neural networks (GNN). We show that the GNN method allows us not only to predict missing links in an RNA-protein network, but to predict the entire complement of targets of previously unassayed proteins, and even to reconstruct the entire network of RNA-protein interactions in different conditions based on minimal information. Our results demonstrate the potential of modern machine learning methods to extract useful information on post-transcriptional regulation from large data sets.
Affiliation(s)
- Viplove Arora
- Data Science, Department of Physics, SISSA, Trieste 34136, Italy
| | | |
|
17
|
Wu QW, Cao RF, Xia JF, Ni JC, Zheng CH, Su YS. Extra Trees Method for Predicting LncRNA-Disease Association Based On Multi-Layer Graph Embedding Aggregation. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3171-3178. [PMID: 34529571 DOI: 10.1109/tcbb.2021.3113122] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Indexed: 06/13/2023]
Abstract
Many experimental studies have revealed significant associations between lncRNAs and diseases. Identifying accurate associations will provide a new perspective for disease therapy. Computational methods have been developed to solve this problem, but these methods have some limitations. In this paper, we propose an accurate method, named MLGCNET, to discover potential lncRNA-disease associations. First, we reconstruct similarity networks for both lncRNAs and diseases using top-k similarity information and construct a lncRNA-disease heterogeneous network (LDN). Then, we apply a Multi-Layer Graph Convolutional Network to the LDN to obtain latent feature representations of nodes. Finally, Extra Trees is used to calculate the probability of association between disease and lncRNA. The results of extensive 5-fold cross-validation experiments show that MLGCNET has superior prediction performance compared to state-of-the-art methods. Case studies confirm the performance of our model on specific diseases. All the experimental results demonstrate the effectiveness and practicality of MLGCNET in predicting potential lncRNA-disease associations.
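The first step of this abstract, rebuilding a similarity network from the top k most similar entries, can be sketched as follows. This is a hedged illustration: the function name `top_k_similarity`, the exclusion of self-similarity, and the choice not to symmetrize the result are assumptions, since the abstract does not specify the exact rule used in MLGCNET.

```python
def top_k_similarity(sim, k):
    """Keep, for each row, only the k largest off-diagonal similarities.

    Assumptions: self-similarity is dropped and the output is not
    symmetrized; the original method may differ on both points.
    """
    n = len(sim)
    sparse = [[0.0] * n for _ in range(n)]
    for i, row in enumerate(sim):
        candidates = [j for j in range(n) if j != i]
        # Sort the other nodes by similarity to node i, most similar first.
        candidates.sort(key=lambda j: row[j], reverse=True)
        for j in candidates[:k]:
            sparse[i][j] = row[j]
    return sparse


# Toy 4-node similarity matrix (hypothetical values).
similarity = [
    [1.0, 0.9, 0.2, 0.4],
    [0.9, 1.0, 0.7, 0.1],
    [0.2, 0.7, 1.0, 0.3],
    [0.4, 0.1, 0.3, 1.0],
]
for row in top_k_similarity(similarity, k=2):
    print(row)
```

Sparsifying the network this way keeps only the strongest neighbors of each node, which limits the noise that a graph convolution would otherwise aggregate from weakly similar nodes.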
|
18
|
Lu X, Li J, Zhu Z, Yuan Y, Chen G, He K. Predicting miRNA-Disease Associations via Combining Probability Matrix Feature Decomposition With Neighbor Learning. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2022; 19:3160-3170. [PMID: 34260356 DOI: 10.1109/tcbb.2021.3097037] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Indexed: 06/13/2023]
Abstract
Predicting the associations between miRNAs and diseases may uncover the causes of various diseases. Many methods are emerging to tackle the sparse and unbalanced problem of disease-related miRNA prediction. Here, we propose a method that combines Probabilistic matrix decomposition with neighbor learning to identify MiRNA-Disease Associations using heterogeneous data (PMDA). First, we build similarity networks for diseases and miRNAs, respectively, by integrating semantic information and functional interactions. Second, we construct a neighbor learning model in which the neighbor information of each miRNA or disease is used to enhance the association relationships and tackle the sparsity problem. Third, we predict the potential associations between miRNAs and diseases via probability matrix decomposition. The experimental results show that PMDA is superior to five other methods on sparse and unbalanced data. The case study shows that the new miRNA-disease interactions predicted by PMDA are effective and that the performance of PMDA is superior to other methods.
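The third step, scoring miRNA-disease pairs by matrix decomposition, can be sketched with a plain L2-regularized factorization trained by stochastic gradient descent, which corresponds to the MAP estimate of probabilistic matrix factorization under Gaussian priors. This is not the PMDA implementation: the neighbor learning step is omitted, and the function name `factorize`, the hyperparameters, and the toy association matrix are all illustrative assumptions.

```python
import random


def factorize(assoc, rank=2, lr=0.05, reg=0.01, epochs=200, seed=0):
    """Fit assoc ~ U V^T by SGD on a regularized squared-error objective."""
    rng = random.Random(seed)
    n_rows, n_cols = len(assoc), len(assoc[0])
    u = [[rng.uniform(0, 0.1) for _ in range(rank)] for _ in range(n_rows)]
    v = [[rng.uniform(0, 0.1) for _ in range(rank)] for _ in range(n_cols)]

    def predict(i, j):
        return sum(u[i][f] * v[j][f] for f in range(rank))

    def loss():
        err = sum((assoc[i][j] - predict(i, j)) ** 2
                  for i in range(n_rows) for j in range(n_cols))
        norm = sum(x * x for row in u + v for x in row)
        return err + reg * norm

    history = [loss()]
    for _ in range(epochs):
        for i in range(n_rows):
            for j in range(n_cols):
                err = assoc[i][j] - predict(i, j)
                for f in range(rank):
                    ui, vj = u[i][f], v[j][f]
                    u[i][f] += lr * (err * vj - reg * ui)
                    v[j][f] += lr * (err * ui - reg * vj)
        history.append(loss())
    return u, v, history


# Hypothetical 4 miRNA x 4 disease association matrix (1 = known association).
known = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 0],
]
u, v, history = factorize(known)
score = sum(a * b for a, b in zip(u[3], v[3]))  # score for the unobserved pair (3, 3)
print(round(history[0], 3), round(history[-1], 3), round(score, 3))
```

In this toy setting the unobserved pair (3, 3) receives a score driven by the shared latent factors of its row and column, which is the mechanism by which such models rank candidate associations.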
|
19
|
Pepe G, Appierdo R, Carrino C, Ballesio F, Helmer-Citterich M, Gherardini PF. Artificial intelligence methods enhance the discovery of RNA interactions. Front Mol Biosci 2022; 9:1000205. [PMID: 36275611 PMCID: PMC9585310 DOI: 10.3389/fmolb.2022.1000205] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/21/2022] [Accepted: 09/20/2022] [Indexed: 11/13/2022] Open
Abstract
Understanding how RNAs interact with proteins, RNAs, or other molecules remains a challenge of main interest in biology, given the importance of these complexes in both normal and pathological cellular processes. Since experimental datasets are starting to be available for hundreds of functional interactions between RNAs and other biomolecules, several machine learning and deep learning algorithms have been proposed for predicting RNA-RNA or RNA-protein interactions. However, most of these approaches were evaluated on a single dataset, making performance comparisons difficult. With this review, we aim to summarize recent computational methods, developed in this broad research area, highlighting feature encoding and machine learning strategies adopted. Given the magnitude of the effect that dataset size and quality have on performance, we explored the characteristics of these datasets. Additionally, we discuss multiple approaches to generate datasets of negative examples for training. Finally, we describe the best-performing methods to predict interactions between proteins and specific classes of RNA molecules, such as circular RNAs (circRNAs) and long non-coding RNAs (lncRNAs), and methods to predict RNA-RNA or RNA-RBP interactions independently of the RNA type.
Affiliation(s)
- G Pepe
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
| | - R Appierdo
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - C Carrino
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - F Ballesio
- PhD Program in Cellular and Molecular Biology, Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| | - M Helmer-Citterich
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
- *Correspondence: G Pepe; M Helmer-Citterich
| | - PF Gherardini
- Department of Biology, University of Rome “Tor Vergata”, Rome, Italy
| |
|
20
|
Predicting Protein–Protein Interactions Based on Ensemble Learning-Based Model from Protein Sequence. Biology 2022; 11:biology11070995. [PMID: 36101379 PMCID: PMC9311754 DOI: 10.3390/biology11070995] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/24/2022] [Revised: 05/27/2022] [Accepted: 06/29/2022] [Indexed: 11/17/2022]
Abstract
Simple Summary Most traditional high-throughput experiments for identifying potential protein–protein interactions are tedious and laborious. To improve the accuracy of protein–protein interaction prediction, we propose a novel computational method that can identify unknown protein–protein interactions efficiently, and we hope this method can provide a helpful idea and tool for proteomics research. Abstract Protein–protein interactions (PPIs) play an essential role in many biological cellular functions. However, it is still tedious and time-consuming to identify protein–protein interactions through traditional experimental methods. For this reason, it is imperative to develop a computational method for predicting PPIs efficiently. This paper explores a novel computational method for detecting PPIs from protein sequences, which mainly adopts Locality Preserving Projections (LPP) as the feature extraction method and Rotation Forest (RF) as the classifier. Specifically, we first employ the Position Specific Scoring Matrix (PSSM), which retains biological evolutionary information, to represent protein sequences efficiently. Then, the LPP descriptor is applied to extract feature vectors from the PSSM. The feature vectors are fed into the RF to obtain the final results. The proposed method is applied to two datasets, Yeast and H. pylori, and obtains average accuracies of 92.81% and 92.56%, respectively. We also compare it with K nearest neighbors (KNN) and support vector machine (SVM) to better evaluate the performance of the proposed method. In summary, all experimental results indicate that the proposed approach is stable and robust for predicting PPIs and is a promising tool for proteomics research.
|
21
|
Wang Y, Wang LL, Wong L, Li Y, Wang L, You ZH. SIPGCN: A Novel Deep Learning Model for Predicting Self-Interacting Proteins from Sequence Information Using Graph Convolutional Networks. Biomedicines 2022; 10:biomedicines10071543. [PMID: 35884848 PMCID: PMC9313220 DOI: 10.3390/biomedicines10071543] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/26/2022] [Revised: 06/24/2022] [Accepted: 06/24/2022] [Indexed: 11/16/2022] Open
Abstract
Protein is the basic organic substance that constitutes the cell and is the material condition for the life activity and the guarantee of the biological function activity. Elucidating the interactions and functions of proteins is a central task in exploring the mysteries of life. As an important protein interaction, self-interacting protein (SIP) has a critical role. The fast growth of high-throughput experimental techniques among biomolecules has led to a massive influx of available SIP data. How to conduct scientific research using the massive amount of SIP data has become a new challenge that is being faced in related research fields such as biology and medicine. In this work, we design an SIP prediction method SIPGCN using a deep learning graph convolutional network (GCN) based on protein sequences. First, protein sequences are characterized using a position-specific scoring matrix, which is able to describe the biological evolutionary message, then their hidden features are extracted by the deep learning method GCN, and, finally, the random forest is utilized to predict whether there are interrelationships between proteins. In the cross-validation experiment, SIPGCN achieved 93.65% accuracy and 99.64% specificity in the human data set. SIPGCN achieved 90.69% and 99.08% of these two indicators in the yeast data set, respectively. Compared with other feature models and previous methods, SIPGCN showed excellent results. These outcomes suggest that SIPGCN may be a suitable instrument for predicting SIP and may be a reliable candidate for future wet experiments.
Affiliation(s)
- Ying Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
| | - Lin-Lin Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Correspondence: (L.-L.W.); (L.W.)
| | - Leon Wong
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
| | - Yang Li
- School of Computer Science and Information Engineering, Hefei University of Technology, Hefei 230601, China
| | - Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Zaozhuang 277160, China
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Correspondence: (L.-L.W.); (L.W.)
| | - Zhu-Hong You
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- School of Computer Science, Northwestern Polytechnical University, Xi’an 710129, China
| |
|
22
|
Mi JX, Feng J, Huang KY. Designing efficient convolutional neural network structure: A survey. Neurocomputing 2022. [DOI: 10.1016/j.neucom.2021.08.158] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Indexed: 10/18/2022]
|
23
|
Song J, Tian S, Yu L, Yang Q, Dai Q, Wang Y, Wu W, Duan X. RLF-LPI: An ensemble learning framework using sequence information for predicting lncRNA-protein interaction based on AE-ResLSTM and fuzzy decision. Mathematical Biosciences and Engineering: MBE 2022; 19:4749-4764. [PMID: 35430839 DOI: 10.3934/mbe.2022222] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Indexed: 06/14/2023]
Abstract
Long non-coding RNAs (lncRNAs) play a regulatory role in many biological cells, and recognizing lncRNA-protein interactions helps reveal the functional mechanisms of lncRNAs. Identifying lncRNA-protein interactions by biological techniques is costly and time-consuming. Here, an ensemble learning framework, RLF-LPI, is proposed to predict lncRNA-protein interactions. RLF-LPI uses a residual LSTM autoencoder module with a fused attention mechanism to extract latent feature representations, and captures the dependencies between sequences and structures via the k-mer method. Finally, the relationship between lncRNA and protein is learned through fuzzy decision. The experimental results show that the accuracy (ACC) of RLF-LPI is 0.912 on the ATH948 dataset and 0.921 on the ZEA22133 dataset. These results demonstrate that the proposed method performs better than other methods in predicting lncRNA-protein interactions.
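The k-mer method referred to in this abstract is commonly implemented as a normalized count of every length-k substring of a sequence. The sketch below shows that generic encoding only; the function name `kmer_frequencies`, the RNA alphabet, and the handling of ambiguous symbols are assumptions and may differ from the RLF-LPI preprocessing.

```python
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Return the normalized k-mer frequency vector of a sequence.

    The vector has len(alphabet) ** k entries, ordered lexicographically by
    the order of symbols in `alphabet`. Windows containing symbols outside
    the alphabet are skipped (an assumption for this sketch).
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Toy RNA fragment: 2-mers give a 16-dimensional feature vector.
features = kmer_frequencies("ACGUACGU", k=2)
print(len(features), features[1])  # features[1] is the frequency of "AC"
```

A fixed-length vector like this lets sequences of different lengths be fed to the same downstream model, at the cost of discarding positional information beyond the k-mer window.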
Affiliation(s)
- Jinmiao Song
- Department of Information Science and Engineering, Xinjiang University, Urumqi 830008, China
- Key Laboratory of Big Data Applied Technology, State Ethnic Affairs Commission, Dalian Minzu University, Dalian 116600, China
| | - Shengwei Tian
- Department of Software, Xinjiang University, Urumqi 830008, China
- Key Laboratory of Signal and Information Processing, Xinjiang University, Urumqi 830008, China
- Key Laboratory of Software Engineering Technology, Xinjiang University, Urumqi 830008, China
| | - Long Yu
- Department of Information Science and Engineering, Xinjiang University, Urumqi 830008, China
| | - Qimeng Yang
- Department of Information Science and Engineering, Xinjiang University, Urumqi 830008, China
| | - Qiguo Dai
- Key Laboratory of Big Data Applied Technology, State Ethnic Affairs Commission, Dalian Minzu University, Dalian 116600, China
| | - Yuanxu Wang
- Key Laboratory of Big Data Applied Technology, State Ethnic Affairs Commission, Dalian Minzu University, Dalian 116600, China
| | - Weidong Wu
- Center for Science Education, People's Hospital of Xinjiang Uygur Autonomous Region, Urumqi 830001, China
| | - Xiaodong Duan
- Key Laboratory of Big Data Applied Technology, State Ethnic Affairs Commission, Dalian Minzu University, Dalian 116600, China
| |
|
24
|
Li LP, Zhang B, Cheng L. CPIELA: Computational Prediction of Plant Protein–Protein Interactions by Ensemble Learning Approach From Protein Sequences and Evolutionary Information. Front Genet 2022; 13:857839. [PMID: 35360876 PMCID: PMC8963800 DOI: 10.3389/fgene.2022.857839] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/19/2022] [Accepted: 02/10/2022] [Indexed: 11/22/2022] Open
Abstract
Identification and characterization of plant protein–protein interactions (PPIs) are critical in elucidating the functions of proteins and molecular mechanisms in a plant cell. Although experimentally validated plant PPIs data have become increasingly available in diverse plant species, the high-throughput techniques are usually expensive and labor-intensive. With the incredibly valuable plant PPIs data accumulating in public databases, it is progressively important to propose computational approaches to facilitate the identification of possible PPIs. In this article, we propose an effective framework for predicting plant PPIs by combining the position-specific scoring matrix (PSSM), local optimal-oriented pattern (LOOP), and ensemble rotation forest (ROF) model. Specifically, the plant protein sequence is firstly transformed into the PSSM, in which the protein evolutionary information is perfectly preserved. Then, the local textural descriptor LOOP is employed to extract texture variation features from PSSM. Finally, the ROF classifier is adopted to infer the potential plant PPIs. The performance of CPIELA is evaluated via cross-validation on three plant PPIs datasets: Arabidopsis thaliana, Zea mays, and Oryza sativa. The experimental results demonstrate that the CPIELA method achieved the high average prediction accuracies of 98.63%, 98.09%, and 94.02%, respectively. To further verify the high performance of CPIELA, we also compared it with the other state-of-the-art methods on three gold standard datasets. The experimental results illustrate that CPIELA is efficient and reliable for predicting plant PPIs. It is anticipated that the CPIELA approach could become a useful tool for facilitating the identification of possible plant PPIs.
Affiliation(s)
- Li-Ping Li
- College of Grassland and Environment Sciences, Xinjiang Agricultural University, Urumqi, China
- Xinjiang Key Laboratory of Grassland Resources and Ecology, Urumqi, China
- *Correspondence: Li-Ping Li; Bo Zhang
| | - Bo Zhang
- College of Grassland and Environment Sciences, Xinjiang Agricultural University, Urumqi, China
- Xinjiang Key Laboratory of Grassland Resources and Ecology, Urumqi, China
- *Correspondence: Li-Ping Li; Bo Zhang
| | - Li Cheng
- Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Science, Urumqi, China
| |
|
25
|
Su L, Xu C, Zeng S, Su L, Joshi T, Stacey G, Xu D. Large-Scale Integrative Analysis of Soybean Transcriptome Using an Unsupervised Autoencoder Model. Frontiers in Plant Science 2022; 13:831204. [PMID: 35310659 PMCID: PMC8927983 DOI: 10.3389/fpls.2022.831204] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/08/2021] [Accepted: 02/09/2022] [Indexed: 06/14/2023]
Abstract
Plant tissues are distinguished by their gene expression patterns, which can help identify tissue-specific highly expressed genes and their differential functional modules. For this purpose, large-scale soybean transcriptome samples were collected and processed starting from raw sequencing reads in a uniform analysis pipeline. To address the gene expression heterogeneity in different tissues, we utilized an adversarial deconfounding autoencoder (AD-AE) model to map gene expressions into a latent space and adapted a standard unsupervised autoencoder (AE) model to help effectively extract meaningful biological signals from the noisy data. As a result, four groups of 1,743, 914, 2,107, and 1,451 genes were found highly expressed specifically in leaf, root, seed and nodule tissues, respectively. To obtain key transcription factors (TFs), hub genes and their functional modules in each tissue, we constructed tissue-specific gene regulatory networks (GRNs), and differential correlation networks by using corrected and compressed gene expression data. We validated our results from the literature and gene enrichment analysis, which confirmed many identified tissue-specific genes. Our study represents the largest gene expression analysis in soybean tissues to date. It provides valuable targets for tissue-specific research and helps uncover broader biological patterns. Code is publicly available with open source at https://github.com/LingtaoSu/SoyMeta.
Affiliation(s)
- Lingtao Su
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
| | - Chunhui Xu
- Institute for Data Science and Informatics, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
| | - Shuai Zeng
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
| | - Li Su
- Institute for Data Science and Informatics, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
| | - Trupti Joshi
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Institute for Data Science and Informatics, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Department of Health Management and Informatics and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
| | - Gary Stacey
- Division of Plant Sciences and Technology and Biochemistry Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
| | - Dong Xu
- Department of Electrical Engineering and Computer Science and Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
- Institute for Data Science and Informatics, Christopher S. Bond Life Sciences Center, University of Missouri, Columbia, MO, United States
| |
|
26
|
Ren ZH, Yu CQ, Li LP, You ZH, Guan YJ, Li YC, Pan J. SAWRPI: A Stacking Ensemble Framework With Adaptive Weight for Predicting ncRNA-Protein Interactions Using Sequence Information. Front Genet 2022; 13:839540. [PMID: 35360836 PMCID: PMC8963817 DOI: 10.3389/fgene.2022.839540] [Citation(s) in RCA: 6] [Impact Index Per Article: 2.0] [Received: 12/20/2021] [Accepted: 02/07/2022] [Indexed: 11/13/2022] Open
Abstract
Non-coding RNAs (ncRNAs) play essential roles in biological processes such as gene regulation. One critical way in which ncRNAs execute their biological functions is through interactions with RNA-binding proteins (RBPs). Identifying the proteins involved in ncRNA-protein interactions helps clarify the function of ncRNAs. Many high-throughput experiments have been applied to recognize these interactions. Because these approaches are time- and labor-consuming, a great number of computational methods have been developed to advance ncRNA-protein interaction research. However, these methods may not be applicable to all RNAs and proteins, particularly new ones. Additionally, most of them cannot handle long sequences well. In this work, a computational method, SAWRPI, is proposed to predict ncRNA-protein interactions from sequence information. More specifically, the raw features of proteins and ncRNAs are first extracted through a k-mer sparse matrix with SVD reduction and by learning nucleic acid symbols with natural language processing and a local fusion strategy, respectively. Then, to ease classification, the Hilbert transformation is exploited to map the raw feature data to a new feature space. Finally, a stacking ensemble strategy is adopted to learn high-level abstraction features automatically and generate the final prediction results. To confirm robustness and stability, three different datasets containing two kinds of interactions are utilized. In comparison with state-of-the-art methods and other classification or feature extraction strategies, SAWRPI achieved high performance on three datasets containing two kinds of lncRNA-protein interactions. According to our findings, SAWRPI is a trustworthy, robust, yet simple method and can be used as a beneficial supplement to the task of predicting ncRNA-protein interactions.
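The k-mer feature step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `kmer_frequencies`, the RNA alphabet `ACGU`, and the choice of `k` are assumptions made here for illustration, and the SVD reduction and later stages are omitted.

```python
from itertools import product


def kmer_frequencies(sequence, k=3, alphabet="ACGU"):
    """Return normalized k-mer frequencies in a fixed lexicographic order.

    Hypothetical helper: the alphabet and k are illustrative choices,
    not values taken from the SAWRPI paper.
    """
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with ambiguous symbols such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


# A 10-nt toy sequence yields 9 overlapping 2-mers spread over 16 features.
features = kmer_frequencies("AUGGCUAUGG", k=2)
```

Each sequence becomes a fixed-length vector regardless of its length, which is what makes such features usable as classifier input.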
Affiliation(s)
- Zhong-Hao Ren
- School of Information Engineering, Xijing University, Xi’an, China
| | - Chang-Qing Yu
- School of Information Engineering, Xijing University, Xi’an, China
- *Correspondence: Li-Ping Li; Chang-Qing Yu
| | - Li-Ping Li
- School of Information Engineering, Xijing University, Xi’an, China
- *Correspondence: Li-Ping Li; Chang-Qing Yu
| | - Zhu-Hong You
- School of Computer Science, Northwestern Polytechnical University, Xi’an, China
| | - Yong-Jian Guan
- School of Information Engineering, Xijing University, Xi’an, China
| | - Yue-Chao Li
- School of Information Engineering, Xijing University, Xi’an, China
| | - Jie Pan
- School of Information Engineering, Xijing University, Xi’an, China
| |
|
27
|
Cui F, Zhang Z, Cao C, Zou Q, Chen D, Su X. Protein-DNA/RNA interactions: Machine intelligence tools and approaches in the era of artificial intelligence and big data. Proteomics 2022; 22:e2100197. [PMID: 35112474 DOI: 10.1002/pmic.202100197] [Citation(s) in RCA: 25] [Impact Index Per Article: 8.3] [Received: 11/30/2021] [Revised: 01/02/2022] [Accepted: 01/17/2022] [Indexed: 11/09/2022]
Abstract
With the development of artificial intelligence technologies and the availability of large amounts of biological data, computational methods for proteomics have undergone a developmental process from traditional machine learning to deep learning. This review focuses on computational approaches and tools for the prediction of protein-DNA/RNA interactions using machine intelligence techniques. We provide an overview of the development progress of computational methods and summarize the advantages and shortcomings of these methods. We further compiled applications in tasks related to the protein-DNA/RNA interactions, and pointed out possible future application trends. Moreover, biological sequence-digitizing representation strategies used in different types of computational methods are also summarized and discussed.
Affiliation(s)
- Feifei Cui
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Zilong Zhang
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Chen Cao
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Quan Zou
- Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, 610054, China
- Yangtze Delta Region Institute (Quzhou), University of Electronic Science and Technology of China, Quzhou, 324000, China
| | - Dong Chen
- College of Electrical and Information Engineering, Quzhou University, Quzhou, 324000, China
| | - Xi Su
- Foshan Maternal and Child Health Hospital, Foshan, Guangdong, China
| |
|
28
|
Jiang H, Huang Y. An effective drug-disease associations prediction model based on graphic representation learning over multi-biomolecular network. BMC Bioinformatics 2022; 23:9. [PMID: 34983364 PMCID: PMC8726520 DOI: 10.1186/s12859-021-04553-2] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.0] [Received: 10/28/2021] [Accepted: 12/29/2021] [Indexed: 11/10/2022] Open
Abstract
BACKGROUND Drug-disease associations (DDAs) can provide important information for exploring the potential efficacy of drugs. However, few DDAs have so far been verified by experiments. Previous evidence indicates that combining information sources is conducive to the discovery of new DDAs. How to integrate different biological data sources and identify the most effective drugs for a certain disease based on drug-disease coupled mechanisms is still a challenging problem. RESULTS In this paper, we propose a novel computational model for DDA prediction based on graph representation learning over a multi-biomolecular network (GRLMN). More specifically, we first constructed a large-scale molecular association network (MAN) by integrating the associations among drugs, diseases, proteins, miRNAs, and lncRNAs. Then, a graph embedding model was used to learn vector representations for all drugs and diseases in the MAN. Finally, the combined features were fed to a random forest (RF) model to predict new DDAs. The proposed model was evaluated on the SCMFDD-S data set using five-fold cross-validation. Experimental results showed that the GRLMN model was very accurate, with an area under the ROC curve (AUC) of 87.9%, and outperformed all previous works in terms of both accuracy and AUC on the benchmark dataset. To further verify the high performance of GRLMN, we carried out two case studies for two common diseases. As a result, in the ranking of drugs that were predicted to be related to certain diseases (such as kidney disease and fever), 15 of the top 20 drugs have been experimentally confirmed. CONCLUSIONS The experimental results show that our model performs well in the prediction of DDAs. GRLMN is an effective prioritization tool for screening reliable DDAs for follow-up studies concerning their participation in drug repositioning.
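The evaluation protocol named in this abstract (five-fold cross-validation scored by AUC) can be sketched in a few lines. This is a generic illustration, not the GRLMN code: `roc_auc` and `k_fold_indices` are hypothetical helper names, and the labels and scores below are toy values.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def k_fold_indices(n_samples, k=5):
    """Split range(n_samples) into k contiguous test folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n_samples // k + (1 if i < n_samples % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


# Toy example: three of the four positive/negative pairs are ranked correctly.
auc = roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1])
folds = k_fold_indices(10, k=5)
```

In practice the samples would be shuffled before splitting, and each fold's held-out scores would be passed to `roc_auc` and averaged.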
Affiliation(s)
- Hanjing Jiang
- Key Laboratory of Image Information Processing and Intelligent Control of Education Ministry of China, Institute of Artificial Intelligence, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, Wuhan, 430074, China
| | - Yabing Huang
- Department of Pathology, Renmin Hospital of Wuhan University, Wuhan, 430060, Hubei, China.
| |
|
29
|
Ge F, Hu J, Zhu YH, Arif M, Yu DJ. TargetMM: Accurate Missense Mutation Prediction by Utilizing Local and Global Sequence Information with Classifier Ensemble. Comb Chem High Throughput Screen 2022; 25:38-52. [PMID: 33280588 DOI: 10.2174/1386207323666201204140438] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 08/01/2020] [Revised: 10/22/2020] [Accepted: 10/26/2020] [Indexed: 11/22/2022]
Abstract
AIM AND OBJECTIVE Missense mutations (MMs) may lead to various human diseases by disabling proteins. Accurate prediction of MMs is important and challenging for both protein function annotation and drug design. Although several computational methods have yielded acceptable success rates, there is still room for improving MM prediction performance. MATERIALS AND METHODS In the present study, we designed a new feature extraction method that considers the degree of impact of residues within the microenvironment of the mutation site. Stringent cross-validation and independent tests on benchmark datasets were performed to evaluate the efficacy of the proposed feature extraction method. Furthermore, three heterogeneous prediction models were trained and then ensembled for the final prediction. By combining the feature representation method and the classifier ensemble technique, we developed a novel MM predictor called TargetMM for distinguishing pathogenic mutations from neutral ones. RESULTS Comparison outcomes based on statistical evaluation demonstrate that TargetMM outperforms prior advanced methods on the independent test data. The source code and benchmark datasets of TargetMM are freely available at https://github.com/sera616/TargetMM.git for academic use.
Affiliation(s)
- Fang Ge
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Jun Hu
- College of Information Engineering, Zhejiang University of Technology, Hangzhou 310023, China
| | - Yi-Heng Zhu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Muhammad Arif
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| | - Dong-Jun Yu
- School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing 210094, China
| |
|
30
|
Ahmed S, Muhammod R, Khan ZH, Adilina S, Sharma A, Shatabda S, Dehzangi A. ACP-MHCNN: an accurate multi-headed deep-convolutional neural network to predict anticancer peptides. Sci Rep 2021; 11:23676. [PMID: 34880291 PMCID: PMC8654959 DOI: 10.1038/s41598-021-02703-3] [Citation(s) in RCA: 46] [Impact Index Per Article: 11.5] [Received: 01/21/2021] [Accepted: 11/17/2021] [Indexed: 01/10/2023] Open
Abstract
Although advancing the therapeutic alternatives for treating deadly cancers has gained much attention globally, the primary methods such as chemotherapy still have significant downsides and low specificity. Most recently, anticancer peptides (ACPs) have emerged as a potential alternative with far fewer negative side effects. However, the identification of ACPs through wet-lab experiments is expensive and time-consuming. Hence, computational methods have emerged as viable alternatives. During the past few years, several computational ACP identification techniques using hand-engineered features have been proposed to solve this problem. In this study, we propose a new multi-headed deep convolutional neural network model called ACP-MHCNN for extracting and combining discriminative features from different information sources in an interactive way. Our model extracts sequence, physicochemical, and evolutionary features for ACP identification using different numerical peptide representations while restraining parameter overhead. Rigorous experiments using cross-validation and an independent dataset show that ACP-MHCNN outperforms other models for anticancer peptide identification by a substantial margin on our employed benchmarks. ACP-MHCNN outperforms the state-of-the-art model by 6.3%, 8.6%, 3.7%, 4.0%, and 0.20 in terms of accuracy, sensitivity, specificity, precision, and MCC, respectively. ACP-MHCNN and its relevant codes and datasets are publicly available at https://github.com/mrzResearchArena/Anticancer-Peptides-CNN. ACP-MHCNN is also publicly available as an online predictor at https://anticancer.pythonanywhere.com/.
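The five metrics quoted in this abstract all follow from a binary confusion matrix. The sketch below uses the standard textbook definitions; the function name `binary_metrics` and the counts are made up for illustration and are not results from the paper.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Compute accuracy, sensitivity, specificity, precision, and MCC
    from confusion-matrix counts. Undefined ratios are returned as 0.0."""
    def ratio(num, den):
        return num / den if den else 0.0

    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": ratio(tp, tp + fn),   # recall on the positive class
        "specificity": ratio(tn, tn + fp),   # recall on the negative class
        "precision": ratio(tp, tp + fp),
        "mcc": ratio(tp * tn - fp * fn, mcc_den),
    }


# Illustrative counts only: 100 peptides, 45 of them true ACPs.
metrics = binary_metrics(tp=40, fp=10, tn=45, fn=5)
```

Note that MCC ranges from -1 to 1 rather than 0 to 1, which is why the abstract reports its improvement as an absolute value (0.20) rather than a percentage.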
Affiliation(s)
- Sajid Ahmed
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Rafsanjani Muhammod
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Zahid Hossain Khan
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Sheikh Adilina
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh
| | - Alok Sharma
- Laboratory for Medical Science Mathematics, RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan
- Institute for Integrated and Intelligent Systems, Griffith University, Brisbane, QLD, 4111, Australia
| | - Swakkhar Shatabda
- Department of Computer Science and Engineering, United International University, Dhaka, Bangladesh.
| | - Abdollah Dehzangi
- Department of Computer Science, Rutgers University, Camden, NJ, 08102, USA.
- Center for Computational and Integrative Biology, Rutgers University, Camden, NJ, 08102, USA.
| |
|
31
|
Yi HC, You ZH, Guo ZH, Huang DS, Chan KCC. Learning Representation of Molecules in Association Network for Predicting Intermolecular Associations. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:2546-2554. [PMID: 32070992 DOI: 10.1109/tcbb.2020.2973091] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Indexed: 06/10/2023]
Abstract
A key aim of post-genomic biomedical research is to systematically understand molecules and their interactions in human cells. Multiple biomolecules coordinate to sustain life activities, and interactions between various biomolecules are interconnected. However, existing studies usually focus only on associations between two or a very limited number of molecule types. In this study, we propose a network representation learning based computational framework, MAN-SDNE, to predict any intermolecular associations. More specifically, we constructed a large-scale molecular association network of multiple biomolecules in humans by integrating associations among long non-coding RNAs, microRNAs, proteins, drugs, and diseases, containing 6,528 molecular nodes and 105,546 associations of 9 kinds. Then, each node is represented by its network proximity and attribute features. Furthermore, these features are used to train a Random Forest classifier to predict intermolecular associations. MAN-SDNE achieves remarkable performance, with an AUC of 0.9552 and an AUPR of 0.9338 under five-fold cross-validation. To demonstrate its ability to predict specific types of interactions, a case study predicting lncRNA-protein interactions using MAN-SDNE is also carried out. Experimental results demonstrate that this work offers systematic insight into the synergistic associations between molecules and complex diseases and provides a network-based computational tool to systematically explore intermolecular interactions.
|
32
|
Rincón-Riveros A, Morales D, Rodríguez JA, Villegas VE, López-Kleine L. Bioinformatic Tools for the Analysis and Prediction of ncRNA Interactions. Int J Mol Sci 2021; 22:11397. [PMID: 34768830 PMCID: PMC8583695 DOI: 10.3390/ijms222111397] [Citation(s) in RCA: 23] [Impact Index Per Article: 5.8] [Received: 09/11/2021] [Revised: 09/30/2021] [Accepted: 09/30/2021] [Indexed: 12/16/2022] Open
Abstract
Noncoding RNAs (ncRNAs) play prominent roles in the regulation of gene expression via their interactions with other biological molecules such as proteins and nucleic acids. Although much of our knowledge about how these ncRNAs operate in different biological processes has been obtained from experimental findings, computational biology can also substantially boost this knowledge by suggesting possible novel interactions of these ncRNAs with other molecules. Computational predictions are thus used as an alternative source of new insights through a process of mutual enrichment, because the information obtained through experiments continuously feeds into computational methods. The results of these predictions in turn shed light on possible interactions that are subsequently validated experimentally. This review describes the latest advances in databases, bioinformatic tools, and new in silico strategies that allow the establishment or prediction of biological interactions of ncRNAs, particularly miRNAs and lncRNAs. This work places special emphasis on ncRNA species found in humans, but information on ncRNAs of other species is also included.
Affiliation(s)
- Andrés Rincón-Riveros
- Bioinformatics and Systems Biology Group, Universidad Nacional de Colombia, Bogotá 111221, Colombia;
| | - Duvan Morales
- Centro de Investigaciones en Microbiología y Biotecnología-UR (CIMBIUR), Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá 111221, Colombia;
| | - Josefa Antonia Rodríguez
- Grupo de Investigación en Biología del Cáncer, Instituto Nacional de Cancerología, Bogotá 111221, Colombia;
| | - Victoria E. Villegas
- Centro de Investigaciones en Microbiología y Biotecnología-UR (CIMBIUR), Facultad de Ciencias Naturales, Universidad del Rosario, Bogotá 111221, Colombia;
| | - Liliana López-Kleine
- Department of Statistics, Faculty of Science, Universidad Nacional de Colombia, Bogotá 111221, Colombia
| |
|
33
|
Wang S, He Y, Chen Z, Zhang Q. FCNGRU: Locating Transcription Factor Binding Sites by combing Fully Convolutional Neural Network with Gated Recurrent Unit. IEEE J Biomed Health Inform 2021; 26:1883-1890. [PMID: 34613923 DOI: 10.1109/jbhi.2021.3117616] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Indexed: 11/08/2022]
Abstract
Deciphering the relationship between transcription factors (TFs) and DNA sequences is very helpful for computational inference of gene regulation and a comprehensive understanding of gene regulation mechanisms. Transcription factor binding sites (TFBSs) are specific short DNA sequences that play a pivotal role in controlling gene expression through interaction with TF proteins. Although many computational and deep learning methods have recently been proposed to predict the sequence specificity of TF-DNA binding, there is still a lack of effective methods to directly locate TFBSs. To address this problem, we propose FCNGRU, which combines a fully convolutional neural network (FCN) with a gated recurrent unit (GRU) to directly locate TFBSs. Furthermore, we present a two-task framework (FCNGRU-double): one is a classification task at the nucleotide level, which predicts the probability of each nucleotide and locates TFBSs, and the other is a regression task at the sequence level, which predicts the intensity of each sequence. A series of experiments are conducted on 45 in vitro datasets collected from the UniPROBE database, derived from universal protein binding microarrays (uPBMs). Compared with competing methods, FCNGRU-double achieves much better results on these datasets. Moreover, FCNGRU-double has an advantage over a single-task framework, FCNGRU-single, which only contains the branch for locating TFBSs. In addition, we incorporate in vivo datasets for further analysis and discussion. The source code is available at https://github.com/wangguoguoa/FCNGRU.
|
34
|
Pan J, Li LP, You ZH, Yu CQ, Ren ZH, Guan YJ. Prediction of Protein-Protein Interactions in Arabidopsis, Maize, and Rice by Combining Deep Neural Network With Discrete Hilbert Transform. Front Genet 2021; 12:745228. [PMID: 34616437 PMCID: PMC8488469 DOI: 10.3389/fgene.2021.745228] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 07/21/2021] [Accepted: 08/18/2021] [Indexed: 11/21/2022] Open
Abstract
Protein-protein interactions (PPIs) in plants play an essential role in the regulation of biological processes. However, traditional experimental methods are expensive, time-consuming, and need sophisticated technical equipment. These drawbacks motivated the development of novel computational approaches to predict PPIs in plants. In this article, a new deep learning framework, which combines the discrete Hilbert transform (DHT) with deep neural networks (DNN), is presented to predict PPIs in plants. To be more specific, plant protein sequences were first transformed into a position-specific scoring matrix (PSSM). Then, DHT was employed to capture features from the PSSM. To improve the prediction accuracy, we used the singular value decomposition algorithm to decrease noise and reduce the dimensions of the feature descriptors. Finally, these feature vectors were fed into the DNN for training and prediction. When applied to three plant PPI datasets (Arabidopsis thaliana, maize, and rice), our method achieved good predictive performance, with average area under the receiver operating characteristic curve values of 0.8369, 0.9466, and 0.9440, respectively. To fully verify the predictive ability of our method, we compared it with different feature descriptors and machine learning classifiers. Moreover, to further demonstrate the generality of our approach, we also tested it on yeast and human PPI datasets. Experimental results indicated that our method is an efficient and promising computational model for predicting potentially interacting plant protein pairs.
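To make the discrete Hilbert transform step more concrete, here is a minimal pure-Python sketch of one common definition (the imaginary part of the analytic signal, built in the frequency domain). The abstract does not say which DHT variant the authors used or how it is applied to the PSSM, so treating a single numeric column as the input is an assumption, and `dft` and `discrete_hilbert` are hypothetical names.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (fine for short sequences)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
                  for t in range(n))
        out.append(acc / n if inverse else acc)
    return out


def discrete_hilbert(x):
    """Hilbert transform of a real sequence via the analytic signal:
    keep DC (and Nyquist), double positive frequencies, zero negative ones."""
    n = len(x)
    weights = [0.0] * n
    weights[0] = 1.0
    if n % 2 == 0:
        weights[n // 2] = 1.0
        positive = range(1, n // 2)
    else:
        positive = range(1, (n + 1) // 2)
    for k in positive:
        weights[k] = 2.0
    spectrum = dft(x)
    analytic = dft([s * w for s, w in zip(spectrum, weights)], inverse=True)
    return [z.imag for z in analytic]


# Sanity check on a toy signal: the transform of a cosine is a sine.
signal = [math.cos(2 * math.pi * t / 8) for t in range(8)]
transformed = discrete_hilbert(signal)
```

For real data one would use an FFT-based routine instead of the naive transform, but the weighting scheme stays the same.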
Affiliation(s)
- Jie Pan
- School of Information Engineering, Xijing University, Xi’an, China
| | - Li-Ping Li
- School of Information Engineering, Xijing University, Xi’an, China
| |
|
35
|
Arzua T, Jiang C, Yan Y, Bai X. The importance of non-coding RNAs in environmental stress-related developmental brain disorders: A systematic review of evidence associated with exposure to alcohol, anesthetic drugs, nicotine, and viral infections. Neurosci Biobehav Rev 2021; 128:633-647. [PMID: 34186153 PMCID: PMC8357057 DOI: 10.1016/j.neubiorev.2021.06.033] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Received: 08/31/2020] [Revised: 05/23/2021] [Accepted: 06/23/2021] [Indexed: 12/11/2022]
Abstract
Brain development is a dynamic and lengthy process that includes cell proliferation, migration, neurogenesis, gliogenesis, synaptogenesis, and pruning. Disruption of any of these developmental events can result in long-term outcomes ranging from structural brain changes to cognitive and behavioral abnormalities, with the mechanisms largely unknown. Emerging evidence suggests that non-coding RNAs (ncRNAs) are pivotal molecules that participate in normal brain development and neurodevelopmental disorders. NcRNAs such as long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) are transcribed from the genome but not translated into proteins. Many ncRNAs have been implicated as tuners of cell fate. In this review, we started with an introduction to the current knowledge of lncRNAs and miRNAs and their potential roles in brain development in health and disorders. We then reviewed and discussed the evidence of ncRNA involvement in abnormal brain development resulting from alcohol, anesthetic drugs, nicotine, and viral infections. The complex connections among these ncRNAs were also discussed, along with potential overlapping ncRNA mechanisms, possible pharmacological targets for therapeutic/neuroprotective interventions, and potential biomarkers for brain developmental disorders.
Affiliation(s)
- Thiago Arzua
- Department of Cell Biology, Neurobiology & Anatomy, Medical College of Wisconsin, Milwaukee, WI, 53226, USA; Department of Physiology, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
| | - Congshan Jiang
- Department of Anesthesiology, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
| | - Yasheng Yan
- Department of Cell Biology, Neurobiology & Anatomy, Medical College of Wisconsin, Milwaukee, WI, 53226, USA
| | - Xiaowen Bai
- Department of Cell Biology, Neurobiology & Anatomy, Medical College of Wisconsin, Milwaukee, WI, 53226, USA; Center of Systems Molecular Medicine, Medical College of Wisconsin, Milwaukee, WI, 53226, USA.
| |
|
36
|
Buongiorno D, Cascarano GD, De Feudis I, Brunetti A, Carnimeo L, Dimauro G, Bevilacqua V. Deep learning for processing electromyographic signals: A taxonomy-based survey. Neurocomputing 2021. [DOI: 10.1016/j.neucom.2020.06.139] [Citation(s) in RCA: 16] [Impact Index Per Article: 4.0] [Indexed: 11/28/2022]
|
37
|
Zhou H, Wekesa JS, Luan Y, Meng J. PRPI-SC: an ensemble deep learning model for predicting plant lncRNA-protein interactions. BMC Bioinformatics 2021; 22:415. [PMID: 34429059 PMCID: PMC8385908 DOI: 10.1186/s12859-021-04328-9] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 11/01/2020] [Accepted: 11/09/2020] [Indexed: 01/04/2023] Open
Abstract
BACKGROUND Plant long non-coding RNAs (lncRNAs) play vital roles in many biological processes, mainly through interactions with RNA-binding proteins (RBPs). To understand the function of lncRNAs, a fundamental method is to identify which types of proteins interact with the lncRNAs. However, the models or rules of interactions are a major challenge when calculating and estimating the types of RBP. RESULTS In this study, we propose an ensemble deep learning model, named PRPI-SC, to predict plant lncRNA-protein interactions using a stacked denoising autoencoder and a convolutional neural network based on sequence and structural information. PRPI-SC predicts interactions between lncRNAs and proteins based on the k-mer features of RNAs and proteins. Experiments yielded good results on the Arabidopsis thaliana and Zea mays datasets (ATH948 and ZEA22133). The accuracy rates on the ATH948 and ZEA22133 datasets were 88.9% and 82.6%, respectively. PRPI-SC also performed well on some public RNA-protein interaction datasets. CONCLUSIONS PRPI-SC accurately predicts the interaction between plant lncRNAs and proteins, which plays a guiding role in studying the function and expression of plant lncRNAs. At the same time, PRPI-SC has strong generalization ability and predicts well on non-plant data.
Affiliation(s)
- Haoran Zhou
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 Liaoning China
| | - Jael Sanyanda Wekesa
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 Liaoning China
| | - Yushi Luan
- School of Bioengineering, Dalian University of Technology, Dalian, 116024 Liaoning China
| | - Jun Meng
- School of Computer Science and Technology, Dalian University of Technology, Dalian, 116024 Liaoning China
| |
|
38
|
Pan Y, Lei X, Zhang Y. Association predictions of genomics, proteinomics, transcriptomics, microbiome, metabolomics, pathomics, radiomics, drug, symptoms, environment factor, and disease networks: A comprehensive approach. Med Res Rev 2021; 42:441-461. [PMID: 34346083 DOI: 10.1002/med.21847] [Citation(s) in RCA: 33] [Impact Index Per Article: 8.3] [Received: 12/26/2020] [Revised: 05/22/2021] [Accepted: 07/07/2021] [Indexed: 12/12/2022]
Abstract
Currently, multi-omics research, spanning genomics, proteinomics, transcriptomics, microbiome, metabolomics, pathomics, and radiomics, is a hot spot. The relationship between multi-omics data, drugs, and diseases has received extensive attention from researchers. At the same time, multi-omics can effectively predict the diagnosis, prognosis, and treatment of diseases. In essence, these research entities, such as genes, RNAs, proteins, microbes, metabolites, and pathways, as well as pathological and medical imaging data, can all be represented by networks at different levels. Some computer science and biology scholars have tried to use computational methods to explore the potential relationships between biological entities. We summarize a comprehensive research strategy: build a multi-omics heterogeneous network covering multimodal data, and use currently popular computational methods to make predictions. In this study, we first introduce methods for calculating the similarity of biological entities at the data level, and then discuss multimodal data fusion and feature extraction methods. Finally, the challenges and opportunities at this stage are summarized. Some scholars have used such a framework to calculate and predict. We also summarize them and discuss the challenges. We hope that our review can help scholars who are interested in bioinformatics, biomedical imaging, and computer research.
Affiliation(s)
- Yi Pan
- Faculty of Computer Science and Control Engineering, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
| | - Xiujuan Lei
- School of Computer Science, Shaanxi Normal University, Xi'an, China
| | - Yuchen Zhang
- School of Computer Science, Shaanxi Normal University, Xi'an, China
| |
|
39
|
Zhang J, Chen Q, Liu B. DeepDRBP-2L: A New Genome Annotation Predictor for Identifying DNA-Binding Proteins and RNA-Binding Proteins Using Convolutional Neural Network and Long Short-Term Memory. IEEE/ACM Trans Comput Biol Bioinform 2021; 18:1451-1463. [PMID: 31722485 DOI: 10.1109/tcbb.2019.2952338] [Citation(s) in RCA: 30] [Impact Index Per Article: 7.5] [Indexed: 06/10/2023]
Abstract
DNA-binding proteins (DBPs) and RNA-binding proteins (RBPs) are two kinds of crucial proteins that are associated with various cellular activities and some important diseases. Accurate identification of DBPs and RBPs facilitates both theoretical research and real-world applications. Existing sequence-based DBP predictors can accurately identify DBPs but incorrectly predict many RBPs as DBPs, and vice versa, resulting in low prediction precision. Moreover, some proteins (DRBPs) that interact with both DNA and RNA play important roles in gene expression and cannot be identified by existing computational methods. In this study, a two-level predictor named DeepDRBP-2L was proposed by combining a Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM). It is the first computational method that is able to identify DBPs, RBPs, and DRBPs. Rigorous cross-validations and independent tests showed that DeepDRBP-2L is able to overcome the shortcomings of the existing methods and can go one step further to identify DRBPs. Application of DeepDRBP-2L to the tomato genome further demonstrated its performance. The webserver of DeepDRBP-2L is freely available at http://bliulab.net/DeepDRBP-2L.
|
40
|
Bi XA, Li L, Xu R, Xing Z. Pathogenic Factors Identification of Brain Imaging and Gene in Late Mild Cognitive Impairment. Interdiscip Sci 2021; 13:511-520. [PMID: 34106420 DOI: 10.1007/s12539-021-00449-0] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.0] [Received: 01/07/2021] [Revised: 06/01/2021] [Accepted: 06/04/2021] [Indexed: 11/28/2022]
Abstract
Mild cognitive impairment (MCI) is a warning sign of severe cognitive decline. It can be divided into two stages: early MCI (EMCI) and late MCI (LMCI). As the late stage of MCI and the precursor of Alzheimer's disease (AD), LMCI receives insufficient attention in the field of brain science, so its internal mechanism is not well understood. To better explore the focus and pathological mechanism of LMCI, a method called genetic evolved random forest (GERF) is applied. Resting-state functional magnetic resonance imaging (rfMRI) and gene data are obtained from 62 subjects (36 LMCI and 26 normal controls), and Pearson correlation analysis is adopted to perform the multimodal fusion of the two types of data to construct fusion features. Using GERF, we identified pathogenic brain regions and genes that are highly related to LMCI, with good results. Compared with the normal control (NC) group, the abnormal brain regions of LMCI are PUT.L, PreCG.L, IFGtriang.R, REC.R, DCG.R, PoCG.L, and HES.L, and the pathogenic genes are FHIT, RF00019, FRMD4A, PTPRD, and RBFOX1. More importantly, most of these risk genes and abnormal brain regions have been confirmed to be related to AD and MCI in previous studies. In this study, we mapped them to LMCI with higher accuracy, so as to provide a more robust understanding of the physiological mechanism of MCI.
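The Pearson-based fusion step mentioned in this abstract can be illustrated with a minimal sketch. It assumes, hypothetically, that each brain region and each gene is already represented by a numeric vector of the same length; the abstract does not say how the two modalities are aligned, and the function name `pearson_fusion` is illustrative rather than taken from the paper.

```python
import numpy as np

def pearson_fusion(region_signals, gene_vectors):
    """Return a (regions x genes) matrix of Pearson correlations.

    Both inputs are 2-D arrays whose rows are vectors of equal length.
    Bringing the two modalities to a common length is assumed to have
    been done beforehand (not specified in the abstract).
    """
    R = np.asarray(region_signals, dtype=float)
    G = np.asarray(gene_vectors, dtype=float)
    if R.shape[1] != G.shape[1]:
        raise ValueError("rows must have the same length in both inputs")
    # Standardize each row with the population standard deviation (ddof=0).
    # A constant row has zero variance and would produce NaN here.
    Rz = (R - R.mean(axis=1, keepdims=True)) / R.std(axis=1, keepdims=True)
    Gz = (G - G.mean(axis=1, keepdims=True)) / G.std(axis=1, keepdims=True)
    # The mean of products of z-scores is the Pearson correlation coefficient.
    return Rz @ Gz.T / R.shape[1]

# Toy data: one region vector, two gene vectors (hypothetical values).
regions = [[1.0, 2.0, 3.0, 4.0]]
genes = [[2.0, 4.0, 6.0, 8.0], [4.0, 3.0, 2.0, 1.0]]
features = pearson_fusion(regions, genes)  # one fusion feature per (region, gene) pair
```

Each entry of `features` could then serve as one fusion feature for a downstream classifier such as a random forest.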
Affiliation(s)
- Xia-An Bi
- Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha, People's Republic of China
- College of Information Science and Engineering, Hunan Normal University, Changsha, People's Republic of China
| | - Lou Li
- Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha, People's Republic of China
- College of Information Science and Engineering, Hunan Normal University, Changsha, People's Republic of China
| | - Ruihui Xu
- Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha, People's Republic of China
- College of Information Science and Engineering, Hunan Normal University, Changsha, People's Republic of China
| | - Zhaoxu Xing
- Hunan Provincial Key Laboratory of Intelligent Computing and Language Information Processing, Hunan Normal University, Changsha, People's Republic of China
- College of Information Science and Engineering, Hunan Normal University, Changsha, People's Republic of China
| |
|
41
|
Yi HC, You ZH, Wang L, Su XR, Zhou X, Jiang TH. In silico drug repositioning using deep learning and comprehensive similarity measures. BMC Bioinformatics 2021; 22:293. [PMID: 34074242 PMCID: PMC8170943 DOI: 10.1186/s12859-020-03882-y] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.8] [Received: 11/10/2020] [Accepted: 11/13/2020] [Indexed: 11/10/2022]
Abstract
BACKGROUND Drug repositioning, meaning finding new uses for existing drugs, can accelerate new drug research and development. Various computational methods have been presented to predict novel drug-disease associations for drug repositioning based on similarity measures among drugs and diseases. However, there are some known associations between drugs and diseases that previous studies have not utilized. METHODS In this work, we develop a deep gated recurrent units model to predict potential drug-disease interactions using comprehensive similarity measures and a Gaussian interaction profile kernel. More specifically, the similarity measure is used to extract discriminative features for drugs based on their chemical fingerprints. Meanwhile, the Gaussian interaction profile kernel is employed to obtain efficient features of diseases based on known disease-disease associations. Then, a deep gated recurrent units model is developed to predict potential drug-disease interactions. RESULTS The performance of the proposed model is evaluated on two benchmark datasets under tenfold cross-validation. To further verify the predictive ability, case studies for predicting new potential indications of drugs were carried out. CONCLUSION The experimental results showed that the proposed model is a useful tool for predicting new indications for drugs or new treatments for diseases, and can accelerate drug repositioning and related drug research and discovery.
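As a rough illustration of the Gaussian interaction profile (GIP) kernel named in this abstract, the sketch below uses the commonly cited form K(i, j) = exp(-γ ||p_i - p_j||²), with γ set to a bandwidth parameter divided by the mean squared norm of the profiles. The binary association matrix, the default `gamma_prime = 1.0`, and the function name are assumptions made for illustration, not details confirmed by the abstract.

```python
import numpy as np

def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel between the rows of `profiles`.

    Each row is the binary interaction profile of one entity (for example
    one disease), listing its known associations with a second entity type.
    """
    P = np.asarray(profiles, dtype=float)
    sq_norms = np.sum(P ** 2, axis=1)
    mean_sq = sq_norms.mean()
    if mean_sq == 0:
        raise ValueError("all interaction profiles are empty")
    # Normalize the bandwidth by the average squared profile norm.
    gamma = gamma_prime / mean_sq
    # Pairwise squared Euclidean distances, clipped against rounding error.
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (P @ P.T)
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-gamma * sq_dists)

# Toy association matrix (hypothetical): 3 diseases x 4 drugs.
assoc = np.array([[1, 0, 1, 0],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1]])
K = gip_kernel(assoc)
```

In this toy case the first two diseases share an identical profile, so their kernel value is 1, while the third disease shares no associations with them and gets a much smaller similarity.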
Affiliation(s)
- Hai-Cheng Yi
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Zhu-Hong You
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China.
| | - Lei Wang
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
| | - Xiao-Rui Su
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
- University of Chinese Academy of Sciences, Beijing, 100049, China
| | - Xi Zhou
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
| | - Tong-Hai Jiang
- The Xinjiang Technical Institute of Physics and Chemistry, Chinese Academy of Sciences, Urumqi, 830011, China
| |
|
42
|
Li Y, Sun H, Feng S, Zhang Q, Han S, Du W. Capsule-LPI: a LncRNA-protein interaction predicting tool based on a capsule network. BMC Bioinformatics 2021; 22:246. [PMID: 33985444 PMCID: PMC8120853 DOI: 10.1186/s12859-021-04171-y] [Citation(s) in RCA: 21] [Impact Index Per Article: 5.3] [Received: 09/22/2020] [Accepted: 05/05/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND Long noncoding RNAs (lncRNAs) play important roles in multiple biological processes. Identifying lncRNA-protein interactions (LPIs) is key to understanding lncRNA functions. Although some computational methods for LPIs have been developed, the LPI prediction problem remains challenging. How to integrate multimodal features from more perspectives and build deep learning architectures with better recognition performance has always been the focus of research on LPIs. RESULTS We present Capsule-LPI, a novel multichannel capsule network framework that integrates multimodal features for LPI prediction. Capsule-LPI integrates four groups of multimodal features: sequence features, motif information, physicochemical properties and secondary structure features. Capsule-LPI is composed of four feature-learning subnetworks and one capsule subnetwork. Through comprehensive experimental comparisons and evaluations, we demonstrate that both multimodal features and the architecture of the multichannel capsule network can significantly improve the performance of LPI prediction. The experimental results show that Capsule-LPI performs better than the existing state-of-the-art tools. The precision of Capsule-LPI is 87.3%, which represents a 1.7% improvement. The F-value of Capsule-LPI is 92.2%, which represents a 1.4% improvement. CONCLUSIONS This study provides a novel and feasible LPI prediction tool based on the integration of multimodal features and a capsule network. A webserver ( http://csbg-jlu.site/lpc/predict ) has been developed for the convenience of users.
Affiliation(s)
- Ying Li
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
| | - Hang Sun
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
| | - Shiyao Feng
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
| | - Qi Zhang
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
| | - Siyu Han
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China
- Department of Computer Science, Faculty of Engineering, University of Bristol, Bristol, BS8 1UB, UK
| | - Wei Du
- Key Laboratory of Symbolic Computation and Knowledge Engineering, Ministry of Education, College of Computer Science and Technology, Jilin University, Qianjin Street, 130012, Changchun, China.
| |
|
43
|
Wang J, Zhao Y, Gong W, Liu Y, Wang M, Huang X, Tan J. EDLMFC: an ensemble deep learning framework with multi-scale features combination for ncRNA-protein interaction prediction. BMC Bioinformatics 2021; 22:133. [PMID: 33740884 PMCID: PMC7980572 DOI: 10.1186/s12859-021-04069-9] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 01/23/2021] [Accepted: 03/05/2021] [Indexed: 11/29/2022]
Abstract
Background Non-coding RNA (ncRNA) and protein interactions play essential roles in various physiological and pathological processes. The experimental methods used for identifying ncRNA–protein interactions are time-consuming and labor-intensive. Therefore, there is an increasing demand for computational methods to accurately and efficiently predict ncRNA–protein interactions. Results In this work, we presented an ensemble deep learning-based method, EDLMFC, to predict ncRNA–protein interactions using a combination of multi-scale features, including primary sequence features, secondary structure sequence features, and tertiary structure features. Conjoint k-mer was used to extract protein/ncRNA sequence features, which were integrated with tertiary structure features and then fed into an ensemble deep learning model. The model combined a convolutional neural network (CNN), to learn dominating biological information, with a bi-directional long short-term memory network (BLSTM), to capture long-range dependencies among the features identified by the CNN. Compared with other state-of-the-art methods under five-fold cross-validation, EDLMFC shows the best performance with accuracy of 93.8%, 89.7%, and 86.1% on the RPI1807, NPInter v2.0, and RPI488 datasets, respectively. The results of the independent test demonstrated that EDLMFC can effectively predict potential ncRNA–protein interactions from different organisms. Furthermore, EDLMFC is also shown to successfully predict hub ncRNAs and proteins present in ncRNA–protein networks of Mus musculus. Conclusions In general, our proposed method EDLMFC improved the accuracy of ncRNA–protein interaction predictions and is expected to provide helpful guidance for research on ncRNA functions. The source code of EDLMFC and the datasets used in this work are available at https://github.com/JingjingWang-87/EDLMFC. Supplementary Information The online version contains supplementary material available at 10.1186/s12859-021-04069-9.
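The conjoint k-mer idea for protein sequences can be sketched as follows. This sketch assumes the seven-group reduced amino-acid alphabet commonly used for conjoint triad features and k = 3; whether EDLMFC uses exactly this grouping or value of k is not stated in the abstract, and the function name is illustrative.

```python
from itertools import product

# Seven-group reduced amino-acid alphabet (assumption: the grouping commonly
# used for conjoint triad features, not confirmed by the abstract).
GROUPS = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_GROUP = {aa: str(i) for i, grp in enumerate(GROUPS) for aa in grp}

def conjoint_kmer(seq, k=3):
    """Return normalized frequencies of all 7**k reduced-alphabet k-mers."""
    # Map residues to group labels; residues outside the 20 standard ones are skipped.
    reduced = [AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP]
    keys = ["".join(p) for p in product("0123456", repeat=k)]
    counts = dict.fromkeys(keys, 0)
    for i in range(len(reduced) - k + 1):
        counts["".join(reduced[i:i + k])] += 1
    total = sum(counts.values())
    return [counts[key] / total if total else 0.0 for key in keys]

# Hypothetical toy sequence; the result is a 343-dimensional feature vector.
vec = conjoint_kmer("MKVLAAGRDE")
```

Reducing 20 amino acids to 7 groups keeps the feature vector at 7³ = 343 dimensions instead of 20³ = 8000, which is the usual motivation for this encoding.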
Affiliation(s)
- Jingjing Wang
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
| | - Yanpeng Zhao
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
| | - Weikang Gong
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
| | - Yang Liu
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
| | - Mei Wang
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
| | - Xiaoqian Huang
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China
| | - Jianjun Tan
- Department of Biomedical Engineering, Faculty of Environment and Life, Beijing International Science and Technology Cooperation Base for Intelligent Physiological Measurement and Clinical Transformation, Beijing University of Technology, Beijing, 100124, China.
| |
|
44
|
Yang S, Liu X, Ng RT. ProbeRating: a recommender system to infer binding profiles for nucleic acid-binding proteins. Bioinformatics 2021; 36:4797-4804. [PMID: 32573679 PMCID: PMC7750938 DOI: 10.1093/bioinformatics/btaa580] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Received: 12/12/2019] [Revised: 05/18/2020] [Accepted: 06/18/2020] [Indexed: 12/15/2022]
Abstract
MOTIVATION The interaction between proteins and nucleic acids plays a crucial role in gene regulation and cell function. Determining the binding preferences of nucleic acid-binding proteins (NBPs), namely RNA-binding proteins (RBPs) and transcription factors (TFs), is the key to deciphering the protein-nucleic acid interaction code. Today, available NBP binding data from in vivo or in vitro experiments are still limited, which leaves a large portion of NBPs uncovered. Unfortunately, existing computational methods that model NBP binding preferences are mostly protein specific: they need experimental data for the specific protein of interest, and thus only focus on experimentally characterized NBPs. The binding preferences of experimentally unexplored NBPs remain largely unknown. RESULTS Here, we introduce ProbeRating, a nucleic acid recommender system that utilizes techniques from deep learning and word embeddings of natural language processing. ProbeRating is developed to predict binding profiles for unexplored or poorly studied NBPs by exploiting their homologous NBPs which currently have available binding data. Requiring only sequence information as input, ProbeRating adapts FastText from Facebook AI Research to extract biological features. It then builds a neural network-based recommender system. We evaluate the performance of ProbeRating on two different tasks: one for RBP and one for TF. As a result, ProbeRating outperforms previous methods on both tasks. The results show that ProbeRating can be a useful tool to study the binding mechanism for the many NBPs that lack direct experimental evidence. AVAILABILITY AND IMPLEMENTATION The source code is freely available at <https://github.com/syang11/ProbeRating>. SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Shu Yang
- Department of Computer Science, University of British Columbia, Vancouver, BC V6T1Z4, Canada
| | - Xiaoxi Liu
- RIKEN Center for Integrative Medical Sciences (IMS), Yokohama 230-0045, Japan
| | - Raymond T Ng
- Department of Computer Science, University of British Columbia, Vancouver, BC V6T1Z4, Canada
| |
|
45
|
Zhang Q, Liu P, Wang X, Zhang Y, Han Y, Yu B. StackPDB: Predicting DNA-binding proteins based on XGB-RFE feature optimization and stacked ensemble classifier. Appl Soft Comput 2021. [DOI: 10.1016/j.asoc.2020.106921] [Citation(s) in RCA: 12] [Impact Index Per Article: 3.0] [Indexed: 02/08/2023]
|
46
|
Shaw D, Chen H, Xie M, Jiang T. DeepLPI: a multimodal deep learning method for predicting the interactions between lncRNAs and protein isoforms. BMC Bioinformatics 2021; 22:24. [PMID: 33461501 PMCID: PMC7814738 DOI: 10.1186/s12859-020-03914-7] [Citation(s) in RCA: 13] [Impact Index Per Article: 3.3] [Received: 07/15/2020] [Accepted: 11/30/2020] [Indexed: 12/14/2022]
Abstract
BACKGROUND Long non-coding RNAs (lncRNAs) regulate diverse biological processes via interactions with proteins. Since the experimental methods to identify these interactions are expensive and time-consuming, many computational methods have been proposed. Although these computational methods have achieved promising prediction performance, they neglect the fact that a gene may encode multiple protein isoforms and different isoforms of the same gene may interact differently with the same lncRNA. RESULTS In this study, we propose a novel method, DeepLPI, for predicting the interactions between lncRNAs and protein isoforms. Our method uses sequence and structure data to extract intrinsic features and expression data to extract topological features. To combine these different data, we adopt a hybrid framework by integrating a multimodal deep learning neural network and a conditional random field. To overcome the lack of known interactions between lncRNAs and protein isoforms, we apply a multiple instance learning (MIL) approach. In our experiment concerning the human lncRNA-protein interactions in the NPInter v3.0 database, DeepLPI improved the prediction performance by 4.7% in terms of AUC and 5.9% in terms of AUPRC over the state-of-the-art methods. Our further correlation analyses between interactive lncRNAs and protein isoforms also illustrated that their co-expression information helped predict the interactions. Finally, we give some examples where DeepLPI was able to outperform the other methods in predicting mouse lncRNA-protein interactions and novel human lncRNA-protein interactions. CONCLUSION Our results demonstrated that the use of isoforms and MIL contributed significantly to the improvement of performance in predicting lncRNA and protein interactions. We believe that such an approach would find more applications in predicting other functional roles of RNAs and proteins.
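The multiple instance learning setting described here can be sketched under the standard MIL assumption: a gene is a "bag" of isoforms, only the gene-level label is known, and the bag is positive if at least one isoform interacts. The max-pooling aggregation, the isoform names, and the scores below are illustrative assumptions; the abstract does not specify how DeepLPI aggregates instance scores.

```python
from typing import Dict, Tuple

def bag_prediction(instance_scores: Dict[str, float]) -> Tuple[str, float]:
    """Aggregate per-isoform scores into a gene-level (bag-level) score.

    Under the standard MIL assumption the bag score is the maximum
    instance score, and the arg-max instance is the "witness" isoform.
    """
    if not instance_scores:
        raise ValueError("a bag needs at least one instance")
    best = max(instance_scores, key=instance_scores.get)
    return best, instance_scores[best]

# Hypothetical scores from some isoform-level model for one lncRNA-gene pair.
scores = {"isoform_1": 0.12, "isoform_2": 0.81, "isoform_3": 0.40}
witness, gene_score = bag_prediction(scores)
```

During training, the witness isoform of each positive bag is typically the one whose prediction is pushed toward the positive label, which lets gene-level interaction data supervise an isoform-level model.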
Affiliation(s)
- Dipan Shaw
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
| | - Hao Chen
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
| | - Minzhu Xie
- College of Information Science and Engineering, Hunan Normal University, Changsha, China
| | - Tao Jiang
- Department of Computer Science and Engineering, University of California, Riverside, CA 92521 USA
- Bioinformatics Division, BNRIST/Department of Computer Science and Technology, Tsinghua University, Beijing, China
| |
|
47
|
Choudhury A, Das NC, Patra R, Mukherjee S. In silico analyses on the comparative sensing of SARS-CoV-2 mRNA by the intracellular TLRs of humans. J Med Virol 2021; 93:2476-2486. [PMID: 33404091 DOI: 10.1002/jmv.26776] [Citation(s) in RCA: 67] [Impact Index Per Article: 16.8] [Received: 11/19/2020] [Revised: 12/24/2020] [Accepted: 01/01/2021] [Indexed: 12/16/2022]
Abstract
The coronavirus disease-2019 (COVID-19) pandemic caused by severe acute respiratory syndrome coronavirus-2 (SARS-CoV-2) has already resulted in a huge setback to mankind in terms of millions of deaths, while the unavailability of an appropriate therapeutic strategy has made the scenario much more severe. Toll-like receptors (TLRs) are crucial mediators and regulators of host immunity, and the role of human cell surface TLRs in SARS-CoV-2 induced inflammatory pathogenesis has been demonstrated recently. However, the functional significance of the human intracellular TLRs, including TLR3, 7, 8, and 9, is still unclear. The involvement of these intracellular TLRs in inducing pro-inflammatory responses in COVID-19 has been reported, but the identity of the interacting viral RNA molecule(s) and the corresponding TLRs have not been explored. This study aims to rationalize the comparative binding of the major SARS-CoV-2 mRNAs to the intracellular TLRs, considering the solvent-based force-fields operational in the cytosolic aqueous microenvironment that predominantly drives these interactions. Our in silico study on the binding of all mRNAs with the intracellular TLRs indicates that the mRNAs of the NSP10, S2, and E proteins of SARS-CoV-2 are possible virus-associated molecular patterns that bind to TLR3, TLR9, and TLR7, respectively, and trigger downstream cascade reactions. Intriguingly, binding of the viral mRNAs resulted in variable degrees of conformational changes in the ligand-binding domain of the TLRs, supporting the activation of the downstream inflammatory signaling cascade. Taken together, the current study is the first report to describe the role of TLR3, 7, and 9 in COVID-19 immunobiology, and these could serve as useful targets for the conception of a therapeutic strategy against the pandemic.
Affiliation(s)
- Abhigyan Choudhury
- Integrative Biochemistry & Immunology Laboratory, Department of Animal Science, Kazi Nazrul University, Asansol, West Bengal, India
| | - Nabarun Chandra Das
- Integrative Biochemistry & Immunology Laboratory, Department of Animal Science, Kazi Nazrul University, Asansol, West Bengal, India
| | - Ritwik Patra
- Integrative Biochemistry & Immunology Laboratory, Department of Animal Science, Kazi Nazrul University, Asansol, West Bengal, India
| | - Suprabhat Mukherjee
- Integrative Biochemistry & Immunology Laboratory, Department of Animal Science, Kazi Nazrul University, Asansol, West Bengal, India
| |
|
48
|
Alam T, Al-Absi HRH, Schmeier S. Deep Learning in LncRNAome: Contribution, Challenges, and Perspectives. Noncoding RNA 2020; 6:E47. [PMID: 33266128 PMCID: PMC7711891 DOI: 10.3390/ncrna6040047] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 07/25/2020] [Revised: 10/27/2020] [Accepted: 11/06/2020] [Indexed: 12/11/2022]
Abstract
Long non-coding RNAs (lncRNA), the pervasively transcribed part of the mammalian genome, have played a significant role in changing our protein-centric view of genomes. The abundance of lncRNAs and their diverse roles across cell types have opened numerous avenues for the research community regarding lncRNAome. To discover and understand lncRNAome, many sophisticated computational techniques have been leveraged. Recently, deep learning (DL)-based modeling techniques have been successfully used in genomics due to their capacity to handle large amounts of data and produce relatively better results than traditional machine learning (ML) models. DL-based modeling techniques have now become a choice for many modeling tasks in the field of lncRNAome as well. In this review article, we summarized the contribution of DL-based methods in nine different lncRNAome research areas. We also outlined DL-based techniques leveraged in lncRNAome, highlighting the challenges computational scientists face while developing DL-based models for lncRNAome. To the best of our knowledge, this is the first review article that summarizes the role of DL-based techniques in multiple areas of lncRNAome.
Affiliation(s)
- Tanvir Alam
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
| | - Hamada R. H. Al-Absi
- College of Science and Engineering, Hamad Bin Khalifa University, Doha 34110, Qatar
| | - Sebastian Schmeier
- School of Natural and Computational Sciences, Massey University, Auckland 0632, New Zealand
| |
|
49
|
Zhong L, Zhen M, Sun J, Zhao Q. Recent advances on the machine learning methods in predicting ncRNA-protein interactions. Mol Genet Genomics 2020; 296:243-258. [PMID: 33006667 DOI: 10.1007/s00438-020-01727-0] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 06/01/2020] [Accepted: 09/17/2020] [Indexed: 12/22/2022]
Abstract
Recent transcriptomics and bioinformatics studies have shown that ncRNAs can affect chromosome structure and gene transcription, participate in epigenetic regulation, and take part in disease processes such as tumorigenesis. Biologists have found that most ncRNAs work by interacting with the corresponding RNA-binding proteins. Therefore, ncRNA-protein interaction is a very active research topic in both the biological and medical fields. However, due to the limitations of manual laboratory experiments, machine learning methods for predicting ncRNA-protein interactions are increasingly favored by researchers. In this review, we summarize several machine learning predictive models of ncRNA-protein interactions from the past few years and briefly describe their characteristics. To optimize the performance of machine learning models and better predict ncRNA-protein interactions, we give some promising future computational directions at the end.
Affiliation(s)
- Lin Zhong
- School of Mathematics, Liaoning University, Shenyang, 110036, China
| | - Meiqin Zhen
- Beijing Chest Hospital, Capital Medical University/Beijing Tuberculosis and Thoracic Tumor Research Institute, Beijing, 101149, China
| | - Jianqiang Sun
- School of Automation and Electrical Engineering, Linyi University, Linyi, 276000, China
| | - Qi Zhao
- School of Computer Science and Software Engineering, University of Science and Technology Liaoning, Anshan, 114051, China.
| |
|
50
|
Li J, Shi X, You ZH, Yi HC, Chen Z, Lin Q, Fang M. Using Weighted Extreme Learning Machine Combined With Scale-Invariant Feature Transform to Predict Protein-Protein Interactions From Protein Evolutionary Information. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2020; 17:1546-1554. [PMID: 31940546 DOI: 10.1109/tcbb.2020.2965919] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 06/10/2023]
Abstract
Protein-Protein Interactions (PPIs) play an irreplaceable role in the biological activities of organisms. Although many high-throughput methods are used to identify PPIs in different kinds of organisms, they have some shortcomings, such as being costly and time-consuming. To address these problems, computational methods have been developed to predict PPIs. In this paper, we present a method to predict PPIs using protein sequences. First, protein sequences are transformed into a Position Weight Matrix (PWM), from which the Scale-Invariant Feature Transform (SIFT) algorithm extracts features. Then Principal Component Analysis (PCA) is applied to reduce the dimension of the features. Finally, a Weighted Extreme Learning Machine (WELM) classifier is employed to predict PPIs and a series of evaluation results are obtained. Since SIFT and WELM are used to extract features and classify, respectively, we call the proposed method SIFT-WELM. When the proposed method is applied to three well-known PPI datasets of Yeast, Human and Helicobacter pylori, the average accuracies under five-fold cross-validation are as high as 94.83, 97.60 and 83.64 percent, respectively. To evaluate the proposed approach properly, we compare it with a Support Vector Machine (SVM) classifier and other recently developed methods in different aspects. Moreover, the training time of our method is greatly shortened, making it clearly superior to previous methods such as SVM, ACC, and PCVMZM.
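The last two stages of the pipeline described above (PCA followed by a weighted extreme learning machine) can be sketched with NumPy alone. This is a minimal illustration under several assumptions not taken from the abstract: random synthetic features stand in for SIFT descriptors, the hidden layer uses a sigmoid activation, each sample is weighted by the inverse size of its class, and the output weights use the regularized closed form β = (I/C + HᵀWH)⁻¹HᵀWT. All function names and parameter values are hypothetical.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project centered data onto its top principal components (via SVD)."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def welm_train(X, y, n_hidden=20, C=1.0, seed=0):
    """Train a weighted ELM for binary labels y in {0, 1}."""
    rng = np.random.default_rng(seed)
    W_in = rng.normal(size=(X.shape[1], n_hidden))  # random, never trained
    b = rng.normal(size=n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))       # hidden-layer outputs
    # Per-sample weight = 1 / (size of the sample's class), so the
    # minority class is not swamped by the majority class.
    classes, counts = np.unique(y, return_counts=True)
    class_weight = dict(zip(classes.tolist(), (1.0 / counts).tolist()))
    w = np.array([class_weight[label] for label in y.tolist()])
    T = np.where(y == 1, 1.0, -1.0)                 # targets in {-1, +1}
    HtW = H.T * w                                   # equals H^T @ diag(w)
    beta = np.linalg.solve(np.eye(n_hidden) / C + HtW @ H, HtW @ T)
    return W_in, b, beta

def welm_predict(model, X):
    W_in, b, beta = model
    H = 1.0 / (1.0 + np.exp(-(X @ W_in + b)))
    return (H @ beta > 0).astype(int)

# Synthetic, imbalanced toy data standing in for SIFT features.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(2.0, 1.0, (40, 6)), rng.normal(-2.0, 1.0, (10, 6))])
y = np.array([1] * 40 + [0] * 10)
Z = pca_reduce(X, 3)
model = welm_train(Z, y)
pred = welm_predict(model, Z)
```

Because only the output weights are solved for, in a single linear system, training cost is dominated by one matrix solve, which is consistent with the short training time the abstract reports.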
|