1
Raymond WS, DeRoo J, Munsky B. Identification of potential riboswitch elements in Homo sapiens mRNA 5'UTR sequences using positive-unlabeled machine learning. PLoS One 2025; 20:e0320282. [PMID: 40273288; PMCID: PMC12021280; DOI: 10.1371/journal.pone.0320282] [Received: 01/21/2025] [Accepted: 02/17/2025] [Indexed: 04/26/2025]
Abstract
Riboswitches are a class of noncoding RNA structures that interact with target ligands to cause a conformational change that can then execute some regulatory purpose within the cell. Riboswitches are ubiquitous and well characterized in bacteria and other prokaryotes, with additional examples also being found in fungi, plants, and yeast. To date, no purely RNA-small molecule riboswitch has been discovered in Homo sapiens. Several analogous riboswitch-like mechanisms have been described within the H. sapiens translatome within the past decade, prompting the question: Is there a H. sapiens riboswitch dependent on only small molecule ligands? In this work, we set out to train positive-unlabeled machine learning classifiers on known riboswitch sequences and apply the classifiers to H. sapiens mRNA 5'UTR sequences found in the 5'UTR database, UTRdb, in the hope of identifying a set of mRNAs to investigate for riboswitch functionality. A total of 67,683 riboswitch sequences were obtained from RNAcentral, sorted by ligand type, and used as positive examples; 48,031 5'UTR sequences were used as unlabeled, unknown examples. Twenty positive-unlabeled classifiers were trained on sequence and secondary structure features, each withholding one or two ligand classes. Cross validation was then performed on the withheld ligand sets to obtain a validation accuracy range of 75%-99%. The joint sets of 5'UTRs identified as potential riboswitches by the 20 classifiers were then analyzed. 15,333 sequences were identified as a riboswitch by one or more classifier(s), and 436 of the H. sapiens 5'UTRs were labeled as harboring potential riboswitch elements by all 20 classifiers. These 436 sequences were mapped back to the most similar riboswitches within the positive data and examined. An online database of identified and ranked 5'UTRs, their features, and their most similar matches to known riboswitches is provided to guide future experimental efforts to identify H. sapiens riboswitches.
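The "one or more classifiers" versus "all 20 classifiers" counts in the abstract above amount to a union and an intersection over per-classifier hit sets. A minimal sketch of that bookkeeping follows; the function names, the classifier names, and the toy sequence IDs are hypothetical and are not taken from the authors' code.

```python
from typing import Dict, List, Set, Tuple


def consensus_sets(predictions: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Combine per-classifier hit sets into 'any' and 'all' consensus sets.

    `predictions` maps a classifier name to the set of sequence IDs that
    classifier flagged as a potential riboswitch (hypothetical layout).
    """
    hit_sets = list(predictions.values())
    if not hit_sets:
        return {"any": set(), "all": set()}
    flagged_by_any = set().union(*hit_sets)       # one or more classifiers
    flagged_by_all = set.intersection(*hit_sets)  # unanimous across classifiers
    return {"any": flagged_by_any, "all": flagged_by_all}


def rank_by_votes(predictions: Dict[str, Set[str]]) -> List[Tuple[str, int]]:
    """Rank sequence IDs by how many classifiers flagged them (ties by ID)."""
    votes: Dict[str, int] = {}
    for hits in predictions.values():
        for seq_id in hits:
            votes[seq_id] = votes.get(seq_id, 0) + 1
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Toy example with three classifiers instead of twenty.
toy = {
    "clf_a": {"utr1", "utr2", "utr3"},
    "clf_b": {"utr2", "utr3"},
    "clf_c": {"utr3", "utr4"},
}
result = consensus_sets(toy)
ranking = rank_by_votes(toy)
```

In this toy case only `utr3` is flagged unanimously, which mirrors how the 436 unanimous 5'UTRs form a subset of the larger "flagged by any" set.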
Affiliation(s)
- William S Raymond
  - School of Biomedical Engineering, Colorado State University, Fort Collins, Colorado, United States of America
- Jacob DeRoo
  - School of Biomedical Engineering, Colorado State University, Fort Collins, Colorado, United States of America
- Brian Munsky
  - School of Biomedical Engineering, Colorado State University, Fort Collins, Colorado, United States of America
  - Chemical and Biological Engineering, Colorado State University, Fort Collins, Colorado, United States of America
2
Guan Z, Jin X, Zhang X. MFF-nDA: A Computational Model for ncRNA-Disease Association Prediction Based on Multimodule Fusion. J Chem Inf Model 2025; 65:3324-3342. [PMID: 40129032; DOI: 10.1021/acs.jcim.5c00174] [Indexed: 03/26/2025]
Abstract
Noncoding RNAs (ncRNAs), including piwi-interacting RNA (piRNA), long noncoding RNA (lncRNA), microRNA (miRNA), small nucleolar RNA (snoRNA), and circular RNA (circRNA), contribute significantly to gene expression regulation and serve as key factors in disease association studies and health-related exploration. Accurate prediction of ncRNA-disease associations is crucial for elucidating disease mechanisms and advancing therapeutic development. Recently, many computational models based on graph neural networks (GNNs) have emerged for identifying associations among various ncRNAs and diseases. However, existing computational models have not fully utilized integrative information on ncRNAs and diseases, and reliance on GNN-based models alone may limit performance due to oversmoothing. In addition, existing models mainly target a specific type of ncRNA and may not be applicable to most ncRNAs. To overcome these limitations, we propose a computational model, MFF-nDA, based on multimodule fusion. Specifically, we first introduce five types of similarity network information, including three types of ncRNA and two types of disease similarity information, in order to fully explore and optimize the multisource feature information on these entities. Subsequently, we establish three modules: a heterogeneous network representation module based on Transformer, an association network representation module based on a graph convolutional network (GCN), and a topological structure representation module based on a graph attention network (GAT), which capture diverse features of nodes in heterogeneous networks and topological structure information reflected in association networks. The complementary effects of the three modules also help relieve the oversmoothing issue to some extent. By leveraging multimodule fusion learning to comprehensively capture the diverse features of these entities, our model outperforms the available state-of-the-art methods, achieving an AUC greater than 0.9000 on each dataset. This demonstrates the highest predictive performance, making it a valuable tool for identifying potential ncRNAs associated with diseases. The code of MFF-nDA can be accessed at https://github.com/Jack-Cxy/MFF-nDA.
Affiliation(s)
- Zhihao Guan
  - College of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
  - Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- Xiu Jin
  - College of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
  - Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
- Xiaodan Zhang
  - College of Information and Artificial Intelligence, Anhui Agricultural University, Hefei 230036, China
  - Anhui Province Key Laboratory of Smart Agricultural Technology and Equipment, Anhui Agricultural University, Hefei 230036, China
3
Peng Y, Chu S, Huang X, Cheng Y. PPDAMEGCN: Predicting piRNA-Disease Associations Based on Multi-Edge Type Graph Convolutional Network. IET Syst Biol 2025; 19:e70011. [PMID: 40120103; PMCID: PMC11929523; DOI: 10.1049/syb2.70011] [Received: 11/04/2024] [Revised: 01/16/2025] [Accepted: 03/06/2025] [Indexed: 03/25/2025]
Abstract
Recently, many studies have shown that Piwi-interacting RNAs (piRNAs) play key roles in various biological processes and are associated with complex human diseases. Therefore, to speed up the determination of piRNA-disease associations relative to traditional biomedical experiments, many computational approaches have been proposed. However, piRNA-disease associations can be classified into known and unknown associations, each of which may provide distinct types of information. Traditional graph convolutional networks (GCNs) typically treat all edges in a graph as identical, overlooking the fact that different edge types may carry different signals and influence the learning process in unique ways. In this study, we provide a new piRNA-disease association prediction method, called PPDAMEGCN, based on a multi-edge type graph convolutional network. First, we calculate the piRNA sequence similarity based on piRNA sequence information and the Smith-Waterman method. The disease semantic similarity is computed from the Disease Ontology (DO). In addition, we calculate the Gaussian interaction profile (GIP) kernel similarities of piRNAs and diseases from the known piRNA-disease associations. Then, we construct the piRNA similarity network by integrating the piRNA sequence similarity and GIP similarity, and the disease similarity network by integrating the disease semantic similarity and GIP similarity. Finally, we obtain the piRNA and disease embeddings by applying the multi-edge type graph convolutional network model to the heterogeneous piRNA-disease association network. The association probability score of each piRNA-disease pair is calculated by a multilayer perceptron (MLP) from its concatenated embedding. We also compare PPDAMEGCN with other piRNA-disease prediction methods. The experimental results show that our method outperforms the compared methods.
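The Gaussian interaction profile (GIP) kernel mentioned in this abstract is commonly written as K(i, j) = exp(-γ‖a_i - a_j‖²), where a_i is row i of the binary association matrix and γ is a bandwidth divided by the mean squared row norm. The sketch below assumes that common formulation; the abstract does not state the exact normalization used, so the `gamma_prime` parameter and the toy matrix are assumptions for illustration only.

```python
import math
from typing import List, Sequence


def gip_kernel(adjacency: Sequence[Sequence[float]],
               gamma_prime: float = 1.0) -> List[List[float]]:
    """GIP kernel similarity between the rows of an association matrix.

    Each row is the interaction profile of one entity (for example, one
    piRNA against all diseases). Pass the transposed matrix to obtain the
    kernel for the column entities instead.
    """
    n = len(adjacency)
    squared_norms = [sum(v * v for v in row) for row in adjacency]
    mean_squared_norm = sum(squared_norms) / n
    if mean_squared_norm == 0:
        raise ValueError("association matrix has no known associations")
    # Assumed normalization: bandwidth scaled by the mean squared profile norm.
    gamma = gamma_prime / mean_squared_norm

    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            squared_distance = sum(
                (a - b) ** 2 for a, b in zip(adjacency[i], adjacency[j])
            )
            kernel[i][j] = kernel[j][i] = math.exp(-gamma * squared_distance)
    return kernel


# Toy association matrix: 3 piRNAs (rows) x 3 diseases (columns).
toy_associations = [
    [1, 0, 1],
    [1, 0, 1],
    [0, 1, 0],
]
similarity = gip_kernel(toy_associations)
```

Rows with identical interaction profiles get similarity 1, and the value decays as profiles diverge. Because the kernel depends only on known associations, it is usually combined with an independent similarity such as sequence or semantic similarity, as the abstract describes.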
Affiliation(s)
- Yinglong Peng
  - School of Information and Intelligence, XiangXi Vocational and Technical College for Nationalities, Jishou, China
- Shuang Chu
  - School of Informatics, Hunan University of Chinese Medicine, Changsha, China
- Xindi Huang
  - School of Informatics, Hunan University of Chinese Medicine, Changsha, China
- Yan Cheng
  - School of Informatics, Hunan University of Chinese Medicine, Changsha, China
4
Raymond WS, DeRoo J, Munsky B. Identification of potential riboswitch elements in Homo sapiens mRNA 5'UTR sequences using Positive-Unlabeled machine learning. bioRxiv: the preprint server for biology 2024:2023.11.23.568398. [PMID: 39677788; PMCID: PMC11642740; DOI: 10.1101/2023.11.23.568398] [Indexed: 12/17/2024]
Abstract
Riboswitches are a class of noncoding RNA structures that interact with target ligands to cause a conformational change that can then execute some regulatory purpose within the cell. Riboswitches are ubiquitous and well characterized in bacteria and other prokaryotes, with additional examples also being found in fungi, plants, and yeast. To date, no purely RNA-small molecule riboswitch has been discovered in Homo sapiens. Several analogous riboswitch-like mechanisms have been described within the H. sapiens translatome within the past decade, prompting the question: Is there a H. sapiens riboswitch dependent on only small molecule ligands? In this work, we set out to train positive-unlabeled machine learning classifiers on known riboswitch sequences and apply the classifiers to H. sapiens mRNA 5'UTR sequences found in the 5'UTR database, UTRdb, in the hope of identifying a set of mRNAs to investigate for riboswitch functionality. A total of 67,683 riboswitch sequences were obtained from RNAcentral, sorted by ligand type, and used as positive examples; 48,031 5'UTR sequences were used as unlabeled, unknown examples. Twenty positive-unlabeled classifiers were trained on sequence and secondary structure features, each withholding one or two ligand classes. Cross validation was then performed on the withheld ligand sets to obtain a validation accuracy range of 75%-99%. The joint sets of 5'UTRs identified as potential riboswitches by the 20 classifiers were then analyzed. 15,333 sequences were identified as a riboswitch by one or more classifier(s), and 436 of the H. sapiens 5'UTRs were labeled as harboring potential riboswitch elements by all 20 classifiers. These 436 sequences were mapped back to the most similar riboswitches within the positive data and examined. An online database of identified and ranked 5'UTRs, their features, and their most similar matches to known riboswitches is provided to guide future experimental efforts to identify H. sapiens riboswitches.
Affiliation(s)
- William S. Raymond
  - School of Biomedical Engineering, Colorado State University, Fort Collins, CO 80523, USA
- Jacob DeRoo
  - School of Biomedical Engineering, Colorado State University, Fort Collins, CO 80523, USA
- Brian Munsky
  - School of Biomedical Engineering, Colorado State University, Fort Collins, CO 80523, USA
  - Chemical and Biological Engineering, Colorado State University, Fort Collins, CO 80523, USA
5
Guo C, Wang X, Ren H. Databases and computational methods for the identification of piRNA-related molecules: A survey. Comput Struct Biotechnol J 2024; 23:813-833. [PMID: 38328006; PMCID: PMC10847878; DOI: 10.1016/j.csbj.2024.01.011] [Received: 09/11/2023] [Revised: 12/31/2023] [Accepted: 01/15/2024] [Indexed: 02/09/2024]
Abstract
Piwi-interacting RNAs (piRNAs) are a class of small non-coding RNAs (ncRNAs) that play important roles in many biological processes and in the diagnosis and treatment of major cancers, and have thus become a hot research topic. This study aims to provide an in-depth review of computational piRNA-related research, including databases and computational models. Herein, we perform literature analysis and use comparative evaluation methods to summarize and analyze three aspects of computational piRNA-related research: (i) computational models for piRNA-related molecular identification tasks, (ii) computational models for piRNA-disease association prediction tasks, and (iii) computational resources and evaluation metrics for these tasks. This study shows that computational piRNA-related research has progressed significantly and exhibited promising performance in recent years, but it also suffers from the emerging challenges of inconsistent naming systems and a lack of data. Unlike other reviews on piRNA-related identification tasks, which focus on the organization of datasets and computational methods, we pay more attention to the analysis of computational models, algorithms, and performance, with the aim of providing valuable references for computational piRNA-related identification tasks. This study will benefit the theoretical development and practical application of piRNAs by improving the understanding of computational models and resources used to investigate the biological functions and clinical implications of piRNA.
Affiliation(s)
- Chang Guo
  - Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510420, China
- Xiaoli Wang
  - Institute of Reproductive Health, Tongji Medical College, Huazhong University of Science and Technology, Wuhan 430030, China
- Han Ren
  - Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou 510420, China
  - Laboratory of Language and Artificial Intelligence, Guangdong University of Foreign Studies, Guangzhou 510420, China
6
Parvez A, Ali SD, Tayara H, Chong KT. Stacking based ensemble learning framework for identification of nitrotyrosine sites. Comput Biol Med 2024; 183:109200. [PMID: 39366143; DOI: 10.1016/j.compbiomed.2024.109200] [Received: 05/10/2024] [Revised: 09/02/2024] [Accepted: 09/22/2024] [Indexed: 10/06/2024]
Abstract
Protein nitrotyrosine is an essential post-translational modification that results from the nitration of tyrosine amino acid residues. This modification is known to be associated with the regulation and characterization of several biological functions and diseases. Therefore, accurate identification of nitrotyrosine sites plays a significant role in elucidating the associated biological processes. In this regard, we report an accurate computational tool, iNTyro-Stack, for the identification of protein nitrotyrosine sites. iNTyro-Stack is a machine-learning model based on a stacking algorithm. The base classifiers in stacking are selected based on the highest performance. The feature map employed is a linear combination of amino acid composition encoding schemes, including the composition of k-spaced amino acid pairs and tri-peptide composition. The recursive feature elimination technique is used for significant feature selection. The performance of the proposed method is evaluated using k-fold cross-validation and independent testing approaches. iNTyro-Stack achieved an accuracy of 86.3% and a Matthews correlation coefficient (MCC) of 72.6% in cross-validation. Its generalization capability was further validated on an imbalanced independent test set, where it attained an accuracy of 69.32%. iNTyro-Stack outperforms existing state-of-the-art methods across both evaluation techniques. A GitHub repository is provided to reproduce the method and results of iNTyro-Stack: https://github.com/waleed551/iNTyro-Stack/.
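The composition of k-spaced amino acid pairs (CKSAAP) named in this abstract is a standard sequence encoding: for a gap k, it counts how often each ordered pair of residues occurs with exactly k residues between them, normalized by the number of such positions. The sketch below assumes that standard definition and is not taken from the iNTyro-Stack repository; the function name and the toy peptide are hypothetical, and the tri-peptide composition part of the feature map is not shown.

```python
from itertools import product
from typing import Dict

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def cksaap(sequence: str, k: int) -> Dict[str, float]:
    """Composition of k-spaced amino acid pairs for one gap size k.

    Returns 400 features, one per ordered residue pair. Pairs that contain
    a non-standard residue are not counted, but the normalizer is still the
    total number of k-spaced positions in the sequence.
    """
    n_positions = len(sequence) - k - 1
    if n_positions <= 0:
        raise ValueError("sequence is too short for this gap size")
    counts = {a + b: 0 for a, b in product(AMINO_ACIDS, repeat=2)}
    for i in range(n_positions):
        pair = sequence[i] + sequence[i + k + 1]
        if pair in counts:
            counts[pair] += 1
    return {pair: count / n_positions for pair, count in counts.items()}


# Toy peptide; real inputs would be fixed-length windows around a tyrosine.
gap0 = cksaap("ACACA", k=0)  # adjacent pairs: AC, CA, AC, CA
gap1 = cksaap("ACACA", k=1)  # one residue skipped: AA, CC, AA
```

In practice the vectors for several gap sizes are typically concatenated, which is why a feature selection step such as the recursive feature elimination mentioned above is useful.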
Affiliation(s)
- Aiman Parvez
  - Graduate School of Integrated Energy-AI, Jeonbuk National University, Jeonju, 54896, South Korea
- Syed Danish Ali
  - Department of Electrical Engineering, The University of Azad Jammu and Kashmir, Muzaffarabad, 13100, Pakistan
  - Department of Electronics and Information Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
- Hilal Tayara
  - Department of International Science and Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
- Kil To Chong
  - Department of International Science and Engineering, Jeonbuk National University, Jeonju, 54896, South Korea
  - Advanced Electronics and Information Research Center, Jeonbuk National University, Jeonju, 54896, South Korea
7
Vaida M, Arumalla KK, Tatikonda PK, Popuri B, Bux RA, Tappia PS, Huang G, Haince JF, Ford WR. Identification of a Novel Biomarker Panel for Breast Cancer Screening. Int J Mol Sci 2024; 25:11835. [PMID: 39519384; PMCID: PMC11546995; DOI: 10.3390/ijms252111835] [Received: 08/05/2024] [Revised: 10/25/2024] [Accepted: 10/27/2024] [Indexed: 11/16/2024]
Abstract
Breast cancer remains a major public health concern, and early detection is crucial for improving survival rates. Metabolomics offers the potential to develop non-invasive screening and diagnostic tools based on metabolic biomarkers. However, the inherent complexity of metabolomic datasets and the high dimensionality of biomarkers complicate the identification of diagnostically relevant features, with multiple studies demonstrating limited consensus on the specific metabolites involved. Unlike previous studies that rely on a single feature selection technique such as Partial Least Squares (PLS) or LASSO regression, this research combines supervised and unsupervised machine learning methods with random sampling strategies, offering a more robust and interpretable approach to feature selection. This study aimed to identify a parsimonious and robust set of biomarkers for breast cancer diagnosis using metabolomics data. Plasma samples from 185 breast cancer patients and 53 controls (from the Cooperative Human Tissue Network, USA) were analyzed. This study also overcomes the common issue of dataset imbalance by using propensity score matching (PSM), which ensures reliable comparisons between cancer and control groups. We employed Univariate Naïve Bayes, L2-regularized Support Vector Classifier (SVC), Principal Component Analysis (PCA), and feature engineering techniques to refine and select the most informative features. Our best-performing feature set comprised 11 biomarkers, including 9 metabolites (SM(OH) C22:2, SM C18:0, C0, C3OH, C14:2OH, C16:2OH, LysoPC a C18:1, PC aa C36:0 and Asparagine), a metabolite ratio (Kynurenine-to-Tryptophan), and 1 demographic variable (Age), achieving an area under the ROC curve (AUC) of 98%. These results demonstrate the potential for a robust, cost-effective, and non-invasive breast cancer screening and diagnostic tool, offering significant clinical value for early detection and personalized patient management.
Affiliation(s)
- Maria Vaida
  - Department of Analytics, Harrisburg University of Science and Technology, Harrisburg, PA 17101, USA
- Kamala K. Arumalla
  - Department of Analytics, Harrisburg University of Science and Technology, Harrisburg, PA 17101, USA
- Pavan Kumar Tatikonda
  - Department of Analytics, Harrisburg University of Science and Technology, Harrisburg, PA 17101, USA
- Bharadwaj Popuri
  - Department of Analytics, Harrisburg University of Science and Technology, Harrisburg, PA 17101, USA
- Rashid A. Bux
  - BioMark Diagnostics Inc., Richmond, BC V6X 2W2, Canada
- Guoyu Huang
  - BioMark Diagnostic Solutions Inc., Quebec City, QC G1P 4P5, Canada
- Jean-François Haince
  - BioMark Diagnostic Solutions Inc., Quebec City, QC G1P 4P5, Canada
- W. Randolph Ford
  - Department of Analytics, Harrisburg University of Science and Technology, Harrisburg, PA 17101, USA
8
Liu Y, Zhang F, Ding Y, Fei R, Li J, Wu FX. MRDPDA: A multi-Laplacian regularized deepFM model for predicting piRNA-disease associations. J Cell Mol Med 2024; 28:e70046. [PMID: 39228010; PMCID: PMC11371490; DOI: 10.1111/jcmm.70046] [Received: 04/18/2024] [Revised: 07/15/2024] [Accepted: 08/16/2024] [Indexed: 09/05/2024]
Abstract
PIWI-interacting RNAs (piRNAs) are a typical class of small non-coding RNAs that are essential for gene regulation, genome stability and other processes. Accumulating studies have revealed that piRNAs have significant potential as biomarkers and therapeutic targets for a variety of diseases. However, current computational methods face challenges in effectively capturing piRNA-disease associations (PDAs) from limited data. In this study, we propose a novel method, MRDPDA, for predicting PDAs based on limited data from multiple sources. Specifically, MRDPDA integrates a deep factorization machine (deepFM) model with regularizations derived from multiple yet limited datasets, utilizing separate Laplacians instead of a simple average similarity network. Moreover, a unified objective function that combines the embedding losses for the similarities is proposed to ensure that the embedding is suitable for the prediction task. In addition, a balanced benchmark dataset based on piRPheno is constructed, and a deep autoencoder is applied to create a reliable negative set from the unlabeled dataset. Compared with three recent methods, MRDPDA achieves the best performance on the piRPheno dataset in terms of the five-fold cross-validation test and the independent test set, and case studies further demonstrate the effectiveness of MRDPDA.
Affiliation(s)
- Yajun Liu
  - Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Fan Zhang
  - Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Yulian Ding
  - Center for High Performance Computing, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
- Rong Fei
  - Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Junhuai Li
  - Shaanxi Key Laboratory for Network Computing and Security Technology, School of Computer Science and Engineering, Xi'an University of Technology, Xi'an, China
- Fang-Xiang Wu
  - Department of Computer Science, Biomedical Engineering and Mechanical Engineering, University of Saskatchewan, Saskatoon, Saskatchewan, Canada
9
Kondratov KA, Artamonov AA, Nikitin YV, Velmiskina AA, Mikhailovskii VY, Mosenko SV, Polkovnikova IA, Asinovskaya AY, Apalko SV, Sushentseva NN, Ivanov AM, Scherbak SG. Revealing differential expression patterns of piRNA in FACS blood cells of SARS-CoV-2 infected patients. BMC Med Genomics 2024; 17:212. [PMID: 39143590; PMCID: PMC11325581; DOI: 10.1186/s12920-024-01982-9] [Received: 05/11/2024] [Accepted: 08/05/2024] [Indexed: 08/16/2024]
Abstract
Non-coding RNA expression has been shown to be cell type-specific. The regulatory characteristics of these molecules are affected by changes in their expression levels. We performed next-generation sequencing and examined small RNA-seq data obtained from 6 different types of blood cells, separated by fluorescence-activated cell sorting, from severe COVID-19 patients and healthy control donors. In addition to examining the behavior of piRNA in the blood cells of patients with severe SARS-CoV-2 infection, our aim was to present a distinct piRNA differential expression portrait for each separate cell type. We observed that the sorted control cells (erythrocytes, monocytes, lymphocytes, eosinophils, basophils, and neutrophils) have piRNA expression patterns that differ by cell type. After analyzing the expression of piRNAs in each set of sorted cells from patients with severe COVID-19, we observed 3 significantly elevated piRNAs (piR-33123, piR-34765, piR-43768) and 9 downregulated piRNAs in erythrocytes. In lymphocytes, all 19 piRNAs were upregulated. Monocytes showed a larger number of statistically significant piRNAs: 5 upregulated (piR-49039, piR-31623, piR-37213, piR-44721, piR-44720) and 35 downregulated. piR-31623 has previously been associated with respiratory syncytial virus infection; taking into account the major role of piRNA in transposon silencing, we presume that the differential expression patterns we observed could be a signal of indirect antiviral activity or a specific antiviral cell state.
Affiliation(s)
- Kirill A Kondratov
  - City Hospital No. 40, St. Petersburg, 197706, Russia
  - S. M. Kirov Military Medical Academy, St. Petersburg, 194044, Russia
  - Saint-Petersburg State University, St. Petersburg, 199034, Russia
- Yuri V Nikitin
  - S. M. Kirov Military Medical Academy, St. Petersburg, 194044, Russia
- Anastasiya A Velmiskina
  - City Hospital No. 40, St. Petersburg, 197706, Russia
  - Saint-Petersburg State University, St. Petersburg, 199034, Russia
- Sergey V Mosenko
  - City Hospital No. 40, St. Petersburg, 197706, Russia
  - Saint-Petersburg State University, St. Petersburg, 199034, Russia
- Irina A Polkovnikova
  - City Hospital No. 40, St. Petersburg, 197706, Russia
  - Saint-Petersburg State University, St. Petersburg, 199034, Russia
- Anna Yu Asinovskaya
  - City Hospital No. 40, St. Petersburg, 197706, Russia
  - Saint-Petersburg State University, St. Petersburg, 199034, Russia
- Svetlana V Apalko
  - City Hospital No. 40, St. Petersburg, 197706, Russia
  - Saint-Petersburg State University, St. Petersburg, 199034, Russia
- Andrey M Ivanov
  - S. M. Kirov Military Medical Academy, St. Petersburg, 194044, Russia
- Sergey G Scherbak
  - City Hospital No. 40, St. Petersburg, 197706, Russia
  - Saint-Petersburg State University, St. Petersburg, 199034, Russia
10
Sun W, Guo C, Wan J, Ren H. piRNA-disease association prediction based on multi-channel graph variational autoencoder. PeerJ Comput Sci 2024; 10:e2216. [PMID: 39145234; PMCID: PMC11323097; DOI: 10.7717/peerj-cs.2216] [Received: 10/30/2023] [Accepted: 07/03/2024] [Indexed: 08/16/2024]
Abstract
Piwi-interacting RNA (piRNA) is a type of non-coding small RNA that is highly expressed in mammalian testis. PiRNA has been implicated in various human diseases, but the experimental validation of piRNA-disease associations is costly and time-consuming. In this article, a novel computational method for predicting piRNA-disease associations using a multi-channel graph variational autoencoder (MC-GVAE) is proposed. This method integrates four types of similarity networks for piRNAs and diseases, which are derived from piRNA sequences, disease semantics, piRNA Gaussian Interaction Profile (GIP) kernel, and disease GIP kernel, respectively. These networks are modeled by a graph VAE framework, which can learn low-dimensional and informative feature representations for piRNAs and diseases. Then, a multi-channel method is used to fuse the feature representations from different networks. Finally, a three-layer neural network classifier is applied to predict the potential associations between piRNAs and diseases. The method was evaluated on a benchmark dataset containing 5,002 experimentally validated associations with 4,350 piRNAs and 21 diseases, constructed from the piRDisease v1.0 database. It achieved state-of-the-art performance, with an average AUC value of 0.9310 and an AUPR value of 0.9247 under five-fold cross-validation. This demonstrates the method's effectiveness and superiority in piRNA-disease association prediction.
Affiliation(s)
- Wei Sun
  - School of Information Science and Technology, Qiongtai Normal University, Haikou, China
- Chang Guo
  - School of Modern Information Industry, Guangzhou College of Commerce, Guangzhou, China
- Jing Wan
  - Center for Lexicographical Studies, Guangdong University of Foreign Studies, Guangzhou, China
- Han Ren
  - Laboratory of Language Engineering and Computing, Guangdong University of Foreign Studies, Guangzhou, China
  - Laboratory of Language and Artificial Intelligence, Guangdong University of Foreign Studies, Guangzhou, China
11
Chen Q, Zhang L, Liu Y, Qin Z, Zhao T. PUTransGCN: identification of piRNA-disease associations based on attention encoding graph convolutional network and positive unlabelled learning. Brief Bioinform 2024; 25:bbae144. [PMID: 38581419; PMCID: PMC10998538; DOI: 10.1093/bib/bbae144] [Received: 01/25/2024] [Revised: 02/25/2024] [Accepted: 03/15/2024] [Indexed: 04/08/2024]
Abstract
Piwi-interacting RNAs (piRNAs) play a crucial role in various biological processes and are implicated in disease. Consequently, there is an escalating demand for computational tools to predict piRNA-disease interactions. Although computational methods have been proposed for the detection of piRNA-disease associations, imbalanced and sparse datasets make it very challenging to capture the complex relationships between piRNAs and diseases. In response to this need, we have developed a novel computational architecture, denoted as PUTransGCN, which uses heterogeneous graph convolutional networks to uncover potential piRNA-disease associations. Additionally, an attention mechanism was used to automatically adjust the weight parameters for aggregating heterogeneous node features. To tackle the imbalanced dataset problem, a combined positive unlabelled learning (PUL) method comprising PU bagging, two-step and spy techniques was applied to select reliable negative associations. The features of piRNAs and diseases were derived from three distinct biological sources by PUTransGCN, including information on piRNA sequences, semantic terms related to diseases and the existing network of piRNA-disease associations. In the experiments, PUTransGCN achieves an AUC of 0.93 and 0.95 in 5-fold cross-validation on two datasets, respectively, outperforming six other state-of-the-art models. We compared three different PUL methods, and the results of the ablation experiment indicate that the combined PUL method yields the best results. PUTransGCN could serve as a valuable piRNA-disease prediction tool for upcoming studies in the biomedical field. The code for PUTransGCN is available at https://github.com/chenqiuhao/PUTransGCN.
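Of the three PUL components named in this abstract, PU bagging is the simplest to illustrate: repeatedly draw a bootstrap sample of unlabeled items, treat it as a provisional negative set, train a classifier against the positives, and average each unlabeled item's out-of-bag score. The sketch below is a generic illustration of that idea and not the PUTransGCN implementation; the nearest-centroid base learner, the function names, and the toy data are all assumptions chosen to keep the example dependency-free, and the two-step and spy techniques are not shown.

```python
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def _centroid(points: Sequence[Point]) -> List[float]:
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def _sq_dist(a: Point, b: Point) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def pu_bagging_scores(positives: Sequence[Point],
                      unlabeled: Sequence[Point],
                      n_rounds: int = 50,
                      seed: int = 0) -> List[Optional[float]]:
    """Average out-of-bag 'looks positive' vote for each unlabeled point.

    Base learner (an assumption for this sketch): nearest centroid between
    the positives and the bootstrap sample treated as negatives. Returns
    None for a point that was never out of bag.
    """
    rng = random.Random(seed)
    n_unlabeled = len(unlabeled)
    bag_size = min(len(positives), n_unlabeled)
    pos_centroid = _centroid(positives)
    votes = [0] * n_unlabeled
    appearances = [0] * n_unlabeled
    for _ in range(n_rounds):
        # Bootstrap (with replacement) a provisional negative set.
        bag = [rng.randrange(n_unlabeled) for _ in range(bag_size)]
        in_bag = set(bag)
        neg_centroid = _centroid([unlabeled[i] for i in bag])
        for i, point in enumerate(unlabeled):
            if i in in_bag:
                continue  # score only out-of-bag points
            appearances[i] += 1
            if _sq_dist(point, pos_centroid) < _sq_dist(point, neg_centroid):
                votes[i] += 1
    return [votes[i] / appearances[i] if appearances[i] else None
            for i in range(n_unlabeled)]


# Toy 1-D data: two hidden positives (indices 0 and 5) among the unlabeled.
positives = [[5.0], [5.2], [4.8]]
unlabeled = [[5.1], [0.0], [0.2], [-0.1], [0.1], [4.9]]
scores = pu_bagging_scores(positives, unlabeled)
```

Unlabeled items with consistently low scores are the natural candidates for a reliable negative set, which is the role the abstract assigns to the combined PUL step.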
Affiliation(s)
- Qiuhao Chen
- Institute of Bioinformatics, Harbin Institute of Technology, 150000, Harbin, Heilongjiang, China
- Liyuan Zhang
- School of Computer Science and Technology, Harbin Institute of Technology, 150000, Harbin, Heilongjiang, China
- Yaojia Liu
- School of Computer Science and Technology, Harbin Institute of Technology, 150000, Harbin, Heilongjiang, China
- Zhonghao Qin
- State Key Laboratory of Robotics and System, Harbin Institute of Technology, 150000, Harbin, Heilongjiang, China
- Tianyi Zhao
- School of Computer Science and Technology, Harbin Institute of Technology, 150000, Harbin, Heilongjiang, China
12
Wang K, Perera BPU, Morgan RK, Sala-Hamrick K, Geron V, Svoboda LK, Faulk C, Dolinoy DC, Sartor MA. piOxi database: a web resource of germline and somatic tissue piRNAs identified by chemical oxidation. Database (Oxford) 2024; 2024:baad096. [PMID: 38204359 PMCID: PMC10782149 DOI: 10.1093/database/baad096] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/14/2023] [Revised: 11/27/2023] [Accepted: 12/27/2023] [Indexed: 01/12/2024]
Abstract
PIWI-interacting RNAs (piRNAs) are a class of small non-coding RNAs that are highly expressed and extensively studied from the germline. piRNAs associate with PIWI proteins to maintain DNA methylation for transposon silencing and transcriptional gene regulation for genomic stability. Mature germline piRNAs have distinct characteristics including a 24- to 32-nucleotide length and a 2'-O-methylation signature at the 3' end. Although recent studies have identified piRNAs in somatic tissues, they remain poorly characterized. For example, we recently demonstrated notable expression of piRNA in the murine soma, and while overall expression was lower than that of the germline, unique characteristics suggested tissue-specific functions of this class. While currently available databases commonly use length and association with PIWI proteins to identify piRNA, few have included a chemical oxidation method that detects piRNA based on its 3' modification. This method leads to reproducible and rigorous data processing when coupled with next-generation sequencing and bioinformatics analysis. Here, we introduce piOxi DB, a user-friendly web resource that provides a comprehensive analysis of piRNA, generated exclusively through sodium periodate treatment of small RNA. The current version of piOxi DB includes 435 749 germline and 9828 somatic piRNA sequences robustly identified from M. musculus, M. fascicularis and H. sapiens. The database provides species- and tissue-specific data that are further analyzed according to chromosome location and correspondence to gene and repetitive elements. piOxi DB is an informative tool to assist broad research applications in the fields of RNA biology, cancer biology, environmental toxicology and beyond. Database URL: https://pioxidb.dcmb.med.umich.edu/.
Affiliation(s)
- Bambarendage P U Perera
- Department of Environmental Health Sciences, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, USA
- Rachel K Morgan
- Department of Environmental Health Sciences, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, USA
- Kimberley Sala-Hamrick
- Department of Environmental Health Sciences, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, USA
- Viviana Geron
- Department of Environmental Health Sciences, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, USA
- Laurie K Svoboda
- Department of Environmental Health Sciences, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, USA
- Department of Pharmacology, School of Medicine, University of Michigan, 1150 W. Medical Center Drive, Ann Arbor, MI 48109, USA
- Christopher Faulk
- Department of Animal Science, College of Food, Agricultural and Natural Resource Sciences, University of Minnesota, 1988 Fitch Avenue, Saint Paul, MN 55108, USA
- Dana C Dolinoy
- Department of Environmental Health Sciences, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, USA
- Department of Nutritional Sciences, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, USA
- Department of Computational Medicine and Bioinformatics, School of Medicine, University of Michigan, 100 Washtenaw Ave, Ann Arbor, MI 48109, USA
- Maureen A Sartor
- Department of Computational Medicine and Bioinformatics, School of Medicine, University of Michigan, 100 Washtenaw Ave, Ann Arbor, MI 48109, USA
- Department of Biostatistics, School of Public Health, University of Michigan, 1415 Washington Heights, Ann Arbor, MI 48109, USA
13
Zhang J, Lang M, Zhou Y, Zhang Y. Predicting RNA structures and functions by artificial intelligence. Trends Genet 2023; 40:S0168-9525(23)00229-9. [PMID: 39492264 DOI: 10.1016/j.tig.2023.10.001] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 05/25/2023] [Revised: 08/22/2023] [Accepted: 10/03/2023] [Indexed: 11/05/2024]
Abstract
RNA functions by interacting with its intended targets structurally. However, due to the dynamic nature of RNA molecules, RNA structures are difficult to determine experimentally or predict computationally. Artificial intelligence (AI) has revolutionized many biomedical fields and has been progressively utilized to deduce RNA structures, target binding, and associated functionality. Integrating structural and target binding information could also help improve the robustness of AI-based RNA function prediction and RNA design. Given the rapid development of deep learning (DL) algorithms, AI will provide an unprecedented opportunity to elucidate the sequence-structure-function relation of RNAs.
Affiliation(s)
- Jun Zhang
- National Engineering Laboratory for Big Data System Computing Technology, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, 518060, China
- Mei Lang
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, Guangdong, 518106, China
- Yaoqi Zhou
- Institute of Systems and Physical Biology, Shenzhen Bay Laboratory, Shenzhen, Guangdong, 518106, China
- Yang Zhang
- School of Science, Harbin Institute of Technology, Shenzhen, Guangdong, 518055, China
14
Hou J, Wei H, Liu B. iPiDA-SWGCN: Identification of piRNA-disease associations based on Supplementarily Weighted Graph Convolutional Network. PLoS Comput Biol 2023; 19:e1011242. [PMID: 37339125 DOI: 10.1371/journal.pcbi.1011242] [Citation(s) in RCA: 8] [Impact Index Per Article: 4.0] [Received: 11/23/2022] [Accepted: 06/05/2023] [Indexed: 06/22/2023] Open
Abstract
Accurately identifying potential piRNA-disease associations is of great importance in uncovering the pathogenesis of diseases. Recently, several machine-learning-based methods have been proposed for piRNA-disease association detection. However, they are suffering from the high sparsity of piRNA-disease association network and the Boolean representation of piRNA-disease associations ignoring the confidence coefficients. In this study, we propose a supplementarily weighted strategy to solve these disadvantages. Combined with Graph Convolutional Networks (GCNs), a novel predictor called iPiDA-SWGCN is proposed for piRNA-disease association prediction. There are three main contributions of iPiDA-SWGCN: (i) Potential piRNA-disease associations are preliminarily supplemented in the sparse piRNA-disease network by integrating various basic predictors to enrich network structure information. (ii) The original Boolean piRNA-disease associations are assigned with different relevance confidence to learn node representations from neighbour nodes in varying degrees. (iii) The experimental results show that iPiDA-SWGCN achieves the best performance compared with the other state-of-the-art methods, and can predict new piRNA-disease associations.
Affiliation(s)
- Jialu Hou
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Hang Wei
- School of Computer Science and Technology, Xidian University, Xi'an, Shaanxi, China
- Bin Liu
- School of Computer Science and Technology, Beijing Institute of Technology, Beijing, China
- Advanced Research Institute of Multidisciplinary Science, Beijing Institute of Technology, Beijing, China
15
Meng X, Shang J, Ge D, Yang Y, Zhang T, Liu JX. ETGPDA: identification of piRNA-disease associations based on embedding transformation graph convolutional network. BMC Genomics 2023; 24:279. [PMID: 37226081 PMCID: PMC10210294 DOI: 10.1186/s12864-023-09380-8] [Citation(s) in RCA: 6] [Impact Index Per Article: 3.0] [Received: 03/24/2023] [Accepted: 05/15/2023] [Indexed: 05/26/2023] Open
Abstract
BACKGROUND: Piwi-interacting RNAs (piRNAs) have been proven to be closely associated with human diseases. The identification of the potential associations between piRNA and disease is of great significance for complex diseases. Traditional "wet experiment" is time-consuming and high-priced, predicting the piRNA-disease associations by computational methods is of great significance.
METHODS: In this paper, a method based on the embedding transformation graph convolution network is proposed to predict the piRNA-disease associations, named ETGPDA. Specifically, a heterogeneous network is constructed based on the similarity information of piRNA and disease, as well as the known piRNA-disease associations, which is applied to extract low-dimensional embeddings of piRNA and disease based on graph convolutional network with an attention mechanism. Furthermore, the embedding transformation module is developed for the problem of embedding space inconsistency, which is lightweighter, stronger learning ability and higher accuracy. Finally, the piRNA-disease association score is calculated by the similarity of the piRNA and disease embedding.
RESULTS: Evaluated by fivefold cross-validation, the AUC of ETGPDA achieves 0.9603, which is better than the other five selected computational models. The case studies based on Head and neck squamous cell carcinoma and Alzheimer's disease further prove the superior performance of ETGPDA.
CONCLUSIONS: Hence, the ETGPDA is an effective method for predicting the hidden piRNA-disease associations.
Affiliation(s)
- Xianghan Meng
- School of Computer Science, Qufu Normal University, Rizhao, 276826 China
- Junliang Shang
- School of Computer Science, Qufu Normal University, Rizhao, 276826 China
- Daohui Ge
- School of Computer Science, Qufu Normal University, Rizhao, 276826 China
- Yi Yang
- School of Computer Science, Qufu Normal University, Rizhao, 276826 China
- Tongdui Zhang
- Science and Technology Innovation Service Institution of Rizhao, Rizhao, 276826 China
- Jin-Xing Liu
- School of Computer Science, Qufu Normal University, Rizhao, 276826 China
16
The epigenetic regulatory mechanism of PIWI/piRNAs in human cancers. Mol Cancer 2023; 22:45. [PMID: 36882835 PMCID: PMC9990219 DOI: 10.1186/s12943-023-01749-3] [Citation(s) in RCA: 23] [Impact Index Per Article: 11.5] [Received: 07/25/2022] [Accepted: 02/16/2023] [Indexed: 03/09/2023] Open
Abstract
PIWI proteins have a strong correlation with PIWI-interacting RNAs (piRNAs), which are significant in development and reproduction of organisms. Recently, emerging evidences have indicated that apart from the reproductive function, PIWI/piRNAs with abnormal expression, also involve greatly in varieties of human cancers. Moreover, human PIWI proteins are usually expressed only in germ cells and hardly in somatic cells, so the abnormal expression of PIWI proteins in different types of cancer offer a promising opportunity for precision medicine. In this review, we discussed current researches about the biogenesis of piRNA, its epigenetic regulatory mechanisms in human cancers, such as N6-methyladenosine (m6A) methylation, histone modifications, DNA methylation and RNA interference, providing novel insights into the markers for clinical diagnosis, treatment and prognosis in human cancers.
17
Zheng K, Zhang XL, Wang L, You ZH, Zhan ZH, Li HY. Line graph attention networks for predicting disease-associated Piwi-interacting RNAs. Brief Bioinform 2022; 23:6748487. [PMID: 36198846 DOI: 10.1093/bib/bbac393] [Citation(s) in RCA: 8] [Impact Index Per Article: 2.7] [Received: 05/06/2022] [Revised: 08/08/2022] [Accepted: 08/12/2022] [Indexed: 12/14/2022] Open
Abstract
PIWI proteins and Piwi-Interacting RNAs (piRNAs) are commonly detected in human cancers, especially in germline and somatic tissues, and correlate with poorer clinical outcomes, suggesting that they play a functional role in cancer. As the problem of combinatorial explosions between ncRNA and disease exposes gradually, new bioinformatics methods for large-scale identification and prioritization of potential associations are therefore of interest. However, in the real world, the network of interactions between molecules is enormously intricate and noisy, which poses a problem for efficient graph mining. Line graphs can extend many heterogeneous networks to replace dichotomous networks. In this study, we present a new graph neural network framework, line graph attention networks (LGAT). And we apply it to predict PiRNA disease association (GAPDA). In the experiment, GAPDA performs excellently in 5-fold cross-validation with an AUC of 0.9038. Not only that, it still has superior performance compared with methods based on collaborative filtering and attribute features. The experimental results show that GAPDA ensures the prospect of the graph neural network on such problems and can be an excellent supplement for future biomedical research.
Affiliation(s)
- Kai Zheng
- College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
- Lei Wang
- College of Information Science and Engineering, Zaozhuang University, Shandong 277100, China
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Zhu-Hong You
- Big Data and Intelligent Computing Research Center, Guangxi Academy of Sciences, Nanning 530007, China
- Zhao-Hui Zhan
- Department of Computer Science, City University of Hong Kong, Kowloon, Hong Kong
- Hao-Yuan Li
- Hunan Provincial Key Lab on Bioinformatics, School of Computer Science and Engineering, Central South University, Changsha, 410083, China
18
iPiDA-GCN: Identification of piRNA-disease associations based on Graph Convolutional Network. PLoS Comput Biol 2022; 18:e1010671. [DOI: 10.1371/journal.pcbi.1010671] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 07/07/2022] [Revised: 11/14/2022] [Accepted: 10/20/2022] [Indexed: 11/15/2022] Open
Abstract
Motivation
Piwi-interacting RNAs (piRNAs) play a critical role in the progression of various diseases. Accurately identifying the associations between piRNAs and diseases is important for diagnosing and prognosticating diseases. Although some computational methods have been proposed to detect piRNA-disease associations, it is challenging for these methods to effectively capture nonlinear and complex relationships between piRNAs and diseases because of the limited training data and insufficient association representation.
Results
With the growth of piRNA-disease association data, it is possible to design a more complex machine learning method to solve this problem. In this study, we propose a computational method called iPiDA-GCN for piRNA-disease association identification based on graph convolutional networks (GCNs). The iPiDA-GCN predictor constructs the graphs based on piRNA sequence information, disease semantic information and known piRNA-disease associations. Two GCNs (Asso-GCN and Sim-GCN) are used to extract the features of both piRNAs and diseases by capturing the association patterns from piRNA-disease interaction network and two similarity networks. GCNs can capture complex network structure information from these networks, and learn discriminative features. Finally, the full connection networks and inner production are utilized as the output module to predict piRNA-disease association scores. Experimental results demonstrate that iPiDA-GCN achieves better performance than the other state-of-the-art methods, benefitted from the discriminative features extracted by Asso-GCN and Sim-GCN. The iPiDA-GCN predictor is able to detect new piRNA-disease associations to reveal the potential pathogenesis at the RNA level. The data and source code are available at http://bliulab.net/iPiDA-GCN/.
19
Ali SD, Tayara H, Chong KT. Interpretable machine learning identification of arginine methylation sites. Comput Biol Med 2022; 147:105767. [DOI: 10.1016/j.compbiomed.2022.105767] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 05/18/2022] [Revised: 06/12/2022] [Accepted: 06/18/2022] [Indexed: 11/25/2022]
20
Zhang T, Chen L, Li R, Liu N, Huang X, Wong G. PIWI-interacting RNAs in human diseases: databases and computational models. Brief Bioinform 2022; 23:6603448. [PMID: 35667080 DOI: 10.1093/bib/bbac217] [Citation(s) in RCA: 11] [Impact Index Per Article: 3.7] [Received: 02/28/2022] [Revised: 04/24/2022] [Accepted: 05/09/2022] [Indexed: 11/12/2022] Open
Abstract
PIWI-interacting RNAs (piRNAs) are short 21-35 nucleotide molecules that comprise the largest class of non-coding RNAs and found in a large diversity of species including yeast, worms, flies, plants and mammals including humans. The most well-understood function of piRNAs is to monitor and protect the genome from transposons particularly in germline cells. Recent data suggest that piRNAs may have additional functions in somatic cells although they are expressed there in far lower abundance. Compared with microRNAs (miRNAs), piRNAs have more limited bioinformatics resources available. This review collates 39 piRNA specific and non-specific databases and bioinformatics resources, describes and compares their utility and attributes and provides an overview of their place in the field. In addition, we review 33 computational models based upon function: piRNA prediction, transposon element and mRNA-related piRNA prediction, cluster prediction, signature detection, target prediction and disease association. Based on the collection of databases and computational models, we identify trends and potential gaps in tool development. We further analyze the breadth and depth of piRNA data available in public sources, their contribution to specific human diseases, particularly in cancer and neurodegenerative conditions, and highlight a few specific piRNAs that appear to be associated with these diseases. This briefing presents the most recent and comprehensive mapping of piRNA bioinformatics resources including databases, models and tools for disease associations to date. Such a mapping should facilitate and stimulate further research on piRNAs.
Affiliation(s)
- Tianjiao Zhang
- Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R. 999078, China
- Liang Chen
- Department of Computer Science, School of Engineering, Shantou University, Shantou, China
- Rongzhen Li
- Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R. 999078, China
- Ning Liu
- Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R. 999078, China
- Xiaobing Huang
- Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R. 999078, China
- Garry Wong
- Faculty of Health Sciences, University of Macau, Taipa, Macau S.A.R. 999078, China