1
Small RNA Targets: Advances in Prediction Tools and High-Throughput Profiling. Biology (Basel) 2022; 11:1798. [PMID: 36552307 PMCID: PMC9775672 DOI: 10.3390/biology11121798] [Citation(s) in RCA: 3] [Impact Index Per Article: 1.5] [Received: 11/14/2022] [Revised: 11/27/2022] [Accepted: 12/08/2022] [Indexed: 12/14/2022]
Abstract
MicroRNAs (miRNAs) are an abundant class of small non-coding RNAs that regulate gene expression at the post-transcriptional level. They are suggested to be involved in most biological processes of the cell primarily by targeting messenger RNAs (mRNAs) for cleavage or translational repression. Their binding to their target sites is mediated by the Argonaute (AGO) family of proteins. Thus, miRNA target prediction is pivotal for research and clinical applications. Moreover, transfer-RNA-derived fragments (tRFs) and other types of small RNAs have been found to be potent regulators of Ago-mediated gene expression. Their role in mRNA regulation is still to be fully elucidated, and advancements in the computational prediction of their targets are in their infancy. To shed light on these complex RNA-RNA interactions, the availability of good quality high-throughput data and reliable computational methods is of utmost importance. Even though the arsenal of computational approaches in the field has been enriched in the last decade, there is still a degree of discrepancy between the results they yield. This review offers an overview of the relevant advancements in the field of bioinformatics and machine learning and summarizes the key strategies utilized for small RNA target prediction. Furthermore, we report the recent development of high-throughput sequencing technologies, and explore the role of non-miRNA AGO driver sequences.
2
Mokhtaridoost M, Maass PG, Gönen M. Identifying Tissue- and Cohort-Specific RNA Regulatory Modules in Cancer Cells Using Multitask Learning. Cancers (Basel) 2022; 14:4939. [PMID: 36230862 PMCID: PMC9563725 DOI: 10.3390/cancers14194939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 09/30/2022] [Accepted: 10/06/2022] [Indexed: 11/24/2022]
Abstract
Simple Summary: Understanding the underlying biological mechanisms of primary tumors is crucial for predicting how tumors respond to therapies and exploring accurate treatment strategies. miRNA–mRNA interactions have a major effect on many biological processes that are important in the formation and progression of cancer. In this study, we introduced a computational pipeline to extract tissue- and cohort-specific miRNA–mRNA regulatory modules of multiple cancer types from the same origin using miRNA and mRNA expression profiles of primary tumors. Our model identified regulatory modules of underlying cancer types (i.e., cohort-specific) and shared regulatory modules between cohorts (i.e., tissue-specific).
Abstract: MicroRNA (miRNA) alterations significantly impact the formation and progression of human cancers. miRNAs interact with messenger RNAs (mRNAs) to facilitate degradation or translational repression. Thus, identifying miRNA–mRNA regulatory modules in cohorts of primary tumor tissues is fundamental for understanding the biology of tumor heterogeneity and for precise diagnosis and treatment. We established a multitask learning sparse regularized factor regression (MSRFR) method to determine key tissue- and cohort-specific miRNA–mRNA regulatory modules from expression profiles of tumors. MSRFR simultaneously models the sparse relationship between miRNAs and mRNAs and extracts tissue- and cohort-specific miRNA–mRNA regulatory modules separately. We tested the model’s ability to determine cohort-specific regulatory modules of multiple cancer cohorts from the same tissue and their underlying tissue-specific regulatory modules by extracting similarities between cancer cohorts (i.e., blood, kidney, and lung). We also detected tissue-specific and cohort-specific signatures in the corresponding regulatory modules by comparing our findings from various other tissues. We show that MSRFR effectively determines cancer-related miRNAs in cohort-specific regulatory modules, distinguishes tissue- and cohort-specific regulatory modules from each other, and extracts tissue-specific information from different cohorts of disease-related tissue. Our findings indicate that the MSRFR model can support current efforts in precision medicine to define tumor-specific miRNA–mRNA signatures.
Affiliation(s)
- Milad Mokhtaridoost
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 1X8, Canada
- Graduate School of Sciences and Engineering, Koç University, İstanbul 34450, Turkey
- Philipp G. Maass
- Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 1X8, Canada
- Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Mehmet Gönen
- Department of Industrial Engineering, College of Engineering, Koç University, İstanbul 34450, Turkey
- School of Medicine, Koç University, İstanbul 34450, Turkey
- Correspondence: Tel.: +90-212-338-1813
3
Feitosa RM, Prieto-Oliveira P, Brentani H, Machado-Lima A. MicroRNA target prediction tools for animals: Where we are at and where we are going to - A systematic review. Comput Biol Chem 2022; 100:107729. [DOI: 10.1016/j.compbiolchem.2022.107729] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/05/2021] [Revised: 07/08/2022] [Accepted: 07/09/2022] [Indexed: 11/26/2022]
4
Liñares-Blanco J, Pazos A, Fernandez-Lozano C. Machine learning analysis of TCGA cancer data. PeerJ Comput Sci 2021; 7:e584. [PMID: 34322589 PMCID: PMC8293929 DOI: 10.7717/peerj-cs.584] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 03/03/2021] [Accepted: 05/17/2021] [Indexed: 06/13/2023]
Abstract
In recent years, machine learning (ML) researchers have shifted their focus towards biological problems that are difficult to analyse with standard approaches. Large initiatives such as The Cancer Genome Atlas (TCGA) have allowed the use of omic data for the training of these algorithms. In order to study the state of the art, this review covers the main works that have used ML with TCGA data. Firstly, the principal discoveries made by the TCGA consortium are presented. Once these bases have been established, we begin with the main objective of this study, the identification and discussion of those works that have used the TCGA data for the training of different ML approaches. After a review of more than 100 different papers, it has been possible to make a classification according to the following three pillars: the type of tumour, the type of algorithm and the predicted biological problem. One of the conclusions drawn in this work shows a high density of studies based on two major algorithms: Random Forest and Support Vector Machines. We also observe the rise in the use of deep artificial neural networks. It is worth emphasizing the increase of integrative models of multi-omic data analysis. The different biological conditions are a consequence of molecular homeostasis, driven by protein coding regions, regulatory elements and the surrounding environment. It is notable that a large number of works make use of gene expression data, which has been found to be the preferred data type among researchers when training the different models. The biological problems addressed have been classified into five types: prognosis prediction, tumour subtypes, microsatellite instability (MSI), immunological aspects and certain pathways of interest. A clear trend was detected in the prediction of these conditions according to the type of tumour. This is why a greater number of works have focused on the BRCA cohort, while specific works for survival, for example, were centred on the GBM cohort, due to its large number of events. Throughout this review, it will be possible to go in depth into the works and the methodologies used to study TCGA cancer data. Finally, it is intended that this work will serve as a basis for future research in this field of study.
Affiliation(s)
- Jose Liñares-Blanco
- CITIC-Research Center of Information and Communication Technologies, University of A Coruña, A Coruña, Spain
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, A Coruña, Spain
- Alejandro Pazos
- CITIC-Research Center of Information and Communication Technologies, University of A Coruña, A Coruña, Spain
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, A Coruña, Spain
- Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR). Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
- Carlos Fernandez-Lozano
- CITIC-Research Center of Information and Communication Technologies, University of A Coruña, A Coruña, Spain
- Department of Computer Science and Information Technologies, Faculty of Computer Science, University of A Coruña, A Coruña, Spain
- Grupo de Redes de Neuronas Artificiales y Sistemas Adaptativos. Imagen Médica y Diagnóstico Radiológico (RNASA-IMEDIR). Complexo Hospitalario Universitario de A Coruña (CHUAC), SERGAS, Universidade da Coruña, Instituto de Investigación Biomédica de A Coruña (INIBIC), A Coruña, Spain
5
Simultaneous learning of individual microRNA-gene interactions and regulatory comodules. BMC Bioinformatics 2021; 22:237. [PMID: 33971820 DOI: 10.1186/s12859-021-04151-2] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/06/2020] [Accepted: 04/23/2021] [Indexed: 11/10/2022]
Abstract
BACKGROUND: MicroRNAs (miRNAs) function in post-transcriptional regulation of gene expression by binding to target messenger RNAs (mRNAs). Because of the key part that miRNAs play, understanding the correct regulatory role of miRNAs in diverse patho-physiological conditions is of great interest. Although it is known that miRNAs act combinatorially to regulate genes, precise identification of miRNA-gene interactions and their specific functional roles in regulatory comodules remains a challenge. We developed THEIA, an effective method for simultaneously predicting miRNA-gene interactions and regulatory comodules, which group functionally related miRNAs and genes via non-negative matrix factorization (NMF). RESULTS: We apply THEIA to RNA sequencing data from breast invasive carcinoma samples and demonstrate its effectiveness in discovering biologically significant regulatory comodules that are significantly enriched in spatial miRNA clusters, biological pathways, and various cancers. CONCLUSIONS: THEIA is a theoretically rigorous optimization algorithm that simultaneously predicts the strength and direction (i.e., up-regulation or down-regulation) of the effect of modules of miRNAs on a gene. We posit that if THEIA is capable of recovering known clusters of genes and miRNAs, then the clusters found by our method that were not previously identified in the literature are also likely to have biological significance. We believe that these novel regulatory comodules found by our method will be a springboard for further research into the specific functional roles of these new functional ensembles of miRNAs and genes, especially those related to diseases like breast cancer.
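To make the NMF idea in this abstract concrete, here is a minimal sketch of generic non-negative matrix factorization using the standard multiplicative update rules. It is not THEIA's algorithm, which the abstract says also predicts the direction of regulation. The function names, the toy matrix, the rank and the iteration count are all assumptions chosen for illustration only.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def frobenius_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H), element-wise
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[k][j] * num[k][j] / (den[k][j] + eps) for j in range(m)]
             for k in range(rank)]
        # W <- W * (V H^T) / (W H H^T), element-wise
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][k] * num[i][k] / (den[i][k] + eps) for k in range(rank)]
             for i in range(n)]
    return w, h


# Hypothetical toy data: rows stand for miRNAs, columns for genes.
V = [
    [5.0, 4.0, 0.0, 0.0],
    [4.0, 5.0, 0.0, 0.0],
    [0.0, 0.0, 3.0, 4.0],
    [0.0, 0.0, 4.0, 3.0],
]
W, H = nmf(V, rank=2)
```

In a sketch like this, each of the `rank` latent factors can be read as a candidate comodule: rows with large weights in a column of `W` and columns with large weights in the matching row of `H` are grouped together.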
6
Deng F, Huang J, Yuan X, Cheng C, Zhang L. Performance and efficiency of machine learning algorithms for analyzing rectangular biomedical data. Lab Invest 2021; 101:430-441. [PMID: 33574440 DOI: 10.1038/s41374-020-00525-x] [Citation(s) in RCA: 14] [Impact Index Per Article: 4.7] [Received: 07/31/2020] [Revised: 10/20/2020] [Accepted: 12/02/2020] [Indexed: 12/13/2022]
Abstract
Most biomedical datasets, including those of 'omics, population studies, and surveys, are rectangular in shape and have few missing data. Recently, their sample sizes have grown significantly. Rigorous analyses on these large datasets demand considerably more efficient and more accurate algorithms. Machine learning (ML) algorithms have been used to classify outcomes in biomedical datasets, including random forests (RF), decision trees (DT), artificial neural networks (ANN), and support vector machines (SVM). However, their performance and efficiency in classifying multi-category outcomes of rectangular data are poorly understood. Therefore, we compared these metrics among the four ML algorithms. As an example, we created a large rectangular dataset using the female breast cancers in the Surveillance, Epidemiology, and End Results-18 (SEER-18) database, which were diagnosed in 2004 and followed up until December 2016. The outcome was the five-category cause of death, namely alive, non-breast cancer, breast cancer, cardiovascular disease, and other cause. We analyzed the 54 dichotomized features from ~45,000 patients using MATLAB (version 2018a) and the tenfold cross-validation approach. The accuracy in classifying five-category cause of death with DT, RF, ANN, and SVM was 69.21%, 70.23%, 70.16%, and 69.06%, respectively, which was higher than the accuracy of 68.12% with multinomial logistic regression. Based on the features' information entropy, we optimized dimension reduction (i.e., reduced the number of features in models). We found 32 or more features were required to maintain similar accuracy, while the running time decreased from 55.57 s for 54 features to 25.99 s for 32 features in RF, from 12.92 s to 10.48 s in ANN, and from 175.50 s to 67.81 s in SVM. In summary, we here show that RF, DT, ANN, and SVM had similar accuracy for classifying multi-category outcomes in this large rectangular dataset. Dimension reduction based on information gain will increase the model's efficiency while maintaining classification accuracy.
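The entropy-based feature ranking mentioned in this abstract can be sketched as follows. This is a textbook information-gain calculation for dichotomized features, not the authors' MATLAB code, and the feature names and toy data are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a dichotomized feature."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += len(subset) / total * entropy(subset)
    return entropy(labels) - conditional


def rank_features(features, labels):
    """Return feature names sorted from most to least informative."""
    gains = {name: information_gain(col, labels) for name, col in features.items()}
    return sorted(gains, key=gains.get, reverse=True)


# Hypothetical toy data: two binary features and a binary outcome.
labels = [0, 0, 1, 1]
features = {
    "informative": [0, 0, 1, 1],    # separates the two outcome classes exactly
    "uninformative": [0, 1, 0, 1],  # carries no information about the outcome
}
ranking = rank_features(features, labels)
```

Dimension reduction then amounts to keeping the top-ranked features and retraining the classifier on that subset. The cut-off, 32 features in the abstract, has to be found empirically.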
Affiliation(s)
- Fei Deng
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Jibing Huang
- School of Electrical and Electronic Engineering, Shanghai Institute of Technology, Shanghai, China
- Xiaoling Yuan
- Department of Infectious Disease, Shanghai Ninth People's Hospital, Shanghai Jiao Tong University School of Medicine Shanghai, Shanghai, China
- Chao Cheng
- Department of Medicine, Baylor College of Medicine, Houston, TX, 77030, USA
- The Institute for Clinical and Translational Research, Baylor College of Medicine, Houston, TX, 77030, USA
- Lanjing Zhang
- Department of Pathology, Princeton Medical Center, Plainsboro, NJ, USA
- Department of Biological Sciences, Rutgers University, Newark, NJ, USA
- Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, USA
- Department of Chemical Biology, Ernest Mario School of Pharmacy, Rutgers University, Piscataway, NJ, USA
7
Mokhtaridoost M, Gönen M. An efficient framework to identify key miRNA-mRNA regulatory modules in cancer. Bioinformatics 2020; 36:i592-i600. [PMID: 33381822 DOI: 10.1093/bioinformatics/btaa798] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Indexed: 11/12/2022]
Abstract
MOTIVATION Micro-RNAs (miRNAs) are known as the important components of RNA silencing and post-transcriptional gene regulation, and they interact with messenger RNAs (mRNAs) either by degradation or by translational repression. miRNA alterations have a significant impact on the formation and progression of human cancers. Accordingly, it is important to establish computational methods with high predictive performance to identify cancer-specific miRNA-mRNA regulatory modules. RESULTS We presented a two-step framework to model miRNA-mRNA relationships and identify cancer-specific modules between miRNAs and mRNAs from their matched expression profiles of more than 9000 primary tumors. We first estimated the regulatory matrix between miRNA and mRNA expression profiles by solving multiple linear programming problems. We then formulated a unified regularized factor regression (RFR) model that simultaneously estimates the effective number of modules (i.e. latent factors) and extracts modules by decomposing regulatory matrix into two low-rank matrices. Our RFR model groups correlated miRNAs together and correlated mRNAs together, and also controls sparsity levels of both matrices. These attributes lead to interpretable results with high predictive performance. We applied our method on a very comprehensive data collection by including 32 TCGA cancer types. To find the biological relevance of our approach, we performed functional gene set enrichment and survival analyses. A large portion of the identified modules are significantly enriched in Hallmark, PID and KEGG pathways/gene sets. To validate the identified modules, we also performed literature validation as well as validation using experimentally supported miRTarBase database. AVAILABILITY AND IMPLEMENTATION Our implementation of proposed two-step RFR algorithm in R is available at https://github.com/MiladMokhtaridoost/2sRFR together with the scripts that replicate the reported experiments. 
SUPPLEMENTARY INFORMATION Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Mehmet Gönen
- Department of Industrial Engineering, College of Engineering, Koç University, İstanbul 34450, Turkey
- School of Medicine, Koç University, İstanbul 34450, Turkey
- Department of Biomedical Engineering, School of Medicine, Oregon Health & Science University, Portland, OR 97239, USA
8
He L, Wang Z, Zhou R, Xiong W, Yang Y, Song N, Qian J. Dexmedetomidine exerts cardioprotective effect through miR-146a-3p targeting IRAK1 and TRAF6 via inhibition of the NF-κB pathway. Biomed Pharmacother 2020; 133:110993. [PMID: 33220608 DOI: 10.1016/j.biopha.2020.110993] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.5] [Received: 06/08/2020] [Revised: 10/28/2020] [Accepted: 11/01/2020] [Indexed: 12/11/2022]
Abstract
BACKGROUND Myocardial ischemia/reperfusion (I/R) injury is a common cause of mortality. Cardiac miR-146a is emerging as a potent regulator of myocardial function. Dexmedetomidine preconditioning provides cardioprotective effects, of which mechanisms related to miR-146a-3p are unclear. METHODS A myocardial I/R model in rats and a cellular anoxia/reoxygenation (A/R) model in H9C2 cells were established and preconditioned with dexmedetomidine or not. H9C2 cells were transfected with mimics, inhibitor, or negative controls of miR-146a-3p, and siRNAs of IRAK1 or TRAF6. Relative expressions of miR-146a-3p were determined by quantitative real-time polymerase chain reaction. The apoptosis rates and reactive oxygen species (ROS) levels in H9C2 cells were examined by flow cytometry. Protein expressions of IRAK1, TRAF6, cleaved Caspase-3, BAX, BCL-2, NF-κB p65, phosphorylated NF-κB p65 (p-NF-κB p65), IκBα, and phosphorylated IκBα (p-IκBα) in H9C2 cells were detected by Western blot. RESULTS Dexmedetomidine decreased myocardial infarction size and apoptosis rates of H9C2 cells. Dexmedetomidine upregulated expression of miR-146a-3p. Dexmedetomidine significantly decreased protein expressions of IRAK1, TRAF6, cleaved Caspase-3, BAX, and NF-κB p65, but increased expressions of BCL-2 in H9C2 cells. miR-146a-3p overexpression strengthened the anti-apoptotic effect induced by dexmedetomidine in H9C2 cells via decreasing protein levels of IRAK1, TRAF6, cleaved Caspase-3, BAX, NF-κB p65, p-NF-κB p65, and p-IκBα and increasing protein level of BCL-2. Downregulation of miR-146a-3p reversed the changes in these proteins in H9C2 cells. Expressions of NF-κB p65 and p-NF-κB p65 were further decreased following knockdown of IRAK1 or TRAF6. ROS emission was significantly increased after A/R, while significantly decreased following dexmedetomidine preconditioning in H9C2 cells transfected with siIRAK1 or siTRAF6. 
CONCLUSION miR-146a-3p targeting IRAK1 and TRAF6 through inhibition of NF-κB signaling pathway and ROS emission is involved in cardioprotection induced by dexmedetomidine pretreatment.
Affiliation(s)
- Liang He
- Department of Anesthesiology, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan Province, 650032, China
- Department of Anesthesiology, Yan'an Hospital of Kunming City, Kunming Medical University, Kunming, Yunnan Province, 650051, China
- Zhuoran Wang
- Department of Anesthesiology, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan Province, 650032, China
- Rui Zhou
- Department of Anesthesiology, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan Province, 650032, China
- Wei Xiong
- Department of Anesthesiology, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan Province, 650032, China
- Yuqiao Yang
- Department of Anesthesiology, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan Province, 650032, China
- Ning Song
- Department of Anesthesiology, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan Province, 650032, China
- Jinqiao Qian
- Department of Anesthesiology, First Affiliated Hospital of Kunming Medical University, Kunming, Yunnan Province, 650032, China
9
Cui J, Shu J. Circulating microRNA trafficking and regulation: computational principles and practice. Brief Bioinform 2020; 21:1313-1326. [PMID: 31504144 PMCID: PMC7412956 DOI: 10.1093/bib/bbz079] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.0] [Received: 03/18/2019] [Revised: 06/07/2019] [Accepted: 06/07/2019] [Indexed: 01/18/2023]
Abstract
Rapid advances in genomics discovery tools and a growing realization of microRNA's involvement in intercellular communication have led to a proliferation of studies of circulating microRNA sorting and regulation across cells and different species. Although they have sometimes reached controversial scientific discoveries and conclusions, these studies have yielded new insights into the functional roles of circulating microRNA and a plethora of analytical methods and tools. Here, we consider this body of work in light of key computational principles underpinning discovery of circulating microRNAs in terms of their sorting and targeting, with the goal of providing practical guidance for applications focused on the design and analysis of circulating microRNAs and their context-dependent regulation. We survey a broad range of informatics methods and tools that are available to the researcher, discuss their key features, applications and various unsolved problems, and close this review with prospects and broader implications of this field.
Affiliation(s)
- Juan Cui
- Systems Biology and Biomedical Informatics Laboratory, Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA
- Jiang Shu
- Systems Biology and Biomedical Informatics Laboratory, Department of Computer Science and Engineering, University of Nebraska-Lincoln, Lincoln, NE, USA
10
Cava C, Novello C, Martelli C, Lodico A, Ottobrini L, Piccotti F, Truffi M, Corsi F, Bertoli G, Castiglioni I. Theranostic application of miR-429 in HER2+ breast cancer. Theranostics 2020; 10:50-61. [PMID: 31903105 PMCID: PMC6929607 DOI: 10.7150/thno.36274] [Citation(s) in RCA: 17] [Impact Index Per Article: 4.3] [Received: 05/03/2019] [Accepted: 09/03/2019] [Indexed: 12/12/2022]
Abstract
Human epidermal growth factor receptor 2 (HER2) is overexpressed/amplified in one third of breast cancers (BCs), and is associated with poorer prognosis and higher metastatic potential in BC. Emerging evidence highlights the role of microRNAs (miRNAs) in the regulation of several cellular processes, including BC. Methods: Here we identified, by an in silico approach, a group of three miRNAs with a central biological role (high degree centrality) in HER2+ BC. We validated their dysregulation in HER2+ BC and we analysed their functional role by in vitro approaches on selected cell lines and by in vivo experiments in an animal model. Results: We found that their expression is dysregulated in both HER2+ BC cell lines and human samples. Focusing our study on the only upregulated miRNA, miR-429, we discovered that it acts as an oncogene and its upregulation is required for HER2+ cell proliferation. It controls the metastatic potential of the HER2+ BC subtype by regulating migration and invasion of the cell. Conclusions: In HER2+ BC, oncogenic miR-429 is able to regulate the HIF1α pathway by directly targeting VHL mRNA, a molecule important for the degradation of HIF1α. The overexpression of miR-429, observed in HER2+ BC, causes increased proliferation and migration of the BC cells. More importantly, silencing miR-429 succeeds in delaying tumor growth; thus, miR-429 could be proposed as a therapeutic probe in HER2+ BC tumors.
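The "high degree centrality" criterion mentioned in the Methods can be illustrated with a small sketch. It computes normalized degree centrality on an undirected interaction network. The node names and edges are hypothetical placeholders, and the authors' actual in silico pipeline is not described in this abstract.

```python
from collections import defaultdict


def degree_centrality(edges):
    """Normalized degree centrality: number of neighbours / (number of nodes - 1)."""
    neighbours = defaultdict(set)
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops
        neighbours[a].add(b)
        neighbours[b].add(a)
    n = len(neighbours)
    if n < 2:
        return {node: 0.0 for node in neighbours}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical miRNA-gene interaction network (placeholder names).
edges = [
    ("miR-X", "GENE1"),
    ("miR-X", "GENE2"),
    ("miR-X", "GENE3"),
    ("miR-Y", "GENE3"),
]
centrality = degree_centrality(edges)
most_central = max(centrality, key=centrality.get)
```

Nodes that touch a large share of the network score close to 1, which is one simple way to shortlist candidate regulators for experimental follow-up.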
11
Yoon S, Nguyen HCT, Jo W, Kim J, Chi SM, Park J, Kim SY, Nam D. Biclustering analysis of transcriptome big data identifies condition-specific microRNA targets. Nucleic Acids Res 2019; 47:e53. [PMID: 30820547 PMCID: PMC6511842 DOI: 10.1093/nar/gkz139] [Citation(s) in RCA: 12] [Impact Index Per Article: 2.4] [Received: 12/15/2018] [Accepted: 02/19/2019] [Indexed: 12/26/2022]
Abstract
We present a novel approach to identify human microRNA (miRNA) regulatory modules (mRNA targets and relevant cell conditions) by biclustering a large collection of mRNA fold-change data for sequence-specific targets. Bicluster targets were assessed using validated messenger RNA (mRNA) targets and exhibited, on average, a 17.0% (median 19.4%) improved gain in certainty (sensitivity + specificity). The net gain was further increased up to 32.0% (median 33.4%) by incorporating functional networks of targets. We analyzed cancer-specific biclusters and found that the PI3K/Akt signaling pathway is strongly enriched with targets of a few miRNAs in breast cancer and diffuse large B-cell lymphoma. Indeed, five independent prognostic miRNAs were identified, and repression of bicluster targets and pathway activity by miR-29 was experimentally validated. In total, 29 898 biclusters for 459 human miRNAs were collected in the BiMIR database, where biclusters are searchable by miRNA, tissue, disease, keyword and target gene.
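The "sensitivity + specificity" score used in this abstract can be sketched for sets of predicted and validated targets as follows. The gene identifiers and both prediction sets are hypothetical, and the exact way the paper converts the score into a percentage gain is not given in the abstract, so it is not reproduced here.

```python
def sensitivity_specificity(predicted, validated, universe):
    """Return (sensitivity, specificity) of a predicted target set."""
    universe = set(universe)
    predicted = set(predicted) & universe
    validated = set(validated) & universe
    tp = len(predicted & validated)             # validated targets that were predicted
    fn = len(validated - predicted)             # validated targets that were missed
    fp = len(predicted - validated)             # predictions lacking validation
    tn = len(universe - predicted - validated)  # correctly left out
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return sensitivity, specificity


# Hypothetical example: six candidate genes, two of them validated targets.
universe = {"g1", "g2", "g3", "g4", "g5", "g6"}
validated = {"g1", "g2"}
sequence_only = {"g1", "g3", "g4", "g5"}   # assumed baseline prediction
bicluster = {"g1", "g2", "g3"}             # assumed refined prediction

baseline_score = sum(sensitivity_specificity(sequence_only, validated, universe))
refined_score = sum(sensitivity_specificity(bicluster, validated, universe))
```

Comparing `refined_score` with `baseline_score` gives a simple measure of how much a refinement step improves on sequence-only prediction under this scoring scheme.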
Affiliation(s)
- Sora Yoon
- School of Life Sciences, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- Hai C T Nguyen
- School of Life Sciences, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- Woobeen Jo
- School of Life Sciences, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- Jinhwan Kim
- School of Life Sciences, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- Sang-Mun Chi
- School of Computer Science and Engineering, Kyungsung University, Busan 48434, Republic of Korea
- Jiyoung Park
- School of Life Sciences, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- Seon-Young Kim
- Department of Functional Genomics, University of Science and Technology (UST), Daejeon 34141, Republic of Korea
- Genome Editing Research Center, Personalized Genomic Medicine Research Center, Korea Research Institute of Bioscience and Biotechnology (KRIBB), Daejeon 34141, Republic of Korea
- Dougu Nam
- School of Life Sciences, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
- Department of Mathematical Sciences, Ulsan National Institute of Science and Technology, Ulsan 44919, Republic of Korea
12
Fang CH, Theera-Ampornpunt N, Roth MA, Grama A, Chaterji S. AIKYATAN: mapping distal regulatory elements using convolutional learning on GPU. BMC Bioinformatics 2019; 20:488. [PMID: 31590652 PMCID: PMC6781298 DOI: 10.1186/s12859-019-3049-1] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/02/2019] [Accepted: 08/22/2019] [Indexed: 12/02/2022]
Abstract
Background: The data deluge can leverage sophisticated ML techniques for functionally annotating the regulatory non-coding genome. The challenge lies in selecting the appropriate classifier for the specific functional annotation problem, within the bounds of the hardware constraints and the model’s complexity. In our system Aikyatan, we annotate distal epigenomic regulatory sites, e.g., enhancers. Specifically, we develop a binary classifier that classifies genome sequences as distal regulatory regions or not, given their histone modifications’ combinatorial signatures. This problem is challenging because the regulatory regions are distal to the genes, with diverse signatures across classes (e.g., enhancers and insulators) and even within each class (e.g., different enhancer sub-classes). Results: We develop a suite of ML models, under the banner Aikyatan, including SVM models, random forest variants, and deep learning architectures, for distal regulatory element (DRE) detection. We demonstrate, with strong empirical evidence, that deep learning approaches have a computational advantage. In addition, convolutional neural networks (CNN) provide the best-in-class accuracy, superior to the vanilla variant. With the human embryonic cell line H1, CNN achieves an accuracy of 97.9% and an order of magnitude lower runtime than the kernel SVM. Running on a GPU, the training time is sped up 21x and 30x (over CPU) for DNN and CNN, respectively. Finally, our CNN model enjoys superior prediction performance vis-à-vis the competition. Specifically, Aikyatan-CNN achieved a 40% higher validation rate versus CSIANN and the same accuracy as RFECS. Conclusions: Our exhaustive experiments using an array of ML tools validate the need for a model that is not only expressive but can scale with increasing data volumes and diversity. In addition, a subset of these datasets has image-like properties and benefits from spatial pooling of features. Our Aikyatan suite leverages diverse epigenomic datasets that can then be modeled using CNNs with optimized activation and pooling functions. The goal is to capture the salient features of the integrated epigenomic datasets for deciphering the distal (non-coding) regulatory elements, which have been found to be associated with functional variants. Our source code will be made publicly available at: https://bitbucket.org/cellsandmachines/aikyatan. Electronic supplementary material: The online version of this article (10.1186/s12859-019-3049-1) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Chih-Hao Fang
- Department of Ag. and Biological Engineering, Purdue University, West Lafayette, IN, USA
- Ananth Grama
- Department of Ag. and Biological Engineering, Purdue University, West Lafayette, IN, USA
- Somali Chaterji
- Department of Ag. and Biological Engineering, Purdue University, West Lafayette, IN, USA.
|
13
|
Zhang P, Wu W, Chen Q, Chen M. Non-Coding RNAs and their Integrated Networks. J Integr Bioinform 2019; 16:jib-2019-0027. [PMID: 31301674 PMCID: PMC6798851 DOI: 10.1515/jib-2019-0027] [Citation(s) in RCA: 319] [Impact Index Per Article: 63.8] [Received: 04/07/2019] [Revised: 05/02/2019] [Accepted: 05/21/2019] [Indexed: 12/31/2022] Open
Abstract
Eukaryotic genomes are pervasively transcribed. Besides protein-coding RNAs, there are different types of non-coding RNAs (ncRNAs) that modulate complex molecular and cellular processes. RNA sequencing technologies and bioinformatics methods have greatly promoted the study of ncRNAs, revealing their essential roles in diverse aspects of biological function. As key players in gene regulatory networks, ncRNAs work with other biomolecules, including coding and non-coding RNAs, DNAs and proteins. In this review, we discuss the distinct types of ncRNAs, including housekeeping ncRNAs and regulatory ncRNAs, their versatile functions and interactions, transcription, translation, and modification. Moreover, we summarize the integrated networks of ncRNA interactions, providing a comprehensive landscape of ncRNAs' regulatory roles.
Affiliation(s)
- Peijing Zhang
- Department of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Wenyi Wu
- Department of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Qi Chen
- Department of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- Ming Chen
- Department of Bioinformatics, State Key Laboratory of Plant Physiology and Biochemistry, College of Life Sciences, Zhejiang University, Hangzhou 310058, China
- James D. Watson Institute of Genome Sciences, Zhejiang University, Hangzhou 310058, China
|
14
|
Saçar Demirci MD, Yousef M, Allmer J. Computational Prediction of Functional MicroRNA-mRNA Interactions. Methods Mol Biol 2019; 1912:175-196. [PMID: 30635894 DOI: 10.1007/978-1-4939-8982-9_7] [Citation(s) in RCA: 17] [Impact Index Per Article: 3.4] [Indexed: 12/20/2022]
Abstract
Proteins have a strong influence on the phenotype, and their aberrant expression leads to diseases. MicroRNAs (miRNAs) are short RNA sequences which posttranscriptionally regulate protein expression. This regulation is driven by miRNAs acting as recognition sequences for their target mRNAs within a larger regulatory machinery. A miRNA can have many target mRNAs and an mRNA can be targeted by many miRNAs, which makes it difficult to experimentally discover all miRNA-mRNA interactions. Therefore, computational methods have been developed for miRNA detection and miRNA target prediction. The abundance of available computational tools makes selection difficult. Additionally, interactions are not currently the focus of investigation, although they define the regulation more accurately than pre-miRNA detection or target prediction could alone. We define an interaction as including the miRNA source and the mRNA target. We present computational methods allowing the investigation of these interactions as well as how they can be used to extend regulatory pathways. Finally, we present a list of points that should be taken into account when investigating miRNA-mRNA interactions. In the future, this may lead to a better understanding of functional interactions, which may pave the way for disease marker discovery and the design of miRNA-based drugs.
Affiliation(s)
- Malik Yousef
- Department of Community Information Systems, Zefat Academic College, Zefat, Israel
- Jens Allmer
- Applied Bioinformatics, Bioscience, Wageningen University & Research, Wageningen, The Netherlands.
|
15
|
Ghoshal A, Zhang J, Roth MA, Xia KM, Grama A, Chaterji S. A Distributed Classifier for MicroRNA Target Prediction with Validation Through TCGA Expression Data. IEEE/ACM TRANSACTIONS ON COMPUTATIONAL BIOLOGY AND BIOINFORMATICS 2018; 15:1037-1051. [PMID: 29993641 PMCID: PMC6175706 DOI: 10.1109/tcbb.2018.2828305] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Indexed: 06/08/2023]
Abstract
BACKGROUND: MicroRNAs (miRNAs) are approximately 22-nucleotide-long regulatory RNAs that mediate RNA interference by binding to cognate mRNA target regions. Here, we present a distributed kernel SVM-based binary classification scheme to predict miRNA targets. It captures the spatial profile of miRNA-mRNA interactions via smooth B-spline curves. This is accomplished separately for various input features, such as thermodynamic and sequence-based features. Further, we use a principled approach to uniformly model both canonical and non-canonical seed matches, using a novel seed enrichment metric. Finally, we verify our miRNA-mRNA pairings using an Elastic Net-based regression model on TCGA expression data for four cancer types to estimate the miRNAs that together regulate any given mRNA.
RESULTS: We present a suite of algorithms for miRNA target prediction, under the banner Avishkar, with superior prediction performance over the competition. Specifically, our final kernel SVM model, with an Apache Spark backend, achieves an average true positive rate (TPR) of more than 75 percent, while keeping the false positive rate at 20 percent, for non-canonical human miRNA target sites. This is an improvement of over 150 percent in the TPR for non-canonical sites, over the best-in-class algorithm. We are able to achieve such superior performance by representing the thermodynamic and sequence profiles of miRNA-mRNA interaction as curves, devising a novel seed enrichment metric, and learning an ensemble of miRNA family-specific kernel SVM classifiers. We provide an easy-to-use system for large-scale interactive analysis and prediction of miRNA targets. All operations in our system, namely candidate set generation, feature generation and transformation, training, prediction, and computing performance metrics, are fully distributed and scalable.
CONCLUSIONS: We have developed an efficient SVM-based model for miRNA target prediction using recent CLIP-seq data, demonstrating superior performance, evaluated using ROC curves for different species (human or mouse) and different target types (canonical or non-canonical). We analyzed the agreement between the target pairings obtained using CLIP-seq data and those obtained using expression data from four cancer types. To the best of our knowledge, we provide the first distributed framework for miRNA target prediction based on Apache Hadoop and Spark.
AVAILABILITY: All source code and sample data are publicly available at https://bitbucket.org/cellsandmachines/avishkar. Our scalable implementation of kernel SVM using Apache Spark, which can be used to solve large-scale non-linear binary classification problems, is available at https://bitbucket.org/cellsandmachines/kernelsvmspark.
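The operating point quoted in this abstract (a TPR reported at a fixed 20 percent false positive rate) can be illustrated with a short calculation. The following is only a minimal sketch, not code from the Avishkar repository: the function name `tpr_at_fpr`, the scores, and the labels are all hypothetical and chosen purely for illustration.

```python
def tpr_at_fpr(scores, labels, max_fpr=0.20):
    """Return the highest true positive rate reachable with FPR <= max_fpr.

    scores: classifier scores (higher means more likely a target site).
    labels: 1 for a true target site, 0 for a non-target.
    Hypothetical helper for illustration; not part of any published tool.
    """
    positives = sum(1 for y in labels if y == 1)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative example")

    # Sweep the decision threshold from the highest score downwards.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = fp = 0
    best_tpr = 0.0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        if fp / negatives > max_fpr:
            break  # FPR only grows from here on
        best_tpr = max(best_tpr, tp / positives)
    return best_tpr


# Synthetic example: 4 positives and 6 negatives.
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 1, 0, 0, 0, 0]
print(tpr_at_fpr(scores, labels))  # 0.75
```

With these synthetic values, at most one of the six negatives may be accepted before the FPR exceeds 20 percent, and three of the four positives are recovered by that point.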
Affiliation(s)
- Asish Ghoshal
- Department of Computer Science, Purdue University, West Lafayette, IN.
- Jinyi Zhang
- Department of Computer Science, Columbia University, New York City, NY.
- Michael A. Roth
- Department of Computer Science, Purdue University, West Lafayette, IN.
- Kevin Muyuan Xia
- Department of Computer Science, Purdue University, West Lafayette, IN.
- Ananth Grama
- Department of Computer Science, Purdue University, West Lafayette, IN.
- Somali Chaterji
- Department of Computer Science, Purdue University, West Lafayette, IN.
|