1
Zhang J, Liu L, Wei X, Zhao C, Luo Y, Li J, Le TD. Scanning sample-specific miRNA regulation from bulk and single-cell RNA-sequencing data. BMC Biol 2024; 22:218. [PMID: 39334271 PMCID: PMC11438147 DOI: 10.1186/s12915-024-02020-x] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/15/2024] [Accepted: 09/24/2024] [Indexed: 09/30/2024]
Abstract
BACKGROUND RNA-sequencing technology provides an effective tool for understanding miRNA regulation in complex human diseases, including cancers. A large number of computational methods have been developed to make use of bulk and single-cell RNA-sequencing data to identify miRNA regulation at the resolution of multiple samples (i.e., groups of cells or tissues). However, due to the heterogeneity of individual samples, there is a strong need to infer miRNA regulation specific to individual samples to uncover miRNA regulation at the single-sample resolution level. RESULTS Here, we develop a framework, Scan, for scanning sample-specific miRNA regulation. Since a single network inference method or strategy cannot perform well for all types of new data, Scan incorporates 27 network inference methods and two strategies to infer tissue-specific or cell-specific miRNA regulation from bulk or single-cell RNA-sequencing data. Results on bulk and single-cell RNA-sequencing data demonstrate the effectiveness of Scan in inferring sample-specific miRNA regulation. Moreover, we have found that incorporating the prior information of miRNA targets can generally improve the accuracy of miRNA target prediction. In addition, Scan can help construct cell/tissue correlation networks and recover aggregate miRNA regulatory networks. Finally, the comparison results have shown that the performance of network inference methods is likely to be data-specific, and selecting optimal network inference methods is required for more accurate prediction of miRNA targets. CONCLUSIONS Scan provides a useful method to help infer sample-specific miRNA regulation for new data, benchmark new network inference methods and deepen the understanding of miRNA regulation at the resolution of individual samples.
Affiliation(s)
- Junpeng Zhang
  - School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Lin Liu
  - UniSA STEM, University of South Australia, Mawson Lakes, SA, 5095, Australia
- Xuemei Wei
  - School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Chunwen Zhao
  - School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Yanbi Luo
  - School of Engineering, Dali University, Dali, 671003, Yunnan, China
- Jiuyong Li
  - UniSA STEM, University of South Australia, Mawson Lakes, SA, 5095, Australia
- Thuc Duy Le
  - UniSA STEM, University of South Australia, Mawson Lakes, SA, 5095, Australia
2
Jiang H, Wang Y, Yin C, Pan H, Chen L, Feng K, Chang Y, Sun H. SLIVER: Unveiling large scale gene regulatory networks of single-cell transcriptomic data through causal structure learning and modules aggregation. Comput Biol Med 2024; 178:108690. [PMID: 38879931 DOI: 10.1016/j.compbiomed.2024.108690] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/04/2024] [Revised: 05/19/2024] [Accepted: 06/01/2024] [Indexed: 06/18/2024]
Abstract
Prevalent Gene Regulatory Network (GRN) construction methods rely on generalized correlation analysis. However, in biological systems, regulation is essentially a causal relationship that cannot be adequately captured solely through correlation. Therefore, it is more reasonable to infer GRNs from a causal perspective. Existing causal discovery algorithms typically rely on Directed Acyclic Graphs (DAGs) to model causal relationships, but this often requires traversing the entire network, so computational demands grow sharply with the number of nodes and causal discovery algorithms remain suitable only for small networks of one or two hundred nodes or fewer. In this study, we propose the SLIVER (cauSaL dIscovery Via dimEnsionality Reduction) algorithm, which integrates a causal structural equation model with graph decomposition. SLIVER introduces a set of factor nodes, serving as abstractions of different functional modules to integrate the regulatory relationships between genes based on their respective functions or pathways, thus reducing the GRN to the product of two low-dimensional matrices. Subsequently, we employ the structural causal model (SCM) to learn the GRN within the gene node space, enforce the DAG constraint in the low-dimensional space, and guide each factor to aggregate various functions through cosine similarity. We evaluate the performance of the SLIVER algorithm on 12 real single-cell transcriptomic datasets and demonstrate that it outperforms 12 other widely used methods in both GRN inference performance and computational resource usage. The analysis of the gene information integrated by factor nodes also supports a biological interpretation of the factor nodes in GRNs. We apply SLIVER to scRNA-seq data from type 2 diabetes mellitus to capture the transcriptional regulatory structural changes of β cells under high insulin demand.
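As a rough illustration of the factorization idea in the abstract above, the sketch below writes a gene-by-gene regulatory matrix as the product of a gene-to-factor matrix and a factor-to-gene matrix. It is not the SLIVER implementation: the names `gene_to_factor`, `factor_to_gene`, `matmul`, and `n_params`, the toy weights, and the sizes are all assumptions made for illustration, and the structural-equation learning, DAG constraint, and cosine-similarity guidance are omitted.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    cols = len(b[0])
    return [
        [sum(row[t] * b[t][j] for t in range(inner)) for j in range(cols)]
        for row in a
    ]


def n_params(n_genes, n_factors):
    """Weights stored by the two low-dimensional matrices (hypothetical helper)."""
    return 2 * n_genes * n_factors


# Toy example: 4 genes routed through 2 factor nodes (made-up weights).
gene_to_factor = [
    [1.0, 0.0],  # gene 0 feeds factor 0
    [0.5, 0.0],  # gene 1 feeds factor 0
    [0.0, 2.0],  # gene 2 feeds factor 1
    [0.0, 0.0],  # gene 3 regulates nothing
]
factor_to_gene = [
    [0.0, 0.0, 1.0, 0.0],  # factor 0 acts on gene 2
    [0.0, 0.0, 0.0, 3.0],  # factor 1 acts on gene 3
]

# Entry [i][j] is the implied effect of gene i on gene j.
grn = matmul(gene_to_factor, factor_to_gene)

# For 1000 genes and 20 factors: 40,000 weights instead of 1,000,000.
full_size = 1000 * 1000
factored_size = n_params(1000, 20)
```

The saving comes purely from the rank restriction: the factored form can only express regulatory matrices whose rank does not exceed the number of factor nodes.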
Affiliation(s)
- Hongyang Jiang
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Yuezhu Wang
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Chaoyi Yin
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Hao Pan
  - College of Software, Jilin University, Changchun, 130012, China
- Liqun Chen
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Ke Feng
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, China
- Yi Chang
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, China; International Center of Future Science, Jilin University, Changchun, China; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China
- Huiyan Sun
  - School of Artificial Intelligence, Jilin University, Changchun, 130012, China; International Center of Future Science, Jilin University, Changchun, China; Engineering Research Center of Knowledge-Driven Human-Machine Intelligence, MOE, China
3
Fan Z, Kernan KF, Sriram A, Benos PV, Canna SW, Carcillo JA, Kim S, Park HJ. Deep neural networks with knockoff features identify nonlinear causal relations and estimate effect sizes in complex biological systems. Gigascience 2022; 12:giad044. [PMID: 37395630 PMCID: PMC10316696 DOI: 10.1093/gigascience/giad044] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 09/15/2022] [Revised: 01/31/2023] [Accepted: 05/29/2023] [Indexed: 07/04/2023]
Abstract
BACKGROUND Learning the causal structure helps identify risk factors, disease mechanisms, and candidate therapeutics for complex diseases. However, although complex biological systems are characterized by nonlinear associations, existing bioinformatic methods of causal inference cannot identify the nonlinear relationships and estimate their effect size. RESULTS To overcome these limitations, we developed the first computational method that explicitly learns nonlinear causal relations and estimates the effect size using a deep neural network approach coupled with the knockoff framework, named causal directed acyclic graphs using deep learning variable selection (DAG-deepVASE). Using simulation data of diverse scenarios and identifying known and novel causal relations in molecular and clinical data of various diseases, we demonstrated that DAG-deepVASE consistently outperforms existing methods in identifying true and known causal relations. In the analyses, we also illustrate how identifying nonlinear causal relations and estimating their effect size help understand the complex disease pathobiology, which is not possible using other methods. CONCLUSIONS With these advantages, the application of DAG-deepVASE can help identify driver genes and therapeutic agents in biomedical studies and clinical trials.
Affiliation(s)
- Zhenjiang Fan
  - Department of Computer Science, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Kate F Kernan
  - Division of Pediatric Critical Care Medicine, Department of Critical Care Medicine, Children's Hospital of Pittsburgh, Center for Critical Care Nephrology and Clinical Research Investigation and Systems Modeling of Acute Illness Center, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Aditya Sriram
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15213, USA
- Panayiotis V Benos
  - Department of Epidemiology, University of Florida, Gainesville, FL 32610, USA
- Scott W Canna
  - Pediatric Rheumatology, The Children's Hospital of Philadelphia, Philadelphia, PA 19104, USA
- Joseph A Carcillo
  - Division of Pediatric Critical Care Medicine, Department of Critical Care Medicine, Children's Hospital of Pittsburgh, Center for Critical Care Nephrology and Clinical Research Investigation and Systems Modeling of Acute Illness Center, University of Pittsburgh, Pittsburgh, PA 15260, USA
- Soyeon Kim
  - Division of Pediatric Pulmonary Medicine, Children's Hospital of Pittsburgh, Pittsburgh, PA 15224, USA
  - Department of Pediatrics, School of Medicine, University of Pittsburgh, Pittsburgh, PA 15224, USA
- Hyun Jung Park
  - Department of Human Genetics, University of Pittsburgh, Pittsburgh, PA 15213, USA
4
Mokhtaridoost M, Maass PG, Gönen M. Identifying Tissue- and Cohort-Specific RNA Regulatory Modules in Cancer Cells Using Multitask Learning. Cancers (Basel) 2022; 14:cancers14194939. [PMID: 36230862 PMCID: PMC9563725 DOI: 10.3390/cancers14194939] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/11/2022] [Revised: 09/30/2022] [Accepted: 10/06/2022] [Indexed: 11/24/2022]
Abstract
Simple Summary Understanding the underlying biological mechanisms of primary tumors is crucial for predicting how tumors respond to therapies and exploring accurate treatment strategies. miRNA–mRNA interactions have a major effect on many biological processes that are important in the formation and progression of cancer. In this study, we introduced a computational pipeline to extract tissue- and cohort-specific miRNA–mRNA regulatory modules of multiple cancer types from the same origin using miRNA and mRNA expression profiles of primary tumors. Our model identified regulatory modules of underlying cancer types (i.e., cohort-specific) and shared regulatory modules between cohorts (i.e., tissue-specific).
Abstract MicroRNA (miRNA) alterations significantly impact the formation and progression of human cancers. miRNAs interact with messenger RNAs (mRNAs) to facilitate degradation or translational repression. Thus, identifying miRNA–mRNA regulatory modules in cohorts of primary tumor tissues is fundamental for understanding the biology of tumor heterogeneity and for precise diagnosis and treatment. We established a multitask learning sparse regularized factor regression (MSRFR) method to determine key tissue- and cohort-specific miRNA–mRNA regulatory modules from expression profiles of tumors. MSRFR simultaneously models the sparse relationship between miRNAs and mRNAs and extracts tissue- and cohort-specific miRNA–mRNA regulatory modules separately. We tested the model’s ability to determine cohort-specific regulatory modules of multiple cancer cohorts from the same tissue and their underlying tissue-specific regulatory modules by extracting similarities between cancer cohorts (i.e., blood, kidney, and lung). We also detected tissue-specific and cohort-specific signatures in the corresponding regulatory modules by comparing our findings from various other tissues. We show that MSRFR effectively determines cancer-related miRNAs in cohort-specific regulatory modules, distinguishes tissue- and cohort-specific regulatory modules from each other, and extracts tissue-specific information from different cohorts of disease-related tissue. Our findings indicate that the MSRFR model can support current efforts in precision medicine to define tumor-specific miRNA–mRNA signatures.
Affiliation(s)
- Milad Mokhtaridoost
  - Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 1X8, Canada
  - Graduate School of Sciences and Engineering, Koç University, İstanbul 34450, Turkey
- Philipp G. Maass
  - Genetics & Genome Biology Program, The Hospital for Sick Children, Toronto, ON M5G 1X8, Canada
  - Department of Molecular Genetics, University of Toronto, Toronto, ON M5S 1A8, Canada
- Mehmet Gönen
  - Department of Industrial Engineering, College of Engineering, Koç University, İstanbul 34450, Turkey
  - School of Medicine, Koç University, İstanbul 34450, Turkey
  - Correspondence: Tel.: +90-212-338-1813
5
Mégret L, Mendoza C, Arrieta Lobo M, Brouillet E, Nguyen TTY, Bouaziz O, Chambaz A, Néri C. Precision machine learning to understand micro-RNA regulation in neurodegenerative diseases. Front Mol Neurosci 2022; 15:914830. [PMID: 36157078 PMCID: PMC9500540 DOI: 10.3389/fnmol.2022.914830] [Citation(s) in RCA: 4] [Impact Index Per Article: 1.3] [Received: 04/07/2022] [Accepted: 08/19/2022] [Indexed: 11/13/2022]
Abstract
Micro-RNAs (miRNAs) are short (∼21 nt) non-coding RNAs that regulate gene expression through the degradation or translational repression of mRNAs. Accumulating evidence points to a role of miRNA regulation in the pathogenesis of a wide range of neurodegenerative (ND) diseases such as, for example, Alzheimer’s disease, Parkinson’s disease, amyotrophic lateral sclerosis and Huntington disease (HD). Several systems level studies aimed to explore the role of miRNA regulation in NDs, but these studies remain challenging. Part of the problem may be related to the lack of sufficiently rich or homogeneous data, such as time series or cell-type-specific data obtained in model systems or human biosamples, to account for context dependency. Part of the problem may also be related to the methodological challenges associated with the accurate system-level modeling of miRNA and mRNA data. Here, we critically review the main families of machine learning methods used to analyze expression data, highlighting the added value of using shape-analysis concepts as a solution for precisely modeling highly dimensional miRNA and mRNA data such as the ones obtained in the study of the HD process, and elaborating on the potential of these concepts and methods for modeling complex omics data.
Affiliation(s)
- Lucile Mégret
  - Sorbonne Université, Centre National de la Recherche Scientifique UMR 8256, Paris, France
  - *Correspondence: Lucile Mégret
- Cloé Mendoza
  - Sorbonne Université, Centre National de la Recherche Scientifique UMR 8256, Paris, France
- Maialen Arrieta Lobo
  - Sorbonne Université, Centre National de la Recherche Scientifique UMR 8256, Paris, France
- Emmanuel Brouillet
  - Sorbonne Université, Centre National de la Recherche Scientifique UMR 8256, Paris, France
- Thi-Thanh-Yen Nguyen
  - Université Paris Cité, MAP5 (Centre National de la Recherche Scientifique UMR 8145), Paris, France
- Olivier Bouaziz
  - Université Paris Cité, MAP5 (Centre National de la Recherche Scientifique UMR 8145), Paris, France
- Antoine Chambaz
  - Université Paris Cité, MAP5 (Centre National de la Recherche Scientifique UMR 8145), Paris, France
- Christian Néri
  - Sorbonne Université, Centre National de la Recherche Scientifique UMR 8256, Paris, France
  - *Correspondence: Christian Néri
6
Targeting miRNA by Natural Products: A Novel Therapeutic Approach for Nonalcoholic Fatty Liver. Evidence-Based Complementary and Alternative Medicine 2021; 2021:6641031. [PMID: 34426744 PMCID: PMC8380168 DOI: 10.1155/2021/6641031] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.3] [Received: 11/16/2020] [Accepted: 08/02/2021] [Indexed: 02/07/2023]
Abstract
The increasing prevalence of nonalcoholic fatty liver disease (NAFLD) as multifactorial chronic liver disease and the lack of a specific treatment have begun a new era in its treatment using gene expression changes and microRNAs. This study aimed to investigate the potential therapeutic effects of natural compounds in NAFLD by regulating miRNA expression. MicroRNAs play essential roles in regulating the cell's biological processes, such as apoptosis, migration, lipid metabolism, insulin resistance, and adipocyte differentiation, by controlling the posttranscriptional gene expression level. The impact of current NAFLD pharmacological management, including drug and biological therapies, is uncertain. In this context, various dietary fruits or medicinal herbal sources have received worldwide attention versus NAFLD development. Natural ingredients such as berberine, lychee pulp, grape seed, and rosemary possess protective and therapeutic effects against NAFLD by modifying the gene's expression and noncoding RNAs, especially miRNAs.
7
Sarkar JP, Saha I, Lancucki A, Ghosh N, Wlasnowolski M, Bokota G, Dey A, Lipinski P, Plewczynski D. Identification of miRNA Biomarkers for Diverse Cancer Types Using Statistical Learning Methods at the Whole-Genome Scale. Front Genet 2020; 11:982. [PMID: 33281862 PMCID: PMC7691578 DOI: 10.3389/fgene.2020.00982] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 10/25/2019] [Accepted: 08/03/2019] [Indexed: 11/13/2022]
Abstract
Genome-wide analysis of miRNA molecules can reveal important information for understanding the biology of cancer. Typically, miRNAs are used as features in statistical learning methods in order to train learning models to predict cancer. This motivates us to propose a method that integrates clustering and classification techniques for diverse cancer types with survival analysis via regression to identify miRNAs that can potentially play a crucial role in the prediction of different types of tumors. Our method has two parts. The first part is a feature selection procedure, called the stochastic covariance evolutionary strategy with forward selection (SCES-FS), which is developed by integrating stochastic neighbor embedding (SNE), the covariance matrix adaptation evolutionary strategy (CMA-ES), and classifiers, with the primary objective of selecting biomarkers. SNE is used to reorder the features by performing an implicit clustering with highly correlated neighboring features. A subset of features is selected heuristically to perform multi-class classification for diverse cancer types. In the second part of our method, the most important features identified in the first part are used to perform survival analysis via Cox regression, primarily to examine the effectiveness of the selected features. For this purpose, we have analyzed next generation sequencing data from The Cancer Genome Atlas in form of miRNA expression of 1,707 samples of 10 different cancer types and 333 normal samples. The SCES-FS method is compared with well-known feature selection methods and it is found to perform better in multi-class classification for the 17 selected miRNAs, achieving an accuracy of 96%. Moreover, the biological significance of the selected miRNAs is demonstrated with the help of network analysis, expression analysis using hierarchical clustering, KEGG pathway analysis, GO enrichment analysis, and protein-protein interaction analysis. 
Overall, the results indicate that the 17 selected miRNAs are associated with many key cancer regulators, such as MYC, VEGFA, AKT1, CDKN1A, RHOA, and PTEN, through their targets. Therefore the selected miRNAs can be regarded as putative biomarkers for 10 types of cancer.
Affiliation(s)
- Jnanendra Prasad Sarkar
  - Data, Analytics & AI, Larsen & Toubro Infotech Ltd., Pune, India
  - Department of Computer Science & Engineering, Jadavpur University, Kolkata, India
- Indrajit Saha
  - Department of Computer Science and Engineering, National Institute of Technical Teachers' Training and Research, Kolkata, India
- Adrian Lancucki
  - Computational Intelligence Research Group, Institute of Computer Science, University of Wroclaw, Wroclaw, Poland
- Nimisha Ghosh
  - Department of Computer Science and Information Technology, SOA University, Bhubaneshwar, India
- Michal Wlasnowolski
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
- Grzegorz Bokota
  - Institute of Informatics, University of Warsaw, Warsaw, Poland
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland
- Ashmita Dey
  - Department of Computer Science & Engineering, Jadavpur University, Kolkata, India
- Piotr Lipinski
  - Computational Intelligence Research Group, Institute of Computer Science, University of Wroclaw, Wroclaw, Poland
- Dariusz Plewczynski
  - Faculty of Mathematics and Information Science, Warsaw University of Technology, Warsaw, Poland
  - Centre of New Technologies, University of Warsaw, Warsaw, Poland
8
Karri K, Waxman DJ. Widespread Dysregulation of Long Noncoding Genes Associated With Fatty Acid Metabolism, Cell Division, and Immune Response Gene Networks in Xenobiotic-exposed Rat Liver. Toxicol Sci 2020; 174:291-310. [PMID: 31926019 PMCID: PMC7098378 DOI: 10.1093/toxsci/kfaa001] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Indexed: 12/13/2022]
Abstract
Xenobiotic exposure dysregulates hundreds of protein-coding genes in mammalian liver, impacting many physiological processes and inducing diverse toxicological responses. Little is known about xenobiotic effects on long noncoding RNAs (lncRNAs), many of which have important regulatory functions. Here, we present a computational framework to discover liver-expressed, xenobiotic-responsive lncRNAs (xeno-lncs) with strong functional, gene regulatory potential and elucidate the impact of xenobiotic exposure on their gene regulatory networks. We assembled the long noncoding transcriptome of xenobiotic-exposed rat liver using RNA-seq datasets from male rats treated with 27 individual chemicals, representing 7 mechanisms of action (MOAs). Ortholog analysis was combined with coexpression data and causal inference methods to infer lncRNA function and deduce gene regulatory networks, including causal effects of lncRNAs on protein-coding gene expression and biological pathways. We discovered > 1400 liver-expressed xeno-lncs, many with human and/or mouse orthologs. Xenobiotics representing different MOAs often regulated common xeno-lnc targets: 123 xeno-lncs were dysregulated by ≥ 10 chemicals, and 5 xeno-lncs responded to ≥ 20 of the 27 chemicals investigated; 81 other xeno-lncs served as MOA-selective markers of xenobiotic exposure. Xeno-lnc-protein-coding gene coexpression regulatory network analysis identified xeno-lncs closely associated with exposure-induced perturbations of hepatic fatty acid metabolism, cell division, or immune response pathways, and with apoptosis or cirrhosis. We also identified hub and bottleneck lncRNAs, which are expected to be key regulators of gene expression. This work elucidates extensive networks of xeno-lnc-protein-coding gene interactions and provides a framework for understanding the widespread transcriptome-altering actions of foreign chemicals in a key-responsive mammalian tissue.
Affiliation(s)
- Kritika Karri
  - Department of Biology and Bioinformatics Program, Boston University, Boston, Massachusetts
- David J Waxman
  - Department of Biology and Bioinformatics Program, Boston University, Boston, Massachusetts
9
Mégret L, Nair SS, Dancourt J, Aaronson J, Rosinski J, Neri C. Combining feature selection and shape analysis uncovers precise rules for miRNA regulation in Huntington's disease mice. BMC Bioinformatics 2020; 21:75. [PMID: 32093602 PMCID: PMC7041117 DOI: 10.1186/s12859-020-3418-9] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Received: 01/29/2020] [Accepted: 02/17/2020] [Indexed: 12/12/2022]
Abstract
Background MicroRNA (miRNA) regulation is associated with several diseases, including neurodegenerative diseases. Several approaches can be used for modeling miRNA regulation. However, their precision may be limited for analyzing multidimensional data. Here, we addressed this question by integrating shape analysis and feature selection into miRAMINT, a methodology that we used for analyzing multidimensional RNA-seq and proteomic data from a knock-in mouse model (Hdh mice) of Huntington’s disease (HD), a disease caused by CAG repeat expansion in huntingtin (htt). This dataset covers 6 CAG repeat alleles and 3 age points in the striatum and cortex of Hdh mice. Results Remarkably, compared to previous analyses of this multidimensional dataset, the miRAMINT approach retained only 31 explanatory striatal miRNA-mRNA pairs that are precisely associated with the shape of CAG repeat dependence over time, 5 of which show a strong change in target expression levels. Several of these pairs were previously associated with neuronal homeostasis or HD pathogenesis, or both. Such miRNA-mRNA pairs were not detected in the cortex. Conclusions These data suggest that miRNA regulation has a limited global role in HD while providing accurately selected miRNA-target pairs to study how the brain may compute molecular responses to HD over time. These data also provide a methodological framework for researchers to explore how shape analysis can enhance multidimensional data analytics in biology and disease.
Affiliation(s)
- Lucile Mégret
  - Sorbonne Université, CNRS UMR8256, INSERM ERL U1164, Brain-C Lab, Paris, France
- Julia Dancourt
  - Sorbonne Université, CNRS UMR8256, INSERM ERL U1164, Brain-C Lab, Paris, France
- Christian Neri
  - Sorbonne Université, CNRS UMR8256, INSERM ERL U1164, Brain-C Lab, Paris, France
10
Asiaee A, Abrams ZB, Nakayiza S, Sampath D, Coombes KR. Explaining Gene Expression Using Twenty-One MicroRNAs. J Comput Biol 2019; 27:1157-1170. [PMID: 31794247 DOI: 10.1089/cmb.2019.0321] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.3] [Indexed: 12/19/2022]
Abstract
The transcriptome of a tumor contains detailed information about the disease. Although advances in sequencing technologies have generated larger data sets, there are still many questions about exactly how the transcriptome is regulated. One class of regulatory elements consists of microRNAs (or miRs), many of which are known to be associated with cancer. To better understand the relationships between miRs and cancers, we analyzed ∼9000 samples from 32 cancer types studied in The Cancer Genome Atlas. Our feature reduction algorithm found evidence for 21 biologically interpretable clusters of miRs, many of which were statistically associated with a specific type of cancer. Moreover, the clusters contain sufficient information to distinguish between most types of cancer. We then used linear models to measure, genome-wide, how much variation in gene expression could be explained by the 21 average expression values ("scores") of the clusters. Based on the ∼20,000 per-gene R2 values, we found that (1) mean differences between tissues of origin explain about 36% of variation; (2) the 21 miR cluster scores explain about 30% of the variation; and (3) combining tissue type with the miR scores explained about 56% of the total genome-wide variation in gene expression. Our analysis of poorly explained genes shows that they are enriched for olfactory receptor processes, sensory perception, and nervous system processing, which are necessary to receive and interpret signals from outside the organism. Therefore, it is reasonable for those genes to be always active and not get downregulated by miRs. In contrast, highly explained genes are characterized by genes translating to proteins necessary for transport, plasma membrane, or metabolic processes that are heavily regulated processes inside the cell. Other genetic regulatory elements such as transcription factors and methylation might help explain some of the remaining variation in gene expression.
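The variance decomposition reported above rests on per-gene R² values. A minimal sketch of the simplest case, the share of one gene's expression variance explained by tissue-of-origin means, is given below. The function name `r_squared_group_means` and the toy data are hypothetical; the study's linear models also use the 21 miR cluster scores as predictors, which this sketch leaves out.

```python
from collections import defaultdict


def r_squared_group_means(values, groups):
    """Fraction of the variance in `values` explained by per-group means."""
    grand_mean = sum(values) / len(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # a constant gene has no variance to explain

    # Collect the expression values belonging to each group (tissue).
    by_group = defaultdict(list)
    for value, group in zip(values, groups):
        by_group[group].append(value)

    # Residual sum of squares around each group's own mean.
    residual_ss = 0.0
    for members in by_group.values():
        group_mean = sum(members) / len(members)
        residual_ss += sum((v - group_mean) ** 2 for v in members)

    return 1.0 - residual_ss / total_ss


# Toy data: one gene measured in six samples from two tissues.
expression = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
tissue = ["lung", "lung", "lung", "kidney", "kidney", "kidney"]
r2 = r_squared_group_means(expression, tissue)
```

In this toy case the total sum of squares is 58 and the within-tissue residual is 4, so tissue alone explains 1 - 4/58 of the variance. Repeating the calculation for every gene and summarizing the results would give a genome-wide figure analogous to the percentages quoted in the abstract.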
Affiliation(s)
- Amir Asiaee
  - Mathematical Biosciences Institute, The Ohio State University, Columbus, Ohio, USA
- Zachary B Abrams
  - Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
- Samantha Nakayiza
  - Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
- Deepa Sampath
  - Division of Hematology, Department of Internal Medicine, The Ohio State University, Columbus, Ohio, USA
- Kevin R Coombes
  - Department of Biomedical Informatics, The Ohio State University, Columbus, Ohio, USA
11
Zárybnický T, Matoušková P, Ambrož M, Šubrt Z, Skálová L, Boušová I. The Selection and Validation of Reference Genes for mRNA and microRNA Expression Studies in Human Liver Slices Using RT-qPCR. Genes (Basel) 2019; 10:genes10100763. [PMID: 31569378 PMCID: PMC6826422 DOI: 10.3390/genes10100763] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.5] [Received: 08/29/2019] [Revised: 09/25/2019] [Accepted: 09/27/2019] [Indexed: 01/06/2023]
Abstract
The selection of a suitable combination of reference genes (RGs) for data normalization is a crucial step for obtaining reliable and reproducible results from transcriptional response analysis using a reverse transcription-quantitative polymerase chain reaction. This is especially so if a three-dimensional multicellular model prepared from liver tissues originating from biologically diverse human individuals is used. The stability of mRNA and miRNA RGs was studied in thirty-five human liver tissue samples and twelve precision-cut human liver slices (PCLS) treated for 24 h with dimethyl sulfoxide (controls) and PCLS treated with β-naphthoflavone (10 µM) or rifampicin (10 µM) as cytochrome P450 (CYP) inducers. Validation of RGs was performed by an expression analysis of CYP3A4 and CYP1A2 on rifampicin and β-naphthoflavone induction, respectively. Regarding mRNA, the best combination of RGs for the controls was YWHAZ and B2M, while YWHAZ and ACTB were selected for the liver samples and treated PCLS. Stability of all candidate miRNA RGs was comparable to or better than that of the generally used short non-coding RNA U6. The best combination for the control PCLS was miR-16-5p and miR-152-3p, in contrast to the miR-16-5p and miR-23b-3p selected for the treated PCLS. Our results showed that the candidate RGs were rather stable, especially for miRNA in human PCLS.
Affiliation(s)
- Tomáš Zárybnický: Department of Biochemical Sciences, Charles University, Faculty of Pharmacy in Hradec Králové, 500 05 Hradec Králové, Czech Republic
- Petra Matoušková: Department of Biochemical Sciences, Charles University, Faculty of Pharmacy in Hradec Králové, 500 05 Hradec Králové, Czech Republic
- Martin Ambrož: Department of Biochemical Sciences, Charles University, Faculty of Pharmacy in Hradec Králové, 500 05 Hradec Králové, Czech Republic
- Zdeněk Šubrt: Department of General Surgery, Third Faculty of Medicine and University Hospital Královské Vinohrady, Charles University, 100 34 Prague, Czech Republic; Department of Surgery, University Hospital Hradec Králové, 500 05 Hradec Králové, Czech Republic
- Lenka Skálová: Department of Biochemical Sciences, Charles University, Faculty of Pharmacy in Hradec Králové, 500 05 Hradec Králové, Czech Republic
- Iva Boušová: Department of Biochemical Sciences, Charles University, Faculty of Pharmacy in Hradec Králové, 500 05 Hradec Králové, Czech Republic
|
12
|
Le TD, Hoang T, Li J, Liu L, Liu H, Hu S. A Fast PC Algorithm for High Dimensional Causal Discovery with Multi-Core PCs. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2019; 16:1483-1495. [PMID: 27429444 DOI: 10.1109/tcbb.2016.2591526] [Citation(s) in RCA: 20] [Impact Index Per Article: 3.3] [Indexed: 06/06/2023]
Abstract
Discovering causal relationships from observational data is a crucial problem with applications in many research areas. The PC algorithm is the state-of-the-art constraint-based method for causal discovery. However, the worst-case runtime of the PC algorithm is exponential in the number of nodes (variables), so it is inefficient when applied to high-dimensional data, e.g., gene expression datasets. At the same time, advances in computer hardware over the last decade have made multi-core personal computers widely available. There is significant motivation for designing a parallelized PC algorithm that is suitable for personal computers and does not require end users to have parallel computing knowledge beyond their competency in using the PC algorithm. In this paper, we develop parallel-PC, a fast and memory-efficient PC algorithm using the parallel computing technique. We apply our method to a range of synthetic and real-world high-dimensional datasets. Experimental results on a dataset from the DREAM 5 challenge show that the original PC algorithm could not produce any results after running for more than 24 hours, whereas our parallel-PC algorithm finished within around 12 hours on a 4-core CPU computer and in less than six hours on an 8-core CPU computer. Furthermore, we integrate parallel-PC into a causal inference method for inferring miRNA-mRNA regulatory relationships. The experimental results show that parallel-PC helps improve both the efficiency and accuracy of the causal inference algorithm.
|
13
|
CRISPR/Cas9 genome editing of SLC37A4 gene elucidates the role of molecular markers of endoplasmic reticulum stress and apoptosis in renal involvement in glycogen storage disease type Ib. Gene 2019; 703:17-25. [DOI: 10.1016/j.gene.2019.04.002] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.7] [Received: 12/26/2018] [Revised: 03/29/2019] [Accepted: 04/01/2019] [Indexed: 12/11/2022]
|
14
|
Pham VV, Zhang J, Liu L, Truong B, Xu T, Nguyen TT, Li J, Le TD. Identifying miRNA-mRNA regulatory relationships in breast cancer with invariant causal prediction. BMC Bioinformatics 2019; 20:143. [PMID: 30876399 PMCID: PMC6419852 DOI: 10.1186/s12859-019-2668-x] [Citation(s) in RCA: 19] [Impact Index Per Article: 3.2] [Received: 07/23/2018] [Accepted: 02/05/2019] [Indexed: 12/14/2022] Open
Abstract
BACKGROUND microRNAs (miRNAs) regulate gene expression at the post-transcriptional level and play an important role in various biological processes in the human body. Therefore, identifying their regulation mechanisms is essential for the diagnostics and therapeutics of a wide range of diseases. Many studies have used gene expression profiles to address this problem. However, the current methods have their own limitations. Some of them only identify the correlation between miRNA and mRNA expression levels instead of causal or regulatory relationships, while others infer causality but with high computational complexity. To overcome these issues, in this study we propose a method to identify miRNA-mRNA regulatory relationships in breast cancer using invariant causal prediction. The key idea of invariant causal prediction is that the causal miRNAs of a target mRNA are the ones whose causal relationships with that mRNA persist across different environments. RESULTS In this research, we aim to find miRNA targets which are consistent across different breast cancer subtypes. Thus, we first apply the Pam50 method to categorize BRCA samples into different "environment" groups based on cancer subtype. Then we use the invariant causal prediction method to find miRNA-mRNA regulatory relationships across subtypes. We validate the results with miRNA-transfected experimental data, and the results show that our method outperforms the state-of-the-art methods. In addition, we integrate this new method with the Pearson correlation analysis method and Lasso in an ensemble method to take advantage of all three. We then validate the results of the ensemble method with experimentally confirmed data, and the ensemble method shows the best performance, even compared with the proposed causal method.
CONCLUSIONS This research found miRNA targets which are consistent across different breast cancer subtypes. Further functional enrichment analysis shows that miRNAs involved in the regulatory relationships predicted by the proposed methods tend to synergistically regulate target genes, indicating the usefulness of these methods, and the identified miRNA targets could be used in the design of wet-lab experiments to discover the causes of breast cancer.
Affiliation(s)
- Vu Vh Pham: School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, Australia
- Junpeng Zhang: School of Engineering, Dali University, Dali, Yunnan, China
- Lin Liu: School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, Australia
- Buu Truong: Pham Ngoc Thach University of Medicine, Ho Chi Minh, Vietnam
- Taosheng Xu: Institute of Intelligent Machines, Hefei Institutes of Physical Science, Chinese Academy of Sciences, Hefei, China
- Trung T Nguyen: School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, Australia
- Jiuyong Li: School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, Australia
- Thuc D Le: School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, Australia
|
15
|
Causal discovery from sequential data in ALS disease based on entropy criteria. J Biomed Inform 2019; 89:41-55. [DOI: 10.1016/j.jbi.2018.10.004] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.2] [Received: 05/30/2018] [Revised: 10/14/2018] [Accepted: 10/15/2018] [Indexed: 11/21/2022]
|
16
|
Wang Q, Liu J, Chen Z, Li F, Yu H. A causation-based method developed for an integrated risk assessment of heavy metals in soil. The Science of the Total Environment 2018; 642:1396-1405. [PMID: 30045520 DOI: 10.1016/j.scitotenv.2018.06.118] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Received: 03/26/2018] [Revised: 06/08/2018] [Accepted: 06/10/2018] [Indexed: 06/08/2023]
Abstract
A comprehensive and fact-based risk assessment of heavy metals in soils is paramount for defining strategies for environmental management. However, the risk assessment approaches of heavy metals in soils are often incomplete, in particular, causation-based pollution source apportionment is absent at present. Here, we developed a causation-based method framework of an integrated risk assessment of soil heavy metals. This method framework involves risk identification, causation-based source apportionment and an environmental sensitivity assessment. Dongtang Township in Guangdong Province, China was used as a case study. We found that air Cd, the background value and metallurgical industries (Danxia and Fankou plants) were identified as the major causes of soil Cd, and air and soil Cd as well as water Cd interacted causally. Danxia and Fankou plants, the mining area and background value were the major causes of soil Pb. The risk level and environmental sensitivity of the Danxia and Fankou plants were assessed. This is the first study to establish a causation-based method framework of an integrated risk assessment of soil heavy metals. This framework promotes systematic integration of risk assessment of soil heavy metals and expands traditional research on pollution source apportionment from a correlation-based approach to crucial insights into causation.
Affiliation(s)
- Qi Wang: Guangdong Key Laboratory of Integrated Agro-environmental Pollution Control and Management, Guangdong Institute of Eco-environmental Science & Technology, Guangzhou 510650, China
- Jianfeng Liu: Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
- Zhao Chen: Guangdong Key Laboratory of Integrated Agro-environmental Pollution Control and Management, Guangdong Institute of Eco-environmental Science & Technology, Guangzhou 510650, China
- Fangbai Li: Guangdong Key Laboratory of Integrated Agro-environmental Pollution Control and Management, Guangdong Institute of Eco-environmental Science & Technology, Guangzhou 510650, China
- Huanyun Yu: Guangdong Key Laboratory of Integrated Agro-environmental Pollution Control and Management, Guangdong Institute of Eco-environmental Science & Technology, Guangzhou 510650, China
|
17
|
Luo J, Huang W, Cao B. A novel approach to identify the miRNA-mRNA causal regulatory modules in Cancer. IEEE/ACM Transactions on Computational Biology and Bioinformatics 2018; 15:309-315. [PMID: 28113985 DOI: 10.1109/tcbb.2016.2612199] [Citation(s) in RCA: 8] [Impact Index Per Article: 1.1] [Indexed: 06/06/2023]
Abstract
MicroRNAs (miRNAs) play an essential role in many biological processes by regulating their target genes, especially in the initiation and development of cancers. Therefore, the identification of miRNA-mRNA regulatory modules is important for understanding the regulatory mechanisms. Most computational methods use only statistical correlations in predicting miRNA-mRNA modules and neglect the fact that there are causal relationships between miRNAs and their target genes. In this paper, we propose a novel approach called CALM (the causal regulatory modules) to identify miRNA-mRNA regulatory modules by integrating the causal interactions and statistical correlations between miRNAs and their target genes. Our algorithm largely consists of three steps: it first forms the causal regulatory relationships of miRNAs and genes from gene expression profiles, then detects the miRNA clusters according to the GO function information of their target genes, and finally expands each miRNA cluster by greedily adding (or discarding) target genes to maximize the modularity score. To show the performance of our method, we apply CALM to four datasets (EMT, breast, ovarian and thyroid cancer) and validate our results. The experimental results show that our method not only outperforms the compared method but also achieves ideal overall performance in terms of functional enrichment.
|
18
|
Liu Y, Du Q, Wang Q, Yu H, Liu J, Tian Y, Chang C, Lei J. Causal inference between bioavailability of heavy metals and environmental factors in a large-scale region. Environmental Pollution (Barking, Essex: 1987) 2017; 226:370-378. [PMID: 28457732 DOI: 10.1016/j.envpol.2017.03.019] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 11/30/2016] [Revised: 02/16/2017] [Accepted: 03/08/2017] [Indexed: 06/07/2023]
Abstract
At present, causal relationships between the bioavailability of heavy metals and environmental factors are generally obtained from field experiments at local scales and lack sufficient evidence at large scales. However, inferring such causation across large-scale regions is challenging, because the conventional correlation-based approaches used for causation assessments across large-scale regions can yield spurious insights at the expense of actual causation. In this study, a general approach framework, Intervention calculus when the directed acyclic graph (DAG) is absent (IDA) combined with the backdoor criterion (BC), was introduced to identify causation between the bioavailability of heavy metals and the potential environmental factors across large-scale regions. We take the Pearl River Delta (PRD) in China as a case study. The causal structures and effects were identified based on the concentrations of heavy metals (Zn, As, Cu, Hg, Pb, Cr, Ni and Cd) in soil (0-20 cm depth) and vegetable (lettuce) and 40 environmental factors (soil properties, extractable heavy metals and weathering indices) in 94 samples across the PRD. Results show that the bioavailability of heavy metals (Cd, Zn, Cr, Ni and As) was causally influenced by soil properties and soil weathering factors, whereas no causal factor impacted the bioavailability of Cu, Hg and Pb. No latent factor was found between the bioavailability of heavy metals and environmental factors. The causation between the bioavailability of heavy metals and environmental factors in field experiments is consistent with that on a large scale. The IDA combined with the BC provides a powerful tool to identify causation between the bioavailability of heavy metals and environmental factors across large-scale regions. Causal inference in a large, dynamically changing system has great implications for system-based risk management.
Affiliation(s)
- Yuqiong Liu: School of Resource and Environmental Science, Wuhan University, 129 Luoyu Road, Wuhan 430079, China; Guangdong Key Laboratory of Integrated Agro-environmental Pollution Control and Management, Guangdong Institute of Eco-environmental Science & Technology, Guangzhou 510650, China; Hunan Hydro&Power Design Institute, Changsha, 410007, China
- Qingyun Du: School of Resource and Environmental Science, Wuhan University, 129 Luoyu Road, Wuhan 430079, China; Key Laboratory of Geographic Information System, Ministry of Education, Wuhan University, 129 Luoyu Road, Wuhan 430079, China
- Qi Wang: Guangdong Key Laboratory of Integrated Agro-environmental Pollution Control and Management, Guangdong Institute of Eco-environmental Science & Technology, Guangzhou 510650, China
- Huanyun Yu: Guangdong Key Laboratory of Integrated Agro-environmental Pollution Control and Management, Guangdong Institute of Eco-environmental Science & Technology, Guangzhou 510650, China
- Jianfeng Liu: Research Institute of Forestry, Chinese Academy of Forestry, Beijing 100091, China
- Yu Tian: Guangdong Key Laboratory of Integrated Agro-environmental Pollution Control and Management, Guangdong Institute of Eco-environmental Science & Technology, Guangzhou 510650, China
- Chunying Chang: Guangdong Provincial Academy of Environmental Science, Guangzhou 510045, China
- Jing Lei: College of Agriculture, Guangxi University, Nanning 530005, China
|
19
|
Abstract
Background MicroRNAs (miRNAs) play important regulatory roles in the wide range of biological processes by inducing target mRNA degradation or translational repression. Based on the correlation between expression profiles of a miRNA and its target mRNA, various computational methods have previously been proposed to identify miRNA-mRNA association networks by incorporating the matched miRNA and mRNA expression profiles. However, there remain three major issues to be resolved in the conventional computation approaches for inferring miRNA-mRNA association networks from expression profiles. 1) Inferred correlations from the observed expression profiles using conventional correlation-based methods include numerous erroneous links or over-estimated edge weight due to the transitive information flow among direct associations. 2) Due to the high-dimension-low-sample-size problem on the microarray dataset, it is difficult to obtain an accurate and reliable estimate of the empirical correlations between all pairs of expression profiles. 3) Because the previously proposed computational methods usually suffer from varying performance across different datasets, a more reliable model that guarantees optimal or suboptimal performance across different datasets is highly needed. Results In this paper, we present DMirNet, a new framework for identifying direct miRNA-mRNA association networks. To tackle the aforementioned issues, DMirNet incorporates 1) three direct correlation estimation methods (namely Corpcor, SPACE, Network deconvolution) to infer direct miRNA-mRNA association networks, 2) the bootstrapping method to fully utilize insufficient training expression profiles, and 3) a rank-based Ensemble aggregation to build a reliable and robust model across different datasets. Our empirical experiments on three datasets demonstrate the combinatorial effects of necessary components in DMirNet. 
Additional performance comparison experiments show that DMirNet outperforms the state-of-the-art Ensemble-based model [1] which has shown the best performance across the same three datasets, with a factor of up to 1.29. Further, we identify 43 putative novel multi-cancer-related miRNA-mRNA association relationships from an inferred Top 1000 direct miRNA-mRNA association network. Conclusions We believe that DMirNet is a promising method to identify novel direct miRNA-mRNA relations and to elucidate the direct miRNA-mRNA association networks. Since DMirNet infers direct relationships from the observed data, DMirNet can contribute to reconstructing various direct regulatory pathways, including, but not limited to, the direct miRNA-mRNA association networks. Electronic supplementary material The online version of this article (doi:10.1186/s12918-016-0373-1) contains supplementary material, which is available to authorized users.
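The first issue raised in this abstract, that plain correlation picks up indirect links through transitive information flow, can be illustrated with a toy example. The sketch below is not DMirNet and does not use Corpcor, SPACE or network deconvolution; it only applies the standard first-order partial correlation formula to a hypothetical chain in which a miRNA represses gene A and gene A in turn drives gene B. The variable names and the numbers are invented for illustration.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """Correlation of a and b after removing the linear effect of c."""
    r_ab, r_ac, r_bc = pearson(a, b), pearson(a, c), pearson(b, c)
    return (r_ab - r_ac * r_bc) / sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Hypothetical expression profiles over 8 samples (toy data, not real).
mirna = [1, -1, 1, -1, 1, -1, 1, -1]
noise_a = [1, 1, -1, -1, 1, 1, -1, -1]   # uncorrelated with mirna
noise_b = [1, 1, 1, 1, -1, -1, -1, -1]   # uncorrelated with both above

# Chain: miRNA represses gene A; gene B follows gene A only.
gene_a = [-m + e for m, e in zip(mirna, noise_a)]
gene_b = [g + e for g, e in zip(gene_a, noise_b)]

marginal = pearson(mirna, gene_b)             # indirect link looks real
direct = partial_corr(mirna, gene_b, gene_a)  # vanishes given gene A

print(f"marginal correlation miRNA-geneB: {marginal:.3f}")
print(f"partial correlation given geneA:  {direct:.3f}")
```

In this toy chain the marginal correlation between the miRNA and gene B is clearly negative, while the partial correlation given gene A is essentially zero, which is the sense in which a direct-association method would drop the miRNA-gene B edge.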
Affiliation(s)
- Minsu Lee: Department of Computer Science and Engineering, Ewha Womans University, Seoul, South Korea
- HyungJune Lee: Department of Computer Science and Engineering, Ewha Womans University, Seoul, South Korea
|
20
|
Walsh CJ, Hu P, Batt J, Dos Santos CC. Discovering MicroRNA-Regulatory Modules in Multi-Dimensional Cancer Genomic Data: A Survey of Computational Methods. Cancer Inform 2016; 15:25-42. [PMID: 27721651 PMCID: PMC5051584 DOI: 10.4137/cin.s39369] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.6] [Received: 05/18/2016] [Revised: 08/14/2016] [Accepted: 08/16/2016] [Indexed: 12/20/2022] Open
Abstract
MicroRNAs (miRs) are small single-stranded noncoding RNA that function in RNA silencing and post-transcriptional regulation of gene expression. An increasing number of studies have shown that miRs play an important role in tumorigenesis, and understanding the regulatory mechanism of miRs in this gene regulatory network will help elucidate the complex biological processes at play during malignancy. Despite advances, determination of miR–target interactions (MTIs) and identification of functional modules composed of miRs and their specific targets remain a challenge. A large amount of data generated by high-throughput methods from various sources are available to investigate MTIs. The development of data-driven tools to harness these multi-dimensional data has resulted in significant progress over the past decade. In parallel, large-scale cancer genomic projects are allowing new insights into the commonalities and disparities of miR–target regulation across cancers. In the first half of this review, we explore methods for identification of pairwise MTIs, and in the second half, we explore computational tools for discovery of miR-regulatory modules in a cancer-specific and pan-cancer context. We highlight strengths and limitations of each of these tools as a practical guide for the computational biologists.
Affiliation(s)
- Christopher J Walsh: Keenan and Li Ka Shing Knowledge Institute of Saint Michael's Hospital, Toronto, ON, Canada; Institute of Medical Sciences and Department of Medicine, University of Toronto, Toronto, ON, Canada
- Pingzhao Hu: Department of Biochemistry and Medical Genetics, University of Manitoba, Winnipeg, MB, Canada
- Jane Batt: Keenan and Li Ka Shing Knowledge Institute of Saint Michael's Hospital, Toronto, ON, Canada; Institute of Medical Sciences and Department of Medicine, University of Toronto, Toronto, ON, Canada
- Claudia C Dos Santos: Keenan and Li Ka Shing Knowledge Institute of Saint Michael's Hospital, Toronto, ON, Canada; Institute of Medical Sciences and Department of Medicine, University of Toronto, Toronto, ON, Canada
|
21
|
Le TD, Zhang J, Liu L, Liu H, Li J. miRLAB: An R Based Dry Lab for Exploring miRNA-mRNA Regulatory Relationships. PLoS One 2015; 10:e0145386. [PMID: 26716983 PMCID: PMC4696828 DOI: 10.1371/journal.pone.0145386] [Citation(s) in RCA: 25] [Impact Index Per Article: 2.5] [Received: 08/11/2015] [Accepted: 12/03/2015] [Indexed: 11/19/2022] Open
Abstract
microRNAs (miRNAs) are important gene regulators at post-transcriptional level, and inferring miRNA-mRNA regulatory relationships is a crucial problem. Consequently, several computational methods of predicting miRNA targets have been proposed using expression data with or without sequence based miRNA target information. A typical procedure for applying and evaluating such a method is i) collecting matched miRNA and mRNA expression profiles in a specific condition, e.g. a cancer dataset from The Cancer Genome Atlas (TCGA), ii) applying the new computational method to the selected dataset, iii) validating the predictions against knowledge from literature and third-party databases, and comparing the performance of the method with some existing methods. This procedure is time consuming given the time elapsed when collecting and processing data, repeating the work from existing methods, searching for knowledge from literature and third-party databases to validate the results, and comparing the results from different methods. The time consuming procedure prevents researchers from quickly testing new computational models, analysing new datasets, and selecting suitable methods for assisting with the experiment design. Here, we present an R package, miRLAB, for automating the procedure of inferring and validating miRNA-mRNA regulatory relationships. The package provides a complete set of pipelines for testing new methods and analysing new datasets. miRLAB includes a pipeline to obtain matched miRNA and mRNA expression datasets directly from TCGA, 12 benchmark computational methods for inferring miRNA-mRNA regulatory relationships, the functions for validating the predictions using experimentally validated miRNA target data and miRNA perturbation data, and the tools for comparing the results from different computational methods.
Affiliation(s)
- Thuc Duy Le*: School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, South Australia, Australia
- Junpeng Zhang: Faculty of Engineering, Dali University, Dali, China
- Lin Liu: School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, South Australia, Australia
- Huawen Liu: Department of Computer Science, Zhejiang Normal University, China
- Jiuyong Li*: School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, South Australia, Australia
* Corresponding authors (TDL, JL)
|
22
|
Le TD, Zhang J, Liu L, Li J. Ensemble Methods for MiRNA Target Prediction from Expression Data. PLoS One 2015; 10:e0131627. [PMID: 26114448 PMCID: PMC4482624 DOI: 10.1371/journal.pone.0131627] [Citation(s) in RCA: 27] [Impact Index Per Article: 2.7] [Received: 01/21/2015] [Accepted: 06/04/2015] [Indexed: 01/23/2023] Open
Abstract
Background microRNAs (miRNAs) are short regulatory RNAs that are involved in several diseases, including cancers. Identifying miRNA functions is very important in understanding disease mechanisms and determining the efficacy of drugs. An increasing number of computational methods have been developed to explore miRNA functions by inferring the miRNA-mRNA regulatory relationships from data. Each of the methods is developed based on some assumptions and constraints, for instance, assuming linear relationships between variables. For such reasons, computational methods are often subject to the problem of inconsistent performance across different datasets. On the other hand, ensemble methods integrate the results from individual methods and have been proved to outperform each of their individual component methods in theory. Results In this paper, we investigate the performance of some ensemble methods over the commonly used miRNA target prediction methods. We apply eight different popular miRNA target prediction methods to three cancer datasets, and compare their performance with the ensemble methods which integrate the results from each combination of the individual methods. The validation results using experimentally confirmed databases show that the results of the ensemble methods complement those obtained by the individual methods and the ensemble methods perform better than the individual methods across different datasets. The ensemble method, Pearson+IDA+Lasso, which combines methods in different approaches, including a correlation method, a causal inference method, and a regression method, is the best performed ensemble method in this study. Further analysis of the results of this ensemble method shows that the ensemble method can obtain more targets which could not be found by any of the single methods, and the discovered targets are more statistically significant and functionally enriched. 
The source codes, datasets, miRNA target predictions by all methods, and the ground truth for validation are available in the Supplementary materials.
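The abstract above does not say how the individual predictions are combined, so the following is only a minimal sketch of one common option: Borda-style rank averaging, in which each method ranks the candidate miRNA-mRNA pairs and the ensemble orders pairs by mean rank. The function name `rank_average_ensemble`, the use of absolute scores, and all scores shown are assumptions made for illustration; they are not taken from the paper or its supplementary code.

```python
def rank_average_ensemble(method_scores):
    """Combine per-method scores by averaging ranks (Borda-style).

    method_scores maps a method name to a dict of {pair: score}.
    Within each method, pairs are ranked by absolute score, strongest
    first (rank 1). Returns (pair, mean_rank) tuples, best first.
    """
    # Only pairs scored by every method can be compared fairly.
    pairs = sorted(set.intersection(*(set(s) for s in method_scores.values())))
    total_rank = {pair: 0.0 for pair in pairs}
    for scores in method_scores.values():
        ordered = sorted(pairs, key=lambda p: abs(scores[p]), reverse=True)
        for rank, pair in enumerate(ordered, start=1):
            total_rank[pair] += rank
    n_methods = len(method_scores)
    mean_ranks = [(pair, total_rank[pair] / n_methods) for pair in pairs]
    return sorted(mean_ranks, key=lambda item: item[1])


# Hypothetical scores for one miRNA against three genes (invented values).
scores = {
    "pearson": {("miR-a", "G1"): -0.8, ("miR-a", "G2"): -0.3, ("miR-a", "G3"): 0.5},
    "lasso":   {("miR-a", "G1"): -0.2, ("miR-a", "G2"): -0.6, ("miR-a", "G3"): 0.1},
    "ida":     {("miR-a", "G1"): -0.9, ("miR-a", "G2"): -0.5, ("miR-a", "G3"): -0.4},
}

ranking = rank_average_ensemble(scores)
for pair, mean_rank in ranking:
    print(pair, round(mean_rank, 2))
```

Averaging ranks rather than raw scores sidesteps the fact that a correlation coefficient, a regression coefficient and a causal effect estimate live on different scales. In this toy input G1 comes out on top even though one method ranks it second, which shows how the ensemble can smooth over a single method's disagreement.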
Affiliation(s)
- Thuc Duy Le*: School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, South Australia, Australia
- Lin Liu: School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, South Australia, Australia
- Jiuyong Li*: School of Information Technology and Mathematical Sciences, University of South Australia, Adelaide, South Australia, Australia
* Corresponding authors (TDL, JL)
|