Reference Citation Analysis: Find an Article, Find a Category, Find a Journal, Find a Scholar

For: Johnson NT, Dhroso A, Hughes KJ, Korkin D. Biological classification with RNA-seq data: Can alternatively spliced transcript expression enhance machine learning classifiers? RNA 2018;24:1119-1132. [PMID: 29941426 PMCID: PMC6097660 DOI: 10.1261/rna.062802.117] [Citation(s) in RCA: 24] [Impact Index Per Article: 3.4] [Received: 06/30/2017] [Accepted: 06/03/2018] [Indexed: 05/09/2023]

Cited by Other Article(s)

Kucukakcali Z, Akbulut S, Colak C. Prediction of genomic biomarkers for endometriosis using the transcriptomic dataset. World J Clin Cases 2025;13:104556. [DOI: 10.12998/wjcc.v13.i20.104556] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/23/2024] [Revised: 03/03/2025] [Accepted: 03/13/2025] [Indexed: 04/09/2025] Open

Abstract

BACKGROUND

Endometriosis is a clinical condition characterized by the presence of endometrial glands outside the uterine cavity. While its incidence remains mostly uncertain, endometriosis impacts around 180 million women worldwide. Despite the presentation of several epidemiological and clinical explanations, the precise mechanism underlying the disease remains ambiguous. In recent years, researchers have examined the hereditary dimension of the disease. Genetic research has aimed to discover the gene or genes responsible for the disease through association or linkage studies involving candidate genes or DNA mapping techniques.

AIM

To identify genetic biomarkers linked to endometriosis by the application of machine learning (ML) approaches.

METHODS

This case-control study used an open-access transcriptomic dataset of endometriosis and control samples. We included data from 22 controls and 16 endometriosis patients. We used AdaBoost, XGBoost, Stochastic Gradient Boosting, and Bagged Classification and Regression Trees (CART) for classification, with five-fold cross-validation. We evaluated model performance using accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value and F1 score.
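The five-fold split described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say whether the folds were stratified, so stratification, the helper name `stratified_k_fold`, and the random seed are all assumptions. Only the class sizes (22 controls, 16 patients) come from the abstract.

```python
import random

def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal this class's samples round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds

# 22 controls (label 0) and 16 endometriosis samples (label 1).
labels = [0] * 22 + [1] * 16
folds = stratified_k_fold(labels, k=5)

for n, test_idx in enumerate(folds):
    held_out = set(test_idx)
    train_idx = [i for i in range(len(labels)) if i not in held_out]
    n_cases = sum(labels[i] for i in test_idx)
    print(f"fold {n}: train={len(train_idx)} test={len(test_idx)} cases_in_test={n_cases}")
```

With only 38 samples, each held-out fold contains 7 to 9 samples, so a single misclassification moves fold-level accuracy by more than 10 percentage points.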

RESULTS

Bagged CART gave the best classification metrics. The metrics obtained from this model are 85.7%, 85.7%, 100%, 75%, 75%, 100% and 85.7% for accuracy, balanced accuracy, sensitivity, specificity, positive predictive value, negative predictive value and F1 score, respectively. Based on variable importance in the model, the genes CUX2, CLMP, CEP131, EHD4, CDH24, ILRUN, LINC01709, HOTAIR, SLC30A2 and NKG7, along with other transcripts lacking available gene names, are potential biomarkers for endometriosis.
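To show how these seven measures relate to one another, the sketch below computes them from a confusion matrix using the standard definitions. The counts (TP=3, FN=0, TN=3, FP=1) are hypothetical, chosen only because they reproduce the reported sensitivity, specificity, predictive values, accuracy and F1; the abstract does not report a confusion matrix, and the function name `binary_metrics` is illustrative.

```python
def binary_metrics(tp, fn, tn, fp):
    """Standard binary classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    npv = tn / (tn + fn)           # negative predictive value
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }

# Hypothetical counts, not taken from the paper.
m = binary_metrics(tp=3, fn=0, tn=3, fp=1)
for name, value in m.items():
    print(f"{name}: {value:.1%}")
```

Under the usual definition (the mean of sensitivity and specificity), a sensitivity of 100% and a specificity of 75% give a balanced accuracy of 87.5%, whereas the abstract reports 85.7%. The authors may have used a different definition or aggregated the metric differently.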

CONCLUSION

This study determined possible genomic biomarkers for endometriosis using transcriptomic data from patients with/without endometriosis. The applied ML model successfully classified endometriosis and created a highly accurate diagnostic prediction model. Future genomic studies could explain the underlying pathology of endometriosis, and a non-invasive diagnostic method could replace the invasive ones.

Sanches PHG, de Melo NC, Porcari AM, de Carvalho LM. Integrating Molecular Perspectives: Strategies for Comprehensive Multi-Omics Integrative Data Analysis and Machine Learning Applications in Transcriptomics, Proteomics, and Metabolomics. Biology (Basel) 2024;13:848. [PMID: 39596803 PMCID: PMC11592251 DOI: 10.3390/biology13110848] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/30/2024] [Revised: 07/19/2024] [Accepted: 07/25/2024] [Indexed: 11/29/2024]

Ambeskovic A, McCall MN, Woodsmith J, Juhl H, Land H. Exon-Skipping-Based Subtyping of Colorectal Cancers. Gastroenterology 2024:S0016-5085(24)05357-5. [PMID: 39181169 DOI: 10.1053/j.gastro.2024.08.016] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/21/2023] [Revised: 07/24/2024] [Accepted: 08/14/2024] [Indexed: 08/27/2024]

Abstract

BACKGROUND & AIMS

The identification of colorectal cancer (CRC) molecular subtypes has prognostic and potentially diagnostic value for patients, yet reliable subtyping remains unavailable in the clinic. The current consensus molecular subtype (CMS) classification in CRCs is based on complex RNA expression patterns quantified at the gene level. The clinical application of these methods, however, is challenging due to high uncertainty of single-sample classification and associated costs. Alternative splicing (AS), which strongly contributes to transcriptome diversity, has rarely been used for tissue type classification. Here, we present an AS-based CRC subtyping framework sensitive to differential exon use that can be adapted for clinical application.

METHODS

Unsupervised clustering was used to measure the strength of association between different categories of alternative splicing and CMSs. To build a classifier, the ground truth for CMS labels was derived from expression data quantified at the gene level. Feature selection was achieved through bootstrapping and L1-penalized estimation. The resulting feature space was used to construct a subtype prediction framework applicable to single and multiple samples. The performance of the models was evaluated on unseen CRCs from 2 independent sources (Indivumed, n = 129; The Cancer Genome Atlas, n = 99).

RESULTS

We developed a CRC subtype identifier based on 29 exon-skipping events that accurately classifies unseen tumors and enables more precise differentiation of subtypes characterized by distinct biological and prognostic features as compared to classifiers based on gene expression.

CONCLUSIONS

Here, we demonstrate that a small number of exon-skipping events can reliably classify CRC subtypes using individual patient specimens in a manner suitable to clinical application.

Feng S, Wang Z, Jin Y, Xu S. TabDEG: Classifying differentially expressed genes from RNA-seq data based on feature extraction and deep learning framework. PLoS One 2024;19:e0305857. [PMID: 39037985 PMCID: PMC11262683 DOI: 10.1371/journal.pone.0305857] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/21/2023] [Accepted: 06/05/2024] [Indexed: 07/24/2024] Open

Ilangovan H, Kothiyal P, Hoadley KA, Elgart R, Eley G, Eslami P. Harmonizing heterogeneous transcriptomics datasets for machine learning-based analysis to identify spaceflown murine liver-specific changes. NPJ Microgravity 2024;10:61. [PMID: 38862523 PMCID: PMC11167036 DOI: 10.1038/s41526-024-00379-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 04/28/2023] [Accepted: 03/08/2024] [Indexed: 06/13/2024] Open

Vural-Ozdeniz M, Calisir K, Acar R, Yavuz A, Ozgur MM, Dalgıc E, Konu O. CAP-RNAseq: an integrated pipeline for functional annotation and prioritization of co-expression clusters. Brief Bioinform 2024;25:bbad536. [PMID: 38279653 PMCID: PMC10818169 DOI: 10.1093/bib/bbad536] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/02/2023] [Revised: 12/04/2023] [Accepted: 12/21/2024] [Indexed: 01/28/2024] Open

Cascianelli S, Galzerano A, Masseroli M. Supervised Relevance-Redundancy assessments for feature selection in omics-based classification scenarios. J Biomed Inform 2023;144:104457. [PMID: 37488024 DOI: 10.1016/j.jbi.2023.104457] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 02/28/2023] [Revised: 06/05/2023] [Accepted: 07/19/2023] [Indexed: 07/26/2023]

Abstract

BACKGROUND AND OBJECTIVE

Many classification tasks in translational bioinformatics and genomics are characterized by the high dimensionality of potential features and unbalanced sample distribution among classes. This can affect classifier robustness and increase the risk of overfitting, curse of dimensionality and generalization leaks; furthermore and most importantly, this can prevent obtaining adequate patient stratification required for precision medicine in facing complex diseases, like cancer. Setting up a feature selection strategy able to extract only proper predictive features by removing irrelevant, redundant, and noisy ones is crucial to achieving valuable results on the desired task.

METHODS

We propose a new feature selection approach, called ReRa, based on supervised Relevance-Redundancy assessments. ReRa consists of a customized step of relevance-based filtering, to identify a reduced subset of meaningful features, followed by a supervised similarity-based procedure to minimize redundancy. This latter step innovatively uses a combination of global and class-specific similarity assessments to remove redundant features while preserving those differentiated across classes, even when these classes are strongly unbalanced.
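The general two-stage idea (filter by relevance, then prune redundancy) can be sketched as below. This is a generic illustration, not the ReRa algorithm: the relevance score, the use of a single global Pearson correlation for redundancy, the thresholds, and all names (`relevance`, `select_features`, the toy genes) are assumptions. ReRa itself is described as combining global and class-specific similarity assessments, which this sketch does not attempt.

```python
from statistics import mean, pstdev

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy) if sx and sy else 0.0

def relevance(values, labels):
    """Illustrative score: absolute difference of class means over overall spread."""
    a = [v for v, y in zip(values, labels) if y == 0]
    b = [v for v, y in zip(values, labels) if y == 1]
    spread = pstdev(values)
    return abs(mean(a) - mean(b)) / spread if spread else 0.0

def select_features(features, labels, min_relevance=1.0, max_corr=0.9):
    scores = {name: relevance(vals, labels) for name, vals in features.items()}
    # Stage 1: keep only features that pass the relevance filter, best first.
    ranked = sorted((n for n in scores if scores[n] >= min_relevance),
                    key=lambda n: (-scores[n], n))
    # Stage 2: greedily drop features highly correlated with one already kept.
    kept = []
    for name in ranked:
        if all(abs(pearson(features[name], features[k])) < max_corr for k in kept):
            kept.append(name)
    return kept

labels = [0, 0, 0, 1, 1, 1]
features = {
    "gene_A": [1.0, 1.2, 0.9, 3.0, 3.1, 2.9],
    "gene_A_noisy": [1.0, 1.6, 0.7, 2.6, 3.3, 2.7],  # near-duplicate of gene_A
    "gene_B": [5.0, 1.0, 3.0, 1.2, 5.1, 2.9],        # unrelated to the labels
    "gene_C": [0.2, 0.1, 0.3, 0.2, 1.5, 1.6],        # relevant, distinct from gene_A
}
selected = select_features(features, labels)
print(selected)
```

On this toy data, the irrelevant feature is removed in the first stage and the near-duplicate in the second, leaving one representative of each distinct signal.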

RESULTS

We compared ReRa with several existing feature selection methods to obtain feature spaces on which to perform breast cancer patient subtyping using several classifiers: we considered two use cases based on gene or transcript isoform expression. In the vast majority of the assessed scenarios, when using ReRa-selected feature spaces, the performances were significantly increased compared to simple feature filtering, LASSO regularization, or even MRmr, another Relevance-Redundancy method. The two use cases represent an insightful example of translational application, taking advantage of ReRa capabilities to investigate and enhance a clinically relevant patient stratification task, which could also be easily applied to other cancer types and diseases.

CONCLUSIONS

The ReRa approach has the potential to improve the performance of machine learning models used in an unbalanced classification scenario. Compared to another Relevance-Redundancy approach like MRmr, ReRa does not require tuning the number of preserved features, ensures efficiency and scalability over huge initial dimensionalities and allows re-evaluation of all previously selected features at each iteration of the redundancy assessment, to ultimately preserve only the most relevant and class-differentiated features.

Shakola F, Palejev D, Ivanov I. A Framework for Comparison and Assessment of Synthetic RNA-Seq Data. Genes (Basel) 2022;13:2362. [PMID: 36553629 PMCID: PMC9778097 DOI: 10.3390/genes13122362] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 11/09/2022] [Revised: 12/05/2022] [Accepted: 12/06/2022] [Indexed: 12/16/2022] Open

Abdelwahab O, Awad N, Elserafy M, Badr E. A feature selection-based framework to identify biomarkers for cancer diagnosis: A focus on lung adenocarcinoma. PLoS One 2022;17:e0269126. [PMID: 36067196 PMCID: PMC9447897 DOI: 10.1371/journal.pone.0269126] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 11/14/2021] [Accepted: 05/15/2022] [Indexed: 12/23/2022] Open

Wang S, Li M, Ng SB. Research on Infant Health Diagnosis and Intelligence Development Based on Machine Learning and Health Information Statistics. Front Public Health 2022;10:846598. [PMID: 35719653 PMCID: PMC9201248 DOI: 10.3389/fpubh.2022.846598] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 12/31/2021] [Accepted: 02/22/2022] [Indexed: 11/18/2022] Open

Pramana S, Hardiyanta IKY, Hidayat FY, Mariyah S. A comparative assessment on gene expression classification methods of RNA-seq data generated using next-generation sequencing (NGS). Narra J 2022;2:e60. [PMID: 38450388 PMCID: PMC10914053 DOI: 10.52225/narra.v2i1.60] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 11/13/2021] [Accepted: 03/22/2022] [Indexed: 03/08/2024]

Kim J, Yoon Y, Park HJ, Kim YH. Comparative Study of Classification Algorithms for Various DNA Microarray Data. Genes (Basel) 2022;13:494. [PMID: 35328048 PMCID: PMC8951024 DOI: 10.3390/genes13030494] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 02/15/2022] [Accepted: 03/07/2022] [Indexed: 12/19/2022] Open

Using machine learning to detect the differential usage of novel gene isoforms. BMC Bioinformatics 2022;23:45. [PMID: 35042461 PMCID: PMC8764765 DOI: 10.1186/s12859-022-04576-3] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/11/2021] [Accepted: 01/10/2022] [Indexed: 11/24/2022] Open

Kakati T, Bhattacharyya DK, Kalita JK, Norden-Krichmar TM. DEGnext: classification of differentially expressed genes from RNA-seq data using a convolutional neural network with transfer learning. BMC Bioinformatics 2022;23:17. [PMID: 34991439 PMCID: PMC8734099 DOI: 10.1186/s12859-021-04527-4] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.3] [Received: 12/15/2020] [Accepted: 12/13/2021] [Indexed: 12/11/2022] Open

Abstract

BACKGROUND

A limitation of traditional differential expression analysis on small datasets involves the possibility of false positives and false negatives due to sample variation. Considering the recent advances in deep learning (DL) based models, we wanted to expand the state-of-the-art in disease biomarker prediction from RNA-seq data using DL. However, application of DL to RNA-seq data is challenging due to absence of appropriate labels and smaller sample size as compared to number of genes. Deep learning coupled with transfer learning can improve prediction performance on novel data by incorporating patterns learned from other related data. With the emergence of new disease datasets, biomarker prediction would be facilitated by having a generalized model that can transfer the knowledge of trained feature maps to the new dataset. To the best of our knowledge, there is no Convolutional Neural Network (CNN)-based model coupled with transfer learning to predict the significant upregulating (UR) and downregulating (DR) genes from both trained and untrained datasets.

RESULTS

We implemented a CNN model, DEGnext, to predict UR and DR genes from gene expression data obtained from The Cancer Genome Atlas database. DEGnext uses biologically validated data along with logarithmic fold change values to classify differentially expressed genes (DEGs) as UR and DR genes. We applied transfer learning to our model to leverage the knowledge of trained feature maps to untrained cancer datasets. DEGnext's results were competitive (ROC scores between 88 and 99%) with those of five traditional machine learning methods: Decision Tree, K-Nearest Neighbors, Random Forest, Support Vector Machine, and XGBoost. DEGnext was robust and effective in terms of transferring learned feature maps to facilitate classification of unseen datasets. Additionally, we validated that the predicted DEGs from DEGnext were mapped to significant Gene Ontology terms and pathways related to cancer.

CONCLUSIONS

DEGnext can classify DEGs into UR and DR genes from RNA-seq cancer datasets with high performance. This type of analysis, using biologically relevant fine-tuning data, may aid in the exploration of potential biomarkers and can be adapted for other disease datasets.

Eshun RB, Kamrul Islam AKM, Bikdash MU. Identification of Significantly Expressed Gene Mutations for Automated Classification of Benign and Malignant Prostate Cancer. Annu Int Conf IEEE Eng Med Biol Soc 2021;2021:2437-2443. [PMID: 34891773 DOI: 10.1109/embc46164.2021.9630460] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/14/2023]

Gupta R, Kleinjans J, Caiment F. Identifying novel transcript biomarkers for hepatocellular carcinoma (HCC) using RNA-Seq datasets and machine learning. BMC Cancer 2021;21:962. [PMID: 34445986 PMCID: PMC8394105 DOI: 10.1186/s12885-021-08704-9] [Citation(s) in RCA: 9] [Impact Index Per Article: 2.3] [Received: 12/16/2020] [Accepted: 08/09/2021] [Indexed: 11/26/2022] Open

Abstract

BACKGROUND

Hepatocellular carcinoma (HCC) is one of the leading causes of cancer death in the world owing to limitations in its prognosis. The current prognosis approaches include radiological examination and detection of serum biomarkers; however, both have limited efficiency and are ineffective in early prognosis. Due to such limitations, we propose to use RNA-Seq data for evaluating putative higher-accuracy biomarkers at the transcript level that could help in early prognosis.

METHODS

To identify such potential transcript biomarkers, RNA-Seq data for healthy liver and various HCC cell models were subjected to five different machine learning algorithms: random forest, K-nearest neighbor, Naïve Bayes, support vector machine, and neural networks. Various metrics, namely sensitivity, specificity, MCC, informedness, and AUC-ROC (except for support vector machine) were evaluated. The algorithms that produced the highest values for all metrics were chosen to extract the top features that were subjected to recursive feature elimination. Through recursive feature elimination, the least number of features were obtained to differentiate between the healthy and HCC cell models.
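Two of the less familiar metrics named above, the Matthews correlation coefficient (MCC) and informedness, can be computed from a confusion matrix as sketched below using their standard definitions. The counts in the example are hypothetical and are not taken from the study; the function names are illustrative.

```python
from math import sqrt

def mcc(tp, fn, tn, fp):
    """Matthews correlation coefficient; 0.0 when any marginal total is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def informedness(tp, fn, tn, fp):
    """Youden's J statistic: sensitivity + specificity - 1."""
    return tp / (tp + fn) + tn / (tn + fp) - 1

# Hypothetical counts for illustration only.
counts = dict(tp=45, fn=5, tn=40, fp=10)
print(f"MCC = {mcc(**counts):.3f}")
print(f"informedness = {informedness(**counts):.3f}")
```

Both metrics range from -1 to 1 and equal 0 for a classifier that performs at chance, which makes them more informative than raw accuracy when the classes are imbalanced.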

RESULTS

From the metrics used, it is demonstrated that the efficiency of the known protein biomarkers for HCC is comparatively lower than that of complete transcriptomics data. Among the different machine learning algorithms, random forest and support vector machine demonstrated the best performance. Using recursive feature elimination on the top features of random forest and support vector machine, three transcripts were selected that had an accuracy of 0.97 and kappa of 0.93. Of the three transcripts, two were protein coding (PARP2-202 and SPON2-203) and one was a non-coding transcript (CYREN-211). Lastly, we demonstrated that these three selected transcripts outperformed three randomly selected transcripts (15,000 combinations), hence were not chance findings, and could therefore be interesting candidates for new HCC biomarker development.

CONCLUSION

Using RNA-Seq data combined with machine learning approaches can aid in finding novel transcript biomarkers. The three biomarkers identified: PARP2-202, SPON2-203, and CYREN-211, presented the highest accuracy among all other transcripts in differentiating the healthy and HCC cell models. The machine learning pipeline developed in this study can be used for any RNA-Seq dataset to find novel transcript biomarkers. Code: www.github.com/rajinder4489/ML_biomarkers.

A deep learning approach to identify gene targets of a therapeutic for human splicing disorders. Nat Commun 2021;12:3332. [PMID: 34099697 PMCID: PMC8185002 DOI: 10.1038/s41467-021-23663-2] [Citation(s) in RCA: 26] [Impact Index Per Article: 6.5] [Received: 02/28/2020] [Accepted: 05/07/2021] [Indexed: 01/16/2023] Open

A supervised machine learning-based methodology for analyzing dysregulation in splicing machinery: An application in cancer diagnosis. Artif Intell Med 2020;108:101950. [PMID: 32972670 DOI: 10.1016/j.artmed.2020.101950] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 11/15/2019] [Revised: 08/15/2020] [Accepted: 08/18/2020] [Indexed: 02/06/2023]

Akter S, Xu D, Nagel SC, Bromfield JJ, Pelch KE, Wilshire GB, Joshi T. GenomeForest: An Ensemble Machine Learning Classifier for Endometriosis. AMIA Jt Summits Transl Sci Proc 2020;2020:33-42. [PMID: 32477621 PMCID: PMC7233069] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]

Pathway-guided analysis identifies Myc-dependent alternative pre-mRNA splicing in aggressive prostate cancers. Proc Natl Acad Sci U S A 2020;117:5269-5279. [PMID: 32086391 PMCID: PMC7071906 DOI: 10.1073/pnas.1915975117] [Citation(s) in RCA: 54] [Impact Index Per Article: 10.8] [Indexed: 12/11/2022] Open

Fiosina J, Fiosins M, Bonn S. Explainable Deep Learning for Augmentation of Small RNA Expression Profiles. J Comput Biol 2020;27:234-247. [PMID: 31855058 PMCID: PMC7047095 DOI: 10.1089/cmb.2019.0320] [Citation(s) in RCA: 7] [Impact Index Per Article: 1.4] [Indexed: 01/07/2023] Open

Klén R, Karhunen M, Elo LL. Likelihood contrasts: a machine learning algorithm for binary classification of longitudinal data. Sci Rep 2020;10:1016. [PMID: 31974488 PMCID: PMC6978422 DOI: 10.1038/s41598-020-57924-9] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 06/13/2019] [Accepted: 12/31/2019] [Indexed: 12/02/2022] Open

Akter S, Xu D, Nagel SC, Bromfield JJ, Pelch K, Wilshire GB, Joshi T. Machine Learning Classifiers for Endometriosis Using Transcriptomics and Methylomics Data. Front Genet 2019;10:766. [PMID: 31552087 PMCID: PMC6737999 DOI: 10.3389/fgene.2019.00766] [Citation(s) in RCA: 26] [Impact Index Per Article: 4.3] [Received: 04/18/2019] [Accepted: 07/19/2019] [Indexed: 12/29/2022] Open

Al-Shaer AE, Flentke GR, Berres ME, Garic A, Smith SM. Exon level machine learning analyses elucidate novel candidate miRNA targets in an avian model of fetal alcohol spectrum disorder. PLoS Comput Biol 2019;15:e1006937. [PMID: 30973878 PMCID: PMC6478348 DOI: 10.1371/journal.pcbi.1006937] [Citation(s) in RCA: 5] [Impact Index Per Article: 0.8] [Received: 10/15/2018] [Revised: 04/23/2019] [Accepted: 03/11/2019] [Indexed: 12/20/2022] Open

Abstract

Gestational alcohol exposure causes fetal alcohol spectrum disorder (FASD) and is a prominent cause of neurodevelopmental disability. Whole transcriptome sequencing (RNA-Seq) offers insights into mechanisms underlying FASD, but gene-level analysis provides limited information regarding complex transcriptional processes such as alternative splicing and non-coding RNAs. Moreover, traditional analytical approaches that use multiple hypothesis testing with a false discovery rate adjustment prioritize genes based on an adjusted p-value, which is not always biologically relevant. We address these limitations with a novel approach and implemented an unsupervised machine learning model, which we applied to an exon-level analysis to reduce data complexity to the most likely functionally relevant exons, without loss of novel information. This was performed on an RNA-Seq paired-end dataset derived from alcohol-exposed neural fold-stage chick crania, wherein alcohol causes facial deficits recapitulating those of FASD. A principal component analysis along with k-means clustering was utilized to extract exons that deviated from baseline expression. This identified 6857 differentially expressed exons representing 1251 geneIDs; 391 of these genes were identified in a prior gene-level analysis of this dataset. It also identified exons encoding 23 microRNAs (miRNAs) having significantly differential expression profiles in response to alcohol. We developed an RDAVID pipeline to identify KEGG pathways represented by these exons, and separately identified predicted KEGG pathways targeted by these miRNAs. Several of these (ribosome biogenesis, oxidative phosphorylation) were identified in our prior gene-level analysis. Other pathways are crucial to facial morphogenesis and represent both novel (focal adhesion, FoxO signaling, insulin signaling) and known (Wnt signaling) alcohol targets. Importantly, there was substantial overlap between the exomes themselves and the predicted miRNA targets, suggesting these miRNAs contribute to the gene-level expression changes. Our novel application of unsupervised machine learning in conjunction with statistical analyses facilitated the discovery of signaling pathways and miRNAs that inform mechanisms underlying FASD.

Park E, Pan Z, Zhang Z, Lin L, Xing Y. The Expanding Landscape of Alternative Splicing Variation in Human Populations. Am J Hum Genet 2018;102:11-26. [PMID: 29304370 PMCID: PMC5777382 DOI: 10.1016/j.ajhg.2017.11.002] [Citation(s) in RCA: 246] [Impact Index Per Article: 35.1] [Received: 08/30/2017] [Accepted: 11/03/2017] [Indexed: 12/16/2022] Open