Wong WKM, Thorat V, Joglekar MV, Dong CX, Lee H, Chew YV, Bhave A, Hawthorne WJ, Engin F, Pant A, Dalgaard LT, Bapat S, Hardikar AA. Analysis of Half a Billion Datapoints Across Ten Machine-Learning Algorithms Identifies Key Elements Associated With Insulin Transcription in Human Pancreatic Islet Cells. Front Endocrinol (Lausanne) 2022; 13:853863. [PMID: 35399953; PMCID: PMC8986156; DOI: 10.3389/fendo.2022.853863] [Received: 01/13/2022] [Accepted: 02/22/2022] [Indexed: 11/24/2022]
Abstract
Machine learning (ML) workflows enable unprejudiced and robust evaluation of complex datasets. Here, we analyzed over 490,000,000 data points to compare 10 different ML workflows in a large (N=11,652) training dataset of human pancreatic single-cell (sc-)transcriptomes to identify genes associated with the presence or absence of insulin transcript(s). The prediction accuracy and sensitivity of each ML workflow were tested in a separate validation dataset (N=2,913). Ensemble ML workflows, in particular the Random Forest algorithm, delivered high predictive power (AUC=0.83) and sensitivity (0.98) compared to other algorithms. The transcripts identified through these analyses also demonstrated significant correlation with insulin in bulk RNA-seq data from human islets. The top-10 features (including IAPP, ADCYAP1, LDHA and SST) common to the three Ensemble ML workflows were significantly dysregulated in scRNA-seq datasets from Ire-1αβ-/- mice, which demonstrate dedifferentiation of pancreatic β-cells in a model of type 1 diabetes (T1D), and in pancreatic single cells from individuals with type 2 diabetes (T2D). Our findings provide a direct comparison of ML workflows in big data analyses, identify key elements associated with insulin transcription, and provide workflows for future analyses.
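The abstract compares workflows by AUC and sensitivity on a held-out validation set. As a minimal sketch of what those two metrics measure, the following computes them in plain Python on toy data. The labels, scores, 0.5 threshold, and function names are all illustrative assumptions; they are not taken from the paper and do not reproduce its pipeline.

```python
from typing import Sequence


def sensitivity(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """True-positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical validation labels (1 = insulin transcript present, 0 = absent)
# and made-up classifier scores; not data from the study.
y_val = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.4, 0.2, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class predictions.
y_hat = [1 if s >= 0.5 else 0 for s in scores]

print(sensitivity(y_val, y_hat))  # 0.75
print(roc_auc(y_val, scores))     # 0.875
```

Sensitivity depends on the chosen threshold, while AUC summarizes ranking quality across all thresholds, which is why the two numbers can differ substantially for the same model.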
Affiliation(s)
- Wilson K. M. Wong
  - Diabetes and Islet Biology Group, School of Medicine, Western Sydney University, Campbelltown, NSW, Australia
- Vinod Thorat
  - Healthcare Analytics, AlgoAnalytics, Pune, India
- Mugdha V. Joglekar
  - Diabetes and Islet Biology Group, School of Medicine, Western Sydney University, Campbelltown, NSW, Australia
- Charlotte X. Dong
  - Diabetes and Islet Biology Group, School of Medicine, Western Sydney University, Campbelltown, NSW, Australia
- Hugo Lee
  - Department of Biomolecular Chemistry, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States
- Yi Vee Chew
  - Centre for Transplant and Renal Research, Westmead Institute for Medical Research, University of Sydney, Westmead, NSW, Australia
- Adwait Bhave
  - Healthcare Analytics, AlgoAnalytics, Pune, India
- Wayne J. Hawthorne
  - Centre for Transplant and Renal Research, Westmead Institute for Medical Research, University of Sydney, Westmead, NSW, Australia
- Feyza Engin
  - Department of Biomolecular Chemistry, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States
  - Division of Endocrinology, Diabetes & Metabolism, Department of Medicine, School of Medicine and Public Health, University of Wisconsin-Madison, Madison, WI, United States
- Louise T. Dalgaard
  - Department of Science and Environment, Roskilde University, Roskilde, Denmark
- Sharda Bapat
  - Healthcare Analytics, AlgoAnalytics, Pune, India
  - *Correspondence: Sharda Bapat; Anandwardhan A. Hardikar
- Anandwardhan A. Hardikar
  - Diabetes and Islet Biology Group, School of Medicine, Western Sydney University, Campbelltown, NSW, Australia
  - Department of Science and Environment, Roskilde University, Roskilde, Denmark