1
Hammond R, Athanasiadou R, Curado S, Aphinyanaphongs Y, Abrams C, Messito MJ, Gross R, Katzow M, Jay M, Razavian N, Elbel B. Correction: Predicting childhood obesity using electronic health records and publicly available data. PLoS One 2019; 14:e0223796. [PMID: 31589654 PMCID: PMC6779227 DOI: 10.1371/journal.pone.0223796] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.8] [Indexed: 11/23/2022]
2
Hammond R, Athanasiadou R, Curado S, Aphinyanaphongs Y, Abrams C, Messito MJ, Gross R, Katzow M, Jay M, Razavian N, Elbel B. Predicting childhood obesity using electronic health records and publicly available data. PLoS One 2019; 14:e0215571. [PMID: 31009509 PMCID: PMC6476510 DOI: 10.1371/journal.pone.0215571] [Citation(s) in RCA: 30] [Impact Index Per Article: 6.0] [Received: 06/21/2018] [Accepted: 04/05/2019] [Indexed: 01/13/2023]
Abstract
Background: Because of the strong link between childhood obesity and adulthood obesity comorbidities, and the difficulty in decreasing body mass index (BMI) later in life, effective strategies are needed to address this condition in early childhood. The ability to predict obesity before age five could be a useful tool, allowing prevention strategies to focus on high-risk children. The few existing prediction models for obesity in childhood have primarily employed data from longitudinal cohort studies, relying on difficult-to-collect data that are not readily available to all practitioners. Instead, we utilized real-world unaugmented electronic health record (EHR) data from the first two years of life to predict obesity status at age five, an approach not yet taken in pediatric obesity research.
Methods and findings: We trained a variety of machine learning algorithms to perform both binary classification and regression. Following previous studies demonstrating different obesity determinants for boys and girls, we developed separate models for each group. In both models, we found that weight-for-length z-score, BMI between 19 and 24 months, and the last BMI measure recorded before age two were the most important features for prediction. The best performing models were able to predict obesity with an Area Under the Receiver Operating Characteristic Curve (AUC) of 81.7% for girls and 76.1% for boys.
Conclusions: We were able to predict obesity at age five using EHR data with an AUC comparable to cohort-based studies, reducing the need for investment in additional data collection. Our results suggest that machine learning approaches for predicting future childhood obesity using EHR data could improve the ability of clinicians and researchers to drive future policy, intervention design, and the decision-making process in a clinical setting.
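The AUC reported in the abstract above can be read as the probability that a randomly chosen child who develops obesity receives a higher predicted score than a randomly chosen child who does not. The following is a minimal sketch of that rank-based definition; the function name `roc_auc` and the toy labels and scores are illustrative assumptions, not the authors' code or data.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: chance that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = obese at age five, 0 = not obese.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
example_auc = roc_auc(labels, scores)  # 3 of 4 positive/negative pairs are ordered correctly
```

The pairwise loop is quadratic in the number of examples, which is acceptable for illustration; a sort-based computation would be the usual choice for large cohorts.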
Affiliation(s)
- Robert Hammond
  - NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
- Rodoniki Athanasiadou
  - NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
- Silvia Curado
  - NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
  - Department of Cell Biology, NYU School of Medicine, New York, New York, United States of America
- Yindalon Aphinyanaphongs
  - NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
  - Department of Population Health, NYU School of Medicine, New York, New York, United States of America
- Courtney Abrams
  - NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
  - Department of Population Health, NYU School of Medicine, New York, New York, United States of America
- Mary Jo Messito
  - NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
  - Department of Pediatrics, NYU School of Medicine, Bellevue Hospital Center, New York, New York, United States of America
- Rachel Gross
  - NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
  - Department of Pediatrics, NYU School of Medicine, Bellevue Hospital Center, New York, New York, United States of America
- Michelle Katzow
  - NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
  - Department of Pediatrics, NYU School of Medicine, Bellevue Hospital Center, New York, New York, United States of America
- Melanie Jay
  - NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
  - Department of Population Health, NYU School of Medicine, New York, New York, United States of America
  - Department of Medicine, NYU School of Medicine, New York, New York, United States of America
- Narges Razavian
  - NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
  - Department of Population Health, NYU School of Medicine, New York, New York, United States of America
  - Department of Radiology, NYU School of Medicine, New York, New York, United States of America
- Brian Elbel
  - NYU Langone Comprehensive Program on Obesity, NYU School of Medicine, New York, New York, United States of America
  - Department of Population Health, NYU School of Medicine, New York, New York, United States of America
  - NYU Wagner Graduate School of Public Service, New York, New York, United States of America
3
Athanasiadou R, Neymotin B, Brandt N, Wang W, Christiaen L, Gresham D, Tranchina D. A complete statistical model for calibration of RNA-seq counts using external spike-ins and maximum likelihood theory. PLoS Comput Biol 2019; 15:e1006794. [PMID: 30856174 PMCID: PMC6428340 DOI: 10.1371/journal.pcbi.1006794] [Citation(s) in RCA: 6] [Impact Index Per Article: 1.2] [Received: 08/07/2018] [Revised: 03/21/2019] [Accepted: 01/16/2019] [Indexed: 01/09/2023]
Abstract
A fundamental assumption, common to the vast majority of high-throughput transcriptome analyses, is that the expression of most genes is unchanged among samples and that total cellular RNA remains constant. As the number of analyzed experimental systems increases, however, independent studies demonstrate that this assumption is often violated. We present a calibration method using RNA spike-ins that allows for the measurement of absolute cellular abundance of RNA molecules. We apply the method to pooled RNA from cell populations of known sizes. For each transcript, we compute a nominal abundance that can be converted to absolute by dividing by a scale factor determined in separate experiments: the yield coefficient of the transcript relative to that of a reference spike-in measured with the same protocol. The method is derived by maximum likelihood theory in the context of a complete statistical model for sequencing counts contributed by cellular RNA and spike-ins. The counts are based on a sample from a fixed number of cells to which a fixed population of spike-in molecules has been added. We illustrate and evaluate the method with applications to two global expression data sets, one from the model eukaryote Saccharomyces cerevisiae proliferating at different growth rates, and one from differentiating cardiopharyngeal cell lineages in the chordate Ciona robusta. We tested the method in a technical replicate dilution study, and in a k-fold validation study.
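The conversion described in the abstract above, nominal abundance divided by a relative yield coefficient, can be illustrated with a short sketch. The proportional spike-in scaling used here to obtain the nominal value is a deliberate simplification assumed for illustration; it is not the paper's maximum likelihood estimator, and all function names, parameter names, and numbers are hypothetical.

```python
def nominal_abundance_per_cell(transcript_counts, spike_counts, spike_molecules, n_cells):
    """Naive nominal abundance: scale read counts by molecules-per-count of the spike-in.

    Assumes (for illustration only) that counts are proportional to molecule number.
    """
    if spike_counts <= 0 or n_cells <= 0:
        raise ValueError("spike_counts and n_cells must be positive")
    molecules_per_count = spike_molecules / spike_counts
    return transcript_counts * molecules_per_count / n_cells


def absolute_abundance(nominal, relative_yield):
    """Convert nominal to absolute abundance by dividing by the relative yield coefficient."""
    if relative_yield <= 0:
        raise ValueError("relative_yield must be positive")
    return nominal / relative_yield


# Hypothetical numbers: 2,000,000 spike-in molecules added to RNA from 1,000,000 cells.
nominal = nominal_abundance_per_cell(
    transcript_counts=500, spike_counts=1000, spike_molecules=2_000_000, n_cells=1_000_000
)
# A transcript recovered half as efficiently as the reference spike-in (yield 0.5)
# is twice as abundant as its nominal value suggests.
absolute = absolute_abundance(nominal, relative_yield=0.5)
```

Keeping the two steps separate mirrors the abstract's distinction between what the sequencing counts provide (nominal abundance) and what must be measured in separate experiments (the yield coefficient).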
Affiliation(s)
- Rodoniki Athanasiadou
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, United States of America
- Benjamin Neymotin
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, United States of America
- Nathan Brandt
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, United States of America
- Wei Wang
  - Center for Developmental Genetics, Department of Biology, New York University, New York, New York, United States of America
- Lionel Christiaen
  - Center for Developmental Genetics, Department of Biology, New York University, New York, New York, United States of America
- David Gresham
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York, United States of America
- Daniel Tranchina
  - Department of Biology, New York University, New York, New York, United States of America
  - Courant Institute of Mathematical Sciences, New York University, New York, New York, United States of America
4
Airoldi EM, Miller D, Athanasiadou R, Brandt N, Abdul-Rahman F, Neymotin B, Hashimoto T, Bahmani T, Gresham D. Steady-state and dynamic gene expression programs in Saccharomyces cerevisiae in response to variation in environmental nitrogen. Mol Biol Cell 2016; 27:1383-96. [PMID: 26941329 PMCID: PMC4831890 DOI: 10.1091/mbc.e14-05-1013] [Citation(s) in RCA: 29] [Impact Index Per Article: 3.6] [Received: 05/28/2014] [Accepted: 02/23/2016] [Indexed: 11/16/2022]
Abstract
Steady-state and transiently perturbed nitrogen-limited chemostats show that nitrogen abundance is a primary signal controlling nitrogen-responsive gene expression. When cells experience an increase in nitrogen, some transcripts are rapidly degraded, suggesting that accelerated mRNA degradation contributes to remodeling of gene expression. Cell growth rate is regulated in response to the abundance and molecular form of essential nutrients. In Saccharomyces cerevisiae (budding yeast), the molecular form of environmental nitrogen is a major determinant of cell growth rate, supporting growth rates that vary at least threefold. Transcriptional control of nitrogen use is mediated in large part by nitrogen catabolite repression (NCR), which results in the repression of specific transcripts in the presence of a preferred nitrogen source that supports a fast growth rate, such as glutamine, that are otherwise expressed in the presence of a nonpreferred nitrogen source, such as proline, which supports a slower growth rate. Differential expression of the NCR regulon and additional nitrogen-responsive genes results in >500 transcripts that are differentially expressed in cells growing in the presence of different nitrogen sources in batch cultures. Here we find that in growth rate–controlled cultures using nitrogen-limited chemostats, gene expression programs are strikingly similar regardless of nitrogen source. NCR expression is derepressed in all nitrogen-limiting chemostat conditions regardless of nitrogen source, and in these conditions, only 34 transcripts exhibit nitrogen source–specific differential gene expression. Addition of either the preferred nitrogen source, glutamine, or the nonpreferred nitrogen source, proline, to cells growing in nitrogen-limited chemostats results in rapid, dose-dependent repression of the NCR regulon. 
Using a novel means of computational normalization to compare global gene expression programs in steady-state and dynamic conditions, we find evidence that the addition of nitrogen to nitrogen-limited cells results in the transient overproduction of transcripts required for protein translation. Simultaneously, we find that accelerated mRNA degradation underlies the rapid clearing of a subset of transcripts, which is most pronounced for the highly expressed NCR-regulated permease genes GAP1, MEP2, DAL5, PUT4, and DIP5. Our results reveal novel aspects of nitrogen-regulated gene expression and highlight the need for a quantitative approach to study how the cell coordinates protein translation and nitrogen assimilation to optimize cell growth in different environments.
Affiliation(s)
- Edoardo M Airoldi
  - Department of Statistics, Harvard University, Cambridge, MA 02138
  - Broad Institute of Harvard and Massachusetts Institute of Technology, Cambridge, MA 02142
- Darach Miller
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003
- Rodoniki Athanasiadou
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003
- Nathan Brandt
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003
- Farah Abdul-Rahman
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003
- Benjamin Neymotin
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003
- Tatsu Hashimoto
  - Department of Statistics, Harvard University, Cambridge, MA 02138
- Tayebeh Bahmani
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003
- David Gresham
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, NY 10003
5
Nadel J, Athanasiadou R, Lemetre C, Wijetunga NA, Ó Broin P, Sato H, Zhang Z, Jeddeloh J, Montagna C, Golden A, Seoighe C, Greally JM. RNA:DNA hybrids in the human genome have distinctive nucleotide characteristics, chromatin composition, and transcriptional relationships. Epigenetics Chromatin 2015; 8:46. [PMID: 26579211 PMCID: PMC4647656 DOI: 10.1186/s13072-015-0040-6] [Citation(s) in RCA: 114] [Impact Index Per Article: 12.7] [Received: 07/22/2015] [Accepted: 10/29/2015] [Indexed: 01/01/2023]
Abstract
Background: RNA:DNA hybrids represent a non-canonical nucleic acid structure that has been associated with a range of human diseases and potential transcriptional regulatory functions. Mapping of RNA:DNA hybrids in human cells reveals them to have a number of characteristics that give insights into their functions.
Results: We find RNA:DNA hybrids to occupy millions of base pairs in the human genome. A directional sequencing approach shows the RNA component of the RNA:DNA hybrid to be purine-rich, indicating a thermodynamic contribution to their in vivo stability. The RNA:DNA hybrids are enriched at loci with decreased DNA methylation and increased DNase hypersensitivity, and within larger domains with characteristics of heterochromatin formation, indicating potential transcriptional regulatory properties. Mass spectrometry studies of chromatin at RNA:DNA hybrids show the presence of the ILF2 and ILF3 transcription factors, supporting a model of certain transcription factors binding preferentially to the RNA:DNA conformation.
Conclusions: Overall, there is little to indicate a dependence for RNA:DNA hybrids forming co-transcriptionally, with results from the ribosomal DNA repeat unit instead supporting the intriguing model of RNA generating these structures in trans. The results of the study indicate heterogeneous functions of these genomic elements and new insights into their formation and stability in vivo.
Affiliation(s)
- Julie Nadel
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
- Rodoniki Athanasiadou
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
  - Department of Biology, Center for Genomics and Systems Biology, New York University, 12 Waverly Place, New York, NY 10003 USA
- Christophe Lemetre
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
  - Integrated Genomics Operation, Memorial Sloan-Kettering Cancer Center, New York, NY 10065 USA
- N Ari Wijetunga
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
- Pilib Ó Broin
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
- Hanae Sato
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
- Zhengdong Zhang
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
- Cristina Montagna
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
- Aaron Golden
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
- Cathal Seoighe
  - School of Mathematics, Statistics and Applied Mathematics, National University of Ireland Galway, Galway, Ireland
- John M Greally
  - Department of Genetics, Albert Einstein College of Medicine, Bronx, NY 10461 USA
  - Department of Genetics, Center for Epigenomics and Division of Computational Genetics, Albert Einstein College of Medicine, 1301 Morris Park Avenue, Bronx, NY 10461 USA
6
Abstract
The abundance of a transcript is determined by its rate of synthesis and its rate of degradation; however, global methods for quantifying RNA abundance cannot distinguish variation in these two processes. Here, we introduce RNA approach to equilibrium sequencing (RATE-seq), which uses in vivo metabolic labeling of RNA and approach to equilibrium kinetics, to determine absolute RNA degradation and synthesis rates. RATE-seq does not disturb cellular physiology, uses straightforward normalization with exogenous spike-ins, and can be readily adapted for studies in most organisms. We demonstrate the use of RATE-seq to estimate genome-wide kinetic parameters for coding and noncoding transcripts in Saccharomyces cerevisiae.
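As a rough illustration of the approach-to-equilibrium idea in the abstract above, the sketch below assumes the textbook first-order form in which the labeled fraction of a transcript rises as 1 - exp(-k t), where k is the degradation rate, and in which the steady-state synthesis rate equals k times abundance. This simplified model, the function names, and the synthetic time points are assumptions for illustration; the published RATE-seq analysis may include additional terms (for example, dilution by cell growth) that are not shown here.

```python
import math


def labeled_fraction(t, k):
    """Assumed first-order approach to equilibrium: fraction of transcript labeled at time t."""
    return 1.0 - math.exp(-k * t)


def fit_degradation_rate(times, fractions):
    """Estimate k by least squares through the origin on y = -ln(1 - f) = k * t."""
    num = 0.0
    den = 0.0
    for t, f in zip(times, fractions):
        if not 0.0 <= f < 1.0:
            raise ValueError("labeled fractions must lie in [0, 1)")
        y = -math.log(1.0 - f)
        num += t * y
        den += t * t
    if den == 0.0:
        raise ValueError("need at least one nonzero time point")
    return num / den


def synthesis_rate(k, steady_state_abundance):
    """At steady state, synthesis balances degradation: rate = k * abundance."""
    return k * steady_state_abundance


# Synthetic, noise-free example (time in minutes, hypothetical rate constant).
times = [5.0, 10.0, 20.0, 40.0]
true_k = 0.05
fractions = [labeled_fraction(t, true_k) for t in times]
k_hat = fit_degradation_rate(times, fractions)
half_life = math.log(2) / k_hat
```

The log-linear fit is chosen only because it needs no external libraries; with noisy measurements near saturation, a nonlinear fit on the untransformed fractions would usually be more robust.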
Affiliation(s)
- Benjamin Neymotin
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York 10003, USA
- Rodoniki Athanasiadou
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York 10003, USA
- David Gresham
  - Center for Genomics and Systems Biology, Department of Biology, New York University, New York, New York 10003, USA
7
Athanasiadou R, de Sousa D, Myant K, Merusi C, Stancheva I, Bird A. Targeting of de novo DNA methylation throughout the Oct-4 gene regulatory region in differentiating embryonic stem cells. PLoS One 2010; 5:e9937. [PMID: 20376339 PMCID: PMC2848578 DOI: 10.1371/journal.pone.0009937] [Citation(s) in RCA: 55] [Impact Index Per Article: 3.9] [Received: 10/30/2009] [Accepted: 03/08/2010] [Indexed: 02/07/2023]
Abstract
Differentiation of embryonic stem (ES) cells is accompanied by silencing of the Oct-4 gene and de novo DNA methylation of its regulatory region. Previous studies have focused on the requirements for promoter region methylation. We therefore undertook to analyse the progression of DNA methylation of the ∼2000 base pair regulatory region of Oct-4 in ES cells that are wildtype or deficient for key proteins. We find that de novo methylation is initially seeded at two discrete sites, the proximal enhancer and distal promoter, spreading later to neighboring regions, including the remainder of the promoter. De novo methyltransferases Dnmt3a and Dnmt3b cooperate in the initial targeted stage of de novo methylation. Efficient completion of the pattern requires Dnmt3a and Dnmt1, but not Dnmt3b. Methylation of the Oct-4 promoter depends on the histone H3 lysine 9 methyltransferase G9a, as shown previously, but CpG methylation throughout most of the regulatory region accumulates even in the absence of G9a. Analysis of the Oct-4 regulatory domain as a whole has allowed us to detect targeted de novo methylation and to refine our understanding of the roles of key protein components in this process.
Affiliation(s)
- Rodoniki Athanasiadou
  - Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom
- Dina de Sousa
  - Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom
- Kevin Myant
  - Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom
- Cara Merusi
  - Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom
- Irina Stancheva
  - Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom
- Adrian Bird
  - Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh, United Kingdom