1
Kock KH, Kimes PK, Gisselbrecht SS, Inukai S, Phanor SK, Anderson JT, Ramakrishnan G, Lipper CH, Song D, Kurland JV, Rogers JM, Jeong R, Blacklow SC, Irizarry RA, Bulyk ML. DNA binding analysis of rare variants in homeodomains reveals homeodomain specificity-determining residues. Nat Commun 2024; 15:3110. [PMID: 38600112 PMCID: PMC11006913 DOI: 10.1038/s41467-024-47396-0] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 08/16/2023] [Accepted: 03/29/2024] [Indexed: 04/12/2024] Open
Abstract
Homeodomains (HDs) are the second largest class of DNA binding domains (DBDs) among eukaryotic sequence-specific transcription factors (TFs) and are the TF structural class with the largest number of disease-associated mutations in the Human Gene Mutation Database (HGMD). Despite numerous structural studies and large-scale analyses of HD DNA binding specificity, HD-DNA recognition is still not fully understood. Here, we analyze 92 human HD mutants, including disease-associated variants and variants of uncertain significance (VUS), for their effects on DNA binding activity. Many of the variants alter DNA binding affinity and/or specificity. Detailed biochemical analysis and structural modeling identify 14 previously unknown specificity-determining positions, 5 of which do not contact DNA. The same missense substitution at analogous positions within different HDs often exhibits different effects on DNA binding activity. Variant effect prediction tools perform moderately well in distinguishing variants with altered DNA binding affinity, but poorly in identifying those with altered binding specificity. Our results highlight the need for biochemical assays of TF coding variants and prioritize dozens of variants for further investigations into their pathogenicity and the development of clinical diagnostics and precision therapies.
Affiliation(s)
- Kian Hong Kock
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA, USA
- Patrick K Kimes
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Stephen S Gisselbrecht
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- Sachi Inukai
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- Sabrina K Phanor
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- James T Anderson
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- Gayatri Ramakrishnan
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
  - Boston Bangalore Biosciences Beginnings Program, Harvard University, Cambridge, MA, USA
- Colin H Lipper
  - Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
  - Department of Cancer Biology, Dana Farber Cancer Institute, Boston, MA, USA
- Dongyuan Song
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Jesse V Kurland
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
- Julia M Rogers
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA, USA
- Raehoon Jeong
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
  - Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA, USA
- Stephen C Blacklow
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA, USA
  - Department of Biological Chemistry and Molecular Pharmacology, Blavatnik Institute, Harvard Medical School, Boston, MA, USA
  - Department of Cancer Biology, Dana Farber Cancer Institute, Boston, MA, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA, USA
- Rafael A Irizarry
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
- Martha L Bulyk
  - Division of Genetics, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, USA
  - Program in Biological and Biomedical Sciences, Harvard University, Cambridge, MA, USA
  - Committee on Higher Degrees in Biophysics, Harvard University, Cambridge, MA, USA
  - Bioinformatics and Integrative Genomics Graduate Program, Harvard University, Cambridge, MA, USA
  - Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
2
Grabski IN, Heymach JV, Kehl KL, Kopetz S, Lau KS, Riely GJ, Schrag D, Yaeger R, Irizarry RA, Haigis KM. Effects of KRAS Genetic Interactions on Outcomes in Cancers of the Lung, Pancreas, and Colorectum. Cancer Epidemiol Biomarkers Prev 2024; 33:158-169. [PMID: 37943166 PMCID: PMC10841605 DOI: 10.1158/1055-9965.epi-23-0262] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 03/29/2023] [Revised: 07/02/2023] [Accepted: 11/07/2023] [Indexed: 11/10/2023] Open
Abstract
BACKGROUND KRAS is among the most commonly mutated oncogenes in cancer, and previous studies have shown associations with survival in many cancer contexts. Evidence from both clinical observations and mouse experiments further suggests that these associations are allele- and tissue-specific. These findings motivate using clinical data to understand gene interactions and clinical covariates within different alleles and tissues. METHODS We analyze genomic and clinical data from the AACR Project GENIE Biopharma Collaborative for samples from lung, colorectal, and pancreatic cancers. For each of these cancer types, we report epidemiological associations for different KRAS alleles, apply principal component analysis (PCA) to discover groups of genes co-mutated with KRAS, and identify distinct clusters of patient profiles with implications for survival. RESULTS KRAS mutations were associated with inferior survival in lung, colon, and pancreas, although the specific mutations implicated varied by disease. Tissue- and allele-specific associations with smoking, sex, age, and race were found. Tissue-specific genetic interactions with KRAS were identified by PCA, which were clustered to produce five, four, and two patient profiles in lung, colon, and pancreas. Membership in these profiles was associated with survival in all three cancer types. CONCLUSIONS KRAS mutations have tissue- and allele-specific associations with inferior survival, clinical covariates, and genetic interactions. IMPACT Our results provide greater insight into the tissue- and allele-specific associations with KRAS mutations and identify clusters of patients that are associated with survival and clinical attributes from combinations of genetic interactions with KRAS mutations.
Affiliation(s)
- Isabella N. Grabski
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- John V. Heymach
  - Department of Thoracic and Head and Neck Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Kenneth L. Kehl
  - Division of Population Sciences, Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA, USA
- Scott Kopetz
  - Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Ken S. Lau
  - Department of Cell and Developmental Biology, Vanderbilt University School of Medicine, Nashville, TN, USA
- Gregory J. Riely
  - Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Deborah Schrag
  - Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Rona Yaeger
  - Department of Medicine, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Rafael A. Irizarry
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Kevin M. Haigis
  - Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
3
Grabski IN, Street K, Irizarry RA. Significance analysis for clustering with single-cell RNA-sequencing data. Nat Methods 2023; 20:1196-1202. [PMID: 37429993 DOI: 10.1038/s41592-023-01933-9] [Citation(s) in RCA: 13] [Impact Index Per Article: 13.0] [Received: 08/01/2022] [Accepted: 06/01/2023] [Indexed: 07/12/2023]
Abstract
Unsupervised clustering of single-cell RNA-sequencing data enables the identification of distinct cell populations. However, the most widely used clustering algorithms are heuristic and do not formally account for statistical uncertainty. We find that not addressing known sources of variability in a statistically rigorous manner can lead to overconfidence in the discovery of novel cell types. Here we extend a previous method, significance of hierarchical clustering, to propose a model-based hypothesis testing approach that incorporates significance analysis into the clustering algorithm and permits statistical evaluation of clusters as distinct cell populations. We also adapt this approach to permit statistical assessment on the clusters reported by any algorithm. Finally, we extend these approaches to account for batch structure. We benchmarked our approach against popular clustering workflows, demonstrating improved performance. To show practical utility, we applied our approach to the Human Lung Cell Atlas and an atlas of the mouse cerebellar cortex, identifying several cases of over-clustering and recapitulating experimentally validated cell type definitions.
Affiliation(s)
- Isabella N Grabski
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA.
- Kelly Street
  - Division of Biostatistics, Department of Population and Public Health Sciences, Keck School of Medicine, University of Southern California, Los Angeles, CA, USA.
- Rafael A Irizarry
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA.
4
Alexander TA, Irizarry RA, Bravo HC. Capturing discrete latent structures: choose LDs over PCs. Biostatistics 2022; 24:1-16. [PMID: 34467372 DOI: 10.1093/biostatistics/kxab030] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 04/24/2020] [Revised: 07/19/2021] [Accepted: 07/21/2021] [Indexed: 12/16/2022] Open
Abstract
High-dimensional biological data collection across heterogeneous groups of samples has become increasingly common, creating high demand for dimensionality reduction techniques that capture underlying structure of the data. Discovering low-dimensional embeddings that describe the separation of any underlying discrete latent structure in data is an important motivation for applying these techniques since these latent classes can represent important sources of unwanted variability, such as batch effects, or interesting sources of signal such as unknown cell types. The features that define this discrete latent structure are often hard to identify in high-dimensional data. Principal component analysis (PCA) is one of the most widely used methods as an unsupervised step for dimensionality reduction. This reduction technique finds linear transformations of the data which explain total variance. When the goal is detecting discrete structure, PCA is applied with the assumption that classes will be separated in directions of maximum variance. However, PCA will fail to accurately find discrete latent structure if this assumption does not hold. Visualization techniques, such as t-Distributed Stochastic Neighbor Embedding (t-SNE) and Uniform Manifold Approximation and Projection (UMAP), attempt to mitigate these problems with PCA by creating a low-dimensional space where similar objects are modeled by nearby points in the low-dimensional embedding and dissimilar objects are modeled by distant points with high probability. However, because t-SNE and UMAP are computationally expensive, a PCA reduction is often applied before them, which makes them sensitive to PCA's shortcomings. Also, t-SNE is limited to only two or three dimensions as a visualization tool, which may not be adequate for retaining discriminatory information. The linear transformations of PCA are preferable to non-linear transformations provided by methods like t-SNE and UMAP for interpretable feature weights.
Here, we propose iterative discriminant analysis (iDA), a dimensionality reduction technique designed to mitigate these limitations. iDA produces an embedding that carries discriminatory information which optimally separates latent clusters using linear transformations that permit post hoc analysis to determine features that define these latent structures.
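The contrast between principal components and linear discriminants in this abstract can be illustrated with a small simulation. The following is a minimal sketch, not the iDA implementation: it assumes the class labels are known (iDA targets latent classes), uses simulated two-dimensional data, and every function name is hypothetical. The two classes differ only along the low-variance axis, so the first principal component misses the separation while Fisher's linear discriminant recovers it.

```python
import math
import random


def mean_vec(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def scatter(points):
    """Return the 2x2 scatter matrix (sxx, sxy, syy) about the mean."""
    mx, my = mean_vec(points)
    sxx = sum((x - mx) ** 2 for x, _ in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    syy = sum((y - my) ** 2 for _, y in points)
    return sxx, sxy, syy


def first_pc(points):
    """Unit vector along the direction of maximum total variance."""
    sxx, sxy, syy = scatter(points)
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)  # major-axis angle
    return (math.cos(theta), math.sin(theta))


def first_ld(class0, class1):
    """Unit vector proportional to Sw^-1 (m1 - m0), Fisher's discriminant."""
    a0, b0, c0 = scatter(class0)
    a1, b1, c1 = scatter(class1)
    a, b, c = a0 + a1, b0 + b1, c0 + c1  # pooled within-class scatter
    m0, m1 = mean_vec(class0), mean_vec(class1)
    dx, dy = m1[0] - m0[0], m1[1] - m0[1]
    det = a * c - b * b
    wx = (c * dx - b * dy) / det
    wy = (-b * dx + a * dy) / det
    norm = math.hypot(wx, wy)
    return (wx / norm, wy / norm)


def separation(direction, class0, class1):
    """Absolute difference of projected class means, in pooled-SD units."""
    p0 = [x * direction[0] + y * direction[1] for x, y in class0]
    p1 = [x * direction[0] + y * direction[1] for x, y in class1]
    m0, m1 = sum(p0) / len(p0), sum(p1) / len(p1)
    ss = sum((v - m0) ** 2 for v in p0) + sum((v - m1) ** 2 for v in p1)
    return abs(m1 - m0) / math.sqrt(ss / (len(p0) + len(p1) - 2))


# Simulated data (an assumption for illustration): large shared variance
# along x, small class-specific shift along y.
rng = random.Random(0)
class0 = [(rng.gauss(0, 5), rng.gauss(-1, 0.3)) for _ in range(200)]
class1 = [(rng.gauss(0, 5), rng.gauss(1, 0.3)) for _ in range(200)]

pc = first_pc(class0 + class1)
ld = first_ld(class0, class1)
pc_sep = separation(pc, class0, class1)
ld_sep = separation(ld, class0, class1)
print(f"PC1 direction {pc}, class separation {pc_sep:.2f} SD")
print(f"LD1 direction {ld}, class separation {ld_sep:.2f} SD")
```

In this setup the first principal component aligns with the high-variance x axis and barely separates the classes, whereas the discriminant direction aligns with y. The weights of that linear direction remain directly interpretable, which is the property the abstract emphasizes.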
Affiliation(s)
- Theresa A Alexander
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA
- Rafael A Irizarry
  - Biostatistics and Computational Biology, Dana Farber Cancer Institute and Department of Biostatistics, Harvard School of Public Health, Boston, MA 02115, USA
- Héctor Corrada Bravo
  - Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, USA and Data Science and Statistical Computing, Genentech, Inc., South San Francisco, CA 94080, USA
5
Kumar MS, Slud EV, Hehnly C, Zhang L, Broach J, Irizarry RA, Schiff SJ, Paulson JN. Differential richness inference for 16S rRNA marker gene surveys. Genome Biol 2022; 23:166. [PMID: 35915508 PMCID: PMC9344657 DOI: 10.1186/s13059-022-02722-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.5] [Received: 12/14/2021] [Accepted: 06/28/2022] [Indexed: 12/24/2022] Open
Abstract
Background Individual and environmental health outcomes are frequently linked to changes in the diversity of associated microbial communities. Thus, deriving health indicators based on microbiome diversity measures is essential. While microbiome data generated using high-throughput 16S rRNA marker gene surveys are appealing for this purpose, 16S surveys also generate a plethora of spurious microbial taxa. Results When this artificial inflation in the observed number of taxa is ignored, we find that changes in the abundance of detected taxa confound current methods for inferring differences in richness. Experimental evidence, theory-guided exploratory data analyses, and existing literature support the conclusion that most sub-genus discoveries are spurious artifacts of clustering 16S sequencing reads. We proceed to model a 16S survey’s systematic patterns of sub-genus taxa generation as a function of genus abundance to derive a robust control for false taxa accumulation. These controls unlock classical regression approaches for highly flexible differential richness inference at various levels of the surveyed microbial assemblage: from sample groups to specific taxa collections. The proposed methodology for differential richness inference is available through an R package, Prokounter. Conclusions False species discoveries bias richness estimation and confound differential richness inference. In the case of 16S microbiome surveys, supporting evidence indicates that most sub-genus taxa are spurious. Based on this finding, a flexible method is proposed and is shown to overcome the confounding problem noted with current approaches for differential richness inference. Package availability: https://github.com/mskb01/prokounter Supplementary Information The online version contains supplementary material available at 10.1186/s13059-022-02722-x.
6
Grabski IN, Irizarry RA. A probabilistic gene expression barcode for annotation of cell types from single-cell RNA-seq data. Biostatistics 2022; 23:1150-1164. [PMID: 35770795 PMCID: PMC9802389 DOI: 10.1093/biostatistics/kxac021] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 12/21/2021] [Revised: 05/10/2022] [Accepted: 05/22/2022] [Indexed: 01/07/2023] Open
Abstract
Single-cell RNA sequencing (scRNA-seq) quantifies gene expression for individual cells in a sample, which allows distinct cell-type populations to be identified and characterized. An important step in many scRNA-seq analysis pipelines is the annotation of cells into known cell types. While this can be achieved using experimental techniques, such as fluorescence-activated cell sorting, these approaches are impractical for large numbers of cells. This motivates the development of data-driven cell-type annotation methods. We find limitations with current approaches due to the reliance on known marker genes or from overfitting because of systematic differences, or batch effects, between studies. Here, we present a statistical approach that leverages public data sets to combine information across thousands of genes, uses a latent variable model to define cell-type-specific barcodes and account for batch effect variation, and probabilistically annotates cell-type identity from a reference of known cell types. The barcoding approach also provides a new way to discover marker genes. Using a range of data sets, including those generated to represent imperfect real-world reference data, we demonstrate that our approach substantially outperforms current reference-based methods, particularly when predicting across studies.
Affiliation(s)
- Isabella N Grabski
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Rafael A Irizarry
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA and Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
7
Cable DM, Murray E, Shanmugam V, Zhang S, Zou LS, Diao M, Chen H, Macosko EZ, Irizarry RA, Chen F. Cell type-specific inference of differential expression in spatial transcriptomics. Nat Methods 2022; 19:1076-1087. [PMID: 36050488 PMCID: PMC10463137 DOI: 10.1038/s41592-022-01575-3] [Citation(s) in RCA: 30] [Impact Index Per Article: 15.0] [Received: 01/18/2022] [Accepted: 07/15/2022] [Indexed: 12/13/2022]
Abstract
A central problem in spatial transcriptomics is detecting differentially expressed (DE) genes within cell types across tissue context. Challenges to learning DE include changing cell type composition across space and measurement pixels detecting transcripts from multiple cell types. Here, we introduce a statistical method, cell type-specific inference of differential expression (C-SIDE), that identifies cell type-specific DE in spatial transcriptomics, accounting for localization of other cell types. We model gene expression as an additive mixture across cell types of log-linear cell type-specific expression functions. C-SIDE's framework applies to many contexts: DE due to pathology, anatomical regions, cell-to-cell interactions and cellular microenvironment. Furthermore, C-SIDE enables statistical inference across multiple replicates. Simulations and validation experiments on Slide-seq, MERFISH and Visium datasets demonstrate that C-SIDE accurately identifies DE with valid uncertainty quantification. Last, we apply C-SIDE to identify plaque-dependent immune activity in Alzheimer's disease and cellular interactions between tumor and immune cells. We distribute C-SIDE within the R package spacexr (https://github.com/dmcable/spacexr).
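The modeling sentence in this abstract ("an additive mixture across cell types of log-linear cell type-specific expression functions") can be made concrete with a small sketch. The code below is only an illustration of that structure, not the spacexr API: the function name, argument names, and all numeric values are assumptions, and the full C-SIDE model includes components (such as noise terms and inference) that are omitted here.

```python
import math


def expected_count(total_transcripts, weights, intercepts, slopes, covariate):
    """Expected count for one gene at one pixel under an additive mixture.

    Cell type k contributes weights[k] * exp(intercepts[k] + slopes[k] * covariate),
    i.e. a log-linear, cell type-specific expression function.
    """
    rate = sum(
        w * math.exp(a + b * covariate)
        for w, a, b in zip(weights, intercepts, slopes)
    )
    return total_transcripts * rate


# Hypothetical pixel with 1,000 transcripts, split evenly between two cell types.
weights = [0.5, 0.5]
intercepts = [math.log(0.01), math.log(0.03)]  # baseline expression fractions
slopes = [math.log(2.0), 0.0]                  # only the first type responds

baseline = expected_count(1000, weights, intercepts, slopes, covariate=0.0)
exposed = expected_count(1000, weights, intercepts, slopes, covariate=1.0)
print(baseline, exposed)
```

In this toy setting the first cell type doubles its expression when the covariate goes from 0 to 1, yet the pixel-level expectation rises only from 20 to 25. This dilution by other cell types in the same pixel is why the mixture has to be modeled explicitly to recover cell type-specific effects.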
Affiliation(s)
- Dylan M Cable
  - Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
- Evan Murray
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Vignesh Shanmugam
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Simon Zhang
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Luli S Zou
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biostatistics, Harvard University, Boston, MA, USA
- Michael Diao
  - Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA, USA
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Haiqi Chen
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Cecil H. and Ida Green Center for Reproductive Biology Sciences, University of Texas Southwestern Medical Center, Dallas, TX, USA
  - Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, TX, USA
- Evan Z Macosko
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA, USA
- Rafael A Irizarry
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
  - Department of Biostatistics, Harvard University, Boston, MA, USA
- Fei Chen
  - Broad Institute of MIT and Harvard, Cambridge, MA, USA
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
8
Abstract
Quantifying the impact of natural disasters or epidemics is critical for guiding policy decisions and interventions. When the effects of an event are long-lasting and difficult to detect in the short term, the accumulated effects can be devastating. Mortality is one of the most reliably measured health outcomes, partly due to its unambiguous definition. As a result, excess mortality estimates are an increasingly effective approach for quantifying the effect of an event. However, the fact that indirect effects are often characterized by small, but enduring, increases in mortality rates presents a statistical challenge. This is compounded by sources of variability introduced by demographic changes, secular trends, seasonal and day-of-the-week effects, and natural variation. Here, we present a model that accounts for these sources of variability and characterizes concerning increases in mortality rates with smooth functions of time that provide statistical power. The model permits discontinuities in the smooth functions to model sudden increases due to direct effects. We implement a flexible estimation approach that permits both surveillance of concerning increases in mortality rates and careful characterization of the effect of a past event. We demonstrate our tools' utility by estimating excess mortality after hurricanes in the United States and Puerto Rico. We use Hurricane Maria as a case study to show appealing properties that are unique to our method compared with current approaches. Finally, we show the flexibility of our approach by detecting and quantifying the 2014 Chikungunya outbreak in Puerto Rico and the COVID-19 pandemic in the United States. We make our tools available through the excessmort R package available from https://cran.r-project.org/web/packages/excessmort/.
Affiliation(s)
- Rolando J. Acosta
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Rafael A. Irizarry
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
  - Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
9
Cable DM, Murray E, Zou LS, Goeva A, Macosko EZ, Chen F, Irizarry RA. Robust decomposition of cell type mixtures in spatial transcriptomics. Nat Biotechnol 2022; 40:517-526. [PMID: 33603203 PMCID: PMC8606190 DOI: 10.1038/s41587-021-00830-w] [Citation(s) in RCA: 267] [Impact Index Per Article: 133.5] [Received: 05/07/2020] [Accepted: 12/31/2020] [Indexed: 02/07/2023]
Abstract
A limitation of spatial transcriptomics technologies is that individual measurements may contain contributions from multiple cells, hindering the discovery of cell-type-specific spatial patterns of localization and expression. Here, we develop robust cell type decomposition (RCTD), a computational method that leverages cell type profiles learned from single-cell RNA-seq to decompose cell type mixtures while correcting for differences across sequencing technologies. We demonstrate the ability of RCTD to detect mixtures and identify cell types on simulated datasets. Furthermore, RCTD accurately reproduces known cell type and subtype localization patterns in Slide-seq and Visium datasets of the mouse brain. Finally, we show how RCTD's recovery of cell type localization enables the discovery of genes within a cell type whose expression depends on spatial environment. Spatial mapping of cell types with RCTD enables the spatial components of cellular identity to be defined, uncovering new principles of cellular organization in biological tissue. RCTD is publicly available as an open-source R package at https://github.com/dmcable/RCTD.
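To illustrate the general idea of decomposing a pixel into cell type contributions using reference profiles, here is a deliberately simplified sketch. It is not the RCTD algorithm or its R interface: it swaps in a plain Poisson likelihood with a grid search over a single mixing weight for two cell types, ignores the cross-technology correction the abstract describes, and uses hypothetical names and made-up profiles throughout.

```python
import math


def poisson_log_likelihood(counts, rates):
    """Log-likelihood of independent Poisson counts with the given rates."""
    return sum(
        c * math.log(r) - r - math.lgamma(c + 1) for c, r in zip(counts, rates)
    )


def estimate_weight(counts, profile_a, profile_b, steps=100):
    """Grid-search the share of cell type A that maximizes the likelihood."""
    total = sum(counts)
    best_weight, best_ll = 0.0, -math.inf
    for i in range(steps + 1):
        w = i / steps
        # Expected counts for a pixel that is a fraction w of type A.
        rates = [total * (w * a + (1 - w) * b) for a, b in zip(profile_a, profile_b)]
        ll = poisson_log_likelihood(counts, rates)
        if ll > best_ll:
            best_weight, best_ll = w, ll
    return best_weight


# Hypothetical reference profiles: expression fractions over three genes.
profile_a = [0.7, 0.2, 0.1]
profile_b = [0.1, 0.3, 0.6]
# Pixel counts constructed as 1000 * (0.3 * A + 0.7 * B).
pixel_counts = [280, 270, 450]

weight_a = estimate_weight(pixel_counts, profile_a, profile_b)
print(weight_a)
```

Because the toy counts were built from a 30/70 mixture, the grid search recovers a weight of 0.3 for type A. Real data would add noise, more cell types, and platform effects, which is where a dedicated method is needed.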
Affiliation(s)
- Dylan M. Cable
  - Department of Electrical Engineering and Computer Science, MIT, Cambridge, MA 02139
  - Broad Institute of Harvard and MIT, Cambridge, MA 02142
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215
- Evan Murray
  - Broad Institute of Harvard and MIT, Cambridge, MA 02142
- Luli S. Zou
  - Broad Institute of Harvard and MIT, Cambridge, MA 02142
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215
  - Department of Biostatistics, Harvard University, Boston, MA 02115
- Evan Z. Macosko
  - Broad Institute of Harvard and MIT, Cambridge, MA 02142
  - Department of Psychiatry, Massachusetts General Hospital, Boston, MA 02114
- Fei Chen
  - Broad Institute of Harvard and MIT, Cambridge, MA 02142
  - Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138
- Rafael A. Irizarry
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA 02215
  - Department of Biostatistics, Harvard University, Boston, MA 02115
10
Robles-Fontán MM, Nieves EG, Cardona-Gerena I, Irizarry RA. Effectiveness estimates of three COVID-19 vaccines based on observational data from Puerto Rico. Lancet Reg Health Am 2022; 9:100212. [PMID: 35229081 PMCID: PMC8867062 DOI: 10.1016/j.lana.2022.100212] [Citation(s) in RCA: 10] [Impact Index Per Article: 5.0] [Indexed: 12/29/2022]
Abstract
Background On July 15, 2021, with 58% of the population fully vaccinated, the start of a COVID-19 surge was observed in Puerto Rico. On July 22, 2021, the government of Puerto Rico started imposing a series of strict vaccine mandates. Two months later, over 70% of the population was vaccinated, more than in any US state, and laboratory-confirmed SARS-CoV-2 had dropped substantially. The decision to impose mandates, as well as current Department of Health recommendations related to boosters, were guided by the data and the effectiveness estimates presented here. Methods Between December 15, 2020, when the vaccination process began in Puerto Rico, and October 15, 2021, 2,276,966 individuals were fully vaccinated against COVID-19. During this period 112,726 laboratory-confirmed SARS-CoV-2 infections were reported. These data permitted us to quantify the outcomes of the immunization campaign and to compare effectiveness of the mRNA-1273 (Moderna), BNT162b2 (Pfizer), and Ad26.COV2.S (J&J) vaccines. We obtained vaccination status, SARS-CoV-2 test results, and COVID-19 hospitalizations and deaths from the Department of Health. We fit statistical models that adjusted for time-varying incidence rates and age group to estimate vaccine effectiveness, as a function of time since vaccination, against lab-confirmed SARS-CoV-2 infection and COVID-19 hospitalization and death. Results Two weeks after final dose, the mRNA-1273, BNT162b2, and Ad26.COV2.S vaccines had an effectiveness of 90% (95% CI: 88–91), 87% (85–88), and 64% (58–69), respectively. After five months, effectiveness waned to about 70%, 50%, and 40%, respectively. We found no evidence that effectiveness was different after the Delta variant became dominant. For those infected, the vaccines provided further protection against COVID-19 hospitalization and deaths across all age groups, and this conditional effect did not wane in time.
Interpretation The mRNA-1273 and BNT162b2 vaccines were highly effective across all age groups. They were still effective after five months although the protection against SARS-CoV-2 infection waned. The Ad26.COV2.S vaccine was effective but to a lesser degree compared to the mRNA vaccines. Although, conditional on infection, protection against adverse outcomes did not wane, the waning in effectiveness resulted in a decreased protection against serious COVID-19 outcomes across time. Funding RAI's work was partly funded by NIH Grant R35GM131802.
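For readers unfamiliar with how an effectiveness percentage relates to incidence rates, the standard definition is one minus the ratio of the incidence rate among the vaccinated to that among the unvaccinated. The sketch below shows only this crude, unadjusted calculation with hypothetical counts; the study itself fit statistical models adjusting for time-varying incidence and age group, which this example does not attempt to reproduce.

```python
def incidence_rate(cases, person_days):
    """Cases per person-day of follow-up."""
    return cases / person_days


def vaccine_effectiveness(cases_vax, person_days_vax, cases_unvax, person_days_unvax):
    """VE = 1 - (incidence rate in vaccinated / incidence rate in unvaccinated)."""
    rate_ratio = incidence_rate(cases_vax, person_days_vax) / incidence_rate(
        cases_unvax, person_days_unvax
    )
    return 1.0 - rate_ratio


# Hypothetical counts chosen only to illustrate the arithmetic.
ve = vaccine_effectiveness(
    cases_vax=10,
    person_days_vax=100_000,
    cases_unvax=100,
    person_days_unvax=100_000,
)
print(f"Crude vaccine effectiveness: {ve:.0%}")
```

With equal follow-up in both groups, ten times fewer cases among the vaccinated corresponds to an effectiveness of 90%. Waning, as reported in the abstract, would appear as this quantity decreasing when computed over later windows since vaccination.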
Affiliation(s)
- Elvis G Nieves
  - Puerto Rico Department of Health, Río Piedras, PR, United States
- Rafael A Irizarry
  - Department of Data Science, Dana-Farber Cancer Institute, CLSB 11007, 450 Brookline Ave, Boston, MA 02215, United States
  - Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115
11
Acosta RJ, Patnaik B, Buckee C, Kiang MV, Irizarry RA, Balsari S, Mahmud A. All-cause excess mortality across 90 municipalities in Gujarat, India, during the COVID-19 pandemic (March 2020-April 2021). PLOS Glob Public Health 2022; 2:e0000824. [PMID: 36962751 PMCID: PMC10021770 DOI: 10.1371/journal.pgph.0000824] [Citation(s) in RCA: 2] [Impact Index Per Article: 1.0] [Received: 02/01/2022] [Accepted: 06/29/2022] [Indexed: 11/19/2022]
Abstract
Official COVID-19 mortality statistics are strongly influenced by local diagnostic capacity, strength of the healthcare and vital registration systems, and death certification criteria and capacity, often resulting in significant undercounting of COVID-19 attributable deaths. Excess mortality, which is defined as the increase in observed death counts compared to a baseline expectation, provides an alternate measure of the mortality shock, both direct and indirect, of the COVID-19 pandemic. Here, we use data from civil death registers from a convenience sample of 90 (of 162) municipalities across the state of Gujarat, India, to estimate the impact of the COVID-19 pandemic on all-cause mortality. Using a model fit to weekly data from January 2019 to February 2020, we estimated excess mortality over the course of the pandemic from March 2020 to April 2021. During this period, the official government data reported 10,098 deaths attributable to COVID-19 for the entire state of Gujarat. We estimated 21,300 [95% CI: 20,700, 22,000] excess deaths across these 90 municipalities in this period, representing a 44% [95% CI: 43%, 45%] increase over the expected baseline. The sharpest increase in deaths in our sample was observed in late April 2021, with an estimated 678% [95% CI: 649%, 707%] increase in mortality from expected counts. The 40 to 65 age group experienced the highest increase in mortality relative to the other age groups. We found substantial increases in mortality for males and females. Our excess mortality estimate for these 90 municipalities, representing at least approximately 8% of the population based on the 2011 census, exceeds the official COVID-19 death count for the entire state of Gujarat, even before the delta wave of the pandemic in India peaked in May 2021. Prior studies have concluded that true pandemic-related mortality in India greatly exceeds official counts.
This study, using data directly from the first point of official death registration data recording, provides incontrovertible evidence of the high excess mortality in Gujarat from March 2020 to April 2021.
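The definition of excess mortality given in the abstract (observed deaths minus a baseline expectation, also expressed as a percent increase over that baseline) can be sketched in a few lines. The weekly counts below are hypothetical and the function name is an assumption; the study's baseline came from a model fit to 2019 and early 2020 data, which is not reproduced here.

```python
def excess_mortality(observed, expected):
    """Return (excess deaths, percent increase over the expected baseline)."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must cover the same weeks")
    total_expected = sum(expected)
    excess = sum(observed) - total_expected
    return excess, 100.0 * excess / total_expected


# Hypothetical weekly death counts, for illustration only.
observed = [120, 150, 210, 400]
expected = [100, 100, 110, 90]
excess, pct_increase = excess_mortality(observed, expected)
print(f"{excess} excess deaths, a {pct_increase:.0f}% increase over baseline")

# Back-of-the-envelope check using the rounded figures quoted in the abstract:
# 21,300 excess deaths described as a 44% increase imply a baseline of
# roughly 21,300 / 0.44 expected deaths over the period.
implied_expected = 21_300 / 0.44
print(f"Implied expected deaths: about {implied_expected:,.0f}")
```

Because the abstract's figures are rounded, the implied baseline of roughly 48,000 deaths is only approximate.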
Affiliation(s)
- Rolando J Acosta
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | | | - Caroline Buckee
- Center for Communicable Disease Dynamics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Mathew V Kiang
- Department of Epidemiology and Population Health, Stanford University School of Medicine, Palo Alto, California, United States of America
| | - Rafael A Irizarry
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
- Department of Data Science, Dana-Farber Cancer Institute, Boston, Massachusetts, United States of America
| | - Satchit Balsari
- Department of Emergency Medicine, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, Massachusetts, United States of America
- Department of Global Health and Population, Harvard T. H. Chan School of Public Health, Boston, Massachusetts, United States of America
| | - Ayesha Mahmud
- Department of Demography, University of California, Berkeley, California, United States of America
|
12
|
Schnell A, Huang L, Singer M, Singaraju A, Barilla RM, Regan BML, Bollhagen A, Thakore PI, Dionne D, Delorey TM, Pawlak M, Meyer Zu Horste G, Rozenblatt-Rosen O, Irizarry RA, Regev A, Kuchroo VK. Stem-like intestinal Th17 cells give rise to pathogenic effector T cells during autoimmunity. Cell 2021; 184:6281-6298.e23. [PMID: 34875227 PMCID: PMC8900676 DOI: 10.1016/j.cell.2021.11.018] [Citation(s) in RCA: 90] [Impact Index Per Article: 30.0] [Received: 04/27/2021] [Revised: 10/13/2021] [Accepted: 11/11/2021] [Indexed: 12/24/2022]
Abstract
While intestinal Th17 cells are critical for maintaining tissue homeostasis, recent studies have implicated their roles in the development of extra-intestinal autoimmune diseases including multiple sclerosis. However, the mechanisms by which tissue Th17 cells mediate these dichotomous functions remain unknown. Here, we characterized the heterogeneity, plasticity, and migratory phenotypes of tissue Th17 cells in vivo by combined fate mapping with profiling of the transcriptomes and TCR clonotypes of over 84,000 Th17 cells at homeostasis and during CNS autoimmune inflammation. Inter- and intra-organ single-cell analyses revealed a homeostatic, stem-like TCF1+ IL-17+ SLAMF6+ population that traffics to the intestine where it is maintained by the microbiota, providing a ready reservoir for the IL-23-driven generation of encephalitogenic GM-CSF+ IFN-γ+ CXCR6+ T cells. Our study defines a direct in vivo relationship between IL-17+ non-pathogenic and GM-CSF+ and IFN-γ+ pathogenic Th17 populations and provides a mechanism by which homeostatic intestinal Th17 cells direct extra-intestinal autoimmune disease.
Affiliation(s)
- Alexandra Schnell
- Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, MA 02115, USA; Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Linglin Huang
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Meromit Singer
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Immunology, Blavatnik Institute, Harvard Medical School, Boston, MA 02115, USA
| | - Anvita Singaraju
- Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, MA 02115, USA
| | - Rocky M Barilla
- Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, MA 02115, USA
| | - Brianna M L Regan
- Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, MA 02115, USA
| | - Alina Bollhagen
- Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, MA 02115, USA; German Cancer Research Center, DKFZ, Heidelberg 69120, Germany
| | - Pratiksha I Thakore
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Danielle Dionne
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Toni M Delorey
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Mathias Pawlak
- Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, MA 02115, USA; Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Gerd Meyer Zu Horste
- Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, MA 02115, USA
| | - Orit Rozenblatt-Rosen
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Rafael A Irizarry
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
| | - Aviv Regev
- Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
| | - Vijay K Kuchroo
- Evergrande Center for Immunologic Diseases, Harvard Medical School and Brigham and Women's Hospital, Boston, MA 02115, USA; Klarman Cell Observatory, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
|
13
|
Teng M, Du D, Chen D, Irizarry RA. Characterizing batch effects and binding site-specific variability in ChIP-seq data. NAR Genom Bioinform 2021; 3:lqab098. [PMID: 34661103 PMCID: PMC8515842 DOI: 10.1093/nargab/lqab098] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 06/16/2021] [Revised: 09/15/2021] [Accepted: 10/05/2021] [Indexed: 11/12/2022] Open
Abstract
Multiple sources of variability can bias ChIP-seq data toward inferring transcription factor (TF) binding profiles. As ChIP-seq datasets increase in public repositories, it is now possible and necessary to account for complex sources of variability in ChIP-seq data analysis. We find that two types of variability, the batch effects by sequencing laboratories and differences between biological replicates, not associated with changes in condition or state, vary across genomic sites. This implies that observed differences between samples from different conditions or states, such as cell-type, must be assessed statistically, with an understanding of the distribution of obscuring noise. We present a statistical approach that characterizes both differences of interest and these sources of variability through the parameters of a mixed effects model. We demonstrate the utility of our approach on a CTCF binding dataset composed of 211 samples representing 90 different cell-types measured across three different laboratories. The results revealed that sites exhibiting large variability were associated with sequence characteristics such as GC-content and low complexity. Finally, we identified TFs associated with high-variance CTCF sites using TF motifs documented in public databases, pointing to the possibility of these being false positives if the sources of variability are not properly accounted for.
Affiliation(s)
- Mingxiang Teng
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL 33612, USA
| | - Dongliang Du
- Department of Biostatistics and Bioinformatics, Moffitt Cancer Center, Tampa, FL 33612, USA
| | - Danfeng Chen
- Lewis-Sigler Institute for Integrative Genomics, Princeton University, Princeton, NJ 08544, USA
| | - Rafael A Irizarry
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA
|
14
|
Islam N, Shkolnikov VM, Acosta RJ, Klimkin I, Kawachi I, Irizarry RA, Alicandro G, Khunti K, Yates T, Jdanov DA, White M, Lewington S, Lacey B. Excess deaths associated with covid-19 pandemic in 2020: age and sex disaggregated time series analysis in 29 high income countries. BMJ 2021; 373:n1137. [PMID: 34011491 PMCID: PMC8132017 DOI: 10.1136/bmj.n1137] [Citation(s) in RCA: 214] [Impact Index Per Article: 71.3] [Accepted: 04/29/2021] [Indexed: 02/02/2023]
Abstract
OBJECTIVE To estimate the direct and indirect effects of the covid-19 pandemic on mortality in 2020 in 29 high income countries with reliable and complete age and sex disaggregated mortality data. DESIGN Time series study of high income countries. SETTING Austria, Belgium, Czech Republic, Denmark, England and Wales, Estonia, Finland, France, Germany, Greece, Hungary, Israel, Italy, Latvia, Lithuania, the Netherlands, New Zealand, Northern Ireland, Norway, Poland, Portugal, Scotland, Slovakia, Slovenia, South Korea, Spain, Sweden, Switzerland, and United States. PARTICIPANTS Mortality data from the Short-term Mortality Fluctuations data series of the Human Mortality Database for 2016-20, harmonised and disaggregated by age and sex. INTERVENTIONS Covid-19 pandemic and associated policy measures. MAIN OUTCOME MEASURES Weekly excess deaths (observed deaths versus expected deaths predicted by model) in 2020, by sex and age (0-14, 15-64, 65-74, 75-84, and ≥85 years), estimated using an over-dispersed Poisson regression model that accounts for temporal trends and seasonal variability in mortality. RESULTS An estimated 979 000 (95% confidence interval 954 000 to 1 001 000) excess deaths occurred in 2020 in the 29 high income countries analysed. All countries had excess deaths in 2020, except New Zealand, Norway, and Denmark. The five countries with the highest absolute number of excess deaths were the US (458 000, 454 000 to 461 000), Italy (89 100, 87 500 to 90 700), England and Wales (85 400, 83 900 to 86 800), Spain (84 100, 82 800 to 85 300), and Poland (60 100, 58 800 to 61 300). New Zealand had lower overall mortality than expected (-2500, -2900 to -2100). In many countries, the estimated number of excess deaths substantially exceeded the number of reported deaths from covid-19. 
The highest excess death rates (per 100 000) in men were in Lithuania (285, 259 to 311), Poland (191, 184 to 197), Spain (179, 174 to 184), Hungary (174, 161 to 188), and Italy (168, 163 to 173); the highest rates in women were in Lithuania (210, 185 to 234), Spain (180, 175 to 185), Hungary (169, 156 to 182), Slovenia (158, 132 to 184), and Belgium (151, 141 to 162). Little evidence was found of subsequent compensatory reductions following excess mortality. CONCLUSION Approximately one million excess deaths occurred in 2020 in these 29 high income countries. Age standardised excess death rates were higher in men than women in almost all countries. Excess deaths substantially exceeded reported deaths from covid-19 in many countries, indicating that determining the full impact of the pandemic on mortality requires assessment of excess deaths. Many countries had lower deaths than expected in children <15 years. Sex inequality in mortality widened further in most countries in 2020.
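The outcome measure above relies on an over-dispersed Poisson model, in which the variance of weekly deaths is a dispersion factor times the mean rather than equal to it. A minimal sketch of one common way to estimate that factor, the Pearson chi-square statistic divided by the residual degrees of freedom, is shown below. It assumes fitted weekly means are already available; the counts, function names, and normal-approximation range are illustrative assumptions, not the paper's implementation.

```python
import math


def pearson_dispersion(observed, fitted, n_params):
    """Estimate the dispersion factor as Pearson chi-square / residual d.o.f."""
    if len(observed) != len(fitted):
        raise ValueError("observed and fitted must have the same length")
    dof = len(observed) - n_params
    if dof <= 0:
        raise ValueError("need more observations than model parameters")
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / dof


def plausible_count_range(mu, dispersion, z=1.96):
    """Approximate range for one weekly count with variance dispersion * mu.

    Uses a normal approximation and ignores uncertainty in mu itself.
    """
    half_width = z * math.sqrt(dispersion * mu)
    return mu - half_width, mu + half_width


# Hypothetical weekly deaths and fitted means from an intercept-only model.
observed = [90, 130, 80, 120, 100]
fitted = [100.0] * 5
phi = pearson_dispersion(observed, fitted, n_params=1)
low, high = plausible_count_range(100.0, phi)
print(f"dispersion = {phi:.2f}; plausible weekly range = {low:.0f} to {high:.0f}")
```

A dispersion factor well above 1, as in this toy example, widens the range of counts considered consistent with the baseline, so a week is flagged as having excess deaths only when it exceeds that wider range.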
Affiliation(s)
- Nazrul Islam
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
| | - Vladimir M Shkolnikov
- Max Planck Institute for Demographic Research, Rostock, Germany
- International Laboratory for Population and Health, National Research University Higher School of Economics, Moscow, Russian Federation
| | - Rolando J Acosta
- Department of Biostatistics, Harvard T H Chan School of Public Health, Harvard University, Boston, MA, USA
| | - Ilya Klimkin
- International Laboratory for Population and Health, National Research University Higher School of Economics, Moscow, Russian Federation
| | - Ichiro Kawachi
- Department of Social and Behavioral Sciences, Harvard T H Chan School of Public Health, Harvard University, Boston, MA, USA
| | - Rafael A Irizarry
- Department of Biostatistics, Harvard T H Chan School of Public Health, Harvard University, Boston, MA, USA
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Gianfranco Alicandro
- Department of Pathophysiology and Transplantation, Università degli Studi di Milano, Milan, Italy
| | - Kamlesh Khunti
- Diabetes Research Centre, University of Leicester, Leicester, UK
- NIHR Applied Research Collaboration-East Midlands, Leicester General Hospital, Leicester, UK
| | - Tom Yates
- Diabetes Research Centre, University of Leicester, Leicester, UK
- NIHR Leicester Biomedical Research Centre, University Hospitals of Leicester NHS Trust and University of Leicester, Leicester, UK
| | - Dmitri A Jdanov
- Max Planck Institute for Demographic Research, Rostock, Germany
- International Laboratory for Population and Health, National Research University Higher School of Economics, Moscow, Russian Federation
| | - Martin White
- MRC Epidemiology Unit, University of Cambridge, Cambridge, UK
| | - Sarah Lewington
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
- MRC Population Health Research Unit, Nuffield Department of Population Health, University of Oxford, Oxford, UK
| | - Ben Lacey
- Clinical Trial Service Unit and Epidemiological Studies Unit (CTSU), Nuffield Department of Population Health, University of Oxford, Oxford, UK
|
15
|
Braun DA, Street K, Burke KP, Cookmeyer DL, Denize T, Pedersen CB, Gohil SH, Schindler N, Pomerance L, Hirsch L, Bakouny Z, Hou Y, Forman J, Huang T, Li S, Cui A, Keskin DB, Steinharter J, Bouchard G, Sun M, Pimenta EM, Xu W, Mahoney KM, McGregor BA, Hirsch MS, Chang SL, Livak KJ, McDermott DF, Shukla SA, Olsen LR, Signoretti S, Sharpe AH, Irizarry RA, Choueiri TK, Wu CJ. Progressive immune dysfunction with advancing disease stage in renal cell carcinoma. Cancer Cell 2021; 39:632-648.e8. [PMID: 33711273 PMCID: PMC8138872 DOI: 10.1016/j.ccell.2021.02.013] [Citation(s) in RCA: 204] [Impact Index Per Article: 68.0] [Received: 07/17/2020] [Revised: 10/19/2020] [Accepted: 02/17/2021] [Indexed: 02/06/2023]
Abstract
The tumor immune microenvironment plays a critical role in cancer progression and response to immunotherapy in clear cell renal cell carcinoma (ccRCC), yet the composition and phenotypic states of immune cells in this tumor are incompletely characterized. We performed single-cell RNA and T cell receptor sequencing on 164,722 individual cells from tumor and adjacent non-tumor tissue in patients with ccRCC across disease stages: early, locally advanced, and advanced/metastatic. Terminally exhausted CD8+ T cells were enriched in metastatic disease and were restricted in T cell receptor diversity. Within the myeloid compartment, pro-inflammatory macrophages were decreased, and suppressive M2-like macrophages were increased in advanced disease. Terminally exhausted CD8+ T cells and M2-like macrophages co-occurred in advanced disease and expressed ligands and receptors that support T cell dysfunction and M2-like polarization. This immune dysfunction circuit is associated with a worse prognosis in external cohorts and identifies potentially targetable immune inhibitory pathways in ccRCC.
Affiliation(s)
- David A Braun
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Harvard Medical School, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Kelly Street
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02215, USA
| | - Kelly P Burke
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Harvard Medical School, Boston, MA 02215, USA; Department of Immunology, Blavatnik Institute, Harvard Medical School, Boston, MA 02215, USA
| | - David L Cookmeyer
- Harvard Medical School, Boston, MA 02215, USA; Department of Immunology, Blavatnik Institute, Harvard Medical School, Boston, MA 02215, USA
| | - Thomas Denize
- Harvard Medical School, Boston, MA 02215, USA; Department of Pathology, Brigham and Women's Hospital, Boston, MA 02115, USA
| | - Christina B Pedersen
- Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark; Center for Genomic Medicine, Rigshospitalet - Copenhagen University Hospital, Copenhagen, Denmark
| | - Satyen H Gohil
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Harvard Medical School, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Department of Academic Hematology, University College London, London, UK
| | - Nicholas Schindler
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Lucas Pomerance
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Harvard Medical School, Boston, MA 02215, USA
| | - Laure Hirsch
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Harvard Medical School, Boston, MA 02215, USA
| | - Ziad Bakouny
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Yue Hou
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Translational Immunogenomics Lab, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Juliet Forman
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Translational Immunogenomics Lab, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Teddy Huang
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Translational Immunogenomics Lab, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Shuqiang Li
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Translational Immunogenomics Lab, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Ang Cui
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Harvard-MIT Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA
| | - Derin B Keskin
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Translational Immunogenomics Lab, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - John Steinharter
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Gabrielle Bouchard
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Maxine Sun
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Erica M Pimenta
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Harvard Medical School, Boston, MA 02215, USA
| | - Wenxin Xu
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Harvard Medical School, Boston, MA 02215, USA
| | - Kathleen M Mahoney
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Harvard Medical School, Boston, MA 02215, USA; Division of Medical Oncology, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
| | - Bradley A McGregor
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Harvard Medical School, Boston, MA 02215, USA
| | - Michelle S Hirsch
- Harvard Medical School, Boston, MA 02215, USA; Department of Pathology, Brigham and Women's Hospital, Boston, MA 02115, USA
| | - Steven L Chang
- Harvard Medical School, Boston, MA 02215, USA; Division of Urologic Surgery, Brigham and Women's Hospital, Boston, MA 02115, USA
| | - Kenneth J Livak
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Translational Immunogenomics Lab, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - David F McDermott
- Harvard Medical School, Boston, MA 02215, USA; Division of Medical Oncology, Beth Israel Deaconess Medical Center, Boston, MA 02215, USA
| | - Sachet A Shukla
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA; Translational Immunogenomics Lab, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Lars R Olsen
- Section for Bioinformatics, Department of Health Technology, Technical University of Denmark, Kongens Lyngby, Denmark; Center for Genomic Medicine, Rigshospitalet - Copenhagen University Hospital, Copenhagen, Denmark
| | - Sabina Signoretti
- Harvard Medical School, Boston, MA 02215, USA; Department of Pathology, Brigham and Women's Hospital, Boston, MA 02115, USA; Department of Oncologic Pathology, Dana-Farber Cancer Institute, Boston, MA 02215, USA
| | - Arlene H Sharpe
- Harvard Medical School, Boston, MA 02215, USA; Department of Immunology, Blavatnik Institute, Harvard Medical School, Boston, MA 02215, USA; Department of Pathology, Brigham and Women's Hospital, Boston, MA 02115, USA; Evergrande Center for Immunologic Diseases, Harvard Medical School & Brigham and Women's Hospital, Boston, MA 02115, USA
| | - Rafael A Irizarry
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02215, USA
| | - Toni K Choueiri
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Harvard Medical School, Boston, MA 02215, USA
| | - Catherine J Wu
- Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA 02215, USA; Harvard Medical School, Boston, MA 02215, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.
|
16
|
Abstract
Population displacement may occur after natural disasters, permanently altering the demographic composition of the affected regions. Measuring this displacement is vital for both optimal postdisaster resource allocation and calculation of measures of public health interest such as mortality estimates. Here, we analyzed data generated by mobile phones and social media to estimate the weekly island-wide population at risk and within-island geographic heterogeneity of migration in Puerto Rico after Hurricane Maria. We compared these two data sources with population estimates derived from air travel records and census data. We observed a loss of population across all data sources throughout the study period; however, the magnitude and dynamics differ by the data source. Census data predict a population loss of just over 129,000 from July 2017 to July 2018, a 4% decrease; air travel data predict a population loss of 168,295 for the same period, a 5% decrease; mobile phone-based estimates predict a loss of 235,375 from July 2017 to May 2018, an 8% decrease; and social media-based estimates predict a loss of 476,779 from August 2017 to August 2018, a 17% decrease. On average, municipalities with a smaller population size lost a bigger proportion of their population. Moreover, we infer that these municipalities experienced greater infrastructure damage as measured by the proportion of unknown locations stemming from these regions. Finally, our analysis measures a general shift of population from rural to urban centers within the island. Passively collected data provide a promising supplement to current at-risk population estimation procedures; however, each data source has its own biases and limitations.
Affiliation(s)
- Rolando J Acosta
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115
| | - Nishant Kishore
- Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA 02115
| | - Rafael A Irizarry
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA 02115
- Department of Data Science, Dana-Farber Cancer Institute, Boston, MA 02215
| | - Caroline O Buckee
- Center for Communicable Disease Dynamics, Department of Epidemiology, Harvard T. H. Chan School of Public Health, Boston, MA 02115
|
17
|
Abstract
As of mid-August 2020, more than 170 000 U.S. residents have died of coronavirus disease 2019 (COVID-19); however, the true number of deaths resulting from COVID-19, both directly and indirectly, is likely to be much higher. The proper attribution of deaths to this pandemic has a range of societal, legal, mortuary, and public health consequences. This article discusses the current difficulties of disaster death attribution and describes the strengths and limitations of relying on death counts from death certificates, estimations of indirect deaths, and estimations of excess mortality. Improving the tabulation of direct and indirect deaths on death certificates will require concerted efforts and consensus across medical institutions and public health agencies. In addition, actionable estimates of excess mortality will require timely access to standardized and structured vital registry data, which should be shared directly at the state level to ensure rapid response for local governments. Correct attribution of direct and indirect deaths and estimation of excess mortality are complementary goals that are critical to our understanding of the pandemic and its effect on human life.
Affiliation(s)
- Mathew V Kiang
- Stanford University School of Medicine, Stanford, California (M.V.K.)
| | | | - Caroline O Buckee
- Harvard T.H. Chan School of Public Health, Boston, Massachusetts (C.O.B.)
|
18
|
Johnstone SE, Reyes A, Qi Y, Adriaens C, Hegazi E, Pelka K, Chen JH, Zou LS, Drier Y, Hecht V, Shoresh N, Selig MK, Lareau CA, Iyer S, Nguyen SC, Joyce EF, Hacohen N, Irizarry RA, Zhang B, Aryee MJ, Bernstein BE. Large-Scale Topological Changes Restrain Malignant Progression in Colorectal Cancer. Cell 2020; 182:1474-1489.e23. [PMID: 32841603 PMCID: PMC7575124 DOI: 10.1016/j.cell.2020.07.030] [Citation(s) in RCA: 92] [Impact Index Per Article: 23.0] [Received: 07/08/2019] [Revised: 05/04/2020] [Accepted: 07/20/2020] [Indexed: 02/06/2023]
Abstract
Widespread changes to DNA methylation and chromatin are well documented in cancer, but the fate of higher-order chromosomal structure remains obscure. Here we integrated topological maps for colon tumors and normal colons with epigenetic, transcriptional, and imaging data to characterize alterations to chromatin loops, topologically associated domains, and large-scale compartments. We found that spatial partitioning of the open and closed genome compartments is profoundly compromised in tumors. This reorganization is accompanied by compartment-specific hypomethylation and chromatin changes. Additionally, we identify a compartment at the interface between the canonical A and B compartments that is reorganized in tumors. Remarkably, similar shifts were evident in non-malignant cells that have accumulated excess divisions. Our analyses suggest that these topological changes repress stemness and invasion programs while inducing anti-tumor immunity genes and may therefore restrain malignant progression. Our findings call into question the conventional view that tumor-associated epigenomic alterations are primarily oncogenic.
Affiliation(s)
- Sarah E Johnstone
  - Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Center for Cancer Research, Massachusetts General Hospital, Boston, MA 02129, USA
- Alejandro Reyes
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Department of Data Sciences, Dana Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA
- Yifeng Qi
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Carmen Adriaens
  - Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Center for Cancer Research, Massachusetts General Hospital, Boston, MA 02129, USA
- Esmat Hegazi
  - Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Center for Cancer Research, Massachusetts General Hospital, Boston, MA 02129, USA
- Karin Pelka
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Center for Cancer Research, Massachusetts General Hospital, Boston, MA 02129, USA
- Jonathan H Chen
  - Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Center for Cancer Research, Massachusetts General Hospital, Boston, MA 02129, USA
- Luli S Zou
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Department of Data Sciences, Dana Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA
- Yotam Drier
  - The Lautenberg Center for Immunology and Cancer Research, The Hebrew University, Jerusalem, Israel
- Vivian Hecht
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Noam Shoresh
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA
- Martin K Selig
  - Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Caleb A Lareau
  - Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02215, USA
- Sowmya Iyer
  - Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA
- Son C Nguyen
  - Department of Genetics, Penn Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Eric F Joyce
  - Department of Genetics, Penn Epigenetics Institute, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Nir Hacohen
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Center for Cancer Research, Massachusetts General Hospital, Boston, MA 02129, USA
- Rafael A Irizarry
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Department of Data Sciences, Dana Farber Cancer Institute, Boston, MA 02215, USA; Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA
- Bin Zhang
  - Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Department of Chemistry, Massachusetts Institute of Technology, Cambridge, MA 02139, USA
- Martin J Aryee
  - Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Center for Cancer Research, Massachusetts General Hospital, Boston, MA 02129, USA; Department of Biostatistics, Harvard School of Public Health, Boston, MA 02215, USA
- Bradley E Bernstein
  - Department of Pathology, Massachusetts General Hospital and Harvard Medical School, Boston, MA 02114, USA; Broad Institute of MIT and Harvard, Cambridge, MA 02139, USA; Center for Cancer Research, Massachusetts General Hospital, Boston, MA 02129, USA
|
19
|
Townes FW, Hicks SC, Aryee MJ, Irizarry RA. Author Correction: Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol 2020; 21:179. [PMID: 32698904 PMCID: PMC7374840 DOI: 10.1186/s13059-020-02109-w] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.8] [Indexed: 12/11/2022] Open
Affiliation(s)
- F William Townes
  - Department of Biostatistics, Harvard University, Cambridge, MA, USA; Present Address: Department of Computer Science, Princeton University, Princeton, NJ, USA
- Stephanie C Hicks
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
- Martin J Aryee
  - Department of Biostatistics, Harvard University, Cambridge, MA, USA; Molecular Pathology Unit, Massachusetts General Hospital, Charlestown, MA, USA; Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA, USA; Department of Pathology, Harvard Medical School, Boston, MA, USA
- Rafael A Irizarry
  - Department of Biostatistics, Harvard University, Cambridge, MA, USA; Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
|
20
|
Abstract
Single-cell RNA-seq (scRNA-seq) profiles gene expression of individual cells. Unique molecular identifiers (UMIs) remove duplicates in read counts resulting from polymerase chain reaction, a major source of noise. For scRNA-seq data lacking UMIs, we propose quasi-UMIs: quantile normalization of read counts to a compound Poisson distribution empirically derived from UMI datasets. When applied to ground-truth datasets having both reads and UMIs, quasi-UMI normalization has higher accuracy than competing methods. Using quasi-UMIs enables methods designed specifically for UMI data to be applied to non-UMI scRNA-seq datasets.
Affiliation(s)
- F. William Townes
  - Department of Computer Science, Princeton University, Princeton, NJ USA
- Rafael A. Irizarry
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA USA
  - Department of Biostatistics, Harvard University, Cambridge, MA USA
|
21
|
Nuzzo PV, Berchuck JE, Korthauer K, Spisak S, Nassar AH, Abou Alaiwi S, Chakravarthy A, Shen SY, Bakouny Z, Boccardo F, Steinharter J, Bouchard G, Curran CR, Pan W, Baca SC, Seo JH, Lee GSM, Michaelson MD, Chang SL, Waikar SS, Sonpavde G, Irizarry RA, Pomerantz M, De Carvalho DD, Choueiri TK, Freedman ML. Detection of renal cell carcinoma using plasma and urine cell-free DNA methylomes. Nat Med 2020; 26:1041-1043. [PMID: 32572266 DOI: 10.1038/s41591-020-0933-1] [Citation(s) in RCA: 126] [Impact Index Per Article: 31.5] [Received: 09/25/2019] [Accepted: 05/08/2020] [Indexed: 12/24/2022]
Abstract
Improving early cancer detection has the potential to substantially reduce cancer-related mortality. Cell-free methylated DNA immunoprecipitation and high-throughput sequencing (cfMeDIP-seq) is a highly sensitive assay capable of detecting early-stage tumors. We report accurate classification of patients across all stages of renal cell carcinoma (RCC) in plasma (area under the receiver operating characteristic (AUROC) curve of 0.99) and demonstrate the validity of this assay to identify patients with RCC using urine cell-free DNA (cfDNA; AUROC of 0.86).
Affiliation(s)
- Pier Vitale Nuzzo
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Department of Internal Medicine and Medical Specialties, School of Medicine, University of Genoa, Genoa, Italy
- Jacob E Berchuck
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Keegan Korthauer
  - Department of Statistics, University of British Columbia, Vancouver, British Columbia, Canada; BC Children's Hospital Research Institute, Vancouver, British Columbia, Canada
- Sandor Spisak
  - Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Amin H Nassar
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Sarah Abou Alaiwi
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Ankur Chakravarthy
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Shu Yi Shen
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada
- Ziad Bakouny
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Francesco Boccardo
  - Department of Internal Medicine and Medical Specialties, School of Medicine, University of Genoa, Genoa, Italy; Academic Unit of Medical Oncology, IRCCS San Martino Polyclinic Hospital, Genoa, Italy
- John Steinharter
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Gabrielle Bouchard
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Catherine R Curran
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Wenting Pan
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Sylvan C Baca
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; The Eli and Edythe L. Broad Institute, Cambridge, MA, USA
- Ji-Heui Seo
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Gwo-Shu Mary Lee
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- M Dror Michaelson
  - Massachusetts General Hospital Cancer Center, Hematology/Oncology, Boston, MA, USA
- Steven L Chang
  - Division of Urology, Brigham and Women's Hospital, Boston, MA, USA
- Sushrut S Waikar
  - Division of Renal Medicine, Brigham and Women's Hospital, Boston, MA, USA; Section of Nephrology, Boston University Medical Center, Boston, MA, USA
- Guru Sonpavde
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Rafael A Irizarry
  - Department of Biostatistics, Harvard University, Cambridge, MA, USA; Department of Data Sciences, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Mark Pomerantz
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Daniel D De Carvalho
  - Princess Margaret Cancer Centre, University Health Network, Toronto, Ontario, Canada; Department of Medical Biophysics, University of Toronto, Toronto, Ontario, Canada
- Toni K Choueiri
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; The Eli and Edythe L. Broad Institute, Cambridge, MA, USA
- Matthew L Freedman
  - Department of Medical Oncology, The Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; Center for Functional Cancer Epigenetics, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA; The Eli and Edythe L. Broad Institute, Cambridge, MA, USA
|
22
|
Nuzzo PV, Berchuck JE, Spisak S, Korthauer K, Nassar A, Abou Alaiwi S, Chakravarthy A, Shen SY, Bakouny Z, Boccardo F, Baca S, Lee GSM, Chang SL, Waikar S, Sonpavde G, Irizarry RA, Pomerantz M, De Carvalho D, Freedman ML, Choueiri TK. Sensitive detection of renal cell carcinoma using plasma and urine cell-free DNA methylomes. J Clin Oncol 2020. [DOI: 10.1200/jco.2020.38.6_suppl.728] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Indexed: 11/20/2022] Open
Abstract
728 Background: Improving early cancer detection has the potential to significantly reduce cancer-related mortality. Cell-free methylated DNA immunoprecipitation and high-throughput sequencing (cfMeDIP-seq) is a highly sensitive, low-input, cost-efficient and bisulfite-free assay capable of detecting and classifying various tumor types. We tested the feasibility of cfMeDIP-seq to detect RCC in plasma samples and, for the first time, in urine cell-free DNA (cfDNA), with an emphasis on early-stage disease. Methods: We performed cfMeDIP-seq on 117 samples (72 plasma and 45 urine samples): 68 stage I-IV RCC cases pre-nephrectomy, 21 stage IV urothelial bladder cancer (UBC) plasma samples from 15 patients, and 28 healthy cancer-free controls. 60.5% of plasma samples and 66.7% of urine samples came from patients with TNM Stage I/II disease. cfDNA was immunoprecipitated and enriched using an antibody targeting 5-methylcytosine and amplified to create a sequence-ready library. The top differentially methylated regions (DMRs) which partitioned RCC and control samples or UBC were used to train a regularized binomial generalized linear model using 80% of the samples as a training set. The 20% of withheld test samples were then assigned a probability of being RCC or control. This process was repeated 100 times. This was performed using both plasma and urine cfDNA samples. Results: We identified 89,799 DMRs in plasma samples and 38,462 DMRs in urine samples. Iterative training and classification of held out samples, using the 300 DMRs which partitioned RCC and control samples, resulted in a mean AUROC of 0.990 (95% CI 0.984-0.997) in plasma samples and 0.791 (95% CI 0.759-0.823) in urine samples. Classification performance between tumor types was evaluated comparing plasma cfDNA from patients with RCC and UBC, resulting in a mean AUROC of 0.954 (95% CI 0.940-0.969).
Conclusions: cfMeDIP-seq is a powerful tool for genome-wide discovery of non-invasive DNA methylation biomarkers. This is the first independent validation of plasma cfMeDIP-seq, demonstrating near-perfect classification of RCC in a cohort enriched for patients with early-stage disease and the potential of urine cfDNA methylome-based biomarkers for cancer detection.
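The evaluation scheme described in the Methods above (repeated random 80/20 splits, with the AUROC averaged over held-out samples) can be sketched in Python roughly as follows. This is a minimal illustration, not the authors' pipeline: `repeated_holdout_auroc` and `fit_mean_difference` are hypothetical names, and the mean-difference scorer is only a simple stand-in for the regularized binomial generalized linear model, which is not reproduced here.

```python
import random

def auroc(scores, labels):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def fit_mean_difference(train_x, train_y):
    """Placeholder scorer (NOT a regularized GLM): project onto the difference of class means."""
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    neg = [x for x, y in zip(train_x, train_y) if y == 0]
    dim = len(train_x[0])
    w = [sum(x[j] for x in pos) / len(pos) - sum(x[j] for x in neg) / len(neg)
         for j in range(dim)]
    return lambda x: sum(wj * xj for wj, xj in zip(w, x))

def repeated_holdout_auroc(features, labels, fit, n_repeats=100, test_frac=0.2, seed=0):
    """Mean AUROC over repeated random train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(labels)))
    n_test = max(1, round(test_frac * len(idx)))
    aucs = []
    for _ in range(n_repeats):
        rng.shuffle(idx)
        test, train = idx[:n_test], idx[n_test:]
        # AUROC is undefined unless both classes appear in the test set.
        if len({labels[i] for i in test}) < 2:
            continue
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        scores = [predict(features[i]) for i in test]
        aucs.append(auroc(scores, [labels[i] for i in test]))
    if not aucs:
        raise ValueError("no split contained both classes in the test set")
    return sum(aucs) / len(aucs)
```

Any function that takes training features and labels and returns a scoring function can be passed as `fit`, so a regularized model could replace the placeholder without changing the evaluation loop.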
Affiliation(s)
- Jacob E Berchuck
  - Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Boston, MA
- Sandor Spisak
  - Department of Medical Oncology, Dana-Farber Cancer Institute, Boston, MA
- Keegan Korthauer
  - Department of Statistics, The University of British Columbia, Vancouver, BC, Canada
- Sarah Abou Alaiwi
  - Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Boston, MA
- Ankur Chakravarthy
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Shu Yi Shen
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Ziad Bakouny
  - Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Boston, MA
- Steven Lee Chang
  - Division of Urological Surgery, Brigham and Women's Hospital, Boston, MA
- Daniel De Carvalho
  - Princess Margaret Cancer Centre, University Health Network, Toronto, ON, Canada
- Matthew L. Freedman
  - Lank Center for Genitourinary Oncology, Dana-Farber Cancer Institute, Boston, MA
- Toni K. Choueiri
  - Dana-Farber Cancer Institute/Brigham and Women’s Hospital and Harvard University School of Medicine, Boston, MA
|
23
|
Townes FW, Hicks SC, Aryee MJ, Irizarry RA. Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol 2019; 20:295. [PMID: 31870412 PMCID: PMC6927135 DOI: 10.1186/s13059-019-1861-6] [Citation(s) in RCA: 186] [Impact Index Per Article: 37.2] [Received: 03/12/2019] [Accepted: 10/15/2019] [Indexed: 12/23/2022] Open
Abstract
Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets.
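As a rough illustration of feature selection using deviance, the Python sketch below scores each gene by its binomial deviance against a null in which the gene takes the same share of every cell's total UMI count, then ranks genes from most to least informative. It assumes a binomial approximation to the multinomial model and a plain list-of-lists count matrix; the function names are illustrative and are not the published software's API.

```python
import math

def binomial_deviance(counts, totals):
    """Deviance of one gene's UMI counts against a constant-proportion null.

    counts[i] is the gene's count in cell i; totals[i] is the total UMI
    count of cell i. Under the null, the gene has the same share pi of
    every cell's total.
    """
    pi = sum(counts) / sum(totals)
    if pi <= 0.0 or pi >= 1.0:
        return 0.0  # degenerate gene carries no information
    dev = 0.0
    for y, n in zip(counts, totals):
        if y > 0:
            dev += y * math.log(y / (n * pi))
        if n - y > 0:
            dev += (n - y) * math.log((n - y) / (n * (1.0 - pi)))
    return 2.0 * dev

def rank_genes_by_deviance(matrix):
    """matrix[g][i] is the count of gene g in cell i.

    Returns (gene indices sorted by decreasing deviance, per-gene deviances).
    """
    n_cells = len(matrix[0])
    totals = [sum(row[i] for row in matrix) for i in range(n_cells)]
    devs = [binomial_deviance(row, totals) for row in matrix]
    order = sorted(range(len(matrix)), key=lambda g: devs[g], reverse=True)
    return order, devs
```

A gene with a constant share across cells gets a deviance near zero, while a gene concentrated in a subset of cells gets a large one, so keeping the top-ranked genes favors those that vary beyond sampling noise.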
Affiliation(s)
- F. William Townes
  - Department of Biostatistics, Harvard University, Cambridge, MA USA
  - Present Address: Department of Computer Science, Princeton University, Princeton, NJ USA
- Stephanie C. Hicks
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD USA
- Martin J. Aryee
  - Department of Biostatistics, Harvard University, Cambridge, MA USA
  - Molecular Pathology Unit, Massachusetts General Hospital, Charlestown, MA USA
  - Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA USA
  - Department of Pathology, Harvard Medical School, Boston, MA USA
- Rafael A. Irizarry
  - Department of Biostatistics, Harvard University, Cambridge, MA USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA USA
|
24
|
Townes FW, Hicks SC, Aryee MJ, Irizarry RA. Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol 2019; 20:295. [PMID: 31870412 DOI: 10.1101/574574] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.8] [Received: 03/12/2019] [Accepted: 10/15/2019] [Indexed: 05/24/2023] Open
Abstract
Single-cell RNA-Seq (scRNA-Seq) profiles gene expression of individual cells. Recent scRNA-Seq datasets have incorporated unique molecular identifiers (UMIs). Using negative controls, we show UMI counts follow multinomial sampling with no zero inflation. Current normalization procedures such as log of counts per million and feature selection by highly variable genes produce false variability in dimension reduction. We propose simple multinomial methods, including generalized principal component analysis (GLM-PCA) for non-normal distributions, and feature selection using deviance. These methods outperform the current practice in a downstream clustering assessment using ground truth datasets.
Affiliation(s)
- F William Townes
  - Department of Biostatistics, Harvard University, Cambridge, MA, USA
  - Present Address: Department of Computer Science, Princeton University, Princeton, NJ, USA
- Stephanie C Hicks
  - Department of Biostatistics, Johns Hopkins University, Baltimore, MD, USA
- Martin J Aryee
  - Department of Biostatistics, Harvard University, Cambridge, MA, USA
  - Molecular Pathology Unit, Massachusetts General Hospital, Charlestown, MA, USA
  - Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA, USA
  - Department of Pathology, Harvard Medical School, Boston, MA, USA
- Rafael A Irizarry
  - Department of Biostatistics, Harvard University, Cambridge, MA, USA
  - Department of Data Sciences, Dana-Farber Cancer Institute, Boston, MA, USA
|
25
|
Hicks SC, Irizarry RA. methylCC: technology-independent estimation of cell type composition using differentially methylated regions. Genome Biol 2019; 20:261. [PMID: 31783894 PMCID: PMC6883691 DOI: 10.1186/s13059-019-1827-8] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.8] [Received: 06/17/2019] [Accepted: 09/19/2019] [Indexed: 01/01/2023] Open
Abstract
A major challenge in the analysis of DNA methylation (DNAm) data is variability introduced from intra-sample cellular heterogeneity, such as whole blood which is a convolution of DNAm profiles across a unique cell type. When this source of variability is confounded with an outcome of interest, if unaccounted for, false positives ensue. Current methods to estimate the cell type proportions in whole blood DNAm samples are only appropriate for one technology and lead to technology-specific biases if applied to data generated from other technologies. Here, we propose the technology-independent alternative: methylCC, which is available at https://github.com/stephaniehicks/methylCC.
Affiliation(s)
- Stephanie C Hicks
  - Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, 615 N Wolfe St, Baltimore, USA
- Rafael A Irizarry
  - Department of Data Sciences, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, USA; Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, USA
|
26
|
Korthauer K, Chakraborty S, Benjamini Y, Irizarry RA. Detection and accurate false discovery rate control of differentially methylated regions from whole genome bisulfite sequencing. Biostatistics 2019; 20:367-383. [PMID: 29481604 PMCID: PMC6587918 DOI: 10.1093/biostatistics/kxy007] [Citation(s) in RCA: 75] [Impact Index Per Article: 15.0] [Received: 08/31/2017] [Accepted: 01/21/2018] [Indexed: 12/22/2022] Open
Abstract
With recent advances in sequencing technology, it is now feasible to measure DNA methylation at tens of millions of sites across the entire genome. In most applications, biologists are interested in detecting differentially methylated regions, composed of multiple sites with differing methylation levels among populations. However, current computational approaches for detecting such regions do not provide accurate statistical inference. A major challenge in reporting uncertainty is that a genome-wide scan is involved in detecting these regions, which needs to be accounted for. A further challenge is that sample sizes are limited due to the costs associated with the technology. We have developed a new approach that overcomes these challenges and assesses uncertainty for differentially methylated regions in a rigorous manner. Region-level statistics are obtained by fitting a generalized least squares regression model with a nested autoregressive correlated error structure for the effect of interest on transformed methylation proportions. We develop an inferential approach, based on a pooled null distribution, that can be implemented even when as few as two samples per population are available. Here, we demonstrate the advantages of our method using both experimental data and Monte Carlo simulation. We find that the new method improves the specificity and sensitivity of lists of regions and accurately controls the false discovery rate.
Affiliation(s)
- Keegan Korthauer
  - Department of Biostatistics & Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, USA and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA, USA
- Sutirtha Chakraborty
  - Novartis, Inorbit Mall Rd, Silpa Gram Craft Village, HITEC City, Hyderabad, Telangana, India
- Yuval Benjamini
  - The Statistics Department, Hebrew University, Mount Scopus, Jerusalem, Israel
- Rafael A Irizarry
  - Department of Biostatistics & Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA, USA and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA, USA
|
27
|
Hicks SC, Okrah K, Paulson JN, Quackenbush J, Irizarry RA, Bravo HC. Smooth quantile normalization. Biostatistics 2019; 19:185-198. [PMID: 29036413 DOI: 10.1093/biostatistics/kxx028] [Citation(s) in RCA: 56] [Impact Index Per Article: 11.2] [Received: 11/01/2016] [Accepted: 05/07/2017] [Indexed: 11/14/2022] Open
Abstract
Between-sample normalization is a critical step in genomic data analysis to remove systematic bias and unwanted technical variation in high-throughput data. Global normalization methods are based on the assumption that observed variability in global properties is due to technical reasons and are unrelated to the biology of interest. For example, some methods correct for differences in sequencing read counts by scaling features to have similar median values across samples, but these fail to reduce other forms of unwanted technical variation. Methods such as quantile normalization transform the statistical distributions across samples to be the same and assume global differences in the distribution are induced by only technical variation. However, it remains unclear how to proceed with normalization if these assumptions are violated, for example, if there are global differences in the statistical distributions between biological conditions or groups, and external information, such as negative or control features, is not available. Here, we introduce a generalization of quantile normalization, referred to as smooth quantile normalization (qsmooth), which is based on the assumption that the statistical distribution of each sample should be the same (or have the same distributional shape) within biological groups or conditions, but allowing that they may differ between groups. We illustrate the advantages of our method on several high-throughput datasets with global differences in distributions corresponding to different biological conditions. We also perform a Monte Carlo simulation study to illustrate the bias-variance tradeoff and root mean squared error of qsmooth compared to other global normalization methods. A software implementation is available from https://github.com/stephaniehicks/qsmooth.
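For orientation, the Python sketch below shows standard quantile normalization, which forces every sample onto one shared distribution, alongside a variant applied separately within each biological group, which matches the assumption that distributions agree within groups but may differ between them. It is a sketch under those assumptions and not the qsmooth algorithm itself; the function names are hypothetical, inputs are plain lists of equal length, and ties are broken by position.

```python
def quantile_normalize(samples):
    """Give every sample the same distribution.

    samples is a list of equal-length lists (one list per sample). The
    k-th smallest value of each sample is replaced by the mean of the
    k-th smallest values across all samples.
    """
    n = len(samples[0])
    sorted_samples = [sorted(s) for s in samples]
    reference = [sum(s[k] for s in sorted_samples) / len(samples) for k in range(n)]
    normalized = []
    for s in samples:
        order = sorted(range(n), key=lambda i: s[i])  # indices from smallest to largest
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]
        normalized.append(out)
    return normalized

def quantile_normalize_within_groups(samples, groups):
    """Apply quantile_normalize separately to the samples of each group label."""
    result = [None] * len(samples)
    for g in set(groups):
        idx = [i for i, label in enumerate(groups) if label == g]
        group_norm = quantile_normalize([samples[i] for i in idx])
        for i, norm in zip(idx, group_norm):
            result[i] = norm
    return result
```

With global normalization a genuine shift between two conditions is erased, whereas the within-group variant keeps it and only aligns samples that share a label.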
Affiliation(s)
- Stephanie C Hicks
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- Kwame Okrah
  - Genentech, Product Development Biostatistics, 1 DNA Way, South San Francisco, CA 94080, USA
- Joseph N Paulson
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- John Quackenbush
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- Rafael A Irizarry
  - Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Ave, Boston, MA 02215, USA and Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA
- Héctor Corrada Bravo
  - Department of Computer Science, University of Maryland, College Park, USA and Center for Bioinformatics and Computational Biology, Institute of Advanced Computer Studies, University of Maryland, 8314 Paint Branch Dr., College Park, MD 20742, College Park, USA
|
28
|
McCorkindale AL, Wahle P, Werner S, Jungreis I, Menzel P, Shukla CJ, Abreu RLP, Irizarry RA, Meyer IM, Kellis M, Zinzen RP. A gene expression atlas of embryonic neurogenesis in Drosophila reveals complex spatiotemporal regulation of lncRNAs. Development 2019; 146:dev.175265. [PMID: 30923056 PMCID: PMC6451322 DOI: 10.1242/dev.175265] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 12/27/2018] [Accepted: 02/05/2019] [Indexed: 01/09/2023]
Abstract
Cell type specification during early nervous system development in Drosophila melanogaster requires precise regulation of gene expression in time and space. Resolving the programs driving neurogenesis has been a major challenge owing to the complexity and rapidity with which distinct cell populations arise. To resolve the cell type-specific gene expression dynamics in early nervous system development, we have sequenced the transcriptomes of purified neurogenic cell types across consecutive time points covering crucial events in neurogenesis. The resulting gene expression atlas comprises a detailed resource of global transcriptome dynamics that permits systematic analysis of how cells in the nervous system acquire distinct fates. We resolve known gene expression dynamics and uncover novel expression signatures for hundreds of genes among diverse neurogenic cell types, most of which remain unstudied. We also identified a set of conserved long noncoding RNAs (lncRNAs) that are regulated in a tissue-specific manner and exhibit spatiotemporal expression during neurogenesis with exquisite specificity. lncRNA expression is highly dynamic and demarcates specific subpopulations within neurogenic cell types. Our spatiotemporal transcriptome atlas provides a comprehensive resource for investigating the function of coding genes and noncoding RNAs during crucial stages of early neurogenesis. Summary: DIV-MARIS, an adapted technique for examining stage- and cell type-specific gene expression, reveals a complex network of mRNAs and lncRNAs expressed in specific cell types during early Drosophila embryonic nervous system development.
Affiliation(s)
- Alexandra L McCorkindale
- Laboratory for Systems Biology of Neural Tissue Differentiation, Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrueck Centre for Molecular Medicine (MDC) in the Helmholtz Association, Robert-Roessle-Strasse 12, 13125 Berlin, Germany .,Biofrontiers Institute, University of Colorado, Boulder, CO 80303, USA
| | - Philipp Wahle
- Laboratory for Systems Biology of Neural Tissue Differentiation, Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrueck Centre for Molecular Medicine (MDC) in the Helmholtz Association, Robert-Roessle-Strasse 12, 13125 Berlin, Germany
| | - Sascha Werner
- Laboratory for Systems Biology of Neural Tissue Differentiation, Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrueck Centre for Molecular Medicine (MDC) in the Helmholtz Association, Robert-Roessle-Strasse 12, 13125 Berlin, Germany
- Irwin Jungreis
- MIT Computer Science and Artificial Intelligence Laboratory, Cambridge, MA 02139, USA.,Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
- Peter Menzel
- Laboratory for Bioinformatics of RNA Structure and Transcriptome Regulation, Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrueck Centre for Molecular Medicine (MDC) in the Helmholtz Association, Robert-Roessle-Strasse 12, 13125 Berlin, Germany
- Chinmay J Shukla
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA 02138, USA.,Dana Farber Cancer Institute, Boston, MA 02215, USA
- Rúben Lopes Pereira Abreu
- Laboratory for Systems Biology of Neural Tissue Differentiation, Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrueck Centre for Molecular Medicine (MDC) in the Helmholtz Association, Robert-Roessle-Strasse 12, 13125 Berlin, Germany
- Irmtraud M Meyer
- Laboratory for Bioinformatics of RNA Structure and Transcriptome Regulation, Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrueck Centre for Molecular Medicine (MDC) in the Helmholtz Association, Robert-Roessle-Strasse 12, 13125 Berlin, Germany.,Freie Universität, Institute of Biochemistry, Department of Biology, Chemistry, Pharmacy, Thielallee 63, Berlin 14195, Germany
- Manolis Kellis
- Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA.,Laboratory for Bioinformatics of RNA Structure and Transcriptome Regulation, Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrueck Centre for Molecular Medicine (MDC) in the Helmholtz Association, Robert-Roessle-Strasse 12, 13125 Berlin, Germany
- Robert P Zinzen
- Laboratory for Systems Biology of Neural Tissue Differentiation, Berlin Institute for Medical Systems Biology (BIMSB), Max Delbrueck Centre for Molecular Medicine (MDC) in the Helmholtz Association, Robert-Roessle-Strasse 12, 13125 Berlin, Germany
29
Abstract
Demand for data science education is surging and traditional courses offered by statistics departments are not meeting the needs of those seeking training. This has led to a number of opinion pieces advocating for an update to the Statistics curriculum. The unifying recommendation is that computing should play a more prominent role. We strongly agree with this recommendation, but advocate the main priority is to bring applications to the forefront as proposed by Nolan and Speed (1999). We also argue that the individuals tasked with developing data science courses should not only have statistical training, but also have experience analyzing data with the main objective of solving real-world problems. Here, we share a set of general principles and offer a detailed guide derived from our successful experience developing and teaching a graduate-level, introductory data science course centered entirely on case studies. We argue for the importance of statistical thinking, as defined by Wild and Pfannkuch (1999) and describe how our approach teaches students three key skills needed to succeed in data science, which we refer to as creating, connecting, and computing. This guide can also be used for statisticians wanting to gain more practical knowledge about data science before embarking on teaching an introductory course.
Affiliation(s)
- Stephanie C Hicks
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA.,Department of Biostatistics, Harvard School of Public Health, Boston, MA
- Rafael A Irizarry
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA.,Department of Biostatistics, Harvard School of Public Health, Boston, MA
30
Benjamini Y, Taylor J, Irizarry RA. Selection-Corrected Statistical Inference for Region Detection With High-Throughput Assays. J Am Stat Assoc 2018; 114:1351-1365. [DOI: 10.1080/01621459.2018.1498347] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/28/2022]
Affiliation(s)
- Yuval Benjamini
- Department of Statistics, Hebrew University of Jerusalem, Jerusalem, Israel
- Rafael A. Irizarry
- Department of Biostatistics and Computational Biology, Dana Farber Cancer Institute, Boston, MA
- Department of Biostatistics, Harvard University, Cambridge, MA
31
Hicks SC, Townes FW, Teng M, Irizarry RA. Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics 2018; 19:562-578. [PMID: 29121214 PMCID: PMC6215955 DOI: 10.1093/biostatistics/kxx053] [Citation(s) in RCA: 272] [Impact Index Per Article: 45.3] [Received: 05/03/2017] [Accepted: 09/13/2017] [Indexed: 12/26/2022] Open
Abstract
Until recently, high-throughput gene expression technology, such as RNA-Sequencing (RNA-seq) required hundreds of thousands of cells to produce reliable measurements. Recent technical advances permit genome-wide gene expression measurement at the single-cell level. Single-cell RNA-Seq (scRNA-seq) is the most widely used and numerous publications are based on data produced with this technology. However, RNA-seq and scRNA-seq data are markedly different. In particular, unlike RNA-seq, the majority of reported expression levels in scRNA-seq are zeros, which could be either biologically-driven, genes not expressing RNA at the time of measurement, or technically-driven, genes expressing RNA, but not at a sufficient level to be detected by sequencing technology. Another difference is that the proportion of genes reporting the expression level to be zero varies substantially across single cells compared to RNA-seq samples. However, it remains unclear to what extent this cell-to-cell variation is being driven by technical rather than biological variation. Furthermore, while systematic errors, including batch effects, have been widely reported as a major challenge in high-throughput technologies, these issues have received minimal attention in published studies based on scRNA-seq technology. Here, we use an assessment experiment to examine data from published studies and demonstrate that systematic errors can explain a substantial percentage of observed cell-to-cell expression variability. Specifically, we present evidence that some of these reported zeros are driven by technical variation by demonstrating that scRNA-seq produces more zeros than expected and that this bias is greater for lower expressed genes. In addition, this missing data problem is exacerbated by the fact that this technical variation varies cell-to-cell. Then, we show how this technical cell-to-cell variability can be confused with novel biological results. 
Finally, we demonstrate and discuss how batch-effects and confounded experiments can intensify the problem.
Affiliation(s)
- Stephanie C Hicks
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- F William Townes
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Mingxiang Teng
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
- Rafael A Irizarry
- Department of Biostatistics, Harvard T. H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
32
Kishore N, Marqués D, Mahmud A, Kiang MV, Rodriguez I, Fuller A, Ebner P, Sorensen C, Racy F, Lemery J, Maas L, Leaning J, Irizarry RA, Balsari S, Buckee CO. Mortality in Puerto Rico after Hurricane Maria. N Engl J Med 2018; 379:162-170. [PMID: 29809109 DOI: 10.1056/nejmsa1803972] [Citation(s) in RCA: 192] [Impact Index Per Article: 32.0] [Indexed: 11/19/2022]
Abstract
BACKGROUND: Quantifying the effect of natural disasters on society is critical for recovery of public health services and infrastructure. The death toll can be difficult to assess in the aftermath of a major disaster. In September 2017, Hurricane Maria caused massive infrastructural damage to Puerto Rico, but its effect on mortality remains contentious. The official death count is 64.
METHODS: Using a representative, stratified sample, we surveyed 3299 randomly chosen households across Puerto Rico to produce an independent estimate of all-cause mortality after the hurricane. Respondents were asked about displacement, infrastructure loss, and causes of death. We calculated excess deaths by comparing our estimated post-hurricane mortality rate with official rates for the same period in 2016.
RESULTS: From the survey data, we estimated a mortality rate of 14.3 deaths (95% confidence interval [CI], 9.8 to 18.9) per 1000 persons from September 20 through December 31, 2017. This rate yielded a total of 4645 excess deaths during this period (95% CI, 793 to 8498), equivalent to a 62% increase in the mortality rate as compared with the same period in 2016. However, this number is likely to be an underestimate because of survivor bias. The mortality rate remained high through the end of December 2017, and one third of the deaths were attributed to delayed or interrupted health care. Hurricane-related migration was substantial.
CONCLUSIONS: This household-based survey suggests that the number of excess deaths related to Hurricane Maria in Puerto Rico is more than 70 times the official estimate. (Funded by the Harvard T.H. Chan School of Public Health and others.)
Affiliation(s)
- Nishant Kishore
- Domingo Marqués
- Ayesha Mahmud
- Mathew V Kiang
- Irmary Rodriguez
- Arlan Fuller
- Peggy Ebner
- Cecilia Sorensen
- Fabio Racy
- Jay Lemery
- Leslie Maas
- Jennifer Leaning
- Rafael A Irizarry
- Satchit Balsari
- Caroline O Buckee
- From the Departments of Epidemiology (N.K., A.M., C.O.B.), Social and Behavioral Sciences (M.V.K.), and Biostatistics (R.A.I.) and the Center for Communicable Disease Dynamics (N.K., A.M., C.O.B.) and the François-Xavier Bagnoud Center for Health and Human Rights (A.F., J. Leaning, S.B.), Harvard T.H. Chan School of Public Health, Harvard University, the Department of Emergency Medicine, Beth Israel Deaconess Medical Center and Harvard Medical School (F.R., S.B.), and the Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute (R.A.I.) - all in Boston; the Department of Psychology, Carlos Albizu University (D.M., I.R.), and the Puerto Rico Science, Technology, and Research Trust (L.M.) - both in San Juan; Keck School of Medicine, University of Southern California, Los Angeles (P.E.); and the Section of Wilderness and Environmental Medicine at the Department of Emergency Medicine, University of Colorado School of Medicine, Aurora (C.S., J. Lemery)
33
Zheng SC, Beck S, Jaffe AE, Koestler DC, Hansen KD, Houseman AE, Irizarry RA, Teschendorff AE. Correcting for cell-type heterogeneity in epigenome-wide association studies: revisiting previous analyses. Nat Methods 2017; 14:216-217. [PMID: 28245219 DOI: 10.1038/nmeth.4187] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Indexed: 01/05/2023]
Affiliation(s)
- Shijie C Zheng
- CAS Key Lab of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,University of Chinese Academy of Sciences, Beijing, China
- Stephan Beck
- Medical Genomics, UCL Cancer Institute, University College London, London, UK
- Andrew E Jaffe
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA.,Center for Computational Biology, Johns Hopkins University, Baltimore, Maryland, USA
- Devin C Koestler
- Department of Biostatistics, University of Kansas Medical Center, Kansas City, Kansas, USA
- Kasper D Hansen
- Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA.,McKusick-Nathans Institute of Genetic Medicine, Johns Hopkins University School of Medicine, Baltimore, Maryland, USA.,Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, Maryland, USA
- Andres E Houseman
- School of Biological and Population Health Sciences, College of Public Health and Human Sciences, Oregon State University, Corvallis, Oregon, USA
- Rafael A Irizarry
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA.,Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
- Andrew E Teschendorff
- CAS Key Lab of Computational Biology, CAS-MPG Partner Institute for Computational Biology, Shanghai Institute for Biological Sciences, Chinese Academy of Sciences, Shanghai, China.,Statistical Cancer Genomics, Paul O'Gorman Building, UCL Cancer Institute, University College London, London, UK.,Department of Women's Cancer, University College London, London, UK
34
Shukla CJ, McCorkindale AL, Gerhardinger C, Korthauer KD, Cabili MN, Shechner DM, Irizarry RA, Maass PG, Rinn JL. High-throughput identification of RNA nuclear enrichment sequences. EMBO J 2018; 37:embj.201798452. [PMID: 29335281 PMCID: PMC5852646 DOI: 10.15252/embj.201798452] [Citation(s) in RCA: 80] [Impact Index Per Article: 13.3] [Received: 10/19/2017] [Revised: 12/18/2017] [Accepted: 12/20/2017] [Indexed: 11/21/2022] Open
Abstract
In the post‐genomic era, thousands of putative noncoding regulatory regions have been identified, such as enhancers, promoters, long noncoding RNAs (lncRNAs), and a cadre of small peptides. These ever‐growing catalogs require high‐throughput assays to test their functionality at scale. Massively parallel reporter assays have greatly enhanced the understanding of noncoding DNA elements en masse. Here, we present a massively parallel RNA assay (MPRNA) that can assay 10,000 or more RNA segments for RNA‐based functionality. We applied MPRNA to identify RNA‐based nuclear localization domains harbored in lncRNAs. We examined a pool of 11,969 oligos densely tiling 38 human lncRNAs that were fused to a cytosolic transcript. After cell fractionation and barcode sequencing, we identified 109 unique RNA regions that significantly enriched this cytosolic transcript in the nucleus including a cytosine‐rich motif. These nuclear enrichment sequences are highly conserved and over‐represented in global nuclear fractionation sequencing. Importantly, many of these regions were independently validated by single‐molecule RNA fluorescence in situ hybridization. Overall, we demonstrate the utility of MPRNA for future investigation of RNA‐based functionalities.
Affiliation(s)
- Chinmay J Shukla
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA.,Broad Institute of MIT and Harvard, Cambridge, MA, USA.,Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA.,Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA, USA
- Alexandra L McCorkindale
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA.,Berlin Institute for Medical Systems Biology, Max Delbrück Center for Molecular Medicine, Berlin, Germany
- Chiara Gerhardinger
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA.,Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Keegan D Korthauer
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA.,Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- David M Shechner
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA.,Broad Institute of MIT and Harvard, Cambridge, MA, USA
- Rafael A Irizarry
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA.,Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Philipp G Maass
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA
- John L Rinn
- Department of Stem Cell and Regenerative Biology, Harvard University, Cambridge, MA, USA.,Broad Institute of MIT and Harvard, Cambridge, MA, USA.,Department of Pathology, Beth Israel Deaconess Medical Center, Boston, MA, USA
35
Duvallet C, Gibbons SM, Gurry T, Irizarry RA, Alm EJ. Meta-analysis of gut microbiome studies identifies disease-specific and shared responses. Nat Commun 2017; 8:1784. [PMID: 29209090 PMCID: PMC5716994 DOI: 10.1038/s41467-017-01973-8] [Citation(s) in RCA: 564] [Impact Index Per Article: 80.6] [Received: 05/16/2017] [Accepted: 10/30/2017] [Indexed: 12/16/2022] Open
Abstract
Hundreds of clinical studies have demonstrated associations between the human microbiome and disease, yet fundamental questions remain on how we can generalize this knowledge. Results from individual studies can be inconsistent, and comparing published data is further complicated by a lack of standard processing and analysis methods. Here we introduce the MicrobiomeHD database, which includes 28 published case–control gut microbiome studies spanning ten diseases. We perform a cross-disease meta-analysis of these studies using standardized methods. We find consistent patterns characterizing disease-associated microbiome changes. Some diseases are associated with over 50 genera, while most show only 10–15 genus-level changes. Some diseases are marked by the presence of potentially pathogenic microbes, whereas others are characterized by a depletion of health-associated bacteria. Furthermore, we show that about half of genera associated with individual studies are bacteria that respond to more than one disease. Thus, many associations found in case–control studies are likely not disease-specific but rather part of a non-specific, shared response to health and disease.
Reported associations between the human microbiome and disease are often inconsistent. Here, Duvallet et al. perform a meta-analysis of 28 gut microbiome studies spanning ten diseases, and find associations that are likely not disease-specific but potentially part of a shared response to disease.
Affiliation(s)
- Claire Duvallet
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.,Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
| | - Sean M Gibbons
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.,Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.,The Broad Institute of MIT and Harvard, Cambridge, MA, 02139, USA
| | - Thomas Gurry
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.,Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.,The Broad Institute of MIT and Harvard, Cambridge, MA, 02139, USA
| | - Rafael A Irizarry
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, 02215, USA.,Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, 02115, USA
| | - Eric J Alm
- Department of Biological Engineering, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.,Center for Microbiome Informatics and Therapeutics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA.,The Broad Institute of MIT and Harvard, Cambridge, MA, 02139, USA
|
36
|
Teng M, Irizarry RA. Accounting for GC-content bias reduces systematic errors and batch effects in ChIP-seq data. Genome Res 2017; 27:1930-1938. [PMID: 29025895 PMCID: PMC5668949 DOI: 10.1101/gr.220673.117] [Citation(s) in RCA: 22] [Impact Index Per Article: 3.1] [Received: 01/15/2017] [Accepted: 08/14/2017] [Indexed: 12/01/2022]
Abstract
The main application of ChIP-seq technology is the detection of genomic regions that bind to a protein of interest. A large part of functional genomics’ public catalogs is based on ChIP-seq data. These catalogs rely on peak calling algorithms that infer protein-binding sites by detecting genomic regions associated with more mapped reads (coverage) than expected by chance, as a result of the experimental protocol's lack of perfect specificity. We find that GC-content bias accounts for substantial variability in the observed coverage for ChIP-seq experiments and that this variability leads to false-positive peak calls. More concerning is that the GC effect varies across experiments, with the effect strong enough to result in a substantial number of peaks called differently when different laboratories perform experiments on the same cell line. However, accounting for GC-content bias in ChIP-seq is challenging because the binding sites of interest tend to be more common in high GC-content regions, which confounds real biological signals with unwanted variability. To account for this challenge, we introduce a statistical approach that accounts for GC effects on both nonspecific noise and signal induced by the binding site. The method can be used to account for this bias in binding quantification as well as to improve existing peak calling algorithms. We use this approach to show a reduction in false-positive peaks as well as improved consistency across laboratories.
Affiliation(s)
- Mingxiang Teng
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA.,Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA.,School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001, China
| | - Rafael A Irizarry
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA.,Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts 02115, USA
|
37
|
Nakayama RT, Pulice JL, Valencia AM, McBride MJ, McKenzie ZM, Gillespie MA, Ku WL, Teng M, Cui K, Williams RT, Cassel SH, Qing H, Widmer CJ, Demetri GD, Irizarry RA, Zhao K, Ranish JA, Kadoch C. SMARCB1 is required for widespread BAF complex-mediated activation of enhancers and bivalent promoters. Nat Genet 2017; 49:1613-1623. [PMID: 28945250 PMCID: PMC5803080 DOI: 10.1038/ng.3958] [Citation(s) in RCA: 174] [Impact Index Per Article: 24.9] [Received: 11/17/2016] [Accepted: 08/29/2017] [Indexed: 12/15/2022]
Abstract
Perturbations to mammalian SWI/SNF (BAF) complexes contribute to over 20% of human cancers, with driving roles first identified in malignant rhabdoid tumor (MRT), an aggressive pediatric cancer characterized by biallelic inactivation of the core BAF complex subunit SMARCB1 (BAF47). However, the mechanism by which this alteration contributes to tumorigenesis remains poorly understood. We find that BAF47 loss destabilizes BAF complexes on chromatin, absent significant changes in intra-complex integrity. Rescue of BAF47 in BAF47-deficient sarcoma cell lines results in increased genome-wide BAF complex occupancy, facilitating widespread enhancer activation and opposition of polycomb-mediated repression at bivalent promoters. We demonstrate differential regulation by BAF and PBAF complexes at enhancers and promoters, respectively, suggesting distinct functions of each complex which are perturbed upon BAF47 loss. Our results demonstrate collaborative mechanisms of mSWI/SNF-mediated gene activation, identifying functions that are coopted or abated to drive human cancers and developmental disorders.
Affiliation(s)
- Robert T Nakayama
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA.,Ludwig Center at Dana-Farber/Harvard and Center for Sarcoma and Bone Oncology, Department of Medical Oncology, Harvard Medical School, Boston, Massachusetts, USA
| | - John L Pulice
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA.,Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
| | - Alfredo M Valencia
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA.,Program in Chemical Biology, Harvard University, Cambridge, Massachusetts, USA
| | - Matthew J McBride
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA.,Program in Chemical Biology, Harvard University, Cambridge, Massachusetts, USA
| | - Zachary M McKenzie
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA
| | | | - Wai Lim Ku
- Systems Biology Center, NHLBI, National Institutes of Health, Bethesda, Maryland, USA
| | - Mingxiang Teng
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
| | - Kairong Cui
- Systems Biology Center, NHLBI, National Institutes of Health, Bethesda, Maryland, USA
| | - Robert T Williams
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA
| | - Seth H Cassel
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA.,Medical Scientist Training Program, Harvard Medical School, Boston, Massachusetts, USA
| | - He Qing
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA
| | - Christian J Widmer
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA
| | - George D Demetri
- Ludwig Center at Dana-Farber/Harvard and Center for Sarcoma and Bone Oncology, Department of Medical Oncology, Harvard Medical School, Boston, Massachusetts, USA
| | - Rafael A Irizarry
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
| | - Keji Zhao
- Systems Biology Center, NHLBI, National Institutes of Health, Bethesda, Maryland, USA
| | | | - Cigall Kadoch
- Department of Pediatric Oncology, Dana-Farber Cancer Institute and Harvard Medical School, Boston, Massachusetts, USA.,Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
|
38
|
Patro R, Duggal G, Love MI, Irizarry RA, Kingsford C. Salmon provides fast and bias-aware quantification of transcript expression. Nat Methods 2017; 14:417-419. [PMID: 28263959 PMCID: PMC5600148 DOI: 10.1038/nmeth.4197] [Citation(s) in RCA: 5230] [Impact Index Per Article: 747.1] [Received: 08/29/2016] [Accepted: 01/22/2017] [Indexed: 12/12/2022]
Abstract
We introduce Salmon, a lightweight method for quantifying transcript abundance from RNA-seq reads. Salmon combines a new dual-phase parallel inference algorithm and feature-rich bias models with an ultra-fast read mapping procedure. It is the first transcriptome-wide quantifier to correct for fragment GC-content bias, which, as we demonstrate here, substantially improves the accuracy of abundance estimates and the sensitivity of subsequent differential expression analysis.
Affiliation(s)
- Rob Patro
- Department of Computer Science, Stony Brook University, Stony Brook, New York, USA
| | | | - Michael I Love
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Cambridge, Massachusetts, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Cambridge, Massachusetts, USA
| | - Rafael A Irizarry
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Cambridge, Massachusetts, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Cambridge, Massachusetts, USA
| | - Carl Kingsford
- Computational Biology Department, Carnegie Mellon University, Pittsburgh, Pennsylvania, USA
|
39
|
Campbell PT, Rebbeck TR, Nishihara R, Beck AH, Begg CB, Bogdanov AA, Cao Y, Coleman HG, Freeman GJ, Heng YJ, Huttenhower C, Irizarry RA, Kip NS, Michor F, Nevo D, Peters U, Phipps AI, Poole EM, Qian ZR, Quackenbush J, Robins H, Rogan PK, Slattery ML, Smith-Warner SA, Song M, VanderWeele TJ, Xia D, Zabor EC, Zhang X, Wang M, Ogino S. Proceedings of the third international molecular pathological epidemiology (MPE) meeting. Cancer Causes Control 2017; 28:167-176. [PMID: 28097472 PMCID: PMC5303153 DOI: 10.1007/s10552-016-0845-z] [Citation(s) in RCA: 14] [Impact Index Per Article: 2.0] [Received: 09/30/2016] [Accepted: 12/20/2016] [Indexed: 02/07/2023]
Abstract
Molecular pathological epidemiology (MPE) is a transdisciplinary and relatively new scientific discipline that integrates theory, methods, and resources from epidemiology, pathology, biostatistics, bioinformatics, and computational biology. The underlying objective of MPE research is to better understand the etiology and progression of complex and heterogeneous human diseases with the goal of informing prevention and treatment efforts in population health and clinical medicine. Although MPE research has been commonly applied to investigating breast, lung, and colorectal cancers, its methodology can be used to study most diseases. Recent successes in MPE studies include: (1) the development of new statistical methods to address etiologic heterogeneity; (2) the enhancement of causal inference; (3) the identification of previously unknown exposure-subtype disease associations; and (4) better understanding of the role of lifestyle/behavioral factors on modifying prognosis according to disease subtype. Central challenges to MPE include the relative lack of transdisciplinary experts, educational programs, and forums to discuss issues related to the advancement of the field. To address these challenges, highlight recent successes in the field, and identify new opportunities, a series of MPE meetings have been held at the Dana-Farber Cancer Institute in Boston, MA. Herein, we share the proceedings of the Third International MPE Meeting, held in May 2016 and attended by 150 scientists from 17 countries. Special topics included integration of MPE with immunology and health disparity research. This meeting series will continue to provide an impetus to foster further transdisciplinary integration of divergent scientific fields.
Affiliation(s)
- Peter T Campbell
- Epidemiology Research Program, American Cancer Society, 250 Williams Street NW, Atlanta, GA, 30303, USA.
| | - Timothy R Rebbeck
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
| | - Reiko Nishihara
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Andrew H Beck
- Cancer Research Institute, Beth Israel Deaconess Cancer Center, Boston, MA, USA
- Department of Pathology, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Colin B Begg
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Alexei A Bogdanov
- Department of Radiology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Yin Cao
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
- Division of Gastroenterology, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Helen G Coleman
- Epidemiology and Health Services Research Group, Centre for Public Health, Queens University Belfast, Belfast, Northern Ireland
| | - Gordon J Freeman
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
| | - Yujing J Heng
- Cancer Research Institute, Beth Israel Deaconess Cancer Center, Boston, MA, USA
- Department of Pathology, Harvard Medical School, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Curtis Huttenhower
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Microbial Systems and Communities, Genome Sequencing and Analysis Program, The Broad Institute, Cambridge, MA, USA
| | - Rafael A Irizarry
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - N Sertac Kip
- Laboratory Medicine and Pathology, Geisinger Health System, Danville, PA, USA
| | - Franziska Michor
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Daniel Nevo
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Ulrike Peters
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA, USA
| | - Amanda I Phipps
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Department of Epidemiology, School of Public Health, University of Washington, Seattle, WA, USA
| | - Elizabeth M Poole
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Zhi Rong Qian
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA
| | - John Quackenbush
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, MA, USA
| | - Harlan Robins
- Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
| | - Peter K Rogan
- Department of Biochemistry, University of Western Ontario, London, Canada
| | | | - Stephanie A Smith-Warner
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Mingyang Song
- Department of Nutrition, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Clinical and Translational Epidemiology Unit, Massachusetts General Hospital and Harvard Medical School, Boston, MA, USA
| | - Tyler J VanderWeele
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Daniel Xia
- Department of Pathology, Brigham and Women's Hospital, Boston, MA, USA
| | - Emily C Zabor
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
| | - Xuehong Zhang
- Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
| | - Molin Wang
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA
| | - Shuji Ogino
- Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Department of Medical Oncology, Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA, USA.
- Division of MPE Molecular Pathological Epidemiology, Department of Pathology, Brigham and Women's Hospital and Harvard Medical School, 450 Brookline Ave, Room SM1036, Boston, MA, 02215, USA.
- Department of Oncologic Pathology, Dana-Farber Cancer Institute, Boston, MA, USA.
|
40
|
Abstract
We find that current computational methods for estimating transcript abundance from RNA-seq data can lead to hundreds of false-positive results. We show that these systematic errors stem largely from a failure to model fragment GC content bias. Sample-specific biases associated with fragment sequence features lead to misidentification of transcript isoforms. We introduce alpine, a method for estimating sample-specific bias-corrected transcript abundance. By incorporating fragment sequence features, alpine greatly increases the accuracy of transcript abundance estimates, enabling a fourfold reduction in the number of false positives for reported changes in expression compared with Cufflinks. Using simulated data, we also show that alpine retains the ability to discover true positives, similar to other approaches. The method is available as an R/Bioconductor package that includes data visualization tools useful for bias discovery.
Affiliation(s)
- Michael I Love
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, USA
| | - John B Hogenesch
- Department of Pharmacology, Institute for Translational Medicine and Therapeutics, University of Pennsylvania School of Medicine, Philadelphia, Pennsylvania, USA
| | - Rafael A Irizarry
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, Boston, Massachusetts, USA
- Department of Biostatistics, Harvard TH Chan School of Public Health, Boston, Massachusetts, USA
|
41
|
Teng M, Love MI, Davis CA, Djebali S, Dobin A, Graveley BR, Li S, Mason CE, Olson S, Pervouchine D, Sloan CA, Wei X, Zhan L, Irizarry RA. Erratum to: A benchmark for RNA-seq quantification pipelines. Genome Biol 2016; 17:203. [PMID: 27716375 PMCID: PMC5045616 DOI: 10.1186/s13059-016-1060-7] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.9] [Received: 09/12/2016] [Accepted: 09/12/2016] [Indexed: 11/30/2022] Open
Affiliation(s)
- Mingxiang Teng
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02215, USA.,Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA.,School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
| | - Michael I Love
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02215, USA.,Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
| | - Carrie A Davis
- Functional Genomics Group, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, 11724, USA
| | - Sarah Djebali
- Bioinformatics and Genomics Programme Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, Barcelona, 08003, Spain
| | - Alexander Dobin
- Functional Genomics Group, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, 11724, USA
| | - Brenton R Graveley
- Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
| | - Sheng Li
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA
| | - Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA
| | - Sara Olson
- Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
| | - Dmitri Pervouchine
- Bioinformatics and Genomics Programme Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, Barcelona, 08003, Spain
| | - Cricket A Sloan
- Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477, Stanford, CA, 94305, USA
| | - Xintao Wei
- Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
| | - Lijun Zhan
- Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
| | - Rafael A Irizarry
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02215, USA.,Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
|
42
|
Collado-Torres L, Nellore A, Frazee AC, Wilks C, Love MI, Langmead B, Irizarry RA, Leek JT, Jaffe AE. Flexible expressed region analysis for RNA-seq with derfinder. Nucleic Acids Res 2016; 45:e9. [PMID: 27694310 PMCID: PMC5314792 DOI: 10.1093/nar/gkw852] [Citation(s) in RCA: 42] [Impact Index Per Article: 5.3] [Received: 05/16/2016] [Revised: 08/25/2016] [Accepted: 09/15/2016] [Indexed: 12/20/2022] Open
Abstract
Differential expression analysis of RNA sequencing (RNA-seq) data typically relies on reconstructing transcripts or counting reads that overlap known gene structures. We previously introduced an intermediate statistical approach called differentially expressed region (DER) finder that seeks to identify contiguous regions of the genome showing differential expression signal at single base resolution without relying on existing annotation or potentially inaccurate transcript assembly. We present the derfinder software that improves our annotation-agnostic approach to RNA-seq analysis by: (i) implementing a computationally efficient bump-hunting approach to identify DERs that permits genome-scale analyses in a large number of samples, (ii) introducing a flexible statistical modeling framework, including multi-group and time-course analyses and (iii) introducing a new set of data visualizations for expressed region analysis. We apply this approach to public RNA-seq data from the Genotype-Tissue Expression (GTEx) project and BrainSpan project to show that derfinder permits the analysis of hundreds of samples at base resolution in R, identifies expression outside of known gene boundaries and can be used to visualize expressed regions at base resolution. In simulations, our base resolution approaches enable discovery in the presence of incomplete annotation and are nearly as powerful as feature-level methods when the annotation is complete. derfinder analysis using expressed region-level and single base-level approaches provides a compromise between full transcript reconstruction and feature-level analysis. The package is available from Bioconductor at www.bioconductor.org/packages/derfinder.
Affiliation(s)
- Leonardo Collado-Torres
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.,Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA.,Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD 21205, USA
| | - Abhinav Nellore
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.,Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA.,Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Alyssa C Frazee
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.,Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Christopher Wilks
- Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA.,Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Michael I Love
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.,Dana-Farber Cancer Institute, Harvard University, Boston, MA 02215, USA
| | - Ben Langmead
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.,Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA.,Department of Computer Science, Johns Hopkins University, Baltimore, MD 21218, USA
| | - Rafael A Irizarry
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA.,Dana-Farber Cancer Institute, Harvard University, Boston, MA 02215, USA
| | - Jeffrey T Leek
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.,Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA
| | - Andrew E Jaffe
- Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD 21205, USA.,Center for Computational Biology, Johns Hopkins University, Baltimore, MD 21205, USA.,Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Baltimore, MD 21205, USA.,Department of Mental Health, Johns Hopkins University, Baltimore, MD 21205, USA
|
43
|
Nakayama R, Williams RT, Cassel SH, Teng M, Irizarry RA, Demetri GD, Kadoch C. Abstract 2658: Genome-wide mistargeting of oncogenic SWI/SNF(BAF) complexes in SMARCB1(BAF47)-deficient sarcomas. Cancer Res 2016. [DOI: 10.1158/1538-7445.am2016-2658] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 11/16/2022]
Abstract
SMARCB1/BAF47/INI1 is a core subunit of the mammalian SWI/SNF (BAF) family of ATP-dependent chromatin remodeling complexes, which remodel nucleosome architecture to achieve coordinated regulation of gene expression. Genetic loss of SMARCB1 has been identified in several cancer types, including malignant rhabdoid tumor (MRT, 98%) and epithelioid sarcoma (EpS, 90%), strongly implicating this event as the oncogenic driver in these malignancies. However, the precise mechanism underpinning the tumor suppressive function of BAF47 to date remains unclear. In order to elucidate the underlying mechanism and to identify direct genetic targets of aberrant BAF complexes in this context, we comprehensively evaluated the effects of BAF47 reintroduction in BAF47-deficient sarcomas with respect to complex subunit and associated protein factor composition and stability, global chromatin structure, and gene regulation.
Reintroduced BAF47 stably integrated into BAF complexes, and remarkably, stabilized a highly specific set of BAF subunits, resulting in an increased complex molecular weight and stoichiometric nuclear abundance. These biochemical changes inducing the formation of wild-type complexes in MRT and EpS cell settings were directly linked to reproducible changes in BAF complex localization genome-wide, particularly, in the targeting to H3K4me3-marked promoter regions of direct target genes to establish DNA accessibility. Importantly, changes in BAF47-dependent BAF complex targeting between oncogenic and induced wild-type conditions were reproducibly associated with differential chromatin architecture and gene expression signatures hallmark to both MRT and EpS, and uniformly resulted in proliferative senescence of MRT and EpS cell lines in culture.
These studies highlight, for the first time, the full spectrum of structural and functional contributions of the BAF47 subunit, implicating its role as a keystone in heteromorphic BAF complex assembly; BAF47 is required for the stable integration of several BAF subunits and novel interacting factors, which we determine govern specific genome-wide targeting mechanisms and chromatin-templated activities. These results reveal the mechanisms underlying the oncogenesis of BAF47-deficient sarcomas and point toward novel therapeutic strategies for this group of human sarcomas.
Citation Format: Robert Nakayama, Robert T. Williams, Seth H. Cassel, Mingxiang Teng, Rafael A. Irizarry, George D. Demetri, Cigall Kadoch. Genome-wide mistargeting of oncogenic SWI/SNF(BAF) complexes in SMARCB1(BAF47)-deficient sarcomas. [abstract]. In: Proceedings of the 107th Annual Meeting of the American Association for Cancer Research; 2016 Apr 16-20; New Orleans, LA. Philadelphia (PA): AACR; Cancer Res 2016;76(14 Suppl):Abstract nr 2658.
|
44
|
Teng M, Love MI, Davis CA, Djebali S, Dobin A, Graveley BR, Li S, Mason CE, Olson S, Pervouchine D, Sloan CA, Wei X, Zhan L, Irizarry RA. Erratum to: A benchmark for RNA-seq quantification pipelines. Genome Biol 2016; 17:107. [PMID: 27215799 PMCID: PMC4877800 DOI: 10.1186/s13059-016-0986-0] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 05/13/2016] [Accepted: 05/13/2016] [Indexed: 11/10/2022] Open
Affiliation(s)
- Mingxiang Teng
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02215, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Michael I Love
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02215, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
- Carrie A Davis
- Functional Genomics Group, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, 11724, USA
- Sarah Djebali
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, Barcelona, 08003, Spain
- Alexander Dobin
- Functional Genomics Group, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, 11724, USA
- Brenton R Graveley
- Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
- Sheng Li
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA
- Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA
- Sara Olson
- Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
- Dmitri Pervouchine
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, Barcelona, 08003, Spain
- Cricket A Sloan
- Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477, Stanford, CA, 94305, USA
- Xintao Wei
- Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
- Lijun Zhan
- Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
- Rafael A Irizarry
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02215, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
|
45
|
Teng M, Love MI, Davis CA, Djebali S, Dobin A, Graveley BR, Li S, Mason CE, Olson S, Pervouchine D, Sloan CA, Wei X, Zhan L, Irizarry RA. A benchmark for RNA-seq quantification pipelines. Genome Biol 2016; 17:74. [PMID: 27107712 PMCID: PMC4842274 DOI: 10.1186/s13059-016-0940-1] [Citation(s) in RCA: 119] [Impact Index Per Article: 14.9] [Received: 08/06/2015] [Accepted: 04/08/2016] [Indexed: 02/07/2023] Open
Abstract
Obtaining RNA-seq measurements involves a complex data analytical process with a large number of competing algorithms as options. There is much debate about which of these methods provides the best approach. Unfortunately, it is currently difficult to evaluate their performance due in part to a lack of sensitive assessment metrics. We present a series of statistical summaries and plots to evaluate the performance in terms of specificity and sensitivity, available as an R/Bioconductor package (http://bioconductor.org/packages/rnaseqcomp). Using two independent datasets, we assessed seven competing pipelines. Performance was generally poor, with two methods clearly underperforming and RSEM slightly outperforming the rest.
Affiliation(s)
- Mingxiang Teng
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02215, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA; School of Computer Science and Technology, Harbin Institute of Technology, Harbin, China
- Michael I Love
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02215, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
- Carrie A Davis
- Functional Genomics Group, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, 11724, USA
- Sarah Djebali
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, Barcelona, 08003, Spain
- Alexander Dobin
- Functional Genomics Group, Cold Spring Harbor Laboratory, 1 Bungtown Road, Cold Spring Harbor, NY, 11724, USA
- Brenton R Graveley
- Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
- Sheng Li
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA
- Christopher E Mason
- Department of Physiology and Biophysics, Weill Cornell Medical College, New York, New York, USA
- Sara Olson
- Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
- Dmitri Pervouchine
- Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG) and UPF, Doctor Aiguader, 88, Barcelona, 08003, Spain
- Cricket A Sloan
- Department of Genetics, Stanford University, 300 Pasteur Drive, MC-5477, Stanford, CA, 94305, USA
- Xintao Wei
- Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
- Lijun Zhan
- Department of Genetics and Genome Sciences, Institute for System Genomics, UConn Health Center, Farmington, CT, 06030, USA
- Rafael A Irizarry
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02215, USA; Department of Biostatistics, Harvard TH Chan School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
|
46
|
Li W, Xu H, Xiao T, Cong L, Love MI, Zhang F, Irizarry RA, Liu JS, Brown M, Liu XS. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol 2015; 15:554. [PMID: 25476604 PMCID: PMC4290824 DOI: 10.1186/s13059-014-0554-4] [Citation(s) in RCA: 1239] [Impact Index Per Article: 137.7] [Received: 09/22/2014] [Indexed: 12/26/2022] Open
Abstract
We propose the Model-based Analysis of Genome-wide CRISPR/Cas9 Knockout (MAGeCK) method for prioritizing single-guide RNAs, genes and pathways in genome-scale CRISPR/Cas9 knockout screens. MAGeCK demonstrates better performance compared with existing methods, identifies both positively and negatively selected genes simultaneously, and reports robust results across different experimental conditions. Using public datasets, MAGeCK identified novel essential genes and pathways, including EGFR in vemurafenib-treated A375 cells harboring a BRAF mutation. MAGeCK also detected cell type-specific essential genes, including BCR and ABL1 in KBM7 cells bearing a BCR-ABL fusion, and IGF1R in HL-60 cells, which depend on the insulin signaling pathway for proliferation.
|
47
|
Hicks SC, Irizarry RA. quantro: a data-driven approach to guide the choice of an appropriate normalization method. Genome Biol 2015; 16:117. [PMID: 26040460 PMCID: PMC4495646 DOI: 10.1186/s13059-015-0679-0] [Citation(s) in RCA: 64] [Impact Index Per Article: 7.1] [Received: 02/20/2015] [Accepted: 05/18/2015] [Indexed: 11/29/2022] Open
Abstract
Normalization is an essential step in the analysis of high-throughput data. Multi-sample global normalization methods, such as quantile normalization, have been successfully used to remove technical variation. However, these methods rely on the assumption that observed global changes across samples are due to unwanted technical variability. Applying global normalization methods has the potential to remove biologically driven variation. Currently, it is up to the subject matter experts to determine if the stated assumptions are appropriate. Here, we propose a data-driven alternative. We demonstrate the utility of our method (quantro) through examples and simulations. A software implementation is available from http://www.bioconductor.org/packages/release/bioc/html/quantro.html.
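The assumption described in this abstract can be illustrated with a minimal sketch of plain quantile normalization. This is not the quantro method or its API; the function name `quantile_normalize` and the toy samples are hypothetical, chosen only to show that the procedure forces every sample onto the same distribution, so a global shift is removed whether it is technical or biological.

```python
from typing import List


def quantile_normalize(columns: List[List[float]]) -> List[List[float]]:
    """Quantile-normalize samples given as equal-length lists of values.

    Illustrative sketch only (hypothetical helper, not part of quantro).
    Ties are broken by position rather than averaged.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same number of features")

    # Reference distribution: mean of the k-th smallest value across samples.
    sorted_cols = [sorted(col) for col in columns]
    reference = [sum(col[k] for col in sorted_cols) / len(columns) for k in range(n)]

    normalized = []
    for col in columns:
        # Indices of this sample's features ordered from smallest to largest.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, i in enumerate(order):
            # Replace each value with the reference value of the same rank.
            out[i] = reference[rank]
        normalized.append(out)
    return normalized


# Toy data (assumed for illustration): sample_b is globally higher than sample_a.
sample_a = [5.0, 2.0, 3.0, 4.0]
sample_b = [40.0, 10.0, 60.0, 20.0]
norm_a, norm_b = quantile_normalize([sample_a, sample_b])
```

After normalization both samples contain exactly the same set of values, so any genuine global difference between them is no longer visible. That loss is the risk the abstract raises for data with biologically driven global changes.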
Affiliation(s)
- Stephanie C Hicks
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02115-5450, USA; Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
- Rafael A Irizarry
- Department of Biostatistics and Computational Biology, Dana-Farber Cancer Institute, 450 Brookline Avenue, Boston, MA, 02115-5450, USA; Department of Biostatistics, Harvard School of Public Health, 677 Huntington Avenue, Boston, MA, 02115, USA
|
48
|
Ogino S, Campbell PT, Nishihara R, Phipps AI, Beck AH, Sherman ME, Chan AT, Troester MA, Bass AJ, Fitzgerald KC, Irizarry RA, Kelsey KT, Nan H, Peters U, Poole EM, Qian ZR, Tamimi RM, Tchetgen Tchetgen EJ, Tworoger SS, Zhang X, Giovannucci EL, van den Brandt PA, Rosner BA, Wang M, Chatterjee N, Begg CB. Proceedings of the second international molecular pathological epidemiology (MPE) meeting. Cancer Causes Control 2015; 26:959-72. [PMID: 25956270 DOI: 10.1007/s10552-015-0596-2] [Citation(s) in RCA: 34] [Impact Index Per Article: 3.8] [Received: 01/09/2015] [Accepted: 04/27/2015] [Indexed: 02/07/2023]
Abstract
Disease classification systems increasingly incorporate information on pathogenic mechanisms to predict clinical outcomes and response to therapy and intervention. Technological advancements to interrogate omics (genomics, epigenomics, transcriptomics, proteomics, metabolomics, metagenomics, interactomics, etc.) provide wide-open opportunities in population-based research. Molecular pathological epidemiology (MPE) represents an integrative science of molecular pathology and epidemiology. This unified paradigm requires multidisciplinary collaboration between pathology, epidemiology, biostatistics, bioinformatics, and computational biology. Integration of these fields enables better understanding of etiologic heterogeneity, disease continuum, causal inference, and the impact of environment, diet, lifestyle, host factors (including genetics and immunity), and their interactions on disease evolution. Hence, the Second International MPE Meeting was held in Boston in December 2014, with aims to: (1) develop conceptual and practical frameworks; (2) cultivate and expand opportunities; (3) address challenges; and (4) initiate the effort of specifying guidelines for MPE. The meeting mainly consisted of presentations of method developments and recent data in various malignant neoplasms and tumors (breast, prostate, ovarian and colorectal cancers, renal cell carcinoma, lymphoma, and leukemia), followed by open discussion sessions on challenges and future plans. In particular, we recognized the need for efforts to further develop statistical methodologies. This meeting provided an unprecedented opportunity for interdisciplinary collaboration, consistent with the purposes of the Big Data to Knowledge, Genetic Associations and Mechanisms in Oncology, and Precision Medicine Initiative of the US National Institutes of Health. The MPE meeting series can help advance transdisciplinary population science and optimize training and education systems for twenty-first century medicine and public health.
Affiliation(s)
- Shuji Ogino
- Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, 450 Brookline Ave., Room M422, Boston, MA, 02215, USA
|
49
|
Vandiver AR, Irizarry RA, Hansen KD, Garza LA, Runarsson A, Li X, Chien AL, Wang TS, Leung SG, Kang S, Feinberg AP. Age and sun exposure-related widespread genomic blocks of hypomethylation in nonmalignant skin. Genome Biol 2015; 16:80. [PMID: 25886480 PMCID: PMC4423110 DOI: 10.1186/s13059-015-0644-y] [Citation(s) in RCA: 90] [Impact Index Per Article: 10.0] [Received: 12/08/2014] [Accepted: 03/23/2015] [Indexed: 01/09/2023] Open
Abstract
Background: Aging and sun exposure are the leading causes of skin cancer. It has been shown that epigenetic changes, such as DNA methylation, are well-established mechanisms for cancer, and also have emerging roles in aging and common disease. Here, we directly ask whether DNA methylation is altered following skin aging and/or chronic sun exposure in humans. Results: We compare epidermis and dermis of both sun-protected and sun-exposed skin derived from younger subjects (under 35 years old) and older subjects (over 60 years old), using the Infinium HumanMethylation450 array and whole genome bisulfite sequencing. We observe large blocks of the genome that are hypomethylated in older, sun-exposed epidermal samples, with the degree of hypomethylation associated with clinical measures of photo-aging. We replicate these findings using whole genome bisulfite sequencing, comparing epidermis from an additional set of younger and older subjects. These blocks largely overlap known hypomethylated blocks in colon cancer and we observe that these same regions are similarly hypomethylated in squamous cell carcinoma samples. Conclusions: These data implicate large scale epigenomic change in mediating the effects of environmental damage with photo-aging. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-015-0644-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Amy R Vandiver
- Center for Epigenetics, Johns Hopkins University School of Medicine, Rangos 570, 855 N. Wolfe St, Baltimore, MD, 21205, USA
- Rafael A Irizarry
- Center for Epigenetics, Johns Hopkins University School of Medicine, Rangos 570, 855 N. Wolfe St, Baltimore, MD, 21205, USA; Dana-Farber Cancer Institute, CLSB 11007, 450 Brookline Ave, Boston, MA, 02215, USA
- Kasper D Hansen
- Center for Epigenetics, Johns Hopkins University School of Medicine, Rangos 570, 855 N. Wolfe St, Baltimore, MD, 21205, USA; Department of Biostatistics and Institute for Genetic Medicine, Johns Hopkins University School of Medicine, 615 N. Wolfe St, E3527, Baltimore, MD, 21205, USA
- Luis A Garza
- Department of Dermatology, Johns Hopkins University School of Medicine, CRB II Room 204, 1550 Orleans Street, Baltimore, MD, 21287, USA
- Arni Runarsson
- Center for Epigenetics, Johns Hopkins University School of Medicine, Rangos 570, 855 N. Wolfe St, Baltimore, MD, 21205, USA
- Xin Li
- Center for Epigenetics, Johns Hopkins University School of Medicine, Rangos 570, 855 N. Wolfe St, Baltimore, MD, 21205, USA
- Anna L Chien
- Department of Dermatology, Johns Hopkins University School of Medicine, CRB II Room 204, 1550 Orleans Street, Baltimore, MD, 21287, USA
- Timothy S Wang
- Department of Dermatology, Johns Hopkins University School of Medicine, CRB II Room 204, 1550 Orleans Street, Baltimore, MD, 21287, USA
- Sherry G Leung
- Department of Dermatology, Johns Hopkins University School of Medicine, CRB II Room 204, 1550 Orleans Street, Baltimore, MD, 21287, USA
- Sewon Kang
- Department of Dermatology, Johns Hopkins University School of Medicine, CRB II Room 204, 1550 Orleans Street, Baltimore, MD, 21287, USA
- Andrew P Feinberg
- Center for Epigenetics, Johns Hopkins University School of Medicine, Rangos 570, 855 N. Wolfe St, Baltimore, MD, 21205, USA; Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
|
50
|
Timp W, Bravo HC, McDonald OG, Goggins M, Umbricht C, Zeiger M, Feinberg AP, Irizarry RA. Large hypomethylated blocks as a universal defining epigenetic alteration in human solid tumors. Genome Med 2014; 6:61. [PMID: 25191524 PMCID: PMC4154522 DOI: 10.1186/s13073-014-0061-y] [Citation(s) in RCA: 134] [Impact Index Per Article: 13.4] [Received: 06/12/2014] [Accepted: 08/12/2014] [Indexed: 01/30/2023] Open
Abstract
Background: One of the most provocative recent observations in cancer epigenetics is the discovery of large hypomethylated blocks, including single copy genes, in colorectal cancer, that correspond in location to heterochromatic LOCKs (large organized chromatin lysine-modifications) and LADs (lamin-associated domains). Methods: Here we performed a comprehensive genome-scale analysis of 10 breast, 28 colon, nine lung, 38 thyroid, 18 pancreas cancers, and five pancreas neuroendocrine tumors, matched normal tissue from most of these cases, and 51 premalignant lesions. We used a new statistical approach that allows the identification of large hypomethylated blocks on the Illumina HumanMethylation450 BeadChip platform. Results: We find that hypomethylated blocks are a universal feature of common solid human cancer, and that they occur at the earliest stage of premalignant tumors and progress through clinical stages of thyroid and colon cancer development. We also find that the disrupted CpG islands widely reported previously, including hypermethylated island bodies and hypomethylated shores, are enriched in hypomethylated blocks, with flattening of the methylation signal within and flanking the islands. Finally, we found that genes showing higher between-individual gene expression variability are enriched within these hypomethylated blocks. Conclusion: Thus hypomethylated blocks appear to be a universal defining epigenetic alteration in human cancer, at least for common solid tumors. Electronic supplementary material: The online version of this article (doi:10.1186/s13073-014-0061-y) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Winston Timp
- Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD USA; Department of Biomedical Engineering, Johns Hopkins University School of Medicine, Baltimore, MD USA; Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD USA
- Hector Corrada Bravo
- Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, MD USA
- Oliver G McDonald
- Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD USA; Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD USA
- Michael Goggins
- Department of Pathology, Johns Hopkins University School of Medicine, Baltimore, MD USA
- Chris Umbricht
- Departments of Surgery and Molecular Biology & Genetics, Johns Hopkins University School of Medicine, Baltimore, MD USA
- Martha Zeiger
- Departments of Surgery and Molecular Biology & Genetics, Johns Hopkins University School of Medicine, Baltimore, MD USA
- Andrew P Feinberg
- Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD USA; Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD USA; Molecular Biology & Genetics, Johns Hopkins University School of Medicine, Baltimore, MD USA
- Rafael A Irizarry
- Center for Epigenetics, Johns Hopkins University School of Medicine, Baltimore, MD USA; Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD USA; Department of Biostatistics and Computational Biology, Dana Farber Cancer Institute and Department of Biostatistics, Harvard School of Public Health, Boston, MA USA
|