1
Chen Z, Liang H, Wei P. Data-adaptive and pathway-based tests for association studies between somatic mutations and germline variations in human cancers. Genet Epidemiol 2023; 47:617-636. [PMID: 37822029 DOI: 10.1002/gepi.22537] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 10/20/2022] [Revised: 07/22/2023] [Accepted: 09/18/2023] [Indexed: 10/13/2023]
Abstract
Cancer is a disease driven by a combination of inherited genetic variants and somatic mutations. Recently available large-scale sequencing data of cancer genomes have provided an unprecedented opportunity to study the interactions between them. However, previous studies on this topic have been limited by simple tests with low statistical power, such as Fisher's exact test. In this paper, we design data-adaptive and pathway-based tests based on the score statistic for association studies between somatic mutations and germline variations. Previous research has shown that two single-nucleotide polymorphism (SNP)-set-based association tests, the adaptive sum of powered score (aSPU) test and the data-adaptive pathway-based (aSPUpath) test, increase power in genome-wide association studies (GWASs) of a single disease trait in a case-control design. We extend aSPU and aSPUpath to multiple traits, that is, somatic mutations in multiple genes in a cohort study, allowing extensive information aggregation at both the SNP and gene levels. P-values obtained under different parameter values, each suited to a different genetic architecture, are combined to yield data-adaptive tests for somatic mutations and germline variations. Extensive simulations show that, compared with some commonly used methods, our data-adaptive tests can be applied to multiple germline SNPs, genes, or pathways and generally have much higher statistical power while maintaining the appropriate type I error rate. The proposed tests are applied to a large-scale, real-world International Cancer Genome Consortium whole-genome sequencing data set of 2583 subjects, detecting more significant and biologically relevant associations than other existing methods at both the gene and pathway levels.
Our study systematically identified associations between various germline variations and somatic mutations across different cancer types, which may prove valuable for cancer risk prediction, prognosis, and therapeutics.
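To make the SPU family concrete, the sketch below computes the single-trait sum of powered score statistics, SPU(γ) = Σ_j U_j^γ over the SNP-level score vector U, and the adaptive aSPU p-value as the permutation-calibrated minimum of the per-γ p-values. It is a minimal illustration that assumes a continuous trait, a plain score vector without covariate adjustment, and permutation of the trait to build the null distribution. The multi-trait somatic-mutation extension and the pathway-level aggregation described above are not reproduced, and the γ grid (1, 2, 4, 8, ∞) and function names are illustrative choices.

```python
import math
import random


def score_vector(y, X):
    """Score vector U_j = sum_i (y_i - mean(y)) * x_ij for each SNP j.

    y: list of trait values; X: list of genotype rows (one row per subject).
    """
    ybar = sum(y) / len(y)
    n_snps = len(X[0])
    return [sum((yi - ybar) * row[j] for yi, row in zip(y, X)) for j in range(n_snps)]


def spu(U, gamma):
    """SPU(gamma) = sum_j U_j**gamma; gamma = inf gives max_j |U_j|."""
    if gamma == math.inf:
        return max(abs(u) for u in U)
    return sum(u ** gamma for u in U)


def _perm_pvalue(stat, null_stats):
    """Add-one permutation p-value: share of null statistics >= stat."""
    return (1 + sum(s >= stat for s in null_stats)) / (1 + len(null_stats))


def aspu(y, X, gammas=(1, 2, 4, 8, math.inf), n_perm=200, seed=0):
    """Return (per-gamma p-values, aSPU p-value) using trait permutations."""
    rng = random.Random(seed)
    observed = [abs(spu(score_vector(y, X), g)) for g in gammas]

    # Null statistics: recompute every |SPU(gamma)| on permuted traits.
    null = []
    y_perm = list(y)
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        U_b = score_vector(y_perm, X)
        null.append([abs(spu(U_b, g)) for g in gammas])

    # Per-gamma p-values for the observed data.
    p_obs = [_perm_pvalue(observed[k], [row[k] for row in null]) for k in range(len(gammas))]
    min_p_obs = min(p_obs)

    # Null distribution of min_gamma p: score each permutation against the others.
    min_p_null = []
    for b in range(n_perm):
        others = null[:b] + null[b + 1:]
        p_b = [_perm_pvalue(null[b][k], [row[k] for row in others]) for k in range(len(gammas))]
        min_p_null.append(min(p_b))

    # aSPU p-value: how often a null min-p is at least as small as the observed one.
    p_aspu = (1 + sum(p <= min_p_obs for p in min_p_null)) / (1 + n_perm)
    return dict(zip(gammas, p_obs)), p_aspu
```

The smallest attainable aSPU p-value in this sketch is 1/(n_perm + 1), so resolving genome-wide significance this way requires a very large number of permutations.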
Affiliation(s)
- Zhongyuan Chen
- Division of Biostatistics, Medical College of Wisconsin, Milwaukee, Wisconsin, USA
- Han Liang
- Department of Bioinformatics and Computational Biology, MD Anderson Cancer Center, Houston, Texas, USA
- Peng Wei
- Department of Biostatistics, MD Anderson Cancer Center, Houston, Texas, USA
2
McCaw ZR, O'Dushlaine C, Somineni H, Bereket M, Klein C, Karaletsos T, Casale FP, Koller D, Soare TW. An allelic-series rare-variant association test for candidate-gene discovery. Am J Hum Genet 2023; 110:1330-1342. [PMID: 37494930 PMCID: PMC10432147 DOI: 10.1016/j.ajhg.2023.07.001] [Citation(s) in RCA: 4] [Impact Index Per Article: 2.0] [Received: 12/23/2022] [Revised: 06/30/2023] [Accepted: 07/01/2023] [Indexed: 07/28/2023] [Open access]
Abstract
Allelic series are of candidate therapeutic interest because of the existence of a dose-response relationship between the functionality of a gene and the degree or severity of a phenotype. We define an allelic series as a collection of variants in which increasingly deleterious mutations lead to increasingly large phenotypic effects, and we have developed a gene-based rare-variant association test specifically targeted to identifying genes containing allelic series. Building on the well-known burden test and sequence kernel association test (SKAT), we specify a variety of association models covering different genetic architectures and integrate these into a Coding-Variant Allelic-Series Test (COAST). Through extensive simulations, we confirm that COAST maintains the type I error and improves the power when the pattern of coding-variant effect sizes increases monotonically with mutational severity. We applied COAST to identify allelic-series genes for four circulating-lipid traits and five cell-count traits among 145,735 subjects with available whole-exome sequencing data from the UK Biobank. Compared with optimal SKAT (SKAT-O), COAST identified 29% more Bonferroni-significant associations with circulating-lipid traits, on average, and 82% more with cell-count traits. All of the gene-trait associations identified by COAST have corroborating evidence either from rare-variant associations in the full cohort (Genebass, n = 400,000) or from common-variant associations in the GWAS Catalog. In addition to detecting many gene-trait associations present in Genebass by using only a fraction (36.9%) of the sample, COAST detects associations, such as that between ANGPTL4 and triglycerides, that are absent from Genebass but that have clear common-variant support.
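The allelic-series idea can be illustrated with a weighted burden score in which each variant contributes according to an assumed severity rank. The sketch below assumes three variant categories ordered from least to most deleterious (labelled here, as an assumption, benign missense, damaging missense, and protein-truncating), illustrative weights 1, 2, 3, a continuous phenotype with no covariates, and a one-degree-of-freedom score test with a χ² approximation. It shows only a single burden component, not the full COAST procedure that combines several burden and SKAT-type models; the function names are hypothetical.

```python
import math


def allelic_series_burden(counts, weights=(1.0, 2.0, 3.0)):
    """Per-subject burden: sum_c weights[c] * counts[i][c].

    counts[i][c] is the number of minor alleles subject i carries in
    category c; categories are assumed ordered by increasing severity.
    """
    return [sum(w * n for w, n in zip(weights, row)) for row in counts]


def burden_score_test(y, burden):
    """One-df score test of y on the burden score; returns (statistic, p-value)."""
    n = len(y)
    ybar = sum(y) / n
    bbar = sum(burden) / n
    sxx = sum((b - bbar) ** 2 for b in burden)
    if sxx == 0:
        raise ValueError("burden score has no variation")
    # Residual variance of y under the null (intercept-only) model.
    sigma2 = sum((yi - ybar) ** 2 for yi in y) / (n - 1)
    # Score U and its null variance sigma2 * sxx.
    u = sum((b - bbar) * (yi - ybar) for b, yi in zip(burden, y))
    stat = u * u / (sigma2 * sxx)
    # Upper tail of a chi-square with 1 df: P(X >= stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))
```

Because each category's weight grows with its assumed severity, a gene whose effects rise monotonically with deleteriousness gains power relative to an unweighted count; when effects do not follow that ordering, the weighting can cost power, which is one motivation for combining several models.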
Affiliation(s)
- Francesco Paolo Casale
- Institute of AI for Health, Helmholtz Munich, Neuherberg, Germany; Helmholtz Pioneer Campus, Helmholtz Munich, Neuherberg, Germany; School of Computation, Information and Technology, Technical University of Munich, Munich, Germany
3
Kuksa PP, Greenfest-Allen E, Cifello J, Ionita M, Wang H, Nicaretta H, Cheng PL, Lee WP, Wang LS, Leung YY. Scalable approaches for functional analyses of whole-genome sequencing non-coding variants. Hum Mol Genet 2022; 31:R62-R72. [PMID: 35943817 PMCID: PMC9585666 DOI: 10.1093/hmg/ddac191] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.7] [Received: 07/01/2022] [Revised: 08/04/2022] [Accepted: 08/08/2022] [Indexed: 11/23/2022] [Open access]
Abstract
Non-coding genetic variants outside protein-coding regions of the genome play an important role in genetic and epigenetic regulation. Understanding their roles has become increasingly important, as non-coding variants often make up the majority of top findings in genome-wide association studies (GWAS). In addition, the growing popularity of disease-specific whole-genome sequencing (WGS) efforts expands the catalog of observed variants and offers unique opportunities to investigate both common and rare non-coding variants, which are typically not detected by more limited GWAS approaches. However, the sheer size and breadth of WGS data introduce additional data-analysis and interpretation challenges for predicting functional impact. This review focuses on recent approaches developed for efficient, at-scale annotation and prioritization of non-coding variants uncovered in WGS analyses. In particular, we review the latest scalable annotation tools, databases, and functional genomic resources for interpreting WGS variant findings on the basis of both experimental data and in silico predictive annotations. We also review machine learning-based predictive models for variant scoring and prioritization. We conclude with a discussion of future research directions that will enhance the data and tools needed for effective functional analyses of variants identified by WGS and improve our understanding of disease etiology.
Affiliation(s)
- Pavel P Kuksa
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Emily Greenfest-Allen
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Jeffrey Cifello
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Matei Ionita
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Hui Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Heather Nicaretta
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Po-Liang Cheng
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Wan-Ping Lee
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Li-San Wang
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yuk Yee Leung
- Penn Neurodegeneration Genomics Center, Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Department of Pathology and Laboratory Medicine, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
4
Chen J, Wang L, De Jager PL, Bennett DA, Buchman AS, Yang J. A scalable Bayesian functional GWAS method accounting for multivariate quantitative functional annotations with applications for studying Alzheimer disease. HGG Adv 2022; 3:100143. [PMID: 36204489 PMCID: PMC9530673 DOI: 10.1016/j.xhgg.2022.100143] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 01/28/2022] [Accepted: 09/14/2022] [Indexed: 11/30/2022] [Open access]
Abstract
Existing methods for integrating functional annotations in genome-wide association studies (GWASs) to fine-map and prioritize potential causal variants are either limited to non-overlapping categorical annotations or limited by the computational burden of modeling genome-wide variants. To overcome these limitations, we propose a scalable Bayesian functional GWAS method that accounts for multivariate quantitative functional annotations (BFGWAS_QUANT), accompanied by a scalable computational algorithm enabling joint modeling of genome-wide variants. Simulation studies validated the performance of BFGWAS_QUANT in accurately quantifying annotation enrichment and improving GWAS power. Applying BFGWAS_QUANT to five Alzheimer disease (AD)-related phenotypes using individual-level GWAS data (n ≈ 1,000), we found that histone modification annotations have higher enrichment than expression quantitative trait locus (eQTL) annotations for all considered phenotypes, with the highest enrichment in H3K27me3 (polycomb repression). We also found that cis-eQTLs in microglia had higher enrichment than eQTLs of bulk brain frontal cortex tissue for all considered phenotypes. A similar enrichment pattern was identified using the International Genomics of Alzheimer's Project (IGAP) summary-level GWAS data of AD (n ≈ 54,000). The strongest known risk allele, APOE ε4, was identified for all five phenotypes, and the APOE locus was validated using the IGAP data. BFGWAS_QUANT fine-mapped 32 significant variants from 1,073 genome-wide significant variants in the IGAP data. We also demonstrated that polygenic risk scores (PRSs) using effect size estimates from BFGWAS_QUANT had prediction accuracy similar to that of other methods assuming a sparse causal model. Overall, BFGWAS_QUANT is a useful GWAS tool for quantifying annotation enrichment and prioritizing potential causal variants.
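To make quantitative annotation enrichment concrete, the sketch below maps each variant's vector of quantitative annotations to a prior causal probability through a logistic link, so that a positive enrichment coefficient raises the prior for variants with larger annotation values. The logistic form, the coefficient names (`a0`, `alphas`), and the values used are illustrative assumptions; the abstract does not specify BFGWAS_QUANT's exact prior or how its coefficients are estimated.

```python
import math


def prior_causal_prob(annotations, a0, alphas):
    """Prior causal probability per variant under an assumed logistic link.

    annotations: list of per-variant annotation vectors (quantitative values).
    a0: baseline log-odds shared by all variants (hypothetical parameter).
    alphas: one enrichment coefficient per annotation (hypothetical parameters).
    """
    probs = []
    for row in annotations:
        log_odds = a0 + sum(a * x for a, x in zip(alphas, row))
        # Numerically stable logistic function for either sign of log_odds.
        if log_odds >= 0:
            probs.append(1.0 / (1.0 + math.exp(-log_odds)))
        else:
            e = math.exp(log_odds)
            probs.append(e / (1.0 + e))
    return probs
```

Under this assumed form, an annotation whose coefficient is zero leaves the prior unchanged, while a positive coefficient shifts prior mass toward variants carrying that annotation, which is what "enrichment" refers to here.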
Affiliation(s)
- Junyu Chen
- Department of Epidemiology, Emory University School of Public Health, Atlanta, GA 30322, USA
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Lei Wang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
- Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO 80045, USA
- Philip L. De Jager
- Center for Translational and Computational Neuroimmunology, Department of Neurology and Taub Institute for Research on Alzheimer’s Disease and the Aging Brain, Columbia University Irving Medical Center, New York, NY 10032, USA
- David A. Bennett
- Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL 60612, USA
- Aron S. Buchman
- Rush Alzheimer’s Disease Center, Rush University Medical Center, Chicago, IL 60612, USA
- Jingjing Yang
- Center for Computational and Quantitative Genetics, Department of Human Genetics, Emory University School of Medicine, Atlanta, GA 30322, USA
5
Deng Y, He Y, Xu G, Pan W. Speeding up Monte Carlo simulations for the adaptive sum of powered score test with importance sampling. Biometrics 2022; 78:261-273. [PMID: 33215683 PMCID: PMC8134502 DOI: 10.1111/biom.13407] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.7] [Received: 06/30/2019] [Revised: 08/30/2020] [Accepted: 10/29/2020] [Indexed: 12/21/2022]
Abstract
A central but challenging problem in genetic studies is to test for (usually weak) associations between a complex trait (e.g., a disease status) and sets of multiple genetic variants. Because no uniformly most powerful test exists, data-adaptive tests such as the adaptive sum of powered score (aSPU) test are advantageous in maintaining high power against a wide range of alternatives. However, many adaptive tests, including aSPU, have no closed-form expression for their p-values, so Monte Carlo (MC) simulations are often used; these can be time-consuming when the stringent significance level used in genome-wide association studies (GWAS), e.g., 5 × 10⁻⁸, must be reached. Estimating such a small p-value requires a huge number of MC simulations (e.g., 10¹⁰). As an alternative, we propose using importance sampling to speed up these calculations. We develop some theory to motivate a proposed algorithm for the aSPU test and show that the proposed method is computationally more efficient than standard MC simulations. Using both simulated and real data, we demonstrate the superior performance of the new method over standard MC simulations.
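The gain from importance sampling is easiest to see on a toy problem with a known answer. The sketch below estimates a standard normal tail probability P(Z ≥ t) by drawing from a proposal shifted to N(t, 1) and reweighting each draw by the likelihood ratio φ(x)/φ(x − t) = exp(−t·x + t²/2). This mean-shift proposal is a textbook choice made here for illustration; it is not the proposal Deng et al. derive for the aSPU statistic, and the function names are hypothetical.

```python
import math
import random


def naive_tail_mc(t, n, seed=0):
    """Plain Monte Carlo estimate of P(Z >= t) for Z ~ N(0, 1)."""
    rng = random.Random(seed)
    return sum(rng.gauss(0.0, 1.0) >= t for _ in range(n)) / n


def importance_tail_mc(t, n, seed=0):
    """Importance-sampling estimate of P(Z >= t) using the proposal N(t, 1).

    Each draw x from the proposal that lands in the tail is weighted by the
    likelihood ratio phi(x) / phi(x - t) = exp(-t * x + t**2 / 2).
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(t, 1.0)
        if x >= t:
            total += math.exp(-t * x + 0.5 * t * t)
    return total / n


def exact_tail(t):
    """Exact P(Z >= t) via the complementary error function."""
    return 0.5 * math.erfc(t / math.sqrt(2.0))
```

With t = 5.5 (a tail probability of roughly 2 × 10⁻⁸, close to the genome-wide threshold), 20,000 plain MC draws will almost surely return 0, whereas the importance-sampling estimate from the same number of draws typically lands within a few percent of the exact value.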
Affiliation(s)
- Yangqing Deng
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA
- Department of Mathematics, University of North Texas, Denton, TX 76203, USA
- Yinqiu He
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
- Gongjun Xu
- Department of Statistics, University of Michigan, Ann Arbor, MI 48109, USA
- Wei Pan
- Division of Biostatistics, University of Minnesota, Minneapolis, MN 55455, USA (corresponding author)
6
Sarnowski C, Huan T, Jain D, Liu C, Yao C, Joehanes R, Levy D, Dupuis J. JEM: A joint test to estimate the effect of multiple genetic variants on DNA methylation. Genet Epidemiol 2021; 45:280-292. [PMID: 33038041 PMCID: PMC8005415 DOI: 10.1002/gepi.22369] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 05/20/2020] [Revised: 09/11/2020] [Accepted: 09/29/2020] [Indexed: 11/10/2022]
Abstract
Multiple methods have been proposed to aggregate genetic variants in a gene or a region and jointly test their association with a trait of interest. However, these joint tests do not provide estimates of the individual effect of each variant. Moreover, few methods have evaluated the joint association of multiple variants with DNA methylation. We propose a method based on linear mixed models to estimate the joint and individual effects of multiple genetic variants on DNA methylation by leveraging genomic annotations. Our approach is flexible, can incorporate covariates and annotation features, and accounts for relatedness and linkage disequilibrium (LD). Our method controlled the Type I error rate and had high overall power in simulated scenarios in which we varied the number and specificity of functional annotations, the number of causal and total genetic variants, the frequency of genetic variants, LD, and the genetic variant effect. Our method outperformed the family Sequence Kernel Association Test and gave more stable effect estimates than a classical single-variant linear mixed-effect model. Applied genome-wide to the Framingham Heart Study data, our method identified 921 DNA methylation sites influenced by at least one rare or low-frequency genetic variant located within 50 kilobases (kb) of the DNA methylation site.
Affiliation(s)
- Chloé Sarnowski
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States of America
- Tianxiao Huan
- Boston University’s and National Heart, Lung, and Blood Institute’s Framingham Heart Study, Framingham, MA, United States of America
- The Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, United States of America
- Deepti Jain
- Department of Biostatistics, University of Washington, Seattle, WA, United States of America
- Chunyu Liu
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States of America
- Boston University’s and National Heart, Lung, and Blood Institute’s Framingham Heart Study, Framingham, MA, United States of America
- The Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, United States of America
- Chen Yao
- Boston University’s and National Heart, Lung, and Blood Institute’s Framingham Heart Study, Framingham, MA, United States of America
- The Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, United States of America
- Roby Joehanes
- Boston University’s and National Heart, Lung, and Blood Institute’s Framingham Heart Study, Framingham, MA, United States of America
- The Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, United States of America
- Hebrew SeniorLife, Harvard Medical School, Boston, MA, United States of America
- Daniel Levy
- Boston University’s and National Heart, Lung, and Blood Institute’s Framingham Heart Study, Framingham, MA, United States of America
- The Population Sciences Branch, National Heart, Lung, and Blood Institute, National Institutes of Health, Bethesda, MD, United States of America
- Josée Dupuis
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, United States of America
7
Yang T, Tang H, Risch HA, Olson SH, Petersen G, Bracci PM, Gallinger S, Hung R, Neale RE, Scelo G, Duell EJ, Kurtz RC, Khaw KT, Severi G, Sund M, Wareham N, Amos CI, Li D, Wei P. Incorporating multiple sets of eQTL weights into gene-by-environment interaction analysis identifies novel susceptibility loci for pancreatic cancer. Genet Epidemiol 2020; 44:880-892. [PMID: 32779232 PMCID: PMC7657998 DOI: 10.1002/gepi.22348] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 02/03/2020] [Revised: 07/14/2020] [Accepted: 07/30/2020] [Indexed: 11/11/2022]
Abstract
It is of great scientific interest to identify interactions between genetic variants and environmental exposures that may modify the risk of complex diseases. However, detecting gene-by-environment interactions (G × E) usually requires larger sample sizes than detecting genetic main effects. To boost statistical power and improve understanding of the underlying molecular mechanisms, we incorporate functional genomics information, specifically expression quantitative trait loci (eQTLs), into a data-adaptive G × E test called aGEw. This test adaptively chooses the best eQTL weights from multiple tissues and provides an extra layer of weighting at the genetic variant level. Extensive simulations show that the aGEw test controls the Type 1 error rate and that its power is robust to the inclusion of neutral variants and noninformative external weights. We applied the proposed aGEw test to the Pancreatic Cancer Case-Control Consortium (discovery cohort of 3,585 cases and 3,482 controls) and the PanScan II genome-wide association study data (replication cohort of 2,021 cases and 2,105 controls) with smoking as the exposure of interest. Two novel putative smoking-related pancreatic cancer susceptibility genes, TRIP10 and KDM3A, were identified. The aGEw test is implemented in the R package aGE.
Affiliation(s)
- Tianzhong Yang
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Division of Biostatistics, University of Minnesota, Minneapolis, MN, USA
- Hongwei Tang
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Sara H. Olson
- Department of Epidemiology and Biostatistics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Gloria Petersen
- Department of Health Sciences Research, Mayo Clinic, Rochester, MN, USA
- Paige M. Bracci
- Department of Epidemiology & Biostatistics, University of California San Francisco, San Francisco, CA, USA
- Steven Gallinger
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, University of Toronto, Toronto, Canada
- Rayjean Hung
- Lunenfeld-Tanenbaum Research Institute, Mount Sinai Hospital, University of Toronto, Toronto, Canada
- Rachel E. Neale
- Cancer Aetiology and Prevention Group, QIMR Berghofer Medical Research Institute, Brisbane, QLD, Australia
- Eric J. Duell
- Unit of Nutrition and Cancer, Cancer Epidemiology Research Program, Catalan Institute of Oncology - Bellvitge Biomedical Research Institute (ICO-IDIBELL), Avda. Gran Via 199-203, 08908 L’Hospitalet de Llobregat, Barcelona, Spain
- Robert C. Kurtz
- Department of Surgery, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Kay-Tee Khaw
- Department of Public Health and Primary Care, University of Cambridge, UK
- Gianluca Severi
- Gustave Roussy, F-94805, Villejuif, France
- CESP, Fac. de médecine - Univ. Paris-Sud, Fac. de médecine - UVSQ, INSERM, Université Paris-Saclay, 94805, Villejuif, France
- Malin Sund
- Department of Surgical and Perioperative Sciences, Umeå University, Sweden
- Nick Wareham
- MRC Epidemiology Unit, University of Cambridge School of Clinical Medicine, Cambridge, UK
- Christopher I Amos
- Dan L Duncan Comprehensive Cancer Center, Baylor College of Medicine, Houston, TX, USA
- Donghui Li
- Department of Gastrointestinal Medical Oncology, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
8
Li X, Li Z, Zhou H, Gaynor SM, Liu Y, Chen H, Sun R, Dey R, Arnett DK, Aslibekyan S, Ballantyne CM, Bielak LF, Blangero J, Boerwinkle E, Bowden DW, Broome JG, Conomos MP, Correa A, Cupples LA, Curran JE, Freedman BI, Guo X, Hindy G, Irvin MR, Kardia SLR, Kathiresan S, Khan AT, Kooperberg CL, Laurie CC, Liu XS, Mahaney MC, Manichaikul AW, Martin LW, Mathias RA, McGarvey ST, Mitchell BD, Montasser ME, Moore JE, Morrison AC, O'Connell JR, Palmer ND, Pampana A, Peralta JM, Peyser PA, Psaty BM, Redline S, Rice KM, Rich SS, Smith JA, Tiwari HK, Tsai MY, Vasan RS, Wang FF, Weeks DE, Weng Z, Wilson JG, Yanek LR, Neale BM, Sunyaev SR, Abecasis GR, Rotter JI, Willer CJ, Peloso GM, Natarajan P, Lin X. Dynamic incorporation of multiple in silico functional annotations empowers rare variant association analysis of large whole-genome sequencing studies at scale. Nat Genet 2020; 52:969-983. [PMID: 32839606 PMCID: PMC7483769 DOI: 10.1038/s41588-020-0676-4] [Citation(s) in RCA: 150] [Impact Index Per Article: 30.0] [Received: 07/17/2019] [Accepted: 07/02/2020] [Indexed: 12/13/2022]
Abstract
Large-scale whole-genome sequencing studies have enabled the analysis of rare variants (RVs) associated with complex phenotypes. Commonly used RV association tests have limited scope to leverage variant functions. We propose STAAR (variant-set test for association using annotation information), a scalable and powerful RV association test method that effectively incorporates both variant categories and multiple complementary annotations using a dynamic weighting scheme. For the latter, we introduce 'annotation principal components', multidimensional summaries of in silico variant annotations. STAAR accounts for population structure and relatedness and is scalable for analyzing very large cohort and biobank whole-genome sequencing studies of continuous and dichotomous traits. We applied STAAR to identify RVs associated with four lipid traits in 12,316 discovery and 17,822 replication samples from the Trans-Omics for Precision Medicine Program. We discovered and replicated new RV associations, including disruptive missense RVs of NPC1L1 and an intergenic region near APOC1P1 associated with low-density lipoprotein cholesterol.
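Combining several annotation-weighted p-values into a single omnibus p-value is a recurring step in this kind of analysis. The sketch below implements the Cauchy combination (ACAT) rule, T = Σ w_i tan((0.5 − p_i)π) with weights normalized to sum to one and combined p = 0.5 − arctan(T)/π, which stays approximately valid when the p-values are dependent. Treating ACAT as the combiner, and the equal default weights, are assumptions made for illustration; the abstract describes STAAR's dynamic weighting scheme only at a high level, and `cauchy_combination` is a hypothetical name.

```python
import math


def cauchy_combination(pvalues, weights=None):
    """Combine p-values with the Cauchy (ACAT) rule.

    T = sum_i w_i * tan((0.5 - p_i) * pi), with the weights normalised to
    sum to 1; the combined p-value is 0.5 - arctan(T) / pi.
    """
    if weights is None:
        weights = [1.0] * len(pvalues)
    total_w = sum(weights)
    stat = 0.0
    for w, p in zip(weights, pvalues):
        if not 0.0 < p < 1.0:
            raise ValueError("p-values must lie strictly between 0 and 1")
        if p < 1e-15:
            # tan((0.5 - p) * pi) ~ 1 / (p * pi) for tiny p; avoids precision loss.
            stat += (w / total_w) / (p * math.pi)
        else:
            stat += (w / total_w) * math.tan((0.5 - p) * math.pi)
    if stat > 1e15:
        # Standard Cauchy upper tail ~ 1 / (stat * pi) for very large stat.
        return 1.0 / (stat * math.pi)
    return 0.5 - math.atan(stat) / math.pi
```

Because the combined statistic is dominated by the smallest p-values, one strongly significant annotation-weighted test can drive the omnibus result even when the other tests are null.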
Affiliation(s)
- Xihao Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Zilin Li
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Hufeng Zhou
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Sheila M Gaynor
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Yaowu Liu
- School of Statistics, Southwestern University of Finance and Economics, Chengdu, China
- Han Chen
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Center for Precision Health, School of Public Health and School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Ryan Sun
- Department of Biostatistics, University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Rounak Dey
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Donna K Arnett
- College of Public Health, University of Kentucky, Lexington, KY, USA
- Stella Aslibekyan
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Lawrence F Bielak
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- John Blangero
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Eric Boerwinkle
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
- Donald W Bowden
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
- Jai G Broome
- Division of Medical Genetics, University of Washington, Seattle, WA, USA
- Matthew P Conomos
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Adolfo Correa
- Jackson Heart Study, Department of Medicine, University of Mississippi Medical Center, Jackson, MS, USA
- L Adrienne Cupples
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
- Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, MA, USA
- Joanne E Curran
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
- Barry I Freedman
- Department of Internal Medicine, Nephrology, Wake Forest School of Medicine, Winston-Salem, NC, USA
- Xiuqing Guo
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
- George Hindy
- Department of Population Medicine, Qatar University College of Medicine, QU Health, Doha, Qatar
- Marguerite R Irvin
- Department of Epidemiology, University of Alabama at Birmingham, Birmingham, AL, USA
- Sharon L R Kardia
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Sekar Kathiresan
- Verve Therapeutics, Cambridge, MA, USA
- Cardiology Division, Massachusetts General Hospital, Boston, MA, USA
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Alyna T Khan
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- Charles L Kooperberg
- Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, USA
- Cathy C Laurie
- Department of Biostatistics, University of Washington, Seattle, WA, USA
- X Shirley Liu
- Department of Data Sciences, Dana-Farber Cancer Institute and Harvard T.H. Chan School of Public Health, Boston, MA, USA
- Department of Statistics, Harvard University, Cambridge, MA, USA
- Michael C Mahaney
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Ani W Manichaikul
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Lisa W Martin
- Division of Cardiology, George Washington School of Medicine and Health Sciences, Washington, DC, USA
| | - Rasika A Mathias
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Stephen T McGarvey
- Department of Epidemiology, International Health Institute, Department of Anthropology, Brown University, Providence, RI, USA
| | - Braxton D Mitchell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
- Geriatrics Research and Education Clinical Center, Baltimore VA Medical Center, Baltimore, MD, USA
| | - May E Montasser
- Division of Endocrinology, Diabetes, and Nutrition, Program for Personalized and Genomic Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Jill E Moore
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - Alanna C Morrison
- Human Genetics Center, Department of Epidemiology, Human Genetics, and Environmental Sciences, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
| | - Jeffrey R O'Connell
- Department of Medicine, University of Maryland School of Medicine, Baltimore, MD, USA
| | - Nicholette D Palmer
- Department of Biochemistry, Wake Forest University School of Medicine, Winston-Salem, NC, USA
| | - Akhil Pampana
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
| | - Juan M Peralta
- Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, The University of Texas Rio Grande Valley, Brownsville, TX, USA
| | - Patricia A Peyser
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
| | - Bruce M Psaty
- Cardiovascular Health Research Unit, Departments of Medicine, Epidemiology, and Health Services, University of Washington, Seattle, WA, USA
- Kaiser Permanente Washington Health Research Institute, Seattle, WA, USA
| | - Susan Redline
- Division of Sleep and Circadian Disorders, Brigham and Women's Hospital, Boston, MA, USA
- Division of Sleep Medicine, Harvard Medical School, Boston, MA, USA
- Division of Pulmonary, Critical Care, and Sleep Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Kenneth M Rice
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Stephen S Rich
- Center for Public Health Genomics, University of Virginia, Charlottesville, VA, USA
| | - Jennifer A Smith
- Department of Epidemiology, School of Public Health, University of Michigan, Ann Arbor, MI, USA
- Survey Research Center, Institute for Social Research, University of Michigan, Ann Arbor, MI, USA
| | - Hemant K Tiwari
- Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA
| | - Michael Y Tsai
- Department of Laboratory Medicine and Pathology, University of Minnesota, Minneapolis, MN, USA
| | - Ramachandran S Vasan
- Framingham Heart Study, National Heart, Lung, and Blood Institute and Boston University, Framingham, MA, USA
- Department of Medicine, Boston University School of Medicine, Boston, MA, USA
| | - Fei Fei Wang
- Department of Biostatistics, University of Washington, Seattle, WA, USA
| | - Daniel E Weeks
- Department of Human Genetics and Biostatistics, University of Pittsburgh, Pittsburgh, PA, USA
| | - Zhiping Weng
- Program in Bioinformatics and Integrative Biology, University of Massachusetts Medical School, Worcester, MA, USA
| | - James G Wilson
- Department of Physiology and Biophysics, University of Mississippi Medical Center, Jackson, MS, USA
- Division of Cardiology, Beth Israel Deaconess Medical Center, Boston, MA, USA
| | - Lisa R Yanek
- GeneSTAR Research Program, Department of Medicine, Johns Hopkins University School of Medicine, Baltimore, MD, USA
| | - Benjamin M Neale
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Stanley Center for Psychiatric Research, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Analytic and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA, USA
| | - Shamil R Sunyaev
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Division of Genetics, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA
- Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA
| | - Gonçalo R Abecasis
- Regeneron Pharmaceuticals, Tarrytown, NY, USA
- Department of Biostatistics, University of Michigan, Ann Arbor, MI, USA
| | - Jerome I Rotter
- The Institute for Translational Genomics and Population Sciences, Department of Pediatrics, The Lundquist Institute for Biomedical Innovation at Harbor-UCLA Medical Center, Torrance, CA, USA
| | - Cristen J Willer
- Department of Internal Medicine, University of Michigan, Ann Arbor, MI, USA
- Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA
- Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA
| | - Gina M Peloso
- Department of Biostatistics, Boston University School of Public Health, Boston, MA, USA
| | - Pradeep Natarajan
- Department of Medicine, Harvard Medical School, Boston, MA, USA
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA
- Center for Genomic Medicine and Cardiovascular Research Center, Massachusetts General Hospital, Boston, MA, USA
| | - Xihong Lin
- Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
- Department of Statistics, Harvard University, Cambridge, MA, USA.
- Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
9
Wu C, Xu G, Shen X, Pan W. A Regularization-Based Adaptive Test for High-Dimensional Generalized Linear Models. J Mach Learn Res 2020; 21:128. [PMID: 32802002 PMCID: PMC7425805] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 06/11/2023]
Abstract
In spite of its urgent importance in the era of big data, testing high-dimensional parameters in generalized linear models (GLMs) in the presence of high-dimensional nuisance parameters has been largely under-studied, especially with regard to constructing powerful tests for general (and unknown) alternatives. Most existing tests are powerful only against certain alternatives and may yield incorrect Type I error rates when the nuisance parameters are high-dimensional. In this paper, we propose the adaptive interaction sum of powered score (aiSPU) test in the framework of penalized regression with a non-convex penalty, the truncated Lasso penalty (TLP); the test maintains correct Type I error rates while yielding high statistical power across a wide range of alternatives. To calculate its p-values analytically, we derive its asymptotic null distribution. Simulations demonstrate its superior finite-sample performance over several representative existing methods. In addition, we apply it and other representative tests to an Alzheimer's Disease Neuroimaging Initiative (ADNI) data set, detecting possible gene-gender interactions for Alzheimer's disease. We have also made an R package, "aispu", implementing the proposed test available on GitHub.
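The abstract does not define the sum of powered score (SPU) statistics that aiSPU builds on. As an illustration only, the sketch below shows the generic, non-interaction aSPU construction from Pan and colleagues' earlier aSPU test: for a score vector U, SPU(γ) = Σ_j U_j^γ (with SPU(∞) = max_j |U_j|), and aSPU takes the minimum p-value over a set of γ values. It is not the aiSPU test, which works within TLP-penalized regression and uses an asymptotic null distribution; here p-values come from Monte Carlo simulation under the simplifying assumption that the null score components are independent standard normals. The function names `spu` and `aspu_pvalue` and the default γ set are hypothetical choices.

```python
import bisect
import math
import random


def spu(u, gamma):
    """SPU(gamma) = sum_j u_j**gamma; SPU(inf) = max_j |u_j|."""
    if gamma == math.inf:
        return max(abs(x) for x in u)
    return sum(x ** gamma for x in u)


def aspu_pvalue(u, gammas=(1, 2, 4, 8, math.inf), n_sim=2000, seed=1):
    """Monte Carlo aSPU p-value for an observed score vector u.

    ASSUMPTION: under H0 the score components are independent N(0, 1)
    (identity covariance). Real analyses simulate from N(0, V), with V
    the estimated covariance of the score vector.
    """
    rng = random.Random(seed)
    k = len(u)
    null = [[rng.gauss(0.0, 1.0) for _ in range(k)] for _ in range(n_sim)]

    min_p_obs = 1.0
    min_p_null = [1.0] * n_sim
    for g in gammas:
        # Absolute values make the odd-gamma statistics two-sided.
        stats = [abs(spu(z, g)) for z in null]
        ranked = sorted(stats)
        obs = abs(spu(u, g))
        # Per-gamma p-value of the observed statistic.
        n_ge = n_sim - bisect.bisect_left(ranked, obs)
        min_p_obs = min(min_p_obs, (n_ge + 1) / (n_sim + 1))
        # Per-gamma p-value of each null statistic, reusing the same draws.
        for b, s in enumerate(stats):
            p_b = (n_sim - bisect.bisect_left(ranked, s)) / n_sim
            min_p_null[b] = min(min_p_null[b], p_b)

    # aSPU = min p over gamma; calibrate it against the null min-p values.
    n_extreme = sum(p <= min_p_obs for p in min_p_null)
    return (n_extreme + 1) / (n_sim + 1)
```

Small γ favours many weak effects in the same direction, while large γ (and γ = ∞) favours a few strong ones; taking the minimum p-value over γ is what makes the test adaptive, and reusing the same null draws to calibrate that minimum avoids a second layer of simulation.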
Affiliation(s)
- Chong Wu
- Department of Statistics, Florida State University, FL, USA
- Gongjun Xu
- Department of Statistics, University of Michigan, MI, USA
- Xiaotong Shen
- School of Statistics, University of Minnesota, MN, USA
- Wei Pan
- Division of Biostatistics, University of Minnesota, MN, USA
10
Bocher O, Génin E. Rare variant association testing in the non-coding genome. Hum Genet 2020; 139:1345-1362. [PMID: 32500240 DOI: 10.1007/s00439-020-02190-y] [Citation(s) in RCA: 15] [Impact Index Per Article: 3.0] [Received: 04/22/2020] [Accepted: 05/29/2020] [Indexed: 12/25/2022]
Abstract
The development of next-generation sequencing technologies has opened up new possibilities to explore the contribution of genetic variants, and in particular rare variants, to human diseases. Statistical methods developed to test for association with rare variants require the definition of testing units and, within these units, the selection of qualifying variants to include in the test. In the coding regions of the genome, testing units are usually genes, and qualifying variants are selected based on their functional effects on the encoded proteins. Extending these tests to the non-coding regions of the genome is challenging. Testing units are difficult to define because the organisation of the non-coding genome is still poorly understood. Qualifying variants are difficult to select because the functional impact of non-coding variants on gene expression is hard to predict. These difficulties could explain why very few investigators have so far analysed the non-coding parts of their whole genome sequencing data. Yet these non-coding parts represent the vast majority of the genome, and some studies suggest that they could play a major role in disease susceptibility. In this review, we discuss recent experimental and statistical developments that improve knowledge of the non-coding genome and how this knowledge could be used to include rare non-coding variants in association tests. We describe the few studies that have considered variants from the non-coding genome in association tests and how they defined testing units and selected qualifying variants.
Affiliation(s)
- Ozvan Bocher
- Génétique, Génomique Fonctionnelle Et Biotechnologies, Faculté de Médecine, Univ Brest, Inserm, Inserm UMR1078, Building E-IBRBS, 2nd floor, 22 avenue Camille Desmoulins, 29238, Brest Cedex 3, France
- Emmanuelle Génin
- Génétique, Génomique Fonctionnelle Et Biotechnologies, Faculté de Médecine, Univ Brest, Inserm, Inserm UMR1078, Building E-IBRBS, 2nd floor, 22 avenue Camille Desmoulins, 29238, Brest Cedex 3, France
- CHU Brest, Brest, France
11
Zhang M, Gelfman S, McCarthy J, Harms MB, Moreno CAM, Goldstein DB, Allen AS. Incorporating external information to improve sparse signal detection in rare-variant gene-set-based analyses. Genet Epidemiol 2020; 44:330-338. [PMID: 32043633 DOI: 10.1002/gepi.22283] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.6] [Received: 08/28/2019] [Revised: 12/17/2019] [Accepted: 01/27/2020] [Indexed: 01/30/2023]
Abstract
Gene-set analyses are used to assess whether there is any evidence of association with disease among a set of biologically related genes. Such an analysis typically treats all genes within a set alike, even though there is substantial external information concerning the likely importance of each gene within each set. For example, for traits under purifying selection, we would expect genes showing extensive genic constraint to be more likely to be trait-associated than unconstrained genes. Here we improve gene-set analyses by incorporating such external information into a higher-criticism-based signal detection analysis. We show that when this external information is predictive of whether a gene is associated with disease, our approach can substantially increase power. Further, our approach is particularly powerful when the signal is sparse, that is, when only a small number of genes within the set are associated with the trait. We illustrate our approach with a gene-set analysis of amyotrophic lateral sclerosis (ALS), implicating a number of gene sets containing SOD1 and NEK1 and showing enrichment of small p values among gene sets containing known ALS genes. We implement our approach in the R package wHC.
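The approach above builds on higher criticism (HC) for sparse signals. As a minimal sketch, one common form of the Donoho-Jin HC statistic over sorted gene-level p-values p_(1) ≤ ... ≤ p_(n) is HC = max over i ≤ α0·n of √n (i/n − p_(i)) / √(p_(i)(1 − p_(i))). The optional `weights` argument is a hypothetical stand-in for incorporating external information: it applies generic weighted p-values min(1, p_i / w_i), which is not the weighting implemented in the wHC package. The function name `higher_criticism` and the default α0 = 0.5 are assumptions.

```python
import math


def higher_criticism(pvalues, alpha0=0.5, weights=None):
    """Donoho-Jin HC statistic over the smallest alpha0 fraction of p-values.

    ASSUMPTION: when `weights` is given, p-values are reweighted as
    min(1, p_i / w_i), with weights intended to average 1. This is a
    generic p-value weighting, not the wHC package's method.
    """
    if weights is not None:
        ps = [min(1.0, p / w) for p, w in zip(pvalues, weights)]
    else:
        ps = list(pvalues)
    ps.sort()
    n = len(ps)
    best = -math.inf
    for i, p in enumerate(ps, start=1):
        if i > alpha0 * n:
            break
        if p <= 0.0 or p >= 1.0:
            continue  # the standardized difference is undefined at 0 and 1
        hc = math.sqrt(n) * (i / n - p) / math.sqrt(p * (1.0 - p))
        best = max(best, hc)
    return best
```

A single very small p-value among otherwise unremarkable ones produces a large HC value, which is why the statistic suits sparse signals; in practice HC would be compared with its null distribution (for example, by simulating uniform p-values) rather than with a fixed cutoff.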
Affiliation(s)
- Mengqi Zhang
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina
- Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina
- Sahar Gelfman
- Institute of Genomic Medicine, Columbia University, New York City, New York
- Janice McCarthy
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina
- Matthew B Harms
- Institute of Genomic Medicine, Columbia University, New York City, New York
- Department of Neurology, Columbia University, New York City, New York
- Center for Motor Neuron Biology and Disease, Columbia University, New York City, New York
- Cristiane A M Moreno
- Institute of Genomic Medicine, Columbia University, New York City, New York
- Center for Motor Neuron Biology and Disease, Columbia University, New York City, New York
- David B Goldstein
- Institute of Genomic Medicine, Columbia University, New York City, New York
- Andrew S Allen
- Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina
- Center for Genomic and Computational Biology, Duke University, Durham, North Carolina
- Center for Statistical Genetics and Genomics, Duke University, Durham, North Carolina
12
Yang T, Wu C, Wei P, Pan W. Integrating DNA sequencing and transcriptomic data for association analyses of low-frequency variants and lipid traits. Hum Mol Genet 2020; 29:515-526. [PMID: 31919517 PMCID: PMC7015848 DOI: 10.1093/hmg/ddz314] [Citation(s) in RCA: 5] [Impact Index Per Article: 1.0] [Received: 10/24/2019] [Revised: 12/11/2019] [Accepted: 12/16/2019] [Indexed: 12/13/2022]
Abstract
Transcriptome-wide association studies (TWAS) integrate genome-wide association studies (GWAS) and transcriptomic data, and have shown improved statistical power for identifying gene-trait associations while, importantly, offering further biological insights. TWAS have so far focused on the common variants available from GWAS. Compared with common variants, findings for, and even applications to, low-frequency variants are limited, and their role in regulating gene expression is less clear. To fill this gap, we extend TWAS to integrate whole genome sequencing data with transcriptomic data for low-frequency variants. Using data from the Framingham Heart Study, we demonstrate that low-frequency variants play an important and universal role in predicting gene expression, which is not entirely due to linkage disequilibrium with nearby common variants. By including low-frequency variants in addition to common variants, we improve the prediction of gene expression for 79% of the examined genes. Incorporating this functional genomic information, we perform association testing for five lipid traits in two UK10K whole genome sequencing cohorts, hypothesizing that cis-expression quantitative trait loci, including low-frequency variants, are more likely to be trait-associated. We find that two genes, LDLR and TTC22, are associated with low-density lipoprotein cholesterol at genome-wide significance based on 3203 subjects, and that the association signals are largely independent of common variants. We further demonstrate that a joint analysis of both common and low-frequency variants identifies association signals that would be missed by testing either common or low-frequency variants alone.
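The abstract does not give the gene-level test statistic. For orientation only, the sketch below shows the weighted z-score commonly used in summary-statistic TWAS (for example, in FUSION): with expression-prediction weights w, SNP association z-scores z and an LD (SNP correlation) matrix R for a gene's cis region, z_TWAS = w'z / sqrt(w'Rw). Whether Yang et al. used exactly this statistic is not stated here; the function names `twas_z` and `two_sided_p` and the toy inputs are hypothetical.

```python
import math


def twas_z(weights, z_scores, ld):
    """Summary-statistic TWAS z-score: w'z / sqrt(w'Rw).

    weights  -- expression-prediction weights for the SNPs in a gene's cis region
    z_scores -- association z-scores for the same SNPs
    ld       -- SNP-by-SNP correlation (LD) matrix R as nested lists
    """
    m = len(weights)
    num = sum(w * z for w, z in zip(weights, z_scores))
    var = sum(weights[i] * ld[i][j] * weights[j] for i in range(m) for j in range(m))
    return num / math.sqrt(var)


def two_sided_p(z):
    # P(|Z| >= |z|) for a standard normal Z.
    return math.erfc(abs(z) / math.sqrt(2.0))


# Two SNPs with equal weights and equal z-scores: the same evidence counts
# for more when the SNPs are independent than when they are in perfect LD.
z_indep = twas_z([1.0, 1.0], [2.0, 2.0], [[1.0, 0.0], [0.0, 1.0]])
z_full_ld = twas_z([1.0, 1.0], [2.0, 2.0], [[1.0, 1.0], [1.0, 1.0]])
```

The LD matrix in the denominator is what keeps the statistic standard normal under the null when cis-SNPs are correlated; including low-frequency variants simply adds rows and columns to w, z and R.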
Affiliation(s)
- Tianzhong Yang
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Chong Wu
- Department of Statistics, Florida State University, Tallahassee, FL, USA
- Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
13
Yang T, Kim J, Wu C, Ma Y, Wei P, Pan W. An adaptive test for meta-analysis of rare variant association studies. Genet Epidemiol 2020; 44:104-116. [PMID: 31830326 PMCID: PMC6980317 DOI: 10.1002/gepi.22273] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.4] [Received: 08/15/2019] [Revised: 11/12/2019] [Accepted: 11/25/2019] [Indexed: 01/02/2023]
Abstract
Single genome-wide studies may be underpowered to detect trait-associated rare variants with moderate or weak effect sizes. As a viable alternative, meta-analysis is widely used to increase power by combining different studies. The power of meta-analysis critically depends on the underlying association patterns and heterogeneity levels, which are unknown and vary from locus to locus. However, existing methods mainly focus on one or only a few combinations of association pattern and heterogeneity level, and thus may lose power in many situations. To address this issue, we propose a general and unified framework that combines a class of tests, including and going beyond some existing ones, leading to high power across a wide range of scenarios. We demonstrate in simulation studies that the proposed test is more powerful than some existing methods, and then illustrate its performance with the NHLBI Exome-Sequencing Project (ESP) data. One gene (B4GALNT2) was found by our proposed test, but not by the others, to be statistically significantly associated with plasma triglyceride. The signal was driven by African-ancestry subjects, whereas the gene was previously reported to be associated with coronary artery disease among European-ancestry subjects. We implemented our method in an R package, aSPUmeta, which is publicly available at https://github.com/ytzhong/metaRV and will soon be on CRAN.
Affiliation(s)
- Tianzhong Yang
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Junghi Kim
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
- Chong Wu
- Department of Statistics, Florida State University, Tallahassee, FL, USA
- Yiding Ma
- Department of Biostatistics and Data Science, School of Public Health, The University of Texas Health Science Center at Houston, Houston, TX, USA
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Peng Wei
- Department of Biostatistics, The University of Texas MD Anderson Cancer Center, Houston, TX, USA
- Wei Pan
- Division of Biostatistics, School of Public Health, University of Minnesota, Minneapolis, MN, USA
14
Iacobuzio-Donahue CA, Michael C, Baez P, Kappagantula R, Hooper JE, Hollman TJ. Cancer biology as revealed by the research autopsy. Nat Rev Cancer 2019; 19:686-697. [PMID: 31519982 PMCID: PMC7453489 DOI: 10.1038/s41568-019-0199-4] [Citation(s) in RCA: 48] [Impact Index Per Article: 8.0] [Accepted: 08/13/2019] [Indexed: 12/19/2022]
Abstract
A research autopsy is a post-mortem medical procedure performed on a deceased individual with the primary goal of collecting tissue to support basic and translational research. This approach has increasingly been used to investigate the pathophysiological mechanisms of cancer evolution, metastasis and treatment resistance. In this Review, we discuss the rationale for the use of research autopsies in cancer research and provide an evidence-based discussion of the quality of post-mortem tissues compared with other types of biospecimens. We also discuss the advantages of using post-mortem tissues over other types of biospecimens, including the large amounts of tissue that can be obtained and the extent of multiregion sampling that is achievable, which is not otherwise possible in living patients. We highlight how the research autopsy has supported the identification of the clonal origins and modes of spread among metastases, the extent to which selective pressures imposed by treatments cause bottlenecks leading to parallel and convergent tumour evolution, and the creation of rare tissue banks and patient-derived model systems. Finally, we comment on the future of the research autopsy as an integral component of precision medicine strategies.
Affiliation(s)
- Christine A Iacobuzio-Donahue
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Human Oncology and Pathogenesis Program, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- David M. Rubenstein Center for Pancreatic Cancer Research, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Chelsea Michael
- Department of Health Informatics, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Priscilla Baez
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Rajya Kappagantula
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Jody E Hooper
- Department of Pathology, The Johns Hopkins University, Baltimore, MD, USA
- Travis J Hollman
- Department of Pathology, Memorial Sloan Kettering Cancer Center, New York, NY, USA
- Parker Institute for Cancer Immunotherapy, Memorial Sloan Kettering Cancer Center, New York, NY, USA