1
Zhou Y, Cosentino J, Yun T, Biradar MI, Shreibati J, Lai D, Schwantes-An TH, Luben R, McCaw Z, Engmann J, Providencia R, Schmidt AF, Munroe P, Yang H, Carroll A, Khawaja AP, McLean CY, Behsaz B, Hormozdiari F. Utilizing multimodal AI to improve genetic analyses of cardiovascular traits. medRxiv 2024:2024.03.19.24304547. [PMID: 38562791 PMCID: PMC10984061 DOI: 10.1101/2024.03.19.24304547] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 04/04/2024]
Abstract
Electronic health records, biobanks, and wearable biosensors contain multiple high-dimensional clinical data (HDCD) modalities (e.g., ECG, photoplethysmography (PPG), and MRI) for each individual. Access to multimodal HDCD provides a unique opportunity for genetic studies of complex traits because different modalities relevant to a single physiological system (e.g., circulatory system) encode complementary and overlapping information. We propose a novel multimodal deep learning method, M-REGLE, for discovering genetic associations from a joint representation of multiple complementary HDCD modalities. We showcase the effectiveness of this model by applying it to several cardiovascular modalities. M-REGLE jointly learns a lower-dimensional representation (i.e., latent factors) of multimodal HDCD using a convolutional variational autoencoder, performs genome-wide association studies (GWAS) on each latent factor, then combines the results to study the genetics of the underlying system. To validate the advantages of M-REGLE and multimodal learning, we apply it to common cardiovascular modalities (PPG and ECG), and compare its results to unimodal learning methods in which representations are learned from each data modality separately, but the downstream genetic analyses are performed on the combined unimodal representations. M-REGLE identifies 19.3% more loci on the 12-lead ECG dataset, 13.0% more loci on the ECG lead I + PPG dataset, and its genetic risk score significantly outperforms the unimodal risk score at predicting cardiac phenotypes, such as atrial fibrillation (Afib), in multiple biobanks.
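The abstract says that per-latent-factor GWAS results are combined but does not say how. As a rough illustration only, the sketch below uses Fisher's method, a generic way to pool independent p-values for the same variant. It is a swapped-in technique, not necessarily what M-REGLE does, and it assumes the latent factors are approximately uncorrelated. The function name `fisher_combine` is hypothetical.

```python
import math


def fisher_combine(p_values):
    """Combine independent p-values for one variant with Fisher's method.

    Under the null, -2 * sum(ln p) follows a chi-square distribution with
    2k degrees of freedom, where k = len(p_values). This is a generic
    technique and an assumption here, not the method named in the abstract.
    """
    k = len(p_values)
    if k == 0:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in p_values):
        raise ValueError("p-values must lie in (0, 1]")
    half_stat = -sum(math.log(p) for p in p_values)  # Fisher statistic / 2
    # Closed-form chi-square survival function for even degrees of freedom:
    # sf(x; 2k) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)
```

With a single p-value the function returns that value unchanged. With several modest p-values for the same variant, the combined value can fall below each individual one.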
Affiliation(s)
- Mahantesh I Biradar
  - NIHR Biomedical Research Centre at Moorfields Eye Hospital & UCL Institute of Ophthalmology, London EC1V 9EL, UK
  - MRC Epidemiology Unit, University of Cambridge, Cambridge CB2 0SL, UK
- Dongbing Lai
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Tae-Hwi Schwantes-An
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Robert Luben
  - NIHR Biomedical Research Centre at Moorfields Eye Hospital & UCL Institute of Ophthalmology, London EC1V 9EL, UK
  - MRC Epidemiology Unit, University of Cambridge, Cambridge CB2 0SL, UK
- Zachary McCaw
  - Department of Biostatistics, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
- Jorgen Engmann
  - Center for Translational Genomics, Population Science and Experimental Medicine, Institute of Cardiovascular Science, University College London, UK
- Rui Providencia
  - Institute of Health Informatics Research, University College London, London, UK
  - Electrophysiology Department, Barts Heart Centre, St. Bartholomew's Hospital, London, UK
- Amand Floriaan Schmidt
  - Department of Cardiology; Amsterdam University Medical Centres, Amsterdam, The Netherlands
  - Institute of Cardiovascular Science; University College London, London, UK
  - Division of Heart and Lungs, University Medical Center Utrecht, Utrecht, Netherlands
- Patricia Munroe
  - William Harvey Research Institute, Barts and the London Faculty of Medicine and Dentistry, Queen Mary University of London, London, UK
- Howard Yang
  - Google Research, San Francisco CA, 94105 USA
- Anthony P Khawaja
  - NIHR Biomedical Research Centre at Moorfields Eye Hospital & UCL Institute of Ophthalmology, London EC1V 9EL, UK
  - MRC Epidemiology Unit, University of Cambridge, Cambridge CB2 0SL, UK
2
Yun T, Cosentino J, Behsaz B, McCaw ZR, Hill D, Luben R, Lai D, Bates J, Yang H, Schwantes-An TH, Zhou Y, Khawaja AP, Carroll A, Hobbs BD, Cho MH, McLean CY, Hormozdiari F. Unsupervised representation learning improves genomic discovery and risk prediction for respiratory and circulatory functions and diseases. medRxiv 2023:2023.04.28.23289285. [PMID: 37163049 PMCID: PMC10168505 DOI: 10.1101/2023.04.28.23289285] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
High-dimensional clinical data are becoming more accessible in biobank-scale datasets. However, effectively utilizing high-dimensional clinical data for genetic discovery remains challenging. Here we introduce a general deep learning-based framework, REpresentation learning for Genetic discovery on Low-dimensional Embeddings (REGLE), for discovering associations between genetic variants and high-dimensional clinical data. REGLE uses convolutional variational autoencoders to compute a non-linear, low-dimensional, disentangled embedding of the data with highly heritable individual components. REGLE can incorporate expert-defined or clinical features and provides a framework to create accurate disease-specific polygenic risk scores (PRS) in datasets which have minimal expert phenotyping. We apply REGLE to both respiratory and circulatory systems: spirograms which measure lung function and photoplethysmograms (PPG) which measure blood volume changes. Genome-wide association studies on REGLE embeddings identify more genome-wide significant loci than existing methods and replicate known loci for both spirograms and PPG, demonstrating the generality of the framework. Furthermore, these embeddings are associated with overall survival. Finally, we construct a set of PRSs that improve predictive performance of asthma, chronic obstructive pulmonary disease, hypertension, and systolic blood pressure in multiple biobanks. Thus, REGLE embeddings can quantify clinically relevant features that are not currently captured in a standardized or automated way.
Affiliation(s)
- Davin Hill
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, MA 94304, USA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
- Robert Luben
  - NIHR Biomedical Research Centre at Moorfields Eye Hospital & UCL Institute of Ophthalmology, London EC1V 9EL, UK
  - MRC Epidemiology Unit, University of Cambridge, Cambridge CB2 0SL, UK
- Dongbing Lai
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- John Bates
  - Verily Life Sciences, South San Francisco, CA 94080, USA
- Tae-Hwi Schwantes-An
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN 46202, USA
  - Division of Cardiology, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN 46202, USA
- Anthony P. Khawaja
  - NIHR Biomedical Research Centre at Moorfields Eye Hospital & UCL Institute of Ophthalmology, London EC1V 9EL, UK
  - MRC Epidemiology Unit, University of Cambridge, Cambridge CB2 0SL, UK
- Brian D. Hobbs
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
  - Harvard Medical School, Boston, MA 02115, USA
- Michael H. Cho
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA 02115, USA
  - Harvard Medical School, Boston, MA 02115, USA
3
Hill D, Torop M, Masoomi A, Castaldi PJ, Silverman EK, Bodduluri S, Bhatt SP, Yun T, McLean CY, Hormozdiari F, Dy J, Cho MH, Hobbs BD. Deep Learning Utilizing Suboptimal Spirometry Data to Improve Lung Function and Mortality Prediction in the UK Biobank. medRxiv 2023:2023.04.28.23289178. [PMID: 37162978 PMCID: PMC10168495 DOI: 10.1101/2023.04.28.23289178] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 05/11/2023]
Abstract
Background: Spirometry measures lung function by selecting the best of multiple efforts meeting pre-specified quality control (QC), and reporting two key metrics: forced expiratory volume in 1 second (FEV1) and forced vital capacity (FVC). We hypothesize that discarded submaximal and QC-failing data meaningfully contribute to the prediction of airflow obstruction and all-cause mortality.
Methods: We evaluated volume-time spirometry data from the UK Biobank. We identified "best" spirometry efforts as those passing QC with the maximum FVC. "Discarded" efforts were either submaximal or failed QC. To create a combined representation of lung function we implemented a contrastive learning approach, Spirogram-based Contrastive Learning Framework (Spiro-CLF), which utilized all recorded volume-time curves per participant and applied different transformations (e.g. flow-volume, flow-time). In a held-out 20% testing subset we applied the Spiro-CLF representation of a participant's overall lung function to 1) binary predictions of FEV1/FVC < 0.7 and FEV1 Percent Predicted (FEV1PP) < 80%, indicative of airflow obstruction, and 2) Cox regression for all-cause mortality.
Findings: We included 940,705 volume-time curves from 352,684 UK Biobank participants with 2-3 spirometry efforts per individual (66.7% with 3 efforts) and at least one QC-passing spirometry effort. Of all spirometry efforts, 24.1% failed QC and 37.5% were submaximal. Spiro-CLF prediction of FEV1/FVC < 0.7 utilizing discarded spirometry efforts had an Area under the Receiver Operating Characteristics (AUROC) of 0.981 (0.863 for FEV1PP prediction). Incorporating discarded spirometry efforts in all-cause mortality prediction was associated with a concordance index (c-index) of 0.654, which exceeded the c-indices from FEV1 (0.590), FVC (0.559), or FEV1/FVC (0.599) from each participant's single best effort.
Interpretation: A contrastive learning model using raw spirometry curves can accurately predict lung function using submaximal and QC-failing efforts. This model also has superior prediction of all-cause mortality compared to standard lung function measurements.
Funding: MHC is supported by NIH R01HL137927, R01HL135142, HL147148, and HL089856. BDH is supported by NIH K08HL136928, U01 HL089856, and an Alpha-1 Foundation Research Grant. DH is supported by NIH 2T32HL007427-41. EKS is supported by NIH R01 HL152728, R01 HL147148, U01 HL089856, R01 HL133135, P01 HL132825, and P01 HL114501. PJC is supported by NIH R01HL124233 and R01HL147326. SPB is supported by NIH R01HL151421 and UH3HL155806. TY, FH, and CYM are employees of Google LLC.
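For readers unfamiliar with the two evaluation quantities in this abstract, here is a minimal sketch of Harrell's concordance index and of the two airflow-obstruction thresholds it quotes (FEV1/FVC < 0.7 and FEV1PP < 80%). The function names and input layout are hypothetical and are not taken from the paper. The pairwise loop is written for clarity and would be too slow at biobank scale.

```python
def concordance_index(times, events, risk_scores):
    """Harrell's c-index for right-censored survival data.

    A pair (i, j) is comparable when subject i has the shorter follow-up
    time and an observed event. The pair is concordant when subject i also
    has the higher risk score; tied scores count as 0.5.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if times[i] < times[j] and events[i]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


def airflow_obstruction_flags(fev1_litres, fvc_litres, fev1_percent_predicted):
    """Apply the two thresholds quoted in the abstract to one participant."""
    return {
        "ratio_below_0.7": (fev1_litres / fvc_litres) < 0.7,
        "fev1pp_below_80": fev1_percent_predicted < 80.0,
    }
```

A c-index of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which gives a scale for the 0.654 versus 0.590 to 0.599 comparison reported above.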
Affiliation(s)
- Davin Hill
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Max Torop
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA
- Aria Masoomi
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA
- Peter J. Castaldi
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
  - Division of General Medicine and Primary Care, Brigham and Women’s Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Edwin K. Silverman
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Sandeep Bodduluri
  - Division of Pulmonary, Allergy and Critical Care Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Surya P. Bhatt
  - Division of Pulmonary, Allergy and Critical Care Medicine, University of Alabama at Birmingham, Birmingham, AL, USA
- Jennifer Dy
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA
- Michael H. Cho
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA, USA
- Brian D. Hobbs
  - Channing Division of Network Medicine, Brigham and Women’s Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women’s Hospital, Boston, MA, USA
4
Cosentino J, Behsaz B, Alipanahi B, McCaw ZR, Hill D, Schwantes-An TH, Lai D, Carroll A, Hobbs BD, Cho MH, McLean CY, Hormozdiari F. Inference of chronic obstructive pulmonary disease with deep learning on raw spirograms identifies new genetic loci and improves risk models. Nat Genet 2023; 55:787-795. [PMID: 37069358 DOI: 10.1038/s41588-023-01372-4] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Received: 09/02/2022] [Accepted: 03/14/2023] [Indexed: 04/19/2023]
Abstract
Chronic obstructive pulmonary disease (COPD), the third leading cause of death worldwide, is highly heritable. While COPD is clinically defined by applying thresholds to summary measures of lung function, a quantitative liability score has more power to identify genetic signals. Here we train a deep convolutional neural network on noisy self-reported and International Classification of Diseases labels to predict COPD case-control status from high-dimensional raw spirograms and use the model's predictions as a liability score. The machine-learning-based (ML-based) liability score accurately discriminates COPD cases and controls, and predicts COPD-related hospitalization without any domain-specific knowledge. Moreover, the ML-based liability score is associated with overall survival and exacerbation events. A genome-wide association study on the ML-based liability score replicates existing COPD and lung function loci and also identifies 67 new loci. Lastly, our method provides a general framework to use ML methods and medical-record-based labels that does not require domain knowledge or expert curation to improve disease prediction and genomic discovery for drug design.
Affiliation(s)
- Davin Hill
  - Department of Electrical and Computer Engineering, Northeastern University, Boston, MA, USA
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
- Tae-Hwi Schwantes-An
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
  - Division of Cardiology, Department of Medicine, Indiana University School of Medicine, Indianapolis, IN, USA
- Dongbing Lai
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN, USA
- Brian D Hobbs
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
- Michael H Cho
  - Channing Division of Network Medicine, Brigham and Women's Hospital, Boston, MA, USA
  - Division of Pulmonary and Critical Care Medicine, Brigham and Women's Hospital, Boston, MA, USA
  - Harvard Medical School, Boston, MA, USA
5
Ahadi S, Wilson KA, Babenko B, McLean CY, Bryant D, Pritchard O, Kumar A, Carrera EM, Lamy R, Stewart JM, Varadarajan A, Berndl M, Kapahi P, Bashir A. Longitudinal fundus imaging and its genome-wide association analysis provide evidence for a human retinal aging clock. eLife 2023; 12:e82364. [PMID: 36975205 PMCID: PMC10110236 DOI: 10.7554/elife.82364] [Citation(s) in RCA: 2] [Impact Index Per Article: 2.0] [Received: 08/02/2022] [Accepted: 03/22/2023] [Indexed: 03/29/2023] [Open Access]
Abstract
Biological age, distinct from an individual's chronological age, has been studied extensively through predictive aging clocks. However, these clocks have limited accuracy in short time-scales. Here we trained deep learning models on fundus images from the EyePACS dataset to predict individuals' chronological age. Our retinal aging clock, 'eyeAge', predicted chronological age more accurately than other aging clocks (mean absolute error of 2.86 and 3.30 years on quality-filtered data from EyePACS and UK Biobank, respectively). Additionally, eyeAge was independent of blood marker-based measures of biological age, maintaining an all-cause mortality hazard ratio of 1.026 even when adjusted for phenotypic age. The individual-specific nature of eyeAge was reinforced via multiple GWAS hits in the UK Biobank cohort. The top GWAS locus was further validated via knockdown of the fly homolog, Alk, which slowed age-related decline in vision in flies. This study demonstrates the potential utility of a retinal aging clock for studying aging and age-related diseases and quantitatively measuring aging on very short time-scales, opening avenues for quick and actionable evaluation of gero-protective therapeutics.
Affiliation(s)
- Sara Ahadi
  - Google Research, Mountain View, United States
- Ajay Kumar
  - Department of Biophysics, Post Graduate Institute of Medical Education and Research, Chandigarh, India
- Ricardo Lamy
  - Department of Ophthalmology, Zuckerberg San Francisco General Hospital and Trauma Center, San Francisco, United States
- Jay M Stewart
  - Department of Ophthalmology, University of California, San Francisco, San Francisco, United States
- Pankaj Kapahi
  - Buck Institute for Research on Aging, Novato, United States
- Ali Bashir
  - Google Research, Mountain View, United States
6
O'Connell J, Yun T, Moreno M, Li H, Litterman N, Kolesnikov A, Noblin E, Chang PC, Shastri A, Dorfman EH, Shringarpure S, Auton A, Carroll A, McLean CY. A population-specific reference panel for improved genotype imputation in African Americans. Commun Biol 2021; 4:1269. [PMID: 34741098 PMCID: PMC8571350 DOI: 10.1038/s42003-021-02777-9] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 01/15/2021] [Accepted: 10/12/2021] [Indexed: 12/17/2022] [Open Access]
Abstract
There is currently a dearth of accessible whole genome sequencing (WGS) data for individuals residing in the Americas with Sub-Saharan African ancestry. We generated whole genome sequencing data at intermediate (15×) coverage for 2,294 individuals with large amounts of Sub-Saharan African ancestry, predominantly Atlantic African admixed with varying amounts of European and American ancestry. We performed extensive comparisons of variant callers, phasing algorithms, and variant filtration on these data to construct a high quality imputation panel containing data from 2,269 unrelated individuals. With the exception of the TOPMed imputation server (which notably cannot be downloaded), our panel substantially outperformed other available panels when imputing African American individuals. The raw sequencing data, variant calls and imputation panel for this cohort are all freely available via dbGaP and should prove an invaluable resource for further study of admixed African genetics.
Affiliation(s)
- Helen Li
  - Google Health, Cambridge, MA, USA
7
Alipanahi B, Hormozdiari F, Behsaz B, Cosentino J, McCaw ZR, Schorsch E, Sculley D, Dorfman EH, Foster PJ, Peng LH, Phene S, Hammel N, Carroll A, Khawaja AP, McLean CY. Large-scale machine-learning-based phenotyping significantly improves genomic discovery for optic nerve head morphology. Am J Hum Genet 2021; 108:1217-1230. [PMID: 34077760 PMCID: PMC8322934 DOI: 10.1016/j.ajhg.2021.05.004] [Citation(s) in RCA: 21] [Impact Index Per Article: 7.0] [Received: 12/07/2020] [Accepted: 05/10/2021] [Indexed: 02/06/2023] [Open Access]
Abstract
Genome-wide association studies (GWASs) require accurate cohort phenotyping, but expert labeling can be costly, time intensive, and variable. Here, we develop a machine learning (ML) model to predict glaucomatous optic nerve head features from color fundus photographs. We used the model to predict vertical cup-to-disc ratio (VCDR), a diagnostic parameter and cardinal endophenotype for glaucoma, in 65,680 Europeans in the UK Biobank (UKB). A GWAS of ML-based VCDR identified 299 independent genome-wide significant (GWS; p ≤ 5 × 10⁻⁸) hits in 156 loci. The ML-based GWAS replicated 62 of 65 GWS loci from a recent VCDR GWAS in the UKB for which two ophthalmologists manually labeled images for 67,040 Europeans. The ML-based GWAS also identified 93 novel loci, significantly expanding our understanding of the genetic etiologies of glaucoma and VCDR. Pathway analyses support the biological significance of the novel hits to VCDR: select loci lie near genes involved in neuronal and synaptic biology or harbor variants known to cause severe Mendelian ophthalmic disease. Finally, the ML-based GWAS results significantly improve polygenic prediction of VCDR and primary open-angle glaucoma in the independent EPIC-Norfolk cohort.
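The abstract distinguishes genome-wide significant hits from loci but does not define a locus. The sketch below illustrates the general idea under stated assumptions: variants passing the quoted threshold of 5 × 10⁻⁸ are merged into one locus when they sit on the same chromosome within a fixed distance of the previous hit. The 250 kb window, the tuple layout, and the function name are hypothetical. Identifying independent hits in practice also involves linkage-disequilibrium-aware analysis, which this sketch omits.

```python
GWS_THRESHOLD = 5e-8  # genome-wide significance threshold quoted in the abstract


def group_hits_into_loci(hits, window_bp=250_000):
    """Group significant variants into loci by physical distance.

    hits: iterable of (chromosome, position_bp, p_value) tuples.
    window_bp: hypothetical merge distance; not taken from the paper.
    Returns a list of loci, each a list of the hits it contains.
    """
    significant = sorted(
        (h for h in hits if h[2] <= GWS_THRESHOLD),
        key=lambda h: (h[0], h[1]),
    )
    loci = []
    for hit in significant:
        last = loci[-1][-1] if loci else None
        if last is not None and last[0] == hit[0] and hit[1] - last[1] <= window_bp:
            loci[-1].append(hit)  # close to the previous hit: same locus
        else:
            loci.append([hit])  # new chromosome or too far away: new locus
    return loci
```

Under such a scheme the number of loci is at most the number of significant hits, which matches the relationship between the 299 hits and 156 loci reported above.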
Affiliation(s)
- D Sculley
  - Google Health, Cambridge, MA 02142, USA
- Paul J Foster
  - NIHR Biomedical Research Centre at Moorfields Eye Hospital and UCL Institute of Ophthalmology, London EC1V 9EL, UK
- Anthony P Khawaja
  - NIHR Biomedical Research Centre at Moorfields Eye Hospital and UCL Institute of Ophthalmology, London EC1V 9EL, UK; MRC Epidemiology Unit, University of Cambridge, Cambridge CB2 0SL, UK
8
Lai D, Alipanahi B, Fontanillas P, Schwantes-An TH, Aasly J, Alcalay RN, Beecham GW, Berg D, Bressman S, Brice A, Brockman K, Clark L, Cookson M, Das S, Van Deerlin V, Follett J, Farrer MJ, Trinh J, Gasser T, Goldwurm S, Gustavsson E, Klein C, Lang AE, Langston JW, Latourelle J, Lynch T, Marder K, Marras C, Martin ER, McLean CY, Mejia-Santana H, Molho E, Myers RH, Nuytemans K, Ozelius L, Payami H, Raymond D, Rogaeva E, Rogers MP, Ross OA, Samii A, Saunders-Pullman R, Schüle B, Schulte C, Scott WK, Tanner C, Tolosa E, Tomkins JE, Vilas D, Trojanowski JQ, Uitti R, Vance JM, Visanji NP, Wszolek ZK, Zabetian CP, Mirelman A, Giladi N, Orr Urtreger A, Cannon P, Fiske B, Foroud T. Genomewide Association Studies of LRRK2 Modifiers of Parkinson's Disease. Ann Neurol 2021; 90:76-88. [PMID: 33938021 PMCID: PMC8252519 DOI: 10.1002/ana.26094] [Citation(s) in RCA: 12] [Impact Index Per Article: 4.0] [Received: 04/07/2020] [Revised: 04/28/2021] [Accepted: 04/29/2021] [Indexed: 02/03/2023]
Abstract
Objective: The aim of this study was to search for genes/variants that modify the effect of LRRK2 mutations in terms of penetrance and age‐at‐onset of Parkinson's disease.
Methods: We performed the first genomewide association study of penetrance and age‐at‐onset of Parkinson's disease in LRRK2 mutation carriers (776 cases and 1,103 non‐cases at their last evaluation). Cox proportional hazard models and linear mixed models were used to identify modifiers of penetrance and age‐at‐onset of LRRK2 mutations, respectively. We also investigated whether a polygenic risk score derived from a published genomewide association study of Parkinson's disease was able to explain variability in penetrance and age‐at‐onset in LRRK2 mutation carriers.
Results: A variant located in the intronic region of CORO1C on chromosome 12 (rs77395454; p value = 2.5E‐08, beta = 1.27, SE = 0.23, risk allele: C) met genomewide significance for the penetrance model. Co‐immunoprecipitation analyses of LRRK2 and CORO1C supported an interaction between these 2 proteins. A region on chromosome 3, within a previously reported linkage peak for Parkinson's disease susceptibility, showed suggestive associations in both models (penetrance top variant: p value = 1.1E‐07; age‐at‐onset top variant: p value = 9.3E‐07). A polygenic risk score derived from publicly available Parkinson's disease summary statistics was a significant predictor of penetrance, but not of age‐at‐onset.
Interpretation: This study suggests that variants within or near CORO1C may modify the penetrance of LRRK2 mutations. In addition, common Parkinson's disease associated variants collectively increase the penetrance of LRRK2 mutations.
ANN NEUROL 2021;90:82–94
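A polygenic risk score of the kind mentioned in this abstract is, in its simplest form, a weighted sum of effect-allele counts, with the weights taken from published GWAS summary statistics. The sketch below shows only that basic arithmetic. The variant identifiers, weights, and function name are hypothetical, and real pipelines add steps such as variant selection and allele harmonization that the abstract does not describe.

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of effect-allele dosages for one individual.

    dosages: dict mapping variant ID -> effect-allele count (0 to 2).
    effect_sizes: dict mapping variant ID -> per-allele effect (beta).
    Variants missing from either input are skipped.
    """
    shared = dosages.keys() & effect_sizes.keys()
    return sum(effect_sizes[v] * dosages[v] for v in shared)


# Hypothetical example: made-up variants and weights, for illustration only.
example_dosages = {"var_a": 2, "var_b": 1, "var_c": 0}
example_weights = {"var_a": 0.10, "var_b": -0.05, "var_d": 0.30}
example_score = polygenic_risk_score(example_dosages, example_weights)
```

Here only `var_a` and `var_b` appear in both inputs, so the score is 2 × 0.10 + 1 × (−0.05).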
Affiliation(s)
- Dongbing Lai
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN
- Tae-Hwi Schwantes-An
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN
- Jan Aasly
  - Department of Neurology, St. Olavs Hospital, Trondheim, Norway
- Roy N Alcalay
  - Department of Neurology, Columbia University, New York, NY
- Gary W Beecham
  - John P. Hussman Institute for Human Genomics and Dr. John T. Macdonald Department of Human Genetics, University of Miami, Miller School of Medicine, Miami, FL
- Daniela Berg
  - Department of Neurology, Christian-Albrechts-University of Kiel, Kiel, Germany; Department of Neurodegenerative Diseases, Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany
- Susan Bressman
  - Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY
- Alexis Brice
  - Sorbonne Université, Institut du Cerveau et de la Moelle épinière (ICM), AP-HP, Inserm, CNRS, University Hospital Pitié-Salpêtrière, Paris, France
- Kathrin Brockman
  - Department of Neurodegenerative Diseases, Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany; German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
- Lorraine Clark
  - Department of Pathology and Cell Biology, Columbia University, New York, NY
- Mark Cookson
  - Laboratory of Neurogenetics, National Institute of Aging, National Institute of Health, Bethesda, MD
- Vivianna Van Deerlin
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA
- Jordan Follett
  - Laboratory of Neurogenetics and Neuroscience, Fixel Institute for Neurological Diseases, McKnight Brain Institute, L5-101D, UF Clinical and Translational Science Institute, University of Florida, Gainesville, FL
- Matthew J Farrer
  - Laboratory of Neurogenetics and Neuroscience, Fixel Institute for Neurological Diseases, McKnight Brain Institute, L5-101D, UF Clinical and Translational Science Institute, University of Florida, Gainesville, FL
- Joanne Trinh
  - Institute of Neurogenetics, University of Luebeck, Luebeck, Germany
- Thomas Gasser
  - Department of Neurodegenerative Diseases, Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany; German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
- Emil Gustavsson
  - Centre for Applied Neurogenetics, University of British Columbia, Vancouver, Canada
- Christine Klein
  - Institute of Neurogenetics, University of Luebeck, Luebeck, Germany
- Anthony E Lang
  - The Edmond J. Safra Program in Parkinson's Disease and the Morton and Gloria Shulman Movement Disorders Clinic, Toronto Western Hospital, Toronto, Canada
- J William Langston
  - Departments of Neurology, Neuroscience, and Pathology, Stanford University School of Medicine, Stanford, CA
- Timothy Lynch
  - Dublin Neurological Institute at the Mater Misericordiae University Hospital, Conway Institute of Biomolecular and Biomedical Research, University College Dublin, Dublin, Ireland
- Karen Marder
  - Department of Neurology and Psychiatry, Taub Institute and Sergievsky Center, Columbia University Vagelos College of Physicians and Surgeons, New York, NY
- Connie Marras
  - The Edmond J. Safra Program in Parkinson's Disease and the Morton and Gloria Shulman Movement Disorders Clinic, Toronto Western Hospital, Toronto, Canada
- Eden R Martin
  - John P. Hussman Institute for Human Genomics and Dr. John T. Macdonald Department of Human Genetics, University of Miami, Miller School of Medicine, Miami, FL
- Cory Y McLean
  - 23andMe, Inc., Sunnyvale, CA; Google LLC, Cambridge, MA
- Eric Molho
  - Department of Neurology, Albany Medical College, Albany, NY
- Karen Nuytemans
  - John P. Hussman Institute for Human Genomics and Dr. John T. Macdonald Department of Human Genetics, University of Miami, Miller School of Medicine, Miami, FL
- Laurie Ozelius
  - Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY
- Haydeh Payami
  - Department of Neurology, University of Alabama at Birmingham, Birmingham, AL
- Deborah Raymond
  - Department of Neurology, Icahn School of Medicine at Mount Sinai, New York, NY
- Ekaterina Rogaeva
  - Tanz Centre for Research in Neurodegenerative Diseases and Department of Neurology, University of Toronto, Toronto, Canada
- Michael P Rogers
  - Department of General Surgery, University of South Florida Morsani College of Medicine, Tampa, FL
- Owen A Ross
  - Departments of Neuroscience and Clinical Genomics, Mayo Clinic, Jacksonville, FL; School of Medicine and Medical Science, University College Dublin, Dublin, Ireland
- Ali Samii
  - VA Puget Sound Health Care System and Department of Neurology, University of Washington, Seattle, WA
- Birgitt Schüle
  - Department of Pathology, Stanford University School of Medicine, Stanford, CA
- Claudia Schulte
  - Department of Neurodegenerative Diseases, Hertie Institute for Clinical Brain Research, University of Tübingen, Tübingen, Germany; German Center for Neurodegenerative Diseases (DZNE), Tübingen, Germany
- William K Scott
  - John P. Hussman Institute for Human Genomics and Dr. John T. Macdonald Department of Human Genetics, University of Miami, Miller School of Medicine, Miami, FL
- Caroline Tanner
  - University of California, San Francisco Veterans Affairs Health Care System, San Francisco, CA
- Eduardo Tolosa
  - Parkinson Disease and Movement Disorders Unit, Hospital Clínic Universitari, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona (UB), Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Barcelona, Spain
- Dolores Vilas
  - Parkinson Disease and Movement Disorders Unit, Hospital Clínic Universitari, Institut d'Investigacions Biomèdiques August Pi i Sunyer (IDIBAPS), University of Barcelona (UB), Centro de Investigación Biomédica en Red sobre Enfermedades Neurodegenerativas (CIBERNED), Barcelona, Spain
- John Q Trojanowski
  - Department of Pathology and Laboratory Medicine, University of Pennsylvania, Philadelphia, PA
-
  - 23andMe, Inc., Sunnyvale, CA
- Ryan Uitti
  - Department of Neurology, Mayo Clinic, Jacksonville, FL
- Jeffery M Vance
  - John P. Hussman Institute for Human Genomics and Dr. John T. Macdonald Department of Human Genetics, University of Miami, Miller School of Medicine, Miami, FL
- Naomi P Visanji
  - The Edmond J. Safra Program in Parkinson's Disease and the Morton and Gloria Shulman Movement Disorders Clinic, Toronto Western Hospital, Toronto, Canada
- Cyrus P Zabetian
  - VA Puget Sound Health Care System and Department of Neurology, University of Washington, Seattle, WA
- Anat Mirelman
  - Tel Aviv Sourasky Medical Center, Sackler Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Nir Giladi
  - Tel Aviv Sourasky Medical Center, Sackler Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Avi Orr Urtreger
  - Tel Aviv Sourasky Medical Center, Sackler Faculty of Medicine and Sagol School of Neuroscience, Tel Aviv University, Tel Aviv, Israel
- Brian Fiske
  - The Michael J. Fox Foundation for Parkinson's Research, New York, NY
- Tatiana Foroud
  - Department of Medical and Molecular Genetics, Indiana University School of Medicine, Indianapolis, IN
|
9
|
Yun T, Li H, Chang PC, Lin MF, Carroll A, McLean CY. Accurate, scalable cohort variant calls using DeepVariant and GLnexus. Bioinformatics 2021; 36:5582-5589. [PMID: 33399819 PMCID: PMC8023681 DOI: 10.1093/bioinformatics/btaa1081] [Citation(s) in RCA: 66] [Impact Index Per Article: 22.0] [Received: 06/03/2020] [Revised: 09/27/2020] [Accepted: 12/16/2020] [Indexed: 12/30/2022]
Abstract
Motivation: Population-scale sequenced cohorts are foundational resources for genetic analyses, but processing raw reads into analysis-ready cohort-level variants remains challenging. Results: We introduce an open-source cohort-calling method that uses the highly accurate caller DeepVariant and scalable merging tool GLnexus. Using callset quality metrics based on variant recall and precision in benchmark samples and Mendelian consistency in father-mother-child trios, we optimize the method across a range of cohort sizes, sequencing methods and sequencing depths. The resulting callsets show consistent quality improvements over those generated using existing best practices with reduced cost. We further evaluate our pipeline in the deeply sequenced 1000 Genomes Project (1KGP) samples and show superior callset quality metrics and imputation reference panel performance compared to an independently generated GATK Best Practices pipeline. Availability and implementation: We publicly release the 1KGP individual-level variant calls and cohort callset (https://console.cloud.google.com/storage/browser/brain-genomics-public/research/cohort/1KGP) to foster additional development and evaluation of cohort merging methods as well as broad studies of genetic variation. Both DeepVariant (https://github.com/google/deepvariant) and GLnexus (https://github.com/dnanexus-rnd/GLnexus) are open-source, and the optimized GLnexus setup discovered in this study is also integrated into GLnexus public releases v1.2.2 and later. Supplementary information: Supplementary data are available at Bioinformatics online.
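The abstract above uses Mendelian consistency in father-mother-child trios as a callset quality metric. As a rough illustration only (not the evaluation code from the paper), the check can be sketched for unphased diploid genotypes. The function names, the tuple-of-allele-indices genotype encoding, and the example sites are assumptions made for this sketch.

```python
from itertools import product


def is_mendelian_consistent(father, mother, child):
    """Return True if the child's unphased diploid genotype can be formed by
    taking one allele from the father and one allele from the mother.

    Genotypes are tuples of two allele indices, e.g. (0, 1) for a heterozygote.
    This encoding is an assumption of the sketch, not a format from the paper.
    """
    target = sorted(child)
    return any(sorted((f, m)) == target for f, m in product(father, mother))


def mendelian_consistency_rate(trio_genotypes):
    """Fraction of sites at which a trio is Mendelian-consistent.

    trio_genotypes: iterable of (father, mother, child) genotype tuples.
    """
    results = [is_mendelian_consistent(f, m, c) for f, m, c in trio_genotypes]
    return sum(results) / len(results) if results else float("nan")


# Hypothetical example sites (father, mother, child).
sites = [
    ((0, 0), (0, 1), (0, 1)),  # consistent
    ((0, 0), (0, 0), (0, 1)),  # violation (calling error or de novo variant)
    ((1, 1), (0, 1), (1, 1)),  # consistent
    ((0, 1), (0, 1), (0, 0)),  # consistent
]
rate = mendelian_consistency_rate(sites)
```

A violation does not by itself say which of the three genotype calls is wrong, so a rate like this is best read as a cohort-level quality signal rather than a per-sample error estimate.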
Affiliation(s)
- Taedong Yun
- Google Health, Cambridge, MA 02142 and Palo Alto, CA, USA
| | - Helen Li
- Google Health, Cambridge, MA 02142 and Palo Alto, CA, USA
| | - Pi-Chuan Chang
- Google Health, Cambridge, MA 02142 and Palo Alto, CA, USA
| | | | - Andrew Carroll
- Google Health, Cambridge, MA 02142 and Palo Alto, CA, USA
| | - Cory Y McLean
- Google Health, Cambridge, MA 02142 and Palo Alto, CA, USA
| |
|
10
|
McLean CY, Hwang Y, Poplin R, DePristo MA. GenomeWarp: an alignment-based variant coordinate transformation. Bioinformatics 2020; 35:4389-4391. [PMID: 30916319 PMCID: PMC6821237 DOI: 10.1093/bioinformatics/btz218] [Citation(s) in RCA: 2] [Impact Index Per Article: 0.5] [Received: 10/01/2018] [Revised: 02/28/2019] [Accepted: 03/23/2019] [Indexed: 11/17/2022]
Abstract
Summary: Reference genomes are refined to reflect error corrections and other improvements. While this process improves novel data generation and analysis, incorporating data analyzed on an older reference genome assembly requires transforming the coordinates and representations of the data to the new assembly. Multiple tools exist to perform this transformation for coordinate-only data types, but none supports accurate transformation of genome-wide short variation. Here we present GenomeWarp, a tool for efficiently transforming variants between genome assemblies. GenomeWarp transforms regions and short variants in a conservative manner to minimize false positive and negative variants in the target genome, and converts over 99% of regions and short variants from a representative human genome. Availability and implementation: GenomeWarp is written in Java. All source code and the user manual are freely available at https://github.com/verilylifesciences/genomewarp. Supplementary information: Supplementary data are available at Bioinformatics online.
Affiliation(s)
- Cory Y McLean
- Verily Life Sciences, Mountain View, CA, USA; Google Inc., Mountain View, CA, USA
| | - Yeongwoo Hwang
- Verily Life Sciences, Mountain View, CA, USA; Department of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA
| | - Ryan Poplin
- Verily Life Sciences, Mountain View, CA, USA; Google Inc., Mountain View, CA, USA
| | - Mark A DePristo
- Verily Life Sciences, Mountain View, CA, USA; Google Inc., Mountain View, CA, USA
| |
|
11
|
Zook JM, McDaniel J, Olson ND, Wagner J, Parikh H, Heaton H, Irvine SA, Trigg L, Truty R, McLean CY, De La Vega FM, Xiao C, Sherry S, Salit M. An open resource for accurately benchmarking small variant and reference calls. Nat Biotechnol 2019; 37:561-566. [PMID: 30936564 PMCID: PMC6500473 DOI: 10.1038/s41587-019-0074-6] [Citation(s) in RCA: 180] [Impact Index Per Article: 36.0] [Received: 05/25/2018] [Accepted: 02/19/2019] [Indexed: 12/30/2022]
Abstract
Benchmark small variant calls are required for developing, optimizing and assessing the performance of sequencing and bioinformatics methods. Here, as part of the Genome in a Bottle (GIAB) Consortium, we apply a reproducible, cloud-based pipeline to integrate multiple short- and linked-read sequencing datasets and provide benchmark calls for human genomes. We generate benchmark calls for one previously analyzed GIAB sample, as well as six genomes from the Personal Genome Project. These new genomes have broad, open consent, making this a 'first of its kind' resource that is available to the community for multiple downstream applications. We produce 17% more benchmark single nucleotide variations, 176% more indels and 12% larger benchmark regions than previously published GIAB benchmarks. We demonstrate that this benchmark reliably identifies errors in existing callsets and highlight challenges in interpreting performance metrics when using benchmarks that are not perfect or comprehensive. Finally, we identify strengths and weaknesses of callsets by stratifying performance according to variant type and genome context.
Affiliation(s)
- Justin M Zook
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA.
| | - Jennifer McDaniel
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Nathan D Olson
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Justin Wagner
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Hemang Parikh
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
| | - Haynes Heaton
- 10x Genomics, Pleasanton, CA, USA
- Wellcome Trust Sanger Institute, Hinxton, Cambridge, UK
| | | | - Len Trigg
- Real Time Genomics, Hamilton, New Zealand
| | | | - Cory Y McLean
- Verily Life Sciences, South San Francisco, CA, USA
- Google Inc., Mountain View, CA, USA
| | - Francisco M De La Vega
- Department of Biomedical Data Science, Stanford University School of Medicine, Stanford, CA, USA
| | - Chunlin Xiao
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Stephen Sherry
- National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD, USA
| | - Marc Salit
- Material Measurement Laboratory, National Institute of Standards and Technology, Gaithersburg, MD, USA
- Joint Initiative for Metrology in Biology, Stanford, CA, USA
- Department of Bioengineering, Stanford University, Stanford, CA, USA
| |
|
12
|
Kelley DR, Reshef YA, Bileschi M, Belanger D, McLean CY, Snoek J. Sequential regulatory activity prediction across chromosomes with convolutional neural networks. Genome Res 2018; 28:739-750. [PMID: 29588361 PMCID: PMC5932613 DOI: 10.1101/gr.227819.117] [Citation(s) in RCA: 206] [Impact Index Per Article: 34.3] [Received: 07/17/2017] [Accepted: 03/23/2018] [Indexed: 01/10/2023]
Abstract
Models for predicting phenotypic outcomes from genotypes have important applications to understanding genomic function and improving human health. Here, we develop a machine-learning system to predict cell-type-specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. By use of convolutional neural networks, this system identifies promoters and distal regulatory elements and synthesizes their content to make effective gene expression predictions. We show that model predictions for the influence of genomic variants on gene expression align well to causal variants underlying eQTLs in human populations and can be useful for generating mechanistic hypotheses to enable fine mapping of disease loci.
Affiliation(s)
| | - Yakir A Reshef
- Department of Computer Science, Harvard University, Cambridge, Massachusetts 02138, USA
| | | | | | | | - Jasper Snoek
- Google Brain, Cambridge, Massachusetts 02142, USA
| |
|
13
|
Minikel EV, Vallabh SM, Lek M, Estrada K, Samocha KE, Sathirapongsasuti JF, McLean CY, Tung JY, Yu LPC, Gambetti P, Blevins J, Zhang S, Cohen Y, Chen W, Yamada M, Hamaguchi T, Sanjo N, Mizusawa H, Nakamura Y, Kitamoto T, Collins SJ, Boyd A, Will RG, Knight R, Ponto C, Zerr I, Kraus TFJ, Eigenbrod S, Giese A, Calero M, de Pedro-Cuesta J, Haïk S, Laplanche JL, Bouaziz-Amar E, Brandel JP, Capellari S, Parchi P, Poleggi A, Ladogana A, O'Donnell-Luria AH, Karczewski KJ, Marshall JL, Boehnke M, Laakso M, Mohlke KL, Kähler A, Chambert K, McCarroll S, Sullivan PF, Hultman CM, Purcell SM, Sklar P, van der Lee SJ, Rozemuller A, Jansen C, Hofman A, Kraaij R, van Rooij JGJ, Ikram MA, Uitterlinden AG, van Duijn CM, Daly MJ, MacArthur DG. Quantifying prion disease penetrance using large population control cohorts. Sci Transl Med 2016; 8:322ra9. [PMID: 26791950 DOI: 10.1126/scitranslmed.aad5169] [Citation(s) in RCA: 228] [Impact Index Per Article: 28.5] [Indexed: 12/14/2022]
Abstract
More than 100,000 genetic variants are reported to cause Mendelian disease in humans, but the penetrance (the probability that a carrier of the purported disease-causing genotype will indeed develop the disease) is generally unknown. We assess the impact of variants in the prion protein gene (PRNP) on the risk of prion disease by analyzing 16,025 prion disease cases, 60,706 population control exomes, and 531,575 individuals genotyped by 23andMe Inc. We show that missense variants in PRNP previously reported to be pathogenic are at least 30 times more common in the population than expected on the basis of genetic prion disease prevalence. Although some of this excess can be attributed to benign variants falsely assigned as pathogenic, other variants have genuine effects on disease susceptibility but confer lifetime risks ranging from <0.1 to ~100%. We also show that truncating variants in PRNP have position-dependent effects, with true loss-of-function alleles found in healthy older individuals, a finding that supports the safety of therapeutic suppression of prion protein expression.
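One way to see how case and control carrier frequencies translate into a lifetime risk is Bayes' rule: P(disease | genotype) = P(genotype | disease) × P(disease) / P(genotype). The sketch below is a hedged illustration of that relationship, not the paper's analysis code. The cohort sizes are taken from the abstract, while the carrier counts, the baseline lifetime risk, and the function name are hypothetical values chosen only for the example.

```python
def estimate_penetrance(case_carriers, n_cases, control_carriers, n_controls,
                        baseline_risk):
    """Estimate lifetime risk for carriers of a variant using Bayes' rule.

    P(D | G) = P(G | D) * P(D) / P(G), where P(G | D) is approximated by the
    carrier frequency in cases and P(G) by the carrier frequency in population
    controls. The result is capped at 1.0 because sampling noise can push the
    point estimate above 100%.
    """
    p_genotype_given_disease = case_carriers / n_cases
    p_genotype = control_carriers / n_controls
    if p_genotype == 0:
        raise ValueError("variant absent from controls; P(G) cannot be estimated")
    return min(1.0, p_genotype_given_disease * baseline_risk / p_genotype)


# Cohort sizes from the abstract; carrier counts and baseline risk are
# hypothetical numbers used purely for illustration.
risk = estimate_penetrance(
    case_carriers=10,
    n_cases=16_025,
    control_carriers=5,
    n_controls=60_706,
    baseline_risk=1 / 5000,
)
```

With these illustrative inputs the estimate is a fraction of one percent, which shows how a variant that is enriched in cases can still confer a low absolute lifetime risk when it is also observed in controls.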
Affiliation(s)
- Eric Vallabh Minikel
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA. Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA. Prion Alliance, Cambridge, MA 02139, USA.
| | - Sonia M Vallabh
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA. Prion Alliance, Cambridge, MA 02139, USA
| | - Monkol Lek
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Karol Estrada
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Kaitlin E Samocha
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA. Program in Biological and Biomedical Sciences, Harvard Medical School, Boston, MA 02115, USA
| | | | - Cory Y McLean
- Research, 23andMe Inc., Mountain View, CA 94041, USA
| | - Joyce Y Tung
- Research, 23andMe Inc., Mountain View, CA 94041, USA
| | - Linda P C Yu
- Research, 23andMe Inc., Mountain View, CA 94041, USA
| | - Pierluigi Gambetti
- National Prion Disease Pathology Surveillance Center, Cleveland, OH 44106, USA
| | - Janis Blevins
- National Prion Disease Pathology Surveillance Center, Cleveland, OH 44106, USA
| | - Shulin Zhang
- University Hospitals Case Medical Center, Cleveland, OH 44106, USA
| | - Yvonne Cohen
- National Prion Disease Pathology Surveillance Center, Cleveland, OH 44106, USA
| | - Wei Chen
- National Prion Disease Pathology Surveillance Center, Cleveland, OH 44106, USA
| | - Masahito Yamada
- Department of Neurology and Neurobiology of Aging, Kanazawa University Graduate School of Medical Sciences, Kanazawa 920-8640, Japan
| | - Tsuyoshi Hamaguchi
- Department of Neurology and Neurobiology of Aging, Kanazawa University Graduate School of Medical Sciences, Kanazawa 920-8640, Japan
| | - Nobuo Sanjo
- Department of Neurology and Neurological Science, Graduate School, Tokyo Medical and Dental University, Tokyo 113-8519, Japan
| | - Hidehiro Mizusawa
- National Center Hospital, National Center of Neurology and Psychiatry, Tokyo 187-8551, Japan
| | - Yosikazu Nakamura
- Department of Public Health, Jichi Medical University, Shimotsuke 329-0498, Japan
| | - Tetsuyuki Kitamoto
- Department of Neurological Science, Tohoku University Graduate School of Medicine, Sendai 980-8575, Japan
| | - Steven J Collins
- Australian National Creutzfeldt-Jakob Disease Registry, The University of Melbourne, Parkville, Victoria 3010, Australia
| | - Alison Boyd
- Australian National Creutzfeldt-Jakob Disease Registry, The University of Melbourne, Parkville, Victoria 3010, Australia
| | - Robert G Will
- National Creutzfeldt-Jakob Disease Research & Surveillance Unit, Western General Hospital, Edinburgh EH4 2XU, UK
| | - Richard Knight
- National Creutzfeldt-Jakob Disease Research & Surveillance Unit, Western General Hospital, Edinburgh EH4 2XU, UK
| | - Claudia Ponto
- National Reference Center for the Surveillance of Human Transmissible Spongiform Encephalopathies, Georg-August-University, Goettingen 37073, Germany
| | - Inga Zerr
- National Reference Center for the Surveillance of Human Transmissible Spongiform Encephalopathies, Georg-August-University, Goettingen 37073, Germany
| | - Theo F J Kraus
- Center for Neuropathology and Prion Research (ZNP), Ludwig-Maximilians-University, Munich 81377, Germany
| | - Sabina Eigenbrod
- Center for Neuropathology and Prion Research (ZNP), Ludwig-Maximilians-University, Munich 81377, Germany
| | - Armin Giese
- Center for Neuropathology and Prion Research (ZNP), Ludwig-Maximilians-University, Munich 81377, Germany
| | - Miguel Calero
- Centro de Investigación Biomédica en Red de Enfermedades Neurodegenerativas, Instituto de Salud Carlos III, Madrid 28031, Spain
| | - Jesús de Pedro-Cuesta
- Centro de Investigación Biomédica en Red de Enfermedades Neurodegenerativas, Instituto de Salud Carlos III, Madrid 28031, Spain
| | - Stéphane Haïk
- INSERM U 1127, CNRS UMR 7225, Sorbonne Universités, Pierre and Marie Curie University Paris 06 UMR S 1127, Institut du Cerveau et de la Moelle Epinière, 75013 Paris, France. Assistance Publique-Hôpitaux de Paris (AP-HP), Cellule Nationale de Référence des Maladies de Creutzfeldt-Jakob, Groupe Hospitalier Pitié-Salpêtrière, F-75013 Paris, France
| | - Jean-Louis Laplanche
- AP-HP, Service de Biochimie et Biologie Moléculaire, Hôpital Lariboisière, 75010 Paris, France
| | - Elodie Bouaziz-Amar
- AP-HP, Service de Biochimie et Biologie Moléculaire, Hôpital Lariboisière, 75010 Paris, France
| | - Jean-Philippe Brandel
- INSERM U 1127, CNRS UMR 7225, Sorbonne Universités, Pierre and Marie Curie University Paris 06 UMR S 1127, Institut du Cerveau et de la Moelle Epinière, 75013 Paris, France. Assistance Publique-Hôpitaux de Paris (AP-HP), Cellule Nationale de Référence des Maladies de Creutzfeldt-Jakob, Groupe Hospitalier Pitié-Salpêtrière, F-75013 Paris, France
| | - Sabina Capellari
- Istituto di Ricovero e Cura a Carattere Scientifico, Institute of Neurological Sciences, Bologna 40123, Italy. Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna 40126, Italy
| | - Piero Parchi
- Istituto di Ricovero e Cura a Carattere Scientifico, Institute of Neurological Sciences, Bologna 40123, Italy. Department of Biomedical and Neuromotor Sciences, University of Bologna, Bologna 40126, Italy
| | - Anna Poleggi
- Department of Cell Biology and Neurosciences, Istituto Superiore di Sanità, Rome 00161, Italy
| | - Anna Ladogana
- Department of Cell Biology and Neurosciences, Istituto Superiore di Sanità, Rome 00161, Italy
| | - Anne H O'Donnell-Luria
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA. Division of Genetics and Genomics, Boston Children's Hospital, Boston, MA 02115, USA
| | - Konrad J Karczewski
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Jamie L Marshall
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Michael Boehnke
- Department of Biostatistics and Center for Statistical Genetics, University of Michigan School of Public Health, Ann Arbor, MI 48109, USA
| | - Markku Laakso
- Department of Medicine, University of Eastern Finland and Kuopio University Hospital, Kuopio 70210, Finland
| | - Karen L Mohlke
- Department of Genetics, University of North Carolina School of Medicine, Chapel Hill, NC 27599, USA
| | - Anna Kähler
- Karolinska Institutet, Stockholm SE-171 77, Sweden
| | - Kimberly Chambert
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Steven McCarroll
- Stanley Center for Psychiatric Research, Broad Institute of MIT and Harvard, Cambridge, MA 02142, USA
| | - Patrick F Sullivan
- Department of Genetics, University of North Carolina School of Medicine, Chapel Hill, NC 27599, USA. Karolinska Institutet, Stockholm SE-171 77, Sweden
| | | | - Shaun M Purcell
- Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Pamela Sklar
- Icahn School of Medicine at Mount Sinai, New York, NY 10029, USA
| | - Sven J van der Lee
- Department of Epidemiology, Erasmus Medical Center (MC), Rotterdam 3000 CA, Netherlands
| | - Annemieke Rozemuller
- Dutch Surveillance Centre for Prion Diseases, Department of Pathology, University Medical Center, Utrecht 3584 CX, Netherlands
| | - Casper Jansen
- Dutch Surveillance Centre for Prion Diseases, Department of Pathology, University Medical Center, Utrecht 3584 CX, Netherlands
| | - Albert Hofman
- Department of Epidemiology, Erasmus Medical Center (MC), Rotterdam 3000 CA, Netherlands
| | - Robert Kraaij
- Department of Internal Medicine, Erasmus MC, Rotterdam 3000 CA, Netherlands
| | | | - M Arfan Ikram
- Department of Epidemiology, Erasmus Medical Center (MC), Rotterdam 3000 CA, Netherlands
| | - André G Uitterlinden
- Department of Epidemiology, Erasmus Medical Center (MC), Rotterdam 3000 CA, Netherlands. Department of Internal Medicine, Erasmus MC, Rotterdam 3000 CA, Netherlands
| | - Cornelia M van Duijn
- Department of Epidemiology, Erasmus Medical Center (MC), Rotterdam 3000 CA, Netherlands
| | | | - Mark J Daly
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA
| | - Daniel G MacArthur
- Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA. Analytical and Translational Genetics Unit, Massachusetts General Hospital, Boston, MA 02114, USA.
| |
|
14
|
Abstract
Analysis of genomic segments shared identical-by-descent (IBD) between individuals is fundamental to many genetic applications, from demographic inference to estimating the heritability of diseases, but IBD detection accuracy in nonsimulated data is largely unknown. In principle, it can be evaluated using known pedigrees, as IBD segments are by definition inherited without recombination down a family tree. We extracted 25,432 genotyped European individuals containing 2,952 father-mother-child trios from the 23andMe, Inc. data set. We then used GERMLINE, a widely used IBD detection method, to detect IBD segments within this cohort. Exploiting known familial relationships, we identified a false-positive rate over 67% for 2-4 centiMorgan (cM) segments, in sharp contrast with accuracies reported in simulated data at these sizes. Nearly all false positives arose from the allowance of haplotype switch errors when detecting IBD, a necessity for retrieving long (>6 cM) segments in the presence of imperfect phasing. We introduce HaploScore, a novel, computationally efficient metric that scores IBD segments proportional to the number of switch errors they contain. Applying HaploScore filtering to the IBD data at a precision of 0.8 produced a 13-fold increase in recall when compared with length-based filtering. We replicate the false IBD findings and demonstrate the generalizability of HaploScore to alternative data sources using an independent cohort of 555 European individuals from the 1000 Genomes project. HaploScore can improve the accuracy of segments reported by any IBD detection method, provided that estimates of the genotyping error rate and switch error rate are available.
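The pedigree-based evaluation described above rests on the idea that a true IBD segment between a child and an unrelated third individual should also be shared between that individual and at least one of the child's parents. The sketch below illustrates one possible form of such a check and is not the procedure or the HaploScore metric from the paper. The interval representation, the function names, and the `min_fraction` threshold are all assumptions.

```python
def covered_fraction(segment, parent_segments):
    """Fraction of `segment` = (start, end) covered by the union of
    `parent_segments`, a list of (start, end) intervals on the same chromosome."""
    start, end = segment
    # Clip each parental segment to the child's segment and drop non-overlaps.
    clipped = sorted(
        (max(s, start), min(e, end))
        for s, e in parent_segments
        if min(e, end) > max(s, start)
    )
    covered = 0.0
    cursor = start
    for s, e in clipped:
        s = max(s, cursor)  # skip the part already counted
        if e > s:
            covered += e - s
            cursor = e
    return covered / (end - start)


def is_supported_by_parent(child_segment, father_segments, mother_segments,
                           min_fraction=0.8):
    """Treat a child/third-party IBD segment as supported if a single parent
    shares IBD with the same third party over at least `min_fraction` of it.
    The 0.8 default is a hypothetical threshold for this sketch."""
    best = max(
        covered_fraction(child_segment, father_segments),
        covered_fraction(child_segment, mother_segments),
    )
    return best >= min_fraction


# Hypothetical coordinates in cM for one child/third-party pair.
supported = is_supported_by_parent((10, 20), [(5, 15), (12, 18)], [])
unsupported = is_supported_by_parent((10, 20), [], [(30, 40)])
```

Segments that fail a check of this kind would be counted as false positives when estimating a false-positive rate per segment-length bin.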
|
15
|
Reno PL, McLean CY, Hines JE, Capellini TD, Bejerano G, Kingsley DM. A penile spine/vibrissa enhancer sequence is missing in modern and extinct humans but is retained in multiple primates with penile spines and sensory vibrissae. PLoS One 2013; 8:e84258. [PMID: 24367647 PMCID: PMC3868586 DOI: 10.1371/journal.pone.0084258] [Citation(s) in RCA: 15] [Impact Index Per Article: 1.4] [Received: 09/24/2013] [Accepted: 11/04/2013] [Indexed: 11/18/2022]
Abstract
Previous studies show that humans have a large genomic deletion downstream of the Androgen Receptor gene that eliminates an ancestral mammalian regulatory enhancer that drives expression in developing penile spines and sensory vibrissae. Here we use a combination of large-scale sequence analysis and PCR amplification to demonstrate that the penile spine/vibrissa enhancer is missing in all humans surveyed and in the Neandertal and Denisovan genomes, but is present in DNA samples of chimpanzees and bonobos, as well as in multiple other great apes and primates that maintain some form of penile integumentary appendage and facial vibrissae. These results further strengthen the association between the presence of the penile spine/vibrissa enhancer and the presence of penile spines and macro- or micro-vibrissae in non-human primates, and show that loss of the enhancer is a distinctive and characteristic feature of the human lineage.
Affiliation(s)
- Philip L. Reno
- Department of Anthropology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Cory Y. McLean
- Department of Computer Science, Stanford University, Stanford, California, United States of America
| | - Jasmine E. Hines
- Department of Anthropology, The Pennsylvania State University, University Park, Pennsylvania, United States of America
| | - Terence D. Capellini
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, California, United States of America
| | - Gill Bejerano
- Department of Computer Science, Stanford University, Stanford, California, United States of America
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, California, United States of America
| | - David M. Kingsley
- Department of Developmental Biology, Stanford University School of Medicine, Stanford, California, United States of America
- Howard Hughes Medical Institute, Stanford, California, United States of America
| |
|
16
|
Johnson BE, Mazor T, Hong C, Barnes M, Aihara K, McLean CY, Fouse SD, Yamamoto S, Ueda H, Tatsuno K, Asthana S, Jalbert LE, Nelson SJ, Bollen AW, Gustafson WC, Charron E, Weiss WA, Smirnov IV, Song JS, Olshen AB, Cha S, Zhao Y, Moore RA, Mungall AJ, Jones SJM, Hirst M, Marra MA, Saito N, Aburatani H, Mukasa A, Berger MS, Chang SM, Taylor BS, Costello JF. Mutational analysis reveals the origin and therapy-driven evolution of recurrent glioma. Science 2013; 343:189-193. [PMID: 24336570 DOI: 10.1126/science.1239947] [Citation(s) in RCA: 987] [Impact Index Per Article: 89.7] [Indexed: 01/01/2023]
Abstract
Tumor recurrence is a leading cause of cancer mortality. Therapies for recurrent disease may fail, at least in part, because the genomic alterations driving the growth of recurrences are distinct from those in the initial tumor. To explore this hypothesis, we sequenced the exomes of 23 initial low-grade gliomas and recurrent tumors resected from the same patients. In 43% of cases, at least half of the mutations in the initial tumor were undetected at recurrence, including driver mutations in TP53, ATRX, SMARCA4, and BRAF; this suggests that recurrent tumors are often seeded by cells derived from the initial tumor at a very early stage of their evolution. Notably, tumors from 6 of 10 patients treated with the chemotherapeutic drug temozolomide (TMZ) followed an alternative evolutionary path to high-grade glioma. At recurrence, these tumors were hypermutated and harbored driver mutations in the RB (retinoblastoma) and Akt-mTOR (mammalian target of rapamycin) pathways that bore the signature of TMZ-induced mutagenesis.
Affiliation(s)
- Brett E Johnson
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| | - Tali Mazor
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| | - Chibo Hong
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| | - Michael Barnes
- Department of Pathology, University of California San Francisco, San Francisco, CA, USA
| | - Koki Aihara
- Genome Science Laboratory, Research Center for Advanced Science and Technology, University of Tokyo, Tokyo, Japan; Department of Neurosurgery, University of Tokyo, Tokyo, Japan
| | - Cory Y McLean
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| | - Shaun D Fouse
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| | - Shogo Yamamoto
- Genome Science Laboratory, Research Center for Advanced Science and Technology, University of Tokyo, Tokyo, Japan
| | - Hiroki Ueda
- Genome Science Laboratory, Research Center for Advanced Science and Technology, University of Tokyo, Tokyo, Japan
| | - Kenji Tatsuno
- Genome Science Laboratory, Research Center for Advanced Science and Technology, University of Tokyo, Tokyo, Japan
| | - Saurabh Asthana
- Department of Medicine, University of California San Francisco, San Francisco, CA, USA; Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA
| | - Llewellyn E Jalbert
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA
| | - Sarah J Nelson
- Department of Bioengineering and Therapeutic Sciences, University of California San Francisco, San Francisco, CA, USA; Department of Radiology and Biomedical Imaging, University of California San Francisco, San Francisco, CA, USA
| | - Andrew W Bollen
- Department of Pathology, University of California San Francisco, San Francisco, CA, USA
| | - W Clay Gustafson
- Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA
| | - Elise Charron
- Department of Neurology, University of California San Francisco, San Francisco, CA, USA
| | - William A Weiss
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA; Department of Pediatrics, University of California San Francisco, San Francisco, CA, USA; Department of Neurology, University of California San Francisco, San Francisco, CA, USA
| | - Ivan V Smirnov
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| | - Jun S Song
- Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA; Institute for Human Genetics, University of California San Francisco, San Francisco, CA, USA
| | - Adam B Olshen
- Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA; Department of Epidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
| | - Soonmee Cha
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| | - Yongjun Zhao
- Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
| | - Richard A Moore
- Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
| | - Andrew J Mungall
- Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
| | - Steven J M Jones
- Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
| | - Martin Hirst
- Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
| | - Marco A Marra
- Michael Smith Genome Sciences Centre, British Columbia Cancer Agency, Vancouver, BC, Canada
| | - Nobuhito Saito
- Department of Neurosurgery, University of Tokyo, Tokyo, Japan
| | - Hiroyuki Aburatani
- Genome Science Laboratory, Research Center for Advanced Science and Technology, University of Tokyo, Tokyo, Japan
| | - Akitake Mukasa
- Department of Neurosurgery, University of Tokyo, Tokyo, Japan
| | - Mitchel S Berger
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| | - Susan M Chang
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| | - Barry S Taylor
- Department of Medicine, University of California San Francisco, San Francisco, CA, USA.,Helen Diller Family Comprehensive Cancer Center, University of California San Francisco, San Francisco, CA, USA.,Department ofEpidemiology and Biostatistics, University of California San Francisco, San Francisco, CA, USA
| | - Joseph F Costello
- Department of Neurological Surgery, University of California San Francisco, San Francisco, CA, USA
| |
|
17
|
Wenger AM, Clarke SL, Guturu H, Chen J, Schaar BT, McLean CY, Bejerano G. PRISM offers a comprehensive genomic approach to transcription factor function prediction. Genome Res 2013; 23:889-904. [PMID: 23382538 PMCID: PMC3638144 DOI: 10.1101/gr.139071.112] [Citation(s) in RCA: 30] [Impact Index Per Article: 2.7] [Indexed: 02/07/2023]
Abstract
The human genome encodes 1500–2000 different transcription factors (TFs). ChIP-seq is revealing the global binding profiles of a fraction of TFs in a fraction of their biological contexts. These data show that the majority of TFs bind directly next to a large number of context-relevant target genes, that most binding is distal, and that binding is context specific. Because of the effort and cost involved, ChIP-seq is seldom used in search of novel TF function. Such exploration is instead done using expression perturbation and genetic screens. Here we propose a comprehensive computational framework for transcription factor function prediction. We curate 332 high-quality nonredundant TF binding motifs that represent all major DNA binding domains, and improve cross-species conserved binding site prediction to obtain 3.3 million conserved, mostly distal, binding site predictions. We combine these with 2.4 million facts about all human and mouse gene functions, in a novel statistical framework, in search of enrichments of particular motifs next to groups of target genes of particular functions. Rigorous parameter tuning and a harsh null are used to minimize false positives. Our novel PRISM (predicting regulatory information from single motifs) approach obtains 2543 TF function predictions in a large variety of contexts, at a false discovery rate of 16%. The predictions are highly enriched for validated TF roles, and 45 of 67 (67%) tested binding site regions in five different contexts act as enhancers in functionally matched cells.
Affiliation(s)
- Aaron M Wenger
- Department of Computer Science, Stanford University, Stanford, California 94305, USA
|