451
|
Hibrand Saint-Oyant L, Ruttink T, Hamama L, Kirov I, Lakhwani D, Zhou NN, Bourke PM, Daccord N, Leus L, Schulz D, Van de Geest H, Hesselink T, Van Laere K, Debray K, Balzergue S, Thouroude T, Chastellier A, Jeauffre J, Voisine L, Gaillard S, Borm TJA, Arens P, Voorrips RE, Maliepaard C, Neu E, Linde M, Le Paslier MC, Bérard A, Bounon R, Clotault J, Choisne N, Quesneville H, Kawamura K, Aubourg S, Sakr S, Smulders MJM, Schijlen E, Bucher E, Debener T, De Riek J, Foucher F. A high-quality genome sequence of Rosa chinensis to elucidate ornamental traits. NATURE PLANTS 2018; 4:473-484. [PMID: 29892093 PMCID: PMC6786968 DOI: 10.1038/s41477-018-0166-1] [Citation(s) in RCA: 170] [Impact Index Per Article: 24.3] [Received: 01/30/2018] [Accepted: 05/01/2018] [Indexed: 05/18/2023]
Abstract
Rose is the world's most important ornamental plant, with economic, cultural and symbolic value. Roses are cultivated worldwide and sold as garden roses, cut flowers and potted plants. Roses are outbred and can have various ploidy levels. Our objectives were to develop a high-quality reference genome sequence for the genus Rosa by sequencing a doubled haploid, combining long and short reads, and anchoring to a high-density genetic map, and to study the genome structure and genetic basis of major ornamental traits. We produced a doubled haploid rose line ('HapOB') from Rosa chinensis 'Old Blush' and generated a rose genome assembly anchored to seven pseudo-chromosomes (512 Mb with N50 of 3.4 Mb and 564 contigs). The length of 512 Mb represents 90.1-96.1% of the estimated haploid genome size of rose. Of the assembly, 95% is contained in only 196 contigs. The anchoring was validated using high-density diploid and tetraploid genetic maps. We delineated hallmark chromosomal features, including the pericentromeric regions, through annotation of transposable element families and positioned centromeric repeats using fluorescent in situ hybridization. The rose genome displays extensive synteny with the Fragaria vesca genome, and we delineated only two major rearrangements. Genetic diversity was analysed using resequencing data of seven diploid and one tetraploid Rosa species selected from various sections of the genus. Combining genetic and genomic approaches, we identified potential genetic regulators of key ornamental traits, including prickle density and the number of flower petals. A rose APETALA2/TOE homologue is proposed to be the major regulator of petal number in rose. This reference sequence is an important resource for studying polyploidization, meiosis and developmental processes, as we demonstrated for flower and prickle development. 
It will also accelerate breeding through the development of molecular markers linked to traits, the identification of the genes underlying them and the exploitation of synteny across Rosaceae.
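The contiguity figures quoted in this abstract (an N50 of 3.4 Mb, and 95% of the assembly in 196 contigs) are both derived from the sorted contig-length distribution. The following is a minimal Python sketch of how such statistics are conventionally computed. The contig lengths are made-up toy values, not the HapOB assembly, and the function names are illustrative assumptions rather than anything from the authors' pipeline.

```python
def n50(lengths):
    """Length L such that contigs of length >= L hold at least half the assembly."""
    if not lengths:
        raise ValueError("need at least one contig length")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def contigs_to_cover(lengths, fraction):
    """Smallest number of longest contigs whose summed length reaches `fraction` of the total."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    target = fraction * sum(lengths)
    running = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        running += length
        if running >= target:
            return count
    return len(lengths)


# Hypothetical toy assembly (arbitrary units), total length 300.
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
print(n50(toy_contigs))                     # 70
print(contigs_to_cover(toy_contigs, 0.95))  # 6
```

Applied to real contig lengths, the same two functions would yield the kind of summary reported above: one number for N50 and one for the count of contigs holding 95% of the sequence.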
Affiliation(s)
- L Hibrand Saint-Oyant
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - T Ruttink
- ILVO, Flanders Research Institute for Agriculture, Fisheries and Food, Plant Sciences Unit, Melle, Belgium
| | - L Hamama
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - I Kirov
- ILVO, Flanders Research Institute for Agriculture, Fisheries and Food, Plant Sciences Unit, Melle, Belgium
- Russian State Agrarian University-Moscow Timiryazev Agricultural Academy, Moscow, Russia
| | - D Lakhwani
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - N N Zhou
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - P M Bourke
- Plant Breeding, Wageningen University & Research, Wageningen, The Netherlands
| | - N Daccord
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - L Leus
- ILVO, Flanders Research Institute for Agriculture, Fisheries and Food, Plant Sciences Unit, Melle, Belgium
| | - D Schulz
- Leibniz Universität, Hannover, Germany
| | - H Van de Geest
- Wageningen University & Research, Business Unit Bioscience, Wageningen, The Netherlands
| | - T Hesselink
- Wageningen University & Research, Business Unit Bioscience, Wageningen, The Netherlands
| | - K Van Laere
- ILVO, Flanders Research Institute for Agriculture, Fisheries and Food, Plant Sciences Unit, Melle, Belgium
| | - K Debray
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - S Balzergue
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - T Thouroude
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - A Chastellier
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - J Jeauffre
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - L Voisine
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - S Gaillard
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - T J A Borm
- Plant Breeding, Wageningen University & Research, Wageningen, The Netherlands
| | - P Arens
- Plant Breeding, Wageningen University & Research, Wageningen, The Netherlands
| | - R E Voorrips
- Plant Breeding, Wageningen University & Research, Wageningen, The Netherlands
| | - C Maliepaard
- Plant Breeding, Wageningen University & Research, Wageningen, The Netherlands
| | - E Neu
- Leibniz Universität, Hannover, Germany
| | - M Linde
- Leibniz Universität, Hannover, Germany
| | - M C Le Paslier
- INRA, US 1279 EPGV, Université Paris-Saclay, Evry, France
| | - A Bérard
- INRA, US 1279 EPGV, Université Paris-Saclay, Evry, France
| | - R Bounon
- INRA, US 1279 EPGV, Université Paris-Saclay, Evry, France
| | - J Clotault
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - N Choisne
- URGI, INRA, Université Paris-Saclay, Versailles, France
| | - H Quesneville
- URGI, INRA, Université Paris-Saclay, Versailles, France
| | - K Kawamura
- Osaka Institute of Technology, Osaka, Japan
| | - S Aubourg
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - S Sakr
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - M J M Smulders
- Plant Breeding, Wageningen University & Research, Wageningen, The Netherlands
| | - E Schijlen
- Wageningen University & Research, Business Unit Bioscience, Wageningen, The Netherlands
| | - E Bucher
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France
| | - T Debener
- Leibniz Universität, Hannover, Germany
| | - J De Riek
- ILVO, Flanders Research Institute for Agriculture, Fisheries and Food, Plant Sciences Unit, Melle, Belgium
| | - F Foucher
- IRHS, Agrocampus-Ouest, INRA, Université d'Angers, SFR 4207 QuaSaV, Beaucouzé, France.
|
452
|
Tom N, Tom O, Malcikova J, Pavlova S, Kubesova B, Rausch T, Kolarik M, Benes V, Bystry V, Pospisilova S. ToTem: a tool for variant calling pipeline optimization. BMC Bioinformatics 2018; 19:243. [PMID: 29940847 PMCID: PMC6020218 DOI: 10.1186/s12859-018-2227-x] [Citation(s) in RCA: 1] [Impact Index Per Article: 0.1] [Received: 01/03/2018] [Accepted: 05/31/2018] [Indexed: 12/30/2022]
Abstract
BACKGROUND High-throughput bioinformatics analyses of next generation sequencing (NGS) data often require challenging pipeline optimization. The key problem is choosing appropriate tools and selecting the best parameters for optimal precision and recall. RESULTS Here we introduce ToTem, a tool for automated pipeline optimization. ToTem is a stand-alone web application with a comprehensive graphical user interface (GUI). ToTem is written in Java and PHP with an underlying connection to a MySQL database. Its primary role is to automatically generate, execute and benchmark different variant calling pipeline settings. Our tool allows an analysis to be started from any level of the process, with the possibility of plugging in almost any tool or code. To prevent over-fitting of pipeline parameters, ToTem ensures their reproducibility by using cross-validation techniques that penalize the final precision, recall and F-measure. The results are presented as interactive graphs and tables, allowing an optimal pipeline to be selected based on the user's priorities. Using ToTem, we were able to optimize somatic variant calling from ultra-deep targeted gene sequencing (TGS) data and germline variant detection in whole genome sequencing (WGS) data. CONCLUSIONS ToTem is a tool for automated pipeline optimization which is freely available as a web application at https://totem.software.
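The benchmarking metrics named in this abstract (precision, recall and F-measure) can be illustrated with a short Python sketch that compares a call set against a truth set. This is not ToTem's code and it omits the cross-validation penalty the abstract mentions. The variant tuples, the function name and the use of the balanced F1 form of the F-measure are assumptions made for illustration.

```python
def precision_recall_f(called, truth):
    """Compare two variant collections and return (precision, recall, F1)."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)   # calls that are in the truth set
    fp = len(called - truth)   # calls absent from the truth set
    fn = len(truth - called)   # true variants the pipeline missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical variants keyed as (chromosome, position, ref, alt).
truth_set = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr2", 75, "T", "C"),
}
pipeline_calls = {
    ("chr1", 100, "A", "G"),
    ("chr1", 200, "C", "T"),
    ("chr2", 50, "G", "A"),
    ("chr3", 10, "A", "T"),  # false positive
}
print(precision_recall_f(pipeline_calls, truth_set))  # (0.75, 0.75, 0.75)
```

Running the same comparison for each candidate pipeline setting gives one row per setting, which is the kind of table a user could then rank according to whether precision or recall matters more.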
Affiliation(s)
- Nikola Tom
- Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Department of Internal Medicine - Hematology and Oncology, Medical Faculty, Masaryk University and University Hospital Brno, Brno, Czech Republic
| | - Ondrej Tom
- Department of Computer Science, Faculty of Science, Palacky University, Olomouc, Czech Republic
| | - Jitka Malcikova
- Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Department of Internal Medicine - Hematology and Oncology, Medical Faculty, Masaryk University and University Hospital Brno, Brno, Czech Republic
| | - Sarka Pavlova
- Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Department of Internal Medicine - Hematology and Oncology, Medical Faculty, Masaryk University and University Hospital Brno, Brno, Czech Republic
| | - Blanka Kubesova
- Department of Internal Medicine - Hematology and Oncology, Medical Faculty, Masaryk University and University Hospital Brno, Brno, Czech Republic
| | - Tobias Rausch
- Genomics Core Facility, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Miroslav Kolarik
- Department of Computer Science, Faculty of Science, Palacky University, Olomouc, Czech Republic
| | - Vladimir Benes
- Genomics Core Facility, European Molecular Biology Laboratory, Heidelberg, Germany
| | - Vojtech Bystry
- Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
| | - Sarka Pospisilova
- Center of Molecular Medicine, Central European Institute of Technology, Masaryk University, Brno, Czech Republic
- Department of Internal Medicine - Hematology and Oncology, Medical Faculty, Masaryk University and University Hospital Brno, Brno, Czech Republic
|
453
|
Li L, Guo F, Gao Y, Ren Y, Yuan P, Yan L, Li R, Lian Y, Li J, Hu B, Gao J, Wen L, Tang F, Qiao J. Single-cell multi-omics sequencing of human early embryos. Nat Cell Biol 2018; 20:847-858. [DOI: 10.1038/s41556-018-0123-2] [Citation(s) in RCA: 132] [Impact Index Per Article: 18.9] [Received: 10/31/2017] [Accepted: 05/16/2018] [Indexed: 11/09/2022]
|
454
|
Mielczarek M, Frąszczak M, Nicolazzi E, Williams JL, Szyda J. Landscape of copy number variations in Bos taurus: individual - and inter-breed variability. BMC Genomics 2018; 19:410. [PMID: 29843606 PMCID: PMC5975385 DOI: 10.1186/s12864-018-4815-6] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.6] [Received: 01/25/2018] [Accepted: 05/22/2018] [Indexed: 11/24/2022]
Abstract
Background The number of studies of copy number variation (CNV) in cattle has increased in recent years. This has been prompted by the increased availability of data on polymorphisms and their relationship with phenotypes. In addition, livestock species are good models for some human phenotypes. In the present study, we described the landscape of CNV-driven genetic variation in a large population of 146 individuals representing 13 cattle breeds, using whole genome DNA sequence. Results A highly significant variation among all individuals and within each breed was observed in the number of duplications (P < 10^-15) and in the number of deletions (P < 10^-15). We also observed significant differences between breeds for duplication (P = 0.01932) and deletion (P = 0.01006) counts. The same held for CNV length: inter-individual and inter-breed differences were significant for duplications (P < 10^-15) and deletions (P < 10^-15). Moreover, breed-specific variants were identified, with the largest proportion of breed-specific duplications (9.57%) found for Fleckvieh and of breed-specific deletions (5.00%) found for Brown Swiss. Such breed-specific CNVs were predominantly located in intragenic regions; however, in Simmental, one deletion present in five individuals was found in the coding sequence of a novel gene, ENSBTAG00000000688, on chromosome 18. In Brown Swiss, Norwegian Red and Simmental, breed-specific deletions were located within the KIT and MC1R genes, which are responsible for coat colour. The functional annotation of coding regions underlying the breed-specific CNVs showed that in Norwegian Red, Guernsey, and Simmental significantly under- and overrepresented GO terms were related to chemical stimulus involved in sensory perception of smell and the KEGG pathways for olfactory transduction. In addition, specifically for the Norwegian Red breed, the dopaminergic synapse KEGG pathway was significantly enriched within deleted parts of the genome. 
Conclusions The CNV landscape in the Bos taurus genome revealed by this study was highly complex, with inter-breed differences but also significant variation within breeds. The former may explain some of the phenotypic differences among the analysed breeds, and the latter contributes to within-breed variation available for selection.
Affiliation(s)
- M Mielczarek
- Biostatistics group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631, Wroclaw, Poland
- National Research Institute of Animal Production, Krakowska 1, 32-083, Balice, Poland
| | - M Frąszczak
- Biostatistics group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631, Wroclaw, Poland
| | - E Nicolazzi
- Council on Dairy Cattle Breeding (CDCB), 4201 Northview Dr, Bowie, MD, 20716, USA
| | - J L Williams
- Davies Research Centre, University of Adelaide, School of Animal and Veterinary Sciences, Roseworthy, SA, 5371, Australia
| | - J Szyda
- Biostatistics group, Department of Genetics, Wroclaw University of Environmental and Life Sciences, Kozuchowska 7, 51-631, Wroclaw, Poland
- National Research Institute of Animal Production, Krakowska 1, 32-083, Balice, Poland
|
455
|
Wunderle M, Gass P, Häberle L, Flesch VM, Rauh C, Bani MR, Hack CC, Schrauder MG, Jud SM, Emons J, Erber R, Ekici AB, Hoyer J, Vasileiou G, Kraus C, Reis A, Hartmann A, Lux MP, Beckmann MW, Fasching PA, Hein A. BRCA mutations and their influence on pathological complete response and prognosis in a clinical cohort of neoadjuvantly treated breast cancer patients. Breast Cancer Res Treat 2018; 171:85-94. [PMID: 29725888 DOI: 10.1007/s10549-018-4797-8] [Citation(s) in RCA: 37] [Impact Index Per Article: 5.3] [Received: 11/30/2017] [Accepted: 04/18/2018] [Indexed: 12/31/2022]
Abstract
PURPOSE BRCA1/2 mutations influence the molecular characteristics and the effects of systemic treatment of breast cancer. This study investigates the impact of germline BRCA1/2 mutations on pathological complete response and prognosis in patients receiving neoadjuvant systemic chemotherapy. METHODS Breast cancer patients were tested for a BRCA1/2 mutation in clinical routine work and were treated with anthracycline-based or platinum-based neoadjuvant chemotherapy between 1997 and 2015. These patients were identified in the tumor registry of the Breast Center of the University of Erlangen (Germany). Logistic regression and Cox regression analyses were performed to investigate the associations between BRCA1/2 mutation status, pathological complete response, disease-free survival, and overall survival. RESULTS Among 355 patients, 59 had a mutation in BRCA1 or in BRCA2 (16.6%), 43 in BRCA1 (12.1%), and 16 in BRCA2 (4.5%). Pathological complete response defined as "ypT0; ypN0" was observed in 54.3% of BRCA1/2 mutation carriers, but only in 22.6% of non-carriers. The adjusted odds ratio was 2.48 (95% CI 1.26-4.91) for BRCA1/2 carriers versus non-carriers. Patients who achieved a pathological complete response had better disease-free survival and overall survival rates compared with those who did not achieve a pathological complete response, regardless of BRCA1/2 mutation status. CONCLUSIONS BRCA1/2 mutation status leads to better responses to neoadjuvant chemotherapy in breast cancer. Pathological complete response is the main predictor of disease-free survival and overall survival, independently of BRCA1/2 mutation status.
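To make the odds-ratio arithmetic in this abstract concrete, the sketch below computes a crude (unadjusted) odds ratio directly from the two response rates quoted above, 54.3% in carriers and 22.6% in non-carriers. It is an illustration only: it is not the logistic regression the authors ran, and the function names are assumptions.

```python
def odds(probability):
    """Convert a probability in (0, 1) to odds."""
    if not 0 < probability < 1:
        raise ValueError("probability must be strictly between 0 and 1")
    return probability / (1 - probability)


def crude_odds_ratio(p_group_a, p_group_b):
    """Unadjusted odds ratio of group A versus group B."""
    return odds(p_group_a) / odds(p_group_b)


# Response rates taken from the abstract.
pcr_carriers = 0.543
pcr_non_carriers = 0.226

print(round(crude_odds_ratio(pcr_carriers, pcr_non_carriers), 2))
```

The crude value comes out at roughly 4, noticeably higher than the adjusted odds ratio of 2.48 reported in the abstract. The gap presumably reflects the covariates included in the authors' regression model, which the abstract does not list.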
Affiliation(s)
- Marius Wunderle
- Department of Gynecology and Obstetrics, Erlangen University Hospital, Comprehensive Cancer Center Erlangen-EMN, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Paul Gass
- Department of Gynecology and Obstetrics, Erlangen University Hospital, Comprehensive Cancer Center Erlangen-EMN, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Lothar Häberle
- Department of Gynecology and Obstetrics, Erlangen University Hospital, Comprehensive Cancer Center Erlangen-EMN, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
- Biostatistics Unit, Department of Gynecology and Obstetrics, Erlangen University Hospital, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Vivien M Flesch
- Department of Gynecology and Obstetrics, Erlangen University Hospital, Comprehensive Cancer Center Erlangen-EMN, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Claudia Rauh
- Department of Gynecology and Obstetrics, Erlangen University Hospital, Comprehensive Cancer Center Erlangen-EMN, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Mayada R Bani
- Department of Gynecology and Obstetrics, Erlangen University Hospital, Comprehensive Cancer Center Erlangen-EMN, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Carolin C Hack
- Department of Gynecology and Obstetrics, Erlangen University Hospital, Comprehensive Cancer Center Erlangen-EMN, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Michael G Schrauder
- Department of Gynecology and Obstetrics, Erlangen University Hospital, Comprehensive Cancer Center Erlangen-EMN, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Sebastian M Jud
- Department of Gynecology and Obstetrics, Erlangen University Hospital, Comprehensive Cancer Center Erlangen-EMN, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Julius Emons
- Department of Gynecology and Obstetrics, Erlangen University Hospital, Comprehensive Cancer Center Erlangen-EMN, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Ramona Erber
- Institute of Pathology, Erlangen University Hospital, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Arif B Ekici
- Institute of Human Genetics, Erlangen University Hospital, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Juliane Hoyer
- Institute of Human Genetics, Erlangen University Hospital, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Georgia Vasileiou
- Institute of Human Genetics, Erlangen University Hospital, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Cornelia Kraus
- Institute of Human Genetics, Erlangen University Hospital, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Andre Reis
- Institute of Human Genetics, Erlangen University Hospital, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Arndt Hartmann
- Institute of Pathology, Erlangen University Hospital, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Michael P Lux
- Department of Gynecology and Obstetrics, Erlangen University Hospital, Comprehensive Cancer Center Erlangen-EMN, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Matthias W Beckmann
- Department of Gynecology and Obstetrics, Erlangen University Hospital, Comprehensive Cancer Center Erlangen-EMN, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
| | - Peter A Fasching
- Department of Gynecology and Obstetrics, Erlangen University Hospital, Comprehensive Cancer Center Erlangen-EMN, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
- Department of Medicine, Division of Hematology and Oncology, University of California at Los Angeles, David Geffen School of Medicine, Los Angeles, CA, USA
| | - Alexander Hein
- Department of Gynecology and Obstetrics, Erlangen University Hospital, Comprehensive Cancer Center Erlangen-EMN, Friedrich Alexander University Erlangen-Nuremberg, Erlangen, Germany
|
456
|
Telenti A, Lippert C, Chang PC, DePristo M. Deep learning of genomic variation and regulatory network data. Hum Mol Genet 2018; 27:R63-R71. [PMID: 29648622 PMCID: PMC6499235 DOI: 10.1093/hmg/ddy115] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 01/09/2018] [Revised: 03/26/2018] [Accepted: 03/27/2018] [Indexed: 02/07/2023]
Abstract
The human genome is now investigated through high-throughput functional assays, and through the generation of population genomic data. These advances support the identification of functional genetic variants and the prediction of traits (e.g. deleterious variants and disease). This review summarizes lessons learned from the large-scale analyses of genome and exome data sets, modeling of population data and machine-learning strategies to solve complex genomic sequence regions. The review also portrays the rapid adoption of artificial intelligence/deep neural networks in genomics; in particular, deep learning approaches are well suited to model the complex dependencies in the regulatory landscape of the genome, and to provide predictors for genetic variant calling and interpretation.
Affiliation(s)
- Amalio Telenti
- Scripps Translational Science Institute, The Scripps Research Institute, La Jolla, CA 92037, USA
|
457
|
High-depth whole genome sequencing of an Ashkenazi Jewish reference panel: enhancing sensitivity, accuracy, and imputation. Hum Genet 2018; 137:343-355. [PMID: 29705978 DOI: 10.1007/s00439-018-1886-z] [Citation(s) in RCA: 19] [Impact Index Per Article: 2.7] [Received: 01/06/2018] [Accepted: 04/21/2018] [Indexed: 12/31/2022]
Abstract
While increasingly large reference panels for genome-wide imputation have been recently made available, the degree to which imputation accuracy can be enhanced by population-specific reference panels remains an open question. Here, we sequenced at full-depth (≥ 30×), across two platforms (Illumina X Ten and Complete Genomics, Inc.), a moderately large (n = 738) cohort of samples drawn from the Ashkenazi Jewish population. We developed a series of quality control steps to optimize sensitivity, specificity, and comprehensiveness of variant calls in the reference panel, and then tested the accuracy of imputation against target cohorts drawn from the same population. Quality control (QC) thresholds for the Illumina X Ten platform were identified that permitted highly accurate calling of single nucleotide variants across 94% of the genome. QC procedures also identified numerous regions that are poorly mapped using current reference or alternate assemblies. After stringent QC, the population-specific reference panel produced more accurate and comprehensive imputation results relative to publicly available, large cosmopolitan reference panels, especially in the range of rare variants that may be most critical to further progress in mapping of complex phenotypes. The population-specific reference panel also permitted enhanced filtering of clinically irrelevant variants from personal genomes.
|
458
|
An analytical framework for whole-genome sequence association studies and its implications for autism spectrum disorder. Nat Genet 2018; 50:727-736. [PMID: 29700473 DOI: 10.1038/s41588-018-0107-y] [Citation(s) in RCA: 184] [Impact Index Per Article: 26.3] [Received: 07/19/2017] [Accepted: 03/06/2018] [Indexed: 11/08/2022]
Abstract
Genomic association studies of common or rare protein-coding variation have established robust statistical approaches to account for multiple testing. Here we present a comparable framework to evaluate rare and de novo noncoding single-nucleotide variants, insertion/deletions, and all classes of structural variation from whole-genome sequencing (WGS). Integrating genomic annotations at the level of nucleotides, genes, and regulatory regions, we define 51,801 annotation categories. Analyses of 519 autism spectrum disorder families did not identify association with any categories after correction for 4,123 effective tests. Without appropriate correction, biologically plausible associations are observed in both cases and controls. Despite excluding previously identified gene-disrupting mutations, coding regions still exhibited the strongest associations. Thus, in autism, the contribution of de novo noncoding variation is probably modest in comparison to that of de novo coding variants. Robust results from future WGS studies will require large cohorts and comprehensive analytical strategies that consider the substantial multiple-testing burden.
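The multiple-testing burden described in this abstract can be illustrated with a Bonferroni-style threshold over the 4,123 effective tests it reports. The sketch below is hedged: the family-wise error rate of 0.05 and the Bonferroni form of the correction are assumptions chosen for illustration, and the abstract does not state which procedure the authors applied. The example p-values are invented.

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold that keeps the family-wise error rate at alpha."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


def survives_correction(p_value, alpha, n_tests):
    """True if a single p-value remains significant after correction."""
    return p_value <= bonferroni_threshold(alpha, n_tests)


EFFECTIVE_TESTS = 4123  # figure quoted in the abstract
ALPHA = 0.05            # assumed family-wise error rate

threshold = bonferroni_threshold(ALPHA, EFFECTIVE_TESTS)
print(f"{threshold:.2e}")

# Hypothetical p-values for two annotation categories.
print(survives_correction(1e-3, ALPHA, EFFECTIVE_TESTS))  # False
print(survives_correction(1e-6, ALPHA, EFFECTIVE_TESTS))  # True
```

Under these assumptions the per-test threshold is about 1.2e-5, so a nominal p-value of 0.001 that looks convincing in isolation would not survive correction. That is the pattern the abstract warns about when it mentions biologically plausible associations appearing without appropriate correction.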
|
459
|
Gao G, Nome T, Pearse DE, Moen T, Naish KA, Thorgaard GH, Lien S, Palti Y. A New Single Nucleotide Polymorphism Database for Rainbow Trout Generated Through Whole Genome Resequencing. Front Genet 2018; 9:147. [PMID: 29740479 PMCID: PMC5928233 DOI: 10.3389/fgene.2018.00147] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 02/13/2018] [Accepted: 04/09/2018] [Indexed: 11/13/2022]
Abstract
Single-nucleotide polymorphisms (SNPs) are highly abundant markers, which are broadly distributed in animal genomes. For rainbow trout (Oncorhynchus mykiss), SNP discovery has previously been done through sequencing of restriction-site associated DNA (RAD) libraries, reduced representation libraries (RRL) and RNA sequencing. Recently we performed high-coverage whole genome resequencing of 61 unrelated samples, representing a wide range of rainbow trout and steelhead populations, with 49 new samples added to 12 aquaculture samples from AquaGen (Norway) that we previously used for SNP discovery. Of the 49 new samples, 11 were doubled haploid lines from Washington State University (WSU) and 38 represented wild and hatchery populations from a wide geographic distribution and with divergent migratory phenotypes. We then mapped the sequences to the new rainbow trout reference genome assembly (GCA_002163495.1), which is based on the Swanson YY doubled haploid line. Variant calling was conducted with FreeBayes and SAMtools mpileup, followed by filtering of SNPs based on quality score, sequence complexity, read depth at the locus, and number of genotyped samples. Results from the two variant calling programs were compared, and genotypes of the doubled haploid samples were used for detecting and filtering putative paralogous sequence variants (PSVs) and multi-sequence variants (MSVs). Overall, 30,302,087 SNPs were identified on the 29 chromosomes of the rainbow trout genome and 1,139,018 on unplaced scaffolds, with 4,042,723 SNPs having high minor allele frequency (MAF > 0.25). The average SNP density on the chromosomes was one SNP per 64 bp, or 15.6 SNPs per 1 kb. Results from the phylogenetic analysis that we conducted indicate that the SNP markers contain enough population-specific polymorphisms for recovering population relationships despite the small sample size used. 
Intra-population polymorphism assessment revealed a high level of polymorphism and heterozygosity within each population. We also provide functional annotation based on the genome position of each SNP and evaluate the use of clonal lines for filtering of PSVs and MSVs. These SNPs form a new database, which provides an important resource for a new high-density SNP array design and for other SNP genotyping platforms used for genetic and genomics studies of this iconic salmonid fish species.
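Two quantities in this abstract are simple enough to check by hand: the conversion between "one SNP per 64 bp" and "15.6 SNPs per 1 kb", and the MAF > 0.25 cut-off. The Python sketch below shows both calculations. The allele counts are hypothetical, and the function names are illustrative assumptions rather than anything from the authors' pipeline.

```python
def snps_per_kb(bp_per_snp):
    """Convert an average spacing (bp per SNP) into a density (SNPs per 1,000 bp)."""
    if bp_per_snp <= 0:
        raise ValueError("bp_per_snp must be positive")
    return 1000 / bp_per_snp


def minor_allele_frequency(alt_alleles, total_alleles):
    """Frequency of the rarer of two alleles at a biallelic site."""
    if total_alleles <= 0 or not 0 <= alt_alleles <= total_alleles:
        raise ValueError("allele counts out of range")
    alt_freq = alt_alleles / total_alleles
    return min(alt_freq, 1 - alt_freq)


print(snps_per_kb(64))  # 15.625, which rounds to the 15.6 quoted above

# Hypothetical site counts for 61 diploid samples (122 alleles in total).
MAF_CUTOFF = 0.25
for alt_count in (40, 100):
    maf = minor_allele_frequency(alt_count, 122)
    print(alt_count, round(maf, 3), maf > MAF_CUTOFF)
```

In this toy example the site with 40 alternate alleles passes the cut-off and the site with 100 does not, because the minor allele is then the reference allele at a frequency of about 0.18.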
Affiliation(s)
- Guangtu Gao
- National Center for Cool and Cold Water Aquaculture, ARS-USDA, Kearneysville, WV, United States
| | - Torfinn Nome
- Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Centre of Integrative Genetics, Norwegian University of Life Sciences, Ås, Norway
| | - Devon E Pearse
- Fisheries Ecology Division, Southwest Fisheries Science Center, National Marine Fisheries Service, Santa Cruz, CA, United States
| | | | - Kerry A Naish
- School of Aquatic and Fishery Sciences, University of Washington, Seattle, WA, United States
| | - Gary H Thorgaard
- School of Biological Sciences and Center for Reproductive Biology, Washington State University, Pullman, WA, United States
| | - Sigbjørn Lien
- Department of Animal and Aquacultural Sciences, Faculty of Biosciences, Centre of Integrative Genetics, Norwegian University of Life Sciences, Ås, Norway
| | - Yniv Palti
- National Center for Cool and Cold Water Aquaculture, ARS-USDA, Kearneysville, WV, United States
|
460
|
Leigh DM, Lischer HEL, Grossen C, Keller LF. Batch effects in a multiyear sequencing study: False biological trends due to changes in read lengths. Mol Ecol Resour 2018; 18:778-788. [DOI: 10.1111/1755-0998.12779] [Citation(s) in RCA: 26] [Impact Index Per Article: 3.7] [Received: 10/09/2017] [Revised: 02/27/2018] [Accepted: 03/01/2018] [Indexed: 12/11/2022]
Affiliation(s)
- D. M. Leigh
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge, Batiment Genopode, Lausanne, Switzerland
- Department of Biology, Queen's University, Kingston, ON, Canada
| | - H. E. L. Lischer
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Swiss Institute of Bioinformatics, Quartier Sorge, Batiment Genopode, Lausanne, Switzerland
| | - C. Grossen
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
| | - L. F. Keller
- Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zurich, Switzerland
- Zoological Museum, University of Zurich, Zurich, Switzerland

461
Ren Y, Reddy JS, Pottier C, Sarangi V, Tian S, Sinnwell JP, McDonnell SK, Biernacka JM, Carrasquillo MM, Ross OA, Ertekin-Taner N, Rademakers R, Hudson M, Mainzer LS, Asmann YW. Identification of missing variants by combining multiple analytic pipelines. BMC Bioinformatics 2018; 19:139. [PMID: 29661148 PMCID: PMC5902939 DOI: 10.1186/s12859-018-2151-0] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 01/02/2018] [Accepted: 04/09/2018] [Indexed: 02/02/2023] Open
Abstract
Background: After decades of identifying risk factors using array-based genome-wide association studies (GWAS), genetic research of complex diseases has shifted to sequencing-based rare variant discovery. This requires large sample sizes for statistical power and has raised questions about whether current variant calling practices are adequate for large cohorts. It is well known that there are discrepancies between variants called by different pipelines, and that using a single pipeline always misses true variants exclusively identifiable by other pipelines. Nonetheless, it is common practice today to call variants with one pipeline because of computational cost and to assume that false negative calls are a small percentage of the total.
Results: We analyzed 10,000 exomes from the Alzheimer’s Disease Sequencing Project (ADSP) using multiple analytic pipelines consisting of different read aligners and variant calling strategies. We compared variants identified by using two aligners in 50, 100, 200, 500, 1000, and 1952 samples; and compared variants identified by adding single-sample genotyping to the default multi-sample joint genotyping in 50, 100, 500, 2000, 5000 and 10,000 samples. We found that using a single pipeline missed a number of high-quality variants that increased with sample size. By combining two read aligners and two variant calling strategies, we rescued 30% of pass-QC variants at a sample size of 2000, and 56% at 10,000 samples. The rescued variants had higher proportions of low-frequency (minor allele frequency [MAF] 1–5%) and rare (MAF < 1%) variants, which are the very type of variants of interest. In 660 Alzheimer’s disease cases with earlier onset ages of ≤65, 4 out of 13 (31%) previously published rare pathogenic and protective mutations in the APP, PSEN1, and PSEN2 genes were undetected by the default one-pipeline approach but recovered by the multi-pipeline approach.
Conclusions: Identification of the complete variant set from sequencing data is the prerequisite of genetic association analyses. The current analytic practice of calling genetic variants from sequencing data using a single bioinformatics pipeline is no longer adequate for increasingly large projects. The number and percentage of variants that passed quality filters but were missed by the one-pipeline approach rapidly increased with sample size.
Electronic supplementary material: The online version of this article (10.1186/s12859-018-2151-0) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Yingxue Ren
  - Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL, 32224, USA
- Joseph S Reddy
  - Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL, 32224, USA
- Cyril Pottier
  - Department of Neuroscience, Mayo Clinic, Jacksonville, FL, 32224, USA
- Vivekananda Sarangi
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, 55905, USA
- Shulan Tian
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, 55905, USA
- Jason P Sinnwell
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, 55905, USA
- Shannon K McDonnell
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, 55905, USA
- Joanna M Biernacka
  - Division of Biomedical Statistics and Informatics, Department of Health Sciences Research, Mayo Clinic, Rochester, MN, 55905, USA
- Owen A Ross
  - Department of Neuroscience, Mayo Clinic, Jacksonville, FL, 32224, USA
  - Department of Clinical Genomics, Mayo Clinic, Jacksonville, FL, 32224, USA
- Nilüfer Ertekin-Taner
  - Department of Neuroscience, Mayo Clinic, Jacksonville, FL, 32224, USA
  - Department of Neurology, Mayo Clinic, Jacksonville, FL, 32224, USA
- Rosa Rademakers
  - Department of Neuroscience, Mayo Clinic, Jacksonville, FL, 32224, USA
- Matthew Hudson
  - National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
  - Carl R Woese Institute for Genomic Biology, Carver Biotechnology Center and Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Liudmila Sergeevna Mainzer
  - National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
- Yan W Asmann
  - Department of Health Sciences Research, Mayo Clinic, Jacksonville, FL, 32224, USA

462
Durand G, Javerliat F, Bes M, Veyrieras JB, Guigon G, Mugnier N, Schicklin S, Kaneko G, Santiago-Allexant E, Bouchiat C, Martins-Simões P, Laurent F, Van Belkum A, Vandenesch F, Tristan A. Routine Whole-Genome Sequencing for Outbreak Investigations of Staphylococcus aureus in a National Reference Center. Front Microbiol 2018; 9:511. [PMID: 29616014 PMCID: PMC5869177 DOI: 10.3389/fmicb.2018.00511] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 11/17/2017] [Accepted: 03/06/2018] [Indexed: 11/25/2022] Open
Abstract
The French National Reference Center for Staphylococci currently uses DNA arrays and spa typing for the initial epidemiological characterization of Staphylococcus aureus strains. We here describe the use of whole-genome sequencing (WGS) to investigate retrospectively four distinct and virulent S. aureus lineages [clonal complexes (CCs): CC1, CC5, CC8, CC30] involved in hospital and community outbreaks or sporadic infections in France. We used a WGS bioinformatics pipeline based on de novo assembly (reference-free approach), single nucleotide polymorphism analysis, and the inclusion of epidemiological markers. We examined the phylogeographic diversity of the French dominant hospital-acquired CC8-MRSA (methicillin-resistant S. aureus) Lyon clone through WGS analysis, which did not demonstrate evidence of large-scale geographic clustering. We analyzed sporadic cases along with two outbreaks of a CC1-MSSA (methicillin-susceptible S. aureus) clone containing the Panton–Valentine leukocidin (PVL), and results showed that two sporadic cases were closely related. We investigated an outbreak of PVL-positive CC30-MSSA in a school environment and were able to reconstruct the transmission history between eight families. We explored different outbreaks among newborns due to the CC5-MRSA Geraldine clone and found evidence of an unsuspected link between two otherwise distinct outbreaks. Here, WGS provides the resolving power to disprove transmission events indicated by conventional methods (same sequence type, spa type, toxin profile, and antibiotic resistance profile) and, most importantly, WGS can reveal unsuspected transmission events. Therefore, WGS allows better description and understanding of outbreaks and (inter-)national dissemination of S. aureus lineages. Our findings underscore the importance of adding WGS for (inter-)national surveillance of infections caused by virulent clones of S. aureus but also substantiate the fact that technological optimization at the bioinformatics level is still urgently needed for routine use. However, the greatest limitation of WGS analysis is the completeness and the correctness of the reference database being used and the conversion of floods of data into actionable results. The WGS bioinformatics pipeline (EpiSeq™) we used here can easily generate a uniform database and associated metadata for epidemiological applications.
Affiliation(s)
- Michèle Bes
  - National Reference Center for Staphylococci, Hospices Civils de Lyon, Lyon, France
- Gaël Kaneko
  - Data Analytics Unit, bioMérieux, Marcy-l'Étoile, France
- Coralie Bouchiat
  - National Reference Center for Staphylococci, Hospices Civils de Lyon, Lyon, France
- Frederic Laurent
  - National Reference Center for Staphylococci, Hospices Civils de Lyon, Lyon, France
- François Vandenesch
  - National Reference Center for Staphylococci, Hospices Civils de Lyon, Lyon, France
- Anne Tristan
  - National Reference Center for Staphylococci, Hospices Civils de Lyon, Lyon, France

463
Hodel KP, de Borja R, Henninger EE, Campbell BB, Ungerleider N, Light N, Wu T, LeCompte KG, Goksenin AY, Bunnell BA, Tabori U, Shlien A, Pursell ZF. Explosive mutation accumulation triggered by heterozygous human Pol ε proofreading-deficiency is driven by suppression of mismatch repair. eLife 2018; 7:32692. [PMID: 29488881 PMCID: PMC5829921 DOI: 10.7554/elife.32692] [Citation(s) in RCA: 29] [Impact Index Per Article: 4.1] [Received: 10/11/2017] [Accepted: 02/04/2018] [Indexed: 12/14/2022] Open
Abstract
Tumors defective for DNA polymerase (Pol) ε proofreading have the highest tumor mutation burden identified. A major unanswered question is whether loss of Pol ε proofreading by itself is sufficient to drive this mutagenesis, or whether additional factors are necessary. To address this, we used a combination of next generation sequencing and in vitro biochemistry on human cell lines engineered to have defects in Pol ε proofreading and mismatch repair. Absent mismatch repair, monoallelic Pol ε proofreading deficiency caused a rapid increase in a unique mutation signature, similar to that observed in tumors from patients with biallelic mismatch repair deficiency and heterozygous Pol ε mutations. Restoring mismatch repair was sufficient to suppress the explosive mutation accumulation. These results strongly suggest that concomitant suppression of mismatch repair, a hallmark of colorectal and other aggressive cancers, is a critical force for driving the explosive mutagenesis seen in tumors expressing exonuclease-deficient Pol ε.
Affiliation(s)
- Karl P Hodel
  - Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, United States
- Richard de Borja
  - Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Canada
- Erin E Henninger
  - Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, United States
- Brittany B Campbell
  - The Arthur and Sonia Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Toronto, Canada
  - Institute of Medical Science, Faculty of Medicine, University of Toronto, Toronto, Canada
- Nathan Ungerleider
  - Department of Pathology, Tulane University School of Medicine, New Orleans, United States
- Nicholas Light
  - Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Canada
- Tong Wu
  - Department of Pathology, Tulane University School of Medicine, New Orleans, United States
- Kimberly G LeCompte
  - Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, United States
- A Yasemin Goksenin
  - Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, United States
- Bruce A Bunnell
  - Department of Pharmacology, Tulane University School of Medicine, New Orleans, United States
  - Tulane Center for Stem Cell Research and Regenerative Medicine, Tulane University School of Medicine, New Orleans, United States
- Uri Tabori
  - Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Canada
  - The Arthur and Sonia Labatt Brain Tumour Research Centre, The Hospital for Sick Children, Toronto, Canada
  - Division of Hematology/Oncology, The Hospital for Sick Children, Toronto, Canada
- Adam Shlien
  - Program in Genetics and Genome Biology, The Hospital for Sick Children, Toronto, Canada
  - Department of Paediatric Laboratory Medicine, The Hospital for Sick Children, Toronto, Canada
  - Department of Laboratory Medicine and Pathobiology, University of Toronto, Toronto, Canada
- Zachary F Pursell
  - Department of Biochemistry and Molecular Biology, Tulane University School of Medicine, New Orleans, United States
  - Tulane Cancer Center, Tulane University School of Medicine, New Orleans, United States

464
Franco I, Johansson A, Olsson K, Vrtačnik P, Lundin P, Helgadottir HT, Larsson M, Revêchon G, Bosia C, Pagnani A, Provero P, Gustafsson T, Fischer H, Eriksson M. Somatic mutagenesis in satellite cells associates with human skeletal muscle aging. Nat Commun 2018; 9:800. [PMID: 29476074 PMCID: PMC5824957 DOI: 10.1038/s41467-018-03244-6] [Citation(s) in RCA: 90] [Impact Index Per Article: 12.9] [Received: 10/03/2017] [Accepted: 01/26/2018] [Indexed: 01/06/2023] Open
Abstract
Human aging is associated with a decline in skeletal muscle (SkM) function and a reduction in the number and activity of satellite cells (SCs), the resident stem cells. To study the connection between SC aging and muscle impairment, we analyze the whole genome of single SC clones of the leg muscle vastus lateralis from healthy individuals of different ages (21–78 years). We find an accumulation rate of 13 somatic mutations per genome per year, consistent with proliferation of SCs in the healthy adult muscle. SkM-expressed genes are protected from mutations, but aging results in an increase in mutations in exons and promoters, targeting genes involved in SC activity and muscle function. In agreement with SC mutations affecting the whole tissue, we detect a missense mutation in a SC propagating to the muscle. Our results suggest somatic mutagenesis in SCs as a driving force in the age-related decline of SkM function. Aging skeletal muscle shows declining numbers and activity of satellite cells. Here, Franco et al. show that in satellite cells of the human leg muscle vastus lateralis, somatic mutations accumulate with age and that these mutations become enriched in exons and promoters of genes involved in muscle function.
Affiliation(s)
- Irene Franco
  - Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, 14157, Huddinge, Sweden
- Anna Johansson
  - Science for Life Laboratory, Department of Cell and Molecular Biology, Uppsala University, 75237, Uppsala, Sweden
- Karl Olsson
  - Division of Clinical Physiology, Department of Laboratory Medicine, Karolinska Institutet, 14186, Huddinge, Sweden
- Peter Vrtačnik
  - Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, 14157, Huddinge, Sweden
- Pär Lundin
  - Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, 14157, Huddinge, Sweden
  - Science for Life Laboratory, Department of Biochemistry and Biophysics (DBB), Stockholm University, 10691, Stockholm, Sweden
- Hafdis T Helgadottir
  - Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, 14157, Huddinge, Sweden
- Malin Larsson
  - Science for Life Laboratory, Department of Physics, Chemistry and Biology, Linköping University, 58183, Linköping, Sweden
- Gwladys Revêchon
  - Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, 14157, Huddinge, Sweden
- Carla Bosia
  - Italian Institute for Genomic Medicine (IIGM), 10126, Turin, Italy
  - Department of Applied Science and Technology, Politecnico di Torino, 10129, Turin, Italy
- Andrea Pagnani
  - Italian Institute for Genomic Medicine (IIGM), 10126, Turin, Italy
  - Department of Applied Science and Technology, Politecnico di Torino, 10129, Turin, Italy
- Paolo Provero
  - Department of Molecular Biotechnology and Health Sciences, Molecular Biotechnology Center, 10126, Turin, Italy
  - Center for Translational Genomics and Bioinformatics, San Raffaele Scientific Institute, 20132, Milan, Italy
- Thomas Gustafsson
  - Division of Clinical Physiology, Department of Laboratory Medicine, Karolinska Institutet, 14186, Huddinge, Sweden
- Helene Fischer
  - Division of Clinical Physiology, Department of Laboratory Medicine, Karolinska Institutet, 14186, Huddinge, Sweden
- Maria Eriksson
  - Department of Biosciences and Nutrition, Center for Innovative Medicine, Karolinska Institutet, 14157, Huddinge, Sweden

465
Chyra Kufova Z, Sevcikova T, Januska J, Vojta P, Boday A, Vanickova P, Filipova J, Growkova K, Jelinek T, Hajduch M, Hajek R. Newly designed 11-gene panel reveals first case of hereditary amyloidosis captured by massive parallel sequencing. J Clin Pathol 2018; 71:687-694. [PMID: 29455155 PMCID: PMC6204976 DOI: 10.1136/jclinpath-2017-204978] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 12/21/2017] [Revised: 01/26/2018] [Accepted: 01/27/2018] [Indexed: 12/22/2022]
Abstract
AIMS: Amyloidosis is caused by deposition of abnormal protein fibrils, leading to damage of organ function. Hereditary amyloidosis represents a monogenic disease caused by germline mutations in 11 amyloidogenic precursor protein genes. One of the important but non-specific symptoms of amyloidosis is hypertrophic cardiomyopathy. Diagnostics of hereditary amyloidosis is complicated and the real cause can remain overlooked. We aimed to design a hereditary amyloidosis gene panel and to introduce a new next-generation sequencing (NGS) approach to investigate hereditary amyloidosis in a cohort of patients with hypertrophic cardiomyopathy of unknown significance.
METHODS: Design of target enrichment DNA library preparation using the Haloplex Custom Kit containing 11 amyloidogenic genes was followed by MiSeq Illumina sequencing and bioinformatics identification of germline variants using the tool VarScan in a cohort of 40 patients.
RESULTS: We present the design of an NGS panel for 11 genes (TTR, FGA, APOA1, APOA2, LYZ, GSN, CST3, PRNP, APP, B2M, ITM2B) connected to various forms of amyloidosis. We detected one mutation, which is responsible for hereditary amyloidosis. Some other single nucleotide variants are so far undescribed or rare variants, or represent common polymorphisms in the European population.
CONCLUSIONS: We report one positive case of hereditary amyloidosis in a cohort of patients with hypertrophic cardiomyopathy of unknown significance and set up the first panel for NGS in hereditary amyloidosis. This work may facilitate successful implementation of the NGS method by other researchers or clinicians and may improve the diagnostic process after validation.
Affiliation(s)
- Zuzana Chyra Kufova
  - Department of Haematooncology, University Hospital Ostrava, Ostrava, Czech Republic
  - Department of Clinical Studies, Faculty of Medicine, University of Ostrava, Ostrava, Czech Republic
  - Department of Experimental Biology, Faculty of Science, Masaryk University, Brno, Czech Republic
- Tereza Sevcikova
  - Department of Haematooncology, University Hospital Ostrava, Ostrava, Czech Republic
  - Department of Clinical Studies, Faculty of Medicine, University of Ostrava, Ostrava, Czech Republic
- Petr Vojta
  - Faculty of Medicine and Dentistry, Institute of Molecular and Translational Medicine, Palacky University, Olomouc, Czech Republic
- Arpad Boday
  - Laboratory of Molecular Biology, Department of Medical Genetics, Laboratory AGEL, Novy Jicin, Czech Republic
- Pavla Vanickova
  - Laboratory of Molecular Biology, Department of Medical Genetics, Laboratory AGEL, Novy Jicin, Czech Republic
- Jana Filipova
  - Department of Haematooncology, University Hospital Ostrava, Ostrava, Czech Republic
  - Department of Clinical Studies, Faculty of Medicine, University of Ostrava, Ostrava, Czech Republic
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Katerina Growkova
  - Department of Haematooncology, University Hospital Ostrava, Ostrava, Czech Republic
  - Department of Clinical Studies, Faculty of Medicine, University of Ostrava, Ostrava, Czech Republic
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Tomas Jelinek
  - Department of Haematooncology, University Hospital Ostrava, Ostrava, Czech Republic
  - Department of Clinical Studies, Faculty of Medicine, University of Ostrava, Ostrava, Czech Republic
  - Department of Biology and Ecology, Faculty of Science, University of Ostrava, Ostrava, Czech Republic
- Marian Hajduch
  - Faculty of Medicine and Dentistry, Institute of Molecular and Translational Medicine, Palacky University, Olomouc, Czech Republic
- Roman Hajek
  - Department of Haematooncology, University Hospital Ostrava, Ostrava, Czech Republic
  - Department of Clinical Studies, Faculty of Medicine, University of Ostrava, Ostrava, Czech Republic

466
Xu C. A review of somatic single nucleotide variant calling algorithms for next-generation sequencing data. Comput Struct Biotechnol J 2018; 16:15-24. [PMID: 29552334 PMCID: PMC5852328 DOI: 10.1016/j.csbj.2018.01.003] [Citation(s) in RCA: 153] [Impact Index Per Article: 21.9] [Received: 09/08/2017] [Revised: 01/20/2018] [Accepted: 01/28/2018] [Indexed: 02/06/2023] Open
Abstract
Detection of somatic mutations holds great potential in cancer treatment and has been a very active research field in the past few years, especially since the breakthrough of next-generation sequencing technology. A collection of variant calling pipelines has been developed with different underlying models, filters, input data requirements, and targeted applications. This review aims to enumerate these unique features of the state-of-the-art variant callers, in the hope of providing a practical guide for selecting the appropriate pipeline for specific applications. We will focus on the detection of somatic single nucleotide variants, ranging from traditional variant callers based on whole genome or exome sequencing of paired tumor-normal samples to recent low-frequency variant callers designed for targeted sequencing protocols with unique molecular identifiers. The variant callers have been extensively benchmarked, with inconsistent performance across studies. We will review the reference materials, datasets, and performance metrics that have been used in the benchmarking studies. Finally, we will discuss emerging trends and future directions of the variant calling algorithms.
Affiliation(s)
- Chang Xu
  - Life Science Research and Foundation, Qiagen Sciences, Inc., 6951 Executive Way, Frederick, Maryland 21703, USA

467
Alvarez JM, Bueno N, Cañas RA, Avila C, Cánovas FM, Ordás RJ. Analysis of the WUSCHEL-RELATED HOMEOBOX gene family in Pinus pinaster: New insights into the gene family evolution. PLANT PHYSIOLOGY AND BIOCHEMISTRY : PPB 2018; 123:304-318. [PMID: 29278847 DOI: 10.1016/j.plaphy.2017.12.031] [Citation(s) in RCA: 35] [Impact Index Per Article: 5.0] [Received: 09/27/2017] [Revised: 12/16/2017] [Accepted: 12/18/2017] [Indexed: 05/23/2023]
Abstract
WUSCHEL-RELATED HOMEOBOX (WOX) genes are key players controlling stem cells in plants and can be divided into three clades according to the time of their appearance during plant evolution. Our knowledge of stem cell function in vascular plants other than angiosperms is limited; angiosperms separated from gymnosperms ca. 300 million years ago, and their patterning during embryogenesis differs significantly. For this reason, we have used the model gymnosperm Pinus pinaster to identify WOX genes and perform a thorough analysis of their gene expression patterns. Using transcriptomic data from a comprehensive range of tissues and stages of development, we have shown three major outcomes: that the P. pinaster genome encodes at least fourteen members of the WOX family spanning all the major clades, that the genome of gymnosperms contains a WOX gene with no homologues in angiosperms representing a transitional stage between intermediate- and WUS-clade proteins, and that we can detect discrete WUS and WOX5 transcripts for the first time in a gymnosperm.
Affiliation(s)
- José M Alvarez
  - Departamento de Biología de Organismos y Sistemas, Universidad de Oviedo, Spain
- Natalia Bueno
  - Departamento de Biología de Organismos y Sistemas, Universidad de Oviedo, Spain
- Rafael A Cañas
  - Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Spain
- Concepción Avila
  - Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Spain
- Francisco M Cánovas
  - Departamento de Biología Molecular y Bioquímica, Facultad de Ciencias, Universidad de Málaga, Spain
- Ricardo J Ordás
  - Departamento de Biología de Organismos y Sistemas, Universidad de Oviedo, Spain

468
Jordan JT, Smith MJ, Walker JA, Erdin S, Talkowski ME, Merker VL, Ramesh V, Cai W, Harris GJ, Bredella MA, Seijo M, Suuberg A, Gusella JF, Plotkin SR. Pain correlates with germline mutation in schwannomatosis. Medicine (Baltimore) 2018; 97:e9717. [PMID: 29384852 PMCID: PMC5805424 DOI: 10.1097/md.0000000000009717] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.3] [Indexed: 11/26/2022] Open
Abstract
Schwannomatosis has been linked to germline mutations in the SMARCB1 and LZTR1 genes, and is frequently associated with pain. In a cohort study, we assessed the mutation status of 37 patients with clinically diagnosed schwannomatosis and compared it to clinical data, whole body MRI (WBMRI), visual analog pain scale, and Short Form 36 (SF-36) bodily pain subscale. We identified a germline mutation in LZTR1 in 5 patients (13.5%) and SMARCB1 in 15 patients (40.5%), but found no germline mutation in 17 patients (45.9%). Peripheral schwannomas were detected in 3 LZTR1-mutant (60%) and 10 SMARCB1-mutant subjects (66.7%). Among those with peripheral tumors, the median tumor number was 4 in the LZTR1 group (median total body tumor volume 30 cc) and 10 in the SMARCB1 group (median volume 85 cc) (P = .2915 for tumor number and P = .2289 for volume). LZTR1 mutation was associated with an increased prevalence of spinal schwannomas (100% vs 41%, P = .0197). The median pain score was 3.9/10 in the LZTR1 group and 0.5/10 in the SMARCB1 group (P = .0414), and SF-36 pain-associated quality of life was significantly worse in the LZTR1 group (P = .0106). Pain scores correlated with total body tumor volume (rho = 0.32471, P = .0499), but not with number of tumors (rho = 0.23065, P = .1696). We found no significant difference in quantitative tumor burden between mutational groups, but spinal schwannomas were more common in LZTR1-mutant patients. Pain was significantly higher in LZTR1-mutant than in SMARCB1-mutant patients, though spinal tumor location did not significantly correlate with pain. This suggests a possible genetic association with schwannomatosis-associated pain.
Affiliation(s)
- Justin T. Jordan
  - Department of Neurology
  - Cancer Center, Massachusetts General Hospital, Boston, MA
- Miriam J. Smith
  - Centre for Genomic Medicine, St Mary's Hospital, Division of Evolution and Genomic Sciences, School of Biological Sciences, University of Manchester, Manchester, UK
- James A. Walker
  - Department of Neurology
  - Molecular Neurogenetics Unit, Center for Genomic Medicine
- Serkan Erdin
  - Molecular Neurogenetics Unit, Center for Genomic Medicine
- Michael E. Talkowski
  - Department of Neurology
  - Molecular Neurogenetics Unit, Center for Genomic Medicine
- Vijaya Ramesh
  - Department of Neurology
  - Molecular Neurogenetics Unit, Center for Genomic Medicine
- Wenli Cai
  - Department of Radiology, Massachusetts General Hospital and Harvard Medical School
- Gordon J. Harris
  - Department of Radiology, Massachusetts General Hospital and Harvard Medical School
- Miriam A. Bredella
  - Department of Radiology, Massachusetts General Hospital and Harvard Medical School
- Marlon Seijo
  - Cancer Center, Massachusetts General Hospital, Boston, MA
- James F. Gusella
  - Department of Neurology
  - Molecular Neurogenetics Unit, Center for Genomic Medicine
  - Department of Genetics, Harvard Medical School, Boston, MA
- Scott R. Plotkin
  - Department of Neurology
  - Cancer Center, Massachusetts General Hospital, Boston, MA

469
Coll F, Phelan J, Hill-Cawthorne GA, Nair MB, Mallard K, Ali S, Abdallah AM, Alghamdi S, Alsomali M, Ahmed AO, Portelli S, Oppong Y, Alves A, Bessa TB, Campino S, Caws M, Chatterjee A, Crampin AC, Dheda K, Furnham N, Glynn JR, Grandjean L, Minh Ha D, Hasan R, Hasan Z, Hibberd ML, Joloba M, Jones-López EC, Matsumoto T, Miranda A, Moore DJ, Mocillo N, Panaiotov S, Parkhill J, Penha C, Perdigão J, Portugal I, Rchiad Z, Robledo J, Sheen P, Shesha NT, Sirgel FA, Sola C, Oliveira Sousa E, Streicher EM, Helden PV, Viveiros M, Warren RM, McNerney R, Pain A, Clark TG. Genome-wide analysis of multi- and extensively drug-resistant Mycobacterium tuberculosis. Nat Genet 2018; 50:307-316. [PMID: 29358649 DOI: 10.1038/s41588-017-0029-0] [Citation(s) in RCA: 219] [Impact Index Per Article: 31.3] [Received: 02/03/2017] [Accepted: 12/01/2017] [Indexed: 12/30/2022]
Abstract
To characterize the genetic determinants of resistance to antituberculosis drugs, we performed a genome-wide association study (GWAS) of 6,465 Mycobacterium tuberculosis clinical isolates from more than 30 countries. A GWAS approach within a mixed-regression framework was followed by a phylogenetics-based test for independent mutations. In addition to mutations in established and recently described resistance-associated genes, novel mutations were discovered for resistance to cycloserine, ethionamide and para-aminosalicylic acid. The capacity to detect mutations associated with resistance to ethionamide, pyrazinamide, capreomycin, cycloserine and para-aminosalicylic acid was enhanced by inclusion of insertions and deletions. Odds ratios for mutations within candidate genes were found to reflect levels of resistance. New epistatic relationships between candidate drug-resistance-associated genes were identified. Findings also suggest the involvement of efflux pumps (drrA and Rv2688c) in the emergence of resistance. This study will inform the design of new diagnostic tests and expedite the investigation of resistance and compensatory epistatic mechanisms.
Affiliation(s)
- Francesc Coll
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
- Jody Phelan
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
- Grant A Hill-Cawthorne
  - Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - Sydney Emerging Infections and Biosecurity Institute and School of Public Health, Sydney Medical School, University of Sydney, Sydney, New South Wales, Australia
- Mridul B Nair
  - Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Kim Mallard
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
- Shahjahan Ali
  - Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Abdallah M Abdallah
  - Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Saad Alghamdi
  - Laboratory Medicine Department, Faculty of Applied Medical Sciences, Umm Al-Qura University, Makkah, Saudi Arabia
- Mona Alsomali
  - Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Abdallah O Ahmed
  - Department of Microbiology, Faculty of Medicine, Umm Al-Qura University, Makkah, Saudi Arabia
- Stephanie Portelli
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
  - Department of Biochemistry and Molecular Biology, Bio21 Molecular Science and Biotechnology Institute, University of Melbourne, Melbourne, Victoria, Australia
- Yaa Oppong
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
- Adriana Alves
  - National Mycobacterium Reference Laboratory, Porto, Portugal
- Susana Campino
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
- Maxine Caws
  - Liverpool School of Tropical Medicine, Liverpool, UK
  - Pham Ngoc Thach Hospital for TB and Lung Diseases, Ho Chi Minh City, Vietnam
- Amelia C Crampin
  - Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK
  - Karonga Prevention Study, Chilumba, Karonga, Malawi
- Keertan Dheda
  - Lung Infection and Immunity Unit, UCT Lung Institute, University of Cape Town, Groote Schuur Hospital, Cape Town, South Africa
- Nicholas Furnham
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
- Judith R Glynn
  - Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK
  - Karonga Prevention Study, Chilumba, Karonga, Malawi
- Louis Grandjean
  - Laboratorio de Enfermedades Infecciosas, Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru
- Dang Minh Ha
  - Pham Ngoc Thach Hospital for TB and Lung Diseases, Ho Chi Minh City, Vietnam
- Rumina Hasan
  - Department of Pathology and Laboratory Medicine, Aga Khan University, Karachi, Pakistan
- Zahra Hasan
  - Department of Pathology and Laboratory Medicine, Aga Khan University, Karachi, Pakistan
- Martin L Hibberd
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
- Moses Joloba
  - Department of Medical Microbiology, Makerere University College of Health Sciences, Kampala, Uganda
- Edward C Jones-López
  - Section of Infectious Diseases, Department of Medicine, Boston Medical Center and Boston University School of Medicine, Boston, MA, USA
- Anabela Miranda
  - National Mycobacterium Reference Laboratory, Porto, Portugal
- David J Moore
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
  - Laboratorio de Enfermedades Infecciosas, Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru
- Nora Mocillo
  - Reference Laboratory of Tuberculosis Control, Buenos Aires, Argentina
- Stefan Panaiotov
  - National Center of Infectious and Parasitic Diseases, Sofia, Bulgaria
- Carlos Penha
  - Instituto Gulbenkian de Ciência, Lisbon, Portugal
- João Perdigão
  - iMed.ULisboa-Research Institute for Medicines, Faculdade de Farmácia, Universidade de Lisboa, Lisbon, Portugal
- Isabel Portugal
  - iMed.ULisboa-Research Institute for Medicines, Faculdade de Farmácia, Universidade de Lisboa, Lisbon, Portugal
- Zineb Rchiad
  - Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
- Jaime Robledo
  - Corporación para Investigaciones Biológicas, Universidad Pontificia Bolivariana, Medellín, Colombia
- Patricia Sheen
  - Lung Infection and Immunity Unit, UCT Lung Institute, University of Cape Town, Groote Schuur Hospital, Cape Town, South Africa
- Frik A Sirgel
  - Division of Molecular Biology and Human Genetics, SAMRC Centre for Tuberculosis Research, DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg, South Africa
- Christophe Sola
  - Institute for Integrative Cell Biology, CEA, CNRS, Université Paris-Saclay, Orsay, France
- Erivelton Oliveira Sousa
  - Centro de Pesquisas Gonçalo Moniz, Fundação Oswaldo Cruz, Salvador, Brazil
  - Laboratorio Central de Saúde Pública Professor Gonçalo Moniz, Salvador, Brazil
- Elizabeth M Streicher
  - Division of Molecular Biology and Human Genetics, SAMRC Centre for Tuberculosis Research, DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg, South Africa
- Paul Van Helden
  - Division of Molecular Biology and Human Genetics, SAMRC Centre for Tuberculosis Research, DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg, South Africa
- Miguel Viveiros
  - Unidade de Microbiologia Médica, Instituto de Higiene e Medicina Tropical, Universidade Nova de Lisboa (UNL), Lisbon, Portugal
- Robert M Warren
  - Division of Molecular Biology and Human Genetics, SAMRC Centre for Tuberculosis Research, DST/NRF Centre of Excellence for Biomedical Tuberculosis Research, Faculty of Medicine and Health Sciences, Stellenbosch University, Tygerberg, South Africa
- Ruth McNerney
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
  - Lung Infection and Immunity Unit, UCT Lung Institute, University of Cape Town, Groote Schuur Hospital, Cape Town, South Africa
- Arnab Pain
  - Pathogen Genomics Laboratory, BESE Division, King Abdullah University of Science and Technology (KAUST), Thuwal, Saudi Arabia
  - Global Station for Zoonosis Control, Global Institution for Collaborative Research and Education (GI-CoRE), Hokkaido University, Sapporo, Japan
- Taane G Clark
  - Faculty of Infectious and Tropical Diseases, London School of Hygiene and Tropical Medicine, London, UK
  - Faculty of Epidemiology and Population Health, London School of Hygiene and Tropical Medicine, London, UK
|
470
|
Kolde R, Franzosa EA, Rahnavard G, Hall AB, Vlamakis H, Stevens C, Daly MJ, Xavier RJ, Huttenhower C. Host genetic variation and its microbiome interactions within the Human Microbiome Project. Genome Med 2018; 10:6. [PMID: 29378630 PMCID: PMC5789541 DOI: 10.1186/s13073-018-0515-8] [Citation(s) in RCA: 112] [Impact Index Per Article: 16.0] [Received: 11/17/2017] [Accepted: 01/04/2018] [Indexed: 02/07/2023] Open
Abstract
Background: Despite the increasing recognition that microbial communities within the human body are linked to health, we have an incomplete understanding of the environmental and molecular interactions that shape the composition of these communities. Although host genetic factors play a role in these interactions, these factors have remained relatively unexplored given the requirement for large population-based cohorts in which both genotyping and microbiome characterization have been performed.
Methods: We performed whole-genome sequencing of 298 donors from the Human Microbiome Project (HMP) healthy cohort study to accompany existing deep characterization of their microbiomes at various body sites. This analysis yielded an average sequencing depth of 32x, with which we identified 27 million (M) single nucleotide variants and 2.3 M insertions-deletions.
Results: Taxonomic composition and functional potential of the microbiome covaried significantly with genetic principal components in the gastrointestinal tract and oral communities, but not in the nares or vaginal microbiota. Example associations included validation of known associations between FUT2 secretor status, as well as a variant conferring hypolactasia near the LCT gene, with Bifidobacterium longum abundance in stool. The associations of microbial features with both high-level genetic attributes and single variants were specific to particular body sites, highlighting the opportunity to find unique genetic mechanisms controlling microbiome properties in the microbial communities from multiple body sites.
Conclusions: This study adds deep sequencing of host genomes to the body-wide microbiome sequences already extant from the HMP healthy cohort, creating a unique, versatile, and well-controlled reference for future studies seeking to identify host genetic modulators of the microbiome.
Electronic supplementary material: The online version of this article (10.1186/s13073-018-0515-8) contains supplementary material, which is available to authorized users.
Affiliation(s)
- Raivo Kolde
  - Center for Computational and Integrative Biology, Massachusetts General Hospital, 185 Cambridge St, Boston, MA, 02114, USA
- Eric A Franzosa
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, 655 Huntington Ave, Boston, MA, 02115, USA
  - The Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA
- Gholamali Rahnavard
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, 655 Huntington Ave, Boston, MA, 02115, USA
  - The Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA
- Hera Vlamakis
  - The Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA
- Christine Stevens
  - The Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA
- Mark J Daly
  - The Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA
  - Center for Human Genetic Research, Massachusetts General Hospital, 185 Cambridge St, Boston, MA, 02114, USA
- Ramnik J Xavier
  - Center for Computational and Integrative Biology, Massachusetts General Hospital, 185 Cambridge St, Boston, MA, 02114, USA
  - The Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA
  - Center for Microbiome Informatics & Therapeutics, Massachusetts Institute of Technology, Cambridge, MA, 02139, USA
- Curtis Huttenhower
  - Department of Biostatistics, Harvard T. H. Chan School of Public Health, 655 Huntington Ave, Boston, MA, 02115, USA
  - The Broad Institute of MIT and Harvard, 415 Main St, Cambridge, MA, 02142, USA
|
471
|
Kotelnikova EA, Pyatnitskiy M, Paleeva A, Kremenetskaya O, Vinogradov D. Practical aspects of NGS-based pathways analysis for personalized cancer science and medicine. Oncotarget 2016; 7:52493-52516. [PMID: 27191992 PMCID: PMC5239569 DOI: 10.18632/oncotarget.9370] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.9] [Received: 10/02/2015] [Accepted: 04/18/2016] [Indexed: 12/17/2022] Open
Abstract
Nowadays, the personalized approach to health care and cancer care in particular is becoming more and more popular and is taking an important place in the translational medicine paradigm. In some cases, detection of the patient-specific individual mutations that point to a targeted therapy has already become a routine practice for clinical oncologists. Wider panels of genetic markers are also on the market which cover a greater number of possible oncogenes including those with lower reliability of resulting medical conclusions. In light of the large availability of high-throughput technologies, it is very tempting to use complete patient-specific New Generation Sequencing (NGS) or other "omics" data for cancer treatment guidance. However, there are still no gold standard methods and protocols to evaluate them. Here we will discuss the clinical utility of each of the data types and describe a systems biology approach adapted for single patient measurements. We will try to summarize the current state of the field focusing on the clinically relevant case-studies and practical aspects of data processing.
Affiliation(s)
- Ekaterina A Kotelnikova
  - Personal Biomedicine, Moscow, Russia
  - A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
  - Institute Biomedical Research August Pi Sunyer (IDIBAPS), Hospital Clinic of Barcelona, Barcelona, Spain
- Mikhail Pyatnitskiy
  - Personal Biomedicine, Moscow, Russia
  - Orekhovich Institute of Biomedical Chemistry, Moscow, Russia
  - Pirogov Russian National Research Medical University, Moscow, Russia
- Olga Kremenetskaya
  - Personal Biomedicine, Moscow, Russia
  - Center for Theoretical Problems of Physicochemical Pharmacology, Russian Academy of Sciences, Moscow, Russia
- Dmitriy Vinogradov
  - Personal Biomedicine, Moscow, Russia
  - A. A. Kharkevich Institute for Information Transmission Problems, Russian Academy of Sciences, Moscow, Russia
  - Lomonosov Moscow State University, Moscow, Russia
|
472
|
Lindsay H, Burger A, Biyong B, Felker A, Hess C, Zaugg J, Chiavacci E, Anders C, Jinek M, Mosimann C, Robinson MD. CrispRVariants charts the mutation spectrum of genome engineering experiments. Nat Biotechnol 2016; 34:701-2. [PMID: 27404876 DOI: 10.1038/nbt.3628] [Citation(s) in RCA: 105] [Impact Index Per Article: 15.0] [Indexed: 12/25/2022]
Affiliation(s)
- Helen Lindsay
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Alexa Burger
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Berthin Biyong
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
- Anastasia Felker
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Christopher Hess
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Jonas Zaugg
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Elena Chiavacci
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Carolin Anders
  - Institute of Biochemistry, University of Zurich, Zurich, Switzerland
- Martin Jinek
  - Institute of Biochemistry, University of Zurich, Zurich, Switzerland
- Christian Mosimann
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
- Mark D Robinson
  - Institute of Molecular Life Sciences, University of Zurich, Zurich, Switzerland
  - SIB Swiss Institute of Bioinformatics, University of Zurich, Zurich, Switzerland
|
473
|
Reilly MC, Kim J, Lynn J, Simmons BA, Gladden JM, Magnuson JK, Baker SE. Forward genetics screen coupled with whole-genome resequencing identifies novel gene targets for improving heterologous enzyme production in Aspergillus niger. Appl Microbiol Biotechnol 2018; 102:1797-1807. [PMID: 29305699 PMCID: PMC5794824 DOI: 10.1007/s00253-017-8717-3] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.3] [Received: 09/22/2017] [Revised: 12/11/2017] [Accepted: 12/12/2017] [Indexed: 12/12/2022]
Abstract
Plant biomass, once reduced to its composite sugars, can be converted to fuel substitutes. One means of overcoming the recalcitrance of lignocellulose is pretreatment followed by enzymatic hydrolysis. However, currently available commercial enzyme cocktails are inhibited in the presence of residual pretreatment chemicals. Recent studies have identified a number of cellulolytic enzymes from bacteria that are tolerant to pretreatment chemicals such as ionic liquids. The challenge now is generation of these enzymes in copious amounts, an arena where fungal organisms such as Aspergillus niger have proven efficient. Fungal host strains still need to be engineered to increase production titers of heterologous protein over native enzymes, which has been a difficult task. Here, we developed a forward genetics screen coupled with whole-genome resequencing to identify specific lesions responsible for a protein hyper-production phenotype in A. niger. This strategy successfully identified novel targets, including a low-affinity glucose transporter, MstC, whose deletion significantly improved secretion of recombinant proteins driven by a glucoamylase promoter.
Affiliation(s)
- Morgann C Reilly
  - Joint BioEnergy Institute, Emeryville, CA, 94608, USA
  - Chemical and Biological Processes Development Group, Pacific Northwest National Laboratory, Richland, WA, 99352, USA
- Joonhoon Kim
  - Joint BioEnergy Institute, Emeryville, CA, 94608, USA
  - Chemical and Biological Processes Development Group, Pacific Northwest National Laboratory, Richland, WA, 99352, USA
- Jed Lynn
  - Joint BioEnergy Institute, Emeryville, CA, 94608, USA
  - Naval Medical Research Unit Dayton, Wright-Patterson Air Force Base, Dayton, OH, 45433, USA
- Blake A Simmons
  - Joint BioEnergy Institute, Emeryville, CA, 94608, USA
  - Biological Systems and Engineering Division, Lawrence Berkeley National Laboratory, Berkeley, CA, 94720, USA
- John M Gladden
  - Joint BioEnergy Institute, Emeryville, CA, 94608, USA
  - Biomass Science and Conversion Technologies Department, Sandia National Laboratories, Livermore, CA, 94551, USA
- Jon K Magnuson
  - Joint BioEnergy Institute, Emeryville, CA, 94608, USA
  - Chemical and Biological Processes Development Group, Pacific Northwest National Laboratory, Richland, WA, 99352, USA
- Scott E Baker
  - Joint BioEnergy Institute, Emeryville, CA, 94608, USA
  - Biosystems Design and Simulation Group, Environmental Molecular Sciences Division, Pacific Northwest National Laboratory, Richland, WA, 99352, USA
|
474
|
Li X, Gu W, Sun S, Chen Z, Chen J, Song W, Zhao H, Lai J. Defective Kernel 39 encodes a PPR protein required for seed development in maize. J Integr Plant Biol 2018; 60:45-64. [PMID: 28981206 DOI: 10.1111/jipb.12602] [Citation(s) in RCA: 40] [Impact Index Per Article: 5.7] [Received: 07/30/2017] [Accepted: 09/30/2017] [Indexed: 05/10/2023]
Abstract
RNA editing is a posttranscriptional process that is important in mitochondria and plastids of higher plants. All RNA editing-specific trans-factors reported so far belong to PLS-class of pentatricopeptide repeat (PPR) proteins. Here, we report the map-based cloning and molecular characterization of a defective kernel mutant dek39 in maize. Loss of Dek39 function leads to delayed embryogenesis and endosperm development, reduced kernel size, and seedling lethality. Dek39 encodes an E sub-class PPR protein that targets to both mitochondria and chloroplasts, and is involved in RNA editing in mitochondrial NADH dehydrogenase3 (nad3) at nad3-247 and nad3-275. C-to-U editing of nad3-275 is not conserved and even lost in Arabidopsis, consistent with the idea that no close DEK39 homologs are present in Arabidopsis. However, the amino acids generated by editing nad3-247 and nad3-275 are highly conserved in many other plant species, and the reductions of editing at these two sites decrease the activity of mitochondria NADH dehydrogenase complex I, indicating that the alteration of amino acid sequence is necessary for Nad3 function. Our results indicate that Dek39 encodes an E sub-class PPR protein that is involved in RNA editing of multiple sites and is necessary for seed development of maize.
Affiliation(s)
- Xiaojie Li
  - State Key Laboratory of Agrobiotechnology and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
- Wei Gu
  - State Key Laboratory of Agrobiotechnology and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
- Silong Sun
  - State Key Laboratory of Agrobiotechnology and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
- Zongliang Chen
  - State Key Laboratory of Agrobiotechnology and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
- Jing Chen
  - State Key Laboratory of Agrobiotechnology and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
- Weibin Song
  - State Key Laboratory of Agrobiotechnology and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
- Haiming Zhao
  - State Key Laboratory of Agrobiotechnology and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
- Jinsheng Lai
  - State Key Laboratory of Agrobiotechnology and National Maize Improvement Center, Department of Plant Genetics and Breeding, China Agricultural University, Beijing 100193, China
|
475
|
Yu F, Ma N, Zhang X, Tian S, Geng L, Xu W, Wang M, Jia Y, Liu X, Ma J, Quan Y, Zhang C, Guo L, An W, Liu D. Comprehensive investigating of cytokine and receptor related genes variants in patients with chronic hepatitis B virus infection. Cytokine 2017; 103:10-14. [PMID: 29287219 DOI: 10.1016/j.cyto.2017.12.017] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 09/30/2017] [Revised: 12/12/2017] [Accepted: 12/15/2017] [Indexed: 12/17/2022]
Abstract
BACKGROUND & AIMS: Chronic hepatitis B virus (HBV) infection is a global health problem, and its outcome is associated with both viral and host genetic factors. High-throughput sequencing (HTS) technology was used to identify variants associated with liver disease.
METHODS: Fifty-five chronic hepatitis B (CHB) patients, 53 self-healing HBV (SH) patients and 53 healthy controls (HC) were recruited. A total of 404 cytokine and cytokine receptor related genes were captured and sequenced at high depth (>900X). Both variant-level (Fisher's exact test, P value < 0.05) and gene-level (SKAT-O gene-level test, adjusted P value < 0.05) association analyses were used to identify variants and genes associated with CHB.
RESULTS: In total, 5083 variants were detected, and fifty-four variants were found to be associated with CHB. Most (29/32) variants were located in the HLA region, including HLA-B, HLA-C, HLA-DQA1, HLA-DQB1, HLA-DQB2, HLA-DRB1 and HLA-DRB5. Several missense variants were found to be associated with CHB, including p.E226K in PVR (poliovirus receptor) and p.E400A and p.C431R in IL4R (interleukin 4 receptor). Four variants located in the 3'UTR (untranslated region) were also found to be associated with CHB.
CONCLUSION: Our study revealed that high-throughput target-region sequencing, combined with association analysis at the variant and gene levels, is a good way to find variants and genes associated with CHB even with a small sample size. Our data imply that chronic hepatitis B patients who carry these variants need intensive monitoring.
Affiliation(s)
- Fengxue Yu
  - Department of Science and Technology, The Hebei Key Laboratory of Gastroenterology, The Second Hospital of Hebei Medical University, Shijiazhuang, China
- Ning Ma
  - Department of Social Medicine and Health Care Management, Hebei Medical University, Shijiazhuang, China
- Xiaolin Zhang
  - Division of Epidemiology, School of Public Health, Hebei Medical University, Shijiazhuang, China
- Suzhai Tian
  - Department of Science and Technology, The Hebei Key Laboratory of Gastroenterology, The Second Hospital of Hebei Medical University, Shijiazhuang, China
- Lianxia Geng
  - Department of Science and Technology, The Hebei Key Laboratory of Gastroenterology, The Second Hospital of Hebei Medical University, Shijiazhuang, China
- Weili Xu
  - Department of Pediatric Surgery, The Second Hospital of Hebei Medical University, Shijiazhuang, China
- Mingbang Wang
  - Shenzhen Following Precision Medical Research institute, Shenzhen, Guangdong, China
- Yuan Jia
  - Department of Infectious Disease Control, The Second Hospital of Hebei Medical University, Shijiazhuang, China
- Xuechen Liu
  - Department of Gastroenterology, The Second Hospital of Hebei Medical University, Shijiazhuang, China
- Junji Ma
  - Department of Gastroenterology, The Second Hospital of Hebei Medical University, Shijiazhuang, China
- Yuan Quan
  - Department of Infectious Disease, Hebei Chest Hospital, Shijiazhuang, China
- Chaojun Zhang
  - Department of Central Laboratory, The Second Hospital of Hebei Medical University, Shijiazhuang, China
- Lina Guo
  - Department of Central Laboratory, The Second Hospital of Hebei Medical University, Shijiazhuang, China
- Wenting An
  - Department of Central Laboratory, The Second Hospital of Hebei Medical University, Shijiazhuang, China
- Dianwu Liu
  - Division of Epidemiology, School of Public Health, Hebei Medical University, Shijiazhuang, China
|
476
|
Rapid detection of BRCA1/2 recurrent mutations in Chinese breast and ovarian cancer patients with multiplex SNaPshot genotyping panels. Oncotarget 2017; 9:7832-7843. [PMID: 29487695 PMCID: PMC5814262 DOI: 10.18632/oncotarget.23471] [Citation(s) in RCA: 6] [Impact Index Per Article: 0.8] [Received: 04/11/2017] [Accepted: 09/20/2017] [Indexed: 12/30/2022] Open
Abstract
BRCA1/2 mutations are significant risk factors for hereditary breast and ovarian cancer (HBOC). Their frequency in HBOC patients of Chinese ethnicity is around 9%, and nearly half of these are recurrent mutations. In Hong Kong and China, genetic testing and counseling are not as common as in the West. To reduce the barrier to testing, a multiplex SNaPshot genotyping panel that targeted 25 Chinese BRCA1/2 mutation hotspots was developed, and its feasibility was evaluated in a local cohort of 441 breast and 155 ovarian cancer patients. Those who tested negative were then subjected to full-gene testing with next-generation sequencing (NGS). BRCA mutation prevalence in this cohort was 8.05% and the yield of the recurrent panel was 3.52%, identifying over 40% of the mutation carriers. Moreover, from 79 Chinese breast cancer cases recruited overseas, 2 recurrent mutations and one novel BRCA2 mutation were detected by the panel and NGS respectively. The genotyping panel proved to be an easy-to-perform and more affordable testing tool that can help improve the healthcare of Chinese women with cancer, as well as of family members who harbor high-risk mutations for HBOC.
|
477
|
Comprehensive investigation of cytokine- and immune-related gene variants in HBV-associated hepatocellular carcinoma patients. Biosci Rep 2017; 37:BSR20171263. [PMID: 29138264 PMCID: PMC5725607 DOI: 10.1042/bsr20171263] [Citation(s) in RCA: 10] [Impact Index Per Article: 1.3] [Received: 09/15/2017] [Revised: 11/08/2017] [Accepted: 11/13/2017] [Indexed: 02/06/2023] Open
Abstract
Host genotype may be closely related to the different outcomes of hepatitis B virus (HBV) infection. To identify associations between variants and HBV infection, we comprehensively investigated cytokine- and immune-related gene mutations in patients with HBV-associated hepatocellular carcinoma (HBV-HCC). Fifty-three HBV-HCC patients, 53 self-healing cases (SH) with a history of HBV infection and 53 healthy controls (HCs) were recruited, and the whole exon region of 404 genes was sequenced at >900× depth. Variant-level and gene-level comparisons were made between HCC and HC, and between HCC and SH. Thirty-nine variants (adjusted P<0.0001, Fisher's exact test) and 11 genes (adjusted P<0.0001, optimal unified approach for rare variant association test (SKAT-O) gene-level test) were strongly associated with HBV-HCC. Thirty-four variants were from eight human leukocyte antigen (HLA) genes that were previously reported to be associated with HBV-HCC. The novel findings of our study are five variants (rs579876, rs579877, rs368692979, NM_145007:c.*131_*130delTG, NM_139165:exon5:c.623-2->TT) from three genes (RAET1E, NOD-like receptor (NLR) protein 11 (NLRP11), hydroxy-carboxylic acid receptor 2 (HCAR2)) that were strongly associated with HBV-HCC. In summary, we found 39 different variants in 11 genes that were significantly related to HBV-HCC, five of which were new findings. Our data imply that chronic hepatitis B patients who carry these variants are at high risk of developing HCC.
|
478
|
Bodian DL, Vilboux T, Hourigan SK, Jenevein CL, Mani H, Kent KC, Khromykh A, Solomon BD, Hauser NS. Genomic analysis of an infant with intractable diarrhea and dilated cardiomyopathy. Cold Spring Harb Mol Case Stud 2017; 3:mcs.a002055. [PMID: 28701297 PMCID: PMC5701300 DOI: 10.1101/mcs.a002055] [Citation(s) in RCA: 13] [Impact Index Per Article: 1.6] [Received: 04/07/2017] [Accepted: 06/26/2017] [Indexed: 12/22/2022] Open
Abstract
We describe a case of an infant presenting with intractable diarrhea who subsequently developed dilated cardiomyopathy, for whom a diagnosis was not initially achieved despite extensive clinical testing, including panel-based genetic testing. Research-based whole-genome sequences of the proband and both parents were analyzed by the SAVANNA pipeline, a variant prioritization strategy integrating features of variants, genes, and phenotypes, which was implemented using publicly available tools. Although the intestinal morphological abnormalities characteristic of congenital tufting enteropathy (CTE) were not observed in the initial clinical gastrointestinal tract biopsies of the proband, an intronic variant, EPCAM c.556-14A>G, previously identified as pathogenic for CTE, was found in the homozygous state. A newborn cousin of the proband also presenting with intractable diarrhea was found to carry the same homozygous EPCAM variant, and clinical testing revealed intestinal tufting and loss of EPCAM staining. This variant, however, was considered nonexplanatory for the proband's dilated cardiomyopathy, which could be a sequela of the child's condition and/or related to other genetic variants, which include de novo mutations in the genes NEDD4L and GSK3A and a maternally inherited SCN5A variant. This study illustrates three ways in which genomic sequencing can aid in the diagnosis of clinically challenging patients: differential diagnosis despite atypical clinical presentation, distinguishing the possibilities of a syndromic condition versus multiple conditions, and generating hypotheses for novel contributory genes.
Affiliation(s)
- Dale L Bodian
  - Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia 22042, USA
- Thierry Vilboux
  - Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia 22042, USA
- Suchitra K Hourigan
  - Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia 22042, USA
  - Inova Children's Hospital, Falls Church, Virginia 22042, USA
- Callie L Jenevein
  - Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia 22042, USA
- Haresh Mani
  - Department of Pathology, Inova Fairfax Hospital, Falls Church, Virginia 22042, USA
- Alina Khromykh
  - Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia 22042, USA
- Benjamin D Solomon
  - Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia 22042, USA
- Natalie S Hauser
  - Inova Translational Medicine Institute, Inova Health System, Falls Church, Virginia 22042, USA
|
479
|
Crawford NG, Kelly DE, Hansen MEB, Beltrame MH, Fan S, Bowman SL, Jewett E, Ranciaro A, Thompson S, Lo Y, Pfeifer SP, Jensen JD, Campbell MC, Beggs W, Hormozdiari F, Mpoloka SW, Mokone GG, Nyambo T, Meskel DW, Belay G, Haut J, Rothschild H, Zon L, Zhou Y, Kovacs MA, Xu M, Zhang T, Bishop K, Sinclair J, Rivas C, Elliot E, Choi J, Li SA, Hicks B, Burgess S, Abnet C, Watkins-Chow DE, Oceana E, Song YS, Eskin E, Brown KM, Marks MS, Loftus SK, Pavan WJ, Yeager M, Chanock S, Tishkoff SA. Loci associated with skin pigmentation identified in African populations. Science 2017; 358:eaan8433. [PMID: 29025994 PMCID: PMC5759959 DOI: 10.1126/science.aan8433] [Citation(s) in RCA: 213] [Impact Index Per Article: 26.6] [Received: 05/27/2017] [Accepted: 10/03/2017] [Indexed: 12/13/2022]
Abstract
Despite the wide range of skin pigmentation in humans, little is known about its genetic basis in global populations. Examining ethnically diverse African genomes, we identify variants in or near SLC24A5, MFSD12, DDB1, TMEM138, OCA2, and HERC2 that are significantly associated with skin pigmentation. Genetic evidence indicates that the light pigmentation variant at SLC24A5 was introduced into East Africa by gene flow from non-Africans. At all other loci, variants associated with dark pigmentation in Africans are identical by descent in South Asian and Australo-Melanesian populations. Functional analyses indicate that MFSD12 encodes a lysosomal protein that affects melanogenesis in zebrafish and mice, and that mutations in melanocyte-specific regulatory regions near DDB1/TMEM138 correlate with expression of ultraviolet response genes under selection in Eurasians.
Affiliation(s)
- Nicholas G Crawford
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Derek E Kelly
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Genomics and Computational Biology Graduate Program, University of Pennsylvania, Philadelphia, PA 19104, USA
- Matthew E B Hansen
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Marcia H Beltrame
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Shaohua Fan
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Shanna L Bowman
  - Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia Research Institute, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine and Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Ethan Jewett
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94704, USA
  - Department of Statistics, University of California, Berkeley, Berkeley, CA 94704, USA
- Alessia Ranciaro
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Simon Thompson
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Yancy Lo
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Susanne P Pfeifer
  - School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Jeffrey D Jensen
  - School of Life Sciences, Arizona State University, Tempe, AZ 85287, USA
- Michael C Campbell
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Biology, Howard University, Washington, DC 20059, USA
- William Beggs
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Farhad Hormozdiari
  - Department of Epidemiology, Harvard T.H. Chan School of Public Health, Boston, MA 02115, USA
  - Program in Medical and Population Genetics, Broad Institute of Massachusetts Institute of Technology (MIT) and Harvard, Cambridge, MA 02142, USA
- Gaonyadiwe George Mokone
  - Department of Biomedical Sciences, University of Botswana School of Medicine, Gaborone, Botswana
- Thomas Nyambo
  - Department of Biochemistry, Muhimbili University of Health and Allied Sciences, Dar es Salaam, Tanzania
- Gurja Belay
  - Department of Biology, Addis Ababa University, Addis Ababa, Ethiopia
- Jake Haut
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Harriet Rothschild
  - Stem Cell Program, Division of Hematology and Oncology, Pediatric Hematology Program, Boston Children's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
- Leonard Zon
  - Stem Cell Program, Division of Hematology and Oncology, Pediatric Hematology Program, Boston Children's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
  - Howard Hughes Medical Institute, Harvard Medical School, Boston, MA 02115, USA
- Yi Zhou
  - Stem Cell Program, Division of Hematology and Oncology, Pediatric Hematology Program, Boston Children's Hospital and Dana-Farber Cancer Institute, Harvard Medical School, Boston, MA 02115, USA
  - Harvard Stem Cell Institute, Harvard University, Cambridge, MA 02138, USA
- Michael A Kovacs
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Mai Xu
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Tongwu Zhang
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Kevin Bishop
  - Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Jason Sinclair
  - Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Cecilia Rivas
  - Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Eugene Elliot
  - Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Jiyeon Choi
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Shengchao A Li
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892, USA
  - Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD 21701, USA
- Belynda Hicks
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892, USA
  - Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD 21701, USA
- Shawn Burgess
  - Translational and Functional Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Christian Abnet
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892, USA
- Dawn E Watkins-Chow
  - Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Elena Oceana
  - Department of Molecular Pharmacology, Physiology and Biotechnology, Brown University, Providence, RI 02912, USA
- Yun S Song
  - Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, Berkeley, CA 94704, USA
  - Department of Statistics, University of California, Berkeley, Berkeley, CA 94704, USA
  - Chan Zuckerberg Biohub, San Francisco, CA 94158, USA
  - Department of Biology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Mathematics, University of Pennsylvania, Philadelphia, PA 19104, USA
- Eleazar Eskin
  - Department of Computer Science and Department of Human Genetics, University of California, Los Angeles, Los Angeles, CA 90095, USA
- Kevin M Brown
  - Laboratory of Translational Genomics, Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Michael S Marks
  - Department of Pathology and Laboratory Medicine, Children's Hospital of Philadelphia Research Institute, Philadelphia, PA 19104, USA
  - Department of Pathology and Laboratory Medicine and Department of Physiology, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
- Stacie K Loftus
  - Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- William J Pavan
  - Genetic Disease Research Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD 20892, USA
- Meredith Yeager
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892, USA
  - Frederick National Laboratory for Cancer Research, Leidos Biomedical Research Inc., Frederick, MD 21701, USA
- Stephen Chanock
  - Division of Cancer Epidemiology and Genetics, National Cancer Institute, National Institutes of Health, Rockville, MD 20892, USA
- Sarah A Tishkoff
  - Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA
  - Department of Biology, School of Arts and Sciences, University of Pennsylvania, Philadelphia, PA 19104, USA
|
480
|
Smith DR, Stanley CM, Foss T, Boles RG, McKernan K. Rare genetic variants in the endocannabinoid system genes CNR1 and DAGLA are associated with neurological phenotypes in humans. PLoS One 2017; 12:e0187926. [PMID: 29145497 PMCID: PMC5690672 DOI: 10.1371/journal.pone.0187926] [Citation(s) in RCA: 52] [Impact Index Per Article: 6.5] [Received: 07/17/2017] [Accepted: 10/27/2017] [Indexed: 12/24/2022]
Abstract
Rare genetic variants in the core endocannabinoid system genes CNR1, CNR2, DAGLA, MGLL and FAAH were identified in molecular testing data from 6,032 patients with a broad spectrum of neurological disorders. The variants were evaluated for association with phenotypes similar to those observed in the orthologous gene knockouts in mice. Heterozygous rare coding variants in CNR1, which encodes the type 1 cannabinoid receptor (CB1), were found to be significantly associated with pain sensitivity (especially migraine), sleep and memory disorders—alone or in combination with anxiety—compared to a set of controls without such CNR1 variants. Similarly, heterozygous rare variants in DAGLA, which encodes diacylglycerol lipase alpha, were found to be significantly associated with seizures and neurodevelopmental disorders, including autism and abnormalities of brain morphology, compared to controls. Rare variants in MGLL, FAAH and CNR2 were not associated with any neurological phenotypes in the patients tested. Diacylglycerol lipase alpha synthesizes the endocannabinoid 2-AG in the brain, which interacts with CB1 receptors. The phenotypes associated with rare CNR1 variants are reminiscent of those implicated in the theory of clinical endocannabinoid deficiency syndrome. The severe phenotypes associated with rare DAGLA variants underscore the critical role of rapid 2-AG synthesis and the endocannabinoid system in regulating neurological function and development. Mapping of the variants to the 3D structure of the type 1 cannabinoid receptor, or primary structure of diacylglycerol lipase alpha, reveals clustering of variants in certain structural regions and is consistent with impacts to function.
Affiliation(s)
- Douglas R. Smith
  - Courtagen Life Sciences, Inc., Woburn, MA, United States of America
- Theodore Foss
  - Courtagen Life Sciences, Inc., Woburn, MA, United States of America
- Richard G. Boles
  - Courtagen Life Sciences, Inc., Woburn, MA, United States of America
- Kevin McKernan
  - Courtagen Life Sciences, Inc., Woburn, MA, United States of America
|
481
|
Abstract
Storing biologically equivalent indels as distinct entries in databases causes data redundancy, and misleads downstream analysis. It is thus desirable to have a unified system for identifying and representing equivalent indels. Moreover, a unified system is also desirable to compare the indel calling results produced by different tools. This paper describes UPS-indel, a utility tool that creates a universal positioning system for indels so that equivalent indels can be uniquely determined by their coordinates in the new system, which also can be used to compare different indel calling results. UPS-indel identifies 15% redundant indels in dbSNP, 29% in COSMIC coding, and 13% in COSMIC noncoding datasets across all human chromosomes, higher than previously reported. Comparing the performance of UPS-indel with existing variant normalization tools vt normalize, BCFtools, and GATK LeftAlignAndTrimVariants shows that UPS-indel is able to identify 456,352 more redundant indels in dbSNP; 2,118 more in COSMIC coding, and 553 more in COSMIC noncoding indel dataset in addition to the ones reported jointly by these tools. Moreover, comparing UPS-indel to state-of-the-art approaches for indel call set comparison demonstrates its clear superiority in finding common indels among call sets. UPS-indel is theoretically proven to find all equivalent indels, and thus exhaustive.
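To make the notion of "equivalent indels" concrete, the following minimal Python sketch shows three differently positioned deletion records inside a short tandem repeat that all produce the same haplotype, and how a simple left-alignment collapses them to one representation. This is not the UPS-indel algorithm, which the abstract reports as finding more redundancy than normalization alone; it is only the standard left-align-and-trim idea, and the function names, the 0-based coordinates, and the toy reference are assumptions made for illustration.

```python
def apply_variant(ref, pos, ref_allele, alt_allele):
    """Return the sequence obtained by applying one variant at 0-based pos."""
    if ref[pos:pos + len(ref_allele)] != ref_allele:
        raise ValueError("REF allele does not match the reference")
    return ref[:pos] + alt_allele + ref[pos + len(ref_allele):]


def left_normalize(ref, pos, ref_allele, alt_allele):
    """Left-align and trim a variant (illustrative sketch, hypothetical helper)."""
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base, which shifts the event to the left.
        if ref_allele and alt_allele and ref_allele[-1] == alt_allele[-1]:
            ref_allele, alt_allele = ref_allele[:-1], alt_allele[:-1]
            changed = True
        # If an allele became empty, extend both by one reference base on the left.
        if (not ref_allele or not alt_allele) and pos > 0:
            pos -= 1
            ref_allele = ref[pos] + ref_allele
            alt_allele = ref[pos] + alt_allele
            changed = True
    # Trim shared leading bases while both alleles keep at least one base.
    while len(ref_allele) > 1 and len(alt_allele) > 1 and ref_allele[0] == alt_allele[0]:
        ref_allele, alt_allele = ref_allele[1:], alt_allele[1:]
        pos += 1
    return pos, ref_allele, alt_allele


# Toy reference with a CA repeat; three records that each delete one "CA" unit.
REF = "GGCACACATT"
calls = [(1, "GCA", "G"), (3, "ACA", "A"), (5, "ACA", "A")]

haplotypes = {apply_variant(REF, *call) for call in calls}
normalized = {left_normalize(REF, *call) for call in calls}
```

Here `haplotypes` contains a single sequence and `normalized` a single record, so a database that stored the three raw records would hold two redundant entries.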
|
482
|
Owens GL, Todesco M, Drummond EBM, Yeaman S, Rieseberg LH. A novel post hoc method for detecting index switching finds no evidence for increased switching on the Illumina HiSeq X. Mol Ecol Resour 2017; 18:169-175. [DOI: 10.1111/1755-0998.12713] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 05/25/2017] [Revised: 08/04/2017] [Accepted: 08/08/2017] [Indexed: 11/26/2022]
Affiliation(s)
- Gregory L. Owens
  - Department of Botany and Beaty Biodiversity Centre; University of British Columbia; Vancouver BC Canada
- Marco Todesco
  - Department of Botany and Beaty Biodiversity Centre; University of British Columbia; Vancouver BC Canada
- Emily B. M. Drummond
  - Department of Botany and Beaty Biodiversity Centre; University of British Columbia; Vancouver BC Canada
- Sam Yeaman
  - Department of Biological Sciences; University of Calgary; Calgary AB Canada
- Loren H. Rieseberg
  - Department of Botany and Beaty Biodiversity Centre; University of British Columbia; Vancouver BC Canada
|
483
|
Need for speed in accurate whole-genome data analysis: GENALICE MAP challenges BWA/GATK more than PEMapper/PECaller and Isaac. Proc Natl Acad Sci U S A 2017; 114:E8320-E8322. [PMID: 28916731 DOI: 10.1073/pnas.1713830114] [Citation(s) in RCA: 16] [Impact Index Per Article: 2.0] [Indexed: 11/18/2022]
|
484
|
Fuentes-Pardo AP, Ruzzante DE. Whole-genome sequencing approaches for conservation biology: Advantages, limitations and practical recommendations. Mol Ecol 2017; 26:5369-5406. [PMID: 28746784 DOI: 10.1111/mec.14264] [Citation(s) in RCA: 176] [Impact Index Per Article: 22.0] [Received: 04/19/2017] [Revised: 06/23/2017] [Accepted: 06/28/2017] [Indexed: 12/14/2022]
Abstract
Whole-genome resequencing (WGR) is a powerful method for addressing fundamental evolutionary biology questions that have not been fully resolved using traditional methods. WGR includes four approaches: the sequencing of individuals to a high depth of coverage with either unresolved or resolved haplotypes, the sequencing of population genomes to a high depth by mixing equimolar amounts of unlabelled-individual DNA (Pool-seq) and the sequencing of multiple individuals from a population to a low depth (lcWGR). These techniques require the availability of a reference genome. This, along with the still high cost of shotgun sequencing and the large demand for computing resources and storage, has limited their implementation in nonmodel species with scarce genomic resources and in fields such as conservation biology. Our goal here is to describe the various WGR methods, their pros and cons and potential applications in conservation biology. WGR offers an unprecedented marker density and surveys a wide diversity of genetic variations not limited to single nucleotide polymorphisms (e.g., structural variants and mutations in regulatory elements), increasing their power for the detection of signatures of selection and local adaptation as well as for the identification of the genetic basis of phenotypic traits and diseases. Currently, though, no single WGR approach fulfils all requirements of conservation genetics, and each method has its own limitations and sources of potential bias. We discuss proposed ways to minimize such biases. We envision a not distant future where the analysis of whole genomes becomes a routine task in many nonmodel species and fields including conservation biology.
|
485
|
Kundu K, Pal LR, Yin Y, Moult J. Determination of disease phenotypes and pathogenic variants from exome sequence data in the CAGI 4 gene panel challenge. Hum Mutat 2017; 38:1201-1216. [PMID: 28497567 PMCID: PMC5576720 DOI: 10.1002/humu.23249] [Citation(s) in RCA: 4] [Impact Index Per Article: 0.5] [Received: 10/25/2016] [Revised: 03/30/2017] [Accepted: 04/28/2017] [Indexed: 01/06/2023]
Abstract
The use of gene panel sequence for diagnostic and prognostic testing is now widespread, but there are so far few objective tests of methods to interpret these data. We describe the design and implementation of a gene panel sequencing data analysis pipeline (VarP) and its assessment in a CAGI4 community experiment. The method was applied to clinical gene panel sequencing data of 106 patients, with the goal of determining which of 14 disease classes each patient has and the corresponding causative variant(s). The disease class was correctly identified for 36 cases, including 10 where the original clinical pipeline did not find causative variants. For a further seven cases, we found strong evidence of an alternative disease to that tested. Many of the potentially causative variants are missense, with no previous association with disease, and these proved the hardest to correctly assign pathogenicity or otherwise. Post analysis showed that three-dimensional structure data could have helped for up to half of these cases. Over-reliance on HGMD annotation led to a number of incorrect disease assignments. We used a largely ad hoc method to assign probabilities of pathogenicity for each variant, and there is much work still to be done in this area.
Affiliation(s)
- Kunal Kundu
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
  - Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, MD 20742, USA
- Lipika R. Pal
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
- Yizhou Yin
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
  - Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, MD 20742, USA
- John Moult
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850, USA
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742, USA
|
486
|
Pal LR, Kundu K, Yin Y, Moult J. CAGI4 SickKids clinical genomes challenge: A pipeline for identifying pathogenic variants. Hum Mutat 2017; 38:1169-1181. [PMID: 28512736 PMCID: PMC5577808 DOI: 10.1002/humu.23257] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 10/25/2016] [Revised: 05/09/2017] [Accepted: 05/10/2017] [Indexed: 12/21/2022]
Abstract
Compared with earlier more restricted sequencing technologies, identification of rare disease variants using whole-genome sequence has the possibility of finding all causative variants, but issues of data quality and an overwhelming level of background variants complicate the analysis. The CAGI4 SickKids clinical genome challenge provided an opportunity to assess the landscape of variants found in a difficult set of 25 unsolved rare disease cases. To address the challenge, we developed a three-stage pipeline, first carefully analyzing data quality, then classifying high-quality gene-specific variants into seven categories, and finally examining each candidate variant for compatibility with the often complex phenotypes of these patients for final prioritization. Variants consistent with the phenotypes were found in 24 out of the 25 cases, and in a number of these, there are prioritized variants in multiple genes. Data quality analysis suggests that some of the selected variants are likely incorrect calls, complicating interpretation. The data providers followed up on three suggested variants with Sanger sequencing, and in one case, a prioritized variant was confirmed as likely causative by the referring physician, providing a diagnosis in a previously intractable case.
Affiliation(s)
- Lipika R. Pal
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850
- Kunal Kundu
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850
  - Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, MD 20742, USA
- Yizhou Yin
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850
  - Computational Biology, Bioinformatics and Genomics, Biological Sciences Graduate Program, University of Maryland, College Park, MD 20742, USA
- John Moult
  - Institute for Bioscience and Biotechnology Research, University of Maryland, 9600 Gudelsky Drive, Rockville, MD 20850
  - Department of Cell Biology and Molecular Genetics, University of Maryland, College Park, MD 20742
|
487
|
Dallery JF, Lapalu N, Zampounis A, Pigné S, Luyten I, Amselem J, Wittenberg AHJ, Zhou S, de Queiroz MV, Robin GP, Auger A, Hainaut M, Henrissat B, Kim KT, Lee YH, Lespinet O, Schwartz DC, Thon MR, O’Connell RJ. Gapless genome assembly of Colletotrichum higginsianum reveals chromosome structure and association of transposable elements with secondary metabolite gene clusters. BMC Genomics 2017; 18:667. [PMID: 28851275 PMCID: PMC5576322 DOI: 10.1186/s12864-017-4083-x] [Citation(s) in RCA: 67] [Impact Index Per Article: 8.4] [Received: 05/24/2017] [Accepted: 08/21/2017] [Indexed: 11/11/2022]
Abstract
BACKGROUND The ascomycete fungus Colletotrichum higginsianum causes anthracnose disease of brassica crops and the model plant Arabidopsis thaliana. Previous versions of the genome sequence were highly fragmented, causing errors in the prediction of protein-coding genes and preventing the analysis of repetitive sequences and genome architecture. RESULTS Here, we re-sequenced the genome using single-molecule real-time (SMRT) sequencing technology and, in combination with optical map data, this provided a gapless assembly of all twelve chromosomes except for the ribosomal DNA repeat cluster on chromosome 7. The more accurate gene annotation made possible by this new assembly revealed a large repertoire of secondary metabolism (SM) key genes (89) and putative biosynthetic pathways (77 SM gene clusters). The two mini-chromosomes differed from the ten core chromosomes in being repeat- and AT-rich and gene-poor but were significantly enriched with genes encoding putative secreted effector proteins. Transposable elements (TEs) were found to occupy 7% of the genome by length. Certain TE families showed a statistically significant association with effector genes and SM cluster genes and were transcriptionally active at particular stages of fungal development. All 24 subtelomeres were found to contain one of three highly-conserved repeat elements which, by providing sites for homologous recombination, were probably instrumental in four segmental duplications. CONCLUSION The gapless genome of C. higginsianum provides access to repeat-rich regions that were previously poorly assembled, notably the mini-chromosomes and subtelomeres, and allowed prediction of the complete SM gene repertoire. It also provides insights into the potential role of TEs in gene and genome evolution and host adaptation in this asexual pathogen.
Affiliation(s)
- Jean-Félix Dallery
  - UMR BIOGER, INRA, AgroParisTech, Université Paris-Saclay, Thiverval-Grignon, France
- Nicolas Lapalu
  - UMR BIOGER, INRA, AgroParisTech, Université Paris-Saclay, Thiverval-Grignon, France
- Antonios Zampounis
  - UMR BIOGER, INRA, AgroParisTech, Université Paris-Saclay, Thiverval-Grignon, France
  - Present Address: Department of Deciduous Fruit Trees, Institute of Plant Breeding and Plant Genetic Resources, Hellenic Agricultural Organization ‘Demeter’, Naoussa, Greece
- Sandrine Pigné
  - UMR BIOGER, INRA, AgroParisTech, Université Paris-Saclay, Thiverval-Grignon, France
- Shiguo Zhou
  - Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin USA
- Marisa V. de Queiroz
  - Laboratório de Genética Molecular de Fungos, Universidade Federal de Viçosa, Viçosa, Brazil
- Guillaume P. Robin
  - UMR BIOGER, INRA, AgroParisTech, Université Paris-Saclay, Thiverval-Grignon, France
- Annie Auger
  - UMR BIOGER, INRA, AgroParisTech, Université Paris-Saclay, Thiverval-Grignon, France
- Matthieu Hainaut
  - CNRS UMR 7257, Aix-Marseille University, Marseille, France
  - INRA, USC 1408 AFMB, Marseille, France
- Bernard Henrissat
  - CNRS UMR 7257, Aix-Marseille University, Marseille, France
  - INRA, USC 1408 AFMB, Marseille, France
  - Department of Biological Sciences, King Abdulaziz University, Jeddah, Saudi Arabia
- Ki-Tae Kim
  - Department of Agricultural Biotechnology, Center for Fungal Genetic Resources, Seoul National University, Seoul, Korea
- Yong-Hwan Lee
  - Department of Agricultural Biotechnology, Center for Fungal Genetic Resources, Seoul National University, Seoul, Korea
- Olivier Lespinet
  - Laboratoire de Recherche en Informatique, CNRS, Université Paris-Sud, Orsay, France
  - Institute for Integrative Biology of the Cell (I2BC), CEA, CNRS, Université Paris-Sud, Orsay, France
- David C. Schwartz
  - Laboratory for Molecular and Computational Genomics, Department of Chemistry, Laboratory of Genetics, University of Wisconsin-Madison, Madison, Wisconsin USA
- Michael R. Thon
  - Instituto Hispano-Luso de Investigaciones Agrarias (CIALE), Department of Microbiology and Genetics, University of Salamanca, Salamanca, Spain
- Richard J. O’Connell
  - UMR BIOGER, INRA, AgroParisTech, Université Paris-Saclay, Thiverval-Grignon, France
|
488
|
Abstract
Read alignment is the first step in most sequencing data analyses. Because a read’s point of origin can be ambiguous, aligners report a mapping quality, which is the probability that the reported alignment is incorrect. Despite its importance, there is no established and general method for calculating mapping quality. I describe a framework for predicting mapping qualities that works by simulating a set of tandem reads. These are like the input reads in important ways, but the true point of origin is known. I implement this method in an accurate and low-overhead tool called Qtip, which is compatible with popular aligners.
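In practice, aligners usually report this probability on the Phred scale, MAPQ = -10 · log10(P(alignment is incorrect)), rounded to an integer. The short Python sketch below shows that conversion in both directions. It is not Qtip's prediction method; the function names are hypothetical and the cap of 60 is an arbitrary illustrative choice, since the maximum reported value differs between aligners.

```python
import math


def mapq_from_error_prob(p_wrong, cap=60):
    """Convert P(alignment is incorrect) to a Phred-scaled integer MAPQ.

    `cap` is an assumed upper bound chosen for illustration only.
    """
    if not 0.0 <= p_wrong <= 1.0:
        raise ValueError("p_wrong must be a probability in [0, 1]")
    if p_wrong == 0.0:
        return cap  # log10(0) is undefined, so report the cap
    return min(cap, round(-10.0 * math.log10(p_wrong)))


def error_prob_from_mapq(mapq):
    """Convert a Phred-scaled MAPQ back to P(alignment is incorrect)."""
    return 10.0 ** (-mapq / 10.0)
```

For example, an estimated error probability of 0.001 corresponds to MAPQ 30, and MAPQ 20 corresponds to an error probability of 0.01.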
|
489
|
Wu SH, Schwartz RS, Winter DJ, Conrad DF, Cartwright RA. Estimating error models for whole genome sequencing using mixtures of Dirichlet-multinomial distributions. Bioinformatics 2017; 33:2322-2329. [PMID: 28334373 PMCID: PMC5860108 DOI: 10.1093/bioinformatics/btx133] [Citation(s) in RCA: 11] [Impact Index Per Article: 1.4] [Received: 09/19/2016] [Revised: 01/22/2017] [Accepted: 03/07/2017] [Indexed: 12/30/2022]
Abstract
MOTIVATION Accurate identification of genotypes is an essential part of the analysis of genomic data, including in identification of sequence polymorphisms, linking mutations with disease and determining mutation rates. Biological and technical processes that adversely affect genotyping include copy-number-variation, paralogous sequences, library preparation, sequencing error and reference-mapping biases, among others. RESULTS We modeled the read depth for all data as a mixture of Dirichlet-multinomial distributions, resulting in significant improvements over previously used models. In most cases the best model was comprised of two distributions. The major-component distribution is similar to a binomial distribution with low error and low reference bias. The minor-component distribution is overdispersed with higher error and reference bias. We also found that sites fitting the minor component are enriched for copy number variants and low complexity regions, which can produce erroneous genotype calls. By removing sites that do not fit the major component, we can improve the accuracy of genotype calls. AVAILABILITY AND IMPLEMENTATION Methods and data files are available at https://github.com/CartwrightLab/WuEtAl2017/ (doi:10.5281/zenodo.256858). CONTACT cartwright@asu.edu. SUPPLEMENTARY INFORMATION Supplementary data is available at Bioinformatics online.
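As a rough illustration of the model family named above, the following Python sketch evaluates the Dirichlet-multinomial probability of a vector of read counts and a weighted mixture of two such components. It is not the authors' code (that is at the GitHub URL given in the abstract); the function names and all parameter values are hypothetical, and two count categories are used only to keep the example small. A component with large concentration parameters behaves almost like a multinomial, while one with small parameters is overdispersed.

```python
from math import exp, lgamma


def dm_log_pmf(counts, alpha):
    """Log-probability of `counts` under a Dirichlet-multinomial with parameters `alpha`."""
    n = sum(counts)
    a = sum(alpha)
    out = lgamma(n + 1) + lgamma(a) - lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += lgamma(x + al) - lgamma(x + 1) - lgamma(al)
    return out


def mixture_pmf(counts, weights, alphas):
    """Probability of `counts` under a weighted mixture of Dirichlet-multinomials."""
    return sum(w * exp(dm_log_pmf(counts, al)) for w, al in zip(weights, alphas))


# Hypothetical two-component mixture over (reference, alternate) read counts.
weights = [0.9, 0.1]                      # assumed mixing proportions
alphas = [(500.0, 500.0), (2.0, 0.5)]     # near-binomial vs. overdispersed (assumed)
p_site = mixture_pmf((18, 2), weights, alphas)
```

Comparing each component's weighted contribution to `p_site` is one way to decide which component a site fits better, which is the kind of assignment the abstract uses for filtering.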
Affiliation(s)
- Steven H Wu
  - The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Rachel S Schwartz
  - The Biodesign Institute, Arizona State University, Tempe, AZ, USA
  - Department of Biological Sciences, The University of Rhode Island, Kingston, RI, USA
- David J Winter
  - The Biodesign Institute, Arizona State University, Tempe, AZ, USA
- Donald F Conrad
  - Department of Genetics, Department of Pathology and Immunology, Washington University School of Medicine, Saint Louis, MO, USA
- Reed A Cartwright
  - The Biodesign Institute, Arizona State University, Tempe, AZ, USA
  - School of Life Sciences, Arizona State University, Tempe, AZ, USA
|
490
|
Contribution to Alzheimer's disease risk of rare variants in TREM2, SORL1, and ABCA7 in 1779 cases and 1273 controls. Neurobiol Aging 2017; 59:220.e1-220.e9. [PMID: 28789839 DOI: 10.1016/j.neurobiolaging.2017.07.001] [Citation(s) in RCA: 116] [Impact Index Per Article: 14.5] [Received: 05/22/2017] [Revised: 07/03/2017] [Accepted: 07/05/2017] [Indexed: 01/25/2023]
Abstract
We performed whole-exome and whole-genome sequencing in 927 late-onset Alzheimer disease (LOAD) cases, 852 early-onset AD (EOAD) cases, and 1273 controls from France. We assessed the evidence for gene-based association of rare variants with AD in 6 genes for which an association with such variants was previously claimed. When aggregating protein-truncating and missense-predicted damaging variants, we found exome-wide significant association between EOAD risk and rare variants in SORL1, TREM2, and ABCA7. No exome-wide significant signal was obtained in the LOAD sample, and significance on the order of 10^-6 was observed in the whole AD group for TREM2. Our study confirms previous gene-level results for TREM2, SORL1, and ABCA7 and provides a clearer insight into the classes of rare variants involved. Despite different effect sizes and varying cumulative minor allele frequencies, the rare protein-truncating and missense-predicted damaging variants in TREM2, SORL1, and ABCA7 contribute similarly to the heritability of EOAD and explain between 1.1% and 1.5% of EOAD heritability each, compared with 9.12% for APOE ε4.
|
491
|
Mielczarek M, Frąszczak M, Giannico R, Minozzi G, Williams JL, Wojdak-Maksymiec K, Szyda J. Analysis of copy number variations in Holstein-Friesian cow genomes based on whole-genome sequence data. J Dairy Sci 2017; 100:5515-5525. [DOI: 10.3168/jds.2016-11987] [Citation(s) in RCA: 9] [Impact Index Per Article: 1.1] [Received: 09/12/2016] [Accepted: 03/25/2017] [Indexed: 01/02/2023]
|
492
|
Palomo L, Fuster-Tormo F, Alvira D, Ademà V, Armengol MP, Gómez-Marzo P, de Haro N, Mallo M, Xicoy B, Zamora L, Solé F. Inspecting Targeted Deep Sequencing of Whole Genome Amplified DNA Versus Fresh DNA for Somatic Mutation Detection: A Genetic Study in Myelodysplastic Syndrome Patients. Biopreserv Biobank 2017; 15:360-365. [PMID: 28586236 DOI: 10.1089/bio.2016.0094] [Citation(s) in RCA: 0] [Impact Index Per Article: 0] [Indexed: 10/19/2022]
Abstract
Whole genome amplification (WGA) has become an invaluable method for preserving limited samples of precious stock material and has been used in recent years as an alternative tool to increase the amount of DNA before library preparation for next-generation sequencing. Myelodysplastic syndromes (MDS) are a group of clonal hematopoietic stem cell disorders characterized by somatic mutations in several myeloid-related genes. In this work, targeted deep sequencing was performed on four paired fresh DNA and WGA DNA samples from the bone marrow of MDS patients to assess the feasibility of using WGA DNA for detecting somatic mutations. The results showed that, in general, the sequencing and alignment statistics of fresh DNA and WGA DNA samples were similar. However, after variant calling and when considering variants detected at all frequencies, there was a high level of discordance between fresh DNA and WGA DNA (overall, a higher number of variants was detected in WGA DNA). After proper filtering, a total of three somatic mutations were detected in the cohort. All somatic mutations detected in fresh DNA were also identified in WGA DNA and validated by whole exome sequencing.
Affiliation(s)
- Laura Palomo
  - MDS Group, Josep Carreras Leukaemia Research Institute (IJC), ICO-Hospital Germans Trias i Pujol, Universitat Autònoma de Barcelona, Badalona (Barcelona), Spain
- Francisco Fuster-Tormo
  - MDS Group, Josep Carreras Leukaemia Research Institute (IJC), ICO-Hospital Germans Trias i Pujol, Universitat Autònoma de Barcelona, Badalona (Barcelona), Spain
- Daniel Alvira
  - MDS Group, Josep Carreras Leukaemia Research Institute (IJC), ICO-Hospital Germans Trias i Pujol, Universitat Autònoma de Barcelona, Badalona (Barcelona), Spain
- Vera Ademà
  - MDS Group, Josep Carreras Leukaemia Research Institute (IJC), ICO-Hospital Germans Trias i Pujol, Universitat Autònoma de Barcelona, Badalona (Barcelona), Spain
  - Department of Translational Hematology and Oncology Research, Taussig Cancer Institute, Cleveland Clinic, Cleveland, Ohio
- María Pilar Armengol
  - Genomic and Microscopy facilities, Institut Investigació Ciències de la Salut Germans Trias i Pujol (IGTP), Badalona (Barcelona), Spain
- Paula Gómez-Marzo
  - MDS Group, Josep Carreras Leukaemia Research Institute (IJC), ICO-Hospital Germans Trias i Pujol, Universitat Autònoma de Barcelona, Badalona (Barcelona), Spain
- Nuri de Haro
  - MDS Group, Josep Carreras Leukaemia Research Institute (IJC), ICO-Hospital Germans Trias i Pujol, Universitat Autònoma de Barcelona, Badalona (Barcelona), Spain
- Mar Mallo
  - MDS Group, Josep Carreras Leukaemia Research Institute (IJC), ICO-Hospital Germans Trias i Pujol, Universitat Autònoma de Barcelona, Badalona (Barcelona), Spain
- Blanca Xicoy
  - Hematology Service, ICO-Hospital Germans Trias i Pujol, Josep Carreras Leukaemia Research Institute (IJC), Universitat Autònoma de Barcelona, Badalona (Barcelona), Spain
- Lurdes Zamora
  - Hematology Service, ICO-Hospital Germans Trias i Pujol, Josep Carreras Leukaemia Research Institute (IJC), Universitat Autònoma de Barcelona, Badalona (Barcelona), Spain
- Francesc Solé
  - MDS Group, Josep Carreras Leukaemia Research Institute (IJC), ICO-Hospital Germans Trias i Pujol, Universitat Autònoma de Barcelona, Badalona (Barcelona), Spain
|
493
|
High-quality de novo assembly of the apple genome and methylome dynamics of early fruit development. Nat Genet 2017; 49:1099-1106. [PMID: 28581499 DOI: 10.1038/ng.3886] [Citation(s) in RCA: 511] [Impact Index Per Article: 63.9] [Received: 10/17/2016] [Accepted: 05/03/2017] [Indexed: 12/18/2022]
Abstract
Using the latest sequencing and optical mapping technologies, we have produced a high-quality de novo assembly of the apple (Malus domestica Borkh.) genome. Repeat sequences, which represented over half of the assembly, provided an unprecedented opportunity to investigate the uncharacterized regions of a tree genome; we identified a new hyper-repetitive retrotransposon sequence that was over-represented in heterochromatic regions and estimated that a major burst of different transposable elements (TEs) occurred 21 million years ago. Notably, the timing of this TE burst coincided with the uplift of the Tian Shan mountains, the region thought to be the center of origin of the apple, suggesting that TEs and associated processes may have contributed to the diversification of the apple ancestor and possibly to its divergence from pear. Finally, genome-wide DNA methylation data suggest that epigenetic marks may contribute to agronomically relevant traits, such as apple fruit development.
|
494
|
FastGT: an alignment-free method for calling common SNVs directly from raw sequencing reads. Sci Rep 2017; 7:2537. [PMID: 28566690 PMCID: PMC5451431 DOI: 10.1038/s41598-017-02487-5] [Citation(s) in RCA: 28] [Impact Index Per Article: 3.5] [Received: 07/21/2016] [Accepted: 04/12/2017] [Indexed: 11/21/2022]
Abstract
We have developed a computational method that counts the frequencies of unique k-mers in FASTQ-formatted genome data and uses this information to infer the genotypes of known variants. FastGT can detect the variants in a 30x genome in less than 1 hour using ordinary low-cost server hardware. The overall concordance with the genotypes of two Illumina “Platinum” genomes is 99.96%, and the concordance with the genotypes of the Illumina HumanOmniExpress is 99.82%. Our method provides a k-mer database that can be used for the simultaneous genotyping of approximately 30 million single nucleotide variants (SNVs), including >23,000 SNVs from the Y chromosome. The source code of the FastGT software is available at GitHub (https://github.com/bioinfo-ut/GenomeTester4/).
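The general idea of alignment-free genotyping from k-mer counts can be sketched as follows. This is a deliberately naive illustration and not FastGT's algorithm: the threshold rule, the function names and the toy reads are assumptions, and the sketch ignores reverse complements, sequencing errors and k-mer uniqueness.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every k-mer occurring in the given read sequences."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def naive_genotype(counts, ref_kmer, alt_kmer, min_count=2):
    """Call a genotype for one known variant from allele-specific k-mer counts.

    `min_count` is a hypothetical support threshold chosen for illustration.
    """
    ref_seen = counts[ref_kmer] >= min_count
    alt_seen = counts[alt_kmer] >= min_count
    if ref_seen and alt_seen:
        return "0/1"
    if ref_seen:
        return "0/0"
    if alt_seen:
        return "1/1"
    return "./."  # not enough support for either allele


if __name__ == "__main__":
    # Toy reads covering one variant site (G vs C in the middle position).
    reads = ["ACGTA", "ACGTA", "ACCTA", "ACCTA"]
    counts = count_kmers(reads, k=5)
    print(naive_genotype(counts, ref_kmer="ACGTA", alt_kmer="ACCTA"))
```

A practical tool would replace the fixed threshold with a statistical model of k-mer counts and would restrict itself to k-mers that are unique in the genome, as the abstract indicates.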
|
495
|
Ng NSR, Wilton PR, Prawiradilaga DM, Tay YC, Indrawan M, Garg KM, Rheindt FE. The effects of Pleistocene climate change on biotic differentiation in a montane songbird clade from Wallacea. Mol Phylogenet Evol 2017; 114:353-366. [PMID: 28501612 DOI: 10.1016/j.ympev.2017.05.007] [Citation(s) in RCA: 18] [Impact Index Per Article: 2.3] [Received: 12/12/2016] [Revised: 04/03/2017] [Accepted: 05/08/2017] [Indexed: 11/16/2022]
Abstract
The role of the Pleistocene Ice Age in tropical diversification is poorly understood, especially in archipelagos, in which glaciation-induced sea level fluctuations may lead to complicated changes in land distribution. To assess how Pleistocene land bridges may have facilitated gene flow in tropical archipelagos, we investigated patterns of diversification in the rarely collected rusty-bellied fantail Rhipidura teysmanni (Passeriformes: Rhipiduridae) complex from Wallacea using a combination of bioacoustic traits and whole-genome sequencing methods (dd-RADSeq). We report a biogeographic leapfrog pattern in the vocalizations of these birds, and uncover deep genomic divergence among island populations despite the presence of intermittent land connections between some. We demonstrate how rare instances of genetic introgression have affected the evolution of this species complex, and document the presence of double introgressive mitochondrial sweeps, highlighting the dangers of using only mitochondrial DNA in evolutionary research. By applying different tree inference approaches, we demonstrate how concatenation methods can give inaccurate results when investigating divergence in closely related taxa. Our study highlights high levels of cryptic avian diversity in poorly explored Wallacea, elucidates complex patterns of Pleistocene climate-mediated diversification in an elusive montane songbird, and suggests that Pleistocene land bridges may have accounted for limited connectivity among montane Wallacean biota.
Affiliation(s)
- Nathaniel S R Ng
  - National University of Singapore, Department of Biological Sciences, 14 Science Drive 4, Singapore 117543, Singapore
- Peter R Wilton
  - Department of Organismic and Evolutionary Biology, Harvard University, 26 Oxford Street, Cambridge, MA 02138, United States
- Dewi Malia Prawiradilaga
  - Division of Zoology, Research Center for Biology, Indonesian Institute of Sciences (LIPI), Jalan Raya Jakarta Bogor KM 46, Cibinong Science Center, Cibinong 16911, Indonesia
- Ywee Chieh Tay
  - National University of Singapore, Department of Biological Sciences, 14 Science Drive 4, Singapore 117543, Singapore
- Mochamad Indrawan
  - Center for Biodiversity Strategies, Lab Biologi Laut, Gedung E, FMIPA, Universitas Indonesia, 16424, Indonesia
- Kritika M Garg
  - National University of Singapore, Department of Biological Sciences, 14 Science Drive 4, Singapore 117543, Singapore
- Frank E Rheindt
  - National University of Singapore, Department of Biological Sciences, 14 Science Drive 4, Singapore 117543, Singapore
|
496
|
Watson CT, Matsen FA, Jackson KJL, Bashir A, Smith ML, Glanville J, Breden F, Kleinstein SH, Collins AM, Busse CE. Comment on “A Database of Human Immune Receptor Alleles Recovered from Population Sequencing Data”. The Journal of Immunology 2017; 198:3371-3373. [DOI: 10.4049/jimmunol.1700306] [Citation(s) in RCA: 37] [Impact Index Per Article: 4.6] [Indexed: 01/02/2023]
|
497
|
A unique haplotype of RCCX copy number variation: from the clinics of congenital adrenal hyperplasia to evolutionary genetics. Eur J Hum Genet 2017; 25:702-710. [PMID: 28401898 DOI: 10.1038/ejhg.2017.38] [Citation(s) in RCA: 7] [Impact Index Per Article: 0.9] [Received: 06/10/2016] [Revised: 02/08/2017] [Accepted: 02/14/2017] [Indexed: 01/26/2023]
Abstract
The molecular diagnosis of congenital adrenal hyperplasia (CAH) is complicated by the c.955C>T (p.(Q319*), formerly Q318X, rs7755898) variant of the CYP21A2 gene. Therefore, a systematic assessment of the genetic and evolutionary relationships between c.955C>T, CYP21A2 haplotypes and the RCCX copy number variation (CNV) structures, which harbor CYP21A2, was performed. In total, 389 unrelated Hungarian individuals with European ancestry (164 healthy subjects, 125 patients with non-functioning adrenal incidentaloma and 100 patients with classical CAH) as well as 34 adrenocortical tumor specimens were studied using a set of experimental and bioinformatic methods. A unique, moderately frequent (2%) haplotypic RCCX CNV structure with three repeated segments, abbreviated to LBSASB, harboring a CYP21A2 with a c.955C>T variant in the 3'-segment, and a second CYP21A2 with a specific c.*12C>T (rs150697472) variant in the middle segment occurred in all c.955C>T carriers with normal steroid levels. The second CYP21A2 was free of CAH-causing mutations and produced mRNA in the adrenal gland, confirming its functionality and ability to rescue the carriers from CAH. Neither LBSASB nor c.*12C>T occurred in classical CAH patients. However, CAH-causing CYP21A2 haplotypes with c.955C>T could be derived from the 3'-segment of LBSASB after the loss of functional CYP21A2 from the middle segment. The c.*12C>T variant indicated a functional CYP21A2 and could distinguish between non-pathogenic and pathogenic genomic contexts of the c.955C>T variant in the studied European population. Therefore, c.*12C>T may be suitable as a marker to avoid this genetic confounder and improve the diagnosis of CAH.
|
498
|
Mitt M, Kals M, Pärn K, Gabriel SB, Lander ES, Palotie A, Ripatti S, Morris AP, Metspalu A, Esko T, Mägi R, Palta P. Improved imputation accuracy of rare and low-frequency variants using population-specific high-coverage WGS-based imputation reference panel. Eur J Hum Genet 2017; 25:869-876. [PMID: 28401899 PMCID: PMC5520064 DOI: 10.1038/ejhg.2017.51] [Citation(s) in RCA: 146] [Impact Index Per Article: 18.3] [Received: 12/17/2016] [Revised: 02/14/2017] [Accepted: 02/25/2017] [Indexed: 02/08/2023]
Abstract
Genetic imputation is a cost-efficient way to improve the power and resolution of genome-wide association (GWA) studies. Current publicly accessible imputation reference panels accurately predict genotypes for common variants with minor allele frequency (MAF) ≥ 5% and low-frequency variants (0.5% ≤ MAF < 5%) across diverse populations, but the imputation of rare variation (MAF < 0.5%) is still rather limited. In the current study, we compare the imputation accuracy achieved with reference panels from diverse populations with that of a population-specific high-coverage (30×) whole-genome sequencing (WGS) based reference panel comprising 2244 Estonian individuals (0.25% of adult Estonians). Although the Estonian-specific panel contains fewer haplotypes and variants, the imputation confidence and accuracy of imputed low-frequency and rare variants were significantly higher. The results indicate the utility of population-specific reference panels for human genetic studies.
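The frequency classes used in this abstract (common: MAF ≥ 5%; low-frequency: 0.5% ≤ MAF < 5%; rare: MAF < 0.5%) can be written down as a small helper. The sketch below is illustrative only; the function names are assumptions and it is not part of the study's pipeline.

```python
def minor_allele_frequency(alt_allele_count: int, total_alleles: int) -> float:
    """Return the frequency of the less common allele at a biallelic site."""
    if total_alleles <= 0 or not 0 <= alt_allele_count <= total_alleles:
        raise ValueError("invalid allele counts")
    freq = alt_allele_count / total_alleles
    return min(freq, 1.0 - freq)


def classify_maf(maf: float) -> str:
    """Bin a minor allele frequency using the thresholds quoted in the abstract."""
    if not 0.0 <= maf <= 0.5:
        raise ValueError("MAF must lie in [0, 0.5]")
    if maf >= 0.05:
        return "common"
    if maf >= 0.005:
        return "low-frequency"
    return "rare"


if __name__ == "__main__":
    # 2244 diploid individuals carry 4488 alleles; the allele counts are hypothetical.
    for alt_count in (9, 100, 1500):
        maf = minor_allele_frequency(alt_count, 4488)
        print(alt_count, round(maf, 4), classify_maf(maf))
```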
Affiliation(s)
- Mario Mitt
  - Estonian Genome Center, University of Tartu, Tartu, Estonia
  - Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Mart Kals
  - Estonian Genome Center, University of Tartu, Tartu, Estonia
  - Institute of Mathematics and Statistics, University of Tartu, Tartu, Estonia
- Kalle Pärn
  - Estonian Genome Center, University of Tartu, Tartu, Estonia
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Stacey B Gabriel
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Eric S Lander
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Aarno Palotie
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Samuli Ripatti
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
- Andrew P Morris
  - Estonian Genome Center, University of Tartu, Tartu, Estonia
  - Department of Biostatistics, University of Liverpool, Liverpool, UK
- Andres Metspalu
  - Estonian Genome Center, University of Tartu, Tartu, Estonia
  - Department of Biotechnology, Institute of Molecular and Cell Biology, University of Tartu, Tartu, Estonia
- Tõnu Esko
  - Estonian Genome Center, University of Tartu, Tartu, Estonia
  - Broad Institute of MIT and Harvard, Cambridge, Massachusetts, USA
- Reedik Mägi
  - Estonian Genome Center, University of Tartu, Tartu, Estonia
- Priit Palta
  - Estonian Genome Center, University of Tartu, Tartu, Estonia
  - Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland
|
499
|
Schneider VA, Graves-Lindsay T, Howe K, Bouk N, Chen HC, Kitts PA, Murphy TD, Pruitt KD, Thibaud-Nissen F, Albracht D, Fulton RS, Kremitzki M, Magrini V, Markovic C, McGrath S, Steinberg KM, Auger K, Chow W, Collins J, Harden G, Hubbard T, Pelan S, Simpson JT, Threadgold G, Torrance J, Wood JM, Clarke L, Koren S, Boitano M, Peluso P, Li H, Chin CS, Phillippy AM, Durbin R, Wilson RK, Flicek P, Eichler EE, Church DM. Evaluation of GRCh38 and de novo haploid genome assemblies demonstrates the enduring quality of the reference assembly. Genome Res 2017; 27:849-864. [PMID: 28396521 PMCID: PMC5411779 DOI: 10.1101/gr.213611.116] [Citation(s) in RCA: 666] [Impact Index Per Article: 83.3] [Received: 07/29/2016] [Accepted: 03/14/2017] [Indexed: 11/24/2022]
Abstract
The human reference genome assembly plays a central role in nearly all aspects of today's basic and clinical research. GRCh38 is the first coordinate-changing assembly update since 2009; it reflects the resolution of roughly 1000 issues and encompasses modifications ranging from thousands of single base changes to megabase-scale path reorganizations, gap closures, and localization of previously orphaned sequences. We developed a new approach to sequence generation for targeted base updates and used data from new genome mapping technologies and single haplotype resources to identify and resolve larger assembly issues. For the first time, the reference assembly contains sequence-based representations for the centromeres. We also expanded the number of alternate loci to create a reference that provides a more robust representation of human population variation. We demonstrate that the updates render the reference an improved annotation substrate, alter read alignments in unchanged regions, and impact variant interpretation at clinically relevant loci. We additionally evaluated a collection of new de novo long-read haploid assemblies and conclude that although the new assemblies compare favorably to the reference with respect to continuity, error rate, and gene completeness, the reference still provides the best representation for complex genomic regions and coding sequences. We assert that the collected updates in GRCh38 make the newer assembly a more robust substrate for comprehensive analyses that will promote our understanding of human biology and advance our efforts to improve health.
Affiliation(s)
- Valerie A Schneider
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Tina Graves-Lindsay
  - McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Kerstin Howe
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Nathan Bouk
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Hsiu-Chuan Chen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Paul A Kitts
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Terence D Murphy
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Kim D Pruitt
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Françoise Thibaud-Nissen
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
- Derek Albracht
  - McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Robert S Fulton
  - McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Milinn Kremitzki
  - McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Vincent Magrini
  - McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Chris Markovic
  - McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Sean McGrath
  - McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Kate Auger
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- William Chow
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Joanna Collins
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Glenn Harden
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Timothy Hubbard
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Sarah Pelan
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Jared T Simpson
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Glen Threadgold
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- James Torrance
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Jonathan M Wood
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Laura Clarke
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Sergey Koren
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Paul Peluso
  - Pacific Biosciences, Menlo Park, California 94025, USA
- Heng Li
  - Broad Institute, Cambridge, Massachusetts 02142, USA
- Adam M Phillippy
  - National Human Genome Research Institute, National Institutes of Health, Bethesda, Maryland 20892, USA
- Richard Durbin
  - Wellcome Trust Sanger Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SA, United Kingdom
- Richard K Wilson
  - McDonnell Genome Institute at Washington University, St. Louis, Missouri 63018, USA
- Paul Flicek
  - European Molecular Biology Laboratory, European Bioinformatics Institute, Wellcome Genome Campus, Hinxton, Cambridge CB10 1SD, United Kingdom
- Evan E Eichler
  - Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA
  - Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA
- Deanna M Church
  - National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, Maryland 20894, USA
|
500
|
Weldatsadik RG, Wang J, Puhakainen K, Jiao H, Jalava J, Räisänen K, Datta N, Skoog T, Vuopio J, Jokiranta TS, Kere J. Sequence analysis of pooled bacterial samples enables identification of strain variation in group A streptococcus. Sci Rep 2017; 7:45771. [PMID: 28361960 PMCID: PMC5374712 DOI: 10.1038/srep45771] [Citation(s) in RCA: 3] [Impact Index Per Article: 0.4] [Received: 11/15/2016] [Accepted: 03/02/2017] [Indexed: 12/30/2022]
Abstract
Knowledge of the genomic variation among different strains of a pathogenic microbial species can help in selecting optimal candidates for diagnostic assays and vaccine development. Pooled sequencing (Pool-seq) is a cost-effective approach for population-level genetic studies that require large numbers of samples, such as various strains of a microbe. To test the use of Pool-seq in identifying variation, we pooled DNA of 100 Streptococcus pyogenes strains of different emm types in two pools, each containing 50 strains. We used four variant calling tools (Freebayes, UnifiedGenotyper, SNVer, and SAMtools) and one emm1 strain, SF370, as a reference genome. In total, 63719 SNPs and 164 INDELs were identified in the two pools concordantly by at least two of the tools. The majority of the variants (93.4%) from six individually sequenced strains used in the pools could be identified from the two pools, and 72.3% and 97.4% of the variants in the pools could be mined from the analysis of the 44 complete S. pyogenes genomes and 3407 sequence runs deposited in the European Nucleotide Archive, respectively. We conclude that DNA sequencing of pooled samples of large numbers of bacterial strains is a robust, rapid and cost-efficient way to discover sequence variation.
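The "identified concordantly by at least two of the tools" criterion can be sketched as a simple vote over per-tool call sets. The variant tuple layout, the function name and the toy calls below are assumptions made for illustration; they are not data or code from the study.

```python
from collections import Counter


def concordant_variants(calls_by_tool, min_tools=2):
    """Return the variants reported by at least `min_tools` callers.

    `calls_by_tool` maps a tool name to an iterable of hashable variant keys,
    here (chrom, pos, ref, alt) tuples.
    """
    support = Counter()
    for variants in calls_by_tool.values():
        # set() ensures a tool votes at most once per variant.
        support.update(set(variants))
    return {variant for variant, n in support.items() if n >= min_tools}


if __name__ == "__main__":
    # Made-up calls; only the tool names come from the abstract.
    calls = {
        "Freebayes": [("SF370", 101, "A", "G"), ("SF370", 250, "C", "T")],
        "UnifiedGenotyper": [("SF370", 101, "A", "G")],
        "SNVer": [("SF370", 250, "C", "T"), ("SF370", 900, "G", "A")],
        "SAMtools": [("SF370", 101, "A", "G")],
    }
    for variant in sorted(concordant_variants(calls)):
        print(variant)
```

Comparing variants by an exact (chrom, pos, ref, alt) key assumes that all callers normalise their output the same way, which is a particular concern for INDELs.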
Affiliation(s)
- Rigbe G Weldatsadik
  - Research Programs Unit, Immunobiology, University of Helsinki, and Helsinki University Central Hospital, Helsinki, Finland
- Jingwen Wang
  - Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
- Kai Puhakainen
  - Bacterial Infections Unit, National Institute for Health and Welfare, Turku, Finland
  - Department of Medical Microbiology and Immunology, University of Turku, Turku, Finland
- Hong Jiao
  - Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
- Jari Jalava
  - Bacterial Infections Unit, National Institute for Health and Welfare, Turku, Finland
- Kati Räisänen
  - Bacterial Infections Unit, National Institute for Health and Welfare, Turku, Finland
- Neeta Datta
  - Research Programs Unit, Immunobiology, University of Helsinki, and Helsinki University Central Hospital, Helsinki, Finland
- Tiina Skoog
  - Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
- Jaana Vuopio
  - Bacterial Infections Unit, National Institute for Health and Welfare, Turku, Finland
  - Department of Medical Microbiology and Immunology, University of Turku, Turku, Finland
- T Sakari Jokiranta
  - Research Programs Unit, Immunobiology, University of Helsinki, and Helsinki University Central Hospital, Helsinki, Finland
- Juha Kere
  - Department of Biosciences and Nutrition, Karolinska Institutet, Huddinge, Sweden
  - Molecular Neurology Research Program, University of Helsinki, and Folkhälsan Institute of Genetics, Biomedicum Helsinki, Helsinki, Finland
  - Department of Genetics and Molecular Medicine, King's College London, London, UK
|